Assessing difficulty of French podcasts

I was looking for French podcasts recently and found it difficult to find ones that were interesting but still understandable. I thought this would be an interesting programming project, so I created a script to assess the oral comprehension difficulty of French podcasts.

How the script works

  1. It takes the Spotify URL of a French podcast
  2. It downloads a 30-60 sec excerpt of the podcast using the Spotify API
  3. It transcribes the audio using Google Cloud Speech API (French words only)
  4. It checks what % of the words are uncommon. It does this by checking what % of the words are not in a list of the 1000 most common French words. Both the list of 1000 common words and the transcript are lemmatized so that conjugated verbs and plurals are still identified as the same word. E.g., so that “suis” and “es” both match “être”.
  5. It estimates the talking speed by calculating how many words were spoken per minute (number of French words in transcript / excerpt duration in seconds * 60)
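Steps 4 and 5 can be sketched roughly as below. This is a minimal illustration, not the script's actual code: the function names, the toy lemma table (standing in for a real lemmatizer) and the tiny common-word set are all assumptions made up for the example.

```python
# Toy lemma table standing in for a real French lemmatizer (assumption).
TOY_LEMMAS = {"suis": "être", "es": "être", "chats": "chat"}


def lemmatize(words):
    """Map each word to its lemma, falling back to the lowercased word."""
    return [TOY_LEMMAS.get(w.lower(), w.lower()) for w in words]


def pct_uncommon(lemmas, common_lemmas):
    """Share of transcript lemmas that are not in the common-word list."""
    if not lemmas:
        return 0.0
    uncommon = sum(1 for lemma in lemmas if lemma not in common_lemmas)
    return uncommon / len(lemmas)


def words_per_minute(word_count, duration_seconds):
    """Number of words / excerpt duration in seconds * 60."""
    return word_count / duration_seconds * 60


# Hypothetical 30-second excerpt and a tiny stand-in for the 1000-word list.
transcript = "je suis content tu es fatigué".split()
common = {"je", "tu", "être", "content"}

lemmas = lemmatize(transcript)
print(f"Uncommon words: {pct_uncommon(lemmas, common):.0%}")
print(f"Words per minute: {words_per_minute(len(transcript), 30):.0f}")
```

Because both sides are lemmatized, “suis” and “es” count as the common word “être”, and only “fatigué” is flagged as uncommon here.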

Output

See the output of the script run against 28 popular French podcasts below. Note that the script only transcribes French words, so podcasts that also contain English (e.g., the Duolingo podcast) will have a low number of French words per minute. However, that still reflects a podcast that is easier to understand, so I kept it as is.

To make the podcasts easier to rank I added a score/rank for each metric. The podcast with the lowest share of uncommon words (or the fewest words per minute) has rank 1 in that metric, and the one with the highest has rank 28. I then combined the ranks for the two metrics into a combined score/rank to make it easier to see which podcasts are easier to understand, taking both metrics into account.
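The ranking idea can be sketched like this. It is only an illustration under assumptions: I sum the two ranks and let ties share the lowest rank, which may not match the exact rule behind the table below, and the podcast names and numbers are made up.

```python
def rank_ascending(values):
    """Rank 1 = lowest value. Ties share the lowest rank (assumed tie rule)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def combined_scores(pct_uncommon, words_per_min):
    """Sum the per-metric ranks (assumed way of combining them)."""
    pct_ranks = rank_ascending(pct_uncommon)
    wpm_ranks = rank_ascending(words_per_min)
    return [p + w for p, w in zip(pct_ranks, wpm_ranks)]


# Hypothetical podcasts, not taken from the table.
names = ["Podcast A", "Podcast B", "Podcast C"]
pct = [0.15, 0.35, 0.25]
wpm = [50, 180, 120]

scores = combined_scores(pct, wpm)
for name, score in sorted(zip(names, scores), key=lambda pair: pair[1]):
    print(name, score)
```

A lower combined score means the podcast is easier on both metrics taken together.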

Overall this was a fun experiment. I might continue building this out with better logic in the future.

All the code is available here

Name | Combined score/rank | Pct uncommon words | French words per min | URL
Duolingo French Podcast | 3 | 15% | 47 | Link
One Thing In A French Day | 13 | 21% | 126 | Link
InnerFrench | 16 | 25% | 117 | Link
Hondelatte Raconte – Christophe Hondelatte | 19 | 26% | 115 | Link
L’Heure du Monde | 19 | 22% | 140 | Link
La société de minuit | 20 | 28% | 89 | Link
Coffee Break French | 21 | 30% | 77 | Link
Pépites d’Histoire | 23 | 26% | 130 | Link
Mythes et Légendes | 24 | 25% | 148 | Link
La Story | 27 | 30% | 124 | Link
Easy French: Learn French through authentic conversations / Conversations authentiques pour apprendre le français | 27 | 27% | 130 | Link
Learn French by Podcast | 28 | 35% | 50 | Link
Podcast Français Authentique | 28 | 26% | 167 | Link
HVF – Histoires Vraies et Flippantes | 30 | 34% | 109 | Link
French Through Stories | 32 | 46% | 75 | Link
Ces questions que tout le monde se pose | 32 | 22% | 245 | Link
Les Baladeurs | 33 | 27% | 175 | Link
Entrez dans l’Histoire | 33 | 25% | 189 | Link
Transfert | 33 | 23% | 200 | Link
Le Précepteur | 36 | 35% | 127 | Link
Le Podkatz | 39 | 33% | 144 | Link
Canapé Six Places | 41 | 32% | 174 | Link
Passe le plaid | 41 | 31% | 178 | Link
BURGER RING | 41 | 29% | 191 | Link
French with Jeanne | 43 | 37% | 141 | Link
Little Talk in Slow French : Learn French through conversations | 43 | 31% | 181 | Link
J’ai peur, donc j’y vais | 47 | 31% | 197 | Link
Les actus du jour – Hugo Décrypte | 48 | 35% | 177 | Link

I built a web app to track running plans

Last year I read 80/20 Running by Matt Fitzgerald. The plans were good, but the layout is inconvenient: the high-level workout names (e.g. “Hill Repetition 2”) and the details of what to do during the workout (e.g. the number of intervals) are in different sections of the book. This means you end up flipping back and forth every time you want to go for a run. I thought it would be a fun side project to make a web app that makes it more convenient to follow the plans.

Example running plan (lists only the high-level workout names)
Workout instructions for an example workout (in a different section of the book)

In my last project I relied on vanilla JavaScript and jQuery and it was lacking structure. So for this project I wanted to use a more modern framework and settled on React. I also wanted to write my own CSS since I have been relying on Bootstrap for previous web projects.

See the GIFs below showing how the application turned out. Overall it was a good project to get started with React and to learn CSS at a more fundamental level. The full code can be found on GitHub here.

The most useful learning resources I found during the project were:

  • React crash course. Good intro to React by building a task manager. A lot of the concepts in this tutorial ended up being very relevant for my application, especially how to use json-server to mock an API for development.
  • Kevin Powell’s YouTube channel. Very clear explanations of CSS fundamentals such as display grid, absolute vs. relative position and more.
  • Miguel Grinberg. Great tutorials for how to make Flask and React work together. I also used this tutorial of his to deploy the app to Linode.
  • This blog post was very useful to understand how to work with databases when using React and Flask. There’s some odd code in there, but seeing how to serialize query results using Marshmallow was very useful.

GIFs of the application

Browsing plans and selecting a plan
Viewing a plan (double-click marks workout as complete)
Updating heart rate zones based on lactate threshold

I built an online board game

During the pandemic, my board game group moved online. That meant we couldn’t play Arboretum, one of my favorite board games. As a learning project, I’ve now built a multiplayer online version of the game.

I’ve been playing the game with my friends, but I’m not hosting it anywhere since I don’t own any of the rights to the game. See the full code here and some screenshots below.

Arboretum is a card-playing game where you place cards to build the most beautiful arboretum. Since only one player can score for each tree type, a key part is figuring out the strategy of the other players and adjusting accordingly.

Arboretum basic turn
Basic turn – draw cards, play a card, discard a card
Arboretum scoring overview
Scoring – the game calculates the highest-scoring paths and highlights them on mouse-over

Qualifying for the NYC Marathon through 9+1

This year I decided to try to qualify for the NYC Marathon through the 9+1 program.

The 9+1 program is a way to earn guaranteed, non-complimentary entry to the marathon by running 9 NYRR races and volunteering at one. Apart from 9+1, it’s also possible to qualify on time, by raising ~$2,500 for a charity, or by winning the lottery. Having lost the lottery twice, and with qualifying on time not being feasible for me 🙂, I settled for the 9+1 option.

Races run

Month | Race | Distance | Cost
Jan | NYRR Joe Kleinerman 10K | 10 km (6.2 miles) | $35
Jan | NYRR Fred Lebow Half-Marathon | 21.1 km (13.1 miles) | $35
Feb | NYRR Manhattan 10K (volunteered) | - | -
Feb | NYRR Gridiron 4m | 6.4 km (4 miles) | $25
Feb | NYRR Al Gordon 4m | 6.4 km (4 miles) | $25
Mar | NYRR Washington Heights Salsa, Blues and Shamrocks 5k | 5 km (3.1 miles) | $30
Apr | RBC Race for the Kids 4m | 6.4 km (4 miles) | $25
May | RBC Brooklyn Half | 21.1 km (13.1 miles) | $90
May | Virtual NYRR Global Running Day 5k | 5 km (3.1 miles) | $25
Jun | Front Runners New York LGBT Pride Run | 6.4 km (4 miles) | $25

Total cost

The total cost for the races was $315. The required annual NYRR membership is $40. Signing up for the 2023 NYC Marathon will be ~$255, which brings the total to ~$610. I think it’s well worth it for the value you’re getting (the races themselves, gear, post-race bagel). It’s also a great way to stay motivated to run.
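For anyone who wants to check the math or plug in their own races, here is the same sum as a small sketch. The fees are the ones from the table above, the marathon entry fee is the approximate figure mentioned, and the variable names are just for illustration.

```python
# Race fees in USD, in the order of the table (the volunteer shift is free).
race_fees = [35, 35, 25, 25, 30, 25, 90, 25, 25]
membership = 40        # annual NYRR membership
marathon_entry = 255   # approximate 2023 NYC Marathon entry fee

races_total = sum(race_fees)
grand_total = races_total + membership + marathon_entry

print(f"Races: ${races_total}")
print(f"Total incl. membership and marathon entry: ~${grand_total}")
```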

Best races

All races were really well run by NYRR. I liked the Joe Kleinerman and the Brooklyn Half races the best.

I loved Kleinerman since it was a race I prepared for several months in advance, which made the race feel special. Since it takes place in January it was pretty cold, but I like that much better than the summer races. It also stands out for being a 10K. NYRR offers so many 4-mile races that each one stands out less.

Central Park after the 10K Kleinerman run

The Brooklyn Half stood out because it was the only NYRR “flagship” race I ran. Because it was a flagship race, the roads were closed. The race goes around Prospect Park and then down to Coney Island. Having never been to Coney Island before, it was a cool experience to run there. I got placed in a corral that was way faster than I could manage so the race was quite tough, but it was still very memorable.

The very long, straight path to Coney Island in 20+ Celsius heat and 80%+ humidity

9+1 Learnings

  • Book races well in advance as they definitely sell out. I booked most races around two months in advance. For the NYRR Manhattan 10K I tried to book a couple weeks in advance but it was sold out so I had to volunteer for that one.
  • Review the NYC race calendar as it updates maybe once per month with new races. There was a virtual race in the beginning of the year with a 9+1 credit that I missed because I wasn’t paying attention.
  • Since most races are in Central Park and require a bib pickup at the NYRR RUNCENTER on 57th Street, living close to Central Park really helps.
  • For the January and February races: dress for standing still in a corral for ~20 min. The Joe Kleinerman was especially rough, with below-freezing temperatures before the race. Once you start running it’s fine.
  • There are female-only races like Shape or Manhattan Mini that you can’t run as a man, but you can still volunteer so those can be great volunteer opportunities.
  • The virtual races only make sense as a 9+1 credit. It’s much more fun to run the in-person races.
  • It’s definitely possible to complete the program in the first half of the year. I tried completing it as quickly as possible to minimize the risk that I’d lose interest over time, but I was out of town for some races. It’s probably possible to finish in May if you hit every 9+1-eligible race.
  • Never bring a bag if you can help it. Bag check usually has a long line. They make you put everything in a transparent plastic bag that can be picked up at the run center or on-site on race day.
  • Always pick up your bib in advance if you can. Usually there are long lines to get the bib on race day. Getting an extra hour of sleep is definitely worth the hour spent picking up the bib in advance.
  • You can be pretty late and they will still let you run. It usually takes at least 10 min to get all the corrals past the starting line. And from what I understand they let you start even a bit after the last corral has started. For a couple of races I arrived at the corral at starting time and was still able to run.
  • The r/RunNYC subreddit is really helpful to learn from other runners doing the program.

Summary

I would definitely recommend doing the 9+1. It kept me running consistently for six months and now I have a cool race to look forward to. Since so many people are doing the program, it’s also a great opportunity to meet other people.

Book recommendations – 2021

This year was a pretty good year for reading. One reason is the extra time at home due to COVID. I also started giving up more quickly on books I couldn’t get into. That kept me reading, since forcing myself through boring books turns reading into a chore and leads to procrastination. Below are the best books I read in 2021.

History

Cathedral, Forge, and Waterwheel: Technology and Invention in the Middle Ages

The Middle Ages started in the “dark ages” with the decline of the Roman Empire. It ended around 1000 years later with a Europe that greatly surpassed antiquity. The improvement was driven by new technologies such as the water mill and heavy plough. Many of the new technologies came from Asia. For instance blast furnaces existed in China around 1500 years before they came to Europe. Many of the technologies imported from Asia, such as paper and gunpowder, ended up transforming European society. Another good book on the same topic is “Medieval Technology and Social Change”.

Postwar: Europe after 1945

As someone born after the fall of the Berlin Wall, it’s easy to take a stable and industrialized Europe for granted. This book does a great job of showing the road to get there after 1945 and how it was by no means guaranteed. The knowledge of the author is impressive. I wrote more about it here.

Life in the Medieval Village

It can be difficult to understand the people of the Middle Ages. For instance when reading about customs like trial by ordeal or events like the People’s Crusade. This book does a great job of showing what it was like to live in a medieval village.

Other Non-Fiction

Status Anxiety

Many of us go through a lot of worry comparing ourselves to other people. But what our society considers prestigious (i.e. financial success) hasn’t always been the only measure. This book provides a great perspective on how chasing status is impacting us. It also gives strategies for using philosophy and art to counteract it. The author Alain de Botton is always entertaining and thought-provoking. I would highly recommend his YouTube channel The School of Life.

Almanack of Naval Ravikant

A lot of business books should have been blog posts. This book is the opposite where every page is interesting and thought-provoking. The book is available for free here.

Transforming NOKIA: The Power of Paranoid Optimism to Lead Through Colossal Change

This Forbes cover from 2007 is one of my favorite pictures:



In 2008 Nokia and Apple had around the same market cap. In 2012 Apple’s market cap was 60 times greater. This book is written by someone who was a board member during those years, and it also covers the transformation that followed. It’s a good case study of a culture that got too sure of its own success.

Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python

This book does a great job of showing how statistics relate to data science in practice. A lot of technical books try to be exhaustive and become too long. This is an admirably concise book.

Fiction

The Road

A father and his son walk alone towards the ocean in a post-apocalyptic America. Amazing writing. It’s hard to let go of this book after having finished it. Truly horrifying.

A Special Providence

Richard Yates is one of my favorite authors and this is a great book. The book follows a young man in the Second World War and his relationship with his mother.

The Lost World

Great old-time adventure story by Sir Arthur Conan Doyle about an expedition in South America. It’s easy to get hooked on the adventure and keep reading.

Reviewing “Postwar: A History of Europe since 1945” with the help of natural language processing

A few months ago, Amazon recommended that I read Postwar. It is an incredible book, but it’s so long and contains so much information I had forgotten almost everything by the time I finished.

So I wanted to see if I could write code to help review the book. I especially wanted to understand which years and events would be worth remembering. As a proxy for importance, I extracted how often each year is mentioned:

Number of times each year between 1900 and 1999 is mentioned in the book
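Counting year mentions can be sketched with a regular expression. This is a minimal illustration and not necessarily how my script does it: the pattern, the function name and the sample text are assumptions.

```python
import re
from collections import Counter

# Four-digit years 1900-1999 as standalone tokens. The trailing \b means
# decade references such as "1980s" are not counted as the year 1980.
YEAR_PATTERN = re.compile(r"\b19\d{2}\b")


def count_years(text):
    """Return a Counter mapping each mentioned year to its frequency."""
    return Counter(int(match) for match in YEAR_PATTERN.findall(text))


# Made-up sample text, not a quote from the book.
sample = (
    "The vote peaked in 1946. By 1977 it had fallen, and 1946 was a memory. "
    "The 1980s were worse."
)
counts = count_years(sample)
for year, n in sorted(counts.items()):
    print(year, n)
```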

One interesting takeaway is that 1960 is the only year after 1945 that’s never mentioned. I tried to confirm whether this is random through Google Ngram searches. Of the three years 1959-1961, 1960 seems to be mentioned the least often (though it is pretty close). So it might be that 1960 is the most uneventful year in European post-war history?

I then reviewed the book by reading the sentences that contain the most frequently mentioned years. Here’s an annotated version of the graph above showing some of the key events:

Annotations showing key events for some of the most commonly mentioned years

I also wanted to summarize broad themes from the post-war decades, the 1950s through the 1980s. To do that I wrote another script to extract each sentence that referenced a decade, e.g. containing “1950s”. I then used Term Frequency-Inverse Document Frequency (TF-IDF) to identify word significance. I visualized the output in Power BI so that word size is determined by the TF-IDF score.
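The decade extraction and TF-IDF step can be sketched in plain Python. This is a simplified illustration under assumptions: it treats all sentences for a decade as one document, uses one common TF-IDF variant (term frequency times the log of inverse document frequency), and the sample text is made up. My actual script may differ.

```python
import math
import re
from collections import Counter


def sentences_mentioning(text, decade):
    """Return the sentences that contain a decade token such as '1950s'."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if decade in s]


def tokenize(text):
    """Lowercase alphabetic words; tokens like '1950s' are skipped."""
    return re.findall(r"\b[a-z]+\b", text.lower())


def tf_idf(documents):
    """documents: {name: text}. Returns {name: {word: score}}."""
    tokens = {name: tokenize(text) for name, text in documents.items()}
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))
    n_docs = len(tokens)
    scores = {}
    for name, words in tokens.items():
        counts = Counter(words)
        scores[name] = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


# Made-up sample text, not a quote from the book.
text = (
    "Film was popular in the 1950s. Cinema attendance fell in the 1960s. "
    "Film stars ruled the 1950s."
)
documents = {d: " ".join(sentences_mentioning(text, d)) for d in ("1950s", "1960s")}
scores = tf_idf(documents)
top_word = max(scores["1950s"], key=scores["1950s"].get)
print(top_word)
```

Words that appear in every decade (like “the”) get a score of zero, which is why TF-IDF surfaces decade-specific words.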

Word clouds for sentences mentioning the 1950s (top left), 1960s (top right), 1970s (bottom left), 1980s (bottom right)

Unfortunately the word significance was not very informative. There are some insights, for instance an important word of the 1950s is “Film”. Cinema attendance was on its way down due to the introduction of the TV, but the average person in the UK still went to the cinema 28 times per year, 40% more than before the war.

But since there are relatively few sentences per decade, a few sentences can heavily skew the results. For instance the top word for the 1980s is “percent” due to sentences like this:

“…the [French Communist] Party saw its share of the vote fall steadily at every election: from a post-war peak of 28 percent in 1946 to 18.6 percent in 1977 and thence, in a vertiginous collapse, to under 10 percent in the elections of the 1980s”

Overall this was an interesting experiment in trying to make it easier to remember key points from a book. All code can be found on GitHub here.