A bit of web-scraping with Cheerio

17 Feb 2025

I had an idea for a little holiday project that required a list of episodes from The Rest Is History podcast. On their ‘Episodes’ page there is a player and a list of posts for the eighteen most recent episodes. There is a ‘show all’ button, but it doesn’t work.

The player does contain the full list of episodes (about 600, including a number of duplicates), so I expected that inspecting the network calls would reveal a JSON payload arriving with what I wanted. That is what I almost always find these days, so I’ve had very little call to do any real web scraping: it’s normally just a matter of locating the endpoint and perhaps extracting an API key from a request header.
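For contrast, the usual route described above (find the JSON endpoint in the browser’s network tab, then replay the request with whatever API key the page sends) might be sketched like this. It is a minimal sketch under stated assumptions, not code from this project: the endpoint URL, the `x-api-key` header name and the `Episode` shape are all hypothetical placeholders, to be copied from whatever the real request looks like. The fetch function is passed in, so Node 18+’s global `fetch` can be used in practice and a stub in testing.

```typescript
// Minimal shape of the fetch function we depend on. The global fetch in
// Node 18+ and browsers satisfies it, and a stub can be passed in tests.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical shape of one episode in the JSON payload.
type Episode = { id: string; title: string };

// Request a JSON endpoint spotted in the network tab, replaying the API key
// that the site's own requests carry. The header name ("x-api-key" by
// default) is an assumption; copy the real one from the captured request.
async function fetchEpisodes(
  endpoint: string,
  apiKey: string,
  fetchImpl: FetchLike,
  headerName = "x-api-key",
): Promise<Episode[]> {
  const res = await fetchImpl(endpoint, {
    headers: { [headerName]: apiKey, Accept: "application/json" },
  });
  if (!res.ok) {
    // Surface the HTTP status so a missing or expired key is easy to spot.
    throw new Error(`Request failed with HTTP ${res.status}`);
  }
  const body = await res.json();
  if (!Array.isArray(body)) {
    throw new Error("Expected a JSON array of episodes");
  }
  // Trusting the payload's shape here; a real scraper might validate each item.
  return body as Episode[];
}
```

In Node 18 or later this would be called as `fetchEpisodes(url, key, fetch)`; if the site sends its key under a different header, pass that name as the fourth argument.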

So the list must be in the HTML; let’s have a look. It’s a big file (4,000 lines once formatted) full of divs and jQuery, but here’s our