I analyzed the lineups at the most popular nightclubs

admin

Nov 6, 2025 - 14:05

0 0

I analyzed the lineups at the most popular nightclubs

How I analyzed the lineups at the world's most popular nightclubs

6 November 2025·6 mins

A few years back I did a bit of dance music related data visualization over at Lazily Evaluated. My favourite was an analysis of clubs and their lineups using Resident Advisor / RA data, I called it Clubster Analysis. I always wanted to dig into the technical aspects of gathering the data, analyzing it and building the charts and graphs to tell a story and give people insight. With this blog I now have the right venue for that kind of tech talk, so here goes.

Data gathering #

To visualize data, first you have to get some! For this purpose I wrote a little scraper in Python. I used Beautiful Soup to parse the html and grab the bits and pieces I was interested in.

My scraping of a few thousand pages didn’t cause considerable load on the RA servers. But in the age of overzealous AI scrapers it’s worth being polite, so I throttled according to their robots.txt. I also maintained a local cache of html files I had already downloaded, so that I wouldn’t have fetch the same data repeatedly (past lineups are unlikely to change after the fact) just because I discovered some bug or error in my parsing.

The order I scraped in was:

Get the 20 most popular regions in RA (and then I dropped “Streamland” which was a pandemic era pseudo-region)
Fetch the most popular clubs and some related metadata for all of those regions.
For each club, get the lineups for every 2019 event of theirs (the last full year before the pandemic started).
Save the results to csv files

Clean up, verification and Analysis #

I did some spot checks to verify that my parsing was working as I expected and added tests to make sure I handled edge cases and normalized artist names. There was a lot of variance in how dates were formatted, how artists were linked, etc.

After that I analyzed the data. I built one big table/dataframe in Pandas by joining all the info from the csv files. Then I calculated the similarities between each pair of clubs in the data set using the Jaccard index. Consider all the artists that have played at two given clubs, take the intersection (number of artists that have featured in lineups at both clubs) over the union (all the artists that have performed at one or the other). As an example if Club A had 100 artists booked and Club B had 100 artists, and they had 10 bookings in common, the Jaccard index would be 10/190 = ~5%. This gives you a good way to compare large and small clubs and balances large and small lineups (some of the clubs have multiple rooms with very long events, others have one dj playing in one room all night long once a week).

Based on the Jaccard index we can build a graph, using NetworkX from all the clubs. The edges between two nodes are weighted by the similarity of those clubs. On top of the graph we run community detection to create clusters (hence the clubster name). This gives us a rough idea of which clubs are most similar, that is to say, have similar tastes in their bookings.

Results #

For the year 2019, there were 131 clubs in the data set with 8.502 events. There were 9.405 unique artists making up 30.482 individual bookings. This means that the average artist in the dataset was booked 3.24 times at those clubs in that year and the average event had 3.5 artists on the line up.

As a whole, out of 8.515 possible pairs of clubs, 3.716 pairs had some overlap in their bookings and out of those the average overlap was 1%. This was lower than I thought, the bookings at European clubs felt more homogenous to me, but I suppose they book a lot of artists. It would be interesting to get more data, recent and historic, and see how this has evolved through time.

Visualization #

This was my first time using D3 to draw charts. There was a bit of a learning curve, in earlier projects I had used higher level charting libraries which have simpler apis. But with D3 you get a lot of control over how your charts look and behave which I think I used to good effect in this instance.

My main goal was to visualize the clusters and to allow people to interact with the clubs. I coloured the clubs according to their clusters and sized them based on the number of followers they had on RA. I played around with the gravity and placement of the cluster, trying to find a balance that worked on different screen sizes as well as being a fair portrayal of the different communities.

Users can click and compare different clubs in the dataset.

I then did some scrollytelling to tell the story of the data, as I saw it, while the reader scrolls down the page. But I also added filters and interactivity for people to explore and see if they agree with my telling of the story or if they can find one of their own.

At the time I didn’t find any great React and D3 bridges, so it was a bit of a hassle getting the React components to play nice with the D3 graph, but in the end I was able to connect the two with createRef to the D3 svg component.

Besides the clustering I looked into the “resident factor”, how many times an artist was booked at a club repeatedly compared to all the one offs. This was lower than expected, most of these clubs were booking a constantly rotating assembly of talent, residents don’t play as big a part as I would have thought.

Transitioning between the different sections of these graphs was one of my favourite parts. Seeing the clusters morph into dots and candlestick charts (and back again) was oddly satisfying. Took a lot of tweaking, but I think it really tied together the scrollytelling experience.

D3 let me build neat animations to transition between charts.

I don’t think these transitions would have been possible with the higher level charting libraries I’d used previously. So the decision to go with D3 felt justified.

Summary #

This was a great pandemic project that combined web scraping, data analysis, and interactive visualization to explore the global dance music club scene. I learned me some D3 for the visualization, got better at doing cartesian graphing calculations in my head and learned about the underlying svg mechanics that power those graphs.

The results surprised me: despite my perceived homogeneity of European club bookings, only 1% average overlap between venues suggested more diverse landscape than I expected. The diminished role of residents compared to one-off bookings also challenged my assumptions about how these clubs operate. For the story telling maintaining the balance between a narrative and letting users explore and decide for themselves was a fun challenge. I think these sort of passion projects can give us deep insights into our world and culture.

The technical stack I worked with: Python, Pandas, NetworkX, D3, and React proved powerful despite some integration challenges. The complete project is available on GitHub and you can explore the live interactive visualization yourself.

I had a lot of fun building this and am proud of the result. If you’re working on cultural data analysis, need help with web scraping and visualization, or just want to discuss interesting datasets, feel free to reach out.