@alaiacano
Created January 14, 2016 16:02

Blink 182 and Green Day were and are still very popular bands. Therefore, they'll end up on a lot of playlists. Here are some genres where it would make sense to see them. I'll assign a distribution too, since obviously a band isn't limited to one kind of playlist.

So let's say Green Day is on 2,000 different playlists, broken down among these genres:

  • Punk - 35%
  • Rock - 20%
  • 90s - 20%
  • Alternative - 20%
  • Party - 5%

U2 is an even more popular band. Say they're on 50,000 playlists. Here are some playlists where they might show up:

  • Rock - 50%
  • Alternative - 10%
  • 80s - 25%
  • 90s - 15%
  • Punk - 0.1%

How about Simple Minds ("Don't You (Forget About Me)")? Let's guess that they're on 500 playlists, all about 80's songs:

  • 80s - 100%

Crass was not a popular band and still is not. Maybe 50 smelly dudes and their dogs have collected enough money outside Peet's to afford Spotify, so call it 50 playlists:

  • Punk - 60%
  • Crust - 40%

Now if we look at the number of playlists each band is on within each genre, we get this:

Punk:

  • Green Day = 2,000 * (35/100) = 700 playlists
  • U2 = 50,000 * (0.1 / 100) = 50 playlists
  • Crass = 50 * (60/100) = 30 playlists

90s:

  • U2 = 50,000 * (15/100) = 7,500 playlists
  • Green Day = 2,000 * (20/100) = 400 playlists

80s:

  • U2 = 50,000 * (25/100) = 12,500 playlists
  • Simple Minds = 500 * (100/100) = 500 playlists

So by ranking this way, Green Day is still the most punk, followed by U2, then Crass, even though only 0.1% of the playlists that U2 are on are considered "punk", compared with 60% of Crass's.

U2 is also "more 80's" than a band that only existed in the 80's and "more 90's" than a band that was only good in the 90's. Just because it's so popular.
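To make the arithmetic above concrete, here is a minimal Python sketch of the raw-count ranking. The numbers are the made-up ones from the lists above; the names `BANDS`, `playlist_counts`, and `rank_by_count` are hypothetical and only there for illustration.

```python
# Toy numbers from the lists above: total playlists per band, and the
# percentage of those playlists that fall in each genre.
BANDS = {
    "Green Day": (2_000, {"Punk": 35, "Rock": 20, "90s": 20, "Alternative": 20, "Party": 5}),
    "U2": (50_000, {"Rock": 50, "Alternative": 10, "80s": 25, "90s": 15, "Punk": 0.1}),
    "Simple Minds": (500, {"80s": 100}),
    "Crass": (50, {"Punk": 60, "Crust": 40}),
}


def playlist_counts(genre):
    """Return {band: number of playlists in `genre`} for bands that appear in it."""
    return {
        band: round(total * shares[genre] / 100)
        for band, (total, shares) in BANDS.items()
        if genre in shares
    }


def rank_by_count(genre):
    """Rank bands by raw playlist count, most playlists first."""
    counts = playlist_counts(genre)
    return sorted(counts, key=counts.get, reverse=True)


for genre in ("Punk", "90s", "80s"):
    print(genre, rank_by_count(genre))
```

Ranking on raw counts like this is what lets sheer popularity swamp everything else: U2's tiny punk share still beats Crass's large one.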

If you said "suggest a punk band to me" nobody would pick U2. Not even the 50 hypothetical people who put them on a "punk" playlist would pick U2. The real qualifier would be that a band is more "punk" than they are "not-punk."

What people use in the world of information retrieval (aka search engines) is a statistic called tf-idf. You can go pretty deep into explaining how to implement this, but it comes down to two numbers:

  • Term Frequency - "how often is this band in this genre" aka the percentages I made up above.
  • Inverse Document Frequency - "how often is this band in any playlist" aka how popular the band is overall, inverted so that bands that show up everywhere get a smaller weight.

When combined, the incredibly popular bands get penalized for having a broad genre distribution. If U2 were on 50,000 punk playlists, then sure, they'd come out on top in the rankings. But they aren't.
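Here is a rough Python sketch of what a tf-idf-style ranking could look like on the toy numbers. The mapping is an assumption, not a formula given here: each genre is treated as a "document" and each band as a "term", the term frequency is the band's made-up percentage for that genre, and the inverse document frequency is `log(number of genres / number of genres the band appears in)`. A real implementation would count individual playlists, which this toy data doesn't have. The names `SHARES`, `tf`, `idf`, and `rank_by_tfidf` are hypothetical.

```python
import math

# Percentage of each band's playlists that fall in each genre (made-up numbers).
SHARES = {
    "Green Day": {"Punk": 35, "Rock": 20, "90s": 20, "Alternative": 20, "Party": 5},
    "U2": {"Rock": 50, "Alternative": 10, "80s": 25, "90s": 15, "Punk": 0.1},
    "Simple Minds": {"80s": 100},
    "Crass": {"Punk": 60, "Crust": 40},
}

# Every genre that shows up for at least one band (the "documents").
ALL_GENRES = {genre for shares in SHARES.values() for genre in shares}


def tf(band, genre):
    """Term frequency: fraction of the band's playlists that are in this genre."""
    return SHARES[band].get(genre, 0) / 100


def idf(band):
    """Inverse document frequency (assumed form): bands spread over many
    genres get a smaller weight than bands concentrated in a few."""
    return math.log(len(ALL_GENRES) / len(SHARES[band]))


def rank_by_tfidf(genre):
    """Rank the bands that appear in `genre` by tf * idf, highest first."""
    scores = {
        band: tf(band, genre) * idf(band)
        for band in SHARES
        if genre in SHARES[band]
    }
    return sorted(scores, key=scores.get, reverse=True)


for genre in ("Punk", "80s", "90s"):
    print(genre, rank_by_tfidf(genre))
```

Under these assumptions Crass ranks above Green Day and U2 for punk, and Simple Minds ranks above U2 for 80s. A band whose playlists all sit in one genre gets the largest weight, so a U2 that lived entirely on punk playlists would still come out on top.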

The (very pretty) poly-graph post only calculates the term frequency. Maybe the author doesn't have access to the Spotify data needed to calculate the other part of the equation, but the whole thing is about the "most popular punk bands" not the "most punk bands."
