@tbwester
Last active May 28, 2018 13:07

9 Years of Angry: An Angry Metal Guy Retrospective by the Numbers

The code used to make the figures for this study can be found at http://github.com/oruxl/angrymetalpy

Introduction

A few years ago, a friend of mine, who you may know as Kronos, began writing album reviews for a little site which you may know as Angry Metal Guy. I started reading AMG regularly after that, and have enjoyed following the site ever since. The staff here are superb writers, and the community that's been built around the reviews, made manifest through comments, often has uniquely thoughtful insights not found on your typical music critique website.

Praise aside, I am a man of hard science, and damn it, I want to be sure that whatever enjoyment I'm getting out of this site comes from 100% farm-fed, home-grown and statistically unbiased methodology. To this end, I decided to look for clues in the cold, sterile numbers of AMG's history. The following report details my findings.

The Data

I collected information from each review posted on the site back to the "first" review of 2009 (Amorphis - Skyforger). There is one review from 2004 that was posted and scored (Orphaned Land - Mabool), but I did not include it in the main body of works due to its temporal isolation from the rest of the articles. I kept track of each review's author, tags, score, and publication date. In total, I found over 3300 reviews of scored albums (i.e. excluding unscored albums and special reviews like TYMHM, Yer Metal is Old, live reviews, etc.), which, as far as statistics go, is a gold mine for analysis.

Most of the information for each review can be found in its webpage metadata. Specifically, I used the official scores listed in each review page's tags. Some care had to be taken here, since not all reviews have a score tag; in these cases I filled the scores in manually, but in total there were only about 80 such articles.
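The actual scraping code lives in the angrymetalpy repository linked above; purely as an illustration of the metadata approach, pulling one review page might look something like the sketch below. The meta-tag names, the rel="tag" convention, and the numeric-tag trick for scores are assumptions on my part about WordPress-style markup, not a description of AMG's actual pages.

```python
# Sketch of pulling one review's metadata; tag and meta names here are
# assumptions about WordPress-style markup, not a guaranteed match for AMG.
import re
import requests
from bs4 import BeautifulSoup

def scrape_review(url):
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

    title = soup.find("meta", property="og:title")
    published = soup.find("meta", property="article:published_time")
    # Genre (and possibly score) tags usually show up as rel="tag" links.
    tags = [a.get_text(strip=True) for a in soup.find_all("a", rel="tag")]

    # If the score is stored as a numeric tag (e.g. "3.5"), pick it out;
    # reviews without such a tag get None and are filled in by hand.
    score = next((float(t) for t in tags if re.fullmatch(r"[0-5](\.\d)?", t)), None)

    return {
        "title": title["content"] if title else None,
        "date": published["content"] if published else None,
        "tags": tags,
        "score": score,
    }
```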

I also found three albums that were scored twice: Resumed - Alienations, Deafheaven - New Bermuda, and Leprous - Malina. Both reviews of each album appear in the data set, but, as you will soon see, their scores are averaged together in the analysis.
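The averaging itself is nothing fancy; a minimal pandas sketch with placeholder rows (the column names, album names, and scores below are made up for illustration, not taken from the real data set):

```python
import pandas as pd

# Placeholder rows; the real data set uses whatever columns the scraper produced.
reviews = pd.DataFrame([
    {"album": "Band X - Album Y", "score": 3.5, "date": "2017-08-21"},
    {"album": "Band X - Album Y", "score": 2.5, "date": "2017-08-25"},
    {"album": "Band Z - Album W", "score": 4.0, "date": "2017-09-01"},
])

# Collapse double reviews into a single averaged score per album.
collapsed = reviews.groupby("album", as_index=False).agg(
    score=("score", "mean"),
    date=("date", "first"),
)
print(collapsed)
```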

Fun Facts

Twenty-three reviews have been deemed perfect (score = 5.0) on this site. I show their historical occurrences on the timeline below, along with the total number of reviews posted each month. One thing to note is that the density of 5.0 albums seems to have decreased in recent times. I'll discuss this trend more thoroughly in the next section.

[Figure: timeline of the 23 perfect (5.0) reviews, with total reviews posted per month]

I also wanted to see the comings and goings of site staff. The heat map below shows the number of reviews posted by each reviewer (brighter colors mean more reviews posted by that reviewer in a particular month). A surprise for me: site staff are clearly recruited in groups, as indicated by the blocky way reviewers pop up out of the sea of nothingness. Most regular contributors write a few articles per month, with the exception of one over-achiever...

[Figure: heat map of reviews posted per reviewer per month]
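For anyone wanting to build a similar heat map, a rough pandas/matplotlib sketch is below. It assumes a DataFrame with one row per review and "author" and "date" columns; that layout is my assumption for illustration, not necessarily how angrymetalpy stores things.

```python
import matplotlib.pyplot as plt
import pandas as pd

def reviewer_heatmap(reviews):
    """Reviews per reviewer per month, drawn as a heat map.

    `reviews` is assumed to have an "author" column and a "date" column.
    """
    months = pd.to_datetime(reviews["date"]).dt.to_period("M")
    counts = (reviews.assign(month=months)
                     .groupby(["author", "month"]).size()
                     .unstack(fill_value=0))          # rows: authors, cols: months

    fig, ax = plt.subplots(figsize=(12, 6))
    im = ax.imshow(counts.values, aspect="auto", cmap="viridis")
    ax.set_yticks(range(len(counts.index)))
    ax.set_yticklabels(counts.index)
    ax.set_xlabel("Month")
    ax.set_ylabel("Reviewer")
    fig.colorbar(im, ax=ax, label="Reviews per month")
    fig.tight_layout()
    plt.show()
```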

Now, the age-old question: what's the average score awarded on AMG? Below I show the distribution of scores for all 3300+ reviews; you can see that it is slightly skewed compared to your off-the-shelf bell curve. The average score is 3.07.

[Figure: distribution of scores for all 3300+ reviews]

You might see this and think "Wow, that distribution is skewed! 3.0 is far above a fair 2.5 average!" Well, astute reader, read on and you'll soon find out that things become more interesting when looking at trends over time.
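The distribution above is just a half-point-binned histogram with the mean marked; a minimal sketch, assuming you have the scraped scores in a list or array:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_score_distribution(scores):
    """Histogram of review scores in half-point bins, with the mean marked."""
    scores = np.asarray(scores, dtype=float)
    bins = np.arange(0.0, 5.501, 0.5) - 0.25   # bins centred on 0.0, 0.5, ..., 5.0
    plt.hist(scores, bins=bins, edgecolor="black")
    plt.axvline(scores.mean(), color="red", linestyle="--",
                label=f"mean = {scores.mean():.2f}")
    plt.xlabel("Score")
    plt.ylabel("Number of reviews")
    plt.legend()
    plt.show()
```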

I also looked at the score distributions for some genres frequently listed in reviews. I present a table of average scores for albums tagged with each genre. Now, genres are definitely a bit fuzzy, and a given album may appear in multiple categories if it carries multiple genre tags. Plus, some genres are underrepresented on this site. Bear these caveats in mind as you feast your eyes on the following table:

| Reviews tagged with... | Average score |
| --- | --- |
| Death Metal | 3.10 +/- 0.04 |
| Black Metal | 3.11 +/- 0.04 |
| Doom Metal | 3.16 +/- 0.05 |
| Progressive Metal | 3.24 +/- 0.06 |
| Folk Metal | 3.32 +/- 0.09 |
| Thrash Metal | 3.01 +/- 0.06 |
| Heavy Metal | 3.09 +/- 0.05 |
| Hardcore | 3.11 +/- 0.11 |
| Power Metal | 3.00 +/- 0.07 |
| Hard Rock | 2.94 +/- 0.11 |

I think in general we can see that all the averages are roughly the same. Quantitatively, they are all within one or two uncertainties (the number after the +/-) of each other. The uncertainty shrinks as the number of albums contributing to an average grows. For example, we have a lot of death metal albums, so we can be pretty confident in that average. In contrast, there aren't many hard rock reviews, so even though that average is lower, it is inherently a less precise number.
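For the curious, the numbers after the +/- behave like the standard error of the mean, which is why more reviews means a tighter average. A sketch of how such a table can be computed is below; the column layout (a "score" column and a "tags" list per review) is my assumption for illustration, not the exact angrymetalpy structure.

```python
import numpy as np
import pandas as pd

def genre_table(reviews, genres):
    """Mean score +/- standard error of the mean for each genre tag.

    `reviews` is assumed to have a "score" column and a "tags" column
    holding the list of tag strings for each review.
    """
    rows = []
    for genre in genres:
        scores = reviews.loc[reviews["tags"].apply(lambda t: genre in t),
                             "score"].astype(float)
        sem = scores.std(ddof=1) / np.sqrt(len(scores))   # shrinks as N grows
        rows.append({"genre": genre, "mean": scores.mean(),
                     "uncertainty": sem, "n_reviews": len(scores)})
    return pd.DataFrame(rows)
```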

Analysis

With nine years of data, there certainly must be trends to uncover. Below I show a plot of the average monthly score awarded by all reviewers over the site's entire history. Immediately obvious is the downward trend. To be more quantitative, I ran a weighted linear regression, using weights proportional to the total number of reviews in each month (represented by error bars). The regression result is the red line in the figure, and indeed it has a downward trend that is statistically significant¹. One could interpret this in a few ways. A less likely (albeit more entertaining) interpretation is that the AMG staff are perfect arbiters of musical quality, and that music has simply been getting worse over time. A more plausible interpretation is that the AMG staff have refined their expectations of good albums over the years and are indeed approaching the coveted unbiased 2.5 average, as one might expect.

This plot also reveals what may be the worst month for music on AMG. In December 2013, I found 5 reviews, all with scores less than 3.0. They are: Netherbird - The Ferocious Tides of Fate (2.5), Alcest - Shelter (2.0), Thrall - Aokigahara Jukai (2.0), Boston - Life, Love & Hope (1.0) and Sheol - Sepulchral Ruins Below the Temple (2.5). Ouch.

[Figure: average monthly score over time with the weighted linear fit]

¹ If this trend continues, in 40 years' time AMG will give every album a score of 0, which is a humorous yet cautionary tale about the dangers of extrapolation.
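For reference, the weighted fit described above can be reproduced with something along these lines using statsmodels. Treating the monthly review counts as WLS weights is my reading of "weights proportional to the total number of reviews," so consider this a sketch rather than the exact angrymetalpy code.

```python
import numpy as np
import statsmodels.api as sm

def monthly_trend(month_number, mean_score, n_reviews):
    """Weighted least-squares fit of mean monthly score against month number.

    Months with more reviews get more weight (weights taken as the review
    counts, i.e. treating the variance of a monthly mean as ~1/N).
    """
    X = sm.add_constant(np.asarray(month_number, dtype=float))
    fit = sm.WLS(np.asarray(mean_score, dtype=float), X,
                 weights=np.asarray(n_reviews, dtype=float)).fit()
    return fit.params[1], fit.pvalues[1], fit   # slope, p-value, full result

# Toy usage with made-up numbers, just to show the call signature:
rng = np.random.default_rng(0)
months = np.arange(24)
scores = 3.2 - 0.005 * months + 0.05 * rng.standard_normal(24)
slope, p_value, _ = monthly_trend(months, scores, rng.integers(20, 60, size=24))
```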

Something else we can do is look for genre effects. In other words, do the AMG staff favor particular kinds of music? This might be worth knowing if you, the reader, are particularly concerned about staff favoritism on your review site. I investigated this question by separating the average-score-versus-month data above by the most popular genre tags used in the reviews (Death Metal, Folk Metal, etc.). I then performed the same linear regression on each of these subsets, and I plot the fitted slope (i.e. the change in average score over time) for each genre below². Data points below the blue line trend toward lower scores over time, while data points above it trend toward higher scores. The total site trend (the slope of the line in the first plot) is also shown as a red bar on the same axis. Once again, the size of the error bars reflects the number of reviews in each genre (more reviews, smaller error bars).

[Figure: fitted score-vs-time slope for each genre, with the overall site trend]

If AMG were biased toward particular genres, you might expect the average score to decrease for one or more genres while staying flat or increasing for the favored ones. I am sorry to report the somewhat boring conclusion that this plot shows no evidence of any bias. Not only do all of the genres (except one, which I will discuss momentarily) trend down, they are all trending down at roughly the same rate. This last point can be quantified by the overlap in error bars among the genres. Hardcore, while trending upward, has very few reviews associated with it, so its error bars are large; furthermore, its error bars overlap with 0 (i.e. no trend).

² This addresses "Simpson's Paradox," which isn't really a paradox, but describes instances in which a regression over an entire data set trends one direction while subsets of the data trend another. This often happens when there are inherent differences between groups of data one hasn't considered, e.g. data sets containing men and women, or metal album genres.
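The per-genre fits are the same weighted regression applied to each subset; a sketch, assuming a table with one row per (genre, month) and my own column names:

```python
import pandas as pd
import statsmodels.api as sm

def genre_slopes(monthly, genres):
    """Fit the score-vs-time slope separately for each genre.

    `monthly` is assumed to hold one row per (genre, month) with columns
    "genre", "month_number", "mean_score", and "n_reviews".
    """
    rows = []
    for genre in genres:
        sub = monthly[monthly["genre"] == genre]
        X = sm.add_constant(sub["month_number"].astype(float))
        fit = sm.WLS(sub["mean_score"].astype(float), X,
                     weights=sub["n_reviews"].astype(float)).fit()
        rows.append({"genre": genre,
                     "slope": fit.params["month_number"],
                     "slope_err": fit.bse["month_number"]})
    return pd.DataFrame(rows)
```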


Trends are nice, but patterns are better. One natural thing to look for in a data set like this is so-called "seasonal effects." A seasonal effect in this context might be: do AMG reviewers give higher scores during the [summer/fall/winter/spring]? To address such questions, we can look at the autocorrelation of the data set. Autocorrelation is a measure of the extent to which an individual data point depends on previous data points in a series. Below I plot the autocorrelation function (ACF) of AMG's de-trended³ average monthly scores as a function of the lag (i.e. the number of months one looks back). If there were a repeating yearly pattern, one would see a spike in the autocorrelation around a lag of 12, meaning that average scores from one year are correlated with average scores from the same season in the previous year. Instead, all of the values are low and random; anything above roughly 0.2 might be cause for investigation, and the spike at lag 0 is not relevant (a series is always perfectly correlated with itself). We can therefore conclude that there are no seasonal trends in AMG's review process, so if you are a band looking to submit material for review on AMG, rest assured that your timing will not affect your score.

[Figure: autocorrelation function of the de-trended monthly average scores]

³ Detrending (i.e. looking at the regression residuals) is important here since the whole data set trends downward. You won't see seasonal patterns in the ACF of the raw data, because the slope means that each data point is necessarily correlated with the previous one. We want to see whether there are any additional patterns in spite of the overall trend.
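The detrend-then-ACF step might look something like the sketch below; statsmodels provides the ACF, and the detrend is just the residuals of a straight-line fit. Variable names are my own.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

def detrended_acf(mean_scores, nlags=24):
    """Autocorrelation of monthly mean scores after removing the linear trend."""
    y = np.asarray(mean_scores, dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)      # the overall downward trend
    residuals = y - (slope * x + intercept)     # de-trended series
    # Lag 0 is always exactly 1; a seasonal pattern would show up as
    # values well above ~0.2 near lag 12.
    return acf(residuals, nlags=nlags, fft=True)
```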

Conclusions

Delving into the AMG archives has convinced me of one thing: the AMG reviewers are very self-consistent. There is no evident favoritism toward particular genres, and scores do not appear inflated or skewed in recent years. Furthermore, the consistency is increasing over the years. I suspect that in the next few years we will see the average monthly site score level out around 2.5. At that point, it will be interesting to look at above- and below-average-score months to get a robust sense of the metal scene. All this said, I can now sleep easy knowing that my beloved metal review site appears satisfactorily and statistically fair.

@richardtomsett

Great stuff! I'm Jean-Luc Ricard, proud to have provided you with some data :) What does your scraping tool pull down - did you grab the full review texts?
