@muchanem
Created December 23, 2022 21:16
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

# Pitchfork review dataset hosted on Zenodo (record 3603330)
dataseturl = "https://zenodo.org/record/3603330/files/output-data.csv?download=1"
df = pd.read_csv(dataseturl)
# Drop rows missing either date, then reduce both to a calendar year
df = df.dropna(subset=["reviewdate", "releaseyear"])
df["rev_year"] = pd.to_datetime(df["reviewdate"]).dt.year
df["rel_year"] = df["releaseyear"].astype("int64")
# "Current" reviews were published in the album's release year; all others are "archival"
current = df[df["rev_year"] == df["rel_year"]]
old = df[df["rev_year"] != df["rel_year"]]
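# Two-sample t-test on mean score; scipy's default (equal_var=True) is Student's pooled-variance test
print(stats.ttest_ind(current["score"], old["score"]))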
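# Overlay density-normalized histograms so groups of different sizes are comparable;
# a shared range gives both groups identical bin edges
score_range = (df["score"].min(), df["score"].max())
plt.hist(current["score"], bins=10, range=score_range, alpha=0.5, label="Current Reviews", density=True)
plt.hist(old["score"], bins=10, range=score_range, alpha=0.5, label="Archival Reviews", density=True)
plt.xlabel("Score")
plt.ylabel("Density")
plt.legend(loc="upper right")
plt.title("Distribution of Pitchfork Scores for Current vs Archival Reviews, ~1970-2019")
plt.show()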
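The cleaning and current/archival split can be checked on a tiny frame before pulling the full CSV. This is a minimal sketch: the column names match the script, but the rows are invented, and the ISO date strings are an assumption chosen because they parse unambiguously (the real reviewdate format may differ; pd.to_datetime infers it). In the toy frame releaseyear is float because it contains a missing value; read_csv behaves the same way for an integer column with gaps, which is presumably why the script casts to int64 only after dropna.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real CSV (values are invented)
df = pd.DataFrame({
    "reviewdate": ["2019-01-11", "2018-06-02", None, "2015-03-20"],
    "releaseyear": [2019.0, 1977.0, 2001.0, None],
    "score": [8.1, 9.4, 7.0, 6.5],
})

# Same cleaning steps as the script: rows 3 and 4 are dropped here
df = df.dropna(subset=["reviewdate", "releaseyear"])
df["rev_year"] = pd.to_datetime(df["reviewdate"]).dt.year
# releaseyear is float because of the missing value; cast once the NaNs are gone
df["rel_year"] = df["releaseyear"].astype("int64")

current = df[df["rev_year"] == df["rel_year"]]
old = df[df["rev_year"] != df["rel_year"]]

# The two masks are complements, so every surviving row lands in exactly one group
assert len(current) + len(old) == len(df)
print(current["score"].tolist())
print(old["score"].tolist())
```

Because == and != partition the rows, no review is counted twice or lost between the two groups.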
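scipy.stats.ttest_ind defaults to equal_var=True, which is Student's pooled-variance t-test; passing equal_var=False gives Welch's test, which does not assume the two groups share a variance. The sketch below computes both statistics by hand with the standard library so the difference is visible; the helper names student_t and welch_t and the sample values are hypothetical, for illustration only, and p-values are omitted.

```python
import math
import statistics


def student_t(a, b):
    """Pooled-variance (Student's) t statistic, as ttest_ind computes by default."""
    na, nb = len(a), len(b)
    # statistics.variance uses the n - 1 (sample) denominator
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def welch_t(a, b):
    """Welch's t statistic, as ttest_ind computes with equal_var=False."""
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)


# Hypothetical scores: different group sizes and different spreads
a = [7.0, 8.0, 9.0]
b = [5.0, 6.0, 7.0, 8.0, 9.0]

print(student_t(a, b))  # sqrt(15) / 4
print(welch_t(a, b))    # sqrt(6 / 5)
```

With a pooled variance of 2, the Student statistic is 1 / sqrt(2 * (1/3 + 1/5)) = sqrt(15)/4 ≈ 0.968, while Welch's uses 1/3 + 2.5/5 = 5/6 as the squared standard error and gives sqrt(6/5) ≈ 1.095. When group sizes and variances differ, as they may between current and archival reviews, the choice of test can matter.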
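density=True is what makes the two overlaid histograms comparable even though one group has far more reviews than the other. matplotlib's hist builds on numpy.histogram, which with density=True divides each bin count by (total count × bin width), so the bar areas sum to 1. The sketch below shows this directly; the two groups, the bin edges, and the helper density_area are hypothetical.

```python
import numpy as np


def density_area(values, edges):
    """Total bar area of a density-normalized histogram over the given edges."""
    dens, _ = np.histogram(values, bins=edges, density=True)
    # Each bar height is count / (total * width), so height * width sums to 1
    return float(np.sum(dens * np.diff(edges)))


# Two hypothetical groups of very different sizes (invented values)
small = np.array([6.0, 7.5, 8.0, 8.2, 9.1])
large = np.repeat([5.5, 6.8, 7.2, 7.9, 8.4], 40)

# Shared edges so the bars line up bin for bin
edges = np.linspace(5.0, 10.0, 11)

print(len(small), density_area(small, edges))
print(len(large), density_area(large, edges))
```

Both groups report an area of 1 despite a 40× difference in size, so the overlaid bars compare the shape of each distribution rather than the raw number of reviews.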