@zmjones
Last active January 2, 2016 10:18
Scrapes and visualizes Futurama episode ratings from IMDb.
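# Python: scrape the ratings table and save it as futurama.csv
import requests
from bs4 import BeautifulSoup
from pandas import DataFrame

url = 'http://www.imdb.com/title/tt0149460/eprate?ref_=ttep_sa_2'
# name the parser explicitly; bs4 warns when it has to guess one
soup = BeautifulSoup(requests.get(url).content, "html.parser")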

def parse_table(soup, rows):
    """Append the cell text of each data row in the page's first <table> to rows."""
    table = soup.find("table")
    for row in table.find_all("tr")[1:]:  # skip the header row
        data = [td.get_text(strip=True) for td in row.find_all("td")]
        rows.append(data)
    return rows

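table = parse_table(soup, [])
df = DataFrame(table, columns=['episode', 'title', 'rating', 'votes', ''])
# drop the unnamed fifth column; recent pandas no longer accepts the axis as a positional argument
df = df.drop(columns='')
df.to_csv('futurama.csv', na_rep="NA", index=False, encoding="utf-8")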
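For readers without BeautifulSoup or pandas at hand, the scrape-to-CSV step can be sketched with the standard library alone. This is a rough equivalent, not the gist's code: it swaps BeautifulSoup for `html.parser.HTMLParser`, works on an HTML string instead of fetching the page, and assumes the page's table has the same layout the script expects (a header row, then four named columns plus an unnamed fifth). `TableRows` and `table_to_csv` are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the stripped text of every <td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text chunks of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # strip each text chunk and join them, like get_text(strip=True)
            self.rows[-1].append("".join(s.strip() for s in self._cell))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html):
    """Skip the header row, drop the unnamed fifth column, return CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["episode", "title", "rating", "votes"])
    for row in parser.rows[1:]:  # the first <tr> holds the header cells
        cells = row[:4]  # keep only the four named columns
        cells += ["NA"] * (4 - len(cells))  # pad short rows, as na_rep="NA" would
        writer.writerow(cells)
    return out.getvalue()


# Hypothetical two-row table; the values are placeholders, not IMDb data.
sample = (
    "<table><tr><th>#</th><th>Episode</th></tr>"
    "<tr><td>1.1</td><td><a>Title A</a></td><td> 7.5 </td><td>100</td><td></td></tr>"
    "<tr><td>1.2</td><td>Title B</td></tr></table>"
)
print(table_to_csv(sample), end="")
```

With this made-up sample, the first data row comes out as `1.1,Title A,7.5,100` and the short second row is padded to `1.2,Title B,NA,NA`.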
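
# R: read futurama.csv and plot each episode's rating
library(ggplot2)
library(plyr)
df <- read.csv("futurama.csv", colClasses = "character")
# split the "season.episode" code into separate season and episode columns
df[, c("season", "episode")] <- ldply(strsplit(df$episode, ".", fixed = TRUE))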
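# row names of each season's episodes in airing order (seasons compared as integers)
ind <- by(df, list(as.integer(df$season)), function(x) {
  x <- x[order(as.integer(x$episode)), ]
  row.names(x)
})
# order titles by season, then episode; reverse so the first episode is plotted at the top
df$title <- factor(df$title, levels = df$title[as.integer(unlist(ind))])
df$title <- factor(df$title, levels = rev(levels(df$title)))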
p <- ggplot(df, aes(x = as.numeric(rating), y = title, colour = season))
p <- p + geom_point()
p <- p + labs(x = "user rating", y = "episode title", title = "Futurama Ratings via IMDB")
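# pass the plot explicitly rather than relying on last_plot(); height and width default to inches
ggsave("futurama.png", plot = p, height = 14, width = 6)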
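The R script's `by()` and `factor()` calls amount to one ordering rule for the y-axis: sort titles by season, then episode, both as integers, then reverse, because a discrete y-axis draws its first level at the bottom. A minimal Python sketch of that rule, assuming the `(code, title)` pairs use the same "season.episode" codes the script splits; `plot_order` is a hypothetical helper, not part of the gist.

```python
def plot_order(rows):
    """Return titles in the order the R script uses for its y-axis levels.

    rows is a list of (code, title) pairs, where code is "season.episode".
    Titles are sorted by (season, episode) as integers, then reversed so the
    first episode is the last level and therefore lands at the top of the plot.
    """
    def key(pair):
        season, episode = pair[0].split(".")
        return int(season), int(episode)

    ordered = [title for _, title in sorted(rows, key=key)]
    return ordered[::-1]


# Hypothetical codes and titles, chosen to include a two-digit episode number.
rows = [("2.1", "Title C"), ("1.10", "Title B"), ("1.2", "Title A")]
print(plot_order(rows))  # ['Title C', 'Title B', 'Title A']
```

Sorting the codes as plain strings would put "1.10" before "1.2", which is why both the R script and this sketch compare season and episode as integers.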