@zmjones zmjones/futurama.R
Last active Jan 2, 2016

Scrapes and visualizes IMDb ratings for Futurama episodes.
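import requests
from bs4 import BeautifulSoup
from pandas import DataFrame

# IMDb's per-episode ratings page for Futurama (tt0149460).
url = 'http://www.imdb.com/title/tt0149460/eprate?ref_=ttep_sa_2'
# Name the parser explicitly; otherwise BeautifulSoup picks one and warns.
soup = BeautifulSoup(requests.get(url).content, "html.parser")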
def parse_table(soup, rows):
    """Append the text of each data row in the first <table> to `rows`."""
    table = soup.find("table")
    for row in table.find_all("tr")[1:]:  # skip the header row
        # Collect each cell's text with surrounding whitespace stripped.
        rows.append([td.get_text(strip=True) for td in row.find_all("td")])
    return rows

table = parse_table(soup, [])
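# The scraped table has a fifth, unlabeled column; name it '' and drop it.
df = DataFrame(table, columns=['episode', 'title', 'rating', 'votes', ''])
df = df.drop(columns='')  # pandas 2.0+ requires axis arguments by keyword
df.to_csv('futurama.csv', na_rep="NA", index=False, encoding="utf-8")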
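The row-parsing logic can be checked without touching the network. Below is a minimal sketch that runs the same parsing and column-dropping steps on a small hard-coded HTML string. The table layout (one header row, then five cells per row with the last one empty) is an assumption inferred from the column names above, and the names `SAMPLE_HTML`, `Title A` and the numbers are made-up placeholders, not real IMDb data.

```python
from bs4 import BeautifulSoup
from pandas import DataFrame

# Hypothetical table mimicking the assumed page layout: one header row, then
# data rows with five <td> cells, the last of which is empty.
# All values are made-up placeholders, not real IMDb data.
SAMPLE_HTML = """
<table>
  <tr><th>#</th><th>Title</th><th>Rating</th><th>Votes</th><th></th></tr>
  <tr><td>1.1</td><td>Title A</td><td>8.0</td><td>100</td><td></td></tr>
  <tr><td>1.2</td><td> Title B </td><td>7.5</td><td>90</td><td></td></tr>
</table>
"""


def parse_table(soup, rows):
    """Append the stripped text of each data row in the first <table> to rows."""
    table = soup.find("table")
    for row in table.find_all("tr")[1:]:  # skip the header row
        rows.append([td.get_text(strip=True) for td in row.find_all("td")])
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
df = DataFrame(parse_table(soup, []), columns=["episode", "title", "rating", "votes", ""])
df = df.drop(columns="")  # discard the unlabeled fifth column
print(df)
```

Every value stays a string at this point; the R script below converts ratings and episode numbers only where it needs them.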
library(ggplot2)
library(plyr)

# Read every column as character; numbers are converted where they are used.
df <- read.csv("futurama.csv", colClasses = "character")
# Split the "season.episode" code into separate season and episode columns.
df[, c("season", "episode")] <- ldply(strsplit(df$episode, ".", fixed = TRUE))
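# Within each season, order episodes numerically and collect their row names.
ind <- by(df, list(df$season), function(x) {
  x <- x[order(as.integer(x$episode)), ]
  row.names(x)
})
# Order titles by season and episode, then reverse them so the first
# episode is drawn at the top of the y axis.
df$title <- factor(df$title, levels = rev(df$title[as.integer(unlist(ind))]))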
p <- ggplot(df, aes(x = as.numeric(rating), y = title, colour = season))
p <- p + geom_point()
p <- p + labs(x = "user rating", y = "episode title", title = "Futurama Ratings via IMDB")
# Pass the plot explicitly instead of relying on last_plot().
ggsave("futurama.png", plot = p, height = 14, width = 6)
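For readers who would rather stay in Python, here is a rough pandas sketch of the R data preparation: splitting the episode code, ordering titles by season and then by episode number, and reversing that order for plotting. It is an illustrative translation, not part of the original gist. The rows are made-up placeholders, and unlike `by()`, which groups seasons as strings, it sorts seasons numerically.

```python
import pandas as pd

# Made-up placeholder rows in arbitrary order, with the same "season.episode"
# codes and character columns the R script reads from futurama.csv.
df = pd.DataFrame({
    "episode": ["2.10", "1.2", "2.1", "1.1", "2.2"],
    "title": ["E", "B", "C", "A", "D"],
    "rating": ["7.9", "8.1", "7.0", "8.3", "7.5"],
})

# Split "season.episode" into two columns (the R script uses strsplit + ldply).
parts = df["episode"].str.split(".", n=1, expand=True)
df["season"] = parts[0]
df["episode"] = parts[1]

# Sort by season, then episode, comparing as integers so "10" sorts after "2".
ordered = df.assign(
    season_num=df["season"].astype(int),
    episode_num=df["episode"].astype(int),
).sort_values(["season_num", "episode_num"])

# Reverse the order, as rev(levels(...)) does in the R script.
df["title"] = pd.Categorical(df["title"], categories=ordered["title"].tolist()[::-1])
df["rating"] = df["rating"].astype(float)
print(df)
```

Encoding the order in a categorical column keeps it attached to the data, much as the R script stores it in the factor levels.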