
Gist by @pbiecek, created March 10, 2015 21:11
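library(rvest)          # read_html(), html_nodes(), html_text()
library(PogromcyDanych) # provides the serialeIMDB data set

# IMDb ids of all series (works whether imdbId is a factor or a character column)
serialsToParse <- unique(as.character(serialeIMDB$imdbId))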
# prepare a character matrix for the results:
# one row per series, one column per demographic group
ratingsGroup <- matrix("", length(serialsToParse), 14)
rownames(ratingsGroup) <- serialsToParse
colnames(ratingsGroup) <- c("Males", "Females", "Aged under 18", "Males under 18",
                            "Females under 18", "Aged 18-29", "Males Aged 18-29", "Females Aged 18-29",
                            "Aged 30-44", "Males Aged 30-44", "Females Aged 30-44", "Aged 45+",
                            "Males Aged 45+", "Females Aged 45+")
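# scrape the demographic ratings breakdown for every series
for (serial in serialsToParse) {
  # html() is deprecated in rvest; read_html() replaces it
  page <- read_html(paste0("http://www.imdb.com/title/", serial, "/ratings"))
  # third cell of each row in the demographic table; this selector matches
  # the page layout at the time of writing (2015) and may need updating
  nodes3 <- html_nodes(page, "table:nth-child(11) td:nth-child(3)")
  # drop the header cell, keep only digits and dots,
  # and take the first 14 values (one per column above)
  ratingsGroup[serial, ] <- gsub(html_text(nodes3)[-1], pattern = "[^0-9\\.]", replacement = "")[1:14]
  Sys.sleep(1)  # short pause between requests to avoid hammering the server
}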
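The cleanup step inside the loop can be checked offline. The sketch below is a minimal illustration, not part of the original gist: it uses made-up cell strings in place of real `html_text()` output and a hypothetical series id `tt0000000`. It shows what the regular expression keeps, and one possible way to turn the character results into numbers afterwards (empty strings become `NA`).

```r
# Hypothetical cell texts standing in for html_text() output (not real IMDb data)
raw <- c("\n 8.4\n", " 7.9 ", "-")

# Same cleanup as in the loop: remove everything except digits and dots
cleaned <- gsub(raw, pattern = "[^0-9\\.]", replacement = "")
print(cleaned)  # "8.4" "7.9" ""

# Convert a character results matrix to numeric while keeping its
# row and column names; "" is coerced to NA
toy <- matrix(cleaned, nrow = 1,
              dimnames = list("tt0000000",  # hypothetical id
                              c("Males", "Females", "Aged 45+")))
storage.mode(toy) <- "numeric"
print(toy)
```

The same `storage.mode(ratingsGroup) <- "numeric"` call would convert the full scraped matrix in place, assuming every non-empty cell holds a plain number.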