Scrape the performance data from ROH collections
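library(rvest)  # read_html(), html_nodes(), html_text(); rvest also re-exports %>%
# URL: http://www.rohcollections.org.uk/SearchResults.aspx?searchtype=performance&page=0&genre=Opera
# Pages 0 to 233 covered the opera performances when this was written; collect the cell texts page by page
pages <- vector("list", 234)  # preallocated, instead of growing a vector with append() on every page
for (i in 0:233) {
  site_perf <- paste0("http://www.rohcollections.org.uk/SearchResults.aspx?searchtype=performance&page=",
                      i,
                      "&genre=Opera")
  print(site_perf)  # optional: shows progress
  html_perf <- read_html(site_perf)
  # text of every <td> inside a table row; each performance occupies 4 cells
  pages[[i + 1]] <- html_nodes(html_perf, "tr td") %>% html_text(trim = TRUE)
  Sys.sleep(1)  # pause between requests to go easy on the server
}
performances <- unlist(pages)  # one long character vector of cell texts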
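# Hypothetical helper, not part of the original script: with 234 requests in a row,
# a single network hiccup stops the whole run. with_retry() calls f() up to `tries`
# times, sleeping `wait` seconds after each failed attempt, and re-raises the last
# error if every attempt fails. Defined before the loop, it could replace the plain
# read_html() call as:  html_perf <- with_retry(function() read_html(site_perf))
with_retry <- function(f, tries = 3, wait = 5) {
  stopifnot(tries >= 1)
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)  # capture the error instead of stopping
    if (!inherits(result, "error")) return(result)   # success: hand back f()'s value
    if (attempt < tries) Sys.sleep(wait)             # back off before the next attempt
  }
  stop(result)  # every attempt failed: signal the last error
}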
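# Turn into a data frame: byrow = TRUE puts every 4 consecutive cells into one row
stopifnot(length(performances) %% 4 == 0)  # a row with a different number of cells would misalign the columns
perfdf <- as.data.frame(matrix(performances, ncol = 4, byrow = TRUE))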
colnames(perfdf) <- c("Title", "Date", "DayTime", "Company") # rename columns
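# Illustration with hypothetical placeholder cells (not scraped data): with
# byrow = TRUE, matrix() fills one row at a time, so each run of 4 consecutive
# cell texts becomes one performance (Title, Date, DayTime, Company). This relies
# on every results row having exactly 4 <td> cells; an extra or missing cell
# would shift every later value into the wrong column.
toy_cells <- c("Title A", "1 January 2000", "Sat 19:30", "Company X",
               "Title B", "2 January 2000", "Sun 14:00", "Company X")
toy_df <- as.data.frame(matrix(toy_cells, ncol = 4, byrow = TRUE))
stopifnot(nrow(toy_df) == 2, identical(as.character(toy_df$V1), c("Title A", "Title B")))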
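# Turn the Date column into Date type; "%d %B %Y" expects day, full month name, year.
# %B uses the current LC_TIME locale, so English month names need an English (or "C") locale
perfdf$Date <- as.Date(perfdf$Date, format = "%d %B %Y")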
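# Illustration (assumption: temporarily switching LC_TIME to "C" is acceptable).
# In "%d %B %Y", %d accepts a day with or without a leading zero and %B matches
# full month names in the current LC_TIME locale; the "C" locale uses English
# names, so strings like "1 January 2000" parse instead of becoming NA on a
# non-English system. The original locale is restored afterwards.
old_time_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
example_dates <- as.Date(c("1 January 2000", "25 December 2015"), format = "%d %B %Y")
invisible(Sys.setlocale("LC_TIME", old_time_locale))
stopifnot(identical(format(example_dates), c("2000-01-01", "2015-12-25")))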
write.csv(perfdf, "perfdf.csv", row.names = FALSE)