@stephaniehicks
Created February 6, 2017 21:09
Scrape ASA Awards website
# rvest provides read_html(), html_nodes(), html_text(), html_attr() and %>%
library(rvest)
urlpage <- "http://www.amstat.org/ASA/Your-Career/Awards-and-Scholarships.aspx"
awards <- read_html(urlpage)
# Select the award links once, then extract their text and href targets
awardLinks <- awards %>%
  html_nodes("#ctl00_TemplateBody_WebPartManager1_gwpciNewContentHtml_ciNewContentHtml_Panel_NewContentHtml a")
awardTitles <- awardLinks %>% html_text()
awardURLs <- awardLinks %>% html_attr("href")
awardURLs <- paste0("http://www.amstat.org", awardURLs)
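# Visit each award page and collect its main text as a single string
awardDescription <- character(length(awardURLs))
for (i in seq_along(awardURLs)) {
  awardtmp <- read_html(awardURLs[i])
  tmp <- awardtmp %>% html_nodes("#WebPartZone1_Page1") %>% html_text()
  # Collapse to one string so every URL yields exactly one description,
  # even when the selector matches zero or several nodes
  awardDescription[i] <- paste(tmp, collapse = " ")
}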
dat <- data.frame(title = awardTitles, url = awardURLs,
                  description = awardDescription, stringsAsFactors = FALSE)
write.csv(dat, "ASA_Awards.csv", row.names = FALSE)