@jilmun
Last active May 3, 2016 17:07
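# Scrape the paginated filings list from securities.stanford.edu into one data frame.
library(dplyr)  # library() stops if a package is missing; require() only warns
library(rvest)
options(stringsAsFactors = FALSE)

url_base <- "http://securities.stanford.edu/list-mode.html?page="

# Empty template that fixes the column names and types of the result
tbl.clactions <- data.frame(
  "Filing.Name" = character(0),
  "Filing.Date" = character(0),
  "District.Court" = character(0),
  "Exchange" = character(0),
  "Ticker" = character(0) )

# total filings: 4149, listed 20 per page (hard-coded; update if the site's total changes)
n_pages <- ceiling(4149/20)

for (i in seq_len(n_pages)) {
  url <- paste0(url_base, i)
  # Each page holds one filings table inside the element with id "records"
  tbl.page <- url %>%
    read_html() %>%
    html_nodes(xpath='//*[@id="records"]/table') %>%
    html_table()
  page <- tbl.page[[1]]
  names(page) <- names(tbl.clactions)
  # html_table() guesses column types; force character so bind_rows() cannot hit a type mismatch
  page[] <- lapply(page, as.character)
  tbl.clactions <- bind_rows(tbl.clactions, page)
}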