@jamesdunham
Created November 14, 2016 15:17
scrape Congressional session-years from history.house.gov
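library(rvest)       # read_html(), html_nodes(), html_table(), and the %>% pipe
library(data.table)
library(lubridate)   # mdy(), floor_date()

# Download the page listing the session dates of every Congress
doc = read_html("http://history.house.gov/Institution/Session-Dates/All/")

# Parse the session-dates table; as.data.frame() makes the header names
# syntactic (e.g. "Beginning.Date"), which the code below relies on
sessions = doc %>%
  html_nodes(xpath = "//div[contains(@class, 'manual-table')]/table") %>%
  html_table() %>%
  as.data.frame()
setDT(sessions)

# Extract the Congress number from the leading digits of the Congress column
sessions[, congress := gsub("^\\s*([0-9]{1,3}).*", "\\1", Congress)]
sessions[, congress := gsub("[^0-9]", "", congress)]
# Flag rows whose Congress entry is marked with an asterisk
sessions[, had_special := grepl("[*]", Congress)]

# Parse the month-day-year date strings into Date values
sessions[, start_date := mdy(Beginning.Date)]
sessions[, end_date := mdy(AdjournmentDate1)]

# Convert the day counts from text to integer
int_cols = c("CalendarDays2", "LegislativeDays")
sessions[, (int_cols) := lapply(.SD, function(x) as.integer(trimws(x))), .SDcols = int_cols]

# floor_date() returns a Date, so start_year and end_year hold January 1 of
# the year in which each session began and ended
sessions[, c("start_year", "end_year") := lapply(.SD, floor_date, "year"),
         .SDcols = c("start_date", "end_date")]

setnames(sessions, c("CalendarDays2", "LegislativeDays", "Session"),
         c("calendar_days", "legislative_days", "session"))

# Keep only the identifying and date columns
sessions = sessions[, .(congress, session, had_special, start_date, end_date,
                        start_year, end_year)]
sessions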