@randomgambit
Forked from expersso/scrape_nfl.R
Created February 20, 2018 21:46
Scraping NFL data with purrr and tidyr goodness
# Replicating https://t.co/Jq1QfFGpjA
library(rvest)
library(stringr)
library(dplyr)
library(tidyr)
library(purrr)
library(lubridate)
library(readr)   # needed for write_csv()
# Fetch one official's game log from pro-football-reference.com and tidy it
get_and_clean_table <- function(url) {
  paste0("http://www.pro-football-reference.com", url) %>%
    read_html() %>%
    html_nodes("table#game_logs") %>%
    html_table() %>%
    first() %>%
    set_names(tolower(names(.))) %>%
    filter(year != "Year") %>%                        # drop repeated header rows
    mutate(game = str_replace(game, "\\*", "")) %>%
    separate(game, c("away", "home"), sep = " @ ") %>%
    mutate(across(vpts:hpyds, as.integer)) %>%        # mutate_each()/funs() are deprecated
    mutate(year = ymd(year))
}
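As a quick sanity check before mapping over every official, the helper can be run on a single relative URL. The path below is a hypothetical example of the kind of link harvested from the `/officials/` index page, not a verified one:

```r
# Hypothetical relative path in the style of links found on /officials/;
# substitute a real href scraped from the index page.
one_official <- get_and_clean_table("/officials/example.htm")

# Inspect the parsed columns (year, away, home, score/yardage fields, ...)
glimpse(one_official)
```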
## IO
officials <- read_html("http://www.pro-football-reference.com/officials/") %>%
  html_nodes("table a") %>%
  {tibble(name = html_text(.), url = html_attr(., "href"))} %>%   # data_frame() is deprecated
  mutate(data = map(url, get_and_clean_table)) %>%
  unnest(data)

# walk(write_csv, ...) would iterate over columns, not write the data frame;
# call write_csv() on the combined table directly instead
write_csv(officials, "officials_data.csv")
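Once the nested game logs are unnested, each row is one game worked by one official, so per-official summaries are a `group_by()` away. A small sketch, assuming the `name` column produced above:

```r
# Count games per official and rank the busiest ones
officials %>%
  group_by(name) %>%
  summarise(games = n()) %>%
  arrange(desc(games))
```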