Created October 14, 2015 14:15
Alternative rvest script for parsing a Basketball Reference box score ("http://www.basketball-reference.com/boxscores/201506140GSW.html"), from https://www.dataquest.io/blog/python-vs-r/ and https://news.ycombinator.com/item?id=10386174
library(rvest)

reformat_table <- function(tbl) {
  # Fix the column names: the first row holds the real headers
  colnames(tbl) <- unlist(tbl[1, ])

  # Missing values for players who did not play
  dnp <- tbl[[2]] == "Did Not Play"
  tbl[dnp, -1] <- NA

  # Add a type column to signify starters and reserves
  reserves <- which(tbl[[1]] == "Reserves")[1]
  tbl$type <- "Reserve"
  tbl$type[seq_len(reserves - 1)] <- "Starter"
  tbl$type[dnp] <- "DNP"

  # Remove the header row, the "Reserves" divider row, and the summary (last) row
  tbl[c(-1, -reserves, -NROW(tbl)), ]
}

# Parse the data: one reformatted table per team
"http://www.basketball-reference.com/boxscores/201506140GSW.html" %>%
  read_html() %>%
  html_nodes(".stats_table[id*='_basic']") %>%
  html_table() %>%
  lapply(reformat_table)
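The selector `.stats_table[id*='_basic']` in the pipeline does the table filtering: it keeps elements with class `stats_table` whose `id` attribute contains the substring `_basic`. A minimal offline sketch of that behaviour follows. The HTML snippet and its `id` values are made up for illustration and are not copied from the real page.

```r
library(rvest)

# Hypothetical, heavily trimmed stand-in for the box score page.
# The ids below are illustrative assumptions, not the site's actual markup.
page <- read_html('
<table class="stats_table" id="box_GSW_basic"><tr><td>basic</td></tr></table>
<table class="stats_table" id="box_GSW_advanced"><tr><td>advanced</td></tr></table>
<table class="other" id="misc_basic"><tr><td>other</td></tr></table>
')

# Keep only nodes that have class "stats_table" AND an id containing "_basic"
matched <- html_nodes(page, ".stats_table[id*='_basic']")

# Inspect which tables were selected
html_attr(matched, "id")
```

Here only the first table should be selected: the second has the right class but no `_basic` in its id, and the third has a matching id but the wrong class.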
See also tidyverse/rvest#111 ;)