@ramnathv
Created February 9, 2011 04:30
Scrape Historical NHL Skaters Data
# LOAD LIBRARIES REQUIRED
library(plyr);
library(XML)
# FIGURE OUT PATTERN OF URL FOR EACH SEASON
url.b1 = 'http://ca.sports.yahoo.com/nhl/stats/byposition?pos=C,RW,LW,D';
url.b2 = '&sort=14&conference=NHL&year=season_';
# the season year (y) is appended to the URL inside extract_data()
# WRITE FUNCTION TO EXTRACT DATA FOR A SEASON
extract_data = function(y){
url = paste(url.b1, url.b2, as.character(y), sep = '');
tab = readHTMLTable(url, stringsAsFactors = F)[[4]];
tab = tab[,-c(2*(2:16))] # remove empty columns
names(tab) = tab[1,];
tab = tab[-1,];
tab$year = y;
tab
}
# APPLY FUNCTION TO EXTRACT DATA FOR ALL SEASONS
skaters = ldply(2005:2010, extract_data, .progress = 'text');
# CLEAN DATA FRAME AS REQUIRED
skaters[,-c(1, 2)] = sapply(skaters[, -c(1, 2)], as.numeric);
skaters[, c(1, 2)] = lapply(skaters[, c(1, 2)], as.factor); # lapply keeps the factor class; sapply would simplify it away
names(skaters) = tolower(names(skaters));
names(skaters)[7] = 'pm'
write.csv(skaters, 'skaters.csv', row.names = F);
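
The cleaning steps used above (dropping spacer columns, promoting the first row to a header, converting column types) can be tried offline on a small made-up table. This is only a sketch: the data frame `raw`, its column layout, and the player names are hypothetical and do not come from the Yahoo page.

```r
# Hypothetical stand-in for one scraped table: the header arrives as the
# first data row, and every second column is an empty spacer.
raw = data.frame(
  V1 = c('Name', 'Player A', 'Player B'),
  V2 = c('', '', ''),
  V3 = c('G', '10', '7'),
  V4 = c('', '', ''),
  V5 = c('A', '12', '20'),
  stringsAsFactors = FALSE
)

tab = raw[, -c(2, 4)]                      # drop the empty spacer columns
names(tab) = unlist(tab[1, ])              # promote the first row to column names
tab = tab[-1, ]                            # remove the header row from the data
tab[, -1] = lapply(tab[, -1], as.numeric)  # convert the stat columns to numeric
tab$Name = as.factor(tab$Name)             # convert the label column to a factor
str(tab)
```

Running `str(tab)` after each step is a quick way to check that a season's table has the column names and types the rest of the script expects.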
@dingdongStatyuan

I get this error in R: 'Error in class(output[[var]]) <- class(value) : attempt to set an attribute on NULL'. I set y to 2012 and ran ldply(2009:2012, extract_data, .progress = 'text'). Could you explain why I get this error? Thank you so much.
