XML package example

advent_XML.R
doInstall <- TRUE # Set to FALSE if the package is already installed
toInstall <- c("XML")
if(doInstall){install.packages(toInstall, repos = "http://cran.us.r-project.org")}
invisible(lapply(toInstall, library, character.only = TRUE)) # Load the package(s) quietly
 
myURL <- "http://en.wikipedia.org/wiki/United_States_presidential_election,_2012"
 
allTables <- readHTMLTable(myURL)
str(allTables) # Look at the allTables object to find the specific table we want
# The state-by-state results were the 14th table when this was written;
# table 13 is only the two-row summary just above it.
stateTable <- allTables[[14]]
head(stateTable)
 
# Clean up:
stateTable <- stateTable[1:(nrow(stateTable) - 2), ] # Drop the two summary lines at the bottom
# Strip footnote markers such as "[1]" from the state names:
stateTable$State <- sub("\\[.*$", "", as.character(stateTable$State))
stateTable$State[stateTable$State == "District of ColumbiaDistrict of Columbia"] <- "District of Columbia"
# Treat any column containing thousands separators as numeric and convert it:
whichAreNumeric <- sapply(stateTable, function(cc){any(grepl(",", cc))})
stateTable[whichAreNumeric] <- lapply(stateTable[whichAreNumeric], function(cc){
  as.numeric(gsub(",", "", cc))})
 
# Display in order of Obama's proportion of the vote:
stateTable[, c("State", "Obama", "Romney")][with(stateTable, order(Obama/Total)), ]
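The clean-up steps can be tried without fetching the page. The following is a minimal sketch on a made-up data frame: the `toy` object, its state names, and its vote counts are illustrative assumptions, not real results, and the column names simply mirror the ones the script above refers to.

```r
# Toy stand-in for the scraped table; all values are invented for illustration
toy <- data.frame(
  State  = c("Alpha[1]", "Beta", "Gamma[2]"),
  Obama  = c("1,200", "3,400", "560,000"),
  Romney = c("2,100", "1,000", "440,000"),
  Total  = c("3,300", "4,400", "1,000,000"),
  stringsAsFactors = FALSE
)

# Strip footnote markers such as "[1]" from the state names
toy$State <- sub("\\[.*$", "", toy$State)

# Columns whose entries contain thousands separators are treated as numeric
hasCommas <- sapply(toy, function(cc) any(grepl(",", cc, fixed = TRUE)))
toy[hasCommas] <- lapply(toy[hasCommas], function(cc) {
  as.numeric(gsub(",", "", cc, fixed = TRUE))
})

# Row order by Obama's share of the total vote, lowest first
shareOrder <- order(toy$Obama / toy$Total)
toy[shareOrder, c("State", "Obama", "Romney")]
```

Detecting numeric columns by the presence of a comma is a heuristic: a column of small counts with no thousands separators would be left as text and would need converting separately.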

Should that be table 14? Table 13 appears to just be the summary "table" right above it.

Using table 13:

stateTable 2 obs. of 1 variables
                              V1
1  States/districts won by Obama
2 States/districts won by Romney

Thanks! Fixed the Gist.
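
One way to sidestep the 13-versus-14 question is to pick the table by its column names rather than by position. This is only a sketch: `findTableIndex` is a hypothetical helper, not part of the XML package, and the toy list below stands in for the output of `readHTMLTable()`. It assumes the table of interest has `State`, `Obama` and `Romney` columns, as the last line of the script implies.

```r
# Hypothetical helper: positions of all tables that contain every required column.
# NULL entries and other non-data-frame elements are skipped.
findTableIndex <- function(tables, requiredCols) {
  hasAll <- vapply(tables, function(tbl) {
    is.data.frame(tbl) && all(requiredCols %in% names(tbl))
  }, logical(1))
  unname(which(hasAll))
}

# Toy stand-in for allTables: a summary "table" followed by a results table
toyTables <- list(
  data.frame(V1 = c("States/districts won by Obama",
                    "States/districts won by Romney")),
  NULL,
  data.frame(State  = c("Alpha", "Beta"),
             Obama  = c("1,200", "3,400"),
             Romney = c("2,100", "1,000"))
)

idx <- findTableIndex(toyTables, c("State", "Obama", "Romney"))
```

With the real page one would call `findTableIndex(allTables, c("State", "Obama", "Romney"))` and check that exactly one index comes back before using it.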
