Skip to content

Instantly share code, notes, and snippets.

@dsparks
Created November 30, 2012 16:31
  • Star 3 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save dsparks/4176820 to your computer and use it in GitHub Desktop.
XML package example
doInstall <- TRUE
toInstall <- c("XML")
if(doInstall){install.packages(toInstall, repos = "http://cran.us.r-project.org")}
lapply(toInstall, library, character.only = TRUE)
myURL <- "http://en.wikipedia.org/wiki/United_States_presidential_election,_2012"
allTables <- readHTMLTable(myURL)
str(allTables) # Look at the allTables object to find the specific table we want
stateTable <- allTables[[14]] # We want the 14th table in the list (maybe 13th?)
head(stateTable)
# Clean up:
stateTable <- stateTable[1:(nrow(stateTable)-2), ] # Drop summary lines
stateTable$State <- do.call(rbind, strsplit(as.character(stateTable$State), "\\["))[, 1]
stateTable$State[stateTable$State == "District of ColumbiaDistrict of Columbia"] <- "District of Columbia"
whichAreNumeric <- colMeans(apply(stateTable, 2, function(cc){
regexpr(",", cc) != -1})) > 0
stateTable[, whichAreNumeric] <- apply(stateTable[, whichAreNumeric], 2, function(cc){
as.numeric(gsub(",", "", cc))})
# Display in order of Obama's proportion of the vote:
stateTable[, c("State", "Obama", "Romney")][with(stateTable, order(Obama/Total)), ]
@ClintWeathers
Copy link

Should that be table 14? Table 13 appears to just be the summary "table" right above it.

Using table 13:

stateTable 2 obs. of 1 variables

                          V1

1 States/districts won by Obama
2 States/districts won by Romney

@dsparks
Copy link
Author

dsparks commented Dec 2, 2012

Thanks! Fixed the Gist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment