@cpsievert
Created January 14, 2014 03:21
pbp_all.R
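# Template for fixing bad XML files from nba.com
library(XML)

# Read the raw play-by-play file as a character vector, one element per line
con <- url("http://www.nba.com/games/game_component/dynamic/20130528/MIAIND/pbp_all.xml")
corrupt <- readLines(con)
close(con)

# Strip the literal CDATA markers; fixed = TRUE turns off regex matching
tmp <- gsub("<![CDATA[", "", corrupt, fixed = TRUE)  # do I really need "&lt;![CDATA"?
tmp <- gsub("]]>", "", tmp, fixed = TRUE)            # do I really need "]]&gt;"?
# Collapse to one string so the parser sees a single document
# (also avoids naming a variable `file`, which masks base::file)
cleaned <- paste(tmp, collapse = "\n")

# Parse the cleaned text and convert the tree to a nested list
doc <- xmlParseString(xml(cleaned))
node <- getNodeSet(doc, path = "/")
l <- xmlToList(node[[1]])
l[[1]][[1]][[1]][[1]]  # drill down to the first nested element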
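
The CDATA-stripping step can also be tried offline on a small inline string. This is only a minimal sketch: the `sample` vector and its element and attribute names (`game`, `event`, `id`) are made up for illustration and are not the nba.com format, and it uses `xmlParse(asText = TRUE)` rather than `xmlParseString`.

```r
library(XML)

# Hypothetical sample; the real feed's element names may differ
sample <- c(
  "<game>",
  "  <event id=\"1\"><![CDATA[Jump ball]]></event>",
  "  <event id=\"2\"><![CDATA[Made 3pt shot]]></event>",
  "</game>"
)

# Remove the literal CDATA markers, as in the template above
no_open <- gsub("<![CDATA[", "", sample, fixed = TRUE)
cleaned <- paste(gsub("]]>", "", no_open, fixed = TRUE), collapse = "\n")

# Parse the text, then pull out each event's text and id attribute
doc <- xmlParse(cleaned, asText = TRUE)
events <- getNodeSet(doc, "//event")
descriptions <- sapply(events, xmlValue)
ids <- sapply(events, xmlGetAttr, "id")
print(data.frame(id = ids, description = descriptions))
```

Stripping the markers is only safe when the wrapped text contains no `<` or `&`; if it does, the parser will fail on the unescaped characters.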