@cpsievert
Created March 4, 2014 21:26
Scrape OECD Data
library(XML2R)
file <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/UN_DEN/AUS+CAN+FRA+DEU+NZL+GBR+USA+OECD/OECD?startTime=1960&endTime=2012"
obs <- XML2Obs(file)
# Rename observations so we can 'recycle' country labels to time/value
nms <- names(obs)
nms[grepl("SeriesKey//Value$", nms)] <- "root"
nms[grepl("Obs//Time$", nms)] <- "root//time"
nms[grepl("Obs//ObsValue$", nms)] <- "root//value"
names(obs) <- nms
obs <- add_key(obs, parent="root", recycle="value", key.name="country")
tabs <- collapse_obs(obs)
# There has to be a more graceful way to combine value/time into the same data.frame
# I might work on a new XML2R verb to 'squish' two 'siblings' into the same observation
dat <- data.frame(tabs[["root//value"]][, c("value", "country")],
                  time = tabs[["root//time"]][, "XML_value"],
                  stringsAsFactors = FALSE)
# Values are parsed as character strings; convert to numeric and round to whole numbers
dat$value2 <- round(as.numeric(dat$value))
head(dat)
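# A hedged sketch of the 'squish' idea from the comments above, using base R only.
# squish_siblings() is a hypothetical helper, not an XML2R function. It assumes
# the two sibling tables list their observations in the same order, and it
# stops if the row counts differ instead of letting data.frame() recycle rows.
squish_siblings <- function(values, times, time_col = "XML_value") {
  # Pairing by row position is only safe when both tables have the same length
  stopifnot(nrow(values) == nrow(times))
  data.frame(values[, c("value", "country"), drop = FALSE],
             time = times[, time_col],
             stringsAsFactors = FALSE)
}
# Possible usage with the tables built above:
# dat <- squish_siblings(tabs[["root//value"]], tabs[["root//time"]])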