@jeroen
Last active August 29, 2015 14:06
Parse JSON HTTP stream
# This HTTP data stream contains lines of minified JSON, one record per line,
# so we can process it in batches of lines. To speed things up, we read
# 100 lines at a time and collapse each batch into a single JSON array.
library(jsonlite)
stopifnot(packageVersion("jsonlite") >= "0.9.11")
# note that open="r" results in line-by-line reading.
gzstream <- gzcon(url("http://78.46.48.103/sample/hourly_14.json.gz", open="r"))
batches <- list()
i <- 1
while (length(records <- readLines(gzstream, n = 100))) {
  message("Batch ", i, ": found ", length(records), " lines of json...")
  json <- paste0("[", paste0(records, collapse = ","), "]")
  batches[[i]] <- fromJSON(json, validate = TRUE)
  i <- i + 1
}
close(gzstream)
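# Combine the per-batch data frames into one data frame.
# (Newer jsonlite releases provide this function as rbind_pages().)
weather <- rbind.pages(batches)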
rm(batches)
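
# A minimal offline sketch of the same batching idea, for trying it without
# the remote server. Assumptions: the three sample records and their field
# names (`city`, `temp`) are made up for illustration, and a textConnection
# stands in for the gzipped HTTP stream. Base rbind() is used to combine the
# batches, which only works here because every batch has the same columns.
library(jsonlite)
sample_lines <- c(
  '{"city":"A","temp":1.5}',
  '{"city":"B","temp":2.5}',
  '{"city":"C","temp":3.5}'
)
con <- textConnection(sample_lines)
pages <- list()
k <- 1
while (length(chunk <- readLines(con, n = 2))) {
  # Wrap each chunk of lines in [ ] so it parses as one JSON array.
  pages[[k]] <- fromJSON(paste0("[", paste0(chunk, collapse = ","), "]"))
  k <- k + 1
}
close(con)
demo <- do.call(rbind, pages)
stopifnot(is.data.frame(demo), nrow(demo) == 3)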