@markdanese
Last active April 22, 2016 11:35
A test of the new feather package in R using Medicare Part D drug reimbursement data
# load libraries --------------------------------------------------------------------
library(data.table)
library(feather)
# US Part D Drug prices 2013: 500 MB zip, 2.9 GB uncompressed -----------------------
pde_link <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip"
tf <- tempfile()
download.file(pde_link, tf, mode = "wb") # binary mode keeps the zip intact on Windows
x <- unzip(tf, exdir = tempdir())
df <- fread(x[2], verbose = TRUE)
unlink(x)
rm(x, tf)
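# various write/save options --------------------------------------------------------------
dir.create("./data/analysis", recursive = TRUE, showWarnings = FALSE) # make sure the output directory exists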
write_feather_time <-
system.time(
write_feather(df, "./data/analysis/pde2013.fthr")
)
write_rds_T_time <-
system.time(
saveRDS(df, "./data/analysis/pde2013T.rds", compress = TRUE)
)
write_rds_F_time <-
system.time(
saveRDS(df, "./data/analysis/pde2013F.rds", compress = FALSE)
)
write_csv_time <-
system.time(
fwrite(df, "./data/analysis/pde2013.csv")
) # requires data.table 1.9.7 or later, which added fwrite()
# various read options --------------------------------------------------------------
read_feather_time <-
system.time(
df1 <- read_feather("./data/analysis/pde2013.fthr")
)
rm(df1)
gc()
read_rds_T_time <-
system.time(
df2 <- readRDS("./data/analysis/pde2013T.rds")
)
rm(df2)
gc()
read_rds_F_time <-
system.time(
df3 <- readRDS("./data/analysis/pde2013F.rds")
)
rm(df3)
gc()
# summarize results -----------------------------------------------------------------
output <- ls(pattern = "_time")
times <- mget(output) # named list of the system.time() results
print(times)
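# optional: tabulate elapsed times --------------------------------------------------
# A minimal sketch, not part of the original benchmark: collapse a named list of
# system.time() results into a data frame sorted by elapsed (wall-clock) seconds.
# `elapsed_table` is a hypothetical helper name introduced here for illustration.
elapsed_table <- function(times) {
  out <- data.frame(
    step = names(times),
    elapsed = vapply(times, function(t) t[["elapsed"]], numeric(1)),
    row.names = NULL
  )
  out[order(out$elapsed), ]
}
# only runs when the `times` list built above is present in the workspace
if (exists("times", inherits = FALSE)) print(elapsed_table(times))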
@jangorecki

Nice, the speedup is amazing. You could update the R script and uncomment the fwrite lines so it can be reproduced by copy-paste.

@markdanese
Author

Thanks @jangorecki -- fixed.
Today's update (20 April 2016) now has fwrite() writing the same file in 3.8-4.0 seconds. (By the way, the previous version was 11.2 sec and not 13. Not that it matters any more!)
