markdanese/28b9f5412df55efceba754fee2363444
Last active April 22, 2016 11:35
A test of the new feather package in R using Medicare Part D drug reimbursement data
# load libraries --------------------------------------------------------------------
library(data.table)
library(feather)
# US Part D Drug prices 2013: 500 MB zip, 2.9 GB uncompressed -----------------------
pde_link <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip"
tf <- tempfile()
download.file(pde_link, tf, mode = "wb")  # binary mode keeps the zip intact on Windows
x <- unzip(tf, exdir = tempdir())         # paths of the extracted files
df <- fread(x[2], verbose = TRUE)         # read the second extracted file
unlink(c(x, tf))                          # delete the extracted files and the downloaded zip
rm(x, tf)
# various write/save options --------------------------------------------------------------
dir.create("./data/analysis", recursive = TRUE, showWarnings = FALSE)  # make sure the output folder exists
write_feather_time <-
  system.time(
    write_feather(df, "./data/analysis/pde2013.fthr")
  )
write_rds_T_time <-
  system.time(
    saveRDS(df, "./data/analysis/pde2013T.rds", compress = TRUE)
  )
write_rds_F_time <-
  system.time(
    saveRDS(df, "./data/analysis/pde2013F.rds", compress = FALSE)
  )
write_csv_time <-
  system.time(
    fwrite(df, "./data/analysis/pde2013.csv")
  ) # requires data.table 1.9.7+, the version that added fwrite
# various read options --------------------------------------------------------------
read_feather_time <-
  system.time(
    df1 <- read_feather("./data/analysis/pde2013.fthr")
  )
rm(df1)
gc()
read_rds_T_time <-
  system.time(
    df2 <- readRDS("./data/analysis/pde2013T.rds")
  )
rm(df2)
gc()
read_rds_F_time <-
  system.time(
    df3 <- readRDS("./data/analysis/pde2013F.rds")
  )
rm(df3)
gc()
# summarize results -----------------------------------------------------------------
output <- ls(pattern = "_time$")  # every timing object created above
times <- mget(output)             # named list of the system.time() results
print(times)
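
The printed list shows user, system, and elapsed times for every step, which is hard to compare at a glance. As a minimal sketch, assuming `times` is a named list of `system.time()` results like the one built above, the elapsed seconds can be pulled into a single sorted vector. The two timed expressions below are hypothetical stand-ins so the snippet runs on its own; they are not the Part D benchmark.

```r
# Stand-in timings so the sketch is self-contained (hypothetical workloads)
times <- list(
  write_demo_time = system.time(for (i in 1:1e5) sqrt(i)),
  read_demo_time  = system.time(Sys.sleep(0.05))
)

# Pull the wall-clock ("elapsed") seconds out of each proc_time object
elapsed <- vapply(times, function(t) t[["elapsed"]], numeric(1))

# Fastest step first, names preserved
elapsed <- sort(elapsed)
print(round(elapsed, 2))
```

With the real `times` list, only the last three statements are needed.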
Thanks @jangorecki -- fixed.
Today's update (20 April 2016) now has fwrite() writing the same file in 3.8-4.0 seconds. (By the way, the previous version was 11.2 sec and not 13. Not that it matters any more!)
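
For anyone who wants to try `fwrite()` without the 2.9 GB file, here is a minimal sketch on synthetic data. It assumes a data.table version that includes `fwrite()`. The table `demo`, its columns, and the temp-file paths are made up for illustration, and the timings will not match the numbers above.

```r
library(data.table)

# Hypothetical stand-in for the Part D table: 200,000 rows of mixed types
set.seed(1)
n <- 2e5
demo <- data.table(
  npi  = sample.int(1e9, n, replace = TRUE),
  drug = sample(c("ATORVASTATIN", "LISINOPRIL", "METFORMIN"), n, replace = TRUE),
  cost = round(runif(n, 1, 5000), 2)
)

csv_fwrite <- tempfile(fileext = ".csv")
csv_base   <- tempfile(fileext = ".csv")

# Time data.table's writer against base R's write.csv on the same table
fwrite_time <- system.time(fwrite(demo, csv_fwrite))
base_time   <- system.time(write.csv(demo, csv_base, row.names = FALSE))

print(c(fwrite = fwrite_time[["elapsed"]], write.csv = base_time[["elapsed"]]))

# Round trip: the file written by fwrite() reads back with the same shape
back <- fread(csv_fwrite)
stopifnot(nrow(back) == n, identical(names(back), names(demo)))

unlink(c(csv_fwrite, csv_base))
```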
Nice, the speedup is amazing. You may want to update the R script and uncomment the fwrite lines so it can be reproduced by copy-paste.