@markdanese
Last active April 22, 2016 11:35
A test of the new feather package in R using Medicare Part D drug reimbursement data
# load libraries --------------------------------------------------------------------
library(data.table)
library(feather)
# US Part D Drug prices 2013: 500 MB zip, 2.9 GB uncompressed -----------------------
pde_link <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip"
tf <- tempfile()
download.file(pde_link, tf)
x <- unzip(tf, exdir = tempdir())
df <- fread(x[2], verbose = TRUE)
unlink(x)
rm(x, tf)
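# various write/save options --------------------------------------------------------
# create the output directory first; none of the writers below will create it
dir.create("./data/analysis", recursive = TRUE, showWarnings = FALSE)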
write_feather_time <-
  system.time(
    write_feather(df, "./data/analysis/pde2013.fthr")
  )
write_rds_T_time <-
  system.time(
    saveRDS(df, "./data/analysis/pde2013T.rds", compress = TRUE)
  )
write_rds_F_time <-
  system.time(
    saveRDS(df, "./data/analysis/pde2013F.rds", compress = FALSE)
  )
write_csv_time <-
  system.time(
    fwrite(df, "./data/analysis/pde2013.csv")
  ) # requires data.table 1.9.7 or later, the first version with fwrite()
# various read options --------------------------------------------------------------
read_feather_time <-
  system.time(
    df1 <- read_feather("./data/analysis/pde2013.fthr")
  )
rm(df1)
gc()
read_rds_T_time <-
  system.time(
    df2 <- readRDS("./data/analysis/pde2013T.rds")
  )
rm(df2)
gc()
read_rds_F_time <-
  system.time(
    df3 <- readRDS("./data/analysis/pde2013F.rds")
  )
rm(df3)
gc()
# summarize results -----------------------------------------------------------------
# collect every *_time object into a named list
output <- ls(pattern = "_time")
times <- mget(output)
print(times)
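# optional: elapsed seconds in one table --------------------------------------------
# A sketch, not part of the original gist. It assumes only that each element of
# `times` is a system.time() result; summarize_times, step, and seconds are
# illustrative names introduced here.
summarize_times <- function(times) {
  # pull the wall-clock ("elapsed") component out of each proc_time object
  elapsed <- vapply(times, function(t) t[["elapsed"]], numeric(1))
  out <- data.frame(step = names(elapsed), seconds = unname(elapsed),
                    stringsAsFactors = FALSE)
  out[order(out$seconds), ] # fastest first
}
if (exists("times")) print(summarize_times(times), row.names = FALSE)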
@markdanese

Thanks @jangorecki -- fixed.
Today's update (20 April 2016) now has fwrite() writing the same file in 3.8-4.0 seconds. (By the way, the previous version was 11.2 sec and not 13. Not that it matters any more!)
