Created
July 1, 2021 15:39
Testing readr::read_csv(), data.table::fread() and vroom::vroom()
# Test done to check/answer the question at https://stackoverflow.com/questions/68211842/why-is-vroom-so-slow
# Downloaded CSV file on 2021-07-01 from:
# https://www.datosabiertos.gob.pe/dataset/vacunaci%C3%B3n-contra-covid-19-ministerio-de-salud-minsa
# and then compressed it with gzip
# $ zcat vacunas_covid.csv.gz | wc -l
# 7311644
library(readr)
library(vroom)
library(data.table)
library(microbenchmark)
csv_file <- "vacunas_covid.csv.gz"
microbenchmark(
  readr={
    t <- read_csv(csv_file, col_types=cols())
    write_csv(t, csv_file)
  },data.table={
    t <- fread(csv_file)
    fwrite(t, csv_file, sep=",")
  },vroom={
    t <- vroom(csv_file, delim=",", show_col_types = F)
    vroom_write(t, csv_file, delim=",")
  },
  times=5
)
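Each expression in the script above reads `csv_file` and then writes back to the same path, so one timing covers both the read and the write, and later iterations read a file produced by an earlier writer. The following is a minimal sketch, not part of the original test, of one way to time the two steps separately while leaving the input untouched. The helper `time_read_write()` and the small synthetic `demo_file` are hypothetical names introduced only for illustration, and base R readers stand in for the three packages so the sketch runs without extra dependencies.

```r
# Hypothetical helper (not in the original gist): times reading and writing
# separately, and writes to a temporary file so `input` is never overwritten.
time_read_write <- function(reader, writer, input, times = 5) {
  out <- tempfile(fileext = ".csv")
  on.exit(unlink(out), add = TRUE)
  res <- data.frame(read = numeric(times), write = numeric(times))
  for (i in seq_len(times)) {
    # system.time() evaluates its argument in this frame, so `dat` is
    # available to the writer on the next line.
    res$read[i]  <- system.time(dat <- reader(input))[["elapsed"]]
    res$write[i] <- system.time(writer(dat, out))[["elapsed"]]
  }
  res
}

# Small synthetic gzip-compressed CSV standing in for vacunas_covid.csv.gz
# (assumption: any comma-separated file would do for the illustration).
demo_file <- tempfile(fileext = ".csv.gz")
con <- gzfile(demo_file, "w")
write.csv(data.frame(id = 1:1000, dose = rep(1:2, 500)), con, row.names = FALSE)
close(con)

timings <- time_read_write(
  reader = function(path) read.csv(path),
  writer = function(dat, path) write.csv(dat, path, row.names = FALSE),
  input  = demo_file,
  times  = 3
)
print(colMeans(timings))  # mean elapsed seconds for the read and write steps
```

To compare the packages from the script, the same helper could be called with, for example, `reader = data.table::fread` and `writer = function(dat, path) data.table::fwrite(dat, path)`, and likewise for readr and vroom. The timings would then not be directly comparable with the combined read-plus-write figures reported in this gist.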
R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> # Test done to check/answer the question at https://stackoverflow.com/questions/68211842/why-is-vroom-so-slow
> # Downloaded CSV file on 2021-07-01 from:
> # https://www.datosabiertos.gob.pe/dataset/vacunaci%C3%B3n-contra-covid-19-ministerio-de-salud-minsa
> # and then compressed it with gzip
>
> library(readr)
> library(vroom)
> library(data.table)
> library(microbenchmark)
> csv_file <- "vacunas_covid.csv.gz"
> microbenchmark(
+   readr={
+     t <- read_csv(csv_file, col_types=cols())
+     write_csv(t, csv_file)
+   },data.table={
+     t <- fread(csv_file)
+     fwrite(t, csv_file, sep=",")
+   },vroom={
+     t <- vroom(csv_file, delim=",", show_col_types = F)
+     vroom_write(t, csv_file, delim=",")
+   },
+   times=5
+ )
Unit: seconds
       expr       min        lq      mean    median        uq       max neval cld
      readr 101.72094 105.75384 109.16869 106.08111 108.06967 124.21788     5   c
 data.table  28.18751  30.32570  31.06592  30.44838  33.12746  33.24055     5   a
      vroom  48.65399  51.52445  55.78264  52.89823  53.83582  72.00071     5   b
>
>
>
> proc.time()
    user   system  elapsed
1065.499   39.475  990.722
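As a reading aid for the table above, the median timings can be expressed relative to the fastest entry. This is a small sketch using only the medians copied from the reported output. The variable names are illustrative, and the ratios describe this single run of five iterations on one machine, not a general ranking.

```r
# Median elapsed times in seconds, copied from the microbenchmark output above.
median_secs <- c(readr = 106.08111, data.table = 30.44838, vroom = 52.89823)

# Ratio of each median to the smallest one (data.table in this run).
ratio <- median_secs / min(median_secs)
print(round(ratio, 1))
```

In this run the readr expression took roughly 3.5 times as long as the data.table one, and the vroom expression roughly 1.7 times as long. Each expression includes both reading and writing the file.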