@kjhealy
Forked from grantmcdermott/collapse_mask.R
Created February 15, 2022 03:13
Benchmarking collapse_mask
## Context: https://twitter.com/grant_mcdermott/status/1493400952878952448
options(collapse_mask = "all") # NB: see `help('collapse-options')`
library(dplyr)
library(data.table)
library(collapse) # Needs to come after library(dplyr) for collapse_mask to work
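## Optional sanity check (a sketch added for illustration, not part of the
## original gist): see which package the unqualified verbs resolve to after
## loading. Assumes the masked verbs are ordinary closures, so that their
## enclosing environment is the namespace that defines them.
environmentName(environment(group_by))
environmentName(environment(summarise))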
flights = fread('https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv')
vars = c('dep_delay', 'arr_delay', 'air_time', 'distance', 'hour')
## Note that we explicitly call dplyr::<function> in the first expression of
## this benchmark, since the regular dplyr verbs are now masked by their
## collapse equivalents (which the second expression uses).
library(microbenchmark)
microbenchmark(
  dplyr      = flights |> dplyr::group_by(month, day, origin, dest) |> dplyr::summarise(across(vars, sum)),
  collapse   = flights |> group_by(month, day, origin, dest) |> summarise(across(vars, sum)),
  data.table = flights[, lapply(.SD, sum), by=.(month, day, origin, dest), .SDcols=vars],
  times = 2
)
#> Unit: milliseconds
#>        expr         min          lq        mean      median          uq         max neval cld
#>       dplyr 1041.243193 1041.243193 1061.813553 1061.813553 1082.383912 1082.383912     2   b
#>    collapse   10.350356   10.350356   10.428991   10.428991   10.507626   10.507626     2  a
#>  data.table    9.615242    9.615242    9.778382    9.778382    9.941521    9.941521     2  a