@mpjdem
Created November 25, 2019 12:34
test of dtplyr speed
library(dplyr)
library(data.table)
library(dtplyr)
df <- data.frame(customer = sample(100000, size = 5000000, replace = TRUE),
                 value = runif(5000000))
# dplyr
t0 <- Sys.time()
res_dp <- df %>%
  group_by(customer) %>%
  summarise(value = sum(value))
Sys.time() - t0
# data.table (the as.data.table() conversion is included in the timing)
t0 <- Sys.time()
dt <- as.data.table(df)
res_dt <- dt[, .(value = sum(value)), keyby = .(customer)]
Sys.time() - t0
# dtplyr
t0 <- Sys.time()
res_dtp <- df %>%
  lazy_dt() %>%
  group_by(customer) %>%
  summarise(value = sum(value)) %>%
  as_tibble()
Sys.time() - t0
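# dtplyr is lazy: the dplyr verbs only build up a data.table expression,
# which is not evaluated until the result is requested (here via as_tibble())
# dtplyr also shows how R can manipulate its own language:
# it translates dplyr calls into data.table code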
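# A minimal sketch of the two points above, using a small made-up data frame
# (`toy` and the result names are illustrative only, not part of the benchmark).
# The verbs are recorded rather than run; show_query() prints the data.table
# call that dtplyr generated, and nothing is computed until as_tibble().
toy <- data.frame(customer = c(1L, 1L, 2L), value = c(0.5, 1.5, 2.0))
lazy_res <- toy %>%
  lazy_dt() %>%
  group_by(customer) %>%
  summarise(value = sum(value))
show_query(lazy_res)            # prints the translated data.table expression
toy_res <- as_tibble(lazy_res)  # evaluation happens here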
#
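# Why use dtplyr if you already know data.table?
# - consistency of style with the rest of a dplyr code base
# - the more verbose dplyr syntax is easier to read in larger code bases
# - immutability (lazy_dt() does not modify its input by default) is good for production code
# - cooperating with people who don't use data.table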
#
# What if you must use data.table snippets but don't want to attach the whole package?
# withr::with_package()
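# A minimal sketch of that approach, assuming a session where data.table is
# installed but not attached via library() (`toy2`, `toy_dt` and `snippet_res`
# are made-up names). with_package() attaches the package only while the
# braced code is evaluated.
toy2 <- data.frame(customer = c(1L, 1L, 2L), value = c(0.5, 1.5, 2.0))
snippet_res <- withr::with_package("data.table", {
  toy_dt <- as.data.table(toy2)  # as.data.table() is visible inside the block
  toy_dt[, .(value = sum(value)), keyby = .(customer)]
})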