@jhofman
Created January 20, 2016 16:45
Careful when filtering with many groups in dplyr
library(dplyr)

# create a dummy data frame with 1,000,000 rows spread over (almost) all of
# 100,000 possible group ids, then group it by group_id
df <- data.frame(group_id = sample(1:1e5, 1e6, replace = TRUE),
                 val      = sample(1:100, 1e6, replace = TRUE)) %>%
  group_by(group_id)
# naive approach: filter the grouped data frame for rows where val == 1
system.time(df %>% filter(val == 1))
#  user  system elapsed
# 1.447   0.017   1.476
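# ungroup before filtering: val == 1 is a row-wise test that does not depend on
# the grouping, so the same rows are kept, here roughly 150x faster (elapsed)
# note: the result is no longer grouped; add group_by(group_id) if you need it
system.time(df %>% ungroup() %>% filter(val == 1))
#  user  system elapsed
# 0.007   0.003   0.010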
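A quick sanity check, written as a sketch on a smaller data set (the seed, the sizes, and the names `slow`, `fast`, `per_group` and `overall` are illustrative choices, not part of the original gist). For a row-wise condition like `val == 1`, the grouped and ungrouped filters keep the same rows, and re-applying `group_by()` restores the grouping. The shortcut does not carry over to conditions that use group-wise summaries such as `max(val)`, because dplyr evaluates those per group only when the data frame is grouped.

```r
library(dplyr)

set.seed(1)
# a smaller version of the gist's data: up to 1,000 groups, 10,000 rows
df <- data.frame(group_id = sample(1:1000, 1e4, replace = TRUE),
                 val      = sample(1:100, 1e4, replace = TRUE)) %>%
  group_by(group_id)

# row-wise predicate: grouped and ungrouped filters keep the same rows
slow <- df %>% filter(val == 1)
fast <- df %>% ungroup() %>% filter(val == 1) %>% group_by(group_id)
stopifnot(identical(slow$group_id, fast$group_id),
          identical(slow$val, fast$val),
          identical(group_vars(fast), "group_id"))

# group-wise predicate: max(val) is computed per group only when grouped,
# so ungrouping first changes the answer
per_group <- df %>% filter(val == max(val))               # each group's maximum rows
overall   <- df %>% ungroup() %>% filter(val == max(val)) # only global-maximum rows
nrow(per_group)
nrow(overall)
```

In other words, ungrouping first is only safe when the filter condition looks at each row on its own.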