@tomhopper
Forked from jhofman/dplyr_filter_ungroup.R
Created January 29, 2016 20:31
Be careful when filtering with many groups in dplyr
library(dplyr)
# create a dummy dataframe with 100,000 groups and 1,000,000 rows
# and partition by group_id
df <- data.frame(group_id = sample(1:1e5, 1e6, replace = TRUE),
                 val = sample(1:100, 1e6, replace = TRUE)) %>%
  group_by(group_id)
# filter rows with a value of 1 naively
system.time(df %>% filter(val == 1))
#    user  system elapsed
#   1.447   0.017   1.476
# ungroup before filtering for a huge speedup
system.time(df %>% ungroup() %>% filter(val == 1))
#    user  system elapsed
#   0.007   0.003   0.010
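
The ungrouped filter returns a plain, ungrouped result, so later grouped steps need the grouping restored. The sketch below is one way to do that and to check that both paths keep the same rows. It assumes a smaller dummy data frame (`small_df`, a hypothetical name) and a fixed seed so it runs quickly and reproducibly. The shortcut is only safe because `val == 1` does not depend on the group; a condition such as `val == max(val)` is evaluated per group and would give different rows after `ungroup()`.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, only to make the sketch reproducible

# smaller dummy data than above (1,000 groups, 10,000 rows) so the check is quick
small_df <- data.frame(group_id = sample(1:1e3, 1e4, replace = TRUE),
                       val = sample(1:100, 1e4, replace = TRUE)) %>%
  group_by(group_id)

# grouped filter, as in the naive version above
grouped_result <- small_df %>% filter(val == 1)

# ungroup, filter, then restore the grouping for later grouped steps
regrouped_result <- small_df %>%
  ungroup() %>%
  filter(val == 1) %>%
  group_by(group_id)

# compare column by column: both paths should keep the same rows in the same order
same_rows <- identical(grouped_result$group_id, regrouped_result$group_id) &&
  identical(grouped_result$val, regrouped_result$val)
print(same_rows)

# the grouping variable is back in place
print(group_vars(regrouped_result))
```

Re-grouping has its own cost, so it is worth timing the full pipeline with `system.time()` on your own data and dplyr version rather than assuming the speedup shown above carries over.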