@MilesMcBain
Last active July 6, 2016 00:34
A demonstration of the effect of an increasing number of groups on mutate() in dplyr.
library(ggplot2)
library(dplyr)
# Baseline: mutate() on the ungrouped data frame.
system.time({
  nycflights13::flights %>%
    mutate(timestring = lubridate::date(time_hour))
})
# 0 groups
#    user  system elapsed
#   0.048   0.000   0.047
system.time({
  nycflights13::flights %>%
    group_by(carrier) %>%
    mutate(timestring = lubridate::date(time_hour))
})
# 16 groups
#    user  system elapsed
#   0.068   0.000   0.070
system.time({
  nycflights13::flights %>%
    # `day` alone is the day of the month (31 groups); month + day gives 365.
    group_by(month, day) %>%
    mutate(timestring = lubridate::date(time_hour))
})
# 365 groups
#    user  system elapsed
#    0.06    0.00    0.06
system.time({
  nycflights13::flights %>%
    group_by(origin, dest, sched_dep_time) %>%
    mutate(timestring = lubridate::date(time_hour))
})
# 13,017 groups
#    user  system elapsed
#   1.344   0.012   1.355
system.time({
  nycflights13::flights %>%
    group_by(tailnum, flight) %>%
    mutate(timestring = lubridate::date(time_hour))
})
# 179,858 groups
#    user  system elapsed
#   20.36    0.30   20.66
# Worst case: one group per row.
system.time({
  nycflights13::flights %>%
    group_by(row_number(flight)) %>%
    mutate(timestring = lubridate::date(time_hour))
})
# 336,776 groups
#    user  system elapsed
#  38.896   0.496  39.396
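
# Sketch: the group counts quoted in the comments above can be checked
# directly rather than taken on trust. n_groups() is dplyr's count of the
# groups in a grouped data frame; `count_groups` is a hypothetical helper
# name introduced here for illustration and is not part of dplyr.
library(dplyr)
count_groups <- function(df, ...) {
  df %>%
    group_by(...) %>%
    n_groups()
}
count_groups(nycflights13::flights, carrier)
count_groups(nycflights13::flights, origin, dest, sched_dep_time)
count_groups(nycflights13::flights, tailnum, flight)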
# User times (seconds) collected from the runs above.
# tibble() replaces the deprecated data_frame().
time_df <- tibble(
  groups = c(0, 16, 365, 13017, 179858, 336776),
  time = c(0.048, 0.068, 0.06, 1.344, 20.36, 38.896)
)
ggplot(time_df, aes(x = groups, y = time)) +
  geom_line() +
  geom_point() +
  labs(x = "Number of groups", y = "Seconds to mutate()", title = "The Cost of Groups") +
  # comma() lives in the scales package, which is not attached above.
  scale_x_continuous(labels = scales::comma) +
  theme_minimal()
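
# Sketch: a rough estimate of the per-group overhead implied by the timings
# recorded above, using a straight-line fit in base R. This assumes the cost
# grows roughly linearly with the number of groups, which the plot suggests
# but six measurements from single runs cannot establish. The names `groups`,
# `time`, `fit` and `per_group_ms` are illustrative only.
groups <- c(0, 16, 365, 13017, 179858, 336776)
time <- c(0.048, 0.068, 0.06, 1.344, 20.36, 38.896)
fit <- lm(time ~ groups)
# Slope is in seconds per group; convert to milliseconds per group.
per_group_ms <- unname(coef(fit)["groups"]) * 1000
per_group_ms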