## Gist by @jennybc (last active March 4, 2018 16:14)
library(tidyverse)
## approximates big_df in the post
big_df <- mtcars %>%
  select(cyl, mpg, disp) %>%
  arrange(cyl) %>%
  slice(17:22) %>%
  rename(ID = cyl)
## dummy function that needs access to ID and data
complex_func2 <- function(ID, data) {
  tibble(
    ID = ID,
    half = ID / 2,
    N = nrow(data)
  )
}
big_df %>%
  group_by(ID) %>%
  nest() %>%
  pmap_dfr(complex_func2)
#> # A tibble: 2 x 3
#>      ID  half     N
#>   <dbl> <dbl> <int>
#> 1    6.    3.     2
#> 2    8.    4.     4
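
## A minimal sketch of why the pmap_dfr() call above works: pmap() passes each
## row's columns to the function as *named* arguments, so the nested tibble's
## column names (ID, data) must match complex_func2's formals. The names
## nested_pm, f and the `extra` column are illustrative only; `extra` is added
## just to show the failure mode when a column has no matching argument.
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

nested_pm <- mtcars %>%
  select(ID = cyl, mpg) %>%
  group_by(ID) %>%
  nest() %>%
  ungroup()

f <- function(ID, data) tibble(ID = ID, N = nrow(data))

## columns ID and data line up with f's arguments by name
res_ok <- pmap_dfr(nested_pm, f)

## an extra column has no matching argument, so the call errors
## ("unused argument"); keep the condition object instead of stopping
res_err <- tryCatch(
  pmap_dfr(mutate(nested_pm, extra = 1), f),
  error = function(e) e
)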
big_df_nested <- big_df %>%
  group_by(ID) %>%
  nest()
map2_dfr(big_df_nested$ID, big_df_nested$data, complex_func2)
#> # A tibble: 2 x 3
#>      ID  half     N
#>   <dbl> <dbl> <int>
#> 1    6.    3.     2
#> 2    8.    4.     4
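
## By contrast, map2_dfr() passes its two inputs positionally, so the
## function's argument names need not match the column names. A sketch; the
## names nested_pos, g, key and df are illustrative, not from the post.
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

nested_pos <- mtcars %>%
  select(ID = cyl, mpg) %>%
  group_by(ID) %>%
  nest() %>%
  ungroup()

## first input lands in `key`, second in `df`, regardless of names
g <- function(key, df) tibble(ID = key, half = key / 2, N = nrow(df))
res_pos <- map2_dfr(nested_pos$ID, nested_pos$data, g)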
## OP says:
## "This requires writing a version of `complex_func` to be explicitly aware of
## the grouping variable. When grouping changes, you will need to update the
## function signature *and* the pmap call. High cognitive overhead?"
## This arises in my work when I've been using "ID" as the group var in one
## project. In a similar project, IDs have been re-used within GROUPs, so now
## group_by(ID, GROUP) is necessary to get data.frames for individual entities.
## complex_func doesn't change, but the grouping does.
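
## One way to keep the signature fixed when the grouping changes (a sketch,
## not from the original thread): give the function a `data` argument plus
## `...`, so pmap() can pass any number of key columns. GROUP is a
## hypothetical second key (mtcars' gear) standing in for the OP's re-used
## IDs; complex_func_dots is an illustrative name.
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

complex_func_dots <- function(data, ...) {
  keys <- tibble(...)          # one-row tibble of whatever keys were passed
  mutate(keys, N = nrow(data)) # key columns followed by the per-group result
}

by_id_group <- mtcars %>%
  select(ID = cyl, GROUP = gear, mpg) %>%
  group_by(ID, GROUP) %>%
  nest() %>%
  ungroup() %>%
  pmap_dfr(complex_func_dots)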
## JB: So all approaches will need a new grouping statement. And the body of
## complex_func needs to change (or not) for all approaches as well. So if this
## is just about the signature of complex_func, here's an approach that hands
## the function each row as a single list: it's getting very split-y, but
## without replicating the key vars in the signature.
## dummy function that needs access to key vars and data
complex_func3 <- function(x) {
  tibble(
    ID = x$ID,
    half = x$ID / 2,
    N = nrow(x$data)
  )
}
big_df %>%
  group_by(ID) %>%
  nest() %>%
  transpose() %>%
  map_dfr(complex_func3)
#> # A tibble: 2 x 3
#>      ID  half     N
#>   <dbl> <dbl> <int>
#> 1    6.    3.     2
#> 2    8.    4.     4
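
## A sketch of the same transpose() pattern under the OP's two-key grouping,
## assuming a hypothetical GROUP column (mtcars' gear). complex_func_keys is
## illustrative: it treats every element of x except `data` as a key, so
## neither its signature nor its body depends on which variables you grouped by.
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

complex_func_keys <- function(x) {
  keys <- as_tibble(x[setdiff(names(x), "data")]) # one-row tibble of keys
  mutate(keys, N = nrow(x$data))
}

by_keys <- mtcars %>%
  select(ID = cyl, GROUP = gear, mpg) %>%
  group_by(ID, GROUP) %>%
  nest() %>%
  ungroup() %>%
  transpose() %>%              # one list per row: ID, GROUP, data
  map_dfr(complex_func_keys)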
#' Created on 2018-03-03 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).