jennybc/e7da3b1be68be611a16ea64f573537ee
Last active March 4, 2018 16:14
Split-apply-combine exploration, adding more to this post: https://coolbutuseless.bitbucket.io/2018/03/03/split-apply-combine-my-search-for-a-replacement-for-group_by---do/
library(tidyverse)

## approximates big_df in the post
big_df <- mtcars %>%
  select(cyl, mpg, disp) %>%
  arrange(cyl) %>%
  slice(17:22) %>%
  rename(ID = cyl)

## dummy function that needs access to ID and data
complex_func2 <- function(ID, data) {
  tibble(
    ID = ID,
    half = ID / 2,
    N = nrow(data)
  )
}
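
## pmap_dfr() calls complex_func2() once per row of the nested tibble,
## matching its columns (ID, data) to the function's arguments by name
big_df %>%
  group_by(ID) %>%
  nest() %>%
  pmap_dfr(complex_func2)
#> # A tibble: 2 x 3
#>      ID  half     N
#>   <dbl> <dbl> <int>
#> 1    6.    3.     2
#> 2    8.    4.     4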

## same result, passing the ID and data columns positionally to map2_dfr()
big_df_nested <- big_df %>%
  group_by(ID) %>%
  nest()
map2_dfr(big_df_nested$ID, big_df_nested$data, complex_func2)
#> # A tibble: 2 x 3
#>      ID  half     N
#>   <dbl> <dbl> <int>
#> 1    6.    3.     2
#> 2    8.    4.     4

## OP says:
## "this requires writing a version of 'complex_func' to be explicitly aware of
## the grouping variable. When grouping changes you will need to update the
## function signature *and* the pmap call. High cognitive overhead?"
## This arises in my work when I've been using "ID" as the group var in one
## project. In a similar project, IDs have been re-used within GROUPs, so now
## group_by(ID, GROUP) is necessary to get data.frames for individual entities.
## complex_func doesn't change, but grouping does.

## JB: So all approaches will need a new grouping statement. And the body of
## complex_func needs to change (or not) for all approaches as well. So if this
## is just about the signature of complex_func, here's a version that takes a
## single argument. It's getting very split-y, but without the replication of
## the key vars.

## dummy function that needs access to key vars and data
complex_func3 <- function(x) {
  tibble(
    ID = x$ID,
    half = x$ID / 2,
    N = nrow(x$data)
  )
}

## transpose() turns the nested tibble into a list with one element per group,
## each a list holding that group's ID and data
big_df %>%
  group_by(ID) %>%
  nest() %>%
  transpose() %>%
  map_dfr(complex_func3)
#> # A tibble: 2 x 3
#>      ID  half     N
#>   <dbl> <dbl> <int>
#> 1    6.    3.     2
#> 2    8.    4.     4
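
## A rough sketch of what "split-y, but without the replication of the key
## vars" means, comparing base R split() with group_by() %>% nest().
## Assumptions: the six-row mtcars subset below is meant to mirror big_df, and
## the names small_df, pieces and nested_small are hypothetical. split() keeps
## the key column ID inside every piece; nest() moves ID out into its own
## column, next to `data`.
library(dplyr)
library(tidyr)

small_df <- mtcars[order(mtcars$cyl), c("cyl", "mpg", "disp")][17:22, ]
names(small_df)[1] <- "ID"

## split(): a list of data frames named by ID, each still containing ID
pieces <- split(small_df, small_df$ID)
sapply(pieces, function(d) "ID" %in% names(d))

## nest(): one row per ID; the per-group data frames hold only mpg and disp
nested_small <- small_df %>%
  group_by(ID) %>%
  nest() %>%
  ungroup()
names(nested_small$data[[1]])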

#' Created on 2018-03-03 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
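
## A minimal sketch of the OP's ID/GROUP scenario, using an invented toy tibble
## in which ID values are re-used across a hypothetical GROUP column. Both
## approaches need the new group_by(ID, GROUP). The pmap_dfr() version also
## needs GROUP in the function signature, because pmap() matches columns to
## arguments by name; the transpose() version keeps complex_func3's
## one-argument signature and only reads x$GROUP in the body. two_key_df,
## f_pmap and f_transpose are hypothetical names.
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

two_key_df <- tibble(
  ID    = c(1, 1, 1, 2, 2),
  GROUP = c("a", "a", "b", "a", "b"),
  value = 1:5
)

## pmap_dfr(): GROUP has to be added to the signature
f_pmap <- function(ID, GROUP, data) {
  tibble(ID = ID, GROUP = GROUP, N = nrow(data))
}

## transpose() + map_dfr(): same signature as complex_func3, new body
f_transpose <- function(x) {
  tibble(ID = x$ID, GROUP = x$GROUP, N = nrow(x$data))
}

nested <- two_key_df %>%
  group_by(ID, GROUP) %>%
  nest() %>%
  ungroup()

res_pmap <- pmap_dfr(nested, f_pmap)
res_transpose <- map_dfr(transpose(nested), f_transpose)
res_pmap
## both approaches give one row per (ID, GROUP) pair
identical(res_pmap, res_transpose)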