@EmilHvitfeldt
Last active July 16, 2019 16:17
Perform overlapping grouping with a custom grouped data frame
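library(tidyverse)

# Group `data` by the individual values of a delimited string column `x`.
# A row whose `x` holds several values (e.g. "a,b") belongs to several
# groups at once, so the groups can overlap.
group_by_in <- function(data, x, sep) {
  # Column name as a string, captured from the unquoted argument
  col_name <- rlang::as_name(enquo(x))

  # Split every entry on `sep` (strsplit() treats it as a regular expression)
  splitted <- strsplit(data[[col_name]], sep)

  # Distinct values across all rows; each one becomes a group (NA included)
  values <- unique(unlist(splitted))

  # Indices of the rows whose split values contain `value`
  which_list <- function(data, value) {
    which(map_lgl(data, ~ value %in% .x))
  }

  # Build the grouped data frame from a groups table whose last column,
  # `.rows`, lists the member rows of each group. `".rows" :=` is used so
  # tibble() creates a column instead of reading its own `.rows` argument.
  new_grouped_df(
    x = data,
    groups = tibble(!!col_name := values,
                    ".rows" := map(values, ~ which_list(splitted, .x)))
  )
}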
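To see the overlapping groups concretely, here is a base-R sketch of the same index-building step (split, collect distinct values, test membership per value) applied to the `y` column of the example data below, without dplyr. `vapply()` stands in for `purrr::map_lgl()`, and `fixed = TRUE` assumes the separator is a literal string rather than a regular expression.

```r
# Base-R sketch of the index-building step in group_by_in():
# split each string on the separator, collect the distinct values,
# and record which rows contain each value.
y <- c("a", "b", "a,b", "b", "a", "c", "a,c", "a,b,c", NA)

# strsplit() turns NA into NA, so missing values form their own group
splitted <- strsplit(y, ",", fixed = TRUE)
values <- unique(unlist(splitted))

# For each distinct value, the indices of the rows that contain it
rows <- lapply(values, function(v) {
  which(vapply(splitted, function(s) v %in% s, logical(1)))
})
names(rows) <- values

str(rows)
```

Row 3 (`"a,b"`) appears in both the `a` and `b` index vectors, and row 8 (`"a,b,c"`) appears in all three. This overlap is what the custom `.rows` column encodes.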

data <- tribble(
  ~x, ~y,
  1, "a",
  1, "b",
  2, "a,b",
  3, "b",
  2, "a",
  3, "c",
  1, "a,c",
  3, "a,b,c",
  2, NA
)

group_by_in(data, y, ",") %>%
  summarise(mean = mean(x),
            count = n())
#> # A tibble: 4 x 3
#>   y      mean count
#>   <chr> <dbl> <int>
#> 1 a      1.8      5
#> 2 b      2.25     4
#> 3 c      2.33     3
#> 4 <NA>   2        1
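For comparison, a common alternative that avoids building a grouped data frame by hand is to duplicate each row once per value with `tidyr::separate_rows()` and then use an ordinary `group_by()`. This is a swapped-in technique, not the approach above. It is a sketch that assumes dplyr and tidyr are installed, and it materializes the expanded rows instead of letting groups share them.

```r
library(dplyr)
library(tidyr)

# Same example data as above
data <- tibble(
  x = c(1, 1, 2, 3, 2, 3, 1, 3, 2),
  y = c("a", "b", "a,b", "b", "a", "c", "a,c", "a,b,c", NA)
)

# Duplicate each row once per comma-separated value in `y`
# (NA stays a single NA row), then group normally.
result <- data %>%
  separate_rows(y, sep = ",") %>%
  group_by(y) %>%
  summarise(mean = mean(x), count = n())

result
```

`group_by()` sorts its groups, so the rows come out as a, b, c, NA. That matches the order in the reprex output here only because the values also first appear in that order.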

Created on 2019-07-13 by the reprex package (v0.3.0)
