Skip to content

Instantly share code, notes, and snippets.

@grosscol
Created April 24, 2018 14:52
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save grosscol/84d6f75a60b83cb4d27e3cd7a5b0ab1a to your computer and use it in GitHub Desktop.
Save grosscol/84d6f75a60b83cb4d27e3cd7a5b0ab1a to your computer and use it in GitHub Desktop.
Background mean per id from count data.
library(tibble)
library(dplyr)
count_data <- tibble::tribble(
~id, ~numerator, ~denominator,
"Aly", 14, 20,
"Aly", 13, 20,
"Aly", 12, 20,
"Bob", 11, 20,
"Bob", 12, 20,
"Bob", 13, 20,
"Cam", 18, 20,
"Cam", 19, 20,
"Cam", 20, 20
)
background_aves <- function(ids, vals){
masks <- lapply(ids, FUN=function(id,idz){idz != id}, idz=ids)
sapply(masks, FUN=function(mask, valz){mean(valz[mask])}, vals)
}
rate_data <- count_data %>%
mutate(rate=numerator/denominator) %>%
group_by(id) %>%
summarize(mean_rate = mean(rate)) %>%
mutate(bkgd_rate = background_aves(id, mean_rate))
# A tibble: 3 x 3
# id mean_rate bkgd_rate
# <chr> <dbl> <dbl>
# 1 Aly 0.650 0.775
# 2 Bob 0.600 0.800
# 3 Cam 0.950 0.625
@grosscol
Copy link
Author

grosscol commented Apr 24, 2018

claytonjy [2 hours ago]
not much tidier, but a bit:

library(purrr)

background_rates <- unique(count_data$id) %>%
  map_dfr(
    ~count_data %>%
      filter(id != .x) %>%
      summarize(bkgd_rate = mean(numerator/denominator)) %>%
      mutate(id = .x)
  )

count_data %>%
  group_by(id) %>%
  summarize(mean_rate = mean(numerator / denominator)) %>%
  left_join(background_rates, by = "id")

claytonjy [2 hours ago]
hard to avoid making the background-rate-table separately due to issues referring to group labels inside summarize

grosscol [2 hours ago]
That gets me away from having to use my usual mask and wickets. Thanks! The pipes inside the formula are new to me.

claytonjy [2 hours ago]
I tend to over-abuse doing way too much inside a map-function, and this might be one of those cases ¯_(ツ)_/¯

claytonjy [2 hours ago]
more compact, much more esoteric:

count_data %>%
  group_by(id) %>%
  do(
    mean_rate = mean(.$numerator / .$denominator),
    bkgd_rate = filter(count_data, id != unique(.$id)) %>% {mean(.$numerator / .$denominator)}
  ) %>%
  mutate_at(vars(-id), unlist)

grosscol [2 hours ago]
That is neater. The trailing unlist seems like the esoteric part

claytonjy [2 hours ago]
yeah, do-patterns always make me a bit weary. Also hate that the flatten_* family doesn't work in place of unlist

grosscol [2 hours ago]
Doesn't rlang have a version of flatten that works as expected?

claytonjy [2 hours ago]
seems to throw the same error with rlang:: in this instance, but swapping mutate for modify works, though you lose the vars:

count_data %>%
  group_by(id) %>%
  do(
    mean_rate = mean(.$numerator / .$denominator),
    bkgd_rate = filter(count_data, id != unique(.$id)) %>% {mean(.$numerator / .$denominator)}
  ) %>%
  modify_at(c("mean_rate", "bkgd_rate"), flatten_dbl)

claytonjy [2 hours ago]
can also avoid the un-listing entirely by sticking tibble inside do:

count_data %>%
  group_by(id) %>%
  do(tibble(
    id = unique(.$id),
    mean_rate = mean(.$numerator / .$denominator),
    bkgd_rate = filter(count_data, id != unique(.$id)) %>% {mean(.$numerator / .$denominator)}
  )) %>%
  ungroup()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment