Created July 14, 2023 13:47
issues with summarizing over many groups in plyranges - related to Issue #30
library(plyranges)
library(microbenchmark)
library(dplyr)
library(tibble)

make_rand_gr <- function(N, grps) {
  data.frame(seqnames = sample(c("seq1", "seq2", "seq3"), N, replace = TRUE),
             strand = sample(c("+", "-", "*"), N, replace = TRUE),
             start = rpois(N, N), width = rpois(N, N),
             grps = sample(grps, N, replace = TRUE),
             score = runif(N)) %>% as_granges()
}

set.seed(1)
r <- make_rand_gr(10000L, 1:100)

microbenchmark(
  plyra = r %>% group_by(grps) %>% mutate(n = plyranges::n(), mn = mean(score)),
  dplyr = r %>% as_tibble() %>% group_by(grps) %>% mutate(n = n(), mn = mean(score)),
  times = 5)
## Unit: milliseconds
##   expr        min         lq      mean     median         uq        max neval
##  plyra 607.878177 608.307529 613.92383 609.944290 621.163120 622.326044     5
##  dplyr   3.563556   3.630468   4.09772   4.117753   4.179581   4.997244     5
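One possible workaround, given the gap in the timings above, is to run the grouped computation on a plain tibble holding only the atomic columns that are needed, then assign the results back onto the GRanges. This is only a sketch and not how plyranges does it internally: the names `gr`, `stats`, and `gr_fast` are made up for illustration, and it assumes that a grouped `mutate()` in dplyr keeps rows in their original order, so the new columns line up with the ranges. Since the GRanges itself is never converted, any other metadata columns on it stay untouched.

```r
library(plyranges)
library(dplyr)
library(tibble)

set.seed(1)
N <- 1000L

# Small random GRanges, built the same way as make_rand_gr() in the gist
gr <- data.frame(seqnames = sample(c("seq1", "seq2", "seq3"), N, replace = TRUE),
                 strand = sample(c("+", "-", "*"), N, replace = TRUE),
                 start = rpois(N, N), width = rpois(N, N),
                 grps = sample(1:100, N, replace = TRUE),
                 score = runif(N)) %>% as_granges()

# Grouped statistics computed on a plain tibble of just the needed columns.
# dplyr::n() is namespaced because plyranges also exports n().
stats <- tibble(grps = gr$grps, score = gr$score) %>%
  group_by(grps) %>%
  mutate(n = dplyr::n(), mn = mean(score)) %>%
  ungroup()

# Attach the results to a copy of the GRanges; this relies on the row order
# of 'stats' matching the order of the ranges in 'gr'.
gr_fast <- gr
gr_fast$n <- stats$n
gr_fast$mn <- stats$mn
```

This sidesteps the per-group overhead but gives up the single-pipeline syntax, so it is more of a stopgap than a fix for the underlying issue.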
Edge case: we want to be able to preserve S4 columns, or at least leave this choice to the user:
In current plyranges this works but is slow: