@strengejacke
Created May 22, 2020 10:29
Performance comparison of table functions in R
See https://twitter.com/malte_grosser/status/1262862749794795520?s=20
``` r
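# table2(): named integer vector of counts, sorted by value
table2 <- function(x) {
  # unique values in sorted order (duplicated() keeps factors as factors)
  x_u <- if (is.factor(x)) sort(x[!duplicated(x)]) else sort(unique(x))
  x_f <- factor(x, levels = as.character(x_u), exclude = NULL)
  # tabulate() counts the integer codes of the factor
  t_x_f <- tabulate(x_f)
  names(t_x_f) <- as.character(x_u)
  t_x_f
}

# table3(): data frame of values and counts, in order of first appearance
table3 <- function(x) {
  x_u <- if (is.factor(x)) x[!duplicated(x)] else unique(x)
  x_f <- factor(x, levels = as.character(x_u), exclude = NULL)
  t_x_f <- tabulate(x_f)
  data.frame(values = x_u, count = t_x_f[seq_along(x_u)])
}
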
set.seed(123)
x <- paste0(letters, sample(500, size = 25000, replace = TRUE))
bench::mark(
  table(x),
  table2(x),
  table3(x),
  check = FALSE,
  iterations = 100
)[1:7]
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 table(x)     80.5ms     90ms      10.6    2.93MB    0.560
#> 2 table2(x)    78.3ms   86.8ms      11.2    1.59MB    0.345
#> 3 table3(x)    80.9ms   94.6ms      10.2    2.96MB    0.652

set.seed(123)
x <- sample(1:100, size = 10000, replace = TRUE)
bench::mark(
  table(x),
  table2(x),
  table3(x),
  check = FALSE,
  iterations = 100
)[1:7]
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 table(x)      680us    903us     1037.     689KB     10.5
#> 2 table2(x)     488us    496us     1862.     409KB     18.8
#> 3 table3(x)     652us    663us     1474.     409KB      0
```
<sup>Created on 2020-05-22 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
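
Because the benchmarks run with `check = FALSE`, `bench::mark()` never compares the three results, so the timings alone do not show that the functions count the same thing. The following is a minimal sketch of such a check, not part of the original gist. It repeats the two function definitions so that it runs on its own; the helper names `ref_counts`, `res2`, and `res3` are illustrative assumptions.

``` r
# Definitions repeated from the gist so this block is self-contained.
table2 <- function(x) {
  x_u <- if (is.factor(x)) sort(x[!duplicated(x)]) else sort(unique(x))
  x_f <- factor(x, levels = as.character(x_u), exclude = NULL)
  t_x_f <- tabulate(x_f)
  names(t_x_f) <- as.character(x_u)
  t_x_f
}
table3 <- function(x) {
  x_u <- if (is.factor(x)) x[!duplicated(x)] else unique(x)
  x_f <- factor(x, levels = as.character(x_u), exclude = NULL)
  t_x_f <- tabulate(x_f)
  data.frame(values = x_u, count = t_x_f[seq_along(x_u)])
}

set.seed(123)
x <- sample(1:100, size = 10000, replace = TRUE)

# Reference: base table(), reduced to a plain named integer vector.
ref <- table(x)
ref_counts <- setNames(as.vector(ref), names(ref))

# table2() returns a named vector that is already sorted by value.
res2 <- table2(x)
stopifnot(identical(names(res2), names(ref_counts)))
stopifnot(all(res2 == ref_counts))

# table3() keeps first-appearance order, so sort by value before comparing.
res3 <- table3(x)
res3 <- res3[order(res3$values), ]
stopifnot(all(res3$count == ref_counts))
```

The check only covers this integer input without missing values. Inputs containing `NA`, or factors with unused levels, may be handled differently by the three functions and would need their own comparison.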