microbenchmarking various ways of creating columns depending on values in another column of a data.table
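# Requires the microbenchmark, data.table and dplyr packages.
library(microbenchmark)

set.seed(42) # added for reproducibility; any seed works
nrows <- 10^5
tst1 <- data.table::data.table(
  char = c("bb", "ds", "ok", "pb")[sample(1:4, nrows, replace = TRUE)]
)
# One independent copy per method. `:=` modifies by reference, so a chained
# assignment (tst2 <- tst3 <- ... <- copy(tst1)) would bind every name to
# the same table.
tst2 <- data.table::copy(tst1)
tst3 <- data.table::copy(tst1)
tst4 <- data.table::copy(tst1)
tst5 <- data.table::copy(tst1)
tst6 <- data.table::copy(tst1)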
microbenchmark(
  times = 200L,
  "base::ifelse" = tst1[, new := base::ifelse(char == "bb", "level1",
    base::ifelse(char == "ds", "level2",
      base::ifelse(char == "ok", "level3", "level4")
    )
  )],
  "dplyr::if_else" = tst3[, new := dplyr::if_else(char == "bb", "level1",
    dplyr::if_else(char == "ds", "level2",
      dplyr::if_else(char == "ok", "level3", "level4")
    )
  )],
  "data.table::fifelse" = tst2[, new := data.table::fifelse(char == "bb", "level1",
    data.table::fifelse(char == "ds", "level2",
      data.table::fifelse(char == "ok", "level3", "level4")
    )
  )],
  "data.table::fcase" = tst4[, new := data.table::fcase(
    char == "bb", "level1",
    char == "ds", "level2",
    char == "ok", "level3",
    default = "level4"
  )],
  "base::match" = tst5[, new := c("level1", "level2", "level3", "level4")[base::match(char, c("bb", "ds", "ok", "pb"))]],
  "data.table::chmatch" = tst6[, new := c("level1", "level2", "level3", "level4")[data.table::chmatch(char, c("bb", "ds", "ok", "pb"))]]
)
# base::match is the fastest for both small (10^2) and large (10^7) columns.
# data.table::fcase is the most readable and accepts arbitrary expressions/functions
# as conditions, which match does not (though this may depend on what is put in the
# vector of values).
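
The trade-off noted in the comment above can be illustrated with a small sketch. It assumes data.table is installed; the column `n` and the labels in `size` are hypothetical and only serve the illustration. For a pure value-to-value recoding, a `match()` lookup and `fcase()` give the same result, but only `fcase()` takes conditions that are not exact equality on a single column.

```r
library(data.table)

# Toy table; `n` is a hypothetical numeric column added for illustration.
dt <- data.table(
  char = c("bb", "ds", "ok", "pb", "bb"),
  n    = c(1, 5, 10, 50, 100)
)

# Pure value-to-value recoding: a match() lookup and fcase() agree.
keys   <- c("bb", "ds", "ok", "pb")
values <- c("level1", "level2", "level3", "level4")
dt[, by_match := values[match(char, keys)]]
dt[, by_fcase := fcase(
  char == "bb", "level1",
  char == "ds", "level2",
  char == "ok", "level3",
  default = "level4"
)]
stopifnot(identical(dt$by_match, dt$by_fcase))

# Conditions that are not exact equality on one column: straightforward
# with fcase(), not expressible as a single match() lookup.
dt[, size := fcase(
  n < 5 & char == "bb", "small bb",
  n < 50,               "medium",
  default = "large"
)]
```

One behavioural difference to keep in mind: `match()` returns `NA` for values absent from the lookup vector, whereas the `default` of `fcase()` catches every row not covered by an earlier condition.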
@AlbanSagouis

Unit: milliseconds

               expr     min       lq       mean    median        uq      max neval
       base::ifelse 85.2767 98.56945 114.245198 106.44735 119.03000 285.4805   200
     dplyr::if_else 18.0696 25.30790  33.638454  29.59110  33.40365 120.1349   200
data.table::fifelse  5.3107  6.37775   8.843653   7.60870   9.66705  39.8622   200
  data.table::fcase  4.4215  5.64410   7.818536   6.60635   8.13225  27.8380   200
        base::match  1.7898  2.22105   3.871345   2.65265   3.79105  42.6920   200
data.table::chmatch  1.9193  2.22685   3.839901   2.80290   3.56775  92.9833   200
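
To make the gaps easier to read, the medians from the table above can be expressed relative to the fastest method. This is only a sketch that re-enters the reported numbers by hand (the vector name `median_ms` is an assumption); it does not rerun the benchmark.

```r
# Median timings in milliseconds, copied from the table above
# (nrows = 10^5, 200 evaluations each).
median_ms <- c(
  "base::ifelse"        = 106.44735,
  "dplyr::if_else"      = 29.59110,
  "data.table::fifelse" = 7.60870,
  "data.table::fcase"   = 6.60635,
  "base::match"         = 2.65265,
  "data.table::chmatch" = 2.80290
)

# Slowdown of each method relative to the fastest median, fastest first.
relative <- sort(median_ms / min(median_ms))
print(round(relative, 1))
```

On these figures, nested `base::ifelse` is roughly 40 times slower than `base::match` at the median, while `data.table::fcase` is about 2.5 times slower, which is the price paid for its readability in this run.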
