Last active
August 29, 2015 13:56
R string collection speed tests
String collection speed tests
========================================================

Source for this document at https://gist.github.com/wch/9233873

What's the fastest way to collect strings in R and combine them into a single output string? If all of the strings are available at once, a single call such as `paste0('string1', 'string2')` is probably the fastest option. In many cases that isn't possible, and you need to collect the strings as you go.

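As a small illustration of the two situations (the `pieces` vector and the variable names are made up for this sketch and are not part of the benchmarks): when every piece is already in hand, one vectorised call builds the output, while pieces that arrive one at a time need some kind of accumulator.

```r
# All pieces known up front: one vectorised call does the whole job.
pieces <- c("<p>", "hello", "</p>")
all_at_once <- paste(pieces, collapse = "")

# Pieces arriving one at a time (simulated here with a loop): the output
# has to be accumulated somewhere between iterations.
incremental <- ""
for (p in pieces) incremental <- paste0(incremental, p)

stopifnot(identical(all_at_once, incremental))
```

The rest of this document compares different choices for that accumulator.
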
This document contains benchmarks for different ways of collecting strings together. Some highlights:

* `textConnection` is very slow.
* Writing to an anonymous file, opened with `file(open = "w+")`, is much faster.
* Collecting the results in a character vector is faster still, if you're smart about allocating space for the vector.

Some setup code for the benchmarks:

```{r, tidy = FALSE}
# Number of iterations
count <- 20000

# Some text to output
txt <- paste(rep("a", 100), collapse = "")

# The expected output
expected <- paste(rep(txt, count), collapse = "")

assert <- function(val) {
  if (!val) stop("Assertion failed")
}
```
## Naive string concatenation

This starts with an empty character vector and grows it by one element on every iteration.

```{r, tidy = FALSE}
system.time({
  res <- character()
  for (i in 1:count) res[i] <- txt
  out <- paste(res, collapse = "")
  assert(identical(out, expected))
})
```
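Growing a vector can be written in more than one way. The sketch below puts the index-assignment form used above next to the `c()` form that is also common in R code. Both helper names are hypothetical and exist only for this comparison. Timings depend on the machine and the R version, so none are claimed here.

```r
# Hypothetical helpers for illustration; neither is part of the benchmarks.

# Grow by assigning one past the current end, as in the benchmark above.
grow_by_index <- function(n, piece) {
  res <- character()
  for (i in seq_len(n)) res[i] <- piece
  paste(res, collapse = "")
}

# Grow by building a new vector with c() on every iteration.
grow_by_c <- function(n, piece) {
  res <- character()
  for (i in seq_len(n)) res <- c(res, piece)
  paste(res, collapse = "")
}

small <- 1000
stopifnot(identical(grow_by_index(small, "ab"), grow_by_c(small, "ab")))

# Compare the two on your own machine.
system.time(grow_by_index(small, "ab"))
system.time(grow_by_c(small, "ab"))
```
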
## String concatenation, with vector preallocated

This allocates a character vector of the final length up front and fills it in place. The drawback to this method is that you can't always know the total number of strings ahead of time.

```{r, tidy = FALSE}
system.time({
  res <- character(count)
  for (i in 1:count) res[i] <- txt
  out <- paste(res, collapse = "")
  assert(identical(out, expected))
})
```
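When the number of pieces is known and each piece can be computed from its index, the explicit loop can also be written with `vapply()`. This is only a sketch: `make_piece` is a made-up stand-in for whatever produces each string, and no claim is made about how it compares in speed with the loop above.

```r
# Hypothetical generator standing in for whatever produces each string.
make_piece <- function(i) sprintf("<li>%d</li>", i)

n <- 5

# vapply() builds a character vector of length n and checks that every
# call returns exactly one string.
pieces <- vapply(seq_len(n), make_piece, character(1))
out <- paste(pieces, collapse = "")
```
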
## Using `textConnection` and `cat`

```{r, tidy = FALSE}
system.time({
  htmlResult <- NULL
  conn <- textConnection("htmlResult", "w", local = TRUE)
  for (i in 1:count) cat(txt, file = conn)
  close(conn)
  # No newlines were written, so htmlResult holds a single line
  out <- paste(htmlResult, collapse = "\n")
  assert(identical(out, expected))
})
```
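The `collapse = "\n"` above only matters when the written text contains newlines, because an output text connection stores one element per line. A minimal sketch of that behaviour, using a hypothetical helper `collect_lines` that is not part of the benchmarks:

```r
# Hypothetical helper: write each piece with cat() and return what the
# text connection stored.
collect_lines <- function(pieces) {
  result <- NULL
  conn <- textConnection("result", "w", local = TRUE)
  for (p in pieces) cat(p, file = conn)
  # Closing the connection flushes any incomplete final line into `result`.
  close(conn)
  result
}

no_newlines   <- collect_lines(c("ab", "cd"))      # a single element
with_newlines <- collect_lines(c("ab\n", "cd\n"))  # one element per line
```
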
## With `file` and `cat`

```{r, tidy = FALSE}
system.time({
  # file() with no description opens an anonymous temporary file
  conn <- file(open = "w+")
  for (i in 1:count) cat(txt, file = conn)
  flush(conn)
  out <- readLines(conn, warn = FALSE)
  close(conn)
  assert(identical(out, expected))
})
```
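The benchmark above works because the text contains no newlines, so `readLines` returns a single string. For text that may contain newlines, the lines need to be joined again. A sketch, with `collect_with_file` as a hypothetical wrapper name:

```r
# Hypothetical wrapper around the anonymous-file approach.
collect_with_file <- function(pieces) {
  conn <- file(open = "w+")          # anonymous temporary file
  on.exit(close(conn), add = TRUE)   # close it even if something fails
  for (p in pieces) cat(p, file = conn)
  flush(conn)
  # readLines() splits on newlines, so put them back when joining.
  paste(readLines(conn, warn = FALSE), collapse = "\n")
}

joined <- collect_with_file(c("ab", "cd\n", "ef"))
```

Note that a newline at the very end of the text is not restored by this approach, since `readLines` drops line terminators.
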
## With `file` and `writeChar`

```{r, tidy = FALSE}
system.time({
  # Same anonymous file as before, but opened in binary mode
  conn <- file(open = "w+b")
  # eos = NULL: write no terminator after each string
  for (i in 1:count) writeChar(txt, conn, eos = NULL)
  flush(conn)
  out <- readLines(conn, warn = FALSE)
  close(conn)
  assert(identical(out, expected))
})
```
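The `eos = NULL` argument matters here: by default `writeChar` follows each string with a nul terminator, which would end up between the pieces. A small sketch of the difference, passing a raw vector as `con` so that `writeChar` returns the bytes instead of writing them to a file (the variable names are made up for illustration):

```r
# With a raw vector as `con`, writeChar() returns the bytes it would write.
with_terminator    <- writeChar("ab", raw())              # default eos = ""
without_terminator <- writeChar("ab", raw(), eos = NULL)

length(with_terminator)     # the two characters plus a trailing nul byte
length(without_terminator)  # just the two characters
```
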
## textVector, implemented with character vector

`textVector` stores items in a character vector and doubles the vector's length whenever adding an item would exceed its current length.

```{r, tidy = FALSE}
# textVector implemented with char vector
textVector <- function(n = 1e2) {
  output <- vector("character", n)
  i <- 0

  add <- function(text) {
    i <<- i + 1
    # Out of room: double the capacity
    if (i > n) {
      n <<- 2 * n
      length(output) <<- n
    }
    output[i] <<- text
  }

  extract <- function() {
    # Only the first i slots have been filled
    paste(output[seq_len(i)], collapse = "")
  }

  list(add = add, extract = extract)
}

system.time({
  tv <- textVector()
  add <- tv$add
  for (i in 1:count) add(txt)
  out <- tv$extract()
  assert(identical(out, expected))
})
```
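Doubling keeps the number of resizes small relative to the number of items added. The sketch below counts the resizes for a given number of additions, assuming the same starting length of 100 that `textVector` uses by default. `count_doublings` is a hypothetical helper for illustration only.

```r
# Hypothetical helper: how many times would a buffer that starts with
# `initial` slots have to double before it can hold `total` items?
count_doublings <- function(total, initial = 100) {
  n <- initial
  doublings <- 0
  while (n < total) {
    n <- 2 * n
    doublings <- doublings + 1
  }
  c(doublings = doublings, final_length = n)
}

resizes <- count_doublings(20000)
resizes
```

With the benchmark's 20,000 additions this comes to 8 doublings, ending at a length of 25,600.
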
## textVector, implemented with lists

`textVector2` works the same way, but stores items in a list, which doubles in length whenever adding an item would exceed its current length.

```{r, tidy = FALSE}
# textVector implemented with lists
textVector2 <- function(n = 1e2) {
  output <- list()
  length(output) <- n
  i <- 0

  add <- function(text) {
    i <<- i + 1
    # Out of room: double the capacity
    if (i > n) {
      n <<- 2 * n
      length(output) <<- n
    }
    output[[i]] <<- text
  }

  extract <- function() {
    # Only the first i slots have been filled
    paste(output[seq_len(i)], collapse = "")
  }

  list(add = add, extract = extract)
}

system.time({
  tv <- textVector2()
  add <- tv$add
  for (i in 1:count) add(txt)
  out <- tv$extract()
  assert(identical(out, expected))
})
```
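One difference from the character-vector version is that a list slot can hold a vector of any length, and `paste` treats such a slot differently from a single string. A small sketch of what happens, and of `unlist` as one possible way around it (the `items` list is made up for illustration):

```r
# paste() coerces a list with as.character(), which deparses any element
# that is not a single string.
items <- list("a", c("b", "c"))

deparsed  <- paste(items, collapse = "")          # second element is deparsed
flattened <- paste(unlist(items), collapse = "")  # every string is kept as-is
```

As long as every call to `add` passes a single string, as in the benchmark, the two forms give the same result.
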
## Session information

```{r, tidy = FALSE}
sessionInfo()
```