@wch
Last active August 29, 2015 13:56
String collection speed tests
========================================================
Source for this document at https://gist.github.com/wch/9233873
What's the fastest way to collect strings together in R and put them into a single output string? If all the strings are available at once, a single call such as `paste0('string1', 'string2')` is probably the fastest. In many cases this isn't possible, and you need to collect the strings together as you go.
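For reference, here is a minimal sketch of the all-at-once case. The variable names (`parts`, `a`, `b`) are made up for this illustration.

```r
# All pieces are known up front, so one call is enough.
parts <- c("string1", "string2", "string3")

# paste0() concatenates its arguments with no separator.
a <- paste0("string1", "string2", "string3")

# paste() with collapse = "" joins the elements of a single vector.
b <- paste(parts, collapse = "")

identical(a, b)
```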
This document contains benchmarks for different ways of collecting strings together. Some highlights:
* `textConnection` is super slow.
* Writing to an anonymous file, with `file(open = "w+")`, is much faster.
* Collecting the results in a character vector is even faster, if you're smart about allocating space for the vector.
Some setup code for the benchmarks:
```{r, tidy = FALSE}
# Number of iterations
count <- 20000
# Some text to output
txt <- paste(rep("a", 100), collapse = "")
# The expected output
expected <- paste(rep(txt, count), collapse = "")
assert <- function(val) {
  if (!val) stop("Assertion failed")
}
```
## Naive string concatenation
This starts with an empty character vector and grows it by one element on each iteration.
```{r, tidy = FALSE}
system.time({
  res <- character()
  for (i in 1:count) res[i] <- txt
  out <- paste(res, collapse = "")
  assert(identical(out, expected))
})
```
## String concatenation, with vector preallocated
This allocates the full character vector up front and fills it in place. The drawback to this method is that you can't always know the total number of strings ahead of time.
```{r, tidy = FALSE}
system.time({
  res <- character(count)
  for (i in 1:count) res[i] <- txt
  out <- paste(res, collapse = "")
  assert(identical(out, expected))
})
```
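When the exact count is unknown but a safe upper bound is available, one possible workaround is to allocate for the bound and keep only the filled slots. The sketch below is an illustration, not part of the benchmarks: `max_n`, `n_used`, and the stopping rule are hypothetical stand-ins.

```r
# Hypothetical upper bound; the real count is only known after the loop.
max_n <- 1000
res <- character(max_n)
n_used <- 0

for (i in seq_len(max_n)) {
  # Stand-in stopping rule for this sketch: stop after 250 items.
  if (i > 250) break
  n_used <- n_used + 1
  res[n_used] <- "a"
}

# Keep only the filled slots before joining.
out <- paste(res[seq_len(n_used)], collapse = "")
nchar(out)
```

Unused slots of a vector created with `character()` are empty strings, so trimming does not change the joined result here. It just avoids pasting elements that were never filled.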
## Using `textConnection` and `cat`
```{r, tidy = FALSE}
system.time({
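  htmlResult <- NULL
  conn <- textConnection("htmlResult", "w", local = TRUE)
  for (i in 1:count) cat(txt, file = conn)
  close(conn)
  # No newlines are written, so htmlResult holds a single element and the
  # collapse separator never appears in the output.
  out <- paste(htmlResult, collapse = "\n")
  assert(identical(out, expected))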
})
```
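An output `textConnection` stores one vector element per line of output, and an incomplete final line is written when the connection is closed. The small sketch below shows that behavior; the variable name `lines_out` is made up for this example.

```r
lines_out <- NULL
conn <- textConnection("lines_out", "w", local = TRUE)

cat("abc", file = conn)    # no newline yet: the line stays pending
cat("def\n", file = conn)  # the newline completes the line "abcdef"
cat("ghi", file = conn)    # incomplete final line, written on close

close(conn)

# One element per line of output
lines_out
```

Because the benchmark above never writes a newline, all of its output ends up in a single element.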
## With `file` and `cat`
```{r, tidy = FALSE}
system.time({
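  # An empty description with open = "w+" gives an anonymous temporary file
  conn <- file(open = "w+")
  for (i in 1:count) cat(txt, file = conn)
  # Flush buffered output so it can be read back
  flush(conn)
  out <- readLines(conn, warn = FALSE)
  close(conn)
  assert(identical(out, expected))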
})
```
## With `file` and `writeChar`
```{r, tidy = FALSE}
system.time({
  conn <- file(open = "w+b")
  # eos = NULL writes the characters with no terminator after each string
  for (i in 1:count) writeChar(txt, conn, eos = NULL)
  flush(conn)
  out <- readLines(conn, warn = FALSE)
  close(conn)
  assert(identical(out, expected))
})
```
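The `eos = NULL` argument matters because `writeChar` otherwise appends a nul byte after each string. The following sketch writes one string each way and reads the raw bytes back. It assumes, as the benchmark above does, that an anonymous file opened with `"w+b"` can be read from the start after flushing.

```r
conn <- file(open = "w+b")

writeChar("abc", conn, eos = NULL)  # writes the 3 characters only
writeChar("def", conn)              # default eos = "" adds a nul byte

flush(conn)
# Ask for more bytes than were written; readBin returns what is available.
bytes <- readBin(conn, what = "raw", n = 10)
close(conn)

bytes
```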
## `textVector`, implemented with character vector
`textVector` stores items in a preallocated character vector and doubles the vector's length whenever a new item would not fit.
```{r, tidy = FALSE}
# textVector implemented with char vector
textVector <- function(n = 1e2) {
  output <- vector("character", n)
  i <- 0
  add <- function(text) {
    i <<- i + 1
    # Double the capacity when the vector is full
    if (i > n) {
      n <<- 2 * n
      length(output) <<- n
    }
    output[i] <<- text
  }
  extract <- function() {
    paste(output[seq_len(i)], collapse = "")
  }
  list(add = add, extract = extract)
}

system.time({
  tv <- textVector()
  add <- tv$add
  for (i in 1:count) add(txt)
  out <- tv$extract()
  assert(identical(out, expected))
})
```
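To see why doubling keeps the number of resizes small, the sketch below replays only the capacity bookkeeping of the growth rule. The helper `count_resizes` is hypothetical and stores no strings.

```r
# Count how often a doubling buffer is resized while adding n_items items,
# starting from an initial capacity of n.
count_resizes <- function(n_items, n = 100) {
  resizes <- 0
  for (i in seq_len(n_items)) {
    if (i > n) {
      n <- 2 * n
      resizes <- resizes + 1
    }
  }
  c(resizes = resizes, capacity = n)
}

res <- count_resizes(20000)
res
```

With the benchmark's 20000 items and the default starting capacity of 100, the buffer is resized 8 times and ends with a capacity of 25600.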
## `textVector`, implemented with lists
This version, `textVector2`, uses a list that doubles in length whenever a new item would not fit.
```{r, tidy = FALSE}
# textVector implemented with lists
textVector2 <- function(n = 1e2) {
  output <- list()
  length(output) <- n
  i <- 0
  add <- function(text) {
    i <<- i + 1
    # Double the capacity when the list is full
    if (i > n) {
      n <<- 2 * n
      length(output) <<- n
    }
    output[[i]] <<- text
  }
  extract <- function() {
    paste(output[seq_len(i)], collapse = "")
  }
  list(add = add, extract = extract)
}

system.time({
  tv <- textVector2()
  add <- tv$add
  for (i in 1:count) add(txt)
  out <- tv$extract()
  assert(identical(out, expected))
})
```
## Session information
```{r, tidy = FALSE}
sessionInfo()
```