String collection speed tests
========================================================

Source for this document at https://gist.github.com/wch/9233873

What's the fastest way to collect strings together in R and put them into a single output string? Probably the fastest way is to simply use `paste0('string1', 'string2')`, and so on -- but this assumes that you have all the strings collected and ready at one time. In many cases, this isn't possible, and you need to collect the strings together as you go.
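
As a small illustration of the all-at-once case (the strings here are placeholders, not part of the benchmarks below):

```{r, tidy = FALSE}
# All pieces are available up front, so a single call does the job
paste0("string1", "string2")

# If the pieces are already in a character vector, collapse them instead
pieces <- c("string1", "string2", "string3")
paste(pieces, collapse = "")
```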

This document contains benchmarks for different ways of collecting strings together. Some highlights:

* `textConnection` is super slow.
* Writing to an anonymous file with `file(open = "w+")` is much faster.
* Collecting the results in a character vector is even faster, if you're smart about allocating space for the vector.


Some setup code for the benchmarks:

```{r, tidy = FALSE}
# Number of iterations
count <- 20000

# Some text to output
txt <- paste(rep("a", 100), collapse = "")

# The expected output
expected <- paste(rep(txt, count), collapse = "")

assert <- function(val) {
  if (!val) stop("Assertion failed")
}
```

## Naive string concatenation

This grows a character vector as it goes along.

```{r, tidy = FALSE}
system.time({
  res <- character()
  for (i in 1:count) res[i] <- txt
  out <- paste(res, collapse = "")
  assert(identical(out, expected))
})
```

## String concatenation, with vector preallocated

Here the vector is preallocated to its final length with `character(count)`. The drawback to this method is that you can't always know the total number of strings ahead of time.

```{r, tidy = FALSE}
system.time({
  res <- character(count)
  for (i in 1:count) res[i] <- txt
  out <- paste(res, collapse = "")
  assert(identical(out, expected))
})
```

## Using `textConnection` and `cat`

```{r, tidy = FALSE}
system.time({
  htmlResult <- NULL
  conn <- textConnection("htmlResult", "w", local = TRUE)
  for (i in 1:count) cat(txt, file = conn)
  close(conn)
  out <- paste(htmlResult, collapse = "\n")
  assert(identical(out, expected))
})
```
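
One detail worth spelling out: an output `textConnection` stores its text as a character vector with one element per line, and `close()` commits a final incomplete line. Because the benchmark never writes a newline, `htmlResult` should end up as a single element, so the `collapse` argument has nothing to join. A minimal sketch of that behaviour (the helper `collect_lines` is made up for this illustration):

```{r, tidy = FALSE}
# Hypothetical helper: write each piece to a text connection and
# return the character vector that the connection built up.
collect_lines <- function(pieces) {
  result <- NULL
  tc <- textConnection("result", "w", local = TRUE)
  for (p in pieces) cat(p, file = tc)
  close(tc)  # commits the last line even if it has no trailing newline
  result
}

# No newlines written: everything lands in one element
collect_lines(c("a", "b", "c"))

# Newlines written: one element per line
collect_lines(c("a\n", "b\n", "c"))
```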


## With `file` and `cat`

```{r, tidy = FALSE}
system.time({
  conn <- file(open = "w+")
  for (i in 1:count) cat(txt, file = conn)
  flush(conn)
  out <- readLines(conn, warn = FALSE)
  close(conn)
  assert(identical(out, expected))
})
```
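
`file()` called without a description opens an anonymous file connection that can be both written to and read from. `readLines` returns one element per line, so the benchmark gets a single string back only because `txt` contains no newlines. A small sketch of both cases (the helper name `via_anon_file` is hypothetical):

```{r, tidy = FALSE}
# Hypothetical helper: write pieces to an anonymous file, then read
# the contents back as lines.
via_anon_file <- function(pieces) {
  fc <- file(open = "w+")
  on.exit(close(fc))
  for (p in pieces) cat(p, file = fc)
  flush(fc)
  readLines(fc, warn = FALSE)
}

# No newlines: a single element comes back
via_anon_file(c("abc", "def"))

# With a newline in the data: one element per line
via_anon_file(c("one\n", "two"))
```

If the collected text can contain newlines, the lines would need to be rejoined with `paste(..., collapse = "\n")` to recover a single string.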

## With `file` and `writeChar`

```{r, tidy = FALSE}
system.time({
  conn <- file(open = "w+b")
  for (i in 1:count) writeChar(txt, conn, eos = NULL)
  flush(conn)
  out <- readLines(conn, warn = FALSE)
  close(conn)
  assert(identical(out, expected))
})
```
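
The `eos = NULL` argument matters here: by default `writeChar` appends a nul terminator after each string, and `eos = NULL` suppresses it so the pieces sit back to back. A minimal sketch that makes the difference visible by writing to an in-memory `rawConnection` instead of a file (a substitution made only for this illustration; `bytes_written` is a made-up helper):

```{r, tidy = FALSE}
# Hypothetical helper: write two strings with the given `eos` setting
# and return the raw bytes that ended up in the connection.
bytes_written <- function(eos) {
  rc <- rawConnection(raw(0), "wb")
  on.exit(close(rc))
  writeChar("ab", rc, eos = eos)
  writeChar("cd", rc, eos = eos)
  rawConnectionValue(rc)
}

bytes_written(NULL)  # only the bytes of "abcd"
bytes_written("")    # each string is followed by a 00 byte
```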


## textVector, implemented with character vector

`textVector` stores its items in a character vector and doubles the vector's length whenever adding a new item would exceed the current length.

```{r, tidy = FALSE}
# textVector implemented with char vector
textVector <- function(n = 1e2) {
  output <- vector("character", n)
  i <- 0

  add <- function(text) {
    i <<- i + 1
    if (i > n) {
      n <<- 2 * n
      length(output) <<- n
    }
    output[i] <<- text
  }
  extract <- function() {
    paste(output[seq_len(i)], collapse = "")
  }

  list(add = add, extract = extract)
}

system.time({
  tv <- textVector()
  add <- tv$add
  for (i in 1:count) add(txt)
  out <- tv$extract()
  assert(identical(out, expected))
})
```
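
Doubling keeps the number of reallocations small: the vector is resized only when it is full, so the number of resizes grows with the logarithm of the number of items rather than linearly. A quick sketch of the arithmetic for this benchmark, starting at the default capacity of 100 and adding 20000 items (the variable names are made up for the illustration):

```{r, tidy = FALSE}
capacity <- 100    # default `n` in textVector()
n_items <- 20000   # same value as `count` in the setup code
resizes <- 0

# Double the capacity until all items fit, counting each resize
while (capacity < n_items) {
  capacity <- 2 * capacity
  resizes <- resizes + 1
}

resizes   # 8
capacity  # 25600
```

The final vector has unused slots past the last item, which is why `extract` subsets with `seq_len(i)` before pasting.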


## textVector, implemented with lists

`textVector2` is a version of `textVector` that stores its items in a list instead of a character vector. The list doubles in length whenever adding a new item would exceed its current length.

```{r, tidy = FALSE}
# textVector implemented with lists
textVector2 <- function(n = 1e2) {
  output <- list()
  length(output) <- n
  i <- 0

  add <- function(text) {
    i <<- i + 1
    if (i > n) {
      n <<- 2 * n
      length(output) <<- n
    }
    output[[i]] <<- text
  }
  extract <- function() {
    paste(output[seq_len(i)], collapse = "")
  }

  list(add = add, extract = extract)
}

system.time({
  tv <- textVector2()
  add <- tv$add
  for (i in 1:count) add(txt)
  out <- tv$extract()
  assert(identical(out, expected))
})
```
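
`paste` coerces a list with `as.character`, which is why `extract` can stay unchanged in the list version. Two things are worth knowing about that coercion, sketched below with made-up data: unused slots are `NULL` and would turn into the string "NULL", and an element holding more than one string is deparsed rather than concatenated.

```{r, tidy = FALSE}
# A list with four slots, only the first two filled
slots <- vector("list", 4)
slots[[1]] <- "a"
slots[[2]] <- "b"

# Subsetting to the filled slots gives the intended result
filled <- paste(slots[seq_len(2)], collapse = "")
filled

# Pasting the whole list turns each empty slot into "NULL"
unfilled <- paste(slots, collapse = "")
unfilled

# A multi-string element is deparsed, not concatenated
deparsed <- as.character(list(c("x", "y")))
deparsed

# Flattening with unlist() first avoids the deparsing
flattened <- paste(unlist(list(c("x", "y"), "z")), collapse = "")
flattened
```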


## Session information

```{r, tidy = FALSE}
sessionInfo()
```