@wch
Last active December 24, 2023 17:33
Tests with growing vectors in a loop in R
# The code below demonstrates that in R, growing a vector in a loop can be fast,
# as long as there is only one reference to the object. When there's only one
# reference to the vector, R grows it in place (in most cases). However, if
# there are other references to the object, R must make a copy of the object
# instead of growing it in place, leading to slower performance.
# =========================================================================
# Timing tests
# =========================================================================
# Growing a vector in a loop in R is pretty fast.
gc()
system.time({
  x <- list()
  for (i in 1:50000) {
    x[[i]] <- i
  }
})
#>    user  system elapsed
#>   0.014   0.000   0.014
# However, if there's another reference to the underlying object, then R is
# forced to make a copy of the vector each time you grow it, making it much
# slower. In this example, we create another reference, y, in each iteration.
# The result is about 1300x slower.
gc()
system.time({
  x <- list()
  for (i in 1:50000) {
    y <- x
    x[[i]] <- i
  }
})
#>    user  system elapsed
#>  16.532   1.564  18.089
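# As a quick check of the "about 1300x" figure, the ratio can be computed from
# the elapsed times reported above. The two variable names are introduced here
# for illustration only, and the values are copied from the output shown above,
# so they will differ on other machines.
elapsed_single_ref <- 0.014  # elapsed seconds, loop with only x
elapsed_two_refs <- 18.089   # elapsed seconds, loop with y <- x
round(elapsed_two_refs / elapsed_single_ref)
#> [1] 1292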
# You might think that the mere assignment to y in each iteration is what makes
# it slow. But if we keep that line and add another line, rm(y), that speeds
# things up about 45x (0.401 vs. 18.089 seconds elapsed). So it's not the
# assignment to y that causes slowness, it's that the y binding exists when x
# is modified.
gc()
system.time({
  x <- list()
  for (i in 1:5e4) {
    y <- x
    rm(y)
    x[[i]] <- i
  }
})
#>    user  system elapsed
#>   0.395   0.005   0.401
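# A minimal sketch, not part of the original tests, that wraps the three loops
# above in one function so they can be timed side by side. The name grow_list()
# and its "mode" argument are hypothetical, introduced here for illustration.
# It uses a smaller n so the slow case finishes quickly, and it runs the loop
# inside a function rather than at top level, so absolute timings will not
# match the numbers above.
grow_list <- function(n, mode = c("plain", "alias", "alias_rm")) {
  mode <- match.arg(mode)
  x <- list()
  for (i in seq_len(n)) {
    if (mode != "plain") y <- x    # Create a second reference to the list
    if (mode == "alias_rm") rm(y)  # Drop it again before x is modified
    x[[i]] <- i
  }
  x
}

modes <- c("plain", "alias", "alias_rm")
timings <- vapply(
  modes,
  function(m) system.time(grow_list(5000, m))[["elapsed"]],
  numeric(1)
)
print(timings)  # Elapsed seconds per mode; values depend on the machine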
# =========================================================================
# Use tracemem to print out a message each time x is copied
# =========================================================================
# Growing a vector in a loop with a second reference: in each iteration, we add
# an additional reference (y) to the underlying object. When there are multiple
# references to the underlying object, assigning past the end of the vector
# forces a copy to be made. Note that tracemem() causes a message to be printed
# when the underlying object is copied.
x <- list()
for (i in 1:4) {
  tracemem(x)
  cat(i, "\n")
  y <- x  # Make an additional reference to the list
  x[[i]] <- i
}
#> 1
#> tracemem[0x10c928700 -> 0x105780038]:
#> 2
#> tracemem[0x1342a92a8 -> 0x1342a93c0]:
#> 3
#> tracemem[0x135a79c08 -> 0x135a79d08]:
#> 4
#> tracemem[0x1371211a8 -> 0x1371211f8]:
# When there isn't the additional reference (y), R does not make a copy when
# growing the vector in a loop. tracemem() causes nothing to be printed because
# no copies are made.
x <- list()
for (i in 1:4) {
  tracemem(x)
  cat(i, "\n")
  x[[i]] <- i
}
#> 1
#> 2
#> 3
#> 4
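# A possible follow-up check, sketched here as an assumption rather than a
# result from the original tests: the same tracemem() loop with the y <- x;
# rm(y) pattern from the timing section. In versions of R that use reference
# counting (R 4.0.0 and later), one would expect no tracemem lines to appear,
# since y no longer exists when x is modified, but that output was not recorded
# in the original, so none is shown here. tracemem() needs an R build with
# memory profiling, hence the capabilities() guard.
if (capabilities("profmem")) {
  x <- list()
  for (i in 1:4) {
    tracemem(x)
    cat(i, "\n")
    y <- x  # Temporarily create a second reference to the list
    rm(y)   # Remove it before x is modified
    x[[i]] <- i
  }
  untracemem(x)  # Stop tracing x
}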