Last active
December 24, 2023 17:33
Tests with growing vectors in a loop in R
# The code below demonstrates that in R, growing a vector in a loop can be fast,
# as long as there is only one reference to the object. When there's only one
# reference to the vector, R grows it in place (in most cases). However, if
# there are other references to the object, R must make a copy of the object
# instead of growing it in place, leading to slower performance.
# =========================================================================
# Timing tests
# =========================================================================
# Growing a vector in a loop in R is pretty fast.
gc()
system.time({
  x <- list()
  for (i in 1:50000) {
    x[[i]] <- i
  }
})
#>    user  system elapsed
#>   0.014   0.000   0.014
# However, if there's another reference to the underlying object, then R is
# forced to make a copy of the vector each time you grow it, making it much
# slower. In this example, we create another reference, y, in each iteration.
# The result is about 1300x slower.
gc()
system.time({
  x <- list()
  for (i in 1:50000) {
    y <- x
    x[[i]] <- i
  }
})
#>    user  system elapsed
#>  16.532   1.564  18.089
# You might think that the mere assignment to y in each iteration is what makes
# it slow. But if we keep that line and add another line, rm(y), that speeds
# things up about 45x (18.089 s vs. 0.401 s elapsed). So it's not the
# assignment to y that causes slowness, it's that the y binding exists when x
# is modified.
gc()
system.time({
  x <- list()
  for (i in 1:5e4) {
    y <- x
    rm(y)
    x[[i]] <- i
  }
})
#>    user  system elapsed
#>   0.395   0.005   0.401
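# A rough sketch for re-running the three timing tests above at a smaller size,
# so that the slow case finishes quickly. time_growth() and its `variant`
# argument are hypothetical names introduced here for illustration; they are
# not part of the original tests. No output is shown because absolute times
# vary by machine and R version, and very short runs may report 0.
time_growth <- function(n, variant = c("plain", "extra_ref", "extra_ref_rm")) {
  variant <- match.arg(variant)
  gc()
  timing <- system.time({
    x <- list()
    for (i in seq_len(n)) {
      if (variant != "plain") y <- x        # Make an additional reference
      if (variant == "extra_ref_rm") rm(y)  # Drop it again before growing x
      x[[i]] <- i
    }
  })
  stopifnot(length(x) == n)
  timing[["elapsed"]]
}

# Elapsed seconds for each variant, named by variant
elapsed <- vapply(
  c("plain", "extra_ref", "extra_ref_rm"),
  function(v) time_growth(5000, v),
  numeric(1)
)
print(elapsed)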
# =========================================================================
# Use tracemem to print out a message each time x is copied
# =========================================================================
# Growing a vector in a loop: In each iteration, we add an additional
# reference to the underlying object. When there are multiple references to the
# underlying object, assigning past the end of the vector forces a copy to be
# made. Note that tracemem() causes a message to be printed when the underlying
# object is copied.
x <- list()
for (i in 1:4) {
  tracemem(x)
  cat(i, "\n")
  y <- x  # Make an additional reference to the list
  x[[i]] <- i
}
#> 1
#> tracemem[0x10c928700 -> 0x105780038]:
#> 2
#> tracemem[0x1342a92a8 -> 0x1342a93c0]:
#> 3
#> tracemem[0x135a79c08 -> 0x135a79d08]:
#> 4
#> tracemem[0x1371211a8 -> 0x1371211f8]:
# Without the additional reference (y), R does not make a copy when growing
# the vector in a loop. tracemem() prints nothing because no copies are made.
x <- list()
for (i in 1:4) {
  tracemem(x)
  cat(i, "\n")
  x[[i]] <- i
}
#> 1
#> 2
#> 3
#> 4
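# The rm(y) variant from the timing tests can be traced the same way. This is
# a sketch rather than one of the original tests, and its output is not shown:
# run it to see whether tracemem() reports any copies on your R version.
# (tracemem() is only available when R is built with memory profiling.)
x <- list()
for (i in 1:4) {
  tracemem(x)
  cat(i, "\n")
  y <- x  # Make an additional reference to the list
  rm(y)   # Remove the reference again before x is modified
  x[[i]] <- i
}
untracemem(x)  # Stop tracing x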