Under the section *Vectorise* (and also briefly mentioned under the section *Do as little as possible*), one point I think would be nice to have is awareness of the data structures that the vectorised functions are implemented for. Using vectorised code without understanding that is a form of "premature optimisation" as well, IMHO.
For example, consider the case of `rowSums` on a `data.frame`. Some issues to consider here are:
- Memory - using `rowSums` on a `data.frame` will coerce it into a `matrix` first. Imagine a huge (> 1 GB) `data.frame`: the conversion might turn out to be a bad idea if the extra copy drains memory and starts swapping.
Note: I personally think any discussion of performance should weigh the trade-off between speed and memory.
- Data structure - we can do much better, in both speed and memory, by taking advantage of the data structure here. Here's an example:
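One way to sketch the idea (my own illustration, assuming all columns are numeric and `NA` handling is not needed): a `data.frame` is stored as a list of columns, so we can add the columns pairwise with `Reduce()` and never materialise the intermediate `matrix` that `rowSums` builds.

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# rowSums() coerces the whole data.frame to a matrix internally,
# duplicating all the data before summing.
via_rowsums <- rowSums(df)

# Reduce() walks the list of columns and adds them pairwise,
# so the peak extra allocation is one vector of length nrow(df),
# not a full matrix copy.
via_reduce <- Reduce(`+`, df)

all.equal(unname(via_rowsums), via_reduce)
```

For a wide, memory-hungry `data.frame`, the column-wise version keeps the working set to roughly one row-length vector instead of a duplicate of the entire table.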