Suppose you have a long-running calculation:
f <- function(x) {
  message("Evaluating slow function")
  Sys.sleep(5) # sleep 5 seconds to simulate a long running time
  x
}
Which is used like so:
f(10)
However, you only want to rerun f() sometimes (say, when an upstream data source changes). I usually do something like this:
run.cached <- function(expr, filename, regenerate=FALSE) {
  if ( file.exists(filename) && !regenerate ) {
    res <- readRDS(filename)
  } else {
    res <- eval.parent(substitute(expr))
    saveRDS(res, file=filename)
  }
  res
}
This is a simple caching function: it tries to load the .rds file indicated by filename if it exists; otherwise it evaluates expr and saves the result to filename. If you specify regenerate=TRUE, it always reruns the expression.
So you can do this:
run.cached(f(5), 'mycache.rds') # runs the slow function
run.cached(f(5), 'mycache.rds') # won't run, returns cached result
run.cached(f(10), 'mycache.rds', regenerate=TRUE) # runs the slow function
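The "upstream data source changes" case can also be handled automatically rather than by passing regenerate=TRUE by hand. A minimal sketch of that idea (run.cached2 and the depends argument are my own additions, not part of the original function): the cache is considered stale whenever any listed upstream file is newer than the cached .rds.

```r
# Variant of run.cached that invalidates the cache when any file in
# 'depends' has been modified more recently than the cache file.
run.cached2 <- function(expr, filename, depends=character(0)) {
  stale <- !file.exists(filename) ||
    any(file.mtime(depends) > file.mtime(filename))
  if (stale) {
    res <- eval.parent(substitute(expr))
    saveRDS(res, file=filename)
  } else {
    res <- readRDS(filename)
  }
  res
}
```

So run.cached2(f(5), 'mycache.rds', depends='input.csv') would rerun f() whenever input.csv is touched.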
When I want to make sure everything works correctly for the final published version, I delete the .rds files, which forces everything to be recalculated.
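That deletion step can be scripted too. A small helper along these lines (my own addition; it assumes all the caches live together in one directory and that no other .rds files are kept there):

```r
# Force recalculation by deleting every cached .rds file in a directory.
clear.cache <- function(dir=".") {
  caches <- list.files(dir, pattern="\\.rds$", full.names=TRUE)
  unlink(caches)
  invisible(caches) # return the files that were removed
}
```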
There are a variety of packages on CRAN that do this already, apparently: R.cache, SOAR, and (for Sweave) cacheSweave. These may be more robust!
knitr has a caching option as well, which has worked well for me so far. It seems to do some pretty clever wizardry to tell if any recalculating is needed.
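With knitr no wrapper function is needed at all: caching is a per-chunk option, and knitr reruns a chunk when its code changes. A minimal sketch of such a chunk in an R Markdown document (the chunk label "slow" is just an example name):

````
```{r slow, cache=TRUE}
f(10)
```
````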