Suppose you have a long-running calculation:
f <- function(x) {
  message("Evaluating slow function")
  Sys.sleep(5) # sleep 5 seconds to simulate a long running time
  x
}
Which is used like so:
f(10)
However, you only want to rerun f() sometimes (say, when an upstream data source changes). I usually do something like this:
run.cached <- function(expr, filename, regenerate=FALSE) {
  if ( file.exists(filename) && !regenerate ) {
    res <- readRDS(filename)
  } else {
    res <- eval.parent(substitute(expr))
    saveRDS(res, file=filename)
  }
  res
}
This is a simple caching function: it tries to load the .rds file indicated by filename if it exists; otherwise it evaluates expr and saves the result to filename. If you specify regenerate=TRUE, it always reruns the expression.
So you can do this:
run.cached(f(5), 'mycache.rds') # runs the slow function
run.cached(f(5), 'mycache.rds') # won't run, returns cached result
run.cached(f(10), 'mycache.rds', regenerate=TRUE) # runs the slow function
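The "upstream data source changes" case can also be handled automatically rather than by passing regenerate=TRUE by hand. A minimal sketch of that idea (run.cached2 and the depends argument are my own additions, not part of the original function): the cache is considered stale whenever any listed upstream file is newer than the cached .rds.

```r
# Variant of run.cached that invalidates the cache when any file in
# 'depends' has been modified more recently than the cache file.
run.cached2 <- function(expr, filename, depends=character(0)) {
  stale <- !file.exists(filename) ||
    any(file.mtime(depends) > file.mtime(filename))
  if (stale) {
    res <- eval.parent(substitute(expr))
    saveRDS(res, file=filename)
  } else {
    res <- readRDS(filename)
  }
  res
}
```

So run.cached2(f(5), 'mycache.rds', depends='input.csv') would rerun f() whenever input.csv is touched.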
When I want to make sure everything works correctly for the final published version, I delete the .rds files, which forces everything to be recalculated.
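That deletion step can be scripted too. A small helper along these lines (my own addition; it assumes all the caches live together in one directory and that no other .rds files are kept there):

```r
# Force recalculation by deleting every cached .rds file in a directory.
clear.cache <- function(dir=".") {
  caches <- list.files(dir, pattern="\\.rds$", full.names=TRUE)
  unlink(caches)
  invisible(caches) # return the files that were removed
}
```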
There are a variety of packages on CRAN that do this already, apparently: R.cache, SOAR, and (for Sweave) cacheSweave. These may be more robust!
knitr has a caching option as well, which has worked well for me so far. It seems to do some pretty clever wizardry to tell if any recalculating is needed.
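With knitr no wrapper function is needed at all: caching is a per-chunk option, and knitr reruns a chunk when its code changes. A minimal sketch of such a chunk in an R Markdown document (the chunk label "slow" is just an example name):

````
```{r slow, cache=TRUE}
f(10)
```
````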