Skip to content

Instantly share code, notes, and snippets.

@jeroenjanssens
Last active September 25, 2020 14:33
Show Gist options
  • Save jeroenjanssens/3e27381a799e4449c9bd to your computer and use it in GitHub Desktop.
Save jeroenjanssens/3e27381a799e4449c9bd to your computer and use it in GitHub Desktop.
Cache the result of an expression in R
#' Cache the result of an expression.
#'
#' Use \code{options(cache.path = "...")} to change the cache directory (which
#' is the current working directory by default).
#'
#' @param expr expression to evaluate
#' @param key basename for cache file
#' @param ignore_cache evalute expression regardless of cache file?
#' @return result of expression or read from cache file
#'
#' @example
#' answer <- cache({
#' \dontrun{
#' Sys.sleep(7500000 * 365 * 86400)
#' }
#' 42
#' }, "life_universe_everything")
#'
#' @seealso \code{\link[R.cache]}
cache <- function(expr, key, ignore_cache = FALSE) {
filename <- file.path(getOption("cache.path", "."), paste0(key, ".rds"))
if (!ignore_cache && file.exists(filename)) {
message(sprintf("Loading result from %s", filename))
result <- readRDS(filename)
} else {
result <- expr
message(sprintf("Saving result to %s", filename))
saveRDS(result, filename)
}
result
}
@hadley
Copy link

hadley commented Oct 29, 2015

You could also make a little terser by doing:

cache <- function(expr, key, ignore_cache = FALSE) {
  filename <- file.path(getOption("cache.path", "."), paste0(key, ".rds"))
  if (!ignore_cache && file.exists(filename)) {
    message(sprintf("Loading result from %s", filename))
    readRDS(filename)
  } else {
    message(sprintf("Saving result to %s", filename))
    saveRDS(expr, filename)
    expr
  }
}

@hadley
Copy link

hadley commented Oct 29, 2015

I'd also wonder about making the cache key the first argument, since it will usually be short and the expr will usually be long

@hadley
Copy link

hadley commented Oct 29, 2015

Another approach is to use an infix function:

`%<cache-%` <- function(key, value) {
  key <- substitute(key)
  stopifnot(is.name(key))

  filename <- file.path(getOption("cache.path", "."), paste0(deparse(key), ".rds"))
  if (file.exists(filename)) {
    message(sprintf("Loading result from %s", filename))
    value <- readRDS(filename)
  } else {
    message(sprintf("Saving result to %s", filename))
    saveRDS(value, filename)
  }

  assign(as.character(key), value, env = parent.frame())
  invisible(key)
}

a %<cache-% {
  Sys.sleep(1)
  10
}

But then you can't pass in any extra arguments

@jeroenjanssens
Copy link
Author

Thanks @hadley. I really like the infix approach. I can come up with all sorts of arguments to a cache function, but I think the most important one is the ability to re-evaluate the expression regardless whether a cache file exists. One possible solution is to define an additional infix function:

.cache <- function(key, value, ignore_cache = FALSE) {
  stopifnot(is.name(key))
  filename <- file.path(getOption("cache.path", "."),
                        paste0(deparse(key), ".rds"))
  if (!ignore_cache && file.exists(filename)) {
    message("Loading result from ", filename)
    value <- readRDS(filename)
  } else {
    message("Saving result to ", filename)
    saveRDS(value, filename)
  }

  assign(as.character(key), value, env = parent.frame())
  invisible(key)
}

`%<cache-%` <- function(key, value) .cache(substitute(key), value)
`%<cache!-%` <- function(key, value) .cache(substitute(key), value, TRUE)

Of course, encoding any additional arguments into function names quickly gets messy, but this I can see working.

@jamesonquinn
Copy link

If you want a way to force recalculation, I think an option is the best way:

`%<cache-%` <- function(key, value) {
  key <- substitute(key)
  stopifnot(is.name(key))

  filename <- file.path(getOption("cache.path", "."), paste0(deparse(key), ".rds"))
  if (file.exists(filename) & !getOption("refresh.cache", F)) {
    message(sprintf("Loading result from %s", filename))
    value <- readRDS(filename)
  } else {
    message(sprintf("Saving result to %s", filename))
    saveRDS(value, filename)
  }

  assign(as.character(key), value, env = parent.frame())
  invisible(key)
}

a %<cache-% {
  Sys.sleep(1)
  10
}

@jeroenjanssens
Copy link
Author

Thanks @jamesonquinn, that makes sense. This allows you to refresh the cache in an interactive way, rather than changing the code.

@kirel
Copy link

kirel commented May 23, 2018

You could even invalidate the cache automatically when the parse tree of the cached expression changes.

`%<cache-%` <- function(key, value) {
  key <- substitute(key)
  stopifnot(is.name(key))
  previoushash = "none"
  hash <- digest::digest(substitute(value))

  filename <- file.path(getOption("cache.path", "."), paste0(deparse(key), ".rds"))
  hashfilename <- file.path(getOption("cache.path", "."), paste0(deparse(key), ".hash"))
  if (file.exists(hashfilename)) {
    previoushash <- readRDS(hashfilename)
  }
  if (file.exists(filename) & (hash == previoushash) & !getOption("refresh.cache", F)) {
    message(sprintf("Loading result from %s", filename))
    value <- readRDS(filename)
  } else {
    message(sprintf("Saving result to %s", filename))
    saveRDS(value, filename)
    saveRDS(hash, hashfilename)
  }

  assign(as.character(key), value, env = parent.frame())
  invisible(key)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment