Skip to content

Instantly share code, notes, and snippets.

@akorn

akorn/runcached

Last active May 8, 2021
Embed
What would you like to do?
Run speficied (presumably expensive) command with specified arguments and cache result. If cache is fresh enough, don't run command again but return cached output.
#!/bin/zsh
#
# Purpose: run speficied command with specified arguments and cache result. If cache is fresh enough, don't run command again but return cached output.
# Also cache exit status and stderr.
# Copyright (c) 2019-2020 András Korn; License: GPLv3
# Use silly long variable names to avoid clashing with whatever the invoked program might use
RUNCACHED_MAX_AGE=${RUNCACHED_MAX_AGE:-300}
RUNCACHED_IGNORE_ENV=${RUNCACHED_IGNORE_ENV:-0}
RUNCACHED_IGNORE_PWD=${RUNCACHED_IGNORE_PWD:-0}
[[ -n "$HOME" ]] && RUNCACHED_CACHE_DIR=${RUNCACHED_CACHE_DIR:-$HOME/.runcached}
RUNCACHED_CACHE_DIR=${RUNCACHED_CACHE_DIR:-/var/cache/runcached}
function usage() {
echo "Usage: runcached [--ttl <max cache age>] [--cache-dir <cache directory>]"
echo " [--ignore-env] [--ignore-pwd] [--help] [--prune-cache]"
echo " [--] command [arg1 [arg2 ...]]"
echo
echo "Run 'command' with the specified args and cache stdout, stderr and exit"
echo "status. If you run the same command again and the cache is fresh, cached"
echo "data is returned and the command is not actually run."
echo
echo "Normally, all exported environment variables as well as the current working"
echo "directory are included in the cache key. The --ignore options disable this."
echo "The OLDPWD variable is always ignored."
echo
echo "--prune-cache deletes all cache entries older than the maximum age. There is"
echo "no other mechanism to prevent the cache growing without bounds."
echo
echo "The default cache directory is ${RUNCACHED_CACHE_DIR}."
echo "Maximum cache age defaults to ${RUNCACHED_MAX_AGE}."
echo
echo "CAVEATS:"
echo
echo "Side effects of 'command' are obviously not cached."
echo
echo "There is no cache invalidation logic except cache age (specified in seconds)."
echo
echo "If the cache can't be created, the command is run uncached."
echo
echo "This script is always silent; any output comes from the invoked command. You"
echo "may thus not notice errors creating the cache and such."
echo
echo "stdout and stderr streams are saved separately. When both are written to a"
echo "terminal from cache, they will almost certainly be interleaved differently"
echo "than originally. Ordering of messages within the two streams is preserved."
exit 0
}
while [[ -n "$1" ]]; do
case "$1" in
--ttl) RUNCACHED_MAX_AGE="$2"; shift 2;;
--cache-dir) RUNCACHED_CACHE_DIR="$2"; shift 2;;
--ignore-env) RUNCACHED_IGNORE_ENV=1; shift;;
--ignore-pwd) RUNCACHED_IGNORE_PWD=1; shift;;
--prune-cache) RUNCACHED_PRUNE=1; shift;;
--help) usage;;
--) shift; break;;
*) break;;
esac
done
zmodload zsh/datetime
zmodload zsh/stat
zmodload zsh/system
zmodload zsh/files
# the built-in mv doesn't fall back to copy if renaming fails due to EXDEV;
# since the cache directory is likely on a different fs than the tmp
# directory, this is an important limitation, so we use /bin/mv instead
disable mv
mkdir -p "$RUNCACHED_CACHE_DIR" >/dev/null 2>/dev/null
((RUNCACHED_PRUNE)) && find "$RUNCACHED_CACHE_DIR/." -maxdepth 1 -type f \! -newermt @$[EPOCHSECONDS-RUNCACHED_MAX_AGE] -delete 2>/dev/null
[[ -n "$@" ]] || exit 0 # if no command specified, exit silently
(
# Almost(?) nothing uses OLDPWD, but taking it into account potentially reduces cache efficency.
# Thus, we ignore it for the purpose of coming up with a cache key.
unset OLDPWD
((RUNCACHED_IGNORE_PWD)) && unset PWD
((RUNCACHED_IGNORE_ENV)) || env
echo -E "$@"
) | md5sum | read RUNCACHED_CACHE_KEY RUNCACHED__crap__
# make the cache dir hashed unless a cache file already exists (created by a previous version that didn't use hashed dirs)
if ! [[ -f $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.exitstatus ]]; then
RUNCACHED_CACHE_KEY=$RUNCACHED_CACHE_KEY[1,2]/$RUNCACHED_CACHE_KEY[3,4]/$RUNCACHED_CACHE_KEY[5,$]
mkdir -p "$RUNCACHED_CACHE_DIR/${RUNCACHED_CACHE_KEY:h}" >/dev/null 2>/dev/null
fi
# If we can't obtain a lock, we want to run uncached; otherwise
# 'runcached' wouldn't be transparent because it would prevent
# parallel execution of several instances of the same command.
# Locking is necessary to avoid races between the mv(1) command
# below replacing stderr with a newer version and another instance
# of runcached using a newer stdout with the older stderr.
: >>$RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.lock 2>/dev/null
if zsystem flock -t 0 $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.lock 2>/dev/null; then
if [[ -f $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.stdout ]]; then
if [[ $[EPOCHSECONDS-$(zstat +mtime $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.stdout)] -le $RUNCACHED_MAX_AGE ]]; then
cat $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.stdout &
cat $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.stderr >&2 &
wait
exit $(<$RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.exitstatus)
else
rm -f $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.{stdout,stderr,exitstatus} 2>/dev/null
fi
fi
# only reached if cache didn't exist or was too old
if [[ -d $RUNCACHED_CACHE_DIR/. ]]; then
RUNCACHED_tempdir=$(mktemp -d 2>/dev/null)
if [[ -d $RUNCACHED_tempdir/. ]]; then
$@ >&1 >$RUNCACHED_tempdir/${RUNCACHED_CACHE_KEY:t}.stdout 2>&2 2>$RUNCACHED_tempdir/${RUNCACHED_CACHE_KEY:t}.stderr
RUNCACHED_ret=$?
echo $RUNCACHED_ret >$RUNCACHED_tempdir/${RUNCACHED_CACHE_KEY:t}.exitstatus 2>/dev/null
mv $RUNCACHED_tempdir/${RUNCACHED_CACHE_KEY:t}.{stdout,stderr,exitstatus} $RUNCACHED_CACHE_DIR/${RUNCACHED_CACHE_KEY:h} 2>/dev/null
rmdir $RUNCACHED_tempdir 2>/dev/null
exit $RUNCACHED_ret
fi
fi
fi
# only reached if cache not created successfully or lock couldn't be obtained
exec $@
@jpbochi

This comment has been minimized.

Copy link

@jpbochi jpbochi commented Aug 2, 2019

I looked for something like this more than once. Yours is lovely. If you don't mind, I'll try to adapt it to regular bash.

@akorn

This comment has been minimized.

Copy link
Owner Author

@akorn akorn commented Aug 2, 2019

Sure, go ahead and good luck; it's GPLv3 for a reason.

I don't think bash has such an elegant locking solution, though.

Maybe post a pointer to your bash version here when you're done?

Also, I'm curious what your use case is; care to share?

@jpbochi

This comment has been minimized.

Copy link

@jpbochi jpbochi commented Aug 2, 2019

I'm want to have a custom status bar for iTerm2 that displays info about my current external IP. In order to not flood the IP api host, I'd like to cache the results for some time.

I'll post a link for the bash version, for sure.

One issue I'm facing is that the whole thing is a bit slow. Between md5 and checking the age of the cache files, it's adding some 200+ ms.

@jpbochi

This comment has been minimized.

Copy link

@jpbochi jpbochi commented Aug 2, 2019

here it is: https://gist.github.com/jpbochi/c7971a0f3d7a3b9fdb71c5621c8e41c5

I had to get rid of the lock feature. I'm not sure how I'd do something equivalent to what you did.

@akorn

This comment has been minimized.

Copy link
Owner Author

@akorn akorn commented Aug 2, 2019

One issue I'm facing is that the whole thing is a bit slow. Between md5 and checking the age of the cache files, it's adding some 200+ ms.

I don't think that can be helped -- we must generate a cache key somehow. I can't think of a faster way than md5sum.

Maybe instead of running the actual command that gets your external IP from iTerm2, have it read a file, and update the file from a cronjob? You don't even need runcached then.

I had to get rid of the lock feature. I'm not sure how I'd do something equivalent to what you did.

You can approximate it using flock(1), but it'll be slower. See https://gist.github.com/akorn/51ee2fe7d36fa139723c851d87e56096/0219183b886a3268fd3b707cd15b335b70567eb1 for an older version that used flock(1).

You really need the locking to avoid race conditions when several instances of runcached are executing.

@jpbochi

This comment has been minimized.

Copy link

@jpbochi jpbochi commented Aug 5, 2019

You really need the locking to avoid race conditions when several instances of runcached are executing.

In some cases, like mine, it's okay if two instances are running at the same time. As long as some value gets cached for following calls, it would be fine.

Maybe instead of running the actual command that gets your external IP from iTerm2, have it read a file, and update the file from a cronjob? You don't even need runcached then.

Yeah, I ended up using a much shorter version of this script in my case.

@dimo414

This comment has been minimized.

Copy link

@dimo414 dimo414 commented Apr 23, 2020

Here's a Bash caching library I implemented a while back: https://github.com/dimo414/bash-cache

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment