

@svx
Forked from AJMaxwell/cache-warmer.sh
Created May 25, 2020 12:27
A quick bash script I wrote to prime the cache of all my websites. This script grabs the sitemap of the site you wish to warm, then greps out the URLs and feeds each one to wget so it gets cached on the server.
#!/bin/bash
#####################################################################################################
# Cache Warmer
#
# Usage: cache-warmer.sh [function ...]
#
# This script grabs the sitemap of the site you wish to warm, then greps out the URLs and feeds
# each one to wget so it gets cached on the server. I'm sure there are better ways to do this, but
# this was a simple enough method for my needs. I didn't want to have to type in the URLs each time
# I warmed their cache, so I just made simple functions with short names to feed those URLs into the
# cache-warming function. I also created an 'all' function to run all of the short-name functions,
# with pause breaks, when 'all' or no argument is provided.
#####################################################################################################
# This is the user-agent string of my local machine. The sites I use this script on disallow blank
# and wget user-agent strings (among others).
USERAGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"
example () {
    echo "Warming cache for example.com..."
    warmUp "www.example.com"
    echo "www.example.com cache warmed"
}
shop () {
    echo "Warming cache for shop.example.com..."
    warmUp "shop.example.com"
    echo "shop.example.com cache warmed"
}
pause () {
    echo; echo
    echo "Pausing for 30 seconds to avoid possible connection limits..."
    sleep 30s
    echo "Resuming..."; echo
}
all () {
    echo "Warming cache for all sites..."
    example; pause
    shop
}
warmUp () {
    # If there are particular subdirectories in your sitemap that you do not wish to parse
    # (i.e. because they cannot be cached), you can use the following regex:
    # grep -oP "https?://$1\/((?!subdirectory))[^<]*"
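    # For example, to skip a hypothetical "checkout" subdirectory, the full pipeline would look
    # something like this (untested sketch, same pattern with the placeholder swapped in):
    # wget --user-agent="$USERAGENT" -q "https://$1/sitemap.xml" -O - | grep -oP "https?://$1\/((?!checkout))[^<]*" | wget -nv --user-agent="$USERAGENT" -i - -O /dev/null -w 1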
    wget --user-agent="$USERAGENT" -q "https://$1/sitemap.xml" -O - | grep -oP "https?://$1\/[^<]*" | wget -nv --user-agent="$USERAGENT" -i - -O /dev/null -w 1
}
if [ $# -eq 0 ]; then
    all
elif [ $# -eq 1 ]; then
    ${1}
else
    for var in "$@"; do
        ${var}; pause
    done
fi
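
A quick usage sketch (my assumption: the script is saved as cache-warmer.sh and made executable with chmod +x; example and shop are the short-name functions defined above):

./cache-warmer.sh                 # no arguments: runs all(), warming every site with a pause in between
./cache-warmer.sh shop            # one argument: warms only shop.example.com
./cache-warmer.sh example shop    # several arguments: runs each function, pausing after each one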

svx commented May 25, 2020

Assuming each URL is on its own line, probably something like this:
wget --user-agent="$USERAGENT" -q "https://$1/sitemap.txt" -O - | grep -oP "https?://$1(.*)$" | wget -nv --user-agent="$USERAGENT" -i - -O /dev/null -w 1
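
If the plain-text sitemap only ever lists the site's own URLs (one per line, as the sitemap.txt format expects), the grep filter could probably be dropped entirely and the list fed straight to the second wget — an untested sketch:
wget --user-agent="$USERAGENT" -q "https://$1/sitemap.txt" -O - | wget -nv --user-agent="$USERAGENT" -i - -O /dev/null -w 1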
