@DonRichards
Last active October 2, 2023 20:01
Script to warm the IIIF cache by PID. It checks whether a specific viewer is on the page and, if so, fetches the image URL from it to start preheating (warming) the cache.
#!/usr/bin/env bash
DOMAIN="https://digital.library.jhu.edu"
# check if $1 is empty
if [ -z "$1" ]; then
  echo "No argument supplied"
  echo " Must include a node id"
  echo " Example: ./preheat.sh 42"
  exit 1
fi
function urlencode() {
  # Percent-encode every character except letters, digits, and . ~ _ -
  local old_lc_collate=$LC_COLLATE
  LC_COLLATE=C
  local length="${#1}"
  local i c
  for (( i = 0; i < length; i++ )); do
    c="${1:$i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      *) printf '%%%02X' "'$c" ;;
    esac
  done
  LC_COLLATE=$old_lc_collate
}
function urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }
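NODEID="$1"
echo "$NODEID"
# Pull the tile source URL out of the page's drupal-settings-json block.
# NOTE: the viewer key "openseadragon-viewer-46283" is hardcoded, so this only matches pages whose viewer uses that ID.
FULL_PATH=$(curl -s "${DOMAIN}/node/${NODEID}" | grep drupal-settings-json | grep -o '{"openseadragon.*' | sed "s/<\/script>$//" | jq ".\"openseadragon-viewer-46283\".\"options\".\"tileSources\"" | tr -d '\n' | sed -e 's/"//' -e 's/"//' -e 's/\[//' -e 's/\]//' | sed -e 's/^[ \t]*//')
echo "$FULL_PATH"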
PREFIX="${DOMAIN}/cantaloupe/iiif/2/"
# FILEPATH="http%3A%2F%2Fdigital.library.jhu.edu%2Fsystem%2Ffiles%2F"
# FILENAME="2022-03-22%2FG3844_B2_P1_1979_MapB_c1.jpg"
SUFFIX=",1024,1024/1024,/0/default.jpg"
declare -a TOTAL_TIMES=()
# Only continue if a tile source URL was found
if [ -n "$FULL_PATH" ]; then
  # FILE=$(urldecode "$FILEPATH")$(urldecode "$FILENAME")
  # Strip the IIIF prefix, leaving the URL-encoded image identifier
  FILE=${FULL_PATH#"${PREFIX}"}
  # Download the source image once to read its pixel dimensions
  curl -L "$(urldecode "${FILE}")" -o test.jpg
  width=$(identify test.jpg | awk '{print $3}' | sed 's/[^0-9x]//g' | awk -Fx '{print $1}')
  height=$(identify test.jpg | awk '{print $3}' | sed 's/[^0-9x]//g' | awk -Fx '{print $2}')
  rm -f test.jpg
  # Request every 1024x1024 tile so that Cantaloupe caches it
  for (( x=0; x<width; x+=1024 )); do
    for (( y=0; y<height; y+=1024 )); do
      echo "Warming up ${NODEID} at X: ${x} | Y: ${y}"
      echo "${PREFIX}${FILE}/${x},${y}${SUFFIX}"
      # echo "${PREFIX}${FILEPATH}${FILENAME}/${x},${y}${SUFFIX}"
      # time=$(printf "%.2f" $(curl -kso /dev/null -w "%{time_total}" "${PREFIX}${FILEPATH}${FILENAME}/${x},${y}${SUFFIX}"))
      time=$(printf "%.2f" "$(curl -kso /dev/null -w "%{time_total}" "${PREFIX}${FILE}/${x},${y}${SUFFIX}")")
      TOTAL_TIMES+=("$time")
    done
  done
  total=0
  sum=0
  for i in "${TOTAL_TIMES[@]}"; do
    sum=$(bc <<< "$sum + $i")
    ((total++))
  done
  # Guard against dividing by zero when no tile was requested (e.g. identify failed)
  if (( total > 0 )); then
    # scale=2 keeps bc from truncating the result to a whole number
    echo "Total time: $(bc <<< "scale=2; $sum / 60") minutes"
    echo "Average: $(bc <<< "scale=2; $sum / $total") seconds per slice."
  else
    echo "No tiles were requested"
  fi
else
  echo "No image file found"
fi

urls_to_prewarm.py

This script is designed to "pre-warm" the cache of a Cantaloupe IIIF server by sending requests to specified image URLs with multiple IIIF endpoints. The script reads image URLs from a provided text file, combines each URL with a set of IIIF endpoints, and then sends concurrent requests to the server to access and cache these images. By doing so, it ensures that the specified images are readily available in the server's cache for faster subsequent access. The script also includes error handling for missing or empty input files and provides a help menu for user guidance.
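
The URL-building step described above can be sketched on its own as follows. This is a minimal illustration, not the script itself: the function name `build_prewarm_urls` is hypothetical, and the base URL and endpoints are simply copied from the script's constants.

```python
from urllib.parse import quote

# Values copied from the script's constants; adjust for another site.
IIIF_BASE_URL = "https://digital.library.jhu.edu/cantaloupe/iiif/2/"
IIIF_ENDPOINTS = ["/full/full/0/default.jpg", "/full/214,/0/default.jpg"]


def build_prewarm_urls(image_urls, base_url=IIIF_BASE_URL, endpoints=IIIF_ENDPOINTS):
    """Return one request URL per (image, endpoint) pair.

    Hypothetical helper: each image URL is percent-encoded in full
    (safe="" also encodes "/" and ":") so it can serve as the IIIF identifier.
    """
    urls = []
    for image_url in image_urls:
        image_url = image_url.strip()
        if not image_url:
            continue  # ignore blank lines from the input file
        identifier = quote(image_url, safe="")
        for endpoint in endpoints:
            urls.append(base_url + identifier + endpoint)
    return urls


if __name__ == "__main__":
    example = "http://digital.library.jhu.edu/system/files/2022-03-22/G3844_B2_F7_1920_Ward_10_SMALL_c1.jpg"
    for url in build_prewarm_urls([example]):
        print(url)
```

Each input line yields one URL per endpoint, so a file with N images and the two endpoints above produces 2N requests.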

preheat.sh

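Serves the same purpose as the Python script, but works from a node ID: it reads the image URL from the page's OpenSeadragon viewer settings and requests the image in 1024x1024 tiles.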
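
The tile walk that preheat.sh performs can be sketched in Python as below. This is a rough equivalent under assumptions: the function name `tile_regions` is hypothetical, and the path layout (region `x,y,1024,1024`, size `1024,`, rotation `0`) mirrors the `SUFFIX` variable in the shell script.

```python
def tile_regions(width, height, tile=1024):
    """Yield the IIIF path suffix for every tile covering a width x height image.

    Hypothetical helper mirroring the nested x/y loops in preheat.sh:
    x steps across the image in the outer loop and y in the inner loop.
    """
    for x in range(0, width, tile):
        for y in range(0, height, tile):
            yield f"{x},{y},{tile},{tile}/{tile},/0/default.jpg"


if __name__ == "__main__":
    # Made-up dimensions, purely for illustration.
    for suffix in tile_regions(2000, 1500):
        print(suffix)
```

Appending each suffix to the IIIF prefix, the encoded image identifier, and a `/` gives the URLs that the shell script requests.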

Review

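To list IIIF assets by how often they are requested, count the request paths in the web server's access log (untested; assumes the path is the seventh whitespace-separated field, as in the common and combined log formats):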

cat access.log | grep "/iiif/2/" | awk '{print $7}' | sort | uniq -c | sort -n
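
The same count could be done in Python, which makes it easier to feed the result into the prewarm script. The sketch below is just as untested against real logs as the pipeline above: `count_iiif_requests` is a hypothetical name, the sample lines are made up, and the seventh-field assumption is carried over from the `awk` command.

```python
from collections import Counter


def count_iiif_requests(lines, marker="/iiif/2/"):
    """Count request paths (7th whitespace-separated field) on lines containing marker.

    Hypothetical helper. Unlike the awk pipeline, lines with fewer than
    seven fields are skipped instead of being counted as an empty path.
    """
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        fields = line.split()
        if len(fields) >= 7:
            counts[fields[6]] += 1
    return counts


if __name__ == "__main__":
    # Made-up log lines in combined log format, for illustration only.
    sample = [
        '127.0.0.1 - - [02/Oct/2023:20:01:00 +0000] "GET /cantaloupe/iiif/2/a.jpg/full/full/0/default.jpg HTTP/1.1" 200 123',
        '127.0.0.1 - - [02/Oct/2023:20:01:05 +0000] "GET /cantaloupe/iiif/2/a.jpg/full/full/0/default.jpg HTTP/1.1" 200 123',
        '127.0.0.1 - - [02/Oct/2023:20:01:09 +0000] "GET /node/42 HTTP/1.1" 200 9',
    ]
    for path, hits in count_iiif_requests(sample).most_common():
        print(hits, path)
```

To run it against a real log, pass an open file object, for example `count_iiif_requests(open("access.log"))`.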
import requests
import urllib.parse
import argparse
import os
import concurrent.futures
# Python version of preheat but with python extras.
# Define a function to handle the requests
def cache_url(url):
    try:
        # The timeout (an arbitrary 300 seconds) keeps a stalled request from blocking a worker forever
        response = requests.get(url, timeout=300)
        if response.status_code == 200:
            return f"Successfully cached: {url}"
        else:
            return f"Failed to cache: {url}. Status code: {response.status_code}"
    except Exception as e:
        return f"Error caching {url}: {e}"
# Define the help menu using argparse with a placeholder for the program name
desc_template = """{prog} - \033[1mA script to pre-warm Cantaloupe's IIIF server's cache.\033[0m
\033[94mThe input file should contain URLs of the images you want to prewarm, one URL per line.\033[0m
For example:
http://digital.library.jhu.edu/system/files/2022-03-22/G3844_B2_F7_1920_Ward_10_SMALL_c1.jpg
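\033[91mNote:\033[0m List the image identifier only: the part of the IIIF URL that follows "/cantaloupe/iiif/2/", up to and including the image extension (e.g., ".jpg", ".tif", ".jp2"). Leave off IIIF parameters such as "/full/full/0/default.jpg"; the script appends them.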
"""
parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)
parser.description = desc_template.format(prog=parser.prog) # Format the description with the program's name
parser.add_argument('-f', '--file', default='urls_to_prewarm.txt', help="Path to the file containing image URLs to prewarm. Default is 'urls_to_prewarm.txt'.")
args = parser.parse_args()
# Check if the file exists
if not os.path.exists(args.file):
    print(f"Error: File '{args.file}' does not exist.")
    raise SystemExit(1)
# Check if the file is empty
if os.path.getsize(args.file) == 0:
    print(f"Error: File '{args.file}' is empty.")
    raise SystemExit(1)
IIIF_BASE_URL = "https://digital.library.jhu.edu/cantaloupe/iiif/2/"
# These are common IIIF endpoints. Add more as needed; which ones matter depends on the site's theme.
IIIF_ENDPOINTS = [
    "/full/full/0/default.jpg",
    "/full/214,/0/default.jpg"
]
full_urls = []
# Construct the full URLs for each image and each endpoint
with open(args.file, 'r') as file:
    # Skip blank lines so they do not turn into requests with an empty identifier
    image_urls = [line.strip() for line in file if line.strip()]
for image_url in image_urls:
    for endpoint in IIIF_ENDPOINTS:
        full_url = IIIF_BASE_URL + urllib.parse.quote(image_url, safe='') + endpoint
        full_urls.append(full_url)
MAX_WORKERS = 50 # Adjust based on your needs
with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    for result in executor.map(cache_url, full_urls):
        print(result)