Download JT$ gif collection

Download Sternberg Gifs

When I worked at WebDevStudios, we spoke in gifs. I guess, these days, that makes me old. Whatever. I miss them. They represented a shared language even when the source of a gif was unknown. (I remember distinctly how it felt when I found the Key and Peele sketch from which this gem was taken:)

[gif: "noice"]

(It kinda felt like that ☝️)

Justin Sternberg even wrote an Alfred workflow called WDS Giffy, pointed at his personal collection of gifs, which predated (in practice, if not in actuality) the gif-sharing service Giphy. I can't say for certain, but I think most of us used it, at least while I was there.

I still have Giffy in Alfred; I just don't use it. I have my own gifs site that uses the (now ancient) GifDrop WordPress plugin developed by Mark Jaquith. But I got curious and wanted to see how Justin's Alfred workflow worked -- specifically, where the images came from.

The script

This led me to a long conversation with ChatGPT.

You see, after I found the API endpoint feeding the Alfred WDS Giffy workflow, I decided it would be cool to write a script that could just...download all those gifs. I don't know that I want all of them, but there are a lot that I do want. And hey, maybe other people want them, too?

How it works

The API endpoint that powers the Alfred workflow is hard-coded into the script. The script asks you one question: where you want to put the files. It will create a subdirectory with that name in your ~/Pictures folder and start downloading. The script is specifically tailored to this API endpoint -- you might be able to swap in a different one, but it would have to return the same structure as Justin's (which is to say, it might work, but don't expect any miracles).
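
If you're curious what the script is actually parsing, you can poke at the endpoint yourself with curl and jq. This just mirrors the same .data array of {name, src} objects the script below works with:

# Count the entries in the feed the script consumes (requires curl and jq).
curl -s 'https://jtsternberg.com/?json&gifs' | jq '.data | length'

# Look at the shape of a single entry (the script only reads .name and .src).
curl -s 'https://jtsternberg.com/?json&gifs' | jq '.data[0] | {name, src}'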

I tried a bunch of different things. I wanted to keep the original modified dates -- because some of these are historic relics -- but I was running into issues with some files breaking. There are also inconsistencies in the API data: some images are missing, some entries are corrupted, and it's a lot of data to parse through (5,470 entries, to be specific). Since there are so many, I decided the script should download files concurrently -- 10 at a time with curl. It shows a running count of how many you've downloaded out of the total, and it writes any errors or issues to a download_log.txt in the folder you specified.
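
(If you want to experiment with the modified-date idea yourself, curl has a -R/--remote-time flag that sets the local file's timestamp to the server-reported one, when the server sends it. That's not what the final script below does; this is just a sketch of the idea with a placeholder URL:)

# Sketch only: keep the server's Last-Modified time on the downloaded file.
# The URL is a placeholder, not a real entry from the feed.
curl -s -R -o example.gif "https://example.com/path/to/some.gif"
ls -l example.gif   # timestamp should match the remote file, if the server reported one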

When it's done, it deletes any zero-byte files and any temporary files it might have left behind, and tells you how many files it downloaded. I haven't gone through the error log yet, so there might be more I could clean up, but it downloaded 5,446 of 5,470 -- a better-than-99% success rate -- so I'd call it pretty good, actually.
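
If you want to double-check the results afterward, the log is plain text with roughly one line per failed or zero-byte download, so a quick count looks like this (the directory name here is just an example -- use whatever you entered at the prompt):

# Example directory name; substitute whatever you told the script to use.
cd ~/Pictures/sternberg-gifs
wc -l download_log.txt   # roughly one line per failure logged by the script
ls | wc -l               # how many files landed (this count includes download_log.txt itself)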

Here it is for your perusal and/or downloading pleasure.

Running it yourself

The script was written with the assumption that you're on a Mac. If you're on Windows or Linux, you're kind of SOL, because Bash and some of the tools used might not be 100% the same. But ChatGPT got me here, and it could probably adjust the script to your use case.
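
(If you do want to try it on Linux anyway, the prerequisites are the same bash, jq, and curl the Mac instructions below assume. On Debian/Ubuntu that's roughly the following -- package names may differ on other distros:)

# Debian/Ubuntu example; adjust for your package manager.
sudo apt-get install -y jq curl
bash --version   # the script targets bash, not plain sh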

If you want to run it yourself, save the download-sternberg-gifs.sh file somewhere on the computer you want the gifs on. In your terminal, cd to that directory.

Run chmod +x ./download-sternberg-gifs.sh to ensure it can execute.

You will need jq and curl to run the script. If which jq or which curl comes up empty, you can install either of them with Homebrew via brew install jq or brew install curl.

Assuming you have all your prereqs in place, you can run the script by just typing: ./download-sternberg-gifs.sh. It will prompt you for a directory and then get to downloading.
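
Put together, a full run looks roughly like this (the paths here assume you saved the script to ~/Downloads -- adjust to wherever you put it):

cd ~/Downloads                           # or wherever you saved the script
chmod +x ./download-sternberg-gifs.sh    # make sure it can execute
which jq curl                            # both should print a path
./download-sternberg-gifs.sh             # prompts for a directory name, then starts downloading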

If you have any issues with the script, let me know and I'll probably ask ChatGPT to fix them. 😄

#!/bin/bash

# URL-decoding function
urldecode() {
  local url_encoded_filename=$1
  echo -e "$(sed 's/+/ /g;s/%\(..\)/\\x\1/g;' <<< "$url_encoded_filename")"
}

# Function to extract the filename from a URL, excluding query parameters
extract_filename_from_url() {
  local url=$1
  local filename=$(echo "${url##*/}" | cut -d '?' -f 1)
  echo "$(urldecode "$filename")"
}

# Function to get the file extension (lowercased, with a leading dot)
get_extension() {
  local filename=$1
  # Extract extension and convert it to lowercase
  local extension="${filename##*.}"
  echo ".$extension" | tr '[:upper:]' '[:lower:]'
}

# Function to extract the file extension from the filename
extract_current_extension() {
  local filename=$1
  echo "${filename##*.}"
}

# Function to check and append the file extension if necessary
append_extension_if_needed() {
  local filename=$1
  local new_extension=$2
  local current_extension=$(extract_current_extension "$filename")
  # Check if the current extension matches the new extension
  if [ ".$current_extension" != "$new_extension" ]; then
    # Append new extension
    echo "${filename}${new_extension}"
  else
    # Extension already present, return filename as is
    echo "$filename"
  fi
}

# Function to sanitize file names
sanitize_filename() {
  local filename=$1
  echo "$filename" | sed 's/[\/:*?"<>|]/-/g'
}

# Cleanup function to delete zero-byte files
cleanup() {
  echo -ne "\nPerforming cleanup..."
  find "$dir_path" -size 0 -delete
  downloaded_files=$(wc -l < "$temp_file")
  echo -ne "\nCleanup complete. Total images downloaded: $downloaded_files.\n"
}

# Function to download a file and log errors
download_file() {
  local json_entry="$1"
  local src=$(echo "$json_entry" | jq -r '.src')
  local dest=$2
  # Change HTTP to HTTPS for all URLs
  if [[ "$src" == http://* ]]; then
    src="https://${src#http://}"
  fi
  # Download the image with curl; -w "%{http_code}" captures the HTTP status
  local curl_output
  curl_output=$(curl -s -w "%{http_code}" -o "$dest" "$src")
  if [ "$curl_output" != "200" ]; then
    echo "Failed to download: $src, HTTP Status: $curl_output, JSON Entry: $json_entry" >> "$log_file"
    rm -f "$dest"
    return
  fi
  if [ ! -s "$dest" ]; then
    echo "Downloaded zero-byte file: $src, JSON Entry: $json_entry" >> "$log_file"
    rm "$dest"
  else
    echo "1" >> "$temp_file"
  fi
}

# Set trap to catch termination and run cleanup
trap cleanup INT TERM

# Static API endpoint
api_url="https://jtsternberg.com/?json&gifs"

# Fetch JSON data from the API
json_data=$(curl -s "$api_url")
if [ -z "$json_data" ]; then
  echo "Failed to fetch data from the API. Exiting."
  exit 1
fi

echo "Enter directory name:"
read directory

# Create directory in ~/Pictures
dir_path="$HOME/Pictures/$directory"
log_file="$dir_path/download_log.txt"
if [ -d "$dir_path" ]; then
  echo "Directory already exists. Exiting."
  exit 1
else
  mkdir -p "$dir_path"
  touch "$log_file"
fi

# Create a temporary file for tracking downloads
temp_file=$(mktemp)

# Count the number of images in the JSON data
total_images=$(echo "$json_data" | jq '.data | length')

# Max number of parallel downloads
max_parallel=10
count=0
# Iterate over each entry in the JSON data.
# Process substitution (instead of piping into the loop) keeps the loop in the
# current shell, so the final `wait` below also covers the last batch of
# background downloads.
while IFS= read -r line; do
  # Skip entries whose src still contains query parameters
  if [[ $(echo "$line" | jq -r '.src') == *"?"* ]]; then
    continue
  fi
  # Extract name and src
  name=$(echo "$line" | jq -r '.name')
  src=$(echo "$line" | jq -r '.src')
  # Extract the filename from the URL and sanitize it
  raw_filename=$(extract_filename_from_url "$src")
  filename=$(sanitize_filename "$raw_filename")
  extension=$(get_extension "$src")
  # Check and append the extension if needed
  full_filename="$dir_path/"$(append_extension_if_needed "$filename" "$extension")
  # Convert the JSON line to a compact JSON string and pass it to the download function
  json_string=$(echo "$line" | jq -c .)
  download_file "$json_string" "$full_filename" &
  # Update count and wait if we reach max parallel downloads
  ((count++))
  if ((count >= max_parallel)); then
    wait
    count=0
  fi
  # Update progress
  downloaded_files=$(wc -l < "$temp_file")
  echo -ne "Downloaded $downloaded_files/$total_images images...\r"
done < <(echo "$json_data" | jq -c '.data[] | {name, src}')

# Wait for all background jobs to finish
wait

# Perform cleanup
cleanup

# Remove the temporary file
rm "$temp_file" 2>> "$log_file"
rm wget_error.log 2>> "$log_file" # Clean up the wget error log, if present