Download JT$ gif collection

Download Sternberg Gifs

When I worked at WebDevStudios, we spoke in gifs. I guess, these days, that makes me old. Whatever. I miss them. They represented a shared language even when the source of a gif was unknown. (I remember distinctly how it felt when I found the Key and Peele sketch from which this gem was taken:)

[gif: "noice"]

(It kinda felt like that ☝️)

Justin Sternberg even wrote an Alfred workflow called WDS Giffy, pointed at his personal collection of gifs, which predated (in practice, if not in actuality) the gif-sharing service Giphy. I can't say for certain, but I think most of us used it, at least while I was there.

I still have Giffy in Alfred; I just don't use it. I have my own gifs site that uses the (now ancient) GifDrop WordPress plugin developed by Mark Jaquith. But I got curious and wanted to see how Justin's Alfred workflow worked -- specifically, where the images came from.

The script

This led me to a long conversation with ChatGPT.

You see, after I found the API endpoint feeding the Alfred WDS Giffy workflow, I decided it would be cool to write a script that could just...download all those gifs. I don't know that I want all of them, but there are a lot that I do want. And hey, maybe other people want them, too?

How it works

The API endpoint that powers the Alfred workflow is hard-coded into the script. The script asks you one question: where you want to put the files. It will create a subdirectory with that name in your ~/Pictures folder and start downloading. The script is specifically tailored to this API endpoint -- you might be able to swap in a different one, but it would have to return the same structure as Justin's (which is to say, it might work, but don't expect any miracles).
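
If you're curious what the script is actually parsing, you can poke at the endpoint yourself with curl and jq. This just mirrors the same .data array of {name, src} objects the script below works with:

# Count the entries in the feed the script consumes (requires curl and jq).
curl -s 'https://jtsternberg.com/?json&gifs' | jq '.data | length'

# Look at the shape of a single entry (the script only reads .name and .src).
curl -s 'https://jtsternberg.com/?json&gifs' | jq '.data[0] | {name, src}'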

I tried a bunch of different things. I wanted to keep the original modified dates -- because some of these are historic relics -- but I was running into issues with some files breaking. There are also inconsistencies in the API data: some images are missing, some entries are corrupted, and it's a lot of data to parse through (5,470 entries, to be specific). Since there are so many, I decided the script should download files concurrently -- 10 at a time with curl. It shows a running count of how many you've downloaded out of the total, and it writes any errors or issues to a download_log.txt in the folder you specified.
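
(If you want to experiment with the modified-date idea yourself, curl has a -R/--remote-time flag that sets the local file's timestamp to the server-reported one, when the server sends it. That's not what the final script below does; this is just a sketch of the idea with a placeholder URL:)

# Sketch only: keep the server's Last-Modified time on the downloaded file.
# The URL is a placeholder, not a real entry from the feed.
curl -s -R -o example.gif "https://example.com/path/to/some.gif"
ls -l example.gif   # timestamp should match the remote file, if the server reported one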

When it's done, it deletes any zero-byte files and any temporary files it might have left behind, and tells you how many files it downloaded. I haven't gone through the error log yet, so there might be more I could clean up, but it downloaded 5,446 of 5,470 -- a better-than-99% success rate -- so I'd call it pretty good, actually.
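
If you want to double-check the results afterward, the log is plain text with roughly one line per failed or zero-byte download, so a quick count looks like this (the directory name here is just an example -- use whatever you entered at the prompt):

# Example directory name; substitute whatever you told the script to use.
cd ~/Pictures/sternberg-gifs
wc -l download_log.txt   # roughly one line per failure logged by the script
ls | wc -l               # how many files landed (this count includes download_log.txt itself)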

Here it is for your perusal and/or downloading pleasure.

Running it yourself

The script was written with the assumption that you're on a Mac. If you're on Windows or Linux, you're kind of SOL, because Bash and some of the tools used might not be 100% the same. But ChatGPT got me here, and it could probably adjust the script to your use case.
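
(If you do want to try it on Linux anyway, the prerequisites are the same bash, jq, and curl the Mac instructions below assume. On Debian/Ubuntu that's roughly the following -- package names may differ on other distros:)

# Debian/Ubuntu example; adjust for your package manager.
sudo apt-get install -y jq curl
bash --version   # the script targets bash, not plain sh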

If you want to run it yourself, save the download-sternberg-gifs.sh file somewhere on the computer you want the gifs on. In your terminal, cd to that directory.

Run chmod +x ./download-sternberg-gifs.sh to ensure it can execute.

You will need jq and curl to run the script. If which jq or which curl comes up empty, you can install either of them with Homebrew via brew install jq or brew install curl.

Assuming you have all your prereqs in place, you can run the script by just typing: ./download-sternberg-gifs.sh. It will prompt you for a directory and then get to downloading.
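
Put together, a full run looks roughly like this (the paths here assume you saved the script to ~/Downloads -- adjust to wherever you put it):

cd ~/Downloads                           # or wherever you saved the script
chmod +x ./download-sternberg-gifs.sh    # make sure it can execute
which jq curl                            # both should print a path
./download-sternberg-gifs.sh             # prompts for a directory name, then starts downloading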

If you have any issues with the script, let me know and I'll probably ask ChatGPT to fix them. 😄

#!/bin/bash

# URL-decoding function
urldecode() {
  local url_encoded_filename=$1
  echo -e "$(sed 's/+/ /g;s/%\(..\)/\\x\1/g;' <<< "$url_encoded_filename")"
}

# Function to extract the filename from a URL, excluding query parameters
extract_filename_from_url() {
  local url=$1
  local filename=$(echo "${url##*/}" | cut -d '?' -f 1)
  echo "$(urldecode "$filename")"
}

# Function to get the file extension (lowercased, with a leading dot)
get_extension() {
  local filename=$1
  # Extract extension and convert it to lowercase
  local extension="${filename##*.}"
  echo ".$extension" | tr '[:upper:]' '[:lower:]'
}

# Function to extract the file extension from the filename
extract_current_extension() {
  local filename=$1
  echo "${filename##*.}"
}

# Function to check and append the file extension if necessary
append_extension_if_needed() {
  local filename=$1
  local new_extension=$2
  local current_extension=$(extract_current_extension "$filename")
  # Check if the current extension matches the new extension
  if [ ".$current_extension" != "$new_extension" ]; then
    # Append new extension
    echo "${filename}${new_extension}"
  else
    # Extension already present, return filename as is
    echo "$filename"
  fi
}

# Function to sanitize file names
sanitize_filename() {
  local filename=$1
  echo "$filename" | sed 's/[\/:*?"<>|]/-/g'
}

# Cleanup function to delete zero-byte files
cleanup() {
  echo -ne "\nPerforming cleanup..."
  find "$dir_path" -size 0 -delete
  downloaded_files=$(wc -l < "$temp_file")
  echo -ne "\nCleanup complete. Total images downloaded: $downloaded_files.\n"
}

# Function to download a file and log errors
download_file() {
  local json_entry="$1"
  local src=$(echo "$json_entry" | jq -r '.src')
  local dest=$2
  # Change HTTP to HTTPS for all URLs
  if [[ "$src" == http://* ]]; then
    src="https://${src#http://}"
  fi
  # Download the image with curl; -w "%{http_code}" captures the HTTP status
  local curl_output
  curl_output=$(curl -s -w "%{http_code}" -o "$dest" "$src")
  if [ "$curl_output" != "200" ]; then
    echo "Failed to download: $src, HTTP Status: $curl_output, JSON Entry: $json_entry" >> "$log_file"
    rm -f "$dest"
    return
  fi
  if [ ! -s "$dest" ]; then
    echo "Downloaded zero-byte file: $src, JSON Entry: $json_entry" >> "$log_file"
    rm "$dest"
  else
    echo "1" >> "$temp_file"
  fi
}

# Set trap to catch termination and run cleanup
trap cleanup INT TERM

# Static API endpoint
api_url="https://jtsternberg.com/?json&gifs"

# Fetch JSON data from the API
json_data=$(curl -s "$api_url")
if [ -z "$json_data" ]; then
  echo "Failed to fetch data from the API. Exiting."
  exit 1
fi

echo "Enter directory name:"
read directory

# Create directory in ~/Pictures
dir_path="$HOME/Pictures/$directory"
log_file="$dir_path/download_log.txt"
if [ -d "$dir_path" ]; then
  echo "Directory already exists. Exiting."
  exit 1
else
  mkdir -p "$dir_path"
  touch "$log_file"
fi

# Create a temporary file for tracking downloads
temp_file=$(mktemp)

# Count the number of images in the JSON data
total_images=$(echo "$json_data" | jq '.data | length')

# Max number of parallel downloads
max_parallel=10
count=0
# Iterate over each entry in the JSON data.
# Process substitution (instead of piping into the loop) keeps the loop in the
# current shell, so the final `wait` below also covers the last batch of
# background downloads.
while IFS= read -r line; do
  # Skip entries whose src still contains query parameters
  if [[ $(echo "$line" | jq -r '.src') == *"?"* ]]; then
    continue
  fi
  # Extract name and src
  name=$(echo "$line" | jq -r '.name')
  src=$(echo "$line" | jq -r '.src')
  # Extract the filename from the URL and sanitize it
  raw_filename=$(extract_filename_from_url "$src")
  filename=$(sanitize_filename "$raw_filename")
  extension=$(get_extension "$src")
  # Check and append the extension if needed
  full_filename="$dir_path/"$(append_extension_if_needed "$filename" "$extension")
  # Convert the JSON line to a compact JSON string and pass it to the download function
  json_string=$(echo "$line" | jq -c .)
  download_file "$json_string" "$full_filename" &
  # Update count and wait if we reach max parallel downloads
  ((count++))
  if ((count >= max_parallel)); then
    wait
    count=0
  fi
  # Update progress
  downloaded_files=$(wc -l < "$temp_file")
  echo -ne "Downloaded $downloaded_files/$total_images images...\r"
done < <(echo "$json_data" | jq -c '.data[] | {name, src}')

# Wait for all background jobs to finish
wait

# Perform cleanup
cleanup

# Remove the temporary file
rm "$temp_file" 2>> "$log_file"
rm wget_error.log 2>> "$log_file" # Clean up the wget error log, if present