@Fooftilly
Last active November 15, 2023 23:25
Dlib.me downloader
#!/bin/bash
base_url="https://www.old.dlib.me/sken_prikaz_1_f.php?id_jedinice="
image_base_url="https://www.old.dlib.me/"
current_id=1
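while true; do # Runs until interrupted (Ctrl+C); there is no upper bound on the ID
    url="${base_url}${current_id}"
    # wget exits non-zero on HTTP errors such as 404 and, with -q, prints nothing,
    # so test the exit status as well as the page body
    if ! content=$(wget -q -O - "$url") || [[ "$content" == *"404 Not Found"* ]]; then
        echo "Request failed or 404 page returned for $url. Moving to the next ID."
        ((current_id++))
        continue
    fi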
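    # Extract folder name and image list
    ime_foldera_skenova=$(echo "$content" | grep -oP 'ime_foldera_skenova = "\K[^"]+')
    lista_skenova=$(echo "$content" | grep -oP 'lista_skenova = "\K[^"]+')
    # Skip pages without scan data; otherwise mkdir would be called with an empty name
    if [[ -z "$ime_foldera_skenova" || -z "$lista_skenova" ]]; then
        echo "No scan data found for $url. Moving to the next ID."
        ((current_id++))
        continue
    fi
    # Create folder
    folder_name=$(basename "$ime_foldera_skenova")
    mkdir -p "$folder_name"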
    # Download images
    IFS='|' read -ra images <<< "$lista_skenova"
    for image in "${images[@]}"; do
        image_url="${image_base_url}${ime_foldera_skenova}/${image}"
        destination="$folder_name/$image"
        # Check if the image already exists
        if [ -e "$destination" ]; then
            echo "Skipping already downloaded image: $destination"
        else
            # Report success only when wget actually succeeds
            if wget -P "$folder_name" "$image_url"; then
                echo "Downloaded: $image_url to $destination"
            else
                echo "Failed to download: $image_url" >&2
            fi
            sleep 3 # Rate limit: at most one download attempt every 3 seconds
        fi
    done
    echo "Downloaded images for $url to $folder_name."
    ((current_id++))
done
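
The extraction step in the script above can be tried offline. The following is a minimal sketch, assuming the viewer page embeds the two JavaScript variables the script greps for; the sample markup, the helper name `extract_js_string`, and the file names are hypothetical and only illustrate how `grep -oP` with `\K` and `IFS='|' read -ra` work together.

```shell
#!/bin/bash
# Hypothetical sample of the JavaScript embedded in a viewer page.
# The real markup on dlib.me may differ; this only mirrors the two
# variable names the downloader script greps for.
sample_page='<script>
var ime_foldera_skenova = "skenovi/knjiga_001";
var lista_skenova = "0001.jpg|0002.jpg|0003.jpg";
</script>'

# Hypothetical helper: print the quoted value assigned to the variable in $1.
# \K discards the matched prefix, so only the value itself is printed.
extract_js_string() {
    local name="$1" page="$2"
    grep -oP "${name} = \"\K[^\"]+" <<< "$page"
}

folder_path=$(extract_js_string "ime_foldera_skenova" "$sample_page")
image_list=$(extract_js_string "lista_skenova" "$sample_page")

# Split the pipe-separated list into an array, one file name per element.
IFS='|' read -ra images <<< "$image_list"

echo "folder: $(basename "$folder_path")"
echo "images: ${#images[@]}"
printf '  %s\n' "${images[@]}"
```

Running this against a saved copy of a real page (in place of `sample_page`) is a quick way to check whether the patterns still match before starting a long download run. It requires GNU grep built with PCRE support for the `-P` option.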