@willettk
Last active December 29, 2015 01:09
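Remove duplicate images (i.e., the black "file not found" image) from a directory of SDSS images. Cut out the center region of each image (it should be pure black in the "file not found" image), use ImageMagick's identify to find crops with identical signatures, and list them in duplicates.cat so they can be deleted.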
#!/bin/sh
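for img in *.jpg; do
    # Skip crops left over from a previous run so they are not cropped again
    case "$img" in *_cropped.jpg) continue ;; esac
    filename=${img%.*}
    newfilename=${filename}_cropped
    # 100x100 cutout at offset +162+162 (the center of a 424x424 image)
    convert "$img" -crop 100x100+162+162 "$newfilename.jpg"
done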
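# Probably needs to be run on subsets of the images when using the full GZ set, due to memory limitations.
# Each subset must include at least one black image (badimage_cropped.jpg) so it has something to compare against.
# If the glob already matches badimage_cropped.jpg (as *cropped.jpg does), drop the explicit argument;
# otherwise it is listed twice and always reported as a duplicate of itself.
# Takes 3m11s for [6-9]*jpg (40K images).
# %i is the filename and %# the image signature; uniq -D (GNU) keeps every line whose signature repeats.
identify -quiet -format "%i %#\n" *cropped.jpg badimage_cropped.jpg | sort -k2 | uniq -D --skip-fields=1 | awk '{print $1}' > duplicates.cat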
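The script above stops at writing duplicates.cat; the deletion step mentioned in the description is not shown. A minimal sketch of that step might look like the following. It assumes duplicates.cat holds one cropped filename per line, that each original is named by dropping the `_cropped` suffix, and that filenames contain no whitespace. The helper name `delete_duplicates` and its dry-run argument are hypothetical and not part of the original gist.

```shell
#!/bin/sh
# Hypothetical helper: remove the originals (and their crops) named in a list
# of cropped filenames. Defaults to a dry run; pass "no" as the second
# argument to actually delete.
delete_duplicates() {
    list=$1
    dry_run=${2:-yes}
    # sort -u collapses names that appear more than once in the list
    sort -u "$list" | while IFS= read -r cropped; do
        case "$cropped" in
            badimage_cropped.jpg) continue ;;  # keep the reference black image
            *_cropped.jpg) ;;                  # expected naming pattern
            *) continue ;;                     # ignore anything unexpected
        esac
        # Assumption: the original is the cropped name without "_cropped"
        original="${cropped%_cropped.jpg}.jpg"
        if [ "$dry_run" = "no" ]; then
            rm -f -- "$original" "$cropped"
        else
            echo "would remove: $original $cropped"
        fi
    done
}
```

Running `delete_duplicates duplicates.cat` first prints what would be removed; `delete_duplicates duplicates.cat no` performs the deletion. Because uniq -D reports every repeated signature, not only the black one, it is worth checking the dry-run output before deleting.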