Last active
December 29, 2015 01:09
Remove duplicate images (i.e., the black "file not found" image) from a directory of SDSS images. Crop the central region (which should be pure black), use ImageMagick's identify to locate duplicates, and then delete them.
#!/bin/sh

# Crop a 100x100 region from the center of each image; for the "file not found" image this region is pure black
for img in *.jpg; do
    filename=${img%.*}
    newfilename=${img%.*}_cropped
    convert "$filename.jpg" -crop 100x100+162+162 "$newfilename.jpg"
done

# Probably needs to be run on subsets of images if using the full GZ set, due to memory limitations
# Include at least one black image (badimage_cropped.jpg) so the subset has a comparison
# Takes 3m11s for [6-9]*jpg (40K images)
# Print each filename with its signature hash (%#), group identical hashes, and save the duplicated filenames
identify -quiet -format "%i %#\n" *cropped.jpg badimage_cropped.jpg | sort -k2 | uniq -D --skip-fields=1 | awk '{print $1}' > duplicates.cat
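
The description mentions deleting the duplicates, but the script stops once duplicates.cat is written. One way the deletion step might look is sketched below. The function name `remove_duplicates` is hypothetical and not part of the original gist. The sketch assumes duplicates.cat holds one `*_cropped.jpg` filename per line, and that every entry really is a black placeholder. That second assumption matters because `uniq -D` lists any group of identical crops, not only the black ones, so the catalogue is worth inspecting before running this.

```shell
#!/bin/sh
# Hypothetical helper (not in the original gist): delete each cropped file
# listed in a duplicates catalogue, together with the original image it
# was cropped from. The reference image badimage_cropped.jpg is kept.
remove_duplicates() {
    catalog=$1
    while IFS= read -r cropped; do
        # Only act on names that follow the <name>_cropped.jpg convention
        case $cropped in
            *_cropped.jpg) ;;
            *) continue ;;
        esac
        # Keep the reference black image so later subsets can still use it
        if [ "${cropped##*/}" = "badimage_cropped.jpg" ]; then
            continue
        fi
        # Recover the original filename: foo_cropped.jpg -> foo.jpg
        original="${cropped%_cropped.jpg}.jpg"
        rm -f -- "$cropped" "$original"
    done < "$catalog"
}
```

It would be called as `remove_duplicates duplicates.cat` from the directory that holds the images.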