Gist `tednaleid/c4ed5ec8e047f1adb13c1dd333bbb2e2`, created March 24, 2017 05:29.
Shell commands for finding and deleting duplicate files based on the md5sum of each file.
# RUN AT YOUR OWN RISK: UNDERSTAND THE COMMANDS AND DO NOT RUN THEM BLINDLY
These commands find all the duplicate photos in a `raw_photos` directory.
Create an md5sum of every file (`gmd5sum` is GNU `md5sum` as installed on macOS by `brew install coreutils`):

```shell
find raw_photos -type f -exec gmd5sum "{}" + > files.md5
```
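Homebrew installs the GNU tools with a `g` prefix, so the command is `gmd5sum` on macOS but plain `md5sum` on most Linux systems. A minimal sketch, assuming one of the two names is on `PATH`, that picks whichever is available; the `MD5SUM` variable and the `hash_tree` function are hypothetical names, not part of the original commands:

```shell
# Pick GNU md5sum under whichever name it is installed as:
# "gmd5sum" on macOS (Homebrew coreutils), "md5sum" on most Linux systems.
if command -v gmd5sum >/dev/null 2>&1; then
  MD5SUM=gmd5sum
else
  MD5SUM=md5sum
fi

# hash_tree DIR: print one "hash  path" line per regular file under DIR.
hash_tree() {
  find "$1" -type f -exec "$MD5SUM" {} +
}
```

Usage: `hash_tree raw_photos > files.md5` produces the same `files.md5` as the command above.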
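Then find the hashes (the first field) that appear more than once and collect every file that has one of those hashes, for spot-checking and comparison:

```shell
# For each hash that occurs more than once, print all of its lines.
# Anchoring the pattern with ^ keeps grep from matching a hash inside a file name.
for MD5 in $(awk '{print $1}' files.md5 | sort | uniq -d); do
  grep "^$MD5" files.md5
done > dupes_and_original.txt
```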
This creates a file like:

```
$ cat dupes_and_original.txt
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-1.JPG
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-2.JPG
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-3.JPG
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-4.JPG
... next duplicate md5 & file
```
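An accidental MD5 collision between two different photos is extremely unlikely, but the spot-checking and comparison step mentioned above can be partly automated. A minimal sketch, assuming lines of the form `hash  path` grouped by hash (as the loop writes them) and paths without newlines or trailing spaces, that compares each file byte-for-byte with the first file of its group using `cmp -s`; `verify_dupes` is a hypothetical helper name:

```shell
# verify_dupes LIST: for each "hash  path" line in LIST (grouped by hash),
# compare the file byte-for-byte with the first file listed for that hash.
# Prints "MISMATCH: path" to stderr for any file that differs and returns 1.
verify_dupes() {
  rc=0
  prev=
  first=
  while read -r md5 file; do
    if [ "$md5" != "$prev" ]; then
      # New group: its first file is the reference copy.
      prev=$md5
      first=$file
    elif ! cmp -s "$first" "$file"; then
      echo "MISMATCH: $file" >&2
      rc=1
    fi
  done < "$1"
  return "$rc"
}
```

Usage: `verify_dupes dupes_and_original.txt` prints nothing and succeeds when every group really is identical.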
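Then make a file that lists only the 2nd through nth copies of each file, so the first copy is kept as the original:

```shell
# Same loop, but tail -n +2 drops the first line for each hash (the copy to keep).
for MD5 in $(awk '{print $1}' files.md5 | sort | uniq -d); do
  grep "^$MD5" files.md5 | tail -n +2
done > dupes.txt
```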
This creates a file like the one above, minus the first row for each hash:

```
$ cat dupes.txt
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-2.JPG
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-3.JPG
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-4.JPG
```
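The loop above scans `files.md5` once per duplicated hash, which gets slow on a large library. A single `awk` pass selects the same lines: `seen[$1]++` evaluates to 0 (false) the first time a hash appears and to a positive count afterwards, so only the 2nd through nth occurrences are printed. A minimal sketch; `dupes_only` is a hypothetical helper name, and the lines come out in `files.md5` order rather than grouped by hash:

```shell
# dupes_only FILE: print every line whose hash (field 1) already appeared
# earlier in FILE, i.e. all copies except the first one of each hash.
dupes_only() {
  awk 'seen[$1]++' "$1"
}
```

Usage: `dupes_only files.md5 > dupes.txt`.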
Then move all of the duplicate files into a `dupes` directory.

Caution: files are moved by name only, so if two duplicates share a file name, the later `mv` overwrites the earlier one inside `dupes` and that copy is effectively deleted. An exercise left for the reader would be to move each file into a subdirectory that mirrors its original path.

```shell
mkdir dupes
# Drop the hash column, then move each listed file into dupes/.
cut -d' ' -f2- dupes.txt | xargs -I{} mv "{}" dupes
```
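One possible answer to the exercise of preserving the original paths, sketched under assumptions: paths in `dupes.txt` are relative (as `find raw_photos` produces) and contain no newlines, and each line starts with the 32 hex characters of the hash followed by spaces. Each duplicate moves to `dupes/<original path>`, so same-named files from different directories no longer collide, and `mv -n` refuses to overwrite anything already there. `move_dupes_mirrored` is a hypothetical helper name:

```shell
# move_dupes_mirrored LIST DEST: move each file named in LIST ("hash  path" lines)
# to DEST/path, creating intermediate directories as needed.
move_dupes_mirrored() {
  # Strip the 32-character hash and the separating spaces to recover the path.
  sed -E 's/^[0-9a-f]{32} +//' "$1" |
  while IFS= read -r file; do
    mkdir -p "$2/$(dirname "$file")"
    mv -n "$file" "$2/$file"
  done
}
```

Usage: `move_dupes_mirrored dupes.txt dupes`, after which `raw_photos/2008/Q4/IMG_0060-2.JPG` ends up at `dupes/raw_photos/2008/Q4/IMG_0060-2.JPG`.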