@tednaleid
Created March 24, 2017 05:29
Shell commands for finding and deleting duplicate files, based on the md5sum of each file
# RUN AT YOUR OWN RISK, UNDERSTAND THE COMMANDS AND DO NOT RUN BLINDLY
This will find all the duplicate photos in a `raw_photos` directory.
Create an md5sum of every file (on macOS, after `brew install coreutils`, which provides `gmd5sum`):
find raw_photos -type f -exec gmd5sum "{}" + > files.md5
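The command above assumes macOS with Homebrew's coreutils, where GNU `md5sum` is installed under the name `gmd5sum`. As a minimal sketch, the same step could be wrapped so it also runs on Linux, where the tool is plain `md5sum`. The function name `checksum_tree` is hypothetical and not part of the original gist.

```shell
# Hypothetical helper: checksum every regular file under a directory.
# Uses gmd5sum (Homebrew coreutils) when present, otherwise md5sum.
checksum_tree() {
  dir=$1
  if command -v gmd5sum >/dev/null 2>&1; then
    tool=gmd5sum
  else
    tool=md5sum
  fi
  # "-exec ... {} +" passes many paths to each invocation of the tool
  find "$dir" -type f -exec "$tool" {} +
}

# Usage, equivalent to the command above:
#   checksum_tree raw_photos > files.md5
```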
Then sort the checksums to find every md5sum (the first field) that is repeated, and collect the matching files for spot-checking and comparison:
for MD5 in $(awk '{print $1}' files.md5 | sort | uniq -d); do
  # anchor to the start of the line so the hash cannot match inside a file name
  grep "^$MD5" files.md5
done > dupes_and_original.txt
This creates a file like:
cat dupes_and_original.txt
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-1.JPG
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-2.JPG
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-3.JPG
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-4.JPG
... next duplicate md5 & file
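The loop above runs `grep` over `files.md5` once per duplicated hash, which can be slow for a large photo library. A possible alternative, sketched here with a hypothetical function name `list_dupe_groups`, lets `awk` read the file twice: the first pass counts each hash and the second prints the lines whose hash was counted more than once. The trailing `sort` groups identical hashes together, so lines within a group come out ordered by path rather than in `find` order.

```shell
# Hypothetical helper: print every line of a checksum file whose hash
# (first field) occurs more than once, grouped by hash.
list_dupe_groups() {
  # NR == FNR is true only while awk reads the first copy of the file
  awk 'NR == FNR { count[$1]++; next } count[$1] > 1' "$1" "$1" | sort
}

# Usage:
#   list_dupe_groups files.md5 > dupes_and_original.txt
```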
Then you can make a file that only has the 2nd through nth duplicates:
for MD5 in $(awk '{print $1}' files.md5 | sort | uniq -d); do
  # tail -n +2 skips the first match, which is the copy that stays in place
  grep "^$MD5" files.md5 | tail -n +2
done > dupes.txt
This creates a file like the one above, minus the first row of each group:
cat dupes.txt
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-2.JPG
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-3.JPG
5abc595dae9b1a9c1dead42b4e55b480 raw_photos/2008/Q4/IMG_0060-4.JPG
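The same "all but the first copy" list can also be sketched as a single `awk` pass. The function name `list_extra_copies` is hypothetical. As with the loop above, which copy counts as the "original" is simply whichever one appears first in `files.md5`, so the output is worth spot-checking before anything is moved.

```shell
# Hypothetical helper: print all but the first line seen for each hash.
# seen[$1]++ evaluates to 0 (false) the first time a hash appears and to
# a positive number (true) on every later appearance, so only repeats print.
list_extra_copies() {
  awk 'seen[$1]++' "$1"
}

# Usage:
#   list_extra_copies files.md5 > dupes.txt
```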
Then move all of the duplicate files into a `dupes` directory.
Caution: files are moved by name only, so if several duplicates share a file name, each move overwrites the previous one and data is lost.
Moving each file into a subdirectory that mirrors its original path is left as an exercise for the reader; the simple version is:
mkdir dupes
cat dupes.txt | cut -d' ' -f2- | xargs -I{} mv "{}" dupes
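One way to approach the exercise mentioned above is sketched below. The function name `move_dupes` is hypothetical, and the sketch assumes that `dupes.txt` uses the standard `md5sum` line format (a 32-character hash, two separator characters, then the path), that paths are relative, and that no path contains a newline or backslash. Each file is moved to the same relative path under the target directory, so files with the same name in different folders no longer collide, and `mv -n` refuses to overwrite anything that is already there.

```shell
# Hypothetical helper: move each file listed in a dupes file under a target
# directory, recreating its original relative path.
move_dupes() {
  list=$1
  dest=$2
  # Strip the hash and the two separator characters, leaving only the path.
  sed 's/^[0-9a-f]\{32\} [ *]//' "$list" | while IFS= read -r path; do
    mkdir -p "$dest/$(dirname "$path")"
    # -n: never overwrite an existing file at the destination
    mv -n "$path" "$dest/$path"
  done
}

# Usage:
#   move_dupes dupes.txt dupes
```

Reading the list with `while IFS= read -r` instead of `xargs` keeps spaces and quote characters in file names intact.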