
@SavSanta
Forked from OndraZizka/find-duplicate-files.bash
Created September 5, 2021 19:10
Finds likely duplicate files by hashing only the first 1 MB of each file larger than 3 MB. A faster alternative to `fdupes -r -S .`
# Ensure the hash list exists so the greps below don't complain on the first run.
touch md5-partial.txt

find . -type f -size +3M -print0 | while IFS= read -r -d '' i; do
    echo -n '.'
    # Skip files we have already hashed.
    if grep -qF "$i" md5-partial.txt; then
        echo -n ':'
        continue
    fi
    # Hash only the first 1 MB -- fast, but prefix collisions are possible.
    MD5=$(dd bs=1M count=1 if="$i" status=none | md5sum | cut -d' ' -f1)
    if grep -q "$MD5" md5-partial.txt; then echo -e "\nDuplicate: $i"; fi
    echo "$MD5 $i" >> md5-partial.txt
done
## Show the duplicates
#sort md5-partial.txt | uniq --check-chars=32 -d -c
#sort md5-partial.txt | uniq --check-chars=32 -d -c | sort -b -n
#sort md5-partial.txt | uniq --check-chars=32 -d -c | sort -b -n | cut -c 9-40 | xargs -I '{}' sh -c "grep '{}' md5-partial.txt && echo"
## Show wasted space
if false; then
    sort md5-partial.txt | uniq --check-chars=32 -d -c | while IFS= read -r LINE; do
        HASH=$(echo "$LINE" | cut -c 9-40)
        # Don't call this variable PATH -- that would shadow the shell's
        # command search path and break every command that follows.
        FILE=$(echo "$LINE" | cut -c 41-)
        # Print the file size in bytes (fixed-column cuts of `ls -l` are fragile).
        stat -c %s "$FILE"
    done
fi
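Two files that share their first 1 MB are not guaranteed to be identical, so it is worth re-hashing every candidate group in full before acting on the report. A minimal sketch, assuming `md5-partial.txt` holds `<32-char md5> <path>` lines as written above (GNU `uniq`):

```shell
# Re-check partial-hash candidates with a full-content hash. `-D` prints
# every member of each group whose first 32 characters (the hash) repeat;
# the path starts at column 34 (32 hash chars plus one space).
sort md5-partial.txt \
  | uniq --check-chars=32 -D \
  | cut -c 34- \
  | while IFS= read -r f; do md5sum "$f"; done \
  | sort \
  | uniq --check-chars=32 -D
```

Only groups that survive this second pass are true byte-for-byte duplicates (up to MD5 collisions).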