crodas/duplicate.sh Secret

Created May 28, 2013
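#!/bin/bash
# Usage: duplicate.sh <files_dir>
# Hashes every file under <files_dir> and prints an rm command for each file
# whose SHA-1 occurs more than once. The commands are printed, not executed.
files_dir=$1
if [[ -z "$files_dir" ]]; then
    echo "Error: files dir is undefined" >&2
    exit 1
fi
# One line per file: "<sha1>  <path>"
find "$files_dir" -type f -exec sha1sum {} \; > /tmp/list.txt
count=-1
total=0
# uniq -c emits "<count> <hash>" pairs, most frequent first; word splitting
# hands them to the loop alternately as count, then hash.
for l in $(awk '{print $1}' /tmp/list.txt | sort | uniq -c | sort -nr)
do
    if [[ $count == -1 ]]
    then
        count=$l
    else
        hash=$l
        # A count of 1 means no duplicates; the list is sorted, so stop here.
        if [[ $count == 1 ]]
        then
            break
        fi
        # NOTE: every file in the group is listed, the first copy included.
        # cut -c43- skips the 40-character hash and the 2-character separator,
        # and reading whole lines keeps paths that contain spaces intact.
        grep "^$hash" /tmp/list.txt | cut -c43- | while IFS= read -r f
        do
            printf 'rm %q\n' "$f"
        done
        total=$((total + count))
        count=-1
    fi
done
echo "Deleted $total files"
echo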
@mkoistinen mkoistinen commented Sep 9, 2014

Are you sure this doesn't delete all files that /have/ duplicates, including the "original" itself? It looks to me as though the only files safe from deletion are those that have no duplicates.

Also, here's a version that works as it should and runs on systems that have openssl but not sha1sum available (i.e., Mac OS X).

#!/bin/bash

files_dir=$1
if [[ -z "$files_dir" ]]; then
    echo "Error: files dir is undefined"
    exit 1
fi

find "$files_dir" -type f -exec openssl sha1 {} \; > /tmp/list.txt


count=-1
total=0
for l in `cat /tmp/list.txt | sed 's/SHA1(\(.*\))\= \(.*\)$/\2 \1/' | awk '{print $1}' | sort | uniq -c | sort -nr`
do
    if [[ $count == -1 ]]
    then
        count=$l
    else 
        hash=$l
        if [[ $count == 1 ]]
        then
            break
        fi
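        # Each matching line looks like "SHA1(<path>)= <hash>"; sed keeps only
        # the path, and reading whole lines preserves paths with spaces.
        # Process substitution keeps count and total in the current shell.
        while IFS= read -r f
        do
            # Keep one copy: stop once a single file of the group is left.
            if [[ $count -gt 1 ]]
            then
                echo "removing: $f"
                printf 'rm %q\n' "$f"
                count=$((count-1))
                total=$((total+1))
            fi
        done < <(grep "$hash\$" /tmp/list.txt | sed 's/^SHA1(\(.*\))= [0-9a-f]*$/\1/')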
        count=-1
    fi
done

echo "Deleted $total files"
echo
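
For comparison, here is a minimal sketch of the same "keep one copy per hash" idea that does not depend on the output format of any single hashing tool. It is not either author's method: `hash_file` and `list_duplicates` are hypothetical helper names, the choice of which copy to keep (the first path in byte-wise sort order) is an assumption, and paths containing newlines are not handled. Like the scripts above, it only prints paths and deletes nothing.

```shell
#!/bin/bash
# hash_file (hypothetical helper): print the SHA-1 hex digest of one file,
# using whichever of sha1sum, shasum, or openssl is installed.
hash_file() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum < "$1" | awk '{print $1}'
    elif command -v shasum >/dev/null 2>&1; then
        shasum -a 1 < "$1" | awk '{print $1}'
    else
        # openssl prints "...(stdin)= <hex>"; the digest is the last field.
        openssl sha1 < "$1" | awk '{print $NF}'
    fi
}

# list_duplicates (hypothetical helper): print every file under a directory
# whose content matches an earlier file, one path per line.
list_duplicates() {
    local dir=$1 seen_file f h
    seen_file=$(mktemp) || return 1
    # Sort the paths so that the copy that is kept is deterministic.
    find "$dir" -type f -print | LC_ALL=C sort | while IFS= read -r f
    do
        h=$(hash_file "$f")
        if grep -qxF "$h" "$seen_file"; then
            # Hash already recorded: this file duplicates an earlier one.
            printf '%s\n' "$f"
        else
            # First file with this hash: remember it and keep the file.
            printf '%s\n' "$h" >> "$seen_file"
        fi
    done
    rm -f "$seen_file"
}
```

Calling `list_duplicates "$files_dir"` prints the candidates for removal; reviewing that list before passing anything to `rm` seems safer than deleting in the same pass. The hashes are kept in a temporary file rather than an associative array so that the sketch also runs on the older bash that ships with Mac OS X.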