@mcgrew
Created January 19, 2011 19:57
Removes duplicate files in a directory, replacing them with hard links.
#!/bin/sh
# Directory to scan; defaults to the current directory.
dir="${1:-.}"
# create a temporary file name
TMPFILE="/tmp/$(printf '%s' "$dir" | md5sum | awk '{print $1}')_$(date +%Y.%m.%d.%H.%M)_dupes.txt"
#find the duplicates and output the list to the temporary file
echo "Looking for duplicates..."
nice fdupes -qr1 "$dir" | grep -Ei '\.(mzxml|mzdata|mzml|csv|cdf|dlt|sig|txt)[[:space:]]*$' | tee "$TMPFILE"
# overwrite duplicate files with hard links.
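# Each line is one set of duplicates, with the file names separated by spaces.
# read is left without -r: if fdupes escapes spaces with backslashes, read strips them.
# Names containing spaces are rebuilt by appending words until the path exists.
while read dupe; do
    firstfile=""
    nextdupe=""
    for i in $dupe; do
        if [ -z "$firstfile" ]; then
            firstfile="$i"
        elif [ ! -f "$firstfile" ]; then
            firstfile="$firstfile $i"
        elif [ -z "$nextdupe" ]; then
            nextdupe="$i"
        elif [ ! -f "$nextdupe" ]; then
            nextdupe="$nextdupe $i"
        fi
        if [ -f "$firstfile" ] && [ -f "$nextdupe" ]; then
            printf '\033[01;32mLinking %s to %s\033[0m\n' "$nextdupe" "$firstfile"
            ln -f "$firstfile" "$nextdupe"
            nextdupe=""
        fi
    done
done < "$TMPFILE"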
# rm "$TMPFILE"
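
To check what the `ln -f` step does, here is a minimal sketch that creates two identical files in a scratch directory, replaces one with a hard link, and compares inode numbers before and after. The file names and the `inode_of` helper are made up for illustration and are not part of the script above. The sketch also assumes `mktemp -d` is available.

```shell
#!/bin/sh
# Sketch only: throwaway files in a scratch directory (names are illustrative).
workdir=$(mktemp -d)
printf 'same contents\n' > "$workdir/original.txt"
printf 'same contents\n' > "$workdir/copy.txt"

# inode_of is a hypothetical helper: print the inode number of one file.
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

before_a=$(inode_of "$workdir/original.txt")
before_b=$(inode_of "$workdir/copy.txt")

# Same call the script makes: overwrite the duplicate with a hard link.
ln -f "$workdir/original.txt" "$workdir/copy.txt"

after_a=$(inode_of "$workdir/original.txt")
after_b=$(inode_of "$workdir/copy.txt")

echo "before: $before_a $before_b"
echo "after:  $after_a $after_b"

# Remove the scratch directory.
rm -rf "$workdir"
```

The two inode numbers differ before the link and match afterwards, so the data is stored only once. Hard links cannot span filesystems, so `ln -f` is expected to fail for duplicates that live on different mounts.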