@iandexter
Created January 5, 2010 16:07
Find duplicate files
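#!/usr/bin/perl
# Find duplicate files under a directory (default: current directory).
# Files are grouped by size first, so only same-size files get hashed.
use strict;
use warnings;
use File::Find;
use Digest::MD5;

my %files;    # size in bytes => list of paths with that size
find(\&check_file, $ARGV[0] || ".");

local $" = ", ";    # separator used when an array is interpolated in a string
foreach my $size (sort { $b <=> $a } keys %files) {
    next unless @{ $files{$size} } > 1;
    my %md5;    # raw MD5 digest => list of paths with that digest
    foreach my $file (@{ $files{$size} }) {
        open(my $fh, '<', $file) or next;    # skip files that cannot be read
        binmode($fh);
        push @{ $md5{ Digest::MD5->new->addfile($fh)->digest } }, $file;
        close($fh);
    }
    foreach my $hash (keys %md5) {
        next unless @{ $md5{$hash} } > 1;
        print "$size @{$md5{$hash}}\n";
    }
}

# File::Find callback: record each regular file under its size.
sub check_file {
    -f && push @{ $files{ (stat(_))[7] } }, $File::Find::name;
}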
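
# Shell alternative (separate script): list files with identical MD5 checksums,
# one group per line. Requires GNU md5sum and GNU uniq (-w and -D are GNU options).
find "$@" -type f -exec md5sum {} \; | \
sort | uniq -w 32 -D | \
awk 'NF { h = substr($0, 1, 32); a[h] = (h in a) ? a[h] FS substr($0, 35) : $0 }
END { for (h in a) print a[h] }'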