@bunnymatic
Created November 30, 2013 20:00
Find duplicate files by MD5 with Ruby
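#!/usr/bin/env ruby
# Find duplicate files by comparing MD5 digests.
# Pass a quoted glob pattern as the first argument, e.g. '**/*'.
require 'digest/md5'

abort "Usage: #{$PROGRAM_NAME} 'GLOB_PATTERN'" if ARGV.empty?

files = Hash.new { |hash, key| hash[key] = [] } # md5 => [{:file, :stat}, ...]
ctr = 0
print "Searching "
Dir.glob(ARGV[0]).each do |f|
  s = File.stat(f)
  next unless s.file?
  # Hash in-process rather than shelling out to the BSD/macOS `md5 -q`,
  # which breaks on file names containing spaces or shell metacharacters.
  md5 = Digest::MD5.file(f).hexdigest
  files[md5] << { :file => f, :stat => s }
  ctr += 1
  print '.' if (ctr % 100) == 0 # one progress dot per 100 files hashed
end
print "Done\n"
duplicates = files.select { |_md5, entries| entries.length > 1 }
# Report the number of files hashed, not the number of distinct digests.
puts "Searched #{ctr} files"
puts "Found #{duplicates.length} dups"
duplicates.each do |md5, finfo|
  puts "[#{md5}]"
  # Shortest path first within each group of identical files.
  finfo.sort_by { |f| f[:file].length }.each do |f|
    puts " #{f[:file]}"
  end
end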
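
The script above hashes every file it visits. A possible variation, sketched here as an assumption and not part of the original gist, groups files by size first and only hashes files whose size is shared with at least one other file, since files of different sizes cannot be identical. The method name `find_duplicates` is hypothetical.

```ruby
require 'digest/md5'

# Hypothetical helper (not in the original script): returns a hash of
# md5 => [paths], keeping only digests shared by two or more files.
def find_duplicates(paths)
  regular = paths.select { |path| File.file?(path) }

  # Files with a unique size cannot have a duplicate, so skip hashing them.
  candidates = regular.group_by { |path| File.size(path) }
                      .values
                      .select { |group| group.length > 1 }
                      .flatten

  candidates.group_by { |path| Digest::MD5.file(path).hexdigest }
            .select { |_md5, group| group.length > 1 }
end

# Example usage: print duplicate groups for a glob given on the command line.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  find_duplicates(Dir.glob(ARGV[0])).each do |md5, group|
    puts "[#{md5}]"
    group.sort_by(&:length).each { |path| puts " #{path}" }
  end
end
```

Whether the size pre-filter pays off depends on the data: it helps most when files are large and few of them share a size.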