@urubatan
Created May 24, 2012 14:42
Ruby script to remove duplicated files. I created it when migrating my picture collection from iPhoto to Picasa: merging some independent collections created a real mess, and this gist is the result of cleaning up that mess.
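require 'digest/sha1'
require 'fileutils'

# Directories to scan for duplicates (replace with real paths).
directories = [
  "SOURCE DIR 1",
  "SOURCE DIR 2"
]

# Maps each SHA1 digest to the list of file paths that share that content.
files = Hash.new { |hash, key| hash[key] = [] }

directories.each do |dir_name|
  puts "Scanning Directory: #{dir_name}"
  # Note: the "*.*" pattern only matches names that contain a dot.
  Dir.glob("#{dir_name}/**/*.*") do |file_name|
    next if File.directory?(file_name)
    print "."
    # Digest::SHA1.file streams the file instead of reading it all into memory.
    files[Digest::SHA1.file(file_name).hexdigest] << file_name
  end
  puts ""
end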
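
# Count every scanned file, then pick the digests that occur more than once.
total_files = files.inject(0) { |acc, (_digest, paths)| acc + paths.size }
with_copies = files.select { |_digest, paths| paths.length > 1 }
puts "#{files.size} different files"
puts "#{with_copies.size} files with copies"
# Every file beyond the first one per digest is a duplicate.
puts "#{total_files - files.size} duplicates"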
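
FileUtils.mkdir_p "CopiesTrash"
with_copies.each do |_digest, paths|
  # Keep the last path found as the original; move the others.
  orig = paths.pop
  puts "moving #{paths.length} duplicate(s) of #{orig} to CopiesTrash"
  # With :force, files that share a basename overwrite each other in CopiesTrash.
  FileUtils.mv paths, "CopiesTrash", :force => true
  puts ""
end
puts "Duplicated files have been removed from your directories; all the copies are in the CopiesTrash folder"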
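
The grouping step of the script can also be written as two small methods, which makes it possible to preview what would be moved before touching any file. This is only a sketch under assumptions, not part of the original gist: `group_by_digest` and `copies_to_move` are hypothetical names, and `Find.find` is swapped in for `Dir.glob`, so unlike the script it also picks up hidden files and files without an extension.

```ruby
require 'digest/sha1'
require 'find'

# Hypothetical helper: group every regular file under the given
# directories by the SHA1 digest of its content.
def group_by_digest(directories)
  groups = Hash.new { |hash, key| hash[key] = [] }
  directories.each do |dir_name|
    # Find.find walks the tree recursively, including extension-less files.
    Find.find(dir_name) do |path|
      next unless File.file?(path)
      groups[Digest::SHA1.file(path).hexdigest] << path
    end
  end
  groups
end

# Hypothetical helper: list the paths that would be moved, keeping the
# last path of each group as the original (the same rule the script uses).
def copies_to_move(groups)
  groups.values
        .select { |paths| paths.length > 1 }
        .flat_map { |paths| paths[0...-1] }
end
```

Printing `copies_to_move(group_by_digest(["SOURCE DIR 1", "SOURCE DIR 2"]))` gives a dry run of the cleanup; nothing is moved until the list is passed to `FileUtils.mv`.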
@wilsonfoz

I think the right code in line 25 might be: puts "#{total_files - files.size} duplicates"
Very useful code. It helped me a lot. Thanks!