@costa
Forked from ma11hew28/find-duplicate-files.rb
Last active March 8, 2017 18:09
#!/usr/bin/env ruby
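# NOTE scans the current directory recursively for files with identical content (by MD5 digest)
# NOTE the first argument is a command template; every {} in it is replaced by the
#      shell-escaped path of a duplicate copy. The default only echoes the copies;
#      run with 'rm {}' to remove them (while taking necessary caution)
exec_command = ARGV[0] || 'echo {}'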
require 'digest/md5'
require 'shellwords'
filenames_by_md5 = {}
Dir.glob("**/*").each do |filename| # NOTE dotfiles are ignored
  next if File.directory?(filename) # NOTE directories are ignored
  next unless File.size?(filename) # NOTE empty files are ignored
  md5 = Digest::MD5.file(filename).hexdigest
  (filenames_by_md5[md5] ||= []).push filename
end
filenames_by_md5.each_value do |filenames|
  if filenames.length > 1
    # NOTE the first path in sorted order is treated as the original; with paths
    #      starting with YYYY/MM/DD.. the oldest sorts first (e.g. think Aperture lib)
    filenames.sort!
    $stderr.puts "ATTENTION: #{Shellwords.escape filenames[0]} (original): number of copies: #{filenames.length - 1}"
    basename = File.basename filenames[0]
    filenames[1..-1].each do |filename|
      if basename != File.basename(filename)
        $stderr.puts "WARNING: #{Shellwords.escape filename} (copy) has a file name different from the original (#{Shellwords.escape filenames[0]}), skipping"
        next
      end
      # NOTE block form of gsub: a replacement *string* would interpret the backslashes
      #      added by Shellwords.escape (e.g. \' or \&) as back-references
      command = exec_command.gsub('{}') { Shellwords.escape(filename) }
      $stderr.puts "ERROR: command failed: #{command}" unless system command
    end
  end
end
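
The grouping step of the script can be tried on its own, without running any command on the files. The following is a minimal sketch, not part of the original gist: the helper name `duplicate_groups` and its `root` parameter are assumptions made for illustration, and the sample file names are made up. It applies the same rules as the script (directories, dotfiles and empty files are skipped) and returns each group sorted, so the first path is the one the script would treat as the original.

```ruby
require 'digest/md5'
require 'tmpdir'

# Hypothetical helper (not in the original script): returns arrays of paths
# under +root+ whose contents share an MD5 digest, each array sorted.
def duplicate_groups(root)
  by_md5 = Hash.new { |hash, key| hash[key] = [] }
  Dir.glob(File.join(root, '**', '*')).each do |path|
    next if File.directory?(path) # directories are ignored
    next unless File.size?(path)  # empty files are ignored
    by_md5[Digest::MD5.file(path).hexdigest] << path
  end
  by_md5.values.select { |paths| paths.length > 1 }.map(&:sort)
end

# Try it on a throwaway directory tree with made-up file names.
Dir.mktmpdir do |dir|
  Dir.mkdir(File.join(dir, '2016'))
  Dir.mkdir(File.join(dir, '2017'))
  File.write(File.join(dir, '2016', 'a.jpg'), 'same bytes')
  File.write(File.join(dir, '2017', 'a.jpg'), 'same bytes')
  File.write(File.join(dir, '2017', 'b.jpg'), 'other bytes')
  duplicate_groups(dir).each { |group| puts group.join(' == ') }
end
```

Running a dry pass like this before passing 'rm {}' to the script is one way to check which files would be treated as copies.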