
@ma11hew28
Last active January 25, 2022 18:48
Ruby script that finds identical (md5) files in all subdirectories (recursive)
# This Ruby script (regardless of where it's located on the file system)
# recursively lists all duplicate files in the directory in which it's executed.
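require 'digest/md5'

# Map each MD5 hex digest to the list of files with that content.
hash = Hash.new { |h, k| h[k] = [] }

# FNM_DOTMATCH makes '**/*' also match hidden files and directories.
Dir.glob('**/*', File::FNM_DOTMATCH).each do |f|
  next unless File.file?(f) # skip directories, sockets, broken symlinks, etc.
  # Digest::MD5.file streams the file in binary mode instead of loading it
  # all into memory (and, unlike IO.read, never treats "|..." as a command).
  hash[Digest::MD5.file(f).hexdigest] << f
end

# Print every group of two or more files that share a digest.
hash.each_value do |files|
  next if files.length == 1
  puts '=== Identical Files ==='
  files.each { |f| puts "\t" + f }
end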
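A sketch of the same approach wrapped in a reusable method, for when you want the groups returned rather than printed. `find_duplicate_files` is a hypothetical name, and the size check is an added optimization that is not in the original script: files of different sizes cannot be identical, so only files that share a size get hashed. It walks the tree with Ruby's standard `find` library instead of `Dir.glob`.

```ruby
require 'digest/md5'
require 'find'

# Hypothetical helper (not part of the original gist): returns an array of
# groups, each group being the paths under +root+ that have identical content.
def find_duplicate_files(root = '.')
  # Files of different sizes can never be identical, so bucket by size first.
  by_size = Hash.new { |h, k| h[k] = [] }
  Find.find(root) do |path|
    by_size[File.size(path)] << path if File.file?(path)
  end

  # Only hash files whose size is shared with at least one other file.
  by_digest = Hash.new { |h, k| h[k] = [] }
  by_size.each_value do |paths|
    next if paths.length < 2
    paths.each { |p| by_digest[Digest::MD5.file(p).hexdigest] << p }
  end

  # Keep only digests shared by two or more files; sort paths for stable output.
  by_digest.values.select { |paths| paths.length > 1 }.map(&:sort)
end
```

Printing each returned group of `find_duplicate_files('.')` under the same `=== Identical Files ===` banner gives output similar to the original script, except that paths carry a leading `./` and group order may differ.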
@rebelwarrior

pretty cool idea to use the md5 hash for checks.

@costa

costa commented Aug 13, 2014

@milothiesen

@KarlKemp

As a one-liner, using a few new things:

require 'pathname'; require 'digest/md5'; files = Pathname(".").glob("**/*").reject(&:directory?); d = files.map.with_index {|f, i| puts "%i / %i" % [i, files.size] if i % 1000 == 12; [f, Digest::MD5.file(f).hexdigest] }.group_by(&:last).select {|a,v| v.size > 1 }
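A more readable, multi-line sketch of the same pipeline, without the progress output and wrapped in a hypothetical `duplicates_by_digest` method. It assumes Ruby 2.5 or later (for `Pathname#glob`) and groups on the hex string from `hexdigest`, since strings compare by value and so equal contents end up under the same key.

```ruby
require 'pathname'
require 'digest/md5'

# Hypothetical helper: returns a Hash mapping each MD5 hex digest that is
# shared by more than one file under +root+ to the Pathnames with that digest.
def duplicates_by_digest(root = '.')
  files = Pathname(root).glob('**/*').reject(&:directory?)
  # Pair each file with its hex digest: [[path, digest], ...]
  pairs = files.map { |f| [f, Digest::MD5.file(f).hexdigest] }
  # digest => [path, ...]
  groups = pairs.group_by(&:last).transform_values { |ps| ps.map(&:first) }
  # Keep only digests shared by more than one file.
  groups.select { |_digest, paths| paths.size > 1 }
end
```

For example, `duplicates_by_digest.each_value { |paths| puts paths }` prints each group of duplicate paths found under the current directory.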
