@sega
Last active December 1, 2020 14:57
Remove duplicate files by content
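#!/usr/bin/ruby
require 'digest'
require 'fileutils'

# Every regular file under the configured directories, searched recursively.
# uniq guards against overlapping directories listing the same path twice,
# which would otherwise make a file look like its own duplicate.
def paths
  directories.flat_map { |d| Dir.glob("#{d}/**/*") }.uniq.select { |p| File.file?(p) }
end

# Directories to scan. Empty by default; list the paths to deduplicate here.
def directories
  []
end

t1 = Time.now

# Group files by the SHA-512 digest of their contents.
by_checksum = paths.group_by do |file|
  Digest::SHA512.file(file).hexdigest
end

by_checksum.each do |_checksum, files|
  next if files.size == 1

  # Keep the first file of each group. Delete the others only after a
  # byte-for-byte comparison confirms they match it.
  file = files.first
  more = files.drop(1)
  more.each do |other|
    File.delete(other) if FileUtils.identical?(file, other)
  end
end

t2 = Time.now
puts "Took #{t2 - t1} seconds"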
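
The script above hashes every file it finds, which can be slow on large trees. A possible variation, sketched here under the assumption that most files have a unique size, is to bucket files by size first and hash only those that share a size with another file. The helper name `duplicate_groups` is hypothetical and not part of the original gist; it only reports groups and deletes nothing.

```ruby
require 'digest'

# Hypothetical helper (not in the original script): returns arrays of paths
# whose contents hash to the same SHA-512 digest.
def duplicate_groups(paths)
  # Only files that share a size with another file can be duplicates.
  size_collisions = paths.group_by { |p| File.size(p) }
                         .values
                         .select { |group| group.size > 1 }

  # Hash just those candidates and keep groups with more than one member.
  size_collisions
    .flat_map { |group| group.group_by { |p| Digest::SHA512.file(p).hexdigest }.values }
    .select { |group| group.size > 1 }
end
```

Each returned group could then be passed through the same keep-the-first, `FileUtils.identical?`-then-delete loop that the script uses.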