@rbriank
Forked from jchris/hashfiles.rb
Created July 5, 2010 21:22
require 'digest/md5'
# usage: run this in the root directory of your iTunes Music folder, or wherever, and pipe the output to a file
# next, pipe the output of that file through `sort` to a new file
# now, use the next script on that file
ls = Dir['**/*']
ls.each_with_index do |f, i|
  # progress: print the number of entries left to STDERR every 100 entries
  STDERR.puts ls.length - i if (i % 100 == 0)
  next if File.directory?(f)
  # hash the file in binary chunks instead of reading it into memory all at once
  md5 = Digest::MD5.file(f).hexdigest
  puts "#{md5} #{f}"
end
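The script above writes one `<md5> <path>` line per file, and the path itself may contain spaces. A minimal sketch of writing and reading that line format without losing any of them follows; `format_line` and `parse_line` are hypothetical helper names, not part of the gist.

```ruby
# Hypothetical helpers illustrating the "<md5> <path>" line format.

# Build one output line: the hex digest, a single space, then the path.
def format_line(md5, path)
  "#{md5} #{path}"
end

# Split a line back into [md5, path]. The limit of 2 splits at the first
# space only, so runs of spaces inside the path are left untouched.
def parse_line(line)
  line.chomp.split(/ /, 2)
end
```

An MD5 hex digest never contains a space, so the first space on a line is always the separator.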
# use at your own risk!
# read the code
# enjoy!
# usage: cat filesort.txt | ruby removedupes.rb
# where filesort.txt is the output of the `sort` command from the last script
lasthash = "x"
lastname = "z"
doput = false
group = []

# process a group of identical files to delete all but the one with the shortest pathname
def delete_dupes(group)
  group.sort! do |a, b|
    b.length - a.length
  end
  keep = group.pop
  group.each do |f|
    begin
      File.delete(f)
    rescue
      # an earlier version split names on every space, which collapsed double spaces
      # and caused 40-ish failed deletes; any remaining failures can be found
      # by grepping the output of this script for ^error and deleting by hand
      puts "error #{f}"
    end
  end
  puts "kept #{keep}"
end

while line = gets
  line = line.chomp
  # split at the first space only, so spaces inside the filename are preserved
  hash, name = line.split(/ /, 2)
  if hash == lasthash
    group << lastname
    doput = true
  else
    if doput
      group << lastname
      delete_dupes(group)
    end
    group = []
    doput = false
  end
  lasthash = hash
  lastname = name
end
# flush the final group: no line with a different hash follows it, so the loop never does
if doput
  group << lastname
  delete_dupes(group)
end
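The two scripts above work in three steps: hash, `sort`, then delete. As an alternative sketch, the same grouping can be done in one pass by collecting paths in a Hash keyed by digest, which avoids the intermediate file. `duplicate_groups` is a hypothetical name, the code assumes Ruby 2.4 or later (for `transform_values`), and it only reports groups; it deletes nothing.

```ruby
require 'digest/md5'

# Hypothetical helper, not part of the original gist.
# Returns { md5 => [paths] } for every digest shared by more than one file
# under `root`, with each list ordered shortest pathname first.
# Assumes `root` contains no glob metacharacters such as * or [.
def duplicate_groups(root)
  groups = Hash.new { |h, k| h[k] = [] }
  Dir.glob(File.join(root, '**', '*')).each do |path|
    next unless File.file?(path)
    groups[Digest::MD5.file(path).hexdigest] << path
  end
  groups.select { |_, paths| paths.length > 1 }
        .transform_values { |paths| paths.sort_by { |p| [p.length, p] } }
end
```

The first path in each group is the one the scripts above would keep. Printing the rest before deleting anything gives a dry run, for example `duplicate_groups('.').each_value { |keep, *dupes| puts "kept #{keep}"; dupes.each { |f| puts "would delete #{f}" } }`.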