
@marcbowes
Created November 5, 2012 14:12
Dedup posters
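```ruby
require "digest/md5"
require "set"

# Assumes it runs inside the Rails app (e.g. via `rails runner`), which provides Poster and #blank?.

# Hex MD5 of a file's contents, read in chunks rather than all at once.
def md5sum(filename)
  Digest::MD5.file(filename).hexdigest
end

found = Set.new

# find_each loads posters in batches, in id order, so the oldest
# poster with a given image is the one that is kept.
Poster.find_each do |poster|
  # Not sure how this could happen, but anyway: there is no file to hash.
  if poster.image.nil? || poster.image.path.blank?
    poster.destroy
    next
  end

  checksum = md5sum(poster.image.path)
  if found.include?(checksum)
    poster.destroy # same bytes as a poster already kept
  else
    found << checksum
  end
end
```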
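The same keep-the-first-copy idea can be tried outside Rails on plain files. This is a minimal sketch, assuming a flat list of file paths; `duplicate_paths` is a hypothetical helper, not part of the gist. It relies on `Set#add?`, which returns `nil` when the element is already present.

```ruby
require "digest/md5"
require "set"

# Hypothetical helper: returns the paths whose contents match an earlier
# path in the list. The first file seen with a given MD5 counts as the original.
def duplicate_paths(paths)
  seen = Set.new
  paths.select do |path|
    digest = Digest::MD5.file(path).hexdigest
    # add? returns nil if the digest was already in the set, i.e. a duplicate.
    seen.add?(digest).nil?
  end
end
```

For example, `duplicate_paths(Dir.glob("images/*").sort)` (directory name hypothetical) would list every copy after the first. Sorting the paths makes the choice of which copy counts as the original deterministic.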
@marcbowes (author)

@timsjoberg

```ruby
poster.destroy unless File.file?(poster.image.path)
```

Not sure if the above is necessary, but maybe add it anyway. Otherwise, it looks good.
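If such a guard were kept, one way to fold it into the hashing step is a helper that returns `nil` for anything that is not a regular file, instead of letting `Digest::MD5.file` raise `Errno::ENOENT`. This is a sketch only; `checksum_or_nil` is a hypothetical name. The explicit `nil` check matters because `File.file?(nil)` raises a `TypeError`.

```ruby
require "digest/md5"

# Hypothetical helper: hex MD5 of the file at +path+, or nil when +path+
# is nil or is not a regular file (missing file, directory, ...).
def checksum_or_nil(path)
  return nil if path.nil? || !File.file?(path)
  Digest::MD5.file(path).hexdigest
end
```

In the loop, a poster would then be destroyed whenever `checksum_or_nil(poster.image.path)` returns `nil`, covering both a blank path and a missing file with one check.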

@marcbowes (author)

No, I believe the file is actually there. The issue is that nothing prevents duplicates: open-uri downloads each image under a random filename, so the same picture can be saved several times under different names. Content is what matters.
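To illustrate why filenames cannot be used here, the sketch below writes the same bytes to two files with random names and compares names versus MD5 digests. It uses `Tempfile` to stand in for the random download names; this is an assumption for illustration and does not reproduce open-uri itself. `same_content_different_names` is a hypothetical helper.

```ruby
require "digest/md5"
require "tempfile"

# Hypothetical helper: writes +bytes+ to two temp files (each gets a random
# name, like two separate downloads of one image) and returns
# [number of distinct names, number of distinct MD5 digests].
def same_content_different_names(bytes)
  files = Array.new(2) { Tempfile.new(["poster", ".jpg"]) }
  files.each do |f|
    f.binmode
    f.write(bytes)
    f.flush
  end
  names = files.map { |f| File.basename(f.path) }
  digests = files.map { |f| Digest::MD5.file(f.path).hexdigest }
  [names.uniq.size, digests.uniq.size]
ensure
  # Close and delete the temp files even if hashing fails.
  files&.each(&:close!)
end
```

Calling it with any byte string gives two distinct names but a single digest, which is why the script compares checksums rather than paths.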
