@jonathanclarke
Created June 19, 2021 12:13
Export files recursively to a common directory with a unique sha
# frozen_string_literal: true

# Export files recursively to a common directory, naming each copy by its SHA-256 digest.
# Ignore duplicate files, skip directories.
require 'digest'
require 'fileutils'

INPUT_DIR = '/home/jonathan/duplicate-input'
OUTPUT_DIR = '/home/jonathan/deduped-output'

# Make sure the destination exists before copying into it.
FileUtils.mkdir_p(OUTPUT_DIR)

# Note: '*.*' only matches names that contain a dot, so files without an extension are skipped.
Dir.glob("#{INPUT_DIR}/**/*.*").each do |f|
  next if File.directory?(f)

  # Lowercase the extension once so the duplicate check and the copy use the same name.
  ext = File.extname(f).downcase
  # Hash in-process instead of shelling out to sha256sum; no shell quoting needed.
  sha = Digest::SHA256.file(f).hexdigest
  target = File.join(OUTPUT_DIR, "#{sha}#{ext}")

  if File.exist?(target)
    puts "DUPLICATE: #{f}"
  else
    puts "copying #{f}"
    FileUtils.cp(f, target)
  end
end
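
To try the approach without touching real data, the same logic can be wrapped in a method and run against a temporary directory. This is only a sketch: `export_unique` is a hypothetical helper name that is not part of the gist, and the file names and contents are made up for the trial run.

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not in the original gist): copies every regular file
# under input_dir into output_dir as "<sha256><ext>", skipping content that
# has already been exported. Returns the paths that were skipped as duplicates.
def export_unique(input_dir, output_dir)
  FileUtils.mkdir_p(output_dir)
  duplicates = []
  # Sort so the order in which duplicates are detected is deterministic.
  Dir.glob(File.join(input_dir, '**', '*.*')).sort.each do |f|
    next unless File.file?(f)

    name = Digest::SHA256.file(f).hexdigest + File.extname(f).downcase
    target = File.join(output_dir, name)
    if File.exist?(target)
      duplicates << f
    else
      FileUtils.cp(f, target)
    end
  end
  duplicates
end

# Trial run on made-up files: a.txt and sub/b.txt have identical content.
Dir.mktmpdir do |tmp|
  input = File.join(tmp, 'in')
  output = File.join(tmp, 'out')
  FileUtils.mkdir_p(File.join(input, 'sub'))
  File.write(File.join(input, 'a.txt'), 'same')
  File.write(File.join(input, 'sub', 'b.txt'), 'same')
  File.write(File.join(input, 'c.txt'), 'different')

  skipped = export_unique(input, output)
  puts "skipped: #{skipped.length}, exported: #{Dir.children(output).length}"
end
```

Because the output name is derived from the file's content, two files with the same bytes and the same extension always map to the same target path. That is why the existence check alone is enough to detect duplicates.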