
@zerobase
Last active December 17, 2015 17:09
To clean up duplicated files. For Mac OS X.
# For Mac OS X
# To clean up duplicated files.
#
# isDuplicate(fileName)
# Returns true if the file is a duplicate. It checks that:
# - the file name (before the extension) ends with a space and a single digit [0-9], and
# - a file with the same name minus that trailing " [0-9]" exists, has the same size,
#   and has identical contents.
# For example: "Word Power Made Easy Norman Lewis 528p_067174190X 2.pdf"
# original:    "Word Power Made Easy Norman Lewis 528p_067174190X.pdf"
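require 'fileutils'

def isDuplicate(fileName)
  # Only regular files are considered; a directory would make
  # FileUtils.compare_file raise.
  return false unless File.file?(fileName)
  # "name 2.ext" => baseName "name", extension ".ext".
  # \A and \z anchor the whole string (^ and $ only anchor a line).
  match = /\A(.*) [0-9](\.\w+)\z/.match(fileName)
  # Looser variant: any character, not only a space, before the digit.
  #return false unless fileName =~ /^(.*).[0-9](\.\w+)$/
  return false unless match
  baseName = match[1]
  extension = match[2]
  originalFileName = baseName + extension
  # File.file? replaces File.exists? (removed in Ruby 3.2) and also
  # rules out an "original" that is a directory.
  return false unless File.file?(originalFileName)
  # Cheap size check first, then a byte-by-byte comparison.
  return false unless File.size(originalFileName) == File.size(fileName)
  FileUtils.compare_file(originalFileName, fileName)
end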
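def findDuplicateFiles(directoryName)
  foundFileNames = []
  # base: keeps glob metacharacters in directoryName (e.g. "[") from being
  # treated as a pattern; requires Ruby 2.5 or later.
  Dir.glob("*", base: directoryName) do |entry|
    fileName = File.join(directoryName, entry)
    foundFileNames << fileName if isDuplicate(fileName)
  end
  foundFileNames
end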
def cleanUp(foundFileNames)
  puts "Deleting #{foundFileNames.size} files:"
  foundFileNames.each do |fileName|
    File.unlink(fileName)
    puts fileName
  end
  puts "Finished deleting #{foundFileNames.size} files."
end
# main:
directoryName = ARGV.shift || '.'
foundFileNames = findDuplicateFiles(directoryName)
cleanUp(foundFileNames)
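The three checks behind `isDuplicate` (name pattern, equal size, identical bytes) can be tried out without touching real files. Below is a minimal sketch, assuming Ruby 2.5 or later; `original_name_for` and `duplicate?` are hypothetical helpers written for this illustration, not part of the gist. It builds a throwaway directory with `Dir.mktmpdir`, so nothing outside that directory is read or deleted.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: maps "name 2.ext" to "name.ext",
# or returns nil when the name does not end in " <digit>.<ext>".
def original_name_for(file_name)
  match = /\A(.*) [0-9](\.\w+)\z/.match(file_name)
  match && match[1] + match[2]
end

# Hypothetical helper: the same checks as isDuplicate in the script above.
def duplicate?(file_name)
  return false unless File.file?(file_name)
  original = original_name_for(file_name)
  return false unless original && File.file?(original)
  File.size(original) == File.size(file_name) &&
    FileUtils.compare_file(original, file_name)
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "report.pdf"), "same bytes")
  File.write(File.join(dir, "report 2.pdf"), "same bytes") # exact copy
  File.write(File.join(dir, "notes.txt"), "first")
  File.write(File.join(dir, "notes 2.txt"), "other")       # same size, different bytes
  dups = Dir.glob("*", base: dir).select { |f| duplicate?(File.join(dir, f)) }
  p dups # prints ["report 2.pdf"]
end
```

The size check alone would flag "notes 2.txt" as well, which is why the script confirms with `FileUtils.compare_file` before deleting anything.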