@peeyushsrj
Created November 17, 2019 17:42
Finding duplicate files in a directory (and optionally moving them)
# python 3
import os
import hashlib
import shutil


def md5(fname):
    """Return the MD5 hex digest of a file, read in 4 KiB chunks."""
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


seenHashes = set()     # digests of files already seen (a set gives fast lookups)
listofDuplicates = []  # paths whose content matches an earlier file

rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    for name in fileList:
        path = os.path.join(dirName, name)
        md5sum = md5(path)
        if md5sum not in seenHashes:
            seenHashes.add(md5sum)
        else:
            listofDuplicates.append(path)

print(listofDuplicates)

# Optional: set dest to an existing directory to move the duplicates there.
dest = ""
if dest:
    for path in listofDuplicates:
        shutil.move(path, dest)
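
The script above hashes every file under the root directory. A possible variation, sketched below and not part of the original gist, first groups files by size and hashes only those whose size matches at least one other file, since files of different sizes cannot be identical. It also returns the duplicates grouped together rather than as one flat list. The names `file_md5` and `find_duplicate_groups` are hypothetical, chosen for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=4096):
    """Return the MD5 hex digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root):
    """Return lists of paths under root that share identical content."""
    # Pass 1: bucket regular files by size; a unique size cannot be a duplicate.
    by_size = defaultdict(list)
    for dir_name, _subdirs, file_names in os.walk(root):
        for name in file_names:
            path = os.path.join(dir_name, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files whose size collides with another file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) > 1:
            for path in paths:
                by_hash[(size, file_md5(path))].append(path)

    # Keep only the groups that contain more than one file.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicate_groups("."):
        print(group)
```

Each returned group lists every copy, including the first one found, so the caller decides which copy to keep before moving or deleting the rest.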