@narbehaj
Last active May 28, 2018 23:54
Finds duplicate .mkv files under a directory by comparing MD5 hashes of each file's first 4096 bytes
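import hashlib
import os

# Hashes of the first 4096-byte chunk of every .mkv file seen so far.
seen_hashes = set()

for root, dirs, files in os.walk('/home/test/'):
    for file in files:
        if file.endswith('mkv'):
            path = os.path.join(root, file)
            with open(path, 'rb') as file_read:
                # Only the first chunk is hashed (note the break below), so this
                # is a fast heuristic: files that share their first 4096 bytes
                # are reported as duplicates even if they differ later on.
                for chunk in iter(lambda: file_read.read(4096), b""):
                    file_hash = hashlib.md5(chunk).hexdigest()
                    if file_hash in seen_hashes:
                        print(file)
                        # os.remove(path)
                    else:
                        seen_hashes.add(file_hash)
                    break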
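
The script above compares only the first 4096-byte chunk of each file, so two different videos with an identical header would be reported as duplicates. If that matters, one possible variant is to hash the whole file before comparing. The following is a minimal sketch under that assumption, not the gist author's method: the names `file_digest`, `find_duplicates`, and the `chunk_size` and `suffix` parameters are hypothetical and introduced only for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=4096):
    """Return the MD5 hex digest of the whole file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, 'rb') as file_read:
        # Feed every chunk into the same hash object instead of
        # stopping after the first one.
        for chunk in iter(lambda: file_read.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_duplicates(top, suffix='mkv'):
    """Group files under `top` whose names end in `suffix` by content hash.

    Returns a list of lists; each inner list holds the paths of files
    with identical content.
    """
    by_hash = defaultdict(list)
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(suffix):
                path = os.path.join(root, name)
                by_hash[file_digest(path)].append(path)
    # Keep only hashes that were seen for more than one path.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


if __name__ == '__main__':
    for group in find_duplicates('/home/test/'):
        print(group)
```

Hashing full files is slower on large videos than the first-chunk check. Returning groups of paths rather than deleting anything leaves the decision about which copy to keep to the caller.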