Skip to content

Instantly share code, notes, and snippets.

@verhovsky
Last active October 22, 2018 11:17
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save verhovsky/ad4d401fed0d088bd342a6a3324ae219 to your computer and use it in GitHub Desktop.
Save verhovsky/ad4d401fed0d088bd342a6a3324ae219 to your computer and use it in GitHub Desktop.
a script that uses perceptual hashing (pHash) to find all the duplicated images in a directory and delete them
#!/usr/bin/env python3
'''
find all the duplicated images in the current directory using perceptual hashing and delete them
to install dependencies
$ pip install imagehash Pillow
usage: ./delete_duplicate_images.py [directory containing the images]
'''
import sys
from pathlib import Path
from imagehash import phash
from PIL import Image
directory = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
seen_hashes = {}
for filename in directory.iterdir():
try:
hash_ = phash(Image.open(filename))
if hash_ in seen_hashes:
print('deleting {}, it\'s the same as {}'.format(filename, seen_hashes[hash_]))
filename.unlink()
else:
seen_hashes[hash_] = filename
except OSError:
continue # skip files that can't be opened as images
else:
print('no images found in {}'.format(directory.absolute()))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment