Last active October 22, 2018 11:17
A script that uses perceptual hashing (pHash) to find all duplicate images in a directory and delete them.
#!/usr/bin/env python3
"""
Find all the duplicated images in a directory (default: the current
directory) using perceptual hashing and delete them.

To install dependencies:

    $ pip install imagehash Pillow

usage: ./ [directory containing the images]
"""
import sys
from pathlib import Path

from imagehash import phash
from PIL import Image

directory = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
seen_hashes = {}  # perceptual hash -> first file seen with that hash

for filename in directory.iterdir():
    try:
        # close the file handle as soon as the hash has been computed
        with Image.open(filename) as image:
            hash_ = phash(image)
    except OSError:
        continue  # skip files that can't be opened as images

    if hash_ in seen_hashes:
        print('deleting {}, it\'s the same as {}'.format(filename, seen_hashes[hash_]))
        filename.unlink()  # keep the first file seen, delete the later one
    else:
        seen_hashes[hash_] = filename

if not seen_hashes:
    print('no images found in {}'.format(directory.absolute()))
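
Because the script deletes files as soon as it finds a matching hash, it may help to see the "first file wins" logic on its own, without any deletion. The following is a minimal sketch, not part of the original script: `find_duplicates` and its `hash_func` parameter are hypothetical names introduced here for illustration, and the function only reports which files would be treated as duplicates.

```python
from pathlib import Path
from typing import Callable, Hashable, Iterable, List, Tuple


def find_duplicates(
    paths: Iterable[Path],
    hash_func: Callable[[Path], Hashable],
) -> List[Tuple[Path, Path]]:
    """Return (duplicate, original) pairs without touching the filesystem.

    hash_func is a hypothetical hook (an assumption of this sketch): it maps
    a path to a hashable value and raises OSError for unreadable files.
    """
    seen = {}        # hash value -> first path seen with that hash
    duplicates = []  # (later path, first path) pairs
    for path in paths:
        try:
            key = hash_func(path)
        except OSError:
            continue  # same policy as the script: skip unreadable files
        if key in seen:
            duplicates.append((path, seen[key]))
        else:
            seen[key] = path
    return duplicates
```

With the script's dependencies installed, `hash_func` could be something like `lambda p: phash(Image.open(p))`. Passing `sorted(directory.iterdir())` instead of the raw iterator would make the choice of which copy is kept deterministic, since `iterdir()` yields entries in arbitrary order.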