@mmechtley
Created March 20, 2015 16:34
Python SequenceMatcher for finding a similarly-named file
"""
Here's a cute example of using Python's built-in difflib module to find the file with the closest-matching name.
"""
from difflib import SequenceMatcher
# Suppose we have some files (databases here) with a certain naming scheme.
db_files = ['out_NDWFS_1425+3254_J_db.hdf5', 'out_NDWFS_1425+3254_H_db.hdf5']
# Now we have several other files (model definitions here) that have a similar naming scheme
py_files = ['model_NDWFS_1425+3254_J.py', 'model_NDWFS_1425+3254_H.py']
for db_file in db_files:
    # Set up a function that builds a SequenceMatcher against db_file, then returns the similarity ratio.
    # Note that the a= and b= keywords are important: the first positional parameter of SequenceMatcher
    # is isjunk, a function that flags "junk" elements to ignore, not one of the sequences to compare.
    similar_score = lambda x: SequenceMatcher(a=db_file, b=x).ratio()
    # Now sort the py_files list in place, using the similarity ratios against db_file as the sort key
    py_files.sort(key=similar_score)
    # The best-matching filename is now the last element of the sorted list
    model_file = py_files[-1]
    print('{} matches {}'.format(model_file, db_file))
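
The loop above re-sorts py_files on every pass just to read off its last element. A minimal sketch of a variant that leaves the list untouched is shown here: closest_match is a hypothetical helper name (not part of difflib) that uses max() with the same similarity key. The sketch also calls difflib.get_close_matches, a standard-library function that covers the same use case with a similarity cutoff.

```python
from difflib import SequenceMatcher, get_close_matches


def closest_match(target, candidates):
    """Return the candidate most similar to target (hypothetical helper)."""
    # max() picks the highest-scoring candidate without sorting or mutating the list
    return max(candidates, key=lambda name: SequenceMatcher(a=target, b=name).ratio())


db_files = ['out_NDWFS_1425+3254_J_db.hdf5', 'out_NDWFS_1425+3254_H_db.hdf5']
py_files = ['model_NDWFS_1425+3254_J.py', 'model_NDWFS_1425+3254_H.py']

for db_file in db_files:
    print('{} matches {}'.format(closest_match(db_file, py_files), db_file))

# get_close_matches returns up to n candidates scoring at least cutoff
# (default 0.6), best first; the list is empty if nothing is similar enough.
print(get_close_matches(db_files[0], py_files, n=1))
```

Note that max() raises ValueError on an empty candidate list, whereas get_close_matches simply returns an empty list, so the latter may be the safer choice when a match is not guaranteed.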