Skip to content

Instantly share code, notes, and snippets.

@sgithens
Last active August 29, 2015 13:58
Show Gist options
  • Save sgithens/9982020 to your computer and use it in GitHub Desktop.
Save sgithens/9982020 to your computer and use it in GitHub Desktop.
Judy-san: Possible way of detecting duplicate patients by creating mathematical sets from each part of their name.
"""
We will create several patients who have an ID, fname, mname, lname
and then check for those that switched parts of their name.
"""
from collections import defaultdict
patients = (
(1, "Judy", "Wawira", "Gichoya"),
(2, "Steven", "Wilbur", "Githens"),
(3, "Gichoya", "Wawira", "Judy"),
(4, "Some", "Random", "Person"),
)
pat_names = defaultdict(list)
for pat in patients:
nameset = frozenset(pat[1:])
pat_names[nameset].append(pat[0])
# Any set in the patient dictionary that has more than 1
# ID is a potential duplicate
for dup in [val for val in pat_names.itervalues() if len(val) > 1]:
print "Possible Duplicate IDs:", dup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment