@langner
Last active May 7, 2021 09:34
Python function for testing similarity between the first authors in two article author fields
from unidecode import unidecode  # third-party package (PyPI: Unidecode)


def similar_first_author(author1, author2):
    """Determine whether two first authors have the same names.

    Since first names and initials are written in many different ways, we
    only check the first word in the author string and the first letter of the
    second word. Although the second word is usually the first name, there will
    be exceptions for multi-word last names, but these are a small minority
    and still pass our test. If there is just a single word, use just that.
    """
    # Accept UTF-8 encoded bytes as well as text; str has no decode() in Python 3.
    if isinstance(author1, bytes):
        author1 = author1.decode('utf-8')
    if isinstance(author2, bytes):
        author2 = author2.decode('utf-8')
    author1 = author1.lower()
    author2 = author2.lower()

    to_replace = {
        # Spell out umlauts phonetically.
        "ü": "ue",
    }
    for tr in to_replace:
        author1 = author1.replace(tr, to_replace[tr])
        author2 = author2.replace(tr, to_replace[tr])

    # Replace any remaining non-ASCII characters with their ASCII equivalents.
    author1 = unidecode(author1)
    author2 = unidecode(author2)

    try:
        first1 = author1.split()[0].strip(',')
        first2 = author2.split()[0].strip(',')
    except IndexError:
        # At least one author string is empty.
        return False
    try:
        second1 = author1.split()[1].strip(',')
        second2 = author2.split()[1].strip(',')
        return first1 == first2 and second1[0] == second2[0]
    except IndexError:
        # A second word is missing or empty, so compare only the first words.
        return first1 == first2
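
The matching rule described in the docstring can be tried out without installing anything. The following is a minimal sketch, not the gist's own code. It assumes that NFKD decomposition from the standard library is an acceptable stand-in for unidecode, which it only partly is, since letters without a decomposition (such as "ø" or "ß") are left unchanged. The names fold_to_ascii, compared_parts and similar_first_author_sketch are hypothetical helpers introduced for illustration.

```python
import unicodedata


def fold_to_ascii(text):
    """Lower-case, spell out "ü" as "ue", then strip remaining diacritics.

    Assumption: NFKD decomposition stands in for unidecode. It only removes
    combining marks, so it does not transliterate every non-ASCII letter.
    """
    # NFC first, so that "u" + combining diaeresis is seen as a single "ü".
    text = unicodedata.normalize("NFC", text.lower()).replace("ü", "ue")
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def compared_parts(author):
    """Return at most the first two words, with surrounding commas stripped."""
    return [word.strip(",") for word in fold_to_ascii(author).split()][:2]


def similar_first_author_sketch(author1, author2):
    """Compare the first word, plus the initial of the second word if present."""
    parts1, parts2 = compared_parts(author1), compared_parts(author2)
    if not parts1 or not parts2:
        # At least one author string is empty.
        return False
    if parts1[0] != parts2[0]:
        return False
    # Compare initials only when both strings have a non-empty second word.
    if len(parts1) > 1 and len(parts2) > 1 and parts1[1] and parts2[1]:
        return parts1[1][0] == parts2[1][0]
    return True


if __name__ == "__main__":
    pairs = [
        ("Müller, Hans", "Mueller H."),
        ("Smith, John", "Smith, Adam"),
        ("Smith", "Smith, John"),
    ]
    for a, b in pairs:
        print(a, "|", b, "|", similar_first_author_sketch(a, b))
```

Only the first word and one initial are ever compared, so "Smith, John" and "Smith, Jane" count as the same first author. That is the trade-off the docstring accepts in exchange for tolerance of abbreviated first names.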