@abehmiel
Created October 31, 2017 19:10
Pandas fuzzy join
import difflib
from pandas import DataFrame
# input data
df1 = DataFrame([[1],[2],[3],[4],[5]], index=['one','two','three','four','five'], columns=['number'])
df2 = DataFrame([['a'],['b'],['c'],['d'],['e']], index=['one','too','three','fours','five'], columns=['letter'])
# want to obtain:
#        number letter
# one         1      a
# two         2      b
# three       3      c
# four        4      d
# five        5      e
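# remap each df2 label to its closest match in df1's index
# (raises IndexError if a label has no match above difflib's default cutoff of 0.6)
df2.index = df2.index.map(lambda x: difflib.get_close_matches(x, df1.index)[0])
# left join on the now-aligned index
df1.join(df2)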
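
The one-liner above indexes `[0]` into the result of `difflib.get_close_matches`, so it fails as soon as one label has no sufficiently close counterpart. Below is a minimal sketch of a more defensive variant, not part of the original gist: the helper name `closest_label`, the fall-back-to-original-label behaviour, and the sample label `'six'` are all assumptions made for illustration.

```python
import difflib

def closest_label(label, candidates, cutoff=0.6):
    """Return the best fuzzy match for `label` among `candidates`.

    Hypothetical helper: falls back to `label` itself when nothing
    clears `cutoff`, instead of raising IndexError.
    """
    matches = difflib.get_close_matches(label, list(candidates), n=1, cutoff=cutoff)
    return matches[0] if matches else label

# Labels mirroring the gist's indexes, plus 'six', which has no close match.
left_labels = ['one', 'two', 'three', 'four', 'five']
right_labels = ['one', 'too', 'three', 'fours', 'six']

# Map every right-hand label to its closest left-hand label.
mapping = {label: closest_label(label, left_labels) for label in right_labels}
print(mapping)
```

With the gist's frames, the same helper could presumably be applied as `df2.index = df2.index.map(lambda x: closest_label(x, df1.index))`. Labels with no close match are then left unchanged, and `df1.join(df2)` shows `NaN` for any row of `df1` that found no partner.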