@yfwu
Created November 22, 2021 10:37
Extract words from text that approximately match a target word
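import re
import difflib

word = "acromegaly"
textarea = "acromegaly, pneumonia, prolactinoma"

# Split the text into word tokens.
pattern = r'\w+'
words = re.findall(pattern, textarea)

# Reuse one matcher with the target word fixed as the second sequence.
matcher = difflib.SequenceMatcher(b=word)
for test_word in words:
    matcher.set_seq1(test_word)
    # Count the non-matching opcode blocks: a rough difference score, not a true edit distance.
    distance = len([m for m in matcher.get_opcodes() if m[0] != 'equal'])
    if distance <= 2:
        print(distance, test_word)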
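
One caveat about the snippet above: counting non-equal opcodes measures how many differing blocks there are, not how many characters differ. A completely unrelated token such as "xyz" yields a single 'replace' opcode against "acromegaly", so it would score 1 and pass the `<= 2` filter. If a per-character threshold is what is wanted, a minimal sketch using a classic Levenshtein distance might look like the following. This swaps in a different technique from the gist's; the names `levenshtein`, `find_similar_words`, and `max_distance` are hypothetical helpers introduced here for illustration.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def find_similar_words(word: str, text: str, max_distance: int = 2) -> list:
    """Return (distance, candidate) pairs for tokens in text close to word."""
    results = []
    for candidate in re.findall(r"\w+", text):
        distance = levenshtein(candidate, word)
        if distance <= max_distance:
            results.append((distance, candidate))
    return results


# Assumed sample input: one misspelling, two unrelated tokens, one exact match.
matches = find_similar_words("acromegaly", "acromegally, pneumonia, xyz, acromegaly")
print(matches)
```

With this version the misspelling "acromegally" and the exact match are kept, while "xyz" and "pneumonia" are rejected because their character-level distance exceeds the threshold.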