@avriiil
Last active April 5, 2021 20:33
Disambiguate a single Arabic sentence
from camel_tools.disambig.mle import MLEDisambiguator
from camel_tools.tokenizers.word import simple_word_tokenize
# instantiate the Maximum Likelihood Disambiguator
mle = MLEDisambiguator.pretrained()
# The disambiguator expects pre-tokenized text
sentence = simple_word_tokenize('نجح بايدن في الانتخابات')
disambig = mle.disambiguate(sentence)
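# For each word, take the top-ranked analysis and pull out its
# diacritized form ('diac'), part-of-speech tag ('pos'), and lemma ('lex')
diacritized = [d.analyses[0].analysis['diac'] for d in disambig]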
pos_tags = [d.analyses[0].analysis['pos'] for d in disambig]
lemmas = [d.analyses[0].analysis['lex'] for d in disambig]
# Print the combined feature values extracted above
for triplet in zip(diacritized, pos_tags, lemmas):
    print(triplet)
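
The list comprehensions above index `d.analyses[0]` directly, which would raise an `IndexError` for any word whose analysis list is empty. Below is a minimal sketch of a more defensive extraction. It assumes only the shape used in the snippet (each item has `.word` and `.analyses`, and each analysis has an `.analysis` dict). The two namedtuples and the `top_features` helper are hypothetical stand-ins so the example runs without camel_tools installed, and the sample values are placeholders, not real analyzer output.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the shape used above: a word plus a
# ranked list of scored analyses. These are NOT the camel_tools classes.
ScoredAnalysis = namedtuple('ScoredAnalysis', ['score', 'analysis'])
DisambiguatedWord = namedtuple('DisambiguatedWord', ['word', 'analyses'])


def top_features(disambig, keys=('diac', 'pos', 'lex')):
    """Return one tuple of feature values per word from its top-ranked analysis.

    Words with no analyses fall back to the raw word followed by None values.
    """
    rows = []
    for d in disambig:
        if d.analyses:
            top = d.analyses[0].analysis
            # .get() yields None if a feature key is missing
            rows.append(tuple(top.get(k) for k in keys))
        else:
            rows.append((d.word,) + (None,) * (len(keys) - 1))
    return rows


# Placeholder data (not real analyzer output)
sample = [
    DisambiguatedWord('w1', [ScoredAnalysis(1.0, {'diac': 'W1', 'pos': 'noun', 'lex': 'w1'})]),
    DisambiguatedWord('w2', []),
]

for row in top_features(sample):
    print(row)
```

With a real disambiguator, the list returned by `mle.disambiguate(sentence)` could be passed to `top_features` in place of `sample`, provided its items expose the same attributes.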