@avriiil
Created April 5, 2021 19:19
Function to reduce orthographic ambiguity of Arabic text
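# Orthographic normalizers from CAMeL Tools
from camel_tools.utils.normalize import normalize_alef_maksura_ar
from camel_tools.utils.normalize import normalize_alef_ar
from camel_tools.utils.normalize import normalize_teh_marbuta_ar


def ortho_normalize(text):
    """Reduce orthographic ambiguity in Arabic text."""
    text = normalize_alef_maksura_ar(text)  # alef maksura -> yeh
    text = normalize_alef_ar(text)          # alef variants -> bare alef
    text = normalize_teh_marbuta_ar(text)   # teh marbuta -> heh
    return text


# df is assumed to be an existing pandas DataFrame with a string column tweet_text
df.tweet_text = df.tweet_text.apply(ortho_normalize)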
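
To see what this normalization does to a string without installing camel_tools, here is a rough stdlib-only sketch. The name `ortho_normalize_sketch` and the mapping table are illustrative assumptions, not part of the gist or of the camel_tools API; the table reflects my understanding of the three normalizers (alef maksura to yeh, hamza/madda/wasla alef forms to bare alef, teh marbuta to heh).

```python
# Hypothetical stand-in for the three camel_tools normalizers (stdlib only).
# The mapping below is an assumption about their behavior, not their source.
_ORTHO_MAP = str.maketrans({
    "\u0649": "\u064a",  # alef maksura -> yeh
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0671": "\u0627",  # alef wasla -> bare alef
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def ortho_normalize_sketch(text):
    """Apply all character substitutions in a single pass."""
    return text.translate(_ORTHO_MAP)


# "ila" (to): hamza-below alef and final alef maksura are both normalized
ila = ortho_normalize_sketch("\u0625\u0644\u0649")
# "madrasa" (school): final teh marbuta becomes heh
madrasa = ortho_normalize_sketch("\u0645\u062f\u0631\u0633\u0629")
```

Because the substitutions do not feed into one another, a single `str.translate` pass should give the same result as calling the three functions in sequence.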