@zaidalyafeai
Created February 15, 2020 22:35
import re
from pyarabic import araby

t = "blah blah"
t = araby.strip_tashkeel(t)  # remove tashkeel (diacritics)
t = re.sub(r'([-؟،.!;:])', ' \\1 ', t)  # add spaces around the punctuation marks we keep
t = re.sub(r'([^\s\w\-؟،.!;:])+', '', t)  # remove all special characters except those punctuation marks
t = re.sub(r'[³ـ¼]', '', t)  # explicitly remove characters that \w still matches
t = re.sub(r'[a-zA-Z]', '', t)  # remove English letters
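
The steps above could be wrapped in a single function so that removing English letters becomes an explicit choice. This is only a sketch under some assumptions. The function name `clean_text` and the `keep_latin` flag are hypothetical. A regex over U+064B..U+0652 stands in for `araby.strip_tashkeel` so the snippet runs without pyarabic, and it may not match that function exactly. The Arabic characters are written as `\u` escapes to avoid copy and paste problems with right-to-left text.

```python
import re

# Assumption: approximates araby.strip_tashkeel by dropping U+064B..U+0652
# (fathatan through sukun); the real function may differ in edge cases.
TASHKEEL = re.compile(r'[\u064B-\u0652]')

# Punctuation that is kept: hyphen, Arabic question mark (U+061F),
# Arabic comma (U+060C), and . ! ; :
PUNCT = r'\-\u061F\u060C.!;:'


def clean_text(text, keep_latin=True):
    """Hypothetical wrapper around the cleaning steps in this gist."""
    text = TASHKEEL.sub('', text)                        # remove tashkeel
    text = re.sub('([' + PUNCT + '])', r' \1 ', text)    # pad kept punctuation with spaces
    text = re.sub(r'[^\s\w' + PUNCT + ']+', '', text)    # drop every other special character
    text = re.sub(r'[\u00B3\u0640\u00BC]', '', text)     # drop superscript 3, tatweel, one quarter
    if not keep_latin:
        text = re.sub(r'[a-zA-Z]', '', text)             # optionally drop English letters
    return text
```

With `keep_latin=True` (the default here) English letters pass through unchanged, and passing `keep_latin=False` gives the behaviour of the last line of the original snippet.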
@MagedSaeed
In the last line you are removing English letters, but we agreed not to remove them. What changed your mind?