@naveed-ahmad
Created March 30, 2019 23:21
Remove Arabic Accents/Diacritics
text = "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"
# Strip harakat (U+064C..U+0658), Arabic punctuation and signs (U+060B..U+060F, U+061F, U+066D)
# and the Quranic stop-sign ligatures U+FDF0 and U+FDF1.
text.gsub(/[\uFDF0\uFDF1\u066D\u061F\u060B-\u060F\u064C-\u0658]/, '')
#=> بسم ٱلله ٱلرحمٰن ٱلرحيم
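A note on the escapes, as a minimal sketch: in a Ruby regex literal, `\uXXXX` consumes exactly four hex digits, so a five-digit form such as `\u0061F` is read as `\u0061` (the Latin letter "a") followed by a literal "F". The variable names below are only for illustration.

```ruby
# \uXXXX takes exactly four hex digits; any further character is a literal.
five_digit = /\u0061F/   # parsed as "a" followed by "F"
four_digit = /\u061F/    # U+061F ARABIC QUESTION MARK
braced     = /\u{61F}/   # brace form takes a variable number of hex digits

latin_hit   = five_digit.match?("aF")       # the five-digit form matches Latin text
arabic_miss = five_digit.match?("\u061F")   # and does not match the Arabic question mark
four_hit    = four_digit.match?("\u061F")
braced_hit  = braced.match?("\u061F")
```

The brace form `\u{...}` is the safer choice whenever a code point is written with leading zeros or has more than four digits.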
@naveed-ahmad
Author

Some edge cases

   # Strip harakat, superscript alef, small waw/yeh, isolated hamza, shadda ligatures, punctuation and signs.
   # Presentation-form ranges written as U+FBCx, U+FB5x and U+FE7x are not covered; list them explicitly if needed.
   simple = text.gsub(/[\u064B-\u0658\u0670\u06E5\u06E6\uFE80\uFBB6\uFC60-\uFC62\uFDF0\uFDF1\u066D\u061F\u060B-\u060F]/, '')

   # "ٱ|إ|أ" => ا
   simple = simple.gsub(/[\u0671\u0625\u0623]/, "\u0627")

   # الله => الله (a no-op as written: the pattern and the replacement are the same sequence)
   simple = simple.gsub(/\u0627\u0644\u0644\u0647/, "\u0627\u0644\u0644\u0647")
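The steps above could be folded into one helper, roughly as sketched here. The method name `strip_arabic_diacritics`, the constant names, and the reduced character set (harakat, superscript alef, small waw/yeh, and the three alef variants only) are assumptions for illustration, not part of the gist.

```ruby
# Hypothetical helper; names and the exact character set are assumptions.
ARABIC_MARKS  = /[\u064B-\u0658\u0670\u06E5\u06E6]/  # harakat, superscript alef, small waw/yeh
ALEF_VARIANTS = /[\u0671\u0625\u0623]/               # alef wasla, alef with hamza below/above

def strip_arabic_diacritics(text)
  # Remove the marks first, then fold the alef variants into plain alef (U+0627).
  text.gsub(ARABIC_MARKS, '').gsub(ALEF_VARIANTS, "\u0627")
end

# beh + kasra, seen + sukun, meem + kasra
sample = "\u0628\u0650\u0633\u0652\u0645\u0650"
plain  = strip_arabic_diacritics(sample)  # beh, seen, meem with no marks
```

Keeping the two patterns as named constants makes it easier to extend the character set later without touching the method body.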
