@yazinsai
Created July 5, 2020 09:45
A simple Ruby script that strips verse numbers and diacritics from Quran text
#!/usr/bin/env ruby
# 1. Copy the raw text from https://www.corequran.com/
# 2. Run "ruby parse.rb"
# 3. Paste the copied text
# 4. Press "Ctrl + D" to see the output
# Returns an array of cleaned lines, one per input line (aya).
def parse(str)
  str
    .split("\n")
    .map { |aya| remove_verse_number(aya) }
    .map { |aya| remove_diacritics(aya) }
end
# Drops the verse number wrapped in ornate parentheses, then trims whitespace.
def remove_verse_number(str)
  str.gsub(/﴿.*?﴾/, '').strip
end
def remove_diacritics(str)
  diacritics = ['ِ', 'ُ', 'ٓ', 'ٰ', 'ْ', 'ٌ', 'ٍ', 'ً', 'ّ', 'َ', ' ۛ', ' ۖ', ' ۗ']
  str.gsub(Regexp.union(diacritics), '')
end
puts parse(STDIN.read)
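
The two helpers above can be tried without pasting into STDIN. The sketch below is not the script's own code: `HARAKAT` and `strip_text` are hypothetical names, and the code points are written as `\u` escapes on the assumption that the marks listed above are the usual harakat (U+064B to U+0653, U+0670) plus Quranic pause marks. The range U+06D6 to U+06DC covers a few more pause marks than the three in the original list, and the extra `squeeze` step is also an addition.

```ruby
# Sketch only: HARAKAT and strip_text are illustrative names, not part of the gist.
# Assumed code points: U+064B..U+0653 and U+0670 (harakat),
# U+06D6..U+06DC (pause marks, a wider range than the original list).
HARAKAT = /[\u064B-\u0653\u0670\u06D6-\u06DC]/

def strip_text(line)
  line
    .gsub(/\uFD3F.*?\uFD3E/, '') # drop the verse number in ornate parentheses
    .gsub(HARAKAT, '')           # drop diacritics and pause marks
    .squeeze(' ')                # collapse spaces left behind by removed marks
    .strip
end

# "bismi" with its diacritics, followed by the verse number 1 in ornate parentheses.
sample = "\u0628\u0650\u0633\u0652\u0645\u0650 \uFD3F\u0661\uFD3E"
puts strip_text(sample) # prints the bare letters ba, sin, mim
```

Writing the marks as escapes keeps them visible in editors that render combining characters onto the neighbouring quote, at the cost of being less readable than the literal list.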