@khiemdoan
Last active July 1, 2019 03:46
Remove Vietnamese Accent
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Remove Vietnamese tones function.

Author: Khiem Doan
"""
uni_chars_l = 'áàảãạâấầẩẫậăắằẳẵặđèéẻẽẹêếềểễệíìỉĩịóòỏõọôốồổỗộơớờởỡợúùủũụưứừửữựýỳỷỹỵ'
uni_chars_u = 'ÁÀẢÃẠÂẤẦẨẪẬĂẮẰẲẴẶĐÈÉẺẼẸÊẾỀỂỄỆÍÌỈĨỊÓÒỎÕỌÔỐỒỔỖỘƠỚỜỞỠỢÚÙỦŨỤƯỨỪỬỮỰÝỲỶỸỴ'
no_tone_chars_l = 'a'*17 + 'd' + 'e'*11 + 'i'*5 + 'o'*17 + 'u'*11 + 'y'*5
no_tone_chars_u = 'A'*17 + 'D' + 'E'*11 + 'I'*5 + 'O'*17 + 'U'*11 + 'Y'*5
no_tone_dict = dict(zip(uni_chars_l + uni_chars_u, no_tone_chars_l + no_tone_chars_u))
def remove_vietnamese_tones(text):
    text = [no_tone_dict.get(c, c) for c in text]
    return ''.join(text)

if __name__ == '__main__':
    print(remove_vietnamese_tones('Khiêm Đoàn'))
    print(remove_vietnamese_tones('Đoàn Hoà Khiêm'))
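
The lookup table above only covers precomposed characters. As a possible alternative, here is a minimal sketch that uses Unicode decomposition from the standard `unicodedata` module instead of a hand-written table. The function name `strip_vietnamese_marks` is hypothetical and not part of the original gist. Unlike the table, this approach also strips combining marks from non-Vietnamese letters, so it is not a drop-in replacement if that matters for your input.

```python
import unicodedata


def strip_vietnamese_marks(text):
    """Hypothetical alternative to remove_vietnamese_tones (not from the gist)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFD', text)
    # Drop every combining mark (tone marks, circumflex, breve, horn).
    stripped = ''.join(c for c in decomposed if not unicodedata.combining(c))
    # 'đ' and 'Đ' have no canonical decomposition, so map them explicitly.
    stripped = stripped.replace('đ', 'd').replace('Đ', 'D')
    # Recompose anything that NFD split but that was not a combining mark.
    return unicodedata.normalize('NFC', stripped)


if __name__ == '__main__':
    print(strip_vietnamese_marks('Khiêm Đoàn'))  # Khiem Doan
```

This version also handles input that arrives already decomposed (base letter followed by separate combining marks), which a per-character dictionary lookup would leave untouched.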