Skip to content

Instantly share code, notes, and snippets.

View enamoria's full-sized avatar
💭
vohox

Kính Phan enamoria

💭
vohox
  • Viet Nam
View GitHub Profile
@behitek
behitek / NlpUtils.java
Last active November 20, 2022 12:30
Chuẩn hóa cách gõ dấu câu về kiểu gõ cũ (Python + Java version)
# -*- coding: utf-8 -*-
import regex as re
uniChars = "àáảãạâầấẩẫậăằắẳẵặèéẻẽẹêềếểễệđìíỉĩịòóỏõọôồốổỗộơờớởỡợùúủũụưừứửữựỳýỷỹỵÀÁẢÃẠÂẦẤẨẪẬĂẰẮẲẴẶÈÉẺẼẸÊỀẾỂỄỆĐÌÍỈĨỊÒÓỎÕỌÔỒỐỔỖỘƠỜỚỞỠỢÙÚỦŨỤƯỪỨỬỮỰỲÝỶỸỴÂĂĐÔƠƯ"
unsignChars = "aaaaaaaaaaaaaaaaaeeeeeeeeeeediiiiiooooooooooooooooouuuuuuuuuuuyyyyyAAAAAAAAAAAAAAAAAEEEEEEEEEEEDIIIOOOOOOOOOOOOOOOOOOOUUUUUUUUUUUYYYYYAADOOU"
def loaddicchar():
dic = {}