@TeraBytesMemory
Created July 20, 2020 08:35
Normalize the UTF-8 encoding of Japanese text (compose kana written with combining dakuten/handakuten marks)
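# https://kamosawa.hatenablog.com/entry/20151015
# Map decomposed kana (base character + combining mark) to the precomposed character.
repdict = {}
# Handakuten (U+309A): the precomposed form is two code points after the base kana.
for c in 'はひふへほハヒフヘホ':
    repdict[c + '\u309a'] = chr(ord(c) + 2)
# Dakuten (U+3099): the precomposed form is one code point after the base kana.
for c in 'かきくけこさしすせそたちつてとはひふへほカキクケコサシスセソタチツテトハヒフヘホ':
    repdict[c + '\u3099'] = chr(ord(c) + 1)


def normalize_encode(contents):
    """Replace each decomposed kana sequence in `contents` with its precomposed form."""
    for key, value in repdict.items():
        contents = contents.replace(key, value)
    return contents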
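
A short usage sketch, not part of the original gist: it rebuilds the same replacement table under hypothetical names (`build_table`, `compose_kana`, `matches_nfc`) and checks it against the standard library's `unicodedata.normalize('NFC', ...)`. The comparison is only an illustration. NFC also composes sequences this table does not cover (for example `ウ` + U+3099) and can change characters outside kana, so it is an alternative to consider rather than a drop-in equivalent.

```python
import unicodedata

# Base kana taken from the gist; the names below are hypothetical.
HANDAKUTEN_BASES = 'はひふへほハヒフヘホ'
DAKUTEN_BASES = (
    'かきくけこさしすせそたちつてとはひふへほ'
    'カキクケコサシスセソタチツテトハヒフヘホ'
)


def build_table():
    """Map base kana + combining mark to the precomposed kana."""
    table = {}
    for c in HANDAKUTEN_BASES:
        table[c + '\u309a'] = chr(ord(c) + 2)  # e.g. ハ + U+309A -> パ
    for c in DAKUTEN_BASES:
        table[c + '\u3099'] = chr(ord(c) + 1)  # e.g. か + U+3099 -> が
    return table


def compose_kana(text, table=None):
    """Replace every decomposed sequence found in the table."""
    if table is None:
        table = build_table()
    for decomposed, composed in table.items():
        text = text.replace(decomposed, composed)
    return text


def matches_nfc():
    """Check that NFC gives the same result for every table entry."""
    return all(
        unicodedata.normalize('NFC', decomposed) == composed
        for decomposed, composed in build_table().items()
    )


if __name__ == '__main__':
    sample = 'ハ\u309aイフ\u309a'  # decomposed spelling of "パイプ"
    print(len(sample), len(compose_kana(sample)))
    print(matches_nfc())
```

If only the listed kana should be touched and everything else in the text must stay byte-for-byte identical, the explicit table is the more conservative choice. If full Unicode canonical composition is acceptable, `unicodedata.normalize` avoids maintaining the table by hand.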