@ckhung
Created August 30, 2022 03:28
find deal-breakers for 'iconv -f utf8 -t big5'
#!/usr/bin/python3
# find deal-breakers for 'iconv -f utf8 -t big5'
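# Example: the list of Chinese-medicine clinics for 清冠一號 contains some
# simplified characters such as「号」and「络」, a magnifying-glass symbol, and so on,
# which make iconv fail when converting from utf8 to big5.
# This program lists which lines are problematic.
# https://docs.google.com/spreadsheets/d/e/2PACX-1vQjf_HNeEZKM-XJX-q5v4cfNrB3kcv4gOT8kFbV9rurfoX_H5Qv9112Pv0PgYNFSzbReyNlQkLrJib3/pubhtml#
# Usage: python3 db-iconv some-utf8-encoded-chinese-file
# iconv is called once per line, so this is a bit slow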
import argparse
from subprocess import Popen, PIPE, DEVNULL

parser = argparse.ArgumentParser(
    description='find deal-breakers for iconv',
    formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('file', help='UTF-8 encoded Chinese text file')
args = parser.parse_args()

# Read the input as UTF-8 regardless of the locale's default encoding.
with open(args.file, encoding='utf-8') as F:
    for i, line in enumerate(F):
        # Feed a single line to iconv through stdin; see
        # https://stackoverflow.com/questions/163542/how-do-i-pass-a-string-into-subprocess-popen-using-the-stdin-argument
        iconv = Popen(['iconv', '-f', 'utf8', '-t', 'big5'],
                      stdin=PIPE,
                      stdout=DEVNULL,
                      stderr=PIPE,
                      universal_newlines=True
                      )
        errmsg = iconv.communicate(input=line)[1]
        # A non-empty stderr means iconv could not convert this line.
        if errmsg:
            print(line, end='')
            print('! {:4d} {}'.format(i+1, errmsg), end='')
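
The script notes that spawning iconv once per line is slow. One possible in-process alternative is sketched below. It is not the author's method: it swaps iconv for Python's built-in `big5` codec, on the assumption that the two agree closely enough for this purpose. Their mapping tables may differ for some characters, so treat the result as an approximation of what iconv would reject. The function name `find_big5_breakers` is hypothetical. As a side benefit, the sketch reports which characters in each line are the offenders.

```python
#!/usr/bin/python3
# Sketch only: uses Python's 'big5' codec instead of iconv (an assumption;
# the two tables may not match exactly).
import sys


def find_big5_breakers(lines, encoding="big5"):
    """Yield (line_number, line, offending_chars) for lines that fail to encode."""
    for lineno, line in enumerate(lines, start=1):
        bad = []
        for ch in line:
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                # This character has no mapping in the target encoding.
                bad.append(ch)
        if bad:
            yield lineno, line, bad


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for lineno, line, bad in find_big5_breakers(f):
            print(line, end="")
            print("! {:4d} cannot encode: {}".format(lineno, " ".join(bad)))
```

Checking character by character costs more than encoding the whole line at once, but it avoids starting a process per line and pinpoints every problematic character rather than only the first.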