@tkamishima
Created April 20, 2016 17:52
Reading a unicode file with genfromtxt in Python 3
# Thanks to http://stackoverflow.com/questions/33001373/loading-utf-8-file-in-python-3-using-numpy-genfromtxt
import numpy as np

# The number before each converter is the column index (0-based) in the delimited file.
# Read normally, the field comes back as bytes, which cannot be assigned to a unicode string field, so decode it explicitly.
dtype = np.dtype([('eid', int), ('feature', 'U82')])
np.genfromtxt("foo.tsv", delimiter='\t', dtype=dtype, converters={1: lambda x: x.decode('utf_8')})
@tkamishima
Author

For the second line (the genfromtxt call), this also worked:

np.genfromtxt("foo.tsv", delimiter='\t', dtype=dtype, converters={1:np.char.decode})
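On NumPy 1.14 and later, genfromtxt also accepts an `encoding` argument, which may make the converter unnecessary. The following is a minimal sketch under that assumption, not the method used above. The helper name `load_tsv` and the sample rows are hypothetical, and `foo.tsv` is written to a temporary directory only to make the example self-contained.

```python
import os
import tempfile

import numpy as np


def load_tsv(path):
    """Read a two-column TSV (integer id, unicode text) into a structured array."""
    # Same dtype as the snippet above, with the builtin int in place of np.int.
    dtype = np.dtype([('eid', int), ('feature', 'U82')])
    # With encoding given, fields are read as str, so no bytes-decoding converter is needed.
    return np.genfromtxt(path, delimiter='\t', dtype=dtype, encoding='utf-8')


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample data standing in for foo.tsv.
    path = os.path.join(tmp, "foo.tsv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("1\t日本語\n2\tテキスト\n")
    data = load_tsv(path)

print(data['eid'])
print(data['feature'])
```

Passing the encoding explicitly also avoids depending on whether converters receive bytes or str, which can differ between NumPy versions.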
