@giapt
Last active August 4, 2017 07:02
Convert Japanese to katakana/romaji in Python

Convert Japanese to romaji in Python on Ubuntu

Install libraries

$ sudo apt-get install mecab libmecab-dev mecab-ipadic
$ sudo aptitude install mecab-ipadic-utf8
$ sudo apt-get install python-mecab
$ pip install romkan

SAMPLE

convert.py

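# coding: utf-8
# Python 2 script: converts a Japanese sentence to katakana, then to Hepburn romaji.
import sys

import MeCab
import romkan

# ChaSen output format is tab-separated: surface, reading, base form, ...
m = MeCab.Tagger("-Ochasen")

# Optional first argument: "kanji-only" leaves kana characters as written.
if len(sys.argv) > 1:
    option = sys.argv[1]
else:
    option = "default"

sentence = "私の名前はボブです。"
print sentence
sentence_u = unicode(sentence, "utf-8")
katakana = ''
# The sentence is processed one character at a time.
for char in sentence_u:
    new_word = char.encode('utf8')
    if option == "kanji-only":
        # Code points 12353-12542 (U+3041-U+30FE) are hiragana and katakana.
        if 12352 < ord(char) < 12543:
            katakana = katakana + new_word + " "
            continue

    parse = m.parse(new_word)
    parts = parse.split('\t')
    if len(parts) > 2:
        # parts[1] is the katakana reading of the character.
        katakana = katakana + parts[1] + " "

print katakana
u = unicode(katakana, "utf-8")
print romkan.to_hepburn(u)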
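
The script takes the reading from the second tab-separated column of the ChaSen-style output. The step can be tried without MeCab installed, as in the minimal sketch below. It is written for Python 3, while the script above targets Python 2. `SAMPLE_OUTPUT` is a hand-written string in the ChaSen column layout, not captured MeCab output, and `extract_reading` is a hypothetical helper name. Unlike the script, it isolates the first line before splitting on tabs.

```python
# Hand-written sample in the tab-separated ChaSen layout (an assumption for
# illustration, not captured MeCab output): surface, reading, base form, ...
SAMPLE_OUTPUT = "私\tワタシ\t私\t名詞-代名詞-一般\t\t\nEOS\n"


def extract_reading(parsed):
    """Return the reading (second column) of the first line, or None."""
    first_line = parsed.split("\n")[0]
    columns = first_line.split("\t")
    # Same guard as the script: require more than two columns.
    if len(columns) > 2:
        return columns[1]
    return None


print(extract_reading(SAMPLE_OUTPUT))
```

A line without tabs, such as the `EOS` terminator, yields `None`. This mirrors the `len(parts) > 2` check in the script.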

RESULT

Default (no option)

$ python convert.py
私の名前はボブです。
ワタシ ノ ナ マエ ハ ボ ブ デ ス 。 
watashi no na mae ha bo bu de su 。

With the "kanji-only" option

$ python convert.py kanji-only
私の名前はボブです。
ワタシ の ナ マエ は ボ ブ で す 。 
watashi no na mae ha bo bu de su 。
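
In "kanji-only" mode the hiragana and katakana stay as written because the script skips any character whose code point lies strictly between 12352 and 12543. The sketch below isolates that test so it can be tried on its own. It is written for Python 3, and `is_kana` is a hypothetical helper name, not part of the script.

```python
def is_kana(char):
    """True if char falls in the code point range that the script leaves as is."""
    # 12353..12542 is U+3041..U+30FE, inside the hiragana and katakana blocks.
    return 12352 < ord(char) < 12543


# Characters taken from the sample sentence.
for char in "私のボ。":
    print(char, is_kana(char))
```

Kanji such as 私 and punctuation such as 。 fall outside the range, so they are still passed to MeCab.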