Skip to content

Instantly share code, notes, and snippets.

  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save fasiha/53797924cafb5dbca69eeadcebe76312 to your computer and use it in GitHub Desktop.
Add to mecab dictionary
# Put new words in a CSV with this format
# 表層形,左文脈ID,右文脈ID,コスト,品詞,品詞細分類1,品詞細分類2,品詞細分類3,活用形,活用型,原形,読み,発音
# surface_form,left_context_id,right_context_id,cost,part_of_speech,pos_division_1,pos_division_2,pos_division_3,inflection_type,inflection_style,lemma,reading,pronunciation
$ echo "fasihsignal,-1,-1,100,名詞,一般,*,*,*,*,fasihsignal,ファシシグナル,ファシシグナル" > a.csv
# Then use mecab-dict-index to compile the csv into a .dic file, based on an existing mecab dictionary file
$ /usr/local/Cellar/mecab/0.996/libexec/mecab/mecab-dict-index -d/usr/local/Cellar/mecab/0.996/lib/mecab/dic/ipadic/ -u a.dic -f utf8 -t utf8 a.csv
# The use it
$ mecab -ua.dic
fasihsignal
fasihsignal 名詞,一般,*,*,*,*,fasihsignal,ファシシグナル,ファシシグナル
EOS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment