
@curipha
Last active May 7, 2017 06:28
Ruby script that counts morphemes with MeCab via the natto gem
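#!/usr/bin/env ruby
# Reads text from stdin, runs it through MeCab, and prints morpheme counts.
require 'natto'
require 'pp'

nm = Natto::MeCab.new
morph = []

while (line = $stdin.gets)
  nm.parse(line) {|n|
    # Keep only normal and unknown nodes (skips BOS/EOS and the like).
    next unless n.is_nor? || n.is_unk?

    # Fields 0, 1 and 6 of the feature string are used (IPADIC-style layout assumed).
    pos, pos_detail, _, _, _, _, base = n.feature.split(',')
    next if pos =~ /^助動?詞|記号$/              # particles, auxiliary verbs, symbols
    next if pos == '名詞' && pos_detail == '数'  # numerals

    morph << [
      ( base.nil? || base == '*' ? n.surface : base ), # Base (original) form
      pos,                                             # Part of speech
      pos_detail,                                      # Part of speech (detail)
      n.is_nor? ? '通常' : '未知語'  # '通常' = normal (in dictionary), '未知語' = unknown word
    ]
  }
end

# Count identical entries, then sort by descending count and then by entry.
pp morph.group_by {|i| i}.map {|k, v| [k, v.count] }.sort_by {|k, c| [-c, *k] }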
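
The filtering and counting steps can be tried without MeCab installed. The following is only a sketch: `SAMPLE`, `keep?` and `count_morphemes` are hypothetical names that do not appear in the script, and the hard-coded feature strings assume an IPADIC-style layout (part of speech in field 0, its detail in field 1, base form in field 6).

```ruby
# Hypothetical sample data: [surface, feature] pairs in an IPADIC-style layout.
SAMPLE = [
  ['猫',   '名詞,一般,*,*,*,*,猫,ネコ,ネコ'],
  ['が',   '助詞,格助詞,一般,*,*,*,が,ガ,ガ'],
  ['走っ', '動詞,自立,*,*,五段・ラ行,連用タ接続,走る,ハシッ,ハシッ'],
  ['た',   '助動詞,*,*,*,特殊・タ,基本形,た,タ,タ'],
  ['。',   '記号,句点,*,*,*,*,。,。,。'],
  ['猫',   '名詞,一般,*,*,*,*,猫,ネコ,ネコ'],
  ['3',    '名詞,数,*,*,*,*,*']
].freeze

# Same filter as the script: drop particles, auxiliary verbs, symbols, numerals.
def keep?(pos, pos_detail)
  return false if pos =~ /^助動?詞|記号$/
  return false if pos == '名詞' && pos_detail == '数'
  true
end

# Returns [[base_form, pos, pos_detail], count] pairs,
# sorted by descending count and then by the entry itself.
def count_morphemes(pairs)
  rows = pairs.map { |surface, feature|
    pos, pos_detail, _, _, _, _, base = feature.split(',')
    next unless keep?(pos, pos_detail)
    [(base.nil? || base == '*') ? surface : base, pos, pos_detail]
  }.compact
  rows.group_by { |row| row }
      .map { |row, hits| [row, hits.size] }
      .sort_by { |row, count| [-count, *row] }
end

count_morphemes(SAMPLE).each { |row, count| puts "#{count}\t#{row.join(',')}" }
```

Negating the count in the sort key puts the most frequent entries first, and splatting the entry into the key makes ties come out in a stable, readable order.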