@ybenjo
Created April 24, 2015 00:12
Markov chain table
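The script turns raw bigram counts into probabilities by dividing each count by the total number of transitions out of the same word. Written out, with $f$ the preceding word and $t$ the following word (symbol names chosen here for illustration):

$$P(t \mid f) = \frac{\operatorname{count}(f, t)}{\sum_{t'} \operatorname{count}(f, t')}$$

For a hypothetical word $f$ that is followed by $a$ twice and by $b$ once, $P(a \mid f) = 2/3$ and $P(b \mid f) = 1/3$, so every row of the resulting table sums to 1.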
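```ruby
# -*- coding: utf-8 -*-
require 'natto'
require 'msgpack'

# Sample inputs: "want to watch volume 3 of shirobako★", "want to meet Rumi Okubo"
texts = ['shirobako★が3巻が見たい', '大久保瑠美さんに会いたい']

# trans_freq[from][to] = how many times word `to` directly follows word `from`
trans_freq = Hash.new { |h, k| h[k] = Hash.new(0.0) }

# A single MeCab instance can be reused for every text
natto = Natto::MeCab.new

texts.each do |txt|
  ary = ['***BOS***'] # beginning-of-sentence marker
  natto.parse(txt) do |n|
    w = n.surface
    feature = n.feature.split(',')
    type_1 = feature[0] # part of speech
    type_2 = feature[1] # part-of-speech subcategory
    next if type_1 == '記号' # skip symbols
    next if type_2 == '数'   # skip numerals
    p w # debug: print each token that is kept
    ary.push w
  end
  # count every adjacent pair (bigram) of tokens
  ary.each_cons(2) do |f, t|
    trans_freq[f][t] += 1
  end
end

# convert counts to transition probabilities: count(f, t) / sum over t' of count(f, t')
trans_prob = Hash.new { |h, k| h[k] = {} }
trans_freq.each_pair do |f, h|
  sum = h.values.sum
  h.each_pair do |t, v|
    trans_prob[f][t] = v / sum
  end
end

# save trans_prob: MessagePack output is binary, so write the raw bytes.
# (puts appends "\n" only when the data does not already end in one, and
# chomp would then strip a trailing 0x0a byte that belongs to the payload.)
File.binwrite('./anime_title_prob.dump', trans_prob.to_msgpack)

# load trans_prob
t = MessagePack.unpack(File.binread('./anime_title_prob.dump'))
p t
```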
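The counting and normalisation steps do not depend on MeCab itself. Below is a minimal stdlib-only sketch that starts from already-tokenised sentences (standing in for the natto output) and builds the same kind of nested probability table; the helper name `build_trans_prob` and the English sample tokens are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper: build a Markov transition-probability table
# from pre-tokenised sentences (each sentence is an Array of Strings).
BOS = '***BOS***'

def build_trans_prob(sentences)
  # freq[from][to] counts how often `to` directly follows `from`
  freq = Hash.new { |h, k| h[k] = Hash.new(0) }
  sentences.each do |tokens|
    ([BOS] + tokens).each_cons(2) { |f, t| freq[f][t] += 1 }
  end

  # Normalise each row so that its probabilities sum to 1.0
  freq.each_with_object({}) do |(f, row), prob|
    total = row.values.sum.to_f
    prob[f] = row.transform_values { |c| c / total }
  end
end

table = build_trans_prob([%w[I want to watch], %w[I want to meet]])
# table['to'] is {'watch' => 0.5, 'meet' => 0.5}
p table
```

The result is a plain Hash of String keys and Float values, so it can be serialised with `to_msgpack` in the same way as `trans_prob` in the script.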