@tabachain
Created November 26, 2014 05:21
Computing cosine similarity in Ruby
require 'mecab'
require 'matrix'

# Returns the cosine similarity of two strings, using binary
# (presence/absence) vectors over the words MeCab extracts.
def calc_score(str1, str2)
  words1 = []
  words2 = []

  # Keep a reference to each tagger while its nodes are being walked.
  tagger1 = MeCab::Tagger.new
  tagger2 = MeCab::Tagger.new
  node1 = tagger1.parseToNode(str1)
  node2 = tagger2.parseToNode(str2)
  while node1
    words1.push(node1.surface)
    node1 = node1.next
  end
  while node2
    words2.push(node2.surface)
    node2 = node2.next
  end

  # Drop empty surfaces (the sentence start/end nodes carry none).
  words1.delete("")
  words2.delete("")

  # Vocabulary: every distinct word in either string.
  # Non-bang uniq is used because uniq! returns nil when nothing is removed.
  vocabulary = (words1 + words2).uniq

  # One 0/1 flag per vocabulary word for each string.
  flags1 = vocabulary.map { |word| words1.include?(word) ? 1 : 0 }
  flags2 = vocabulary.map { |word| words2.include?(word) ? 1 : 0 }

  vector1 = Vector.elements(flags1)
  vector2 = Vector.elements(flags2)

  # Guard against 0/0 when either string yields no words.
  denominator = vector1.norm * vector2.norm
  return 0.0 if denominator.zero?

  vector1.inner_product(vector2) / denominator
end
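
Because the vectors hold only 0s and 1s, their inner product is the number of words the two strings share, and each norm is the square root of that string's distinct word count. The sketch below uses that shortcut on word lists that are already tokenized, so the arithmetic can be checked without MeCab installed. `binary_cosine` is a hypothetical helper name, not part of the gist, and returning 0.0 for an empty word list is an assumption.

```ruby
require 'set'

# Hypothetical helper (not from the gist): cosine similarity of two
# word lists treated as binary presence/absence vectors.
def binary_cosine(words1, words2)
  set1 = words1.to_set
  set2 = words2.to_set
  # Assumption: treat an empty word list as having no similarity.
  return 0.0 if set1.empty? || set2.empty?

  # Inner product = shared words; each norm = sqrt(distinct word count).
  (set1 & set2).size / Math.sqrt(set1.size * set2.size)
end

# Four of five distinct words are shared: 4 / sqrt(5 * 5)
puts binary_cosine(%w[i like cats very much], %w[i like dogs very much])
# => 0.8
```

With MeCab available, the word lists would instead come from the node surfaces collected in `calc_score`.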