@ebisawa
Created August 31, 2010 07:01
cosine similarity
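# Build a sparse term-frequency vector: word hash => occurrence count.
def vectorize(msg)
  vector = {}
  # String#split with no argument splits on whitespace and drops
  # the empty leading token that split(/\s+/) would produce.
  msg.split.each do |w|
    h = w.hash
    vector[h] ||= 0   # start at 0 so the first occurrence counts as 1
    vector[h] += 1
  end
  vector
end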
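# Cosine similarity of two sparse vectors (Hashes of key => count).
def cosine_similarity(vec1, vec2)
  # Inner product over the keys the two vectors share.
  inner_product = 0.0
  vec1.each do |k, v|
    inner_product += v * vec2[k] unless vec2[k].nil?
  end

  # Euclidean norm of each vector.
  norm1 = Math.sqrt(vec1.values.inject(0.0) { |sum, v| sum + v ** 2 })
  norm2 = Math.sqrt(vec2.values.inject(0.0) { |sum, v| sum + v ** 2 })

  # An empty vector has norm 0; return 0.0 rather than NaN (a choice made here).
  return 0.0 if norm1.zero? || norm2.zero?
  inner_product / (norm1 * norm2)
end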
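
A minimal, self-contained sketch of the same idea, for trying it out on a few strings. The names `term_counts` and `cosine` are hypothetical and not part of the gist. This variant keys the vector by the word itself rather than by `w.hash`, and it assumes that a text with no words should score 0.0 against anything.

```ruby
# Hypothetical helper names; a sketch, not the gist's own code.

# Count how often each whitespace-separated word occurs.
def term_counts(msg)
  counts = Hash.new(0)          # missing words count as 0
  msg.split.each { |w| counts[w] += 1 }
  counts
end

# Cosine of the angle between two sparse count vectors (Hashes).
def cosine(a, b)
  dot = a.sum { |k, v| v * b.fetch(k, 0) }
  norm_a = Math.sqrt(a.values.sum { |v| v * v })
  norm_b = Math.sqrt(b.values.sum { |v| v * v })
  # Assumption: a text with no words has similarity 0.0 to anything.
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

a = term_counts("the quick brown fox")
b = term_counts("the quick red fox")
c = term_counts("lorem ipsum")

puts cosine(a, a)   # 1.0  (identical texts)
puts cosine(a, b)   # 0.75 (three of four words shared)
puts cosine(a, c)   # 0.0  (no words in common)
```

Keying by the word avoids any chance of two different words colliding on the same hash value, at the cost of storing the strings themselves.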