@amundo
Created January 27, 2010 23:40
Super short intro to using cosine similarity in Python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# see http://www.fileslip.net/news/2010/02/04/language-id-project-the-basic-algorithm/
from math import sqrt

# Each collection maps a kind of coin to how many of that coin its owner holds.
you = {'pennies': 1, 'nickels': 2, 'dimes': 3, 'quarters': 4 }
me = {'pennies': 0, 'nickels': 3, 'dimes': 1, 'quarters': 1 }
abby = {'pennies': 2, 'nickels': 1, 'dimes': 0, 'quarters': 3 }

def scalar(collection):
    # Magnitude (Euclidean length) of the collection, treated as a vector of counts.
    total = 0
    for coin, count in collection.items():
        total += count * count
    return sqrt(total)

def similarity(A,B): # A and B are coin collections
    # Dot product over the kinds of coin both collections share,
    # divided by the product of the two magnitudes.
    total = 0
    for kind in A: # kind of coin
        if kind in B:
            total += A[kind] * B[kind]
    return float(total) / (scalar(A) * scalar(B))

print("Similarity of your collection and mine: ")
print(similarity(you, me))
print("Similarity of your collection and Abby's: ")
print(similarity(you, abby))
print("Similarity of my collection and Abby's: ")
print(similarity(me, abby))
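
The script above divides by the product of the two magnitudes, so a collection whose counts are all zero would raise a ZeroDivisionError. Below is a minimal Python 3 sketch of the same calculation with a guard for that case. The names `magnitude` and `cosine_similarity`, the `empty` collection, and the choice to return 0.0 for a zero-length collection are assumptions made for illustration; they are not part of the original gist.

```python
from math import sqrt

def magnitude(counts):
    """Euclidean length of a count vector stored as a dict."""
    return sqrt(sum(value * value for value in counts.values()))

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors stored as dicts.

    Keys missing from either dict are treated as zero counts.
    Returns 0.0 when either vector has zero length (an assumed convention).
    """
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    denominator = magnitude(a) * magnitude(b)
    if denominator == 0:
        return 0.0
    return dot / denominator

you = {'pennies': 1, 'nickels': 2, 'dimes': 3, 'quarters': 4}
me = {'pennies': 0, 'nickels': 3, 'dimes': 1, 'quarters': 1}
empty = {'pennies': 0}  # hypothetical collection with no coins

# Dot product is 0 + 6 + 3 + 4 = 13; magnitudes are sqrt(30) and sqrt(11).
print(round(cosine_similarity(you, me), 4))  # 0.7156
print(cosine_similarity(you, empty))         # 0.0
```

Because every count is non-negative, the result always falls between 0 (no kinds of coin in common) and 1 (the same proportions of each coin).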

benjaryu commented Dec 6, 2015

Interesting script, thanks.
