Last active May 27, 2021 19:31
Jaccard Similarity: The Jaccard similarity of two sets is the ratio of the size of their intersection to the size of their union. This measure is suitable for many applications, including textual similarity of documents and similarity of customers' buying habits.
__author__ = 'renienj'

import numpy as np


def compute_jaccard_similarity_score(x, y):
    """
    Jaccard Similarity J(A, B) = |Intersection(A, B)| / |Union(A, B)|
    """
    # Convert both inputs to sets so duplicate elements are ignored.
    set_x, set_y = set(x), set(y)
    intersection_cardinality = len(set_x.intersection(set_y))
    union_cardinality = len(set_x.union(set_y))
    # float() keeps the division fractional on Python 2 as well as Python 3.
    return intersection_cardinality / float(union_cardinality)


if __name__ == "__main__":
    score = compute_jaccard_similarity_score(np.array([0, 1, 2, 5, 6]), np.array([0, 2, 3, 5, 7, 9]))
    print("Jaccard Similarity Score : %s" % score)
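The description mentions textual similarity of documents as one application. The sketch below illustrates that idea; it is not part of the original script. It assumes Python 3 and treats each document as a set of lowercase, whitespace-separated words. The names `jaccard_similarity` and `tokenize` are hypothetical helpers introduced here, and returning 1.0 for two empty inputs is an assumed convention (the function above would raise `ZeroDivisionError` in that case).

```python
def jaccard_similarity(a, b):
    """Return |A intersect B| / |A union B| for two iterables of hashable items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are treated as identical (score 1.0).
        return 1.0
    return len(set_a & set_b) / len(union)


def tokenize(text):
    """Hypothetical helper: lowercase the text and split it on whitespace."""
    return text.lower().split()


doc_a = "the quick brown fox"
doc_b = "the lazy brown dog"

# The documents share 2 words ("the", "brown") out of 6 distinct words overall.
score = jaccard_similarity(tokenize(doc_a), tokenize(doc_b))
print("Jaccard Similarity Score : %s" % score)
```

A real text pipeline would likely need a more careful tokenizer (punctuation, stemming, stop words); whitespace splitting is used here only to keep the example short.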