@florianherrengt
Created April 1, 2019 22:13
How to use cosine similarity to compare 2 strings
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
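# Example 1: two sentences that differ by a single word.
corpus = [
    'This is my first sentence',
    'This is my second sentence'
]
# Turn each sentence into a vector of word counts (one column per distinct word).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
# cosine_similarity returns a 2x2 matrix; entry [0][1] compares sentence 0 with sentence 1.
# It accepts the sparse matrix directly, so no .toarray() call is needed.
print(cosine_similarity(X)[0][1])  # approximately 0.8 (4 shared words, 5 words per sentence)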
# Example 2: identical sentences give the maximum similarity.
corpus = [
    'This is the same sentence',
    'This is the same sentence'
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(cosine_similarity(X)[0][1])  # approximately 1.0 (identical word counts)
# Example 3: sentences with no words in common give the minimum similarity.
corpus = [
    'Two sentences',
    'that are not the same at all'
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(cosine_similarity(X)[0][1])  # 0.0 (no shared words)
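To see where a score like 0.8 comes from, the same calculation can be sketched without scikit-learn. This is only an illustration, not part of the original gist: the helper names `bag_of_words` and `cosine` are made up here, and the regex only approximates CountVectorizer's default tokenization (lowercased tokens of two or more word characters). Returning 0.0 for an empty string is also an assumption.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Hypothetical helper: lowercase the text and count tokens of two or
    # more word characters, roughly like CountVectorizer's defaults.
    return Counter(re.findall(r"\b\w\w+\b", text.lower()))


def cosine(a, b):
    # Hypothetical helper: cosine similarity of two strings from raw word counts.
    va, vb = bag_of_words(a), bag_of_words(b)
    # Dot product only needs the words that appear in both strings.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a string with no tokens is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


print(round(cosine('This is my first sentence', 'This is my second sentence'), 3))  # 0.8
```

For the first pair, the two count vectors share 4 words and each has 5 words, so the score is 4 / (sqrt(5) * sqrt(5)) = 0.8. Word order is ignored entirely, which is worth keeping in mind when comparing strings this way.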