@4OH4
Created March 29, 2020 09:36
Basic TF-IDF model using scikit-learn
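```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
search_terms = 'fruit and vegetables'
documents = ['cars drive on the road', 'tomatoes are actually fruit']
# Fit on the query plus the documents; row 0 of the matrix is the query
doc_vectors = TfidfVectorizer().fit_transform([search_terms] + documents)
# Rows are L2-normalized by default, so linear_kernel (a dot product) gives cosine similarity
cosine_similarities = linear_kernel(doc_vectors[0:1], doc_vectors).flatten()
# Drop index 0 (the query compared with itself) and convert NumPy floats to Python floats
document_scores = [item.item() for item in cosine_similarities[1:]]
# document_scores ≈ [0.0, 0.190]
```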
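To show what the two scikit-learn calls compute, here is a minimal pure-Python sketch that reproduces the scores by hand. It assumes `TfidfVectorizer`'s default settings (lowercasing, the `(?u)\b\w\w+\b` token pattern, raw term counts, `smooth_idf=True`, L2 normalization); the helper names `tokenize`, `tfidf_vectors` and `dot` are illustrative only and are not part of scikit-learn.

```python
import math
import re
from collections import Counter

# Assumed to mirror scikit-learn's default token_pattern: words of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase and split into tokens, as TfidfVectorizer does by default."""
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(texts):
    """Return one L2-normalized {term: weight} dict per text.

    Assumes the defaults: raw counts for tf, smooth_idf=True,
    sublinear_tf=False, norm='l2'.
    """
    tokenized = [tokenize(t) for t in texts]
    n = len(texts)
    # Document frequency: number of texts containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for tokens in tokenized:
        weights = {term: count * idf[term] for term, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


def dot(u, v):
    """Sparse dot product; equals cosine similarity for unit-length vectors."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


search_terms = 'fruit and vegetables'
documents = ['cars drive on the road', 'tomatoes are actually fruit']

query_vec, *doc_vecs = tfidf_vectors([search_terms] + documents)
document_scores = [dot(query_vec, d) for d in doc_vecs]
print([round(s, 3) for s in document_scores])  # [0.0, 0.19]
```

Because every vector has unit length, the plain dot product already is the cosine similarity, which is why `linear_kernel` can stand in for `cosine_similarity` here. The only term the query shares with a document is "fruit", and since it occurs in two of the three texts its idf is lower than that of the other terms, which keeps the second score small.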