@maxbellec
Last active January 27, 2017 13:27
from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models import TfidfModel
wiki = WikiCorpus.load('wiki.corpus')
tfidf = TfidfModel.load("wiki.gensim.tfidfmodel")
# convert the sentence to a bag-of-words vector: a list of (token_id, count) pairs
sentence = "hi my name is"
sentence = wiki.dictionary.doc2bow(sentence.lower().split()) # [(662762, 1), (1271346, 1), (1756375, 1), (1770642, 1)]
# tfidf for that sentence
tfidf[sentence] # [(662762, 0.78124880949077), (1271346, 0.38699681638512373), (1756375, 0.4889516987808046), (1770642, 0.028460464444535528)]
# need the actual words back? look the id up in the dictionary
# (Dictionary.__getitem__ builds the id -> token mapping on first use)
wiki.dictionary[662762]
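
The weights returned by `tfidf[sentence]` above have squares that sum to 1, which is consistent with a length-normalized vector. The following is a minimal pure-Python sketch of that computation, assuming gensim's default weighting (raw term frequency times `log2(N / df)`, then scaling to unit L2 norm). The function name `tfidf_vector` and the toy document frequencies are hypothetical and are not taken from the Wikipedia corpus.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, num_docs):
    """Return {token: weight} using tf * log2(N / df), scaled to unit L2 length.

    Tokens missing from doc_freq are ignored, much like doc2bow skips
    words that are not in the dictionary.
    """
    counts = Counter(t for t in tokens if t in doc_freq)
    raw = {t: tf * math.log2(num_docs / doc_freq[t]) for t, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0:
        return {}
    return {t: w / norm for t, w in raw.items()}


# Hypothetical toy statistics: 8 documents, with made-up document frequencies.
num_docs = 8
doc_freq = {"hi": 1, "my": 4, "name": 2, "is": 8}

weights = tfidf_vector("hi my name is".lower().split(), doc_freq, num_docs)
for token, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(token, round(weight, 4))
```

A word that appears in every document ("is" in the toy data) gets weight 0 in this sketch, while rare words dominate the vector. This matches the pattern in the gist's output, where one token has a weight close to zero.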