@MaartenGr
Created October 15, 2020 12:10
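from sklearn.feature_extraction.text import CountVectorizer

# Create bag of words: one row per class, one column per vocabulary term
count_vectorizer = CountVectorizer().fit(docs_per_class.Document)
count = count_vectorizer.transform(docs_per_class.Document)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
words = count_vectorizer.get_feature_names_out()

# Extract top 10 words per class (argsort is ascending, so the highest-scoring word comes last)
# CTFIDFVectorizer, docs, docs_per_class and newsgroups are defined outside this snippet
ctfidf = CTFIDFVectorizer().fit_transform(count, n_samples=len(docs)).toarray()
words_per_class = {
    newsgroups.target_names[label]: [words[index] for index in ctfidf[label].argsort()[-10:]]
    for label in docs_per_class.Class
}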
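The snippet above relies on a CTFIDFVectorizer class that is not defined here. As a rough illustration of what per-class keyword extraction of this kind computes, here is a minimal standard-library sketch. It assumes a class-based TF-IDF score of the form (term frequency within the class) * log(n_samples / term frequency across all classes); that formula, the helper name top_words_per_class, and the toy documents are all assumptions made for illustration, not the gist's actual implementation.

```python
import math
from collections import Counter


def top_words_per_class(docs_by_class, n_samples, top_n=10):
    """Hypothetical helper: rank words per class with an assumed c-TF-IDF-style score."""
    # Treat all documents of a class as one long document and count its words.
    counts = {
        label: Counter(" ".join(texts).lower().split())
        for label, texts in docs_by_class.items()
    }

    # Total frequency of each word across all classes.
    total = Counter()
    for class_counts in counts.values():
        total.update(class_counts)

    result = {}
    for label, class_counts in counts.items():
        class_size = sum(class_counts.values())
        # Assumed score: normalized class term frequency times log(n_samples / total frequency).
        scores = {
            word: (n / class_size) * math.log(n_samples / total[word])
            for word, n in class_counts.items()
        }
        # Highest score first; ties are broken alphabetically to keep the output deterministic.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        result[label] = ranked[:top_n]
    return result


# Toy data, invented for this example only.
toy_docs = {
    "space": ["the rocket reached orbit", "the orbit of the moon", "orbit insertion burn"],
    "autos": ["the car engine stalled", "the engine of the car", "car engine repair"],
}
n_docs = sum(len(texts) for texts in toy_docs.values())
top = top_words_per_class(toy_docs, n_samples=n_docs, top_n=2)
print(top)
```

Unlike the argsort()[-10:] slice in the snippet, this sketch returns words in descending order of score. Words that occur equally often in every class, such as "the" in the toy data, get a low score and drop out of the top list.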