@pragatibaheti
Created April 7, 2020 12:38
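import nltk
from nltk.tokenize import word_tokenize

# Build a bag of words: collect every token from every processed message.
# Assumes `processed` is an iterable of preprocessed message strings defined earlier.
all_words = []
for message in processed:
    words = word_tokenize(message)
    for w in words:
        all_words.append(w)

# FreqDist encodes a frequency distribution: it counts how many times each word occurs.
all_words = nltk.FreqDist(all_words)

# Use the 1500 most common words as features.
# most_common() orders by frequency; in NLTK 3, keys() is not sorted by frequency.
word_features = [word for word, count in all_words.most_common(1500)]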
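
The feature-selection step above can be illustrated with a minimal, self-contained sketch. It assumes that `collections.Counter` (which NLTK 3's `FreqDist` subclasses) is an acceptable stand-in for `FreqDist`, and that a plain whitespace split is an acceptable stand-in for `word_tokenize`. The names `sample_messages` and `top_features` are hypothetical and do not come from the original snippet. The sketch shows why slicing `keys()` and calling `most_common()` can select different words.

```python
from collections import Counter

# Hypothetical sample data standing in for `processed`.
sample_messages = ["free prize call now", "call me later", "free call free prize"]

# Whitespace split is a stand-in for word_tokenize (assumption, avoids NLTK data downloads).
tokens = [w for message in sample_messages for w in message.split()]
counts = Counter(tokens)


def top_features(counts, n):
    """Return the n most frequent words, ordered by descending count."""
    return [word for word, _ in counts.most_common(n)]


# Slicing keys() keeps the first words seen, regardless of how often they occur.
first_seen = list(counts.keys())[:2]

# most_common() ranks by frequency, which is what the feature comment asks for.
most_frequent = top_features(counts, 2)

print(first_seen)
print(most_frequent)
```

With real data, the same pattern applies with `nltk.FreqDist` in place of `Counter`, since `FreqDist` also provides `most_common()`.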