
@sahil280114
Created August 15, 2018 18:44
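```python
from string import punctuation

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Requires NLTK's stopwords corpus and Punkt tokenizer data (see nltk.download)


def features(sentence):
    # English stop words plus single punctuation characters to discard
    stop_words = set(stopwords.words('english') + list(punctuation))
    words = [w.lower() for w in word_tokenize(sentence)]
    # Drop stop words, punctuation, and pure-digit tokens
    filtered = [w for w in words if w not in stop_words and not w.isdigit()]
    # Count occurrences of each remaining word (as floats)
    counts = {}
    for word in filtered:
        counts[word] = counts.get(word, 0.0) + 1.0
    return counts
```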
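The function builds a bag-of-words feature dictionary: lowercase the tokens, drop stop words, punctuation and numbers, then count what remains. The sketch below shows the same technique without the NLTK dependency, which can be handy for quick experiments. It is an assumption-laden stand-in, not a drop-in replacement. The regex tokenizer, the tiny hand-picked `STOP_WORDS` set, and the name `features_simple` are all hypothetical, so results will differ from NLTK's `word_tokenize` and English stop-word list.

```python
import re
from collections import Counter
from string import punctuation

# Hypothetical, deliberately tiny stop-word set standing in for NLTK's English list
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "on", "to", "in", "it", "this"}
PUNCT = set(punctuation)


def features_simple(sentence):
    """Return {word: count} for non-stop-word, non-numeric, non-punctuation tokens."""
    # Crude tokenizer (an assumption, not NLTK's word_tokenize): runs of word
    # characters, or single characters that are neither word chars nor whitespace
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    counts = Counter(
        t for t in tokens
        if t not in STOP_WORDS and t not in PUNCT and not t.isdigit()
    )
    # Match the original's float-valued counts
    return {word: float(n) for word, n in counts.items()}


print(features_simple("The cat sat on the mat, and the cat slept 2 hours."))
# → {'cat': 2.0, 'sat': 1.0, 'mat': 1.0, 'slept': 1.0, 'hours': 1.0}
```

Using `Counter` replaces the manual if/else counting loop; converting the counts to floats keeps the output in the same shape as the original function's dictionary.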