@ugik
Created January 11, 2017 17:44
part 5 revision
# calculate a score for a given class, taking into account word commonality
# (relies on nltk, stemmer, class_words and corpus_words being defined beforehand)
def calculate_class_score(sentence, class_name, show_details=True):
    score = 0
    # tokenize each word in our new sentence
    for word in nltk.word_tokenize(sentence):
        # stem the lowercased word once and reuse it below
        stem = stemmer.stem(word.lower())
        # check to see if the stem of the word is in the given class
        if stem in class_words[class_name]:
            # weight each word by the inverse of its corpus count, so common words count less
            # (1.0 forces float division, which also avoids truncation to 0 under Python 2)
            weight = 1.0 / corpus_words[stem]
            score += weight
            if show_details:
                print(" match: %s (%s)" % (stem, weight))
    return score
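
A minimal, self-contained sketch of the same weighting idea follows. It is an illustration only: the training sentences, `toy_stem`, and `toy_class_score` are hypothetical stand-ins (not part of the gist), and plain `str.split()` replaces `nltk.word_tokenize` so the example runs without NLTK.

```python
# Hypothetical stand-in for the stemmer: lowercase, drop a trailing 's' on longer words.
def toy_stem(word):
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

# Hypothetical training sentences grouped by class (not the gist's data).
training = {
    "greeting": ["how are you", "how is your day"],
    "sandwich": ["make me a sandwich", "can you make a sandwich"],
}

corpus_words = {}  # stem -> number of occurrences across all classes
class_words = {}   # class name -> list of stems seen in that class
for class_name, sentences in training.items():
    class_words[class_name] = []
    for sentence in sentences:
        for word in sentence.split():
            stem = toy_stem(word)
            corpus_words[stem] = corpus_words.get(stem, 0) + 1
            class_words[class_name].append(stem)

def toy_class_score(sentence, class_name):
    """Sum 1/count over the sentence's stems that occur in the given class."""
    score = 0.0
    for word in sentence.split():
        stem = toy_stem(word)
        if stem in class_words[class_name]:
            score += 1.0 / corpus_words[stem]
    return score

# Score one sentence against every class.
scores = {name: toy_class_score("can you make lunch", name) for name in training}
print(scores)
```

In this toy corpus, "can" appears once while "you" and "make" each appear twice, so "can" contributes 1.0 and the other two contribute 0.5 each to the "sandwich" score. "lunch" was never seen in training and is ignored.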