@makispl
Created April 11, 2020 13:44
# Create two dictionaries that match each unique word with the respective probability value.
parameters_spam = {unique_word: 0 for unique_word in vocabulary}
parameters_ham = {unique_word: 0 for unique_word in vocabulary}
# Iterate over the vocabulary and for each word, calculate P(wi|Spam) and P(wi|Ham)
for unique_word in vocabulary:
    # Laplace-smoothed estimates: (word count in class + alpha) / (total words in class + alpha * vocabulary size)
    p_unique_word_spam = (spam_df[unique_word].sum() + alpha) / (n_spam + alpha * n_vocabulary)
    p_unique_word_ham = (ham_df[unique_word].sum() + alpha) / (n_ham + alpha * n_vocabulary)
    # Store the calculated probabilities in the dictionaries
    parameters_spam[unique_word] = p_unique_word_spam
    parameters_ham[unique_word] = p_unique_word_ham
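
The snippet above relies on names defined elsewhere (`vocabulary`, `spam_df`, `ham_df`, `n_spam`, `n_ham`, `n_vocabulary`, `alpha`). As a rough, self-contained sketch of the same smoothing formula, the example below swaps the DataFrames for plain `collections.Counter` objects over a made-up toy corpus. It assumes `n_spam` and `n_ham` are the total word counts in each class, which the snippet does not state; the messages and the names `spam_messages`, `ham_messages`, `spam_counts`, and `ham_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus: each message is already tokenized into words.
spam_messages = [["win", "win", "money"], ["win", "money"]]
ham_messages = [["hello"], ["hello", "hello", "money"]]

# Vocabulary: every unique word seen in either class.
vocabulary = sorted({word for msg in spam_messages + ham_messages for word in msg})

alpha = 1  # Laplace smoothing constant
n_vocabulary = len(vocabulary)

# Per-class word counts (stand-ins for spam_df[word].sum() and ham_df[word].sum()).
spam_counts = Counter(word for msg in spam_messages for word in msg)
ham_counts = Counter(word for msg in ham_messages for word in msg)

# Assumption: n_spam and n_ham are the total number of words in each class.
n_spam = sum(spam_counts.values())
n_ham = sum(ham_counts.values())

# P(wi|Spam) and P(wi|Ham) with additive smoothing; a Counter returns 0 for unseen words.
parameters_spam = {
    word: (spam_counts[word] + alpha) / (n_spam + alpha * n_vocabulary)
    for word in vocabulary
}
parameters_ham = {
    word: (ham_counts[word] + alpha) / (n_ham + alpha * n_vocabulary)
    for word in vocabulary
}

print(parameters_spam)
print(parameters_ham)
```

Under that assumption, the smoothed probabilities in each dictionary sum to 1 over the vocabulary, and a word that never appears in a class (here `hello` in spam) still gets a small non-zero probability instead of 0.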