@bfraiche
Last active April 22, 2020 01:32
This gist contains the complete code for my blog post 'Bayesian Machine Learning and NLP with R and sparklyr'.
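# `mc` is not defined in this gist; it is assumed to be an existing magpie connection object.
mc$defaultLibrary <- "sparklyr"

library(sparklyr)
library(tidyverse)

# Query the speeches; the ft_* steps below expect a Spark DataFrame.
speeches <- magpie::sql(mc, "SELECT * FROM presidential_speeches WHERE president")

# Tokenize, drop stop words, hash term frequencies into 2^12 features, then split 70/30.
partitions <- speeches %>%
  ft_tokenizer(input_col = 'speech_text', output_col = 'words') %>%
  ft_stop_words_remover(input_col = 'words', output_col = 'clean_words') %>%
  ft_hashing_tf(input_col = 'clean_words', output_col = 'words_vector', num_features = 2^12) %>%
  sdf_random_split(training = 0.7, test = 0.3, seed = 411)

# Fit a naive Bayes classifier on the training partition.
nb_model <- partitions$training %>%
  ml_naive_bayes(president ~ words_vector)

# Score the held-out test partition and evaluate (the evaluator's default metric is F1).
pred <- ml_predict(nb_model, partitions$test)
ml_multiclass_classification_evaluator(pred)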