@a-paxton
Forked from christophergandrud/topicmodels_json_ldavis.R
Last active November 15, 2015 08:43
Convert the output of a topicmodels Latent Dirichlet Allocation model to JSON for use with LDAvis
#' Convert the output of a topicmodels Latent Dirichlet Allocation model
#' to JSON for use with LDAvis
#'
#' @param fitted Output from a topicmodels \code{LDA} model.
#' @param corpus Corpus object used to create the document term
#' matrix for the \code{LDA} model. This should have been created with
#' the tm package's \code{Corpus} function.
#' @param doc_term The document term matrix used in the \code{LDA}
#' model. This should have been created with the tm package's
#' \code{DocumentTermMatrix} function.
#'
#' @seealso \link{LDAvis}.
#' @export
topicmodels_json_ldavis <- function(fitted, corpus, doc_term){
    # Required packages
    library(topicmodels)
    library(dplyr)
    library(stringi)
    library(tm)
    library(LDAvis)

    # Find required quantities
    # phi: topic-term distributions; theta: document-topic distributions
    phi <- posterior(fitted)$terms %>% as.matrix
    theta <- posterior(fitted)$topics %>% as.matrix
    vocab <- colnames(phi)

    # Count the whitespace-delimited tokens in each document
    # (seq_along avoids the 1:0 trap when the corpus is empty)
    doc_length <- vector()
    for (i in seq_along(corpus)) {
        temp <- paste(corpus[[i]]$content, collapse = ' ')
        doc_length <- c(doc_length, stri_count(temp, regex = '\\S+'))
    }

    # Total count of each term across all documents
    # (elegant, silenced solution adapted from http://stackoverflow.com/a/18749888/5514568)
    temp_frequency <- as.data.frame(as.matrix(doc_term))
    freq_matrix <- data.frame(ST = colnames(temp_frequency),
                              Freq = colSums(temp_frequency))
    rm(temp_frequency)

    # Convert to json
    json_lda <- LDAvis::createJSON(phi = phi, theta = theta,
                                   vocab = vocab,
                                   doc.length = doc_length,
                                   term.frequency = freq_matrix$Freq)
    return(json_lda)
}
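
The function above takes its two count inputs from different sources: `doc_length` is counted from the raw corpus text, while the term frequencies are column sums of `doc_term`. If the corpus was filtered (stop words removed, short words dropped) when the matrix was built, the two totals can disagree. As a minimal sketch of one alternative, both quantities can be derived from the same matrix with base R alone. The matrix `toy_dtm` below is a hypothetical stand-in for a dense document-term matrix and is not part of the gist.

```r
# Hypothetical toy document-term matrix: 3 documents, 4 terms.
# Rows are documents, columns are vocabulary terms, cells are counts.
toy_dtm <- matrix(
  c(2, 0, 1, 0,
    0, 3, 0, 1,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("apple", "bank", "cat", "dog"))
)

# Tokens per document: row sums of the matrix.
doc_length <- rowSums(toy_dtm)

# Corpus-wide count per term: column sums, in column (vocabulary) order.
term_frequency <- colSums(toy_dtm)

# Both vectors count the same tokens, so their totals must agree.
stopifnot(sum(doc_length) == sum(term_frequency))

print(doc_length)
print(term_frequency)
```

With a real tm matrix, `as.matrix(doc_term)` would play the role of `toy_dtm`. The dense conversion can use a lot of memory on a large corpus, so treat this as an illustration of the idea rather than a drop-in replacement.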