@ypanagis
Last active July 27, 2016 10:05
Prepare the output of an LDA model for use in LDAvis when the corpus is created with the quanteda package
#' Convert the output of a topicmodels Latent Dirichlet Allocation to JSON
#' for use with LDAvis
#'
#' @param fitted Output from a topicmodels \code{LDA} model.
#' @param corp Corpus object used to create the document-term matrix
#' for the \code{LDA} model. The corpus should be created with the
#' \pkg{quanteda} package.
#' @param doc_term The document-term matrix used in the \code{LDA}
#' model. This should have been created with the quanteda package's
#' \code{convert} function, e.g. \code{convert(twdfm, to = "topicmodels")},
#' where \code{twdfm} is the document-feature matrix created by quanteda's
#' \code{dfm} function.
#'
#' @details The code is adapted from
#' \url{https://gist.github.com/christophergandrud/00e7451c16439421b24a}.
#'
#' @seealso \link{LDAvis}.
#' @export
#' @export
topicmodels_json_ldavis <- function(fitted, corp, doc_term) {
  # Required packages
  library(topicmodels)
  library(dplyr)
  library(stringi)
  library(LDAvis)

  # Find required quantities:
  # phi   = per-topic term distributions (topics x terms)
  # theta = per-document topic distributions (documents x topics)
  phi <- posterior(fitted)$terms %>% as.matrix
  theta <- posterior(fitted)$topics %>% as.matrix
  vocab <- colnames(phi)

  # Document lengths: number of whitespace-delimited tokens in each text.
  # corp$documents$texts reads the texts straight from the quanteda corpus
  # object, so it depends on how quanteda stores them internally.
  doc_length <- stri_count_regex(corp$documents$texts, "\\S+")

  # Corpus-wide frequency of each term (column sums of the document-term
  # matrix). slam::col_sums() works on the sparse matrix directly and avoids
  # tm::inspect(), which is meant for display and prints to the console.
  term_frequency <- slam::col_sums(doc_term)

  # Convert to JSON
  json_lda <- LDAvis::createJSON(phi = phi, theta = theta,
                                 vocab = vocab,
                                 doc.length = doc_length,
                                 term.frequency = term_frequency)
  return(json_lda)
}
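
As a rough illustration of the two corpus-derived inputs that the function passes to `LDAvis::createJSON` (`doc.length` and `term.frequency`), the sketch below computes both for a toy corpus using only base R. The `texts` vector and the plain whitespace tokenisation are assumptions made for this example; they are not part of the gist, and a real document-term matrix from quanteda may tokenise differently.

```r
# Hypothetical toy corpus (not from the gist); the last document is empty
texts <- c("the cat sat", "the dog sat down", "")

# Tokens per document: runs of non-whitespace characters, mirroring the
# "\\S+" pattern used in the function. regmatches() yields character(0)
# for the empty text, so its length is 0.
tokens <- regmatches(texts, gregexpr("\\S+", texts))
doc_length <- lengths(tokens)

# Corpus-wide term frequencies, analogous to the column sums of a
# document-term matrix
term_table <- table(unlist(tokens))
term_frequency <- as.vector(term_table)
names(term_frequency) <- names(term_table)

print(doc_length)
print(term_frequency)
```

Under this simple tokenisation the document lengths and the term frequencies count the same tokens, so their totals agree. That makes a quick sanity check before calling `createJSON`, although the totals can differ when the document-term matrix was built with stop-word removal or other preprocessing.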