amityo9 / LDAspark.md
Created September 24, 2016 16:29 — forked from alex9311/LDAspark.md
How to implement LDA in Spark and get the topic distributions of new documents


```scala
import org.apache.spark.rdd._
import org.apache.spark.mllib.clustering.{LDA, DistributedLDAModel, LocalLDAModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import scala.collection.mutable

// create the training document set
val input = Seq(
  "this is a document",
  "this could be another document",
  "these are training, not tests",
  "here is the final file (document)")
```
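The gist is truncated after building the training set. A minimal sketch of the rest of the pipeline, run in the Spark shell (so `sc` is the shell's `SparkContext`): tokenize, build a vocabulary, turn each document into an `(id, term-count vector)` pair, train LDA, then convert to a `LocalLDAModel` to score a new document. Names such as `vocab`, `countVector`, `documents`, and `newDocs` are illustrative choices, not from the original gist.

```scala
// tokenize the input and index the vocabulary (assumption: whitespace tokens)
val corpus = sc.parallelize(input.map(_.toLowerCase.split("\\s+").toSeq))
val vocab: Map[String, Int] =
  corpus.flatMap(identity).distinct.collect().zipWithIndex.toMap

// turn a token sequence into a sparse term-count vector over `vocab`;
// terms outside the training vocabulary are simply dropped
def countVector(tokens: Seq[String]): Vector = {
  val counts = mutable.HashMap.empty[Int, Double]
  tokens.foreach { term =>
    vocab.get(term).foreach { i => counts(i) = counts.getOrElse(i, 0.0) + 1.0 }
  }
  Vectors.sparse(vocab.size, counts.toSeq)
}
val documents = corpus.zipWithIndex.map { case (tokens, id) => (id, countVector(tokens)) }

// train; the default EM optimizer returns a DistributedLDAModel
val ldaModel = new LDA().setK(3).setMaxIterations(100).run(documents)

// only a LocalLDAModel can infer topics for unseen documents, so convert first
val localModel = ldaModel.asInstanceOf[DistributedLDAModel].toLocal

// topic distribution of a new document, vectorized with the training vocabulary
val newDocs = sc.parallelize(Seq("another test document"))
  .map(_.toLowerCase.split("\\s+").toSeq)
  .zipWithIndex.map { case (tokens, id) => (id, countVector(tokens)) }
localModel.topicDistributions(newDocs).collect().foreach(println)
```

The `DistributedLDAModel`-to-`LocalLDAModel` conversion is the key step for the gist's stated goal: the distributed model only holds topic mixtures for the training corpus, while the local model exposes `topicDistributions` for arbitrary new documents.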
amityo9 / LDA_SparkDocs
Created June 15, 2016 17:46 — forked from jkbradley/LDA_SparkDocs
LDA Example: Modeling topics in the Spark documentation
```scala
/*
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each
time (since LDA includes some randomization), but it should give topics similar
to those listed above.
This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/
import scala.collection.mutable
```
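This second gist is also cut off after its first import. A hedged continuation in the same spirit, showing how a trained model's topics are inspected with `describeTopics`; the `documents` corpus of `(id, term-count vector)` pairs is assumed to have been built as in the first snippet, and `k = 10` is an arbitrary illustrative choice.

```scala
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// assumed to exist, built as in the first snippet (illustrative name):
// val documents: RDD[(Long, Vector)] = ...

val ldaModel = new LDA().setK(10).setMaxIterations(100).run(documents)

// print the top 5 terms (as vocabulary indices) and their weights per topic;
// describeTopics returns an Array of (termIndices, termWeights) pairs
ldaModel.describeTopics(maxTermsPerTopic = 5).zipWithIndex.foreach {
  case ((termIndices, termWeights), topicId) =>
    println(s"Topic $topicId:")
    termIndices.zip(termWeights).foreach { case (term, weight) =>
      println(f"  term $term%d  weight $weight%.4f")
    }
}
```

Mapping the returned term indices back to words requires the inverted vocabulary map from the preprocessing step, since MLlib's RDD-based LDA works purely on term indices.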