amityo9 / LDAspark.md
Created September 24, 2016 16:29 — forked from alex9311/LDAspark.md
How to implement LDA in Spark and get the topic distributions of new documents


```scala
import org.apache.spark.rdd._
import org.apache.spark.mllib.clustering.{LDA, DistributedLDAModel, LocalLDAModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import scala.collection.mutable

// create the training document set
val input = Seq(
  "this is a document",
  "this could be another document",
  "these are training, not tests",
  "here is the final file (document)")
```
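The gist is truncated after building the training set. A minimal sketch of the rest of the pipeline, run in the Spark shell (so `sc` is the shell's `SparkContext`): tokenize, build a vocabulary, turn each document into an `(id, term-count vector)` pair, train LDA, then convert to a `LocalLDAModel` to score a new document. Names such as `vocab`, `countVector`, `documents`, and `newDocs` are illustrative choices, not from the original gist.

```scala
// tokenize the input and index the vocabulary (assumption: whitespace tokens)
val corpus = sc.parallelize(input.map(_.toLowerCase.split("\\s+").toSeq))
val vocab: Map[String, Int] =
  corpus.flatMap(identity).distinct.collect().zipWithIndex.toMap

// turn a token sequence into a sparse term-count vector over `vocab`;
// terms outside the training vocabulary are simply dropped
def countVector(tokens: Seq[String]): Vector = {
  val counts = mutable.HashMap.empty[Int, Double]
  tokens.foreach { term =>
    vocab.get(term).foreach { i => counts(i) = counts.getOrElse(i, 0.0) + 1.0 }
  }
  Vectors.sparse(vocab.size, counts.toSeq)
}
val documents = corpus.zipWithIndex.map { case (tokens, id) => (id, countVector(tokens)) }

// train; the default EM optimizer returns a DistributedLDAModel
val ldaModel = new LDA().setK(3).setMaxIterations(100).run(documents)

// only a LocalLDAModel can infer topics for unseen documents, so convert first
val localModel = ldaModel.asInstanceOf[DistributedLDAModel].toLocal

// topic distribution of a new document, vectorized with the training vocabulary
val newDocs = sc.parallelize(Seq("another test document"))
  .map(_.toLowerCase.split("\\s+").toSeq)
  .zipWithIndex.map { case (tokens, id) => (id, countVector(tokens)) }
localModel.topicDistributions(newDocs).collect().foreach(println)
```

The `DistributedLDAModel`-to-`LocalLDAModel` conversion is the key step for the gist's stated goal: the distributed model only holds topic mixtures for the training corpus, while the local model exposes `topicDistributions` for arbitrary new documents.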
amityo9 / LDA_SparkDocs
Created June 15, 2016 17:46 — forked from jkbradley/LDA_SparkDocs
LDA Example: Modeling topics in the Spark documentation
```scala
/*
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each
time (since LDA includes some randomization), but it should give topics similar
to those listed above.
This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/
import scala.collection.mutable
```
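This second gist is also cut off after its first import. A hedged continuation in the same spirit, showing how a trained model's topics are inspected with `describeTopics`; the `documents` corpus of `(id, term-count vector)` pairs is assumed to have been built as in the first snippet, and `k = 10` is an arbitrary illustrative choice.

```scala
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// assumed to exist, built as in the first snippet (illustrative name):
// val documents: RDD[(Long, Vector)] = ...

val ldaModel = new LDA().setK(10).setMaxIterations(100).run(documents)

// print the top 5 terms (as vocabulary indices) and their weights per topic;
// describeTopics returns an Array of (termIndices, termWeights) pairs
ldaModel.describeTopics(maxTermsPerTopic = 5).zipWithIndex.foreach {
  case ((termIndices, termWeights), topicId) =>
    println(s"Topic $topicId:")
    termIndices.zip(termWeights).foreach { case (term, weight) =>
      println(f"  term $term%d  weight $weight%.4f")
    }
}
```

Mapping the returned term indices back to words requires the inverted vocabulary map from the preprocessing step, since MLlib's RDD-based LDA works purely on term indices.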