How to implement LDA in Spark and get the topic distributions of new documents
import org.apache.spark.rdd._
import org.apache.spark.mllib.clustering.{LDA, DistributedLDAModel, LocalLDAModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import scala.collection.mutable
// Create the training document set
val input = Seq(
  "this is a document",
  "this could be another document",
  "these are training, not tests",
  "here is the final file (document)")
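To actually train the model and score unseen documents, the training strings have to be turned into term-count vectors over a fixed vocabulary, and the fitted model has to be converted to a `LocalLDAModel`, since `topicDistributions` on new documents is only available there (the EM optimizer, the default, returns a `DistributedLDAModel`). A minimal sketch of the remaining steps, assuming an existing `SparkContext` named `sc` (as in `spark-shell`) and an illustrative helper `vectorize` that is not part of any Spark API:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.clustering.{LDA, DistributedLDAModel, LocalLDAModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import scala.collection.mutable

// Tokenize: lowercase, split on non-word characters, drop empty tokens
val corpus: RDD[Seq[String]] =
  sc.parallelize(input.map(_.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq))

// Build a vocabulary mapping each distinct term to a column index
val vocab: Map[String, Int] =
  corpus.flatMap(identity).distinct.collect().zipWithIndex.toMap

// Hypothetical helper: turn one tokenized document into a sparse
// term-count vector; terms outside the vocabulary are silently dropped
def vectorize(tokens: Seq[String]): Vector = {
  val counts = mutable.HashMap.empty[Int, Double]
  tokens.foreach { term =>
    vocab.get(term).foreach { idx =>
      counts(idx) = counts.getOrElse(idx, 0.0) + 1.0
    }
  }
  Vectors.sparse(vocab.size, counts.toSeq)
}

// LDA expects (documentId, termCountVector) pairs
val documents: RDD[(Long, Vector)] =
  corpus.zipWithIndex.map { case (tokens, id) => (id, vectorize(tokens)) }

// Train LDA; k = 3 topics is an arbitrary choice for this toy corpus
val ldaModel = new LDA().setK(3).setMaxIterations(50).run(documents)

// topicDistributions on *new* documents requires a LocalLDAModel
val localModel: LocalLDAModel = ldaModel match {
  case dist: DistributedLDAModel => dist.toLocal
  case local: LocalLDAModel      => local
}

// Vectorize a new document with the same vocabulary and infer its topic mix
val newDocs: RDD[(Long, Vector)] = sc.parallelize(
  Seq((0L, vectorize("here is another test document".split(" ").toSeq))))
val topicDist: RDD[(Long, Vector)] = localModel.topicDistributions(newDocs)
topicDist.collect().foreach { case (id, dist) => println(s"doc $id -> $dist") }
```

Each row of `topicDist` is a vector of length `k` that sums to 1, giving the inferred topic mixture for that document; terms the model has never seen contribute nothing, so a new document made entirely of unseen words gets an essentially uniform distribution.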