@mizvol
Created April 21, 2017 10:08
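from operator import itemgetter

# Top 5 terms per topic; expected to be a list of (termIndices, termWeights) pairs
topicIndices = ldaModel.describeTopics(maxTermsPerTopic=5)
vocablist = vectorizer.vocabulary
topicsRDD = sc.parallelize(topicIndices)
# Map term indices to vocabulary words; list() materializes the pairs (zip is lazy in Python 3)
termsRDD = topicsRDD.map(lambda topic: list(zip(itemgetter(*topic[0])(vocablist), topic[1])))
# Attach a topic id to each topic's term list
indexedTermsRDD = termsRDD.zipWithIndex()
# Flatten to one (term, probability, topicId) row per term
flatTermsRDD = indexedTermsRDD.flatMap(lambda term: [(t[0], t[1], term[1]) for t in term[0]])
termDF = flatTermsRDD.toDF(['term', 'probability', 'topicId'])
# Transform the Spark DataFrame to JSON in order to pass the data into D3.js
rawJson = termDF.toJSON().collect()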
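
`rawJson` above is a Python list with one JSON string per row, not a single JSON document. A minimal sketch of one way to merge those strings into one array that D3.js could load follows. The helper name `to_json_array` and the sample rows are hypothetical and not part of the original gist.

```python
import json

def to_json_array(raw_json):
    """Combine per-row JSON strings into a single JSON array string."""
    # Parse each row first so the result is valid JSON rather than glued text.
    records = [json.loads(row) for row in raw_json]
    return json.dumps(records)

# Hypothetical sample standing in for the collected rawJson list.
sample = [
    '{"term":"spark","probability":0.12,"topicId":0}',
    '{"term":"data","probability":0.08,"topicId":1}',
]
payload = to_json_array(sample)
```

With the real data, `to_json_array(rawJson)` would give a string that can be written to a file or embedded in a page and read on the JavaScript side with `JSON.parse`.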