Maziyar Panahi (maziyarpanahi)

maziyarpanahi / pubmed-cancer-LDA-results.txt
Last active Oct 22, 2017
Results of LDA over the PubMed "Cancer" sub-corpus
Pipeline: Stanford CoreNLP (sentence splitting and POS tagging to extract noun phrases), StopWordsRemover, TF-IDF, word2vec and OnlineLDAOptimizer
==========
Query: "cancer"
Sample: 500K abstracts
Dataset: PubMed
==========
val numTopics: Int = 50
val maxIterations: Int = 100
val vocabSize: Int = 10000
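The pipeline weights terms with TF-IDF before running LDA. The sketch below is a minimal plain-Scala illustration, not the code used for these runs. It assumes documents are already tokenized, POS-filtered and free of stop words, and it uses the smoothed IDF documented for Spark MLlib's `IDF`, idf(t) = ln((m + 1) / (df(t) + 1)), where m is the number of documents. The object and method names are hypothetical.

```scala
// Hypothetical helper illustrating TF-IDF weighting; not the original pipeline code.
object TfIdfSketch {
  // A document is assumed to be an already preprocessed sequence of terms.
  type Doc = Seq[String]

  /** Document frequency: number of documents that contain each term at least once. */
  def docFreq(docs: Seq[Doc]): Map[String, Int] =
    docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }

  /** Smoothed IDF in the form documented for Spark MLlib: ln((m + 1) / (df + 1)). */
  def idf(docs: Seq[Doc]): Map[String, Double] = {
    val m = docs.size
    docFreq(docs).map { case (t, df) => t -> math.log((m + 1.0) / (df + 1.0)) }
  }

  /** TF-IDF weights of one document: raw term count multiplied by the term's IDF. */
  def tfIdf(doc: Doc, idfs: Map[String, Double]): Map[String, Double] =
    doc.groupBy(identity).map { case (t, occ) => t -> occ.size * idfs.getOrElse(t, 0.0) }

  def main(args: Array[String]): Unit = {
    // Toy corpus standing in for preprocessed abstracts.
    val docs = Seq(
      Seq("tumor", "cell", "tumor"),
      Seq("cell", "therapy"),
      Seq("tumor", "therapy", "patient")
    )
    val idfs = idf(docs)
    println(tfIdf(docs(2), idfs))
  }
}
```

With this formula a term present in every document gets idf = ln(1) = 0, so its TF-IDF weight is zero regardless of how often it occurs.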
maziyarpanahi / enwiki-gas-emissions-LDA-results.txt
Last active Jul 3, 2017
Results of Spark LDA run over English Wikipedia pages (different queries). Topics are sorted by their coherence, measured with Word2Vec.
Pipeline: Stanford CoreNLP (sentence splitting and POS tagging to extract noun phrases), StopWordsRemover, TF-IDF, word2vec and OnlineLDAOptimizer
Query: Global Warming (5000 pages)
==========Parameters==========
val numTopics: Int = 50
val maxIterations: Int = 100
val vocabSize: Int = 10000
val minDF: Int = 1
val minTF: Int = 1
val maxItems: Int = 15
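The topics are said to be sorted by Word2Vec coherence, but the exact measure is not stated. A common embedding-based choice, assumed here, is the mean pairwise cosine similarity between the word vectors of a topic's top terms. The sketch takes the vectors as a plain `Map[String, Array[Double]]` (for example, exported from a trained word2vec model) and skips terms that have no vector. It illustrates the idea under these assumptions and is not the author's implementation; all names are hypothetical. `maxItems` may correspond to the number of top terms listed per topic, but that is also a guess.

```scala
// Hypothetical embedding-based topic coherence; the measure itself is an assumption.
object TopicCoherenceSketch {
  type Vec = Array[Double]

  /** Cosine similarity of two equal-length vectors (0.0 if either is all zeros). */
  def cosine(a: Vec, b: Vec): Double = {
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }

  /** Mean pairwise cosine similarity over the top terms that have a vector. */
  def coherence(topTerms: Seq[String], vectors: Map[String, Vec]): Double = {
    val vs = topTerms.flatMap(vectors.get)
    val pairs = for {
      i <- vs.indices
      j <- (i + 1) until vs.size
    } yield cosine(vs(i), vs(j))
    if (pairs.isEmpty) 0.0 else pairs.sum / pairs.size
  }

  /** Topics (each a list of top terms) ordered from most to least coherent. */
  def rankTopics(topics: Seq[Seq[String]], vectors: Map[String, Vec]): Seq[(Seq[String], Double)] =
    topics.map(t => t -> coherence(t, vectors)).sortBy(-_._2)

  def main(args: Array[String]): Unit = {
    // Toy 2-d vectors standing in for real word2vec embeddings.
    val vectors = Map(
      "emission" -> Array(1.0, 0.0),
      "carbon"   -> Array(0.9, 0.1),
      "ice"      -> Array(0.0, 1.0),
      "football" -> Array(-1.0, 0.0)
    )
    val topics = Seq(Seq("ice", "football"), Seq("emission", "carbon"))
    rankTopics(topics, vectors).foreach { case (terms, c) =>
      println(f"$c%.3f ${terms.mkString(", ")}")
    }
  }
}
```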
maziyarpanahi / enwiki-global-warming-LDA-results.txt
Last active Oct 22, 2017
Results of Spark LDA run over English Wikipedia pages (different queries). Topics are sorted by their coherence, measured with Word2Vec.
====================
Pipeline: Stanford CoreNLP (sentence splitting and POS tagging, keeping NN and NNS nouns), StopWordsRemover, TF-IDF, word2vec and OnlineLDAOptimizer
Query: Global Warming (5000 pages)
==========Parameters==========
val numTopics: Int = 50
val maxIterations: Int = 100
val vocabSize: Int = 10000
val minDF: Int = 10
val minTF: Int = 1
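This run uses `minDF = 10`, while the gas-emissions run uses `minDF = 1`. The parameter names match those of Spark ML's `CountVectorizer`, so, assuming that is where they are applied: `minDF` drops terms that occur in fewer than `minDF` documents, the remaining terms are capped at the `vocabSize` most frequent across the corpus, and `minTF` ignores terms that occur fewer than `minTF` times within a single document. The plain-Scala sketch below reproduces that logic with integer thresholds only (Spark also accepts fractional values); the names and the alphabetical tie-breaking are hypothetical choices.

```scala
// Hypothetical re-creation of CountVectorizer-style vocabulary filtering (integer thresholds only).
object VocabularySketch {
  type Doc = Seq[String]

  /** Keep terms found in at least minDF documents, then the vocabSize most frequent ones. */
  def buildVocab(docs: Seq[Doc], vocabSize: Int, minDF: Int): Seq[String] = {
    val termCounts = docs.flatten.groupBy(identity).map { case (t, occ) => t -> occ.size }
    val docFreqs = docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }
    termCounts.toSeq
      .filter { case (t, _) => docFreqs(t) >= minDF }
      .sortBy { case (t, count) => (-count, t) } // most frequent first; ties broken alphabetically
      .take(vocabSize)
      .map(_._1)
  }

  /** Term counts of one document, dropping out-of-vocabulary terms and those seen fewer than minTF times. */
  def countVector(doc: Doc, vocab: Seq[String], minTF: Int): Map[String, Int] = {
    val inVocab = vocab.toSet
    doc.filter(inVocab).groupBy(identity)
      .map { case (t, occ) => t -> occ.size }
      .filter { case (_, c) => c >= minTF }
  }

  def main(args: Array[String]): Unit = {
    // Toy corpus standing in for preprocessed Wikipedia pages.
    val docs = Seq(Seq("ice", "sheet", "ice"), Seq("ice", "carbon"), Seq("carbon", "methane"))
    println(buildVocab(docs, vocabSize = 10, minDF = 1))
    println(buildVocab(docs, vocabSize = 10, minDF = 2))
  }
}
```

Under this reading, with `minDF = 10` a term must appear in at least 10 of the 5000 pages before it can compete for one of the 10,000 vocabulary slots, whereas with `minDF = 1` any term seen at least once is eligible.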