Created
December 20, 2016 13:26
How to do LDA in Python (for Morten)
import numpy as np
import pandas as pd
import lda
from sklearn.feature_extraction.text import CountVectorizer


def load_questions():
    # Read the spreadsheet; return it together with the 'Question' column as a list.
    sheet = pd.read_excel('android_watch.xlsx')
    Qs = sheet['Question']
    return sheet, list(Qs)


def featurize(questions, stop_words=None):
    # Build the document-term count matrix.
    cv = CountVectorizer(stop_words=stop_words)
    X = cv.fit_transform(questions)
    # vocabulary_ maps term -> column index, so order the terms by index
    # to keep the word list aligned with the columns of X.
    vocab = sorted(cv.vocabulary_, key=cv.vocabulary_.get)
    return X, vocab


def find_topics(features):
    model = lda.LDA(n_topics=20, n_iter=500, random_state=1)
    model.fit(features)
    return model


def report():
    s, q = load_questions()
    feats, vocab = featurize(q)
    model = find_topics(feats)
    # list topics
    n_top_words = 8
    topic_word = model.topic_word_
    for i, topic_dist in enumerate(topic_word):
        # a reversed slice stops before its end index, hence the +1
        topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words + 1):-1]
        print('Topic {}: {}'.format(i, ' '.join(topic_words)))
    # list primary topic for each q
    doc_topic = model.doc_topic_
    for i in range(10):
        print("{} (top topic: {})".format(q[i], doc_topic[i].argmax()))
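A note on the word list returned by featurize: in scikit-learn, `CountVectorizer.vocabulary_` is a dict from term to column index, so the order of its keys need not follow the columns of the count matrix. The sketch below uses two made-up questions (not taken from android_watch.xlsx) to compare the dict's key order with the order obtained by sorting terms on their column index.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy questions, invented for illustration only.
docs = ["zebra watch band", "apple watch charger"]

cv = CountVectorizer()
X = cv.fit_transform(docs)

# Keys in whatever order the dict happens to hold them.
dict_key_order = list(cv.vocabulary_)
# Terms sorted by their column index: this matches the columns of X.
column_order = sorted(cv.vocabulary_, key=cv.vocabulary_.get)

print(dict_key_order)
print(column_order)

# Sanity check: "zebra" occurs only in the first document.
zebra_col = column_order.index("zebra")
print(X.toarray()[:, zebra_col])
```

If the two printed lists differ, labelling topic weights with the first one attaches the wrong words to the topics.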
Need to tinker a bit with the stop words.
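A rough sketch of what that tinkering could look like. `CountVectorizer` accepts either `stop_words='english'` (scikit-learn's built-in list) or a custom list. The two questions are copied from the sample output; the custom list `domain_stop_words` is only an illustrative guess, not a recommendation.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two questions from the sample output.
questions = [
    "Can you change watch bands",
    "Does this come with a charging dock?",
]

# Option 1: scikit-learn's built-in English stop word list.
cv_builtin = CountVectorizer(stop_words="english")
cv_builtin.fit(questions)
kept_builtin = sorted(cv_builtin.vocabulary_)

# Option 2: a hand-picked list (hypothetical; "watch" is included because
# it is likely to show up in nearly every question of this data set).
domain_stop_words = ["can", "you", "does", "this", "with", "watch"]
cv_custom = CountVectorizer(stop_words=domain_stop_words)
cv_custom.fit(questions)
kept_custom = sorted(cv_custom.vocabulary_)

print(kept_builtin)
print(kept_custom)
```

With the helper from the gist, the first option corresponds to calling `featurize(q, stop_words='english')`.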
watch.report()
Topic 0: insert by only daughter cm mute bought
Topic 1: box briefly reminders remote images how dad
Topic 2: discontinued insert subscription cm remote global dad
Topic 3: box subscription images only proceed different discontinued
Topic 4: remote briefly discontinued subscription yes check cm
Topic 5: remote yes briefly personal doesnt subscription strap
Topic 6: food silently cm discontinued difficult batteries require
Topic 7: box remote insert long briefly sure subscription
Topic 8: only briefly desk unclasped measurements difficult cm
Topic 9: only upgrading subscription insert how needs notification
Topic 10: insert discontinued 24 only canoe cm becomes
Topic 11: briefly converter smart difficult connected playback never
Topic 12: insert only 24 measurements battey play yes
Topic 13: difficult silently cell yes example upgrading dad
Topic 14: difficult bought dad discontinued cm insert never
Topic 15: briefly only upgrading discontinued s2 who converter
Topic 16: food briefly mute silently lap caller easily
Topic 17: insert only day receive continue issues by
Topic 18: box reminders connected sincronized devices subscription briefly
Topic 19: insert food discontinued silently upgrading mute converter
What's the range on the watch? does the phone have to be in your pocket? People able to talk or take pictures on this watch like the samsung one? (top topic: 14)
Can you change watch bands (top topic: 19)
Sorry that the 1.45 doesn't have a speaker. Hope the 1.63 won't overpower my smallish wrist! How is the sound quality and how easy to adjust? (top topic: 10)
why are you selling the google g watch at $294 when it is $229 everywhere else? (top topic: 8)
Sorry that the 1.45 doesn't have a speaker. Hope the 1.63 won't overpower my smallish wrist! How is the sound quality and how easy to adjust? (top topic: 10)
Does this come with a charging dock? (top topic: 7)
Can you change watch bands (top topic: 19)
How to choose M or S (top topic: 13)
Can you take answer a call without having to take out your phone? (top topic: 16)
the charger is included? (top topic: 7)
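Each topic line above lists seven words. That is the count a reversed slice of the form `[:-n:-1]` gives for n = 8, because the slice stops just before index -n. A small sketch with made-up topic weights (the vocabulary and numbers are invented) shows the off-by-one and one way to get exactly n words.

```python
import numpy as np

# Made-up weights for a single topic over a six-word vocabulary.
vocab = np.array(["band", "battery", "charger", "phone", "strap", "watch"])
topic_dist = np.array([0.05, 0.30, 0.10, 0.25, 0.02, 0.28])
n_top_words = 3

order = np.argsort(topic_dist)  # indices sorted by ascending weight

# Reversed slice that stops before index -n_top_words: one word short.
short = vocab[order][:-n_top_words:-1]
# Moving the stop index by one returns exactly n_top_words words.
full = vocab[order][:-(n_top_words + 1):-1]

print(len(short), list(short))
print(len(full), list(full))
```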