@clausd
Created December 20, 2016 13:26
How to do LDA in Python (for Morten)
import numpy as np
import pandas as pd
import lda
from sklearn.feature_extraction.text import CountVectorizer


def load_questions():
    # Read the spreadsheet and return it together with the 'Question' column as a list
    sheet = pd.read_excel('android_watch.xlsx')
    Qs = sheet['Question']
    return sheet, list(Qs)


def featurize(questions, stop_words=None):
    # Build the document-term count matrix
    cv = CountVectorizer(stop_words=stop_words)
    X = cv.fit_transform(questions)
    # vocabulary_ maps term -> column index; order the terms by that index
    # so that vocab[j] names column j of X
    vocab = sorted(cv.vocabulary_, key=cv.vocabulary_.get)
    return X, vocab


def find_topics(features):
    # Fit a 20-topic LDA model on the count matrix
    model = lda.LDA(n_topics=20, n_iter=500, random_state=1)
    model.fit(features)
    return model


def report():
    s, q = load_questions()
    feats, vocab = featurize(q)
    model = find_topics(feats)
    # list topics
    n_top_words = 8
    topic_word = model.topic_word_
    for i, topic_dist in enumerate(topic_word):
        # most probable words first; the +1 makes the slice return n_top_words items
        topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words + 1):-1]
        print('Topic {}: {}'.format(i, ' '.join(topic_words)))
    # list primary topic for each q
    doc_topic = model.doc_topic_
    for i in range(min(10, len(q))):
        print("{} (top topic: {})".format(q[i], doc_topic[i].argmax()))
clausd (Author) commented Dec 20, 2016

Need to fiddle a bit with stopwords

watch.report()
Topic 0: insert by only daughter cm mute bought
Topic 1: box briefly reminders remote images how dad
Topic 2: discontinued insert subscription cm remote global dad
Topic 3: box subscription images only proceed different discontinued
Topic 4: remote briefly discontinued subscription yes check cm
Topic 5: remote yes briefly personal doesnt subscription strap
Topic 6: food silently cm discontinued difficult batteries require
Topic 7: box remote insert long briefly sure subscription
Topic 8: only briefly desk unclasped measurements difficult cm
Topic 9: only upgrading subscription insert how needs notification
Topic 10: insert discontinued 24 only canoe cm becomes
Topic 11: briefly converter smart difficult connected playback never
Topic 12: insert only 24 measurements battey play yes
Topic 13: difficult silently cell yes example upgrading dad
Topic 14: difficult bought dad discontinued cm insert never
Topic 15: briefly only upgrading discontinued s2 who converter
Topic 16: food briefly mute silently lap caller easily
Topic 17: insert only day receive continue issues by
Topic 18: box reminders connected sincronized devices subscription briefly
Topic 19: insert food discontinued silently upgrading mute converter
What's the range on the watch? does the phone have to be in your pocket? People able to talk or take pictures on this watch like the samsung one? (top topic: 14)
Can you change watch bands (top topic: 19)
Sorry that the 1.45 doesn't have a speaker. Hope the 1.63 won't overpower my smallish wrist! How is the sound quality and how easy to adjust? (top topic: 10)
why are you selling the google g watch at $294 when it is $229 everywhere else? (top topic: 8)
Sorry that the 1.45 doesn't have a speaker. Hope the 1.63 won't overpower my smallish wrist! How is the sound quality and how easy to adjust? (top topic: 10)
Does this come with a charging dock? (top topic: 7)
Can you change watch bands (top topic: 19)
How to choose M or S (top topic: 13)
Can you take answer a call without having to take out your phone? (top topic: 16)
the charger is included? (top topic: 7)
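
On the stop-word point: `featurize` already forwards `stop_words` to `CountVectorizer`, which accepts either the string `'english'` (scikit-learn's built-in list) or an explicit list of words. Below is a minimal sketch of the effect on the vocabulary, using three of the questions from the output above. The helper name `vocabulary` and the custom list (`watch`, `does`) are assumptions for illustration only, not a recommendation for this data set.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three questions taken from the output above
questions = [
    "Does this come with a charging dock?",
    "Can you change watch bands",
    "the charger is included?",
]


def vocabulary(docs, stop_words=None):
    # Hypothetical helper: fit a vectorizer and return its terms in column order
    cv = CountVectorizer(stop_words=stop_words)
    cv.fit(docs)
    return sorted(cv.vocabulary_, key=cv.vocabulary_.get)


plain = vocabulary(questions)                                # nothing filtered
filtered = vocabulary(questions, stop_words="english")       # built-in English list
custom = vocabulary(questions, stop_words=["watch", "does"])  # hand-picked words (assumed)

print(len(plain), plain)
print(len(filtered), filtered)
print(len(custom), custom)
```

With the same keyword, the gist's own call would become `featurize(q, stop_words='english')`. Whether the built-in list or a hand-picked one gives more useful topics is something to try out on the real data.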

clausd (Author) commented Dec 20, 2016

The code was lifted from https://ariddell.org/lda.html (and I'm not entirely sure everything is as it should be)
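
One thing that may be worth double-checking when pairing a vocabulary with `topic_word_`: `CountVectorizer.vocabulary_` is a dict that maps each term to its column index, so the terms have to be ordered by that index before they line up with the columns of the count matrix. Iterating over the dict directly gives its insertion order, which is not guaranteed to match. Here is a small sketch on a made-up two-document corpus (the documents and variable names are assumptions for illustration):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (made up): 'zebra' is seen first but is not necessarily column 0
docs = ["zebra apple", "apple mango zebra zebra"]

cv = CountVectorizer()
X = cv.fit_transform(docs)

# Terms in dict iteration order (may not match the column order)
by_insertion = list(cv.vocabulary_)

# Terms ordered by their column index, so by_column[j] names column j of X
by_column = sorted(cv.vocabulary_, key=cv.vocabulary_.get)

# Total count per column, paired with the index-ordered terms
counts = np.asarray(X.sum(axis=0)).ravel()
totals = {term: int(n) for term, n in zip(by_column, counts)}

print("dict order:  ", by_insertion)
print("column order:", by_column)
print("totals:      ", totals)
```

If the two orders differ on the real data, the top words printed for each topic would be the wrong labels for the right columns, which could be one reason for topics that look like noise.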
