Chyi-Kwei Yau chyikwei

presto> explain (type distributed, format json) select * from example.example.numbers;
Query Plan
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[ {
  "id" : "5",
  "name" : "Output",
  "identifier" : "[text, value]",
  "details" : "",
  "children" : [ {
    "id" : "63",
import nltk
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
def print_top_words(model, feature_names, n_top_words):
    for topic_idx, topic in enumerate(model.components_):
        message = "Topic #%d: " % topic_idx
        message += " ".join([feature_names[i] + " (" + str(round(topic[i], 2)) + ")"
                             for i in topic.argsort()[:-n_top_words - 1:-1]])
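The slice `topic.argsort()[:-n_top_words - 1:-1]` in the preview above picks the indices of the largest weights in descending order. A minimal sketch of that indexing on made-up data follows; the helper name `top_word_indices`, the toy `topic` vector and the `feature_names` list are assumptions for illustration and do not come from the gist.

```python
import numpy as np


def top_word_indices(topic, n_top_words):
    """Return indices of the n_top_words largest weights, largest first.

    Hypothetical helper: it only isolates the slice used in print_top_words.
    """
    # argsort() sorts ascending; the reversed slice walks back from the end
    # and stops after n_top_words entries.
    return topic.argsort()[:-n_top_words - 1:-1]


# Toy data (assumed), standing in for one row of model.components_.
topic = np.array([0.1, 2.5, 0.7, 1.3])
feature_names = ["apple", "banana", "cherry", "date"]

idx = top_word_indices(topic, 2)
message = " ".join(
    f"{feature_names[i]} ({round(float(topic[i]), 2)})" for i in idx
)
print(message)
```

With a fitted model, the same slice would be applied to each row of `model.components_`, as the loop in the preview does.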
sudo apt-get update
sudo apt install htop
wget http://us.download.nvidia.com/tesla/375.51/nvidia-driver-local-repo-ubuntu1604_375.51-1_amd64.deb
sudo dpkg -i nvidia-driver-local-repo-ubuntu1604_375.51-1_amd64.deb
sudo apt-get -y install cuda-drivers
sudo reboot
sudo apt-get install python-pip python-dev build-essential
sudo pip install virtualenv
sudo pip install --upgrade pip setuptools
chyikwei / online_lda.py
Last active August 29, 2015 14:21
pep8 for online LDA code
chyikwei@:~/github/scikit-learn (onlineldavb)$ pep8 sklearn/decomposition/online_lda.py
sklearn/decomposition/online_lda.py:54:80: E501 line too long (81 > 79 characters)
sklearn/decomposition/online_lda.py:73:80: E501 line too long (84 > 79 characters)
sklearn/decomposition/online_lda.py:149:80: E501 line too long (80 > 79 characters)
sklearn/decomposition/online_lda.py:157:80: E501 line too long (84 > 79 characters)
sklearn/decomposition/online_lda.py:168:80: E501 line too long (89 > 79 characters)
sklearn/decomposition/online_lda.py:172:80: E501 line too long (93 > 79 characters)
sklearn/decomposition/online_lda.py:174:80: E501 line too long (90 > 79 characters)
sklearn/decomposition/online_lda.py:175:80: E501 line too long (94 > 79 characters)
sklearn/decomposition/online_lda.py:176:80: E501 line too long (95 > 79 characters)
chyikwei / gist:34b97d4d443a0cc38a2f
Last active August 29, 2015 14:13
lda performance compare
from time import time
import logging
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from gensim.matutils import Sparse2Corpus
#from gensim.models.ldamodel import LdaModel
from gensim.models.ldamulticore import LdaMulticore
chyikwei / 01_original
Last active August 29, 2015 14:07
LDA cython profiling
File: lda.py
Function: _dirichlet_expectation at line 24
Total time: 8.96912 s
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    24                                           @profile
    25                                           def _dirichlet_expectation(alpha):
    26                                               """
    27                                               For a vector theta ~ Dir(alpha), computes E[log(theta)] given alpha.
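The docstring above refers to the standard identity E[log(theta_k)] = psi(alpha_k) - psi(sum_j alpha_j), where psi is the digamma function. The body of the profiled function is not shown in the preview, so the following is only a sketch of that identity using `scipy.special.psi`, not the gist's implementation. Handling of 2-D input (one Dirichlet per row) is an assumption.

```python
import numpy as np
from scipy.special import psi  # digamma function


def dirichlet_expectation(alpha):
    """Return E[log(theta)] for theta ~ Dir(alpha).

    Sketch only: accepts a 1-D parameter vector, or (assumed) a 2-D array
    holding one Dirichlet parameter vector per row.
    """
    alpha = np.asarray(alpha, dtype=float)
    if alpha.ndim == 1:
        return psi(alpha) - psi(alpha.sum())
    # Subtract each row's psi(sum of that row) from every entry in the row.
    return psi(alpha) - psi(alpha.sum(axis=1))[:, np.newaxis]


expected_log_theta = dirichlet_expectation([1.0, 1.0])
print(expected_log_theta)
```

For alpha = (1, 1) each component equals psi(1) - psi(2) = -1, which gives a quick sanity check.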
chyikwei / _em_step function
Created October 14, 2014 00:07
online LDA profiling
File: lda.py
Function: _em_step at line 233
Total time: 144.169 s
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   233                                           @profile
   234                                           def _em_step(self, X, batch_update):
   235                                               """
   236                                               EM update for 1 iteration
chyikwei / kdd_model_2.py
Created July 15, 2014 15:21
kdd model 2
import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn import cross_validation
# models
from sklearn import linear_model
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
chyikwei / kdd_model.py
Last active August 29, 2015 14:03
kdd script
import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn import cross_validation
# models
from sklearn import linear_model
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier