@abhirk
abhirk / classifier_train.py
Last active December 11, 2015 16:49
Classifier object corrupted when compressed with joblib.
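from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone package

# load data vectors (vectorized with Tfidf) and target array
def train(documents, target):
    # smooth_idf only has an effect when use_idf=True; it is kept here as in the original
    vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2),
                                 smooth_idf=True, sublinear_tf=True, max_df=0.5,
                                 token_pattern=r'\b(?!\d)\w\w+\b',  # ur'' prefix is Python 2 only
                                 use_idf=False)
    data_vectors = vectorizer.fit_transform(documents)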
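The gist preview stops before the classifier is trained and saved. One way to test whether a compressed dump really corrupts a model is a round-trip check: dump with joblib.dump(..., compress=...), reload, and compare predictions. This is a minimal sketch, assuming a current scikit-learn and the standalone joblib package. The toy corpus, the file name, and the helper compressed_roundtrip_ok are hypothetical and are not the author's code.

```python
# Minimal sketch (assumes current scikit-learn + standalone joblib):
# fit a tiny SGDClassifier, dump it with joblib compression, reload it,
# and confirm the reloaded model predicts identically.
import os
import tempfile

import joblib
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier


def compressed_roundtrip_ok(model, X, compress=3):
    """Hypothetical helper: dump `model` compressed, reload, compare predictions."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "model.joblib")  # hypothetical file name
        joblib.dump(model, path, compress=compress)
        reloaded = joblib.load(path)  # load before the temp dir is deleted
    return np.array_equal(model.predict(X), reloaded.predict(X))


# Toy corpus for illustration only, not the author's data.
docs = ["spam spam offer now", "meeting agenda attached",
        "cheap offer click now", "project meeting notes"]
y = [1, 0, 1, 0]

vectorizer = TfidfVectorizer(use_idf=False)
X = vectorizer.fit_transform(docs)
clf = SGDClassifier(random_state=0).fit(X, y)
print(compressed_roundtrip_ok(clf, X))
```

If this check passes on your own fitted model but the original pipeline still misbehaves after loading, the problem is more likely elsewhere (for example, a vectorizer that was not saved alongside the classifier) than in the compression itself.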
abhirk / V
Created October 22, 2012 20:00
profile vectorizer.transform with use_idf=False
In [14]: cProfile.run("vectorizer.transform(input_txt)")
         103465 function calls (103458 primitive calls) in 0.128 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.128    0.128 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 base.py:178(asformat)
        3    0.000    0.000    0.000    0.000 base.py:51(__init__)
        4    0.000    0.000    0.000    0.000 base.py:553(isspmatrix)
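To reproduce a comparison like this on your own documents, here is a minimal sketch that times a single transform() call with use_idf toggled. It assumes a current scikit-learn. The helper time_transform and the repeated toy corpus are hypothetical placeholders, and absolute timings will differ from the 2012 numbers shown here.

```python
# Minimal sketch (assumes current scikit-learn): best-of-N timing of
# TfidfVectorizer.transform for different constructor settings.
import timeit

from sklearn.feature_extraction.text import TfidfVectorizer


def time_transform(documents, repeat=5, **vectorizer_kwargs):
    """Hypothetical helper: fit on `documents`, then return the fastest
    wall-clock time (seconds) of one transform() call over `repeat` runs."""
    vectorizer = TfidfVectorizer(**vectorizer_kwargs).fit(documents)
    timer = timeit.Timer(lambda: vectorizer.transform(documents))
    return min(timer.repeat(repeat=repeat, number=1))


# Placeholder corpus; substitute your own input_txt.
docs = ["the quick brown fox", "jumps over the lazy dog"] * 500
for use_idf in (False, True):
    print("use_idf=%s: %.4f s" % (use_idf, time_transform(docs, use_idf=use_idf)))
```

Taking the minimum of several runs reduces noise from other processes, which matters when the difference being investigated is a single call.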
abhirk / tfidf
Created October 2, 2012 00:33
Time taken by Tfidf vectorizer.transform
In [34]: cProfile.run("vectorizer.transform(input_txt)")
         8676327 function calls (8676325 primitive calls) in 10.875 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   10.875   10.875 <string>:1(<module>)
        2    0.000    0.000    0.717    0.359 base.py:178(asformat)
        1    0.000    0.000    0.719    0.719 base.py:229(__mul__)
        7    0.000    0.000    0.000    0.000 base.py:51(__init__)
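Both profiles are ordered by standard name, which buries the expensive calls among the cheap ones. Passing sort='cumulative' to cProfile.run lists them first. The sketch below wraps the same idea in a reusable helper built only on the standard library's cProfile and pstats modules. The name profile_top is hypothetical.

```python
# Minimal sketch using only the standard library: profile one call and
# return a report sorted by cumulative time instead of by standard name.
import cProfile
import io
import pstats


def profile_top(func, *args, limit=10, sort_key="cumulative", **kwargs):
    """Hypothetical helper: run func(*args, **kwargs) under cProfile and
    return (result, report), where report lists the `limit` top entries."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(sort_key).print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    # Stand-in workload; in the session above this would be
    # profile_top(vectorizer.transform, input_txt).
    _, report = profile_top(sorted, list(range(100000, 0, -1)))
    print(report)
```

With the objects from the IPython session, the call would be result, report = profile_top(vectorizer.transform, input_txt) followed by print(report).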