from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23


def train(documents, target):
    # Vectorize the raw documents with TfidfVectorizer; target is the label array.
    # Note: use_idf=False disables IDF weighting, so smooth_idf has no effect here.
    vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2),
                                 smooth_idf=True, sublinear_tf=True, max_df=0.5,
                                 token_pattern=r'\b(?!\d)\w\w+\b', use_idf=False)
    data_vectors = vectorizer.fit_transform(documents)
In [14]: cProfile.run("vectorizer.transform(input_txt)")
         103465 function calls (103458 primitive calls) in 0.128 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.128    0.128 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 base.py:178(asformat)
        3    0.000    0.000    0.000    0.000 base.py:51(__init__)
        4    0.000    0.000    0.000    0.000 base.py:553(isspmatrix)
In [34]: cProfile.run("vectorizer.transform(input_txt)")
         8676327 function calls (8676325 primitive calls) in 10.875 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   10.875   10.875 <string>:1(<module>)
        2    0.000    0.000    0.717    0.359 base.py:178(asformat)
        1    0.000    0.000    0.719    0.719 base.py:229(__mul__)
        7    0.000    0.000    0.000    0.000 base.py:51(__init__)
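
Both profiles above are ordered by "standard name", which lists functions alphabetically and makes it hard to see where the extra ten seconds go. A minimal sketch of one way to get the most expensive calls first, using only the standard library `cProfile` and `pstats` modules. The helper name `profile_call` and the stand-in workload `tokenize_all` are hypothetical, introduced here for illustration only; with the real objects one would presumably call `profile_call(vectorizer.transform, input_txt)` instead.

```python
import cProfile
import io
import pstats
import re


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report).

    The report lists the `top` entries sorted by cumulative time,
    so the slowest call paths appear first.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Stand-in workload (hypothetical): tokenizes with the same pattern that the
# training snippet passes as token_pattern. It is NOT the vectorizer itself.
TOKEN_RE = re.compile(r"\b(?!\d)\w\w+\b")


def tokenize_all(docs):
    return [TOKEN_RE.findall(doc.lower()) for doc in docs]


tokens, report = profile_call(
    tokenize_all, ["Profiling 2 short documents", "a second one"]
)
print(report)
```

Comparing the top few rows of such a report for the fast and the slow run should show which call accounts for the difference, rather than requiring a scan of the full alphabetical listing.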