@pprett
Created November 26, 2012 20:48
Benchmark sklearn's SGDClassifier on RCV1-ccat dataset.
"""
Benchmark sklearn's SGDClassifier on RCV1-ccat dataset.
To generate the input files, see http://leon.bottou.org/projects/sgd.
Results
-------
ACC: 0.9479
AUC: 0.9476
3 loops, best of 1: 1.21 s per loop
"""
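import timeit

import svmlight_loader
from sklearn.linear_model import SGDClassifier
from sklearn.utils import shuffle
from sklearn import metrics

# Load the svmlight-format files; the test set reuses the training set's
# feature count so both matrices have the same number of columns.
X, y = svmlight_loader.load_svmlight_file('../../corpora/rcv1-ccat/train.dat', buffer_mb=500)
X_test, y_test = svmlight_loader.load_svmlight_file('../../corpora/rcv1-ccat/test.dat', n_features=X.shape[1], buffer_mb=500)

# Shuffle the training set once, then free the unshuffled copies.
X_train, y_train = shuffle(X, y, random_state=0)
del X, y

# Linear model trained by SGD: 5 passes over the data, regularization strength alpha=1e-5.
clf = SGDClassifier(n_iter=5, alpha=0.00001)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# zero_one_score returns the fraction of correct predictions (accuracy).
print("ACC: %.4f" % metrics.zero_one_score(y_test, y_pred))
# Note: this AUC is computed from hard label predictions, not from decision_function scores.
print("AUC: %.4f" % metrics.auc_score(y_test, y_pred))

# Time training and scoring (best of 3 single runs) with the timeit module,
# since printing "%timeit ..." only works as an IPython magic, not in a script.
fit_time = min(timeit.repeat(lambda: clf.fit(X_train, y_train), number=1, repeat=3))
score_time = min(timeit.repeat(lambda: clf.score(X_test, y_test), number=1, repeat=3))
print("fit: %.2f s per loop" % fit_time)
print("score: %.2f s per loop" % score_time)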
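With its default hinge loss and L2 penalty, SGDClassifier fits a linear SVM by stochastic gradient descent: `alpha` scales the regularization term and `n_iter` is the number of passes over the shuffled training set. The pure-Python sketch below illustrates that idea on sparse examples parsed from svmlight lines (`label idx:val idx:val ...`, the format the script loads). It is not scikit-learn's implementation: the helper names `parse_svmlight_line`, `sgd_hinge` and `predict` are hypothetical, the model has no intercept for brevity, and the step size eta = 1 / (alpha * (t + t0)) with a fixed `t0=2` is an assumption that only mirrors the form of scikit-learn's documented "optimal" schedule (which picks t0 by a heuristic).

```python
import random


def parse_svmlight_line(line):
    """Parse 'label idx:val idx:val ...' into (label, {idx: val})."""
    line = line.split("#", 1)[0]  # drop a trailing comment, if any
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        features[int(idx)] = float(val)
    return label, features


def sgd_hinge(data, alpha=1e-5, n_iter=5, t0=2.0, seed=0):
    """Linear SVM (hinge loss + L2 penalty, no intercept) trained by plain SGD.

    data: list of (label, {feature_index: value}) pairs, labels in {-1, +1}.
    t0 > 1 keeps the shrink factor (1 - eta * alpha) strictly positive.
    Returns the weight vector as a sparse dict.
    """
    rng = random.Random(seed)
    data = list(data)
    w = {}        # stored weights; the true weights are wscale * w
    wscale = 1.0  # lets the L2 shrinkage touch one scalar instead of every weight
    t = 0
    for _ in range(n_iter):
        rng.shuffle(data)  # each epoch visits the examples in a new random order
        for y, x in data:
            eta = 1.0 / (alpha * (t + t0))  # decreasing step size
            margin = y * wscale * sum(w.get(j, 0.0) * v for j, v in x.items())
            # L2 penalty step: scale all weights by (1 - eta * alpha).
            wscale *= 1.0 - eta * alpha
            # Hinge loss max(0, 1 - margin) has a nonzero subgradient only if margin < 1.
            if margin < 1.0:
                for j, v in x.items():
                    w[j] = w.get(j, 0.0) + eta * y * v / wscale
            if wscale < 1e-9:
                # Fold the scale back into w before it underflows.
                for j in w:
                    w[j] *= wscale
                wscale = 1.0
            t += 1
    return {j: wscale * v for j, v in w.items()}


def predict(w, x):
    """Return +1.0 or -1.0 from the sign of the linear score w . x."""
    score = sum(w.get(j, 0.0) * v for j, v in x.items())
    return 1.0 if score >= 0.0 else -1.0
```

Each update costs time proportional to the number of nonzeros in one example, which is what makes SGD practical on a sparse corpus such as RCV1. scikit-learn's actual solver is written in Cython and differs in details such as intercept handling and the choice of t0.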
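The script passes hard labels (`y_pred`) to `metrics.auc_score`. With only two distinct score values the ROC curve has a single interior point, so the resulting AUC reduces to the mean of the true-positive and true-negative rates (balanced accuracy); a threshold-free AUC would use real-valued scores such as `clf.decision_function(X_test)`. The pure-Python sketch below checks this on made-up toy numbers (not RCV1 data); `roc_auc` and `balanced_accuracy` are hypothetical helpers, not scikit-learn functions.

```python
def roc_auc(y_true, scores):
    """ROC AUC: chance that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y != 1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    pos = [p for y, p in zip(y_true, y_pred) if y == 1]
    neg = [p for y, p in zip(y_true, y_pred) if y != 1]
    tpr = sum(1 for p in pos if p == 1) / float(len(pos))
    tnr = sum(1 for p in neg if p != 1) / float(len(neg))
    return (tpr + tnr) / 2.0


# Toy example (made-up numbers): labels, hard predictions, and real-valued scores.
y_true = [1, 1, 1, -1, -1]
y_pred = [1, 1, -1, -1, 1]
scores = [2.0, 0.5, -0.2, -1.0, 0.1]

print(roc_auc(y_true, y_pred))            # AUC from hard labels
print(balanced_accuracy(y_true, y_pred))  # equals the line above, up to float rounding
print(roc_auc(y_true, scores))            # 5 of the 6 positive/negative pairs ranked correctly
```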