
Compare sklearn KNN rbf poly2 on MNIST digits

Purpose: compare 4 scikit-learn classifiers -- KNN, rbf-SVM, poly2-SVM and Random-Forest -- on a venerable test case, the MNIST database of 70000 handwritten digits, 28 x 28 pixels each.

Keywords: classification, benchmark, MNIST, KNN, SVM, scikit-learn, python

[figure knn-mismatch-10: misclassified test digits, each shown with its 3 nearest training digits]

Accuracy % and run times, 3 random 50000 / 10000 train / test splits

KNN            av 97.6 %   [97.5 97.5 97.7]   train, test: [0 0 0]        [11 11 11]     sec
rbf-SVM        av 98.4 %   [98.4 98.5 98.4]   train, test: [502 503 506]  [113 113 113]  sec
poly2-SVM      av 98.4 %   [98.4 98.5 98.4]   train, test: [275 274 276]  [85 84 85]     sec
Random-Forest  av 97.0 %   [96.8 97 97.1]     train, test: [283 284 282]  [2 2 2]        sec

Notes

compare_classifiers_mnist.py and classifiers.py (below) are modified from the nice scikit-learn example
http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html .
The logfile gamma3-C1-50000-10000.log is below.

As you see, KNeighborsClassifier( algorithm="brute" ) is really fast. On Macs it uses the Accelerate framework to do dot( 50k x 784, (10k x 784)^T ) on 4 cores in parallel in ~ 10 seconds. (Bigger sizes can be done in blocks.) It's also by far the simplest method: easy to understand, easy to implement from scratch if need be.

When in doubt, use brute force.
-- Ken Thompson
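
Brute force is also easy to write yourself -- a from-scratch sketch (plain majority vote, whereas the run below uses weights="distance"; all names here are illustrative). The one big matrix multiply is the cross term of |a - b|^2 = |a|^2 - 2 a.b + |b|^2 :

import numpy as np
from scipy import stats

def knn_predict( Xtrain, ytrain, Xtest, k=3 ):
    # squared Euclidean distances, test rows x train rows;
    # Xtest.dot( Xtrain.T ) is the big dot, here 10k x 784 times 784 x 50k.
    # d2 is ~ 4 GB at these sizes -- do blocks of test rows if that's too much.
    d2 = ( (Xtest ** 2).sum(axis=1)[:, None]
        - 2 * Xtest.dot( Xtrain.T )
        + (Xtrain ** 2).sum(axis=1)[None, :] )
    nearest = np.argsort( d2, axis=1 )[:, :k]  # k nearest train indices per test row
    votes = ytrain[nearest]                    # their labels, ntest x k
    return stats.mode( votes, axis=1 )[0].ravel()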

KNN needs no tuning beyond the number of nearest neighbors; 3 is good for MNIST. Looking at the 3 nearest digits in the picture above -- 0 0 9, 4 9 9 ... -- can give some insight into why mismatches happen.
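
To look at those nearest neighbors programmatically rather than in a picture, something like this works (a sketch: knn is assumed to be the fitted KNeighborsClassifier, and Xtrain ytrain Xtest ytest ypred the split arrays from compare_classifiers_mnist.py below):

import numpy as np

wrong = np.where( ypred != ytest )[0]                       # misclassified test digits
dist, near = knn.kneighbors( Xtest[wrong], n_neighbors=3 )  # -> distances, train indices
print "true labels:        ", ytest[wrong]
print "3 nearest in train: ", ytrain[near]                  # e.g. 0 0 9, 4 9 9 ...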

SVMs score a bit better than KNN, but poly2 is 20 times slower and RBF 50 times slower.
I have my doubts about their parsimony:

train() --> how many coefficients --> predict() ?

Linear SVMs generate nclass * dim coefficients, here 10 * 784. For this run, RBF has 14374 x 784 support_vectors_ -- huge, not plottable, not understandable. Experts please comment.
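
A quick way to ask "how many coefficients" after a fit is to look at the shapes of the fitted arrays (a sketch; Xtrain ytrain as in the script below, and the shapes in the comments are from this run's log):

from sklearn.svm import SVC, LinearSVC

rbf = SVC( kernel="rbf", C=1, gamma=3 ).fit( Xtrain, ytrain )
print rbf.support_vectors_.shape   # (14374, 784) -- the "model" kept for predict()
print rbf.dual_coef_.shape         # (nclass - 1, nsupport) = (9, 14374)

lin = LinearSVC( C=1 ).fit( Xtrain, ytrain )
print lin.coef_.shape              # (nclass, dim) = (10, 784)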

Neural network classifiers

http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#4d4e495354 lists 50 papers on (mostly) neural net classifiers of the MNIST digits, with error rates from 0.21 % up, mostly < 1 %. I don't know which of these are reproducible (with code, log files, run times on the web), nor how long their code is (over and above numpy and scikit-learn).

Preprocessing raw pixels to high-level features -- "shapes" or "strokes" -- is of crucial importance for any classifier. I don't know what preprocessing these 50 do, either; KNN etc. here do none.

RBFs

A way to tune the parameter gamma in exp( - gamma |Xi - Xj|^2 ):
first scale distances so that the median |Xi - Xj| ~ 1, i.e. half the neighbors are < 1 away and half > 1. exp( - gamma dist^2 ) then down-weights the more distant neighbors -- weights near 0 are not "seen" at all. With gamma = 3 we down-weight by

dist:                  [   0   .5    1  1.5    2]
                       ---------------------------
exp( - 3 * dist^2 ):   [ 100   47    5    0    0]  %   -- Gaussian

so half the neighbors are down-weighted to 5 % or less. In general,

  • look at quantiles of distance^2 of your data, scipy.spatial.distance.pdist ** 2 | np.percentile
  • print a little table like the above, with various gammas
  • choose gamma to down-weight __ % of the data to __ %, e.g. 50 % down to 5 %; a sketch follows.
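
A sketch of that recipe (the gammas, quantiles and sample size are arbitrary):

import numpy as np
from scipy.spatial.distance import pdist

def gamma_table( X, gammas=[1, 2, 3, 5], q=[10, 25, 50, 75, 90], nsample=2000 ):
    """ print quantiles of |Xi - Xj|^2 and the down-weights exp( - gamma dist^2 ) """
    J = np.random.choice( len(X), min( nsample, len(X) ), replace=False )
    d2 = pdist( X[J] ) ** 2                    # pairwise distances^2 of a sample
    qd2 = np.percentile( d2, q )
    print "quantiles %s of dist^2: %s" % (q, np.round( qd2, 2 ))
    for gamma in gammas:
        print "gamma %g down-weights to %s %%" % (
            gamma, np.round( 100 * np.exp( - gamma * qd2 )).astype(int) )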

(KNeighborsClassifier( weights="distance" ) uses 1 / dist, which decays much more slowly than RBF. weights can be a callable; I have not played with that.)
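
For example, RBF down-weighting as a weights callable would look like this (untried here; gamma = 3 is just illustrative):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def rbf_weights( dist ):
    # dist: distances to the k nearest neighbors -> their weights
    return np.exp( -3 * dist ** 2 )

knn = KNeighborsClassifier( n_neighbors=3, algorithm="brute", weights=rbf_weights )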

Quick is beautiful

To speed up the try-it-and-see loop, start your programs like this:

land = 1
sea = 2
...
# to change these params in sh or ipython, run this.py  a=1  b=None  c=\"str\" ...
for arg in sys.argv[1:]:
    exec( arg )
...
# print all params

This works well from shell scripts. For example,

for gamma in 1 2 3
do
	python my.py  gamma=$gamma  "$@"  | tee gamma$gamma.log
done
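
For comparison, scikit-learn's GridSearchCV does the same loop in-process (sklearn.grid_search in 0.17, sklearn.model_selection in newer versions); the parameter values here are illustrative:

from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVC

grid = GridSearchCV( SVC( kernel="rbf", cache_size=2000 ),
        param_grid=dict( gamma=[1, 2, 3], C=[1, 10] ),
        cv=3, n_jobs=1 )
grid.fit( X, y )
print grid.best_params_, grid.best_score_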

There are many other, fancier ways of doing grid search. In general:

Do the simplest thing that works, then stop.

Links

MNIST database: https://en.wikipedia.org/wiki/MNIST_database

KNeighborsClassifier: http://scikit-learn.org/stable/modules/neighbors.html
SVC (libsvm): http://scikit-learn.org/stable/modules/svm.html#svm-classification
Random-Forest: http://scikit-learn.org/stable/modules/ensemble.html

https://github.com/scikit-learn/scikit-learn/tree/master/benchmarks/bench_mnist.py

Josh Montague: https://github.com/jrmontag/mnist-sklearn -- 5 * shifts, kdtree

Comments are welcome, real test cases most welcome.

cheers
-- denis

Last change: 2016-08-10

classifiers.py:

#!/usr/bin/env python
# from http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

from __future__ import division
import re
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC  # libsvm, liblinear
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis


#...............................................................................
def classifiers( only="knn|poly|rbf|random-forest",
        knnear=3, C=1, gamma=3, seed=0, **kwignored ):
    """ -> [ [name, classifier] ... ] with names starting knn|poly|... e.g.
            [ ["KNN", KNeighborsClassifier(...)],
              ["rbf-SVM", SVC(...)],
              ... ]
        only= "" or "*": all
    """
    all_classifiers = [
        ["KNN",
                # $sklearn/neighbors/classification.py $sklearn/neighbors/base.py
                # http://scikit-learn.org/stable/modules/neighbors.html
            KNeighborsClassifier(
                algorithm="brute",
                weights="distance",  # 1/dist
                    # default uniform, nnear 3: a b b => b
                n_neighbors=knnear, n_jobs=1
                )],

            # $sklearn/svm/classes.py
            # http://scikit-learn.org/stable/modules/svm.html#svm-classification
            # The implementation is based on libsvm. The fit time complexity
            # is more than quadratic with the number of samples
            # The multiclass support is handled according to a one-vs-one scheme.
        ["rbf-SVM",
            SVC( kernel="rbf",
                C=C,
                gamma=gamma,
                cache_size=2000,  # M
                decision_function_shape="ovr",  # one vs rest == default ovo ?
                random_state=seed,
                )],
        ["poly2-SVM",  # on mnist ~ as good as rbf, twice as fast
            SVC( kernel="poly", degree=2, coef0=0,
                C=C,
                gamma=gamma,
                cache_size=2000,  # M
                decision_function_shape="ovr",  # one vs rest
                random_state=seed,
                )],
        ["Random-Forest",
                # $sklearn/ensemble/forest.py
                # http://scikit-learn.org/stable/modules/ensemble.html
            RandomForestClassifier(
                n_estimators=500, max_features="auto",
                random_state=seed, n_jobs=1
                )],
        ["libsvm-linear", SVC( kernel="linear",  # ~ 94
            C=C,
            cache_size=2000,  # M
            decision_function_shape="ovr",  # ? coef_ 45 x 784
            random_state=seed,
            )],
        ["liblinear", LinearSVC(  # ~ 91
            C=C,
            multi_class='ovr',  # 'crammer_singer': 45 pairs
            dual=False,
            random_state=seed,
            )],
        ["SGD", SGDClassifier( n_iter=10, random_state=seed  # ~ 91 %, 5 sec
            )],
        # ["Logistic-Regression",  # ~ 91
        #     LogisticRegression( C=1, solver="lbfgs", n_jobs=1
        #     )],
        # ["Linear-Discriminant-Analysis", LinearDiscriminantAnalysis()],
        # ["Naive-Bayes", GaussianNB()],
        # ["AdaBoost", AdaBoostClassifier()],
        # ["Decision-Tree", DecisionTreeClassifier( )],  # max_depth=5
        # ["Quadratic-Discriminant-Analysis", QuadraticDiscriminantAnalysis()],
        ]

    if only in ("", "*"):
        return all_classifiers
    return [c for c in all_classifiers
            if re.match( only, c[0], re.IGNORECASE )]


#...............................................................................
if __name__ == "__main__":
    import inspect

    for (classifname, classif) in classifiers( only="*" ):
        f = inspect.getfile( classif.__class__ )[:-1]
        print "%s: %s \n%s \n" % (
            classifname, f, classif )
compare_classifiers_mnist.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# from http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
# on https://en.wikipedia.org/wiki/MNIST_database
"""
=====================
Classifier comparison
=====================
A comparison of several classifiers in scikit-learn on MNIST digits.
The point of this example is to illustrate the nature of decision boundaries
of different classifiers.
This should be taken with a grain of salt, as the intuition conveyed by
these examples does not necessarily carry over to real datasets.
Particularly in high-dimensional spaces, data can more easily be separated
linearly and the simplicity of classifiers such as naive Bayes and linear SVMs
might lead to better generalization than is achieved by other classifiers.
"""
# mnist, no plots

# Code source: Gaël Varoquaux
#              Andreas Müller
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

from __future__ import division
import sys
import numpy as np
import sklearn
from sklearn.cross_validation import train_test_split
from sklearn.datasets.base import Bunch
from sklearn.preprocessing import normalize, scale

from classifiers import classifiers
import confus
import etcutil as nu
import mnist

__version__ = "2016-07-18 july denis-bz-py t-online.de"

np.set_printoptions( threshold=20, edgeitems=14, linewidth=140,
        formatter = dict( float = lambda x: "%.3g" % x ))
print "\n", 80 * "-"
print "python", " ".join(sys.argv)
print "versions: sklearn %s numpy %s python %s " % (
        sklearn.__version__, np.__version__, sys.version.split()[0] )

#-------------------------------------------------------------------------------
# outline:
# 1 parameters
# 2 load, subset, normalize the data
# 3 classifiers = [["rbf-SVM", SVC()] ... ] that match "only"
# 4 for classif:
#       train_test_split, classif .fit .predict, save

#-------------------------------------------------------------------------------
ntrain = 50000
ntest = 10000
digits = []  # [] all / [4,9]
only = "knn|poly|rbf|random-forest"  # only these classifiers
nsplit = 3  # iter split train test, fit, predict

# params for various classifiers --
knnear = 3
C = 1
gamma = 3  # exp( - gamma * dist^2 ): exp( -3 ) = .05
tag = "tmp"
save = 1  # > tag.npz
seed = 0

# to change these params in sh or ipython, run this.py  a=1  b=None  c=\"str\" ...
for arg in sys.argv[1:]:
    exec( arg )

np.random.seed( seed )

#...........................................................................
bag = mnist.load_mnist( ntrain + ntest, dtype=np.float32 )
X = bag.data
y = bag.target
if digits:
    X, y = nu.xy_2class( X, y, *digits )  # e.g. [4,9] only

print "rows /= |row| (cos distance), -= mean"
X = normalize( X )
X -= X.mean( axis=0 )  # for svm -- properly Xtrain Xtest
# X = nu.div0( X, X.std() )  # ? knn a bit worse, svm grid gamma

params = """
X %s
ntrain %d
ntest %d
digits %s
y counts %s
knnear %d
C %.3g
gamma %.3g
nsplit %d
seed %d
run_date %s
""" % ( nu.asum(X), ntrain, ntest, digits, np.bincount(y),
        knnear, C, gamma, nsplit, seed,
        nu.isoday() )
print "\nparams --", params


#...............................................................................
def classify_splits( classifname, classif, X, y, nsplit=1 ):
    """ train_test_split, classif .fit .predict, save
        for nsplit splits
        -> [ Bag( ... ) for each split ]
    """
    print "\n{ classify with %s --" % classifname
    classifstr = str(classif)
    print classifstr
    np.random.seed( seed )  # just in case
    classifstep = classif
    # classifstep = classif.steps[1][1]  # grr in pipe
    bags = []
    testfraction = ntest / (ntrain + ntest)
    scores = []
    trainsecs = []
    testsecs = []

    for jsplit in range(nsplit):  # ~ kfold
        print "\n%s split %d --" % (classifname, jsplit)
        Xtrain, Xtest, ytrain, ytest = \
            train_test_split( X, y, test_size=testfraction, random_state=jsplit )
            # $etc/traintestsplit.py + jtrain jtest -> Xall yall
        print "ytrain class sizes:", np.bincount( ytrain )
        print "ytest class sizes: ", np.bincount( ytest )
        nu.ptime()

        #.......................................................................
        classif.fit( Xtrain, ytrain )
        trainsec = nu.ptime( "train " + classifname )
        ypred = classif.predict( Xtest )
        score = (ypred == ytest) .mean() * 100
            # Proportion classified correctly is an improper scoring rule, "Brier score" ?
        testsec = nu.ptime( "test %s: ypred == ytest %.1f %%" % (classifname, score) )

        scores += [score]
        trainsecs += [trainsec]
        testsecs += [testsec]
        confusmat = confus.pconfus( ytest, ypred, label=classifname )  # print

        b = Bunch(
            classif = classifname,
            classifstr = classifstr,
            confusmat = confusmat,
            ntrain = ntrain,
            ntest = ntest,
            params = params,
            random_state = jsplit,
            score = score,
            sec = [int(trainsec), int(testsec)],
            ypred = ypred,
            ytest = ytest,
            )
        if save:  # to plot etc.
            out = "%s-%s-split%d.npz" % (  # grr
                tag, classifname, jsplit )
            print "saving to %s \n" % out
            np.savez( out, **b )
        bags.append( b )

    scores = np.array( scores )
    trainsecs = nu.ints( trainsecs )
    testsecs = nu.ints( testsecs )
    print "scores: %-12s av %.1f %s train, test: %s %s sec " % (
        classifname, scores.mean(), scores, trainsecs, testsecs )

    # how many coefs are needed to predict() ? rbf-svm ~ 1/3 ntest
    for attr in "coef_ dual_coef_ support_vectors_ feature_importances_ ".split():
        if hasattr( classifstep, attr ):
            print "%s : %s" % (
                attr, nu.asum( getattr( classifstep, attr )))
    print "}\n"
    return bags


#...........................................................................
for (classifname, classif) in classifiers( only,
        knnear=knnear, C=C, gamma=gamma, seed=seed ):
    bags = classify_splits( classifname, classif, X, y, nsplit=nsplit )
confus.py:

#!/usr/bin/env python

from __future__ import division
import numpy as np


#...............................................................................
def pconfus( true, est, verbose=1, nmaxconfus=5, label="" ):
    """ print confusion matrix """
    # plot, interactive ? Hastie p. 118 vowel picture
    true = np.squeeze(true)
    est = np.squeeze(est)
    n = len(true)
    assert n == len(est), "len(true) != len(est): shapes %s %s" % (
        true.shape, est.shape )
    truemin = np.nanmin(true)
    estmin = np.nanmin(est)
    if truemin != 0:
        true = true - truemin
    if estmin != 0:
        est = est - estmin
    truemax = np.nanmax(true)
    estmax = np.nanmax(est)
    if truemax != estmax:
        print "warning: pconfus true max %.3g != est max %.3g" % (
            truemax, estmax )
    nclass = int( max( truemax, estmax )) + 1
    confus = np.zeros( (nclass,nclass), int )
    for t, e in zip( true, est ):
        if not np.isnan(t) and not np.isnan(e):
            confus[ int(t), int(e) ] += 1
    if not verbose:
        return confus

    diag = confus.diagonal()
    dsum = diag.sum()
    csum = confus.sum()
    correct = dsum / csum * 100
    # permuted > diag ? cf hungarian.py

    #...............................................................................
    print "\nConfusion matrix: %.1f %% correct = %d / %d %s" % (
        correct, dsum, csum, label )
    print "True classes down, estimated across | wrong | total class sizes"
    print " ", 5*nclass*"-"
    for j, row in enumerate(confus):
        rowsum = row.sum()
        if rowsum > 0:
            print "%2d: %s | %4d | %4d " % (
                j, _astr( row, fmt="%4.0d" ), rowsum - diag[j], rowsum )
    print " ", 5*nclass*"-"
    estsize = confus.sum(axis=0) - diag
    print " %s | %4d wrong predictions " % (
        _astr( estsize, fmt="%4d" ), csum - dsum )

    if nmaxconfus > 0:
        print "\nmost confusable, true est: count -- " ,
        A0 = confus.copy()
        np.fill_diagonal( A0, -1 )
        for a, jk in zip( *_maxfewat( A0, nmaxconfus, ge=-1 )):
            if a <= 0:  break
            print "%d %d: %d " % (jk[0], jk[1], a) ,
        print label
    print "\n"
    return confus


#...............................................................................
def _astr( x, fmt="%g" ):
    """ scalar / vec / 2d array -> join( fmt % xj ) """
    if x is None or isinstance( x, basestring ):
        return x
    if np.isscalar(x):
        if np.fabs(x) < 1e-10:
            x = 0
        return fmt % x
    x = np.asanyarray(x)
    if x.ndim == 0:  # asarray(3)
        x = x.item()
        assert np.isscalar(x), x
        return _astr( x, fmt )
    if x.ndim == 1:
        return " ".join([ _astr(xx, fmt ) for xx in x ])
    else:
        assert x.ndim == 2, x.shape
        return "[ %s ]" % "\n".join([ _astr( row, fmt ) for row in x ])


def _maxfewat( A, max=10, ge=None ):
    """ -> Amax[], jkmax[ max, 2 ] """
    A = np.asanyarray(A)
    if ge is None:
        ge = A.mean()  # maybe too few
    nz = np.nonzero( A >= ge )  # tuple ndim ([j ...], [k ...])
    if A.ndim == 1:
        nz = nz[0]
    Abig = A[nz]
    down = Abig.argsort() [::-1] [:max]
    return Abig[down], np.transpose( nz )[down]

# "Brier score" site:stats.stackexchange.com
# Proportion classified correctly is an improper scoring rule, i.e., it is
# optimized by a bogus model. I would use the quadratic proper scoring rule known
# as the Brier score, or the concordance probability
# [multi-class] [classification] top
etcutil.py:

#!/usr/bin/env python
""" etcutil.py: asum div0 findfile ... """

from __future__ import division
from os.path import expanduser, expandvars, isfile, join
import time
import numpy as np
from sklearn.datasets.base import Bunch as Bag


def asum( X ):
    """ array summary: "shape type min av max [density]" """
    if not hasattr( X, "dtype" ):
        return str(X)
    if hasattr( X, "todense" ):  # issparse
        sparsetype = type(X).__name__ + " "  # csr_matrix etc.
        density = " for the %.3g %% non-0" % (
            100. * X.nnz / np.prod( X.shape ))
    else:
        sparsetype = density = ""
    return "%s %s%s min av max %.3g %.3g %.3g %s" % (
        X.shape, sparsetype, X.dtype, X.min(), X.mean(), X.max(), density )


def div0( a, b ):
    """ ignore / 0: div0( [-1, 0, 1], 0 ) -> [0 0 0] """
    with np.errstate(divide='ignore', invalid='ignore'):
        c = np.true_divide( a, b )
        c[ ~ np.isfinite( c )] = 0
    return c


def findfile( filename, dirs=["", "data", "$SCIKIT_LEARN_DATA", "$webdata"] ):
    """ -> first dir/file found in a list of dirs
        or IOError if none
    """
    for dir in dirs:
        dirfile = expanduser( expandvars( join( dir, filename )))
        if isfile( dirfile ):
            return dirfile
    raise IOError( "file \"%s\" not found in folders %s" % (
        filename, dirs ))


def ints( X ):
    return np.round(X).astype(int)  # NaN Inf -> - maxint


def isoday():
    return time.strftime( "%Y-%m-%d %h %H:%M" )  # 2011-11-15 Nov 12:06


def mkdirpart( filename ):
    """ tmp/file: mkdir tmp """
    import os
    dir = os.path.dirname( filename )
    if dir and not os.path.isdir( dir ):
        os.makedirs( dir )


def ptime( msg=None, T=[0]):
    """ ptime()
        ...
        dt = ptime(): delta seconds (wall clock) from previous call
        dt = ptime( "message" ) prints dt, message
    """
    t = time.time()  # seconds since epoch
    dt = t - T[0]
    if msg:
        print "%4.0f sec %s" % (dt, msg)
    T[0] = t
    return dt


def quantiles( x, q=[ 10, 25, 50, 75, 90 ]):
    p = np.percentile( x, q )
    return "quantiles %s: %s" % (np.array(q), p)  # with caller's np.print_options


def xy_2class( X, y, c=4, cc=9 ):
    """ -> subset X, y 4 or 9 only """
    yc = (y == c)
    ycc = (y == cc)
    print "xy_2class %d %d: %d %d " % (c, cc, yc.sum(), ycc.sum())
    J = (yc | ycc)
    y = y[J]
    return X[J], (y == cc).astype(int)


def subset_xy( X, y, classes=[4,9], plusminus1=False ):
    """ -> X[J], y[J] where y == 4 or 9 """
    if len(classes) == 0:
        return X, y
    y = np.asarray( y )
    # lookup table: 4 -> 1, 9 -> 2, rest -> 0
    lut = np.zeros( y.max() + 1, dtype=int )
    lut[classes] = np.arange( len(classes) ) + 1
    J = np.where( lut[y] )[0]
    y01 = lut[ y[J] ] - 1  # 4 -> 0, 9 -> 1
    if plusminus1:
        y01 *= 2;  y01 -= 1  # 2class +1 -1
    return X[J], y01
gamma3-C1-50000-10000.log:

# from: python compare_classifiers_mnist.py gamma=3 C=1 digits=[] ntrain=50000 ntest=10000 tag="60k/gamma3-C1-50000-10000"
# run: 20 Jul 2016 11:29 in ~bz/py/ml/sklearn/mnist Denis-iMac 10.8.3
--------------------------------------------------------------------------------
python compare_classifiers_mnist.py gamma=3 C=1 digits=[] ntrain=50000 ntest=10000 tag="60k/gamma3-C1-50000-10000"
versions: sklearn 0.17.1 numpy 1.11.1 python 2.7.11
rows /= |row| (cos distance), -= mean
params --
X (60000, 784) float32 min av max -0.0615 -5.92e-09 0.235
ntrain 50000
ntest 10000
digits []
y counts [5935 6729 5961 6131 5841 5423 5931 6214 5865 5970]
knnear 3
C 1
gamma 3
nsplit 3
seed 0
run_date 2016-07-20 Jul 11:29
{ classify with KNN --
KNeighborsClassifier(algorithm='brute', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=3, p=2,
weights='distance')
KNN split 0 --
ytrain class sizes: [4897 5622 4985 5118 4849 4534 4957 5143 4895 5000]
ytest class sizes: [1038 1107 976 1013 992 889 974 1071 970 970]
0 sec train KNN
12 sec test KNN: ypred == ytest 97.5 %
Confusion matrix: 97.5 % correct = 9748 / 10000 KNN
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 1032 1 1 3 1 | 6 | 1038
1: 1101 4 2 | 6 | 1107
2: 10 2 947 2 1 6 6 2 | 29 | 976
3: 3 4 971 1 7 6 13 8 | 42 | 1013
4: 1 4 1 955 7 1 1 22 | 37 | 992
5: 4 1 4 1 860 10 1 6 2 | 29 | 889
6: 6 1 1 965 1 | 9 | 974
7: 5 6 5 1040 1 14 | 31 | 1071
8: 2 10 8 2 4 3 1 935 5 | 35 | 970
9: 3 1 7 7 1 6 3 942 | 28 | 970
--------------------------------------------------
26 26 16 22 16 14 24 23 31 54 | 252 wrong predictions
most confusable, true est: count -- 4 9: 22 7 9: 14 3 8: 13 2 0: 10 5 6: 10 KNN
saving to 60k/gamma3-C1-50000-10000-KNN-split0.npz
KNN split 1 --
ytrain class sizes: [4960 5627 4961 5108 4900 4523 4976 5126 4845 4974]
ytest class sizes: [ 975 1102 1000 1023 941 900 955 1088 1020 996]
0 sec train KNN
11 sec test KNN: ypred == ytest 97.5 %
Confusion matrix: 97.5 % correct = 9750 / 10000 KNN
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 971 2 2 | 4 | 975
1: 1097 2 2 1 | 5 | 1102
2: 3 1 978 2 3 7 4 2 | 22 | 1000
3: 1 1 4 986 7 1 6 13 4 | 37 | 1023
4: 1 8 905 4 3 20 | 36 | 941
5: 2 1 1 12 2 861 12 1 4 4 | 39 | 900
6: 4 2 2 945 2 | 10 | 955
7: 1 7 2 5 1060 1 12 | 28 | 1088
8: 2 7 1 7 2 7 8 4 978 4 | 42 | 1020
9: 2 5 8 1 2 6 3 969 | 27 | 996
--------------------------------------------------
14 31 10 28 17 17 32 28 27 46 | 250 wrong predictions
most confusable, true est: count -- 4 9: 20 3 8: 13 7 9: 12 5 6: 12 5 3: 12 KNN
saving to 60k/gamma3-C1-50000-10000-KNN-split1.npz
KNN split 2 --
ytrain class sizes: [4956 5545 5003 5130 4851 4505 4955 5151 4913 4991]
ytest class sizes: [ 979 1184 958 1001 990 918 976 1063 952 979]
0 sec train KNN
11 sec test KNN: ypred == ytest 97.7 %
Confusion matrix: 97.7 % correct = 9770 / 10000 KNN
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 977 1 1 | 2 | 979
1: 1178 1 1 1 1 2 | 6 | 1184
2: 5 4 938 2 1 2 4 1 1 | 20 | 958
3: 1 9 969 5 1 4 6 6 | 32 | 1001
4: 4 962 1 1 2 20 | 28 | 990
5: 6 1 6 2 885 9 4 5 | 33 | 918
6: 3 1 2 4 966 | 10 | 976
7: 2 8 2 2 1036 13 | 27 | 1063
8: 2 12 1 10 2 6 5 908 6 | 44 | 952
9: 2 2 4 5 1 1 9 4 951 | 28 | 979
--------------------------------------------------
20 32 15 22 15 18 20 21 15 52 | 230 wrong predictions
most confusable, true est: count -- 4 9: 20 7 9: 13 8 1: 12 8 3: 10 5 6: 9 KNN
saving to 60k/gamma3-C1-50000-10000-KNN-split2.npz
scores: KNN av 97.6 [97.5 97.5 97.7] train, test: [0 0 0] [12 11 11] sec
}
{ classify with rbf-SVM --
SVC(C=1, cache_size=2000, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=3, kernel='rbf',
max_iter=-1, probability=False, random_state=0, shrinking=True,
tol=0.001, verbose=False)
rbf-SVM split 0 --
ytrain class sizes: [4897 5622 4985 5118 4849 4534 4957 5143 4895 5000]
ytest class sizes: [1038 1107 976 1013 992 889 974 1071 970 970]
510 sec train rbf-SVM
113 sec test rbf-SVM: ypred == ytest 98.4 %
Confusion matrix: 98.4 % correct = 9836 / 10000 rbf-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 1030 1 1 2 3 1 | 8 | 1038
1: 1094 6 4 1 1 1 | 13 | 1107
2: 3 959 1 3 5 4 1 | 17 | 976
3: 4 986 7 5 7 4 | 27 | 1013
4: 1 2 973 5 1 2 8 | 19 | 992
5: 2 1 4 871 6 1 2 2 | 18 | 889
6: 1 2 1 968 2 | 6 | 974
7: 2 6 1 5 1053 1 3 | 18 | 1071
8: 1 2 1 3 3 2 958 | 12 | 970
9: 2 1 2 5 7 2 6 1 944 | 26 | 970
--------------------------------------------------
9 4 23 17 17 14 16 21 23 20 | 164 wrong predictions
most confusable, true est: count -- 4 9: 8 3 5: 7 9 4: 7 3 8: 7 9 7: 6 rbf-SVM
saving to 60k/gamma3-C1-50000-10000-rbf-SVM-split0.npz
rbf-SVM split 1 --
ytrain class sizes: [4960 5627 4961 5108 4900 4523 4976 5126 4845 4974]
ytest class sizes: [ 975 1102 1000 1023 941 900 955 1088 1020 996]
505 sec train rbf-SVM
113 sec test rbf-SVM: ypred == ytest 98.5 %
Confusion matrix: 98.5 % correct = 9850 / 10000 rbf-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 971 1 2 1 | 4 | 975
1: 1089 4 4 3 1 1 | 13 | 1102
2: 2 991 1 1 4 1 | 9 | 1000
3: 7 998 3 7 5 3 | 25 | 1023
4: 3 1 930 2 5 | 11 | 941
5: 1 3 5 880 6 1 3 1 | 20 | 900
6: 5 1 1 4 942 2 | 13 | 955
7: 1 6 2 2 1072 5 | 16 | 1088
8: 3 1 2 1 5 1 3 1002 2 | 18 | 1020
9: 3 4 6 1 4 3 975 | 21 | 996
--------------------------------------------------
9 17 18 16 10 12 11 24 16 17 | 150 wrong predictions
most confusable, true est: count -- 3 2: 7 3 7: 7 5 6: 6 9 4: 6 7 1: 6 rbf-SVM
saving to 60k/gamma3-C1-50000-10000-rbf-SVM-split1.npz
rbf-SVM split 2 --
ytrain class sizes: [4956 5545 5003 5130 4851 4505 4955 5151 4913 4991]
ytest class sizes: [ 979 1184 958 1001 990 918 976 1063 952 979]
506 sec train rbf-SVM
113 sec test rbf-SVM: ypred == ytest 98.4 %
Confusion matrix: 98.4 % correct = 9842 / 10000 rbf-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 974 2 1 1 1 | 5 | 979
1: 1176 1 1 1 4 1 | 8 | 1184
2: 3 1 946 2 2 4 | 12 | 958
3: 10 975 3 6 7 | 26 | 1001
4: 1 1 972 1 2 13 | 18 | 990
5: 3 2 4 1 903 4 1 | 15 | 918
6: 1 1 1 3 969 1 | 7 | 976
7: 1 3 3 1 2 1049 4 | 14 | 1063
8: 2 4 1 4 2 4 4 928 3 | 24 | 952
9: 3 2 1 4 8 1 1 7 2 950 | 29 | 979
--------------------------------------------------
13 11 22 16 17 14 9 23 12 21 | 158 wrong predictions
most confusable, true est: count -- 4 9: 13 3 2: 10 9 4: 8 9 7: 7 3 8: 7 rbf-SVM
saving to 60k/gamma3-C1-50000-10000-rbf-SVM-split2.npz
scores: rbf-SVM av 98.4 [98.4 98.5 98.4] train, test: [510 505 506] [113 113 113] sec
dual_coef_ : (9, 14374) float64 min av max -1 1.76e-18 1
support_vectors_ : (14374, 784) float64 min av max -0.0615 -0.00033 0.235
}
{ classify with poly2-SVM --
SVC(C=1, cache_size=2000, class_weight=None, coef0=0,
decision_function_shape='ovr', degree=2, gamma=3, kernel='poly',
max_iter=-1, probability=False, random_state=0, shrinking=True,
tol=0.001, verbose=False)
poly2-SVM split 0 --
ytrain class sizes: [4897 5622 4985 5118 4849 4534 4957 5143 4895 5000]
ytest class sizes: [1038 1107 976 1013 992 889 974 1071 970 970]
276 sec train poly2-SVM
86 sec test poly2-SVM: ypred == ytest 98.4 %
Confusion matrix: 98.4 % correct = 9838 / 10000 poly2-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 1029 1 1 3 3 1 | 9 | 1038
1: 1099 5 1 1 1 | 8 | 1107
2: 5 953 4 3 4 5 2 | 23 | 976
3: 1 1 995 5 1 2 5 3 | 18 | 1013
4: 1 1 971 6 2 2 9 | 21 | 992
5: 3 1 4 868 5 5 3 | 21 | 889
6: 2 1 968 3 | 6 | 974
7: 1 2 8 2 3 1048 1 6 | 23 | 1071
8: 1 2 1 1 1 2 962 | 8 | 970
9: 1 1 2 6 6 2 5 2 945 | 25 | 970
--------------------------------------------------
14 5 18 18 15 9 16 16 27 24 | 162 wrong predictions
most confusable, true est: count -- 4 9: 9 7 2: 8 7 9: 6 9 4: 6 9 3: 6 poly2-SVM
saving to 60k/gamma3-C1-50000-10000-poly2-SVM-split0.npz
poly2-SVM split 1 --
ytrain class sizes: [4960 5627 4961 5108 4900 4523 4976 5126 4845 4974]
ytest class sizes: [ 975 1102 1000 1023 941 900 955 1088 1020 996]
275 sec train poly2-SVM
85 sec test poly2-SVM: ypred == ytest 98.5 %
Confusion matrix: 98.5 % correct = 9845 / 10000 poly2-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 971 2 1 1 | 4 | 975
1: 1095 3 1 2 1 | 7 | 1102
2: 2 1 988 2 1 2 3 1 | 12 | 1000
3: 5 999 5 3 9 2 | 24 | 1023
4: 4 2 931 1 3 | 10 | 941
5: 1 2 11 1 874 5 1 2 3 | 26 | 900
6: 8 2 3 940 2 | 15 | 955
7: 1 5 4 2 1 1070 5 | 18 | 1088
8: 2 4 1 3 2 4 1002 2 | 18 | 1020
9: 2 2 4 4 1 6 2 975 | 21 | 996
--------------------------------------------------
14 18 16 24 7 12 10 19 18 17 | 155 wrong predictions
most confusable, true est: count -- 5 3: 11 3 8: 9 6 0: 8 9 7: 6 5 6: 5 poly2-SVM
saving to 60k/gamma3-C1-50000-10000-poly2-SVM-split1.npz
poly2-SVM split 2 --
ytrain class sizes: [4956 5545 5003 5130 4851 4505 4955 5151 4913 4991]
ytest class sizes: [ 979 1184 958 1001 990 918 976 1063 952 979]
276 sec train poly2-SVM
86 sec test poly2-SVM: ypred == ytest 98.4 %
Confusion matrix: 98.4 % correct = 9841 / 10000 poly2-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 974 1 1 1 1 1 | 5 | 979
1: 1179 1 1 2 1 | 5 | 1184
2: 1 3 947 1 2 4 | 11 | 958
3: 7 978 9 2 5 | 23 | 1001
4: 1 3 973 1 12 | 17 | 990
5: 3 4 6 896 6 3 | 22 | 918
6: 1 1 3 968 3 | 8 | 976
7: 3 5 4 1 6 1041 3 | 22 | 1063
8: 1 3 1 5 1 3 3 932 3 | 20 | 952
9: 3 1 1 2 8 1 7 3 953 | 26 | 979
--------------------------------------------------
13 15 19 17 19 16 10 15 16 19 | 159 wrong predictions
most confusable, true est: count -- 4 9: 12 3 5: 9 9 4: 8 3 2: 7 9 7: 7 poly2-SVM
saving to 60k/gamma3-C1-50000-10000-poly2-SVM-split2.npz
scores: poly2-SVM av 98.4 [98.4 98.5 98.4] train, test: [276 275 276] [86 85 86] sec
dual_coef_ : (9, 11092) float64 min av max -1 -2.56e-18 1
support_vectors_ : (11092, 784) float64 min av max -0.0615 2.82e-05 0.234
}
{ classify with Random-Forest --
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=1,
oob_score=False, random_state=0, verbose=0, warm_start=False)
Random-Forest split 0 --
ytrain class sizes: [4897 5622 4985 5118 4849 4534 4957 5143 4895 5000]
ytest class sizes: [1038 1107 976 1013 992 889 974 1071 970 970]
283 sec train Random-Forest
2 sec test Random-Forest: ypred == ytest 96.8 %
Confusion matrix: 96.8 % correct = 9682 / 10000 Random-Forest
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 1020 2 1 1 4 1 8 1 | 18 | 1038
1: 1082 12 6 3 1 1 2 | 25 | 1107
2: 7 2 948 2 6 2 5 3 1 | 28 | 976
3: 2 10 969 1 9 6 10 6 | 44 | 1013
4: 2 2 961 6 1 4 16 | 31 | 992
5: 3 1 13 1 854 8 1 6 2 | 35 | 889
6: 3 4 3 961 3 | 13 | 974
7: 4 12 1 7 1025 2 20 | 46 | 1071
8: 2 2 1 2 3 4 5 3 937 11 | 33 | 970
9: 4 3 19 7 2 5 5 925 | 45 | 970
--------------------------------------------------
21 13 44 43 29 20 25 23 43 57 | 318 wrong predictions
most confusable, true est: count -- 7 9: 20 9 3: 19 4 9: 16 5 3: 13 1 2: 12 Random-Forest
saving to 60k/gamma3-C1-50000-10000-Random-Forest-split0.npz
Random-Forest split 1 --
ytrain class sizes: [4960 5627 4961 5108 4900 4523 4976 5126 4845 4974]
ytest class sizes: [ 975 1102 1000 1023 941 900 955 1088 1020 996]
285 sec train Random-Forest
2 sec test Random-Forest: ypred == ytest 97.0 %
Confusion matrix: 97.0 % correct = 9701 / 10000 Random-Forest
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 968 1 4 2 | 7 | 975
1: 1085 6 5 2 1 1 1 1 | 17 | 1102
2: 3 2 978 2 3 3 4 3 2 | 22 | 1000
3: 1 2 9 976 1 11 1 9 10 3 | 47 | 1023
4: 2 2 1 917 1 2 2 14 | 24 | 941
5: 3 4 12 1 862 8 6 4 | 38 | 900
6: 6 3 1 9 933 3 | 22 | 955
7: 8 8 7 1052 1 12 | 36 | 1088
8: 1 5 3 7 6 6 3 978 11 | 42 | 1020
9: 4 2 12 13 1 2 6 4 952 | 44 | 996
--------------------------------------------------
20 25 30 39 34 28 20 25 32 46 | 299 wrong predictions
most confusable, true est: count -- 4 9: 14 9 4: 13 7 9: 12 9 3: 12 5 3: 12 Random-Forest
saving to 60k/gamma3-C1-50000-10000-Random-Forest-split1.npz
Random-Forest split 2 --
ytrain class sizes: [4956 5545 5003 5130 4851 4505 4955 5151 4913 4991]
ytest class sizes: [ 979 1184 958 1001 990 918 976 1063 952 979]
283 sec train Random-Forest
2 sec test Random-Forest: ypred == ytest 97.1 %
Confusion matrix: 97.1 % correct = 9711 / 10000 Random-Forest
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 969 1 1 1 1 4 2 | 10 | 979
1: 1172 3 2 1 1 1 3 1 | 12 | 1184
2: 2 1 934 5 3 1 8 4 | 24 | 958
3: 2 14 957 1 8 2 8 5 4 | 44 | 1001
4: 1 1 1 957 2 1 1 3 23 | 33 | 990
5: 5 2 1 8 2 891 5 2 2 | 27 | 918
6: 2 1 1 1 6 964 1 | 12 | 976
7: 1 6 6 1 8 1022 4 15 | 41 | 1063
8: 4 8 3 10 4 3 7 909 4 | 43 | 952
9: 3 2 4 8 7 2 1 9 7 936 | 43 | 979
--------------------------------------------------
18 23 34 34 28 23 19 29 31 50 | 289 wrong predictions
most confusable, true est: count -- 4 9: 23 7 9: 15 3 2: 14 8 3: 10 9 7: 9 Random-Forest
saving to 60k/gamma3-C1-50000-10000-Random-Forest-split2.npz
scores: Random-Forest av 97.0 [96.8 97 97.1] train, test: [283 285 283] [2 2 2] sec
feature_importances_ : (784,) float64 min av max 0 0.00128 0.0123
}
3839.10 real 3875.89 user 12.26 sys