
Compare sklearn KNN rbf poly2 on MNIST digits

Purpose: compare 4 scikit-learn classifiers -- KNN, rbf-SVM, poly2-SVM and Random-Forest -- on a venerable test case, the MNIST database of 70000 handwritten digits, 28 x 28 pixels each.

Keywords: classification, benchmark, MNIST, KNN, SVM, scikit-learn, python

[figure knn-mismatch-10: misclassified test digits, each shown with its 3 nearest training digits]

Accuracy % and run times, 3 random 50000 / 10000 train / test splits

KNN            av 97.6 %   [97.5 97.5 97.7]   train, test: [0 0 0]        [11 11 11]     sec
rbf-SVM        av 98.4 %   [98.4 98.5 98.4]   train, test: [502 503 506]  [113 113 113]  sec
poly2-SVM      av 98.4 %   [98.4 98.5 98.4]   train, test: [275 274 276]  [85 84 85]     sec
Random-Forest  av 97.0 %   [96.8 97 97.1]     train, test: [283 284 282]  [2 2 2]        sec

Notes

compare_classifiers_mnist.py and classifiers.py (below) are modified from the nice scikit-learn example
http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html .
The logfile gamma3-C1-50000-10000.log is below.

As you see, KNeighborsClassifier( algorithm="brute" ) is really fast. On Macs it uses the Accelerate framework to do dot( 50k x 784, (10k x 784)^T ) on 4 cores in parallel in ~ 10 seconds. (Bigger sizes can be done in blocks.) It's also by far the simplest method: easy to understand, easy to implement from scratch if need be.

When in doubt, use brute force.
-- Ken Thompson
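
Brute force is also easy to write yourself -- a from-scratch sketch (plain majority vote, whereas the run below uses weights="distance"; all names here are illustrative). The one big matrix multiply is the cross term of |a - b|^2 = |a|^2 - 2 a.b + |b|^2 :

import numpy as np
from scipy import stats

def knn_predict( Xtrain, ytrain, Xtest, k=3 ):
    # squared Euclidean distances, test rows x train rows;
    # Xtest.dot( Xtrain.T ) is the big dot, here 10k x 784 times 784 x 50k.
    # d2 is ~ 4 GB at these sizes -- do blocks of test rows if that's too much.
    d2 = ( (Xtest ** 2).sum(axis=1)[:, None]
        - 2 * Xtest.dot( Xtrain.T )
        + (Xtrain ** 2).sum(axis=1)[None, :] )
    nearest = np.argsort( d2, axis=1 )[:, :k]  # k nearest train indices per test row
    votes = ytrain[nearest]                    # their labels, ntest x k
    return stats.mode( votes, axis=1 )[0].ravel()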

KNN needs no tuning beyond the number of nearest neighbors; 3 is good for MNIST. Looking at the 3 nearest digits in the picture above -- 0 0 9, 4 9 9 ... -- can give some insight into why mismatches happen.
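
To look at those nearest neighbors programmatically rather than in a picture, something like this works (a sketch: knn is assumed to be the fitted KNeighborsClassifier, and Xtrain ytrain Xtest ytest ypred the split arrays from compare_classifiers_mnist.py below):

import numpy as np

wrong = np.where( ypred != ytest )[0]                       # misclassified test digits
dist, near = knn.kneighbors( Xtest[wrong], n_neighbors=3 )  # -> distances, train indices
print "true labels:        ", ytest[wrong]
print "3 nearest in train: ", ytrain[near]                  # e.g. 0 0 9, 4 9 9 ...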

SVMs score a bit better than KNN, but poly2 is 20 times slower and RBF 50 times slower.
I have my doubts about their parsimony:

train() --> how many coefficients --> predict() ?

Linear SVMs generate nclass * dim coefficients, here 10 * 784. For this run, RBF has 14374 x 784 support_vectors_ -- huge, not plottable, not understandable. Experts please comment.
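
A quick way to ask "how many coefficients" after a fit is to look at the shapes of the fitted arrays (a sketch; Xtrain ytrain as in the script below, and the shapes in the comments are from this run's log):

from sklearn.svm import SVC, LinearSVC

rbf = SVC( kernel="rbf", C=1, gamma=3 ).fit( Xtrain, ytrain )
print rbf.support_vectors_.shape   # (14374, 784) -- the "model" kept for predict()
print rbf.dual_coef_.shape         # (nclass - 1, nsupport) = (9, 14374)

lin = LinearSVC( C=1 ).fit( Xtrain, ytrain )
print lin.coef_.shape              # (nclass, dim) = (10, 784)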

Neural network classifiers

http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#4d4e495354 lists 50 papers on (mostly) neural net classifiers of the MNIST digits, with error rates from 0.21 % up, mostly < 1 %. I don't know which of these are reproducible (with code, log files, run times on the web), nor how long their code is (over and above numpy and scikit-learn).

Preprocessing raw pixels to high-level features -- "shapes" or "strokes" -- is of crucial importance for any classifier. I don't know what preprocessing these 50 do, either; KNN etc. here do none.

RBFs

A way to tune the parameter gamma in exp( - gamma |Xi - Xj|^2 ):
first scale distances so that the median |Xi - Xj| ~ 1, i.e. half the neighbors are < 1 away and half > 1. exp( - gamma dist^2 ) then down-weights the more distant neighbors -- weights near 0 are not "seen" at all. With gamma = 3 we down-weight by

dist:                  [   0   .5    1  1.5    2]
                       ---------------------------
exp( - 3 * dist^2 ):   [ 100   47    5    0    0]  %   -- Gaussian

so half the neighbors are down-weighted to 5 % or less. In general,

  • look at quantiles of distance^2 of your data, scipy.spatial.distance.pdist ** 2 | np.percentile
  • print a little table like the above, with various gammas
  • choose gamma to down-weight __ % of the data to __ %, e.g. 50 % down to 5 %; a sketch follows.
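
A sketch of that recipe (the gammas, quantiles and sample size are arbitrary):

import numpy as np
from scipy.spatial.distance import pdist

def gamma_table( X, gammas=[1, 2, 3, 5], q=[10, 25, 50, 75, 90], nsample=2000 ):
    """ print quantiles of |Xi - Xj|^2 and the down-weights exp( - gamma dist^2 ) """
    J = np.random.choice( len(X), min( nsample, len(X) ), replace=False )
    d2 = pdist( X[J] ) ** 2                    # pairwise distances^2 of a sample
    qd2 = np.percentile( d2, q )
    print "quantiles %s of dist^2: %s" % (q, np.round( qd2, 2 ))
    for gamma in gammas:
        print "gamma %g down-weights to %s %%" % (
            gamma, np.round( 100 * np.exp( - gamma * qd2 )).astype(int) )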

(KNeighborsClassifier( weights="distance" ) uses 1 / dist, which decays much more slowly than RBF. weights can be a callable; I have not played with that.)
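
For example, RBF down-weighting as a weights callable would look like this (untried here; gamma = 3 is just illustrative):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def rbf_weights( dist ):
    # dist: distances to the k nearest neighbors -> their weights
    return np.exp( -3 * dist ** 2 )

knn = KNeighborsClassifier( n_neighbors=3, algorithm="brute", weights=rbf_weights )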

Quick is beautiful

To speed up the try-it-and-see loop, start your programs like this:

land = 1
sea = 2
...
# to change these params in sh or ipython, run this.py  a=1  b=None  c=\"str\" ...
for arg in sys.argv[1:]:
    exec( arg )
...
# print all params

This works well from shell scripts. For example,

for gamma in 1 2 3
do
	python my.py  gamma=$gamma  "$@"  | tee gamma$gamma.log
done
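
For comparison, scikit-learn's GridSearchCV does the same loop in-process (sklearn.grid_search in 0.17, sklearn.model_selection in newer versions); the parameter values here are illustrative:

from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVC

grid = GridSearchCV( SVC( kernel="rbf", cache_size=2000 ),
        param_grid=dict( gamma=[1, 2, 3], C=[1, 10] ),
        cv=3, n_jobs=1 )
grid.fit( X, y )
print grid.best_params_, grid.best_score_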

There are many other, fancier ways of doing grid search. In general:

Do the simplest thing that works, then stop.

Links

MNIST database: https://en.wikipedia.org/wiki/MNIST_database

KNeighborsClassifier: http://scikit-learn.org/stable/modules/neighbors.html
SVC (libsvm): http://scikit-learn.org/stable/modules/svm.html#svm-classification
Random-Forest: http://scikit-learn.org/stable/modules/ensemble.html

https://github.com/scikit-learn/scikit-learn/tree/master/benchmarks/bench_mnist.py

Josh Montague: https://github.com/jrmontag/mnist-sklearn -- 5 * shifts, kdtree

Comments are welcome, real test cases most welcome.

cheers
-- denis

Last change: 2016-08-10

classifiers.py:

#!/usr/bin/env python
# from http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

from __future__ import division
import re
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC  # libsvm, liblinear
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis


#...............................................................................
def classifiers( only="knn|poly|rbf|random-forest",
        knnear=3, C=1, gamma=3, seed=0, **kwignored ):
    """ -> [ [name, classifier] ... ] with names starting knn|poly|... e.g.
            [ ["KNN", KNeighborsClassifier(...)],
              ["rbf-SVM", SVC(...)],
              ... ]
        only= "" or "*": all
    """
    all_classifiers = [
        ["KNN",
                # $sklearn/neighbors/classification.py $sklearn/neighbors/base.py
                # http://scikit-learn.org/stable/modules/neighbors.html
            KNeighborsClassifier(
                algorithm="brute",
                weights="distance",  # 1/dist
                    # default uniform, nnear 3: a b b => b
                n_neighbors=knnear, n_jobs=1
                )],

            # $sklearn/svm/classes.py
            # http://scikit-learn.org/stable/modules/svm.html#svm-classification
            # The implementation is based on libsvm. The fit time complexity
            # is more than quadratic with the number of samples
            # The multiclass support is handled according to a one-vs-one scheme.
        ["rbf-SVM",
            SVC( kernel="rbf",
                C=C,
                gamma=gamma,
                cache_size=2000,  # M
                decision_function_shape="ovr",  # one vs rest == default ovo ?
                random_state=seed,
                )],
        ["poly2-SVM",  # on mnist ~ as good as rbf, twice as fast
            SVC( kernel="poly", degree=2, coef0=0,
                C=C,
                gamma=gamma,
                cache_size=2000,  # M
                decision_function_shape="ovr",  # one vs rest
                random_state=seed,
                )],
        ["Random-Forest",
                # $sklearn/ensemble/forest.py
                # http://scikit-learn.org/stable/modules/ensemble.html
            RandomForestClassifier(
                n_estimators=500, max_features="auto",
                random_state=seed, n_jobs=1
                )],
        ["libsvm-linear", SVC( kernel="linear",  # ~ 94
            C=C,
            cache_size=2000,  # M
            decision_function_shape="ovr",  # ? coef_ 45 x 784
            random_state=seed,
            )],
        ["liblinear", LinearSVC(  # ~ 91
            C=C,
            multi_class='ovr',  # 'crammer_singer': 45 pairs
            dual=False,
            random_state=seed,
            )],
        ["SGD", SGDClassifier( n_iter=10, random_state=seed  # ~ 91 %, 5 sec
            )],
        # ["Logistic-Regression",  # ~ 91
        #     LogisticRegression( C=1, solver="lbfgs", n_jobs=1
        #     )],
        # ["Linear-Discriminant-Analysis", LinearDiscriminantAnalysis()],
        # ["Naive-Bayes", GaussianNB()],
        # ["AdaBoost", AdaBoostClassifier()],
        # ["Decision-Tree", DecisionTreeClassifier( )],  # max_depth=5
        # ["Quadratic-Discriminant-Analysis", QuadraticDiscriminantAnalysis()],
        ]

    if only in ("", "*"):
        return all_classifiers
    return [c for c in all_classifiers
            if re.match( only, c[0], re.IGNORECASE )]


#...............................................................................
if __name__ == "__main__":
    import inspect

    for (classifname, classif) in classifiers( only="*" ):
        f = inspect.getfile( classif.__class__ )[:-1]
        print "%s: %s \n%s \n" % (
            classifname, f, classif )
compare_classifiers_mnist.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# from http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
# on https://en.wikipedia.org/wiki/MNIST_database
"""
=====================
Classifier comparison
=====================
A comparison of several classifiers in scikit-learn on MNIST digits.
The point of this example is to illustrate the nature of decision boundaries
of different classifiers.
This should be taken with a grain of salt, as the intuition conveyed by
these examples does not necessarily carry over to real datasets.
Particularly in high-dimensional spaces, data can more easily be separated
linearly and the simplicity of classifiers such as naive Bayes and linear SVMs
might lead to better generalization than is achieved by other classifiers.
"""
# mnist, no plots

# Code source: Gaël Varoquaux
#              Andreas Müller
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

from __future__ import division
import sys
import numpy as np
import sklearn
from sklearn.cross_validation import train_test_split
from sklearn.datasets.base import Bunch
from sklearn.preprocessing import normalize, scale

from classifiers import classifiers
import confus
import etcutil as nu
import mnist

__version__ = "2016-07-18 july denis-bz-py t-online.de"

np.set_printoptions( threshold=20, edgeitems=14, linewidth=140,
        formatter = dict( float = lambda x: "%.3g" % x ))
print "\n", 80 * "-"
print "python", " ".join(sys.argv)
print "versions: sklearn %s numpy %s python %s " % (
        sklearn.__version__, np.__version__, sys.version.split()[0] )

#-------------------------------------------------------------------------------
# outline:
# 1 parameters
# 2 load, subset, normalize the data
# 3 classifiers = [["rbf-SVM", SVC()] ... ] that match "only"
# 4 for classif:
#       train_test_split, classif .fit .predict, save

#-------------------------------------------------------------------------------
ntrain = 50000
ntest = 10000
digits = []  # [] all / [4,9]
only = "knn|poly|rbf|random-forest"  # only these classifiers
nsplit = 3  # iter split train test, fit, predict

# params for various classifiers --
knnear = 3
C = 1
gamma = 3  # exp( - gamma * dist^2 ): exp( -3 ) = .05
tag = "tmp"
save = 1  # > tag.npz
seed = 0

# to change these params in sh or ipython, run this.py  a=1  b=None  c=\"str\" ...
for arg in sys.argv[1:]:
    exec( arg )

np.random.seed( seed )

#...........................................................................
bag = mnist.load_mnist( ntrain + ntest, dtype=np.float32 )
X = bag.data
y = bag.target
if digits:
    X, y = nu.xy_2class( X, y, *digits )  # e.g. [4,9] only

print "rows /= |row| (cos distance), -= mean"
X = normalize( X )
X -= X.mean( axis=0 )  # for svm -- properly Xtrain Xtest
# X = nu.div0( X, X.std() )  # ? knn a bit worse, svm grid gamma

params = """
X %s
ntrain %d
ntest %d
digits %s
y counts %s
knnear %d
C %.3g
gamma %.3g
nsplit %d
seed %d
run_date %s
""" % ( nu.asum(X), ntrain, ntest, digits, np.bincount(y),
        knnear, C, gamma, nsplit, seed,
        nu.isoday() )
print "\nparams --", params


#...............................................................................
def classify_splits( classifname, classif, X, y, nsplit=1 ):
    """ train_test_split, classif .fit .predict, save
        for nsplit splits
        -> [ Bag( ... ) for each split ]
    """
    print "\n{ classify with %s --" % classifname
    classifstr = str(classif)
    print classifstr
    np.random.seed( seed )  # just in case
    classifstep = classif
    # classifstep = classif.steps[1][1]  # grr in pipe
    bags = []
    testfraction = ntest / (ntrain + ntest)
    scores = []
    trainsecs = []
    testsecs = []

    for jsplit in range(nsplit):  # ~ kfold
        print "\n%s split %d --" % (classifname, jsplit)
        Xtrain, Xtest, ytrain, ytest = \
            train_test_split( X, y, test_size=testfraction, random_state=jsplit )
            # $etc/traintestsplit.py + jtrain jtest -> Xall yall
        print "ytrain class sizes:", np.bincount( ytrain )
        print "ytest class sizes: ", np.bincount( ytest )
        nu.ptime()

        #.......................................................................
        classif.fit( Xtrain, ytrain )
        trainsec = nu.ptime( "train " + classifname )
        ypred = classif.predict( Xtest )
        score = (ypred == ytest) .mean() * 100
            # Proportion classified correctly is an improper scoring rule, "Brier score" ?
        testsec = nu.ptime( "test %s: ypred == ytest %.1f %%" % (classifname, score) )

        scores += [score]
        trainsecs += [trainsec]
        testsecs += [testsec]
        confusmat = confus.pconfus( ytest, ypred, label=classifname )  # print

        b = Bunch(
            classif = classifname,
            classifstr = classifstr,
            confusmat = confusmat,
            ntrain = ntrain,
            ntest = ntest,
            params = params,
            random_state = jsplit,
            score = score,
            sec = [int(trainsec), int(testsec)],
            ypred = ypred,
            ytest = ytest,
            )
        if save:  # to plot etc.
            out = "%s-%s-split%d.npz" % (  # grr
                tag, classifname, jsplit )
            print "saving to %s \n" % out
            np.savez( out, **b )
        bags.append( b )

    scores = np.array( scores )
    trainsecs = nu.ints( trainsecs )
    testsecs = nu.ints( testsecs )
    print "scores: %-12s av %.1f %s train, test: %s %s sec " % (
        classifname, scores.mean(), scores, trainsecs, testsecs )

    # how many coefs are needed to predict() ? rbf-svm ~ 1/3 ntest
    for attr in "coef_ dual_coef_ support_vectors_ feature_importances_ ".split():
        if hasattr( classifstep, attr ):
            print "%s : %s" % (
                attr, nu.asum( getattr( classifstep, attr )))
    print "}\n"
    return bags


#...........................................................................
for (classifname, classif) in classifiers( only,
        knnear=knnear, C=C, gamma=gamma, seed=seed ):
    bags = classify_splits( classifname, classif, X, y, nsplit=nsplit )
confus.py:

#!/usr/bin/env python

from __future__ import division
import numpy as np


#...............................................................................
def pconfus( true, est, verbose=1, nmaxconfus=5, label="" ):
    """ print confusion matrix """
    # plot, interactive ? Hastie p. 118 vowel picture
    true = np.squeeze(true)
    est = np.squeeze(est)
    n = len(true)
    assert n == len(est), "len(true) != len(est): shapes %s %s" % (
        true.shape, est.shape )
    truemin = np.nanmin(true)
    estmin = np.nanmin(est)
    if truemin != 0:
        true = true - truemin
    if estmin != 0:
        est = est - estmin
    truemax = np.nanmax(true)
    estmax = np.nanmax(est)
    if truemax != estmax:
        print "warning: pconfus true max %.3g != est max %.3g" % (
            truemax, estmax )
    nclass = int( max( truemax, estmax )) + 1
    confus = np.zeros( (nclass,nclass), int )
    for t, e in zip( true, est ):
        if not np.isnan(t) and not np.isnan(e):
            confus[ int(t), int(e) ] += 1
    if not verbose:
        return confus

    diag = confus.diagonal()
    dsum = diag.sum()
    csum = confus.sum()
    correct = dsum / csum * 100
    # permuted > diag ? cf hungarian.py

    #...............................................................................
    print "\nConfusion matrix: %.1f %% correct = %d / %d %s" % (
        correct, dsum, csum, label )
    print "True classes down, estimated across | wrong | total class sizes"
    print " ", 5*nclass*"-"
    for j, row in enumerate(confus):
        rowsum = row.sum()
        if rowsum > 0:
            print "%2d: %s | %4d | %4d " % (
                j, _astr( row, fmt="%4.0d" ), rowsum - diag[j], rowsum )
    print " ", 5*nclass*"-"
    estsize = confus.sum(axis=0) - diag
    print " %s | %4d wrong predictions " % (
        _astr( estsize, fmt="%4d" ), csum - dsum )

    if nmaxconfus > 0:
        print "\nmost confusable, true est: count -- " ,
        A0 = confus.copy()
        np.fill_diagonal( A0, -1 )
        for a, jk in zip( *_maxfewat( A0, nmaxconfus, ge=-1 )):
            if a <= 0:  break
            print "%d %d: %d " % (jk[0], jk[1], a) ,
        print label
    print "\n"
    return confus


#...............................................................................
def _astr( x, fmt="%g" ):
    """ scalar / vec / 2d array -> join( fmt % xj ) """
    if x is None or isinstance( x, basestring ):
        return x
    if np.isscalar(x):
        if np.fabs(x) < 1e-10:
            x = 0
        return fmt % x
    x = np.asanyarray(x)
    if x.ndim == 0:  # asarray(3)
        x = x.item()
        assert np.isscalar(x), x
        return _astr( x, fmt )
    if x.ndim == 1:
        return " ".join([ _astr(xx, fmt ) for xx in x ])
    else:
        assert x.ndim == 2, x.shape
        return "[ %s ]" % "\n".join([ _astr( row, fmt ) for row in x ])


def _maxfewat( A, max=10, ge=None ):
    """ -> Amax[], jkmax[ max, 2 ] """
    A = np.asanyarray(A)
    if ge is None:
        ge = A.mean()  # maybe too few
    nz = np.nonzero( A >= ge )  # tuple ndim ([j ...], [k ...])
    if A.ndim == 1:
        nz = nz[0]
    Abig = A[nz]
    down = Abig.argsort() [::-1] [:max]
    return Abig[down], np.transpose( nz )[down]

# "Brier score" site:stats.stackexchange.com
# Proportion classified correctly is an improper scoring rule, i.e., it is
# optimized by a bogus model. I would use the quadratic proper scoring rule known
# as the Brier score, or the concordance probability
# [multi-class] [classification] top
etcutil.py:

#!/usr/bin/env python
""" etcutil.py: asum div0 findfile ... """

from __future__ import division
from os.path import expanduser, expandvars, isfile, join
import time
import numpy as np
from sklearn.datasets.base import Bunch as Bag


def asum( X ):
    """ array summary: "shape type min av max [density]" """
    if not hasattr( X, "dtype" ):
        return str(X)
    if hasattr( X, "todense" ):  # issparse
        sparsetype = type(X).__name__ + " "  # csr_matrix etc.
        density = " for the %.3g %% non-0" % (
            100. * X.nnz / np.prod( X.shape ))
    else:
        sparsetype = density = ""
    return "%s %s%s min av max %.3g %.3g %.3g %s" % (
        X.shape, sparsetype, X.dtype, X.min(), X.mean(), X.max(), density )


def div0( a, b ):
    """ ignore / 0: div0( [-1, 0, 1], 0 ) -> [0 0 0] """
    with np.errstate(divide='ignore', invalid='ignore'):
        c = np.true_divide( a, b )
        c[ ~ np.isfinite( c )] = 0
    return c


def findfile( filename, dirs=["", "data", "$SCIKIT_LEARN_DATA", "$webdata"] ):
    """ -> first dir/file found in a list of dirs
        or IOError if none
    """
    for dir in dirs:
        dirfile = expanduser( expandvars( join( dir, filename )))
        if isfile( dirfile ):
            return dirfile
    raise IOError( "file \"%s\" not found in folders %s" % (
        filename, dirs ))


def ints( X ):
    return np.round(X).astype(int)  # NaN Inf -> - maxint


def isoday():
    return time.strftime( "%Y-%m-%d %h %H:%M" )  # 2011-11-15 Nov 12:06


def mkdirpart( filename ):
    """ tmp/file: mkdir tmp """
    import os
    dir = os.path.dirname( filename )
    if dir and not os.path.isdir( dir ):
        os.makedirs( dir )


def ptime( msg=None, T=[0]):
    """ ptime()
        ...
        dt = ptime(): delta seconds (wall clock) from previous call
        dt = ptime( "message" ) prints dt, message
    """
    t = time.time()  # seconds since epoch
    dt = t - T[0]
    if msg:
        print "%4.0f sec %s" % (dt, msg)
    T[0] = t
    return dt


def quantiles( x, q=[ 10, 25, 50, 75, 90 ]):
    p = np.percentile( x, q )
    return "quantiles %s: %s" % (np.array(q), p)  # with caller's np.print_options


def xy_2class( X, y, c=4, cc=9 ):
    """ -> subset X, y 4 or 9 only """
    yc = (y == c)
    ycc = (y == cc)
    print "xy_2class %d %d: %d %d " % (c, cc, yc.sum(), ycc.sum())
    J = (yc | ycc)
    y = y[J]
    return X[J], (y == cc).astype(int)


def subset_xy( X, y, classes=[4,9], plusminus1=False ):
    """ -> X[J], y[J] where y == 4 or 9 """
    if len(classes) == 0:
        return X, y
    y = np.asarray( y )
    # lookup table: 4 -> 1, 9 -> 2, rest -> 0
    lut = np.zeros( y.max() + 1, dtype=int )
    lut[classes] = np.arange( len(classes) ) + 1
    J = np.where( lut[y] )[0]
    y01 = lut[ y[J] ] - 1  # 4 -> 0, 9 -> 1
    if plusminus1:
        y01 *= 2;  y01 -= 1  # 2class +1 -1
    return X[J], y01
gamma3-C1-50000-10000.log:

# from: python compare_classifiers_mnist.py gamma=3 C=1 digits=[] ntrain=50000 ntest=10000 tag="60k/gamma3-C1-50000-10000"
# run: 20 Jul 2016 11:29 in ~bz/py/ml/sklearn/mnist Denis-iMac 10.8.3
--------------------------------------------------------------------------------
python compare_classifiers_mnist.py gamma=3 C=1 digits=[] ntrain=50000 ntest=10000 tag="60k/gamma3-C1-50000-10000"
versions: sklearn 0.17.1 numpy 1.11.1 python 2.7.11
rows /= |row| (cos distance), -= mean
params --
X (60000, 784) float32 min av max -0.0615 -5.92e-09 0.235
ntrain 50000
ntest 10000
digits []
y counts [5935 6729 5961 6131 5841 5423 5931 6214 5865 5970]
knnear 3
C 1
gamma 3
nsplit 3
seed 0
run_date 2016-07-20 Jul 11:29
{ classify with KNN --
KNeighborsClassifier(algorithm='brute', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=3, p=2,
weights='distance')
KNN split 0 --
ytrain class sizes: [4897 5622 4985 5118 4849 4534 4957 5143 4895 5000]
ytest class sizes: [1038 1107 976 1013 992 889 974 1071 970 970]
0 sec train KNN
12 sec test KNN: ypred == ytest 97.5 %
Confusion matrix: 97.5 % correct = 9748 / 10000 KNN
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 1032 1 1 3 1 | 6 | 1038
1: 1101 4 2 | 6 | 1107
2: 10 2 947 2 1 6 6 2 | 29 | 976
3: 3 4 971 1 7 6 13 8 | 42 | 1013
4: 1 4 1 955 7 1 1 22 | 37 | 992
5: 4 1 4 1 860 10 1 6 2 | 29 | 889
6: 6 1 1 965 1 | 9 | 974
7: 5 6 5 1040 1 14 | 31 | 1071
8: 2 10 8 2 4 3 1 935 5 | 35 | 970
9: 3 1 7 7 1 6 3 942 | 28 | 970
--------------------------------------------------
26 26 16 22 16 14 24 23 31 54 | 252 wrong predictions
most confusable, true est: count -- 4 9: 22 7 9: 14 3 8: 13 2 0: 10 5 6: 10 KNN
saving to 60k/gamma3-C1-50000-10000-KNN-split0.npz
KNN split 1 --
ytrain class sizes: [4960 5627 4961 5108 4900 4523 4976 5126 4845 4974]
ytest class sizes: [ 975 1102 1000 1023 941 900 955 1088 1020 996]
0 sec train KNN
11 sec test KNN: ypred == ytest 97.5 %
Confusion matrix: 97.5 % correct = 9750 / 10000 KNN
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 971 2 2 | 4 | 975
1: 1097 2 2 1 | 5 | 1102
2: 3 1 978 2 3 7 4 2 | 22 | 1000
3: 1 1 4 986 7 1 6 13 4 | 37 | 1023
4: 1 8 905 4 3 20 | 36 | 941
5: 2 1 1 12 2 861 12 1 4 4 | 39 | 900
6: 4 2 2 945 2 | 10 | 955
7: 1 7 2 5 1060 1 12 | 28 | 1088
8: 2 7 1 7 2 7 8 4 978 4 | 42 | 1020
9: 2 5 8 1 2 6 3 969 | 27 | 996
--------------------------------------------------
14 31 10 28 17 17 32 28 27 46 | 250 wrong predictions
most confusable, true est: count -- 4 9: 20 3 8: 13 7 9: 12 5 6: 12 5 3: 12 KNN
saving to 60k/gamma3-C1-50000-10000-KNN-split1.npz
KNN split 2 --
ytrain class sizes: [4956 5545 5003 5130 4851 4505 4955 5151 4913 4991]
ytest class sizes: [ 979 1184 958 1001 990 918 976 1063 952 979]
0 sec train KNN
11 sec test KNN: ypred == ytest 97.7 %
Confusion matrix: 97.7 % correct = 9770 / 10000 KNN
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 977 1 1 | 2 | 979
1: 1178 1 1 1 1 2 | 6 | 1184
2: 5 4 938 2 1 2 4 1 1 | 20 | 958
3: 1 9 969 5 1 4 6 6 | 32 | 1001
4: 4 962 1 1 2 20 | 28 | 990
5: 6 1 6 2 885 9 4 5 | 33 | 918
6: 3 1 2 4 966 | 10 | 976
7: 2 8 2 2 1036 13 | 27 | 1063
8: 2 12 1 10 2 6 5 908 6 | 44 | 952
9: 2 2 4 5 1 1 9 4 951 | 28 | 979
--------------------------------------------------
20 32 15 22 15 18 20 21 15 52 | 230 wrong predictions
most confusable, true est: count -- 4 9: 20 7 9: 13 8 1: 12 8 3: 10 5 6: 9 KNN
saving to 60k/gamma3-C1-50000-10000-KNN-split2.npz
scores: KNN av 97.6 [97.5 97.5 97.7] train, test: [0 0 0] [12 11 11] sec
}
{ classify with rbf-SVM --
SVC(C=1, cache_size=2000, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=3, kernel='rbf',
max_iter=-1, probability=False, random_state=0, shrinking=True,
tol=0.001, verbose=False)
rbf-SVM split 0 --
ytrain class sizes: [4897 5622 4985 5118 4849 4534 4957 5143 4895 5000]
ytest class sizes: [1038 1107 976 1013 992 889 974 1071 970 970]
510 sec train rbf-SVM
113 sec test rbf-SVM: ypred == ytest 98.4 %
Confusion matrix: 98.4 % correct = 9836 / 10000 rbf-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 1030 1 1 2 3 1 | 8 | 1038
1: 1094 6 4 1 1 1 | 13 | 1107
2: 3 959 1 3 5 4 1 | 17 | 976
3: 4 986 7 5 7 4 | 27 | 1013
4: 1 2 973 5 1 2 8 | 19 | 992
5: 2 1 4 871 6 1 2 2 | 18 | 889
6: 1 2 1 968 2 | 6 | 974
7: 2 6 1 5 1053 1 3 | 18 | 1071
8: 1 2 1 3 3 2 958 | 12 | 970
9: 2 1 2 5 7 2 6 1 944 | 26 | 970
--------------------------------------------------
9 4 23 17 17 14 16 21 23 20 | 164 wrong predictions
most confusable, true est: count -- 4 9: 8 3 5: 7 9 4: 7 3 8: 7 9 7: 6 rbf-SVM
saving to 60k/gamma3-C1-50000-10000-rbf-SVM-split0.npz
rbf-SVM split 1 --
ytrain class sizes: [4960 5627 4961 5108 4900 4523 4976 5126 4845 4974]
ytest class sizes: [ 975 1102 1000 1023 941 900 955 1088 1020 996]
505 sec train rbf-SVM
113 sec test rbf-SVM: ypred == ytest 98.5 %
Confusion matrix: 98.5 % correct = 9850 / 10000 rbf-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 971 1 2 1 | 4 | 975
1: 1089 4 4 3 1 1 | 13 | 1102
2: 2 991 1 1 4 1 | 9 | 1000
3: 7 998 3 7 5 3 | 25 | 1023
4: 3 1 930 2 5 | 11 | 941
5: 1 3 5 880 6 1 3 1 | 20 | 900
6: 5 1 1 4 942 2 | 13 | 955
7: 1 6 2 2 1072 5 | 16 | 1088
8: 3 1 2 1 5 1 3 1002 2 | 18 | 1020
9: 3 4 6 1 4 3 975 | 21 | 996
--------------------------------------------------
9 17 18 16 10 12 11 24 16 17 | 150 wrong predictions
most confusable, true est: count -- 3 2: 7 3 7: 7 5 6: 6 9 4: 6 7 1: 6 rbf-SVM
saving to 60k/gamma3-C1-50000-10000-rbf-SVM-split1.npz
rbf-SVM split 2 --
ytrain class sizes: [4956 5545 5003 5130 4851 4505 4955 5151 4913 4991]
ytest class sizes: [ 979 1184 958 1001 990 918 976 1063 952 979]
506 sec train rbf-SVM
113 sec test rbf-SVM: ypred == ytest 98.4 %
Confusion matrix: 98.4 % correct = 9842 / 10000 rbf-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 974 2 1 1 1 | 5 | 979
1: 1176 1 1 1 4 1 | 8 | 1184
2: 3 1 946 2 2 4 | 12 | 958
3: 10 975 3 6 7 | 26 | 1001
4: 1 1 972 1 2 13 | 18 | 990
5: 3 2 4 1 903 4 1 | 15 | 918
6: 1 1 1 3 969 1 | 7 | 976
7: 1 3 3 1 2 1049 4 | 14 | 1063
8: 2 4 1 4 2 4 4 928 3 | 24 | 952
9: 3 2 1 4 8 1 1 7 2 950 | 29 | 979
--------------------------------------------------
13 11 22 16 17 14 9 23 12 21 | 158 wrong predictions
most confusable, true est: count -- 4 9: 13 3 2: 10 9 4: 8 9 7: 7 3 8: 7 rbf-SVM
saving to 60k/gamma3-C1-50000-10000-rbf-SVM-split2.npz
scores: rbf-SVM av 98.4 [98.4 98.5 98.4] train, test: [510 505 506] [113 113 113] sec
dual_coef_ : (9, 14374) float64 min av max -1 1.76e-18 1
support_vectors_ : (14374, 784) float64 min av max -0.0615 -0.00033 0.235
}
{ classify with poly2-SVM --
SVC(C=1, cache_size=2000, class_weight=None, coef0=0,
decision_function_shape='ovr', degree=2, gamma=3, kernel='poly',
max_iter=-1, probability=False, random_state=0, shrinking=True,
tol=0.001, verbose=False)
poly2-SVM split 0 --
ytrain class sizes: [4897 5622 4985 5118 4849 4534 4957 5143 4895 5000]
ytest class sizes: [1038 1107 976 1013 992 889 974 1071 970 970]
276 sec train poly2-SVM
86 sec test poly2-SVM: ypred == ytest 98.4 %
Confusion matrix: 98.4 % correct = 9838 / 10000 poly2-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 1029 1 1 3 3 1 | 9 | 1038
1: 1099 5 1 1 1 | 8 | 1107
2: 5 953 4 3 4 5 2 | 23 | 976
3: 1 1 995 5 1 2 5 3 | 18 | 1013
4: 1 1 971 6 2 2 9 | 21 | 992
5: 3 1 4 868 5 5 3 | 21 | 889
6: 2 1 968 3 | 6 | 974
7: 1 2 8 2 3 1048 1 6 | 23 | 1071
8: 1 2 1 1 1 2 962 | 8 | 970
9: 1 1 2 6 6 2 5 2 945 | 25 | 970
--------------------------------------------------
14 5 18 18 15 9 16 16 27 24 | 162 wrong predictions
most confusable, true est: count -- 4 9: 9 7 2: 8 7 9: 6 9 4: 6 9 3: 6 poly2-SVM
saving to 60k/gamma3-C1-50000-10000-poly2-SVM-split0.npz
poly2-SVM split 1 --
ytrain class sizes: [4960 5627 4961 5108 4900 4523 4976 5126 4845 4974]
ytest class sizes: [ 975 1102 1000 1023 941 900 955 1088 1020 996]
275 sec train poly2-SVM
85 sec test poly2-SVM: ypred == ytest 98.5 %
Confusion matrix: 98.5 % correct = 9845 / 10000 poly2-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 971 2 1 1 | 4 | 975
1: 1095 3 1 2 1 | 7 | 1102
2: 2 1 988 2 1 2 3 1 | 12 | 1000
3: 5 999 5 3 9 2 | 24 | 1023
4: 4 2 931 1 3 | 10 | 941
5: 1 2 11 1 874 5 1 2 3 | 26 | 900
6: 8 2 3 940 2 | 15 | 955
7: 1 5 4 2 1 1070 5 | 18 | 1088
8: 2 4 1 3 2 4 1002 2 | 18 | 1020
9: 2 2 4 4 1 6 2 975 | 21 | 996
--------------------------------------------------
14 18 16 24 7 12 10 19 18 17 | 155 wrong predictions
most confusable, true est: count -- 5 3: 11 3 8: 9 6 0: 8 9 7: 6 5 6: 5 poly2-SVM
saving to 60k/gamma3-C1-50000-10000-poly2-SVM-split1.npz
poly2-SVM split 2 --
ytrain class sizes: [4956 5545 5003 5130 4851 4505 4955 5151 4913 4991]
ytest class sizes: [ 979 1184 958 1001 990 918 976 1063 952 979]
276 sec train poly2-SVM
86 sec test poly2-SVM: ypred == ytest 98.4 %
Confusion matrix: 98.4 % correct = 9841 / 10000 poly2-SVM
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 974 1 1 1 1 1 | 5 | 979
1: 1179 1 1 2 1 | 5 | 1184
2: 1 3 947 1 2 4 | 11 | 958
3: 7 978 9 2 5 | 23 | 1001
4: 1 3 973 1 12 | 17 | 990
5: 3 4 6 896 6 3 | 22 | 918
6: 1 1 3 968 3 | 8 | 976
7: 3 5 4 1 6 1041 3 | 22 | 1063
8: 1 3 1 5 1 3 3 932 3 | 20 | 952
9: 3 1 1 2 8 1 7 3 953 | 26 | 979
--------------------------------------------------
13 15 19 17 19 16 10 15 16 19 | 159 wrong predictions
most confusable, true est: count -- 4 9: 12 3 5: 9 9 4: 8 3 2: 7 9 7: 7 poly2-SVM
saving to 60k/gamma3-C1-50000-10000-poly2-SVM-split2.npz
scores: poly2-SVM av 98.4 [98.4 98.5 98.4] train, test: [276 275 276] [86 85 86] sec
dual_coef_ : (9, 11092) float64 min av max -1 -2.56e-18 1
support_vectors_ : (11092, 784) float64 min av max -0.0615 2.82e-05 0.234
}
{ classify with Random-Forest --
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=1,
oob_score=False, random_state=0, verbose=0, warm_start=False)
Random-Forest split 0 --
ytrain class sizes: [4897 5622 4985 5118 4849 4534 4957 5143 4895 5000]
ytest class sizes: [1038 1107 976 1013 992 889 974 1071 970 970]
283 sec train Random-Forest
2 sec test Random-Forest: ypred == ytest 96.8 %
Confusion matrix: 96.8 % correct = 9682 / 10000 Random-Forest
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 1020 2 1 1 4 1 8 1 | 18 | 1038
1: 1082 12 6 3 1 1 2 | 25 | 1107
2: 7 2 948 2 6 2 5 3 1 | 28 | 976
3: 2 10 969 1 9 6 10 6 | 44 | 1013
4: 2 2 961 6 1 4 16 | 31 | 992
5: 3 1 13 1 854 8 1 6 2 | 35 | 889
6: 3 4 3 961 3 | 13 | 974
7: 4 12 1 7 1025 2 20 | 46 | 1071
8: 2 2 1 2 3 4 5 3 937 11 | 33 | 970
9: 4 3 19 7 2 5 5 925 | 45 | 970
--------------------------------------------------
21 13 44 43 29 20 25 23 43 57 | 318 wrong predictions
most confusable, true est: count -- 7 9: 20 9 3: 19 4 9: 16 5 3: 13 1 2: 12 Random-Forest
saving to 60k/gamma3-C1-50000-10000-Random-Forest-split0.npz
Random-Forest split 1 --
ytrain class sizes: [4960 5627 4961 5108 4900 4523 4976 5126 4845 4974]
ytest class sizes: [ 975 1102 1000 1023 941 900 955 1088 1020 996]
285 sec train Random-Forest
2 sec test Random-Forest: ypred == ytest 97.0 %
Confusion matrix: 97.0 % correct = 9701 / 10000 Random-Forest
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 968 1 4 2 | 7 | 975
1: 1085 6 5 2 1 1 1 1 | 17 | 1102
2: 3 2 978 2 3 3 4 3 2 | 22 | 1000
3: 1 2 9 976 1 11 1 9 10 3 | 47 | 1023
4: 2 2 1 917 1 2 2 14 | 24 | 941
5: 3 4 12 1 862 8 6 4 | 38 | 900
6: 6 3 1 9 933 3 | 22 | 955
7: 8 8 7 1052 1 12 | 36 | 1088
8: 1 5 3 7 6 6 3 978 11 | 42 | 1020
9: 4 2 12 13 1 2 6 4 952 | 44 | 996
--------------------------------------------------
20 25 30 39 34 28 20 25 32 46 | 299 wrong predictions
most confusable, true est: count -- 4 9: 14 9 4: 13 7 9: 12 9 3: 12 5 3: 12 Random-Forest
saving to 60k/gamma3-C1-50000-10000-Random-Forest-split1.npz
Random-Forest split 2 --
ytrain class sizes: [4956 5545 5003 5130 4851 4505 4955 5151 4913 4991]
ytest class sizes: [ 979 1184 958 1001 990 918 976 1063 952 979]
283 sec train Random-Forest
2 sec test Random-Forest: ypred == ytest 97.1 %
Confusion matrix: 97.1 % correct = 9711 / 10000 Random-Forest
True classes down, estimated across | wrong | total class sizes
--------------------------------------------------
0: 969 1 1 1 1 4 2 | 10 | 979
1: 1172 3 2 1 1 1 3 1 | 12 | 1184
2: 2 1 934 5 3 1 8 4 | 24 | 958
3: 2 14 957 1 8 2 8 5 4 | 44 | 1001
4: 1 1 1 957 2 1 1 3 23 | 33 | 990
5: 5 2 1 8 2 891 5 2 2 | 27 | 918
6: 2 1 1 1 6 964 1 | 12 | 976
7: 1 6 6 1 8 1022 4 15 | 41 | 1063
8: 4 8 3 10 4 3 7 909 4 | 43 | 952
9: 3 2 4 8 7 2 1 9 7 936 | 43 | 979
--------------------------------------------------
18 23 34 34 28 23 19 29 31 50 | 289 wrong predictions
most confusable, true est: count -- 4 9: 23 7 9: 15 3 2: 14 8 3: 10 9 7: 9 Random-Forest
saving to 60k/gamma3-C1-50000-10000-Random-Forest-split2.npz
scores: Random-Forest av 97.0 [96.8 97 97.1] train, test: [283 285 283] [2 2 2] sec
feature_importances_ : (784,) float64 min av max 0 0.00128 0.0123
}
3839.10 real 3875.89 user 12.26 sys