{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Multiclass Reductions"
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"by Chiyuan Zhang and Sören Sonnenburg"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since binary classification problems are one of the most thoroughly studied\n",
"problems in machine learning, it is very appealing to consider reducing\n",
"multiclass problems to binary ones. Then many advanced learning and\n",
"optimization techniques as well as generalization bound analysis for binary\n",
"classification can be utilized."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"In `SHOGUN`, the strategies of reducing a multiclass problem to binary\n",
"classification problems are described by an instance of\n",
"`CMulticlassStrategy`. A multiclass strategy describes\n",
"\n",
"* How to train the multiclass machine as a number of binary machines?\n",
" * How many binary machines are needed?\n",
" * For each binary machine, what subset of the training samples are used, and how are they colored? In multiclass problems, we use *coloring* to refer partitioning the classes into two groups: $+1$ and $-1$, or black and white, or any other meaningful names.\n",
"* How to combine the prediction results of binary machines into the final multiclass prediction?\n"
]
},
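{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of *coloring* (a sketch in plain `numpy`, independent of `SHOGUN`): under the One-vs-Rest strategy described below, the binary machine for class $k$ sees the samples of class $k$ colored as $+1$ and all remaining samples colored as $-1$."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"y = np.array([0, 2, 1, 0, 2, 1])  # multiclass labels, K = 3\n",
"k = 2                             # coloring for the binary machine of class 2\n",
"binary_y = np.where(y == k, +1, -1)\n",
"print binary_y                    # [-1  1 -1 -1  1 -1]"
],
"language": "python",
"metadata": {},
"outputs": []
},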
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The user can derive from the virtual class [CMulticlassStrategy](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CMulticlassStrategy.html) to\n",
"implement a customized multiclass strategy. But usually the built-in strategies\n",
"are enough for general problems. We will describe the built-in *One-vs-Rest*,\n",
"*One-vs-One* and *Error-Correcting Output Codes* strategies in this tutorial.\n",
"\n",
"The basic routine to use a multiclass machine with reduction to binary problems\n",
"in shogun is to create a generic multiclass machine and then assign a particular\n",
"multiclass strategy and a base binary machine."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## One-vs-Rest and One-vs-One\n",
"\n",
"The *One-vs-Rest* strategy is implemented in\n",
"`CMulticlassOneVsRestStrategy`. As indicated by the name, this\n",
"strategy reduce a $K$-class problem to $K$ binary sub-problems. For the $k$-th\n",
"problem, where $k\\in\\{1,\\ldots,K\\}$, the samples from class $k$ are colored as\n",
"$+1$, and the samples from other classes are colored as $-1$. The multiclass\n",
"prediction is given as\n",
"\n",
"$$\n",
"f(x) = \\operatorname*{argmax}_{k\\in\\{1,\\ldots,K\\}}\\; f_k(x)\n",
"$$\n",
"\n",
"where $f_k(x)$ is the prediction of the $k$-th binary machines.\n",
"\n",
"The One-vs-Rest strategy is easy to implement yet produces excellent performance\n",
"in many cases. One interesting paper, [Rifkin, R. M. and Klautau, A. (2004). *In defense of one-vs-all classification*. Journal of Machine\n",
"Learning Research, 5:101\u2013141](http://jmlr.org/papers/v5/rifkin04a.html), it was shown that the\n",
"One-vs-Rest strategy can be\n",
"\n",
"> as accurate as any other approach, assuming that the underlying binary\n",
"classifiers are well-tuned regularized classifiers such as support vector\n",
"machines.\n",
"\n",
"Implemented in [CMulticlassOneVsOneStrategy](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CMulticlassOneVsOneStrategy.html), the \n",
"*One-vs-One* strategy is another simple and intuitive \n",
"strategy: it basically produces one binary problem for each pair of classes. So there will be $\\binom{K}{2}$ binary problems. At prediction time, the \n",
"output of every binary classifiers are collected to do voting for the $K$ \n",
"classes. The class with the highest vote becomes the final prediction.\n",
"\n",
"Compared with the One-vs-Rest strategy, the One-vs-One strategy is usually more\n",
"costly to train and evaluate because more binary machines are used.\n",
"\n",
"In the following, we demonstrate how to use `SHOGUN`'s One-vs-Rest and \n",
"One-vs-One multiclass learning strategy on the USPS dataset. For \n",
"demonstration, we randomly 200 samples from each class for training and 200 \n",
"samples from each class for testing.\n",
"\n",
"The [CLibLinear](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLibLinear.html) is used as the base binary classifier in a [CLinearMulticlassMachine](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLinearMulticlassMachine.html), with One-vs-Rest and One-vs-One strategies. The running time and performance (on my machine) is reported below:"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"-------------------------------------------------\n",
"Strategy Training Time Test Time Accuracy\n",
"------------- ------------- --------- --------\n",
"One-vs-Rest 12.68 0.27 92.00%\n",
"One-vs-One 11.54 1.50 93.90%\n",
"-------------------------------------------------\n",
"Table: Comparison of One-vs-Rest and One-vs-One multiclass reduction strategy on the USPS dataset."
]
},
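{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the actual experiment, here is a minimal `numpy` sketch of the two decision rules just described (the prediction values are made up for illustration, not `SHOGUN` output): One-vs-Rest takes the argmax of the $K$ real-valued outputs, while One-vs-One lets each pairwise classifier vote for one of its two classes."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from itertools import combinations\n",
"\n",
"K = 4\n",
"\n",
"# One-vs-Rest: pick the class whose binary machine scores highest\n",
"f = np.array([-0.3, 1.2, 0.4, -1.0])       # hypothetical f_k(x) values\n",
"print \"OvR prediction:\", np.argmax(f)      # -> 1\n",
"\n",
"# One-vs-One: each pairwise machine votes for one class of its pair\n",
"pairs = list(combinations(range(K), 2))    # K*(K-1)/2 = 6 pairs\n",
"signs = np.array([+1, -1, -1, +1, -1, +1]) # hypothetical pairwise outputs\n",
"votes = np.zeros(K)\n",
"for (i, j), s in zip(pairs, signs):\n",
"    votes[i if s > 0 else j] += 1\n",
"print \"OvO votes:\", votes, \"prediction:\", np.argmax(votes)"
],
"language": "python",
"metadata": {},
"outputs": []
},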
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we load the data and initialize random splitting:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"from numpy import random\n",
"from scipy.io import loadmat\n",
"\n",
"mat = loadmat('../../../data/multiclass/usps.mat')\n",
"Xall = mat['data']\n",
"\n",
"#normalize examples to have norm one\n",
"Xall = Xall / sqrt(sum(Xall**2,0))\n",
"Yall = mat['label'].squeeze()\n",
"\n",
"# map from 1..10 to 0..9, since shogun\n",
"# requires multiclass labels to be\n",
"# 0, 1, ..., K-1\n",
"Yall = Yall - 1\n",
"\n",
"N_train_per_class = 200\n",
"N_test_per_class = 200\n",
"N_class = 10\n",
"\n",
"# to make the results reproducable\n",
"random.seed(0)\n",
"\n",
"# index for subsampling\n",
"index = np.zeros((N_train_per_class+N_test_per_class, N_class), 'i')\n",
"for k in range(N_class):\n",
" Ik = (Yall == k).nonzero()[0] # index for samples of class k\n",
" I_subsample = random.permutation(len(Ik))[:N_train_per_class+N_test_per_class]\n",
" index[:, k] = Ik[I_subsample]\n",
"\n",
"idx_train = index[:N_train_per_class, :].reshape(N_train_per_class*N_class)\n",
"idx_test = index[N_train_per_class:, :].reshape(N_test_per_class*N_class)\n",
"\n",
"random.shuffle(idx_train)\n",
"random.shuffle(idx_test)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"import SHOGUN components and convert features into SHOGUN format:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from modshogun import RealFeatures, MulticlassLabels\n",
"from modshogun import LibLinear, L2R_L2LOSS_SVC, LinearMulticlassMachine\n",
"from modshogun import MulticlassOneVsOneStrategy, MulticlassOneVsRestStrategy\n",
"from modshogun import MulticlassAccuracy\n",
"\n",
"import time\n",
"\n",
"feats_train = RealFeatures(Xall[:, idx_train])\n",
"feats_test = RealFeatures(Xall[:, idx_test])\n",
"\n",
"lab_train = MulticlassLabels(Yall[idx_train].astype('d'))\n",
"lab_test = MulticlassLabels(Yall[idx_test].astype('d'))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"define a helper function to train and evaluate multiclass machine given a strategy:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def evaluate(strategy, C):\n",
" bin_machine = LibLinear()\n",
" bin_machine.set_bias_enabled(True)\n",
" bin_machine.set_C(C, C)\n",
"\n",
" mc_machine = LinearMulticlassMachine(strategy, feats_train, bin_machine, lab_train)\n",
"\n",
" t_begin = time.clock()\n",
" mc_machine.train()\n",
" t_train = time.clock() - t_begin\n",
"\n",
" t_begin = time.clock()\n",
" pred_test = mc_machine.apply_multiclass(feats_test)\n",
" t_test = time.clock() - t_begin\n",
"\n",
" evaluator = MulticlassAccuracy()\n",
" acc = evaluator.evaluate(pred_test, lab_test)\n",
"\n",
" print \"training time: %.4f\" % t_train\n",
" print \"testing time: %.4f\" % t_test\n",
" print \"accuracy: %.4f\" % acc"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test on One-vs-Rest and One-vs-One strategies."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print \"\\nOne-vs-Rest\"\n",
"print \"=\"*60\n",
"evaluate(MulticlassOneVsRestStrategy(), 5.0)\n",
"\n",
"print \"\\nOne-vs-One\"\n",
"print \"=\"*60\n",
"evaluate(MulticlassOneVsOneStrategy(), 2.0)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"LibLinear also has a true multiclass SVM implemenentation - so it is worthwhile to compare training time and accuracy with the above reduction schemes:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from modshogun import MulticlassLibLinear\n",
"mcsvm = MulticlassLibLinear(5.0, feats_train, lab_train)\n",
"mcsvm.set_use_bias(True)\n",
"\n",
"t_begin = time.clock()\n",
"mcsvm.train(feats_train)\n",
"t_train = time.clock() - t_begin\n",
"\n",
"t_begin = time.clock()\n",
"pred_test = mcsvm.apply_multiclass(feats_test)\n",
"t_test = time.clock() - t_begin\n",
"\n",
"evaluator = MulticlassAccuracy()\n",
"acc = evaluator.evaluate(pred_test, lab_test)\n",
"\n",
"print \"training time: %.4f\" % t_train\n",
"print \"testing time: %.4f\" % t_test\n",
"print \"accuracy: %.4f\" % acc"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see performance of all the three is very much the same though the multiclass svm is a bit faster in training. Usually training time of the true multiclass SVM is much slower than one-vs-rest approach. It should be noted that classification performance of one-vs-one is known to be slightly superior to one-vs-rest since the machines do not have to be properly scaled like in the one-vs-rest approach. However, with larger number of classes one-vs-one quickly becomes prohibitive and so one-vs-rest is the only suitable approach - or other schemes presented below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Error-Correcting Output Codes\n",
"\n",
"*Error-Correcting Output Codes* (ECOC) is a\n",
"generalization of the One-vs-Rest and One-vs-One strategies. For example, we\n",
"can represent the One-vs-Rest strategy with the following $K\\times K$ coding \n",
"matrix, or a codebook:\n",
"\n",
"$$\n",
" \\begin{bmatrix}\n",
" +1 & -1 & -1 & \\ldots & -1 & -1 \\\\\\\\\n",
" -1 & +1 & -1 & \\ldots & -1 & -1\\\\\\\\\n",
" -1 & -1 & +1 & \\ldots & -1 & -1\\\\\\\\\n",
" \\vdots & \\vdots & \\vdots & \\ddots & \\vdots & \\vdots \\\\\\\\\n",
" -1 & -1 & -1 & \\ldots & +1 & -1 \\\\\\\\\n",
" -1 & -1 & -1 & \\ldots & -1 & +1\n",
" \\end{bmatrix}\n",
"$$\n",
"\n",
"Denote the codebook by $B$, there is one column of the codebook associated with\n",
"each of the $K$ classes. For example, the code for class $1$ is\n",
"$[+1,-1,-1,\\ldots,-1]$. Each row of the codebook corresponds to a binary\n",
"coloring of all the $K$ classes. For example, in the first row, the class $1$\n",
"is colored as $+1$, while the rest of the classes are all colored as $-1$.\n",
"Associated with each row, there is a binary classifier trained according to the\n",
"coloring. For example, the binary classifier associated with the first row is\n",
"trained by treating all the examples of class $1$ as positive examples, and all\n",
"the examples of the rest of the classes as negative examples.\n",
"\n",
"In this special case, there are $K$ rows in the codebook. The number of rows in\n",
"the codebook is usually called the *code length*. As we can see, this\n",
"codebook exactly describes how the One-vs-Rest strategy trains the binary\n",
"sub-machines."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"OvR=-np.ones((10,10))\n",
"fill_diagonal(OvR, +1)\n",
"\n",
"_=gray()\n",
"_=imshow(OvR, interpolation='nearest')\n",
"_=gca().set_xticks([])\n",
"_=gca().set_yticks([])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A further generalization is to allow $0$-values in the codebook. A $0$ for a\n",
"class $k$ in a row means we ignore (the examples of) class $k$ when training\n",
"the binary classifiers associated with this row. With this generalization, we\n",
"can also easily describes the One-vs-One strategy with a $\\binom{K}{2}\\times K$\n",
"codebook:\n",
"\n",
"$$\n",
"\\begin{bmatrix}\n",
"+1 & -1 & 0 & \\ldots & 0 & 0 \\\\\\\\\n",
"+1 & 0 & -1 & \\ldots & 0 & 0 \\\\\\\\\n",
"\\vdots & \\vdots & \\vdots & \\ddots & \\vdots & 0 \\\\\\\\\n",
"+1 & 0 & 0 & \\ldots & -1 & 0 \\\\\\\\\n",
"0 & +1 & -1 & \\ldots & 0 & 0 \\\\\\\\\n",
"\\vdots & \\vdots & \\vdots & & \\vdots & \\vdots \\\\\\\\\n",
"0 & 0 & 0 & \\ldots & +1 & -1\n",
"\\end{bmatrix}\n",
"$$\n",
"\n",
"Here each of the $\\binom{K}{2}$ rows describes a binary classifier trained with\n",
"a pair of classes. The resultant binary classifiers will be identical as those\n",
"described by a One-vs-One strategy.\n",
"\n",
"Since $0$ is allowed in the codebook to ignore some classes, this kind of\n",
"codebooks are usually called *sparse codebook*, while the codebooks with\n",
"only $+1$ and $-1$ are usually called *dense codebook*.\n",
"\n",
"In general case, we can specify any code length and fill the codebook\n",
"arbitrarily. However, some rules should be followed:\n",
"\n",
"* Each row must describe a *valid* binary coloring. In other words, both $+1$ and $-1$ should appear at least once in each row. Or else a binary classifier cannot be obtained for this row.\n",
"* It is good to avoid duplicated rows. There is generally no harm to have duplicated rows, but the resultant binary classifiers are completely identical provided the training algorithm for the binary classifiers are deterministic. So this can be a waste of computational resource.\n",
"* Negative rows are also duplicated. Simply inversing the sign of a code row does not produce a \"new\" code row. Because the resultant binary classifier will simply be the negative classifier associated with the original row."
]
},
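{
"cell_type": "markdown",
"metadata": {},
"source": [
"As promised above, here is a small `numpy` sketch (not part of `SHOGUN`'s API) that builds the $\\binom{K}{2}\\times K$ One-vs-One codebook and mechanically checks the three rules: every row is a valid coloring, and no row duplicates another, even up to sign."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from itertools import combinations\n",
"\n",
"K = 5\n",
"pairs = list(combinations(range(K), 2))\n",
"B = np.zeros((len(pairs), K), dtype=int)   # binom(K,2) x K sparse codebook\n",
"for row, (i, j) in enumerate(pairs):\n",
"    B[row, i] = +1\n",
"    B[row, j] = -1\n",
"\n",
"# rule 1: both +1 and -1 appear in every row (valid coloring)\n",
"assert all((r == +1).any() and (r == -1).any() for r in B)\n",
"\n",
"# rules 2 and 3: no duplicated rows, not even after flipping signs\n",
"rows = set(tuple(r) for r in B) | set(tuple(-r) for r in B)\n",
"assert len(rows) == 2 * len(B)\n",
"\n",
"print B"
],
"language": "python",
"metadata": {},
"outputs": []
},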
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Though you can certainly generate your own codebook, it is usually easier to\n",
"use the `SHOGUN` built-in procedures to generate codebook automatically. There\n",
"are various codebook generators (called *encoders*) in `SHOGUN`. However,\n",
"before describing those encoders in details, let us notice that a codebook \n",
"only describes how the sub-machines are trained. But we still need a\n",
"way to specify how the binary classification results of the sub-machines can be\n",
"combined to get a multiclass classification result.\n",
"\n",
"Review the codebook again: corresponding to each class, there is a column. We\n",
"call the codebook column the (binary) *code* for that class. For a new\n",
"sample $x$, by applying the binary classifiers associated with each row\n",
"successively, we get a prediction vector of the same length as the\n",
"*code*. Deciding the multiclass label from the prediction vector (called\n",
"*decoding*) can be done by minimizing the *distance* between the\n",
"codes and the prediction vector. Different *decoders* define different \n",
"choices of distance functions. For this reason, it is usually good to make the\n",
"mutual distance between codes of different classes large. In this way, even\n",
"though several binary classifiers make wrong predictions, the distance of\n",
"the resultant prediction vector to the code of the *true* class is likely\n",
"to be still smaller than the distance to other classes. So correct results can\n",
"still be obtained even when some of the binary classifiers make mistakes. This\n",
"is the reason for the name *Error-Correcting Output Codes*.\n",
"\n",
"In `SHOGUN`, encoding schemes are described by subclasses of\n",
"[CECOCEncoder](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CECOCEncoder.html), while decoding schemes are described by subclasses\n",
"of [CECOCDecoder](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CECOCDecoder.html). Theoretically, any combinations of\n",
"encoder-decoder pairs can be used. Here we will introduce several common\n",
"encoder/decoders in shogun."
]
},
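{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make decoding concrete, here is a minimal sketch using the Hamming distance as the distance function (one possible choice; `SHOGUN`'s decoders implement this and several others). With a codebook whose codes are far apart, a single wrong binary prediction is corrected: the true class still has the smallest distance."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"# a dense codebook for K = 4 with code length 6; columns are class codes\n",
"# and every pair of codes differs in 4 positions\n",
"B = np.array([[+1, +1, -1, -1],\n",
"              [+1, -1, +1, -1],\n",
"              [+1, -1, -1, +1],\n",
"              [-1, +1, +1, -1],\n",
"              [-1, +1, -1, +1],\n",
"              [-1, -1, +1, +1]])\n",
"\n",
"# hypothetical binary predictions for a sample of class 2:\n",
"# the first binary machine makes a mistake (+1 instead of -1)\n",
"pred = np.array([+1, +1, -1, +1, -1, +1])\n",
"\n",
"# Hamming decoding: pick the class whose code disagrees least with pred\n",
"dist = (B != pred[:, np.newaxis]).sum(axis=0)\n",
"print \"distances:\", dist                  # [3 3 1 5]\n",
"print \"decoded class:\", np.argmin(dist)   # -> 2, the mistake is corrected"
],
"language": "python",
"metadata": {},
"outputs": []
},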
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [CECOCRandomDenseEncoder](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CECOCRandomDenseEncoder.html): This encoder generate random dense ($+1$/$-1$) codebooks and choose the one with the largest *minimum mutual distance* among the classes. The recommended code length for this encoder is $10\\log K$. \n",
"* [CECOCRandomSparseEncoder](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CECOCRandomSparseEncoder.html): This is similar to the random dense encoder, except that sparse ($+1$/$-1$/$0$) codebooks are generated. The recommended code length for this encoder is $15\\log K$.\n",
"* [CECOCOVREncoder](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CECOCOVREncoder.html), [CECOCOVOEncoder](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CECOCOVOEncoder.html): These two encoders mimic the One-vs-Rest and One-vs-One strategies respectively. They are implemented mainly for demonstrative purpose. When suitable decoders are used, the results will be equivalent to the corresponding strategies, respectively."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using ECOC Strategy in `SHOGUN` is similar to ordinary one-vs-rest or one-vs-one. You need to choose an encoder and a decoder, and then construct a `ECOCStrategy`, as demonstrated below:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from modshogun import ECOCStrategy, ECOCRandomDenseEncoder, ECOCLLBDecoder\n",
"\n",
"print \"\\nRandom Dense Encoder + Margin Loss based Decoder\"\n",
"print \"=\"*60\n",
"evaluate(ECOCStrategy(ECOCRandomDenseEncoder(), ECOCLLBDecoder()))"
],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}