sarguido/gist:8969894

## gistfile1.txt
{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Testing and Validation in Scikit-Learn\n",
      "\n",
      "Once you have your estimator and you've made your predictions, it's useful to know how accurate your model is and what you can expect from it. Scikit-learn has a few different methods for doing this.\n",
      "\n",
      "The score() function comes with each estimator, and it calculates how many labels the model got right, or how accurate the prediction is. Going back to our Naive Bayes example from the supervised learning notebook:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn import datasets\n",
      "from sklearn.cross_validation import train_test_split\n",
      "from sklearn.naive_bayes import GaussianNB\n",
      "\n",
      "digits = datasets.load_digits()\n",
      "\n",
      "train_X, test_X, train_y, test_y = train_test_split(digits.data, digits.target, test_size=0.33)\n",
      "\n",
      "nb_estimator = GaussianNB()\n",
      "nb_estimator.fit(train_X, train_y)\n",
      "nb_estimator.score(test_X, test_y)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 2,
       "text": [
        "0.81818181818181823"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This means that our model accurately predicted labels around 80% of the time. It's a decent result, but not great.\n",
      "\n",
      "Another method for testing the accuracy of the estimator is cross-validation, which splits up the dataset randomly and computes the score of the model several times, by comparing predicted labels to the actual labels. You can choose how many times to split up and test your model; I've chosen 6 below. As you can see, the accuracy score works out to be around 80% again."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn import cross_validation\n",
      "\n",
      "cross_validation.cross_val_score(nb_estimator, test_X, test_y, cv=6)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "array([ 0.86868687,  0.80808081,  0.78787879,  0.82828283,  0.78787879,\n",
        "        0.84848485])"
       ]
      }
     ],
     "prompt_number": 4
    }
   ],
   "metadata": {}
  }
 ]
}
	{
	"metadata": {
	"name": ""
	},
	"nbformat": 3,
	"nbformat_minor": 0,
	"worksheets": [
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Testing and Validation in Scikit-Learn\n",
	"\n",
	"Once you have your estimator and you've made your predictions, it's useful to know how accurate your model is and what you can expect from it. Scikit-learn has a few different methods for doing this.\n",
	"\n",
	"The score() function comes with each estimator, and it calculates how many labels the model got right, or how accurate the prediction is. Going back to our Naive Bayes example from the supervised learning notebook:"
	]
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"from sklearn import datasets\n",
	"from sklearn.cross_validation import train_test_split\n",
	"from sklearn.naive_bayes import GaussianNB\n",
	"\n",
	"digits = datasets.load_digits()\n",
	"\n",
	"train_X, test_X, train_y, test_y = train_test_split(digits.data, digits.target, test_size=0.33)\n",
	"\n",
	"nb_estimator = GaussianNB()\n",
	"nb_estimator.fit(train_X, train_y)\n",
	"nb_estimator.score(test_X, test_y)"
	],
	"language": "python",
	"metadata": {},
	"outputs": [
	{
	"metadata": {},
	"output_type": "pyout",
	"prompt_number": 2,
	"text": [
	"0.81818181818181823"
	]
	}
	],
	"prompt_number": 2
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"This means that our model accurately predicted labels around 80% of the time. It's a decent result, but not great.\n",
	"\n",
	"Another method for testing the accuracy of the estimator is cross-validation, which splits up the dataset randomly and computes the score of the model several times, by comparing predicted labels to the actual labels. You can choose how many times to split up and test your model; I've chosen 6 below. As you can see, the accuracy score works out to be around 80% again."
	]
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"from sklearn import cross_validation\n",
	"\n",
	"cross_validation.cross_val_score(nb_estimator, test_X, test_y, cv=6)"
	],
	"language": "python",
	"metadata": {},
	"outputs": [
	{
	"metadata": {},
	"output_type": "pyout",
	"prompt_number": 4,
	"text": [
	"array([ 0.86868687, 0.80808081, 0.78787879, 0.82828283, 0.78787879,\n",
	" 0.84848485])"
	]
	}
	],
	"prompt_number": 4
	}
	],
	"metadata": {}
	}
	]
	}