@chuttenh
Last active March 20, 2017 17:04
Genomic Data Manipulation
.ipynb_checkpoints
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Performance evaluation\n",
"\n",
"---\n",
"\n",
"## Pairwise similarity measures\n",
"\n",
"Now that we know we're looking at Python in the IPython Notebook (although I suppose the name precludes much mystique), it's useful to explore what some of the quantitative concepts we discussed last week \"look like\" in Python. We'll build up to performing a full performance evaluation of some predictions against a gold standard, beginning by calculating simpler pairwise summary statistics for ordered data. These are in contrast to the unordered, non-paired test statistics we looked at last week (e.g. t-tests). But fortunately they should be familiar anyhow, since any similarity measure that compares two vectors, such as our familiar Euclidean distance or Pearson correlation, is really a test statistic in disguise.\n",
"\n",
"As a preamble, recall that pairwise similarity measures are test statistics that compare two vectors of equal length, that is, two Python **lists** in which we expect there to be a relationship between each pair of corresponding elements. Variables that contain lists are prefixed by a lower-case \"a\"; a list contains zero or more elements, separated by commas and surrounded by square brackets, and the order of its elements matters:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aListOne = [1, 2, 2, 3, 4]\n",
"aListTwo = [2, 1, 3, 4, 2]\n",
"aListOne == aListTwo"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"text": [
"False"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These types of vectors are what we commonly use to represent sets of points in two-dimensional space, for example:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# This ensures that any graphics appear in the document and not in a separate window.\n",
"%pylab inline\n",
"\n",
"scatter( aListOne, aListTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"<matplotlib.collections.PathCollection at 0x1105f2110>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD9CAYAAABHnDf0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF6FJREFUeJzt3X9slPeB5/HPYDvJjqEGn4qpxm4gMW08gMfjNJ1uBfUQ\nkiOG2PJuWDWOAl4CV59VfvZOuuY22tpXjtDFvSwRks9p9+jSaNe5ctV2SsZIROEhJY6xaBwuaioB\nWSx7DLUKxAHzIx48z/1xjZfB9njsGTPjr98vaSQ/83x5ng9fwsfffD3DOGzbtgUAMMqsVAcAACQf\n5Q4ABqLcAcBAlDsAGIhyBwADUe4AYKC4yn1oaEher1cVFRUjzlmWpZycHHm9Xnm9Xu3atSvpIQEA\nE5MZz6B9+/bJ7Xbr2rVro54vKytTIBBIajAAwOSNu3IPhUIKBoPavHmzxnq/E++DAoD0Mu7KfefO\nndq7d6+uXr066nmHw6G2tjZ5PB65XC41NjbK7XaPGAMAmLjJLp5jrtwPHz6s+fPny+v1jnmD0tJS\n9fT06PTp09q6dauqqqrGDJjujx/84Acpz0BOMpKTnJ8/EhGz3Nva2hQIBLRo0SJVV1fr7bff1oYN\nG6LGzJkzR06nU5JUXl6ucDisK1euJBQKAJCYmOW+e/du9fT06Pz582ppadHjjz+ugwcPRo3p6+sb\n/g7T0dEh27aVm5s7dYkBAOOK69Uyn/t877y5uVmSVFtbq0OHDqmpqUmZmZlyOp1qaWlJfsp7xO/3\npzpCXMiZPNMho0TOZJsuORPhsBPd2InnJg5HwvtHADDTJNKdvEMVAAxEuQOAgSh3ADAQ5Q4ABqLc\nAcBAlDsAGIhyBwADUe4AYCDKHQAMRLkDgIEodwAwEOUOAAai3AHAQJQ7ABiIcgcAA1HuAGAgyh0A\nDES5A4CBKHcAMFBc5T40NCSv16uKiopRz2/btk2LFy+Wx+NRZ2dnUgMCkLq6unTq1Cldv3491VEw\nTcRV7vv27ZPb7ZbD4RhxLhgM6ty5czp79qxee+011dXVJT0kMFPZtq0tW/6ziooe06pVm/Xgg4/o\nww8/THUsTAPjlnsoFFIwGNTmzZtH/RTuQCCgmpoaSZLP51N/f7/6+vqSnxSYgY4cOaKf/SyoW7fO\n6urVD3T58g/1l3+5IdWxMA1kjjdg586d2rt3r65evTrq+d7eXhUUFAwf5+fnKxQKKS8vL2pcfX39\n8Nd+v19+v39yiYEZ5Pe//73C4X8vae6fnvkrdXX9x1RGwhSyLEuWZSXlWjHL/fDhw5o/f768Xm/M\nG969oh9t++bOcgcQn6KiImVl/USDg/36/wX/Cy1cWJTqWJgidy98GxoaJn2tmOXe1tamQCCgYDCo\nW7du6erVq9qwYYMOHjw4PMblcqmnp2f4OBQKyeVyTToQgH/z1FNPaePGt/XTny7Wffe5lJV1Wb/8\nZTDVsTANOOzRNtJHcfz4cTU2NurXv/511PPBYFD79+9XMBhUe3u7duzYofb29uibOByj7tcDiE9X\nV5cuX76sRx55RNnZ2amOg3skke4cd8/97htJUnNzsySptrZWa9asUTAYVGFhobKzs3XgwIFJBQEw\ntoULF2rhwoWpjoFpJO6Ve0I3YeUOABOWSHfyDlUAMBDlDgAGotwBwECUOwAYiHIHAANR7gBgIMod\nAAxEuQOAgSh3ADAQ5Q4ABqLcAcBAlDsAGIhyBwADUe4AYCDKHQAMRLkDgIEodwAwEOUOAAaKWe63\nbt2Sz+dTSUmJ3G63XnzxxRFjLMtSTk6OvF6vvF6vdu3aNWVhAQDxifkB2Q888ICOHTsmp9Op27dv\na/ny5Tpx4oSWL18eNa6srEyBQGBKgwIA4jfutozT6ZQkDQ4OamhoSLm5uSPG8OHXAJBeYq7cJSkS\niai0tFQff/yx6urq5Ha7o847HA61tbXJ4/HI5XKpsb
FxxBhJqq+vH/7a7/fL7/cnHB4ATGJZlizL\nSsq1HHacy+5PP/1Uq1ev1p49e6KK+dq1a8rIyJDT6VRra6u2b9+uM2fORN/E4WB1DwATlEh3xv1q\nmZycHK1du1anTp2Ken7OnDnDWzfl5eUKh8O6cuXKpMIAAJIjZrlfunRJ/f39kqSbN2/q6NGj8nq9\nUWP6+vqGv7N0dHTItu1R9+UBAPdOzD33ixcvqqamRpFIRJFIROvXr9eqVavU3NwsSaqtrdWhQ4fU\n1NSkzMxMOZ1OtbS03JPgAICxxb3nntBN2HMHgAm7J3vuAIDpg3IHAANR7gBgIModAAxEuQOAgSh3\nADAQ5Q4ABqLcAcBAlDsAGIhyBwADUe4AYCDKHQAMRLkDgIEodwAwEOUOAAai3AHAQJQ7ABiIcgcA\nA1HuAGCgmOV+69Yt+Xw+lZSUyO1268UXXxx13LZt27R48WJ5PB51dnZOSVAAQPwyY5184IEHdOzY\nMTmdTt2+fVvLly/XiRMntHz58uExwWBQ586d09mzZ3Xy5EnV1dWpvb19yoMjvb333ns6ffq0Hn74\nYT3xxBNyOBypjgTMKONuyzidTknS4OCghoaGlJubG3U+EAiopqZGkuTz+dTf36++vr4piIrp4uWX\nG/XEE9/W9773W/3FX2zTd76zPdWRgBkn5spdkiKRiEpLS/Xxxx+rrq5Obrc76nxvb68KCgqGj/Pz\n8xUKhZSXlxc1rr6+fvhrv98vv9+fWHKkpU8++UT19f9Ng4O/l+SSdE3/9E9ubdv2H7Rs2bJUxwPS\nmmVZsiwrKdcat9xnzZqlDz74QJ9++qlWr14ty7JGFLNt21HHo/0v+J3lDnNdvnxZWVn/ToODrj89\nM0dZWYXq6+uj3IFx3L3wbWhomPS14n61TE5OjtauXatTp05FPe9yudTT0zN8HAqF5HK57v7lmCG+\n/OUvKztbkn4qaUhSUENDv5PH40ltMGCGiVnuly5dUn9/vyTp5s2bOnr0qLxeb9SYyspKHTx4UJLU\n3t6uuXPnjtiSwcxx33336dixN/XQQ6/K4bhP8+d/V62tv9QXv/jFVEcDZpSY2zIXL15UTU2NIpGI\nIpGI1q9fr1WrVqm5uVmSVFtbqzVr1igYDKqwsFDZ2dk6cODAPQmO9OV2u/Xxx/9XQ0NDysjISHUc\nYEZy2HdvmE/FTRyOEfvyAIDYEulO3qEKAAai3AHAQJQ7ABiIcgcAA1HuAGAgyh0ADES5A4CBKHcA\nMBDlDgAGotwBwECUOwAYiHIHAANR7gBgIModAAxEuQOAgSh3ADAQ5Q4ABqLcAcBAlDsAGChmuff0\n9GjlypVasmSJli5dqldffXXEGMuylJOTI6/XK6/Xq127dk1ZWABAfDJjnczKytIrr7yikpISDQwM\n6NFHH9WTTz6poqKiqHFlZWUKBAJTGhQAEL+YK/cFCxaopKREkjR79mwVFRXpwoULI8ZN9tO5AQBT\nI+bK/U5dXV3q7OyUz+eLet7hcKitrU0ej0cul0uNjY1yu90jfn19ff3w136/X36/f9KhAcBElmXJ\nsqykXMthx7HsHhgYkN/v10svvaSqqqqoc9euXVNGRoacTqdaW1u1fft2nTlzJvomDgerewCYoES6\nc9xyD4fDevrpp1VeXq4dO3aMe8FFixbpt7/9rXJzc5MSEABmqkS6M+aeu23b2rRpk9xu95jF3tfX\nN3zzjo4O2bYdVewAgHsv5p77u+++q9dff13FxcXyer2SpN27d6u7u1uSVFtbq0OHDqmpqUmZmZly\nOp1qaWmZ+tQAgJji2nNP+CZsywDAhE3ZtgwAYHqi3AHAQJQ7ABiIcgcAA1HuAGAgyh0ADES5A4CB\nKHcAMBDlDgAGotwBwECUOwAYiHIHAANR7gBgIModAAxEuQOAgSh3ADAQ5Q4ABqLcAcBAlDsAGChm\nuff09GjlypVasm
SJli5dqldffXXUcdu2bdPixYvl8XjU2dk5JUEBAPHLjHUyKytLr7zyikpKSjQw\nMKBHH31UTz75pIqKiobHBINBnTt3TmfPntXJkydVV1en9vb2KQ+ebEePHtXrr/8fzZnj1Pe+t0UP\nPfRQqiMBmAInT57UT35yUBkZGfrudzeruLg41ZGmRMyV+4IFC1RSUiJJmj17toqKinThwoWoMYFA\nQDU1NZIkn8+n/v5+9fX1TVHcqfHGG/9bVVV/rYMHi9TU9Gfyer+p8+fPpzoWgCSzLEuPP16hf/iH\nhXrttfn65jdXGbvbEHPlfqeuri51dnbK5/NFPd/b26uCgoLh4/z8fIVCIeXl5UWNq6+vH/7a7/fL\n7/dPLvEU+Nu/3asbN34m6UlFItLAwGdqbv6p9uz576mOBiCJ6uv/h27caJS0QZJ0/bpTe/a8qjfe\nOJDaYH9iWZYsy0rKteIq94GBAa1bt0779u3T7NmzR5y3bTvq2OFwjBhzZ7mnm1u3bkmaN3wcieTq\n5s0rqQsEYErcvBn9d12apxs3PktVnBHuXvg2NDRM+lrjvlomHA7rmWee0fPPP6+qqqoR510ul3p6\neoaPQ6GQXC7XpAOlwqZNz8nprJP0rqRfyuncp+rqdamOBSDJamufk9P5nyS9LalVTucP9J3vVKc6\n1pSIuXK3bVubNm2S2+3Wjh07Rh1TWVmp/fv369lnn1V7e7vmzp07Yksm3b300n9RZmaG/vEfdyo7\n26mXXz6ob3zjG6mOBSDJXnjhrxUOh/X3f/9flZGRob/5m79TRUVFqmNNCYd9957KHU6cOKFvfetb\nKi4uHt5q2b17t7q7uyVJtbW1kqQtW7boyJEjys7O1oEDB1RaWhp9E4djxNYNACC2RLozZrknC+UO\nABOXSHfyDlUAMBDlDgAGotwBwECUOwAYiHIHAANR7gBgIModAAxEuQOAgSh3ADAQ5Q4ABqLcAcBA\nlDsAGIhyBwADUe4AYCDKHQAMRLkDgIEodwAwEOUOAAai3AHAQOOW+wsvvKC8vDwtW7Zs1POWZSkn\nJ0der1der1e7du1KekgAwMRkjjdg48aN2rp1qzZs2DDmmLKyMgUCgaQGAwBM3rgr9xUrVmjevHkx\nx0z207kBAFNj3JX7eBwOh9ra2uTxeORyudTY2Ci32z1iXH19/fDXfr9ffr8/0VsDgFEsy5JlWUm5\nlsOOY9nd1dWliooKffjhhyPOXbt2TRkZGXI6nWptbdX27dt15syZ6Js4HKzuAWCCEunOhF8tM2fO\nHDmdTklSeXm5wuGwrly5kuhlAQAJSLjc+/r6hr+zdHR0yLZt5ebmJhwMADB54+65V1dX6/jx47p0\n6ZIKCgrU0NCgcDgsSaqtrdWhQ4fU1NSkzMxMOZ1OtbS0THloAEBsce25J3wT9twBYMJSuucOAEg/\nlDsAGIhyBwADUe4AYCDKHQAMRLkDgIEodwAwEOUOAAai3AHAQJQ7ABiIcgcAA1HuAGAgyh0ADES5\nA4CBKHcAMBDlDgAGotwBwECUOwAYiHIHAAPFLPcXXnhBeXl5WrZs2Zhjtm3bpsWLF8vj8aizszPp\nAQEAExez3Ddu3KgjR46MeT4YDOrcuXM6e/asXnvtNdXV1SU9IKafK1eu6Nvf3qiHH/bqqafWqbu7\nO9WRgBknZrmvWLFC8+bNG/N8IBBQTU2NJMnn86m/v199fX3JTYhpJRKJ6PHHK/Qv//Jn+td//Yne\nesujP//zVbp+/XqqowEzSmYiv7i3t1cFBQXDx/n5+QqFQsrLyxsxtr6+fvhrv98vv9+fyK2Rprq7\nu3XmzHkNDv5G0iwNDX1NAwNBnTp1SmVlZamOB6Q1y7JkWVZSrpVQuUuSbdtRxw6HY9Rxd5Y7zHX/\n/fcrErkl6ZYkp6QhRSKf6v77709xMiD93b3wbWhomPS1Enq1jMvlUk9Pz/BxKBSS
y+VK5JKY5r70\npS+psrJCTucaSf9TDzzwV1qy5Et67LHHUh0NmFESKvfKykodPHhQktTe3q65c+eOuiWDmeWf//l/\n6eWX16m6+pReeunrsqw3lZGRkepYwIzisO/eV7lDdXW1jh8/rkuXLikvL08NDQ0Kh8OSpNraWknS\nli1bdOTIEWVnZ+vAgQMqLS0deROHY8T2DQAgtkS6M2a5JwvlDgATl0h38g5VADAQ5Q4ABqLcAcBA\nlDsAGIhyBwADUe4AYCDKHQAMRLkDgIEodwAwEOUOAAai3AHAQJQ7ABiIcgcAA1HuAGAgyh0ADES5\nA4CBKHcAMBDlDgAGotzvYFlWqiPEhZzJMx0ySuRMtumSMxHjlvuRI0f0yCOPaPHixfrRj3404rxl\nWcrJyZHX65XX69WuXbumJOi9MF3+wMmZPNMho0TOZJsuORORGevk0NCQtmzZorfeeksul0uPPfaY\nKisrVVRUFDWurKxMgUBgSoMCAOIXc+Xe0dGhwsJCLVy4UFlZWXr22Wf1q1/9asS4yX46NwBgitgx\n/OIXv7A3b948fPzzn//c3rJlS9QYy7Ls3Nxcu7i42C4vL7d/97vfjbiOJB48ePDgMYnHZMXclnE4\nHLFOS5JKS0vV09Mjp9Op1tZWVVVV6cyZM1FjWNkDwL0Vc1vG5XKpp6dn+Linp0f5+flRY+bMmSOn\n0ylJKi8vVzgc1pUrV6YgKgAgXjHL/Wtf+5rOnj2rrq4uDQ4O6o033lBlZWXUmL6+vuGVeUdHh2zb\nVm5u7tQlBgCMK+a2TGZmpvbv36/Vq1draGhImzZtUlFRkZqbmyVJtbW1OnTokJqampSZmSmn06mW\nlpZ7EhwAEMOkd+tH0draan/1q1+1CwsL7T179ow4f+zYMfsLX/iCXVJSYpeUlNg//OEPk3n7uGzc\nuNGeP3++vXTp0jHHbN261S4sLLSLi4vt999//x6m+zfj5UyHubRt2+7u7rb9fr/tdrvtJUuW2Pv2\n7Rt1XKrnNJ6c6TCnN2/etL/+9a/bHo/HLioqsr///e+POi6V8xlPxnSYy8/dvn3bLikpsZ9++ulR\nz6f6v83Pxco5mflMWrnfvn3bfvjhh+3z58/bg4ODtsfjsT/66KMRASsqKpJ1y0l555137Pfff3/M\n0nzzzTft8vJy27Ztu7293fb5fPcy3rDxcqbDXNq2bV+8eNHu7Oy0bdu2r127Zn/lK18Z8eeeDnMa\nT850mdPr16/btm3b4XDY9vl89m9+85uo8+kwn+NlTJe5tG3b/vGPf2w/99xzo+ZJh7n8XKyck5nP\npP3zA9PlNfErVqzQvHnzxjwfCARUU1MjSfL5fOrv71dfX9+9ijdsvJxS6udSkhYsWKCSkhJJ0uzZ\ns1VUVKQLFy5EjUmHOY0np5Qec/r5CxQGBwc1NDQ04mdY6TCf42WU0mMuQ6GQgsGgNm/ePGqedJhL\nafyc0sTnM2nl3tvbq4KCguHj/Px89fb2Ro1xOBxqa2uTx+PRmjVr9NFHHyXr9kkz2u8jFAqlMNHo\n0nEuu7q61NnZKZ/PF/V8us3pWDnTZU4jkYhKSkqUl5enlStXyu12R51Ph/kcL2O6zOXOnTu1d+9e\nzZo1etWlw1xK4+eczHwmrdwn8pr406dPa+vWraqqqkrW7ZPq7u+Q8fze7rV0m8uBgQGtW7dO+/bt\n0+zZs0ecT5c5jZUzXeZ01qxZ+uCDDxQKhfTOO++M+u+gpHo+x8uYDnN5+PBhzZ8/X16vN+aqN9Vz\nGU/Oycxn0srdlNfE3/37CIVCcrlcKUw0unSay3A4rGeeeUbPP//8qP/RpcucjpczneZUknJycrR2\n7VqdOnUq6vl0mU9p7IzpMJdtbW0KBAJatGiRqqur9fbbb2vDhg1RY9JhLuPJOan5TGD/P0o4HLYf\neugh+/z58/Znn3026g9U//CHP9iRSMS2bds+
efKk/eCDDybr9hNy/vz5uH6g+t5776X0ByyxcqbL\nXEYiEXv9+vX2jh07xhyTDnMaT850mNM//vGP9ieffGLbtm3fuHHDXrFihf3WW29FjUn1fMaTMR3m\n8k6WZY36KpRUz+Xdxso5mfmM+Tr3iZgur4mvrq7W8ePHdenSJRUUFKihoUHhcHg445o1axQMBlVY\nWKjs7GwdOHDgnmeMJ2c6zKUkvfvuu3r99ddVXFwsr9crSdq9e7e6u7uHs6bDnMaTMx3m9OLFi6qp\nqVEkElEkEtH69eu1atWqqL9HqZ7PeDKmw1ze7fPtlnSay9GMlnMy8+mw7TT4kTYAIKn4JCYAMBDl\nDgAGotwBwECUOwAYiHIHAANR7gBgoP8Hl16hW9vWgIwAAAAASUVORK5CYII=\n",
"text": [
"<matplotlib.figure.Figure at 0x11018f690>"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Vectors like these are thus exactly what we compare when we talk about \"correlation\" or \"distance.\" We can construct two more interesting vectors that aren't at all similar to each other:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aOne = standard_normal( 10 )\n",
"print( aOne )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[-1.50322771 0.11841839 0.49731289 -0.50495611 1.14888779 -1.09181061\n",
" -0.67351249 0.08475123 -0.02766802 0.54163642]\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aTwo = standard_normal( 10 )\n",
"print( aTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[ 0.46774625 0.72418922 -0.42905957 -0.12168353 0.21578108 -0.41264106\n",
" -0.99838145 0.09369821 -1.27559825 0.20019431]\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scatter( aOne, aTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"<matplotlib.collections.PathCollection at 0x11069ad90>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAD9CAYAAAC7iRw+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFt5JREFUeJzt3X1wVOWhx/HfkpeGRF4FQmY3GiHRBAwBGo2UG10MEUGI\noWM11NtJKZfmohRtp9Zrr7agwoVRx45ifRtfoLdV1CsvU5IIelmwYIgaprUXlKCim0AiELBIognr\nc//QZsQkZNkNObt5vp+ZzGTPPjnn50P8zcnZ8+IyxhgBAKzRz+kAAIDeRfEDgGUofgCwDMUPAJah\n+AHAMhQ/AFgmrOL/yU9+ouTkZGVnZ3c5ZtGiRcrIyFBOTo527doVzuYAAD0grOKfO3euKisru3y/\nvLxc+/btU21trZ544gktWLAgnM0BAHpAWMWfn5+vIUOGdPn+hg0bVFpaKknKy8vTsWPH1NjYGM4m\nAQBhij2bK6+vr1dqamr7a4/Ho7q6OiUnJ58yzuVync0YANBnhXLzhbP+4e63Q3VV8saYqP367W9/\n63gGG7OT3/kv8jv7FaqzWvxut1t+v7/9dV1dndxu99ncJACgG2e1+IuKirR69WpJUlVVlQYPHtzh\nMA8AoHeFdYx/zpw52rp1qw4fPqzU1FQtWbJEbW1tkqSysjLNmDFD5eXlSk9PV1JSkp555pkeCR1p\nvF6v0xFCFs3ZJfI7jfzRyWXCOVDUUyFcrrCOVwGAjULtTq7cBQDLUPwAYBmKHwAsQ/EDgGUofgCw\nDMUPAJah+AHAMhQ/AFiG4gcAy1D8AGAZih8ALEPxA4BlKH4AsAzFDwCWofgBwDIUPwBYJqwncAHo\nHW1tbfrjH/+ouro6TZo0SQUFBU5HQhTjCVxAhAsEArryypl6++0WtbRcpoSENVq8eJFuu+3nTkeD\nw0LtToofiHCvvPKKrrvuP/TZZ29JipH0seLiMnXixKeKi4tzOh4cxKMXgT7q6NGjcrlG6avSlySP\njHGppaXFyViIYhQ/EOEmT56sL7/cJmmdpEOKjb1DY8eO18CBA52OhihF8QMRLjU1VZWVa5WWdpcS\nEy/S9773d1VW/o/TsRDFOMYPAFGKY/wAgKBQ/ABgGYofACxD8QOAZSh+ALAMxQ8AlqH4AcAyFD8A\nWIbiBwDLUPwAYBmKHwAsQ/EDgGXCLv7KykplZmYqIyNDK1as6PC+z+fToEGDNGHCBE2YMEH33ntv\nuJsEAIQhrGfuBgIBLVy4UK+++qrcbrcuueQSFRUVKSsr65RxV1xxhTZs2BBWUABAzwhrj7+6ulrp\n6elKS0tTXFycSkpKtH79+g7juOUyAESOsPb46+vrlZqa2v7a4/Fo586dp4xxuVzasWOHcnJy5Ha7\ndf/992vMmDEd1rV48eL2771er7xebzjRAKDP8fl88vl8Ya8nrOJ3uVzdjpk4caL8fr8SExNVUVGh\n4uJi7d27t8O4bxZ/tPjkk0/04IMP6ZNPjqq4+GrNmjXL6UgA+rBv7xQvWbIkpPWEdajH7XbL7/e3\nv/b7/fJ4PKeMGTBggBITEyVJ06dPV1tbm5qamsLZbERoampSTs4k3X9/k55++kKVlNyiRx551OlY\nANCtsIo/NzdXtbW12r9/v1pbW7VmzRoVFRWdMqaxsbH9GH91dbWMMRo6dGg4m40If/rTn3TsWJ5O\nnvy9pFvU3LxOd9211OlYANCtsA71xMbGauXKlZo2bZoCgYDmzZunrKwsPf7445KksrIyvfTSS3r0\n0UcVGxurxMREPf/88z0S3GktLS0KBIZ9Y8kwffFFi2N5ACBYPGw9RO+++66++918NTc/LOlC9e//\na82ZM1pPPfWI09EAWCLU7qT4w7B9+3YtWvSfOnKkSddee7Xuu+9excfHOx0LgCUofgCwTKjdyb16\nAMAyFD8AWIbiBwDLUPwAYBmKHwAsQ/EDgGUofgCwDM
UPAJah+AHAMhQ/AFiG4gcAy1D8AGAZih8A\nLEPxA4BlKH4AsAzFDwBnwBijpqYm/eMf/3A6SsgofgAI0okTJ1RQMEspKWkaNixFpaVlCgQCTsc6\nYxQ/gLAZY/Tss6t15ZXFmj37X/XXv/7V6UhnxS9+8Wu98cZAtbYeUVtbg156aY9Wrvy907HOGMUP\nIGwPPfSIbr55mbZsuVHr1uVq8uSpeu+995yO1eNef71an39+k6Q4SQPU3DxXW7dWOx3rjFH8AML2\nwAOPqrl5taQfSLpVzc3ztGrVH5yO1eNGjTpPMTFbv35l9J3vbFV6+nmOZgpFrNMBAES/rx747frG\nktAeAh7pVq5coby8Kfr88y0y5jO53Sd1550POR3rjLlMBPzrhPqkeACR4cEHH9Kddz6m5uZ7JdUr\nKeluvfnmNmVlZTkdrcd9+umn2rZtm+Lj43XFFVcoISHBsSyhdifFDyBsxhg9/fSzWrXqZQ0adI7u\nvvtXmjBhgtOx+jyKHwAsE2p38uEuAFiG4gcAy1D8QIQyxuj999/XO++8o9bWVqfjoA/hdE4gAgUC\nAf3gB6WqrHxNMTEDNXx4vF5/vVJut9vpaOgD2OMHItCTTz6pV17xq6XlA3322bvy+4s1b94ip2Oh\nj6D4gQi0a9duNTcXS+ovyaWTJ0v0zjv/53Qs9BEUPxCBsrMvUmLinyV9IUmKiXlZWVmZzoZCn8F5\n/EAEOnnypK69do58vp2KjR2iQYO+0Pbtm5Wamup0NEQQLuAC+hhjjPbs2aPm5mZdfPHFjt4aAJGJ\n4gcAyzh25W5lZaUyMzOVkZGhFStWdDpm0aJFysjIUE5Ojnbt2hXuJgEAYQir+AOBgBYuXKjKykrt\n3r1bzz33nPbs2XPKmPLycu3bt0+1tbV64okntGDBgrACAwDCE1bxV1dXKz09XWlpaYqLi1NJSYnW\nr19/ypgNGzaotLRUkpSXl6djx46psbExnM0CAMIQ1pW79fX1p5xl4PF4tHPnzm7H1NXVKTk5+ZRx\nixcvbv/e6/XK6/WGEw0A+hyfzyefzxf2esIqfpfL1f0gqcOHD5393DeLHwDQ0bd3ipcsWRLSesI6\n1ON2u+X3+9tf+/1+eTye046pq6vjfiMA4KCwij83N1e1tbXav3+/WltbtWbNGhUVFZ0ypqioSKtX\nr5YkVVVVafDgwR0O8wAAek9Yh3piY2O1cuVKTZs2TYFAQPPmzVNWVpYef/xxSVJZWZlmzJih8vJy\npaenKykpSc8880yPBAcAhIYLuAAgSvHoRQBAUCh+ALAMxQ8AlqH4EZWOHDmigoIixccnavjw87Vu\n3TqnIwFRgw93EZWmTJmp7dvPV1vbf0l6R/37z1ZV1asaN26c09GAXsOHu7CGMUavv75ZbW33SRoo\nabKMuU5bt251OhoQFSh+RB2Xy6UBA86VtPvrJUYxMXt07rnnOhkLiBoUP6LSo48+qP79Zyo+/hYl\nJU1VZuaXuu6665yOBUQFjvEjar399tvaunWrhg0bppKSEsXHxzsdCehVPHoRACzDh7sAgKBQ/ABg\nGYofACxD8QOAZSh+ALAMxQ8AlqH4AcAyFD8AWIbiBwDLUPwAYBmKHwAsQ/EDgGUofgCwDMUPAJah\n+AHAMhQ/AFiG4gcAy1D8AGAZir+P45GWAL6N4u+j9u3bpzFjLlVsbJxSUkZr27ZtTkcCECF42Hof\nFAgEdMEFY1VXt0DG/Luk13TOOT9Wbe3fNHLkSKfjoQ8wxmjdunXaufNNjRqVprlz5youLs7pWNbh\nYetod+DAAR0+/KmMuUXSdyTNUL9+E1RTU+N0NPQRt99+l370ozu1YkWCfv7zF3TVVcUKBAJOx0KQ\nKP4+aMiQIQoEjkvyf72kRYFArYYPH+5kLPQRx48f1+9+96BOnPBJ+o2amyv11lsfafv27U5HQ5Ao\n/j7onHPO0T333K
PExH9RQsJNSkq6TLNmeZWbm+t0NPQBJ06cUExMf0nDvl4Sq3793Dp+/LiTsXAG\nOMbfh/3lL39RTU2NLrjgAs2cOVMul8vpSOgDjDHKzr5M773n1cmTN0naosGDf63a2r9p2LBh3f48\nek6o3UnxAzhjjY2NuvHGMtXUvCWP53z94Q+/V05OjtOxrNPrxd/U1KQbbrhBH330kdLS0vTCCy9o\n8ODBHcalpaVp4MCBiomJUVxcnKqrq3ssPADYrNfP6lm+fLkKCwu1d+9eFRQUaPny5V0G8/l82rVr\nV6elDwDoXSEX/4YNG1RaWipJKi0t1bp167ocy948AESO2FB/sLGxUcnJyZKk5ORkNTY2djrO5XJp\n6tSpiomJUVlZmebPn9/puMWLF7d/7/V65fV6Q40GAH2Sz+eTz+cLez2nPcZfWFiohoaGDsuXLl2q\n0tJSHT16tH3Z0KFD1dTU1GHswYMHlZKSokOHDqmwsFAPP/yw8vPzTw3BMX4AOGOhdudp9/g3b97c\n5XvJyclqaGjQyJEjdfDgQY0YMaLTcSkpKZKk4cOHa/bs2aquru5Q/ACA3hPyMf6ioiKtWrVKkrRq\n1SoVFxd3GNPc3Nx+UceJEye0adMmZWdnh7pJAEAPCOt0zuuvv14ff/zxKadzHjhwQPPnz9fGjRv1\nwQcf6Pvf/74k6eTJk7rxxht1xx13dAzBoR4AOGNcwAUAluHunACAoFD8AGAZih8ALEPxA4BlKH4A\nsAzFDwCWofgBwDIUPwBYhuIHAMtQ/ABgGYofACxD8QOAZSh+ALAMxQ8AlqH4AcAyFD8AWIbiBwDL\nUPwAYBmKHwAsQ/EDgGUofgCwDMUPAJah+AHAMhQ/AFiG4gcAy1D8AGAZih8ALEPxA4BlKH4AsAzF\nDwCWofgBwDIUPwBYhuIHAMtQ/ABgGYofACwTcvG/+OKLGjt2rGJiYlRTU9PluMrKSmVmZiojI0Mr\nVqwIdXMAgB4ScvFnZ2dr7dq1uvzyy7scEwgEtHDhQlVWVmr37t167rnntGfPnlA3iT6ktbVVv/nN\nPZoy5Vr99KeLdPjwYacjAdaIDfUHMzMzux1TXV2t9PR0paWlSZJKSkq0fv16ZWVlhbpZ9BHXX/9j\nbdp0TC0t/6bt2/9Xr702RX//e7X69+/vdDSgzwu5+INRX1+v1NTU9tcej0c7d+7sdOzixYvbv/d6\nvfJ6vWczGhzU1NSkioo/q7W1UVJ/tbXN1qFDk7Vt2zZNmzbN6XhAxPL5fPL5fGGv57TFX1hYqIaG\nhg7Lly1bplmzZnW7cpfLFXSQbxY/+jZjzNff/fNIo0tSzDeWA+jMt3eKlyxZEtJ6Tlv8mzdvDmml\n/+R2u+X3+9tf+/1+eTyesNaJ6HfuueeqoOAq+Xw3qKVlvmJjt2jIkMPKz893OhpghR45nbOrPbXc\n3FzV1tZq//79am1t1Zo1a1RUVNQTm0SUe/nl/9bChRfrssse1pw5x1Vd7VNSUpLTsQAruEyIf1+v\nXbtWixZ9dTbGoEGDNGHCBFVUVOjAgQOaP3++Nm7cKEmqqKjQrbfeqkAgoHnz5umOO+7oGMLl4s98\nADhDoXZnyMXfkyh+ADhzoXYnV+4CgGUofgCwDMUPAJah+AHAMhQ/AFiG4gcAy1D8AGAZih8ALEPx\nA4BlKH4AsAzFDwCWofgBwDIUPwBYhuIHAMtQ/ABgGYofACxD8QOAZSh+oBft2LFDF1yQrf79B2nS\npELV1dU5HQkW4tGLQC85cOCALrpovD777HFJVygm5iFlZPxZu3e/KZfL5XQ8RCEevQhEuJ07d6pf\nvzxJsyUNVSDwW3344fs6fPiw09FgGYof6CVDhgzRl19+IKnt6yUH9OWXX2jAgAFOxoKFKH6gl1x+\n+eWaPPkiJSV5FRv7KyUm5uvuu+9WQkKC09FgGY7xA70oEAjo+eef18cff6xLL71U
BQUFTkdCFAu1\nOyl+AIhSfLgLAAgKxQ8AlqH4AcAyFD8AWIbiBwDLUPwAYBmKHwAsQ/EDgGUofgCwDMUPAJah+AHA\nMhR/D/D5fE5HCFk0Z5fI7zTyR6eQi//FF1/U2LFjFRMTo5qami7HpaWlady4cZowYYIuvfTSUDcX\n0aL5lyeas0vkdxr5o1NsqD+YnZ2ttWvXqqys7LTjXC6XfD6fhg4dGuqmAAA9KOTiz8zMDHost1wG\ngMgR9v34p0yZogceeEATJ07s9P1Ro0Zp0KBBiomJUVlZmebPn98xBA+aBoCQhFLhp93jLywsVEND\nQ4fly5Yt06xZs4LawPbt25WSkqJDhw6psLBQmZmZys/PP2UMfxEAQO85bfFv3rw57A2kpKRIkoYP\nH67Zs2erurq6Q/EDAHpPj5zO2dUee3Nzs44fPy5JOnHihDZt2qTs7Oye2CQAIEQhF//atWuVmpqq\nqqoqXXPNNZo+fbok6cCBA7rmmmskSQ0NDcrPz9f48eOVl5enmTNn6qqrruqZ5ACA0BgH/PKXvzSZ\nmZlm3LhxZvbs2ebYsWOdjquoqDAXXXSRSU9PN8uXL+/llF174YUXzJgxY0y/fv3M22+/3eW4888/\n32RnZ5vx48ebSy65pBcTnl6w+SN1/o8cOWKmTp1qMjIyTGFhoTl69Gin4yJp/oOZy5/97GcmPT3d\njBs3ztTU1PRywtPrLv+WLVvMwIEDzfjx48348ePNPffc40DKzs2dO9eMGDHCXHzxxV2OieS57y5/\nKHPvSPFv2rTJBAIBY4wxt99+u7n99ts7jDl58qQZPXq0+fDDD01ra6vJyckxu3fv7u2ondqzZ495\n7733jNfrPW1xpqWlmSNHjvRisuAEkz+S5/+2224zK1asMMYYs3z58k5/f4yJnPkPZi43btxopk+f\nbowxpqqqyuTl5TkRtVPB5N+yZYuZNWuWQwlPb9u2baampqbL4ozkuTem+/yhzL0jt2woLCxUv35f\nbTovL091dXUdxlRXVys9PV1paWmKi4tTSUmJ1q9f39tRO5WZmakLL7wwqLEmAs9YCiZ/JM//hg0b\nVFpaKkkqLS3VunXruhwbCfMfzFx+878pLy9Px44dU2NjoxNxOwj2dyES5roz+fn5GjJkSJfvR/Lc\nS93nl8587h2/V8/TTz+tGTNmdFheX1+v1NTU9tcej0f19fW9GS1sLpdLU6dOVW5urp588kmn45yR\nSJ7/xsZGJScnS5KSk5O7/J80UuY/mLnsbExnO0ROCCa/y+XSjh07lJOToxkzZmj37t29HTNkkTz3\nwQhl7kO+crc7wVwDsHTpUsXHx+uHP/xhh3FOX9TVW9cwnC3h5o/U+V+6dOkpr10uV5dZnZz/bwp2\nLr+91+b0v8E/BZNj4sSJ8vv9SkxMVEVFhYqLi7V3795eSNczInXugxHK3J+14u/uGoBnn31W5eXl\neu211zp93+12y+/3t7/2+/3yeDw9mvF0ov0ahnDzR/L8Jycnq6GhQSNHjtTBgwc1YsSITsdFyjUk\nwczlt8fU1dXJ7Xb3WsbTCSb/gAED2r+fPn26brrpJjU1NUXFPboiee6DEcrcO3Kop7KyUvfdd5/W\nr1+vhISETsfk5uaqtrZW+/fvV2trq9asWaOioqJeTtq9ro6tRcs1DF3lj+T5Lyoq0qpVqyRJq1at\nUnFxcYcxkTT/wcxlUVGRVq9eLUmqqqrS4MGD2w9nOS2Y/I2Nje2/S9XV1TLGREXpS5E998EIae5D\n/aQ5HOnp6ea8885rP/1owYIFxhhj6uvrzYwZM9rHlZeXmwsvvNCMHj3aLFu2zImonXr55ZeNx+Mx\nCQkJJjk52Vx99dXGmFPzv//++yYnJ8fk5OSYsWPHRl1+YyJ3/o8cOWIKCgo6nM4ZyfPf2Vw+9thj\n5rHHHmsfc/PNN5vRo0ebcePGnfZsMSd0l3/l
ypVm7NixJicnx0yaNMm88cYbTsY9RUlJiUlJSTFx\ncXHG4/GYp556Kqrmvrv8ocx92DdpAwBEF8fP6gEA9C6KHwAsQ/EDgGUofgCwDMUPAJah+AHAMv8P\n2FOsmz87zhkAAAAASUVORK5CYII=\n",
"text": [
"<matplotlib.figure.Figure at 0x1105d3050>"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we instead construct a new vector that's almost exactly like the first one, the two look similar:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Create a copy of aOne (plain assignment would only add a second name for the same array)\n",
"aCloseToOne = aOne.copy()\n",
"print( aCloseToOne )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[-1.50322771 0.11841839 0.49731289 -0.50495611 1.14888779 -1.09181061\n",
" -0.67351249 0.08475123 -0.02766802 0.54163642]\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Create a vector of the same length that represents a small amount of noise\n",
"aNoise = standard_normal( len( aCloseToOne ) ) / 5\n",
"print( aNoise )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[-0.04827769 0.28817081 0.12999071 -0.09210943 -0.01443267 -0.21404219\n",
" 0.18433576 -0.22022444 -0.11599082 0.02015719]\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Add this noise into our copy of the first vector\n",
"aCloseToOne = aCloseToOne + aNoise\n",
"print( aCloseToOne )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[-1.5515054 0.4065892 0.62730359 -0.59706554 1.13445511 -1.3058528\n",
" -0.48917674 -0.13547321 -0.14365884 0.56179362]\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scatter( aOne, aCloseToOne )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"<matplotlib.collections.PathCollection at 0x110733dd0>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAD9CAYAAAC7iRw+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGTFJREFUeJzt3X9wVPWh9/HP5ofERQSxYUl3A/GSYPgRktBgrjCp25pA\nQyQGbQvqHRm0TJQC17mPPtzeeZ7HMALCSO/UyijSizZYighTDNUQQMvq3GKMFLReQYk/sJuQ5CIB\nCcm1CeH7/KFmxPxazoacTc77NbMz2d3vnvPxa/hw+J49uy5jjBEAwDGi7A4AAOhfFD8AOAzFDwAO\nQ/EDgMNQ/ADgMBQ/ADhMWMV/zz33yOPxKC0trcvnA4GAhg8frszMTGVmZmrlypXh7A4A0Adiwnnx\nwoULtXTpUt19993djrnpppu0a9eucHYDAOhDYR3x5+Tk6JprrulxDNeHAUBkCeuIvzcul0sHDhxQ\nenq6vF6v1q1bp4kTJ3Y5DgBw6awcXF/Wk7tTp05VMBjUO++8o6VLl6qoqKjbscaYAXt7+OGHbc/g\nxOzkt/9GfntvVl3W4h82bJjcbrckKT8/X21tbWpsbLycuwQA9OKyFn9DQ0PH30pVVVUyxmjkyJGX\nc5cAgF6EtcZ/xx136LXXXtNnn32mxMRErVixQm1tbZKk4uJi7dixQ0899ZRiYmLkdrv1/PPP90no\nSOP3++2OYNlAzi6R327kH5hcJpyFor4K4XKFtV4FAE5ktTu5chcAHIbiBwCHofgBwGEofgBwGIof\nAByG4gcAh6H4AcBhKH4AcBiKHwAchuIHAIeh+AHAYSh+AHAYih8AHIbiBwCHofgBwGEofgAIkTFG\nTz65QZMmTVdGxk0qKyuzO5IlfBELAIToqac26sEHf6WWlvWSzsntvk9lZZuVm5trSx6+iAUALrMN\nG36nlpZfSfqhpEK1tPwfbdq01e5Yl4ziB4AQxcUNkfR5x32X64zi4q6wL5BFLPUAQIh2796tH/94\noVpa/lUu1zm53Y/rjTf+pLS0NFvyWO1Oih8ALsHrr7+uTZt+ryFDYvXP/3yfJk2aZFsWih8AHIaT\nuwCAkIRV/Pfcc488Hk+P61vLli1TSkqK0tPTdfjw4XB2BwDoA2EV/8KFC1VRUdHt8+Xl5frwww9V\nXV2tjRs36v777w9ndwCAPhATzotzcnJ0/Pjxbp/ftWuXFixYIEnKzs7WmTNn1NDQII/H02lsSUlJ\nx89+v19+vz+caAAw6AQCAQUCgbC3E1bx96a2tlaJiYkd930+n2pqanotfgBAZ98+KF6xYoWl7Vz2\nk7vfPuPscrku9y4BAD24rMXv9XoVDAY77tfU1Mjr9V7OXQIAenFZi7+wsFCbN2+WJFVWVmrEiBFd\nLvMAAPpPWGv8d9xxh1577TV99tlnSkxM1IoVK9TW1iZJKi4u1uzZs1VeXq7k5GQNHTpUzz77bJ+E\nBgBYx5W7ADBAceUuACAkFD8AOAzFDwAOQ/EDgMNQ/ADgMBQ/ADgMxQ8ADkPxA4DDUPwA4DAUPwA4\nDMUPAA5D8QOAw1D8AOAwFD8AOAzFDwAOc1m/bB3A4LV//34dPHhQSUlJuv322xUVxXHkQMEXsQC4\nZI8++phWrnxSbW1FGjLkz/rhD8fpxRd/L5fLZXc0R7HanRQ/gEvS3NyskSM9am39QJJX0t911VVT\ntGfPs5o+fbrd8RyFb+AC0C/Onj2rqCi3pO9+9cgQRUeP06lTp+yMhUtA8QO4JB6PRwkJHkVFrZF0\nVlKZ2tsP6vrrr9err76qyspKXbhwwe6Y6AFLPQAu2aeffqrbbrtb//VfBzV69FitWfN/tWzZcrW1\nJam9/b81bVqy9uzZqdjYWLujDmqs8QOwTXZ2rt5661YZs1RSm9zuAj32WJEWL15sd7RBzbY1/oqK\nCqWmpiolJUVr167t9HwgENDw4cOVmZmpzMxMrVy5Mt
xdAogwn3zysYz50Vf3YtXScrPef/8jWzOh\ne2G9j7+9vV1LlizRK6+8Iq/Xq2nTpqmwsFATJky4aNxNN92kXbt2hRUUQOTKzMzQn/70Hzp//st1\n/6FDt+uGGx6wOxa6EdYRf1VVlZKTk5WUlKTY2FjNnz9fZWVlncaxjAMMbqWlTyo5+VW53T5dccUY\n3XnndN111112x0I3wjrir62tVWJiYsd9n8+nN99886IxLpdLBw4cUHp6urxer9atW6eJEyd22lZJ\nSUnHz36/X36/P5xoAPrR6NGj9d57VQoGgxo6dKi+853v2B1pUAoEAgoEAmFvJ6ziD+UqvalTpyoY\nDMrtdmv37t0qKirSsWPHOo37ZvEDGHiioqI0duxYu2MMat8+KF6xYoWl7YS11OP1ehUMBjvuB4NB\n+Xy+i8YMGzZMbrdbkpSfn6+2tjY1NjaGs1sAQBjCKv6srCxVV1fr+PHjam1t1bZt21RYWHjRmIaG\nho41/qqqKhljNHLkyHB2CwAIQ1hLPTExMVq/fr1mzZql9vZ23XvvvZowYYKefvppSVJxcbF27Nih\np556SjExMXK73Xr++ef7JDjgJMYYvfLKK6qpqdG0adM0efJkuyNhAOMCLiDCGWN0xx336KWXqiR9\nTxcu7NHGjf+uf/on3jXjdFy5CwxSgUBAt9xyn5qbD0u6UtJ7iou7UefOnVZ0dLTd8WAjPp0TGKTq\n6uoUFTVFX5a+JE1Ue/sFNTU12RkLAxjFD0S4rKwstbfvl/SWJCOX69fy+ZI0fPhwu6NhgKL4gQiX\nkpKi3/3uN7rqqnxFRQ3RuHHPau/eF/m2K1jGGj8wQBhj9MUXX+jKK6/sfTAcgZO7AOAwnNwFAISE\n4gcAh6H4AcBhKH4AcBiKHwAchuIHAIeh+AHAYSh+AHAYih8AHIbiBwCHofgBwGEofgBwGIofAByG\n4gcAh6H4AcBhKH4AcBiKHwAcJuzir6ioUGpqqlJSUrR27douxyxbtkwpKSlKT0/X4cOHw90lACAM\nYRV/e3u7lixZooqKCh05ckRbt27V0aNHLxpTXl6uDz/8UNXV1dq4caPuv//+sAIDAMITVvFXVVUp\nOTlZSUlJio2N1fz581VWVnbRmF27dmnBggWSpOzsbJ05c0YNDQ3h7BYAEIaYcF5cW1urxMTEjvs+\nn09vvvlmr2Nqamrk8XguGldSUtLxs9/vl9/vDycaAAw6gUBAgUAg7O2EVfwulyukcd/+FviuXvfN\n4gcAdPbtg+IVK1ZY2k5YSz1er1fBYLDjfjAYlM/n63FMTU2NvF5vOLsFAIQhrOLPyspSdXW1jh8/\nrtbWVm3btk2FhYUXjSksLNTmzZslSZWVlRoxYkSnZR4AQP8Ja6knJiZG69ev16xZs9Te3q57771X\nEyZM0NNPPy1JKi4u1uzZs1VeXq7k5GQNHTpUzz77bJ8EBwBY4zLfXoC3I4TL1ek8AACgZ1a7kyt3\nAcBhKH6gn7377rsqLy+/6E0PXamoqFBGxveVkpKllSvX6sKFC/2UEINdWGv8AC7N8uX/T+vXb1Js\n7GS1tR3Sli3/oaKiWzuNq6ys1O23L1BLywZJo/Toow+ovb1dDz/8b/0fGoMOa/xAPzl06JBycm5V\nS8vbkq6VdFBud57OnPlvxcbGXjR22bL/pSeeuFbS10V/UGPH3qPjx//az6kRyVjjByLcJ598opiY\n7+nL0pekLF24EK3GxsZOY93uOEVFnf7GI40aMmRIf8SEA1D8QD9JS0tTW9sBSV9/kOEOXXWVW/Hx\n8Z3G3nffIg0b9pyion4h6VdyuxfqkUf+d3/GxSDGUg/Qj0pLn9N99y1RVNTViosz2rPnRWVlZXU5\n9uOPP9bjjz+ppqYW3XnnbcrNze3ntIh0VruT4gf6WXNzs06ePCmv19tpbR+4FBQ/ADgMJ3cBACGh\n+AHAYSh+AHAYih
8AHIbiBwCHofgBwGEofgBwGIofAByG4gcAh6H4AcBhKH4AcBiKHwAchuIHAIeh\n+AHAYSx/2XpjY6PmzZunTz/9VElJSXrhhRc0YsSITuOSkpJ09dVXKzo6WrGxsaqqqgorMAaP+vp6\nHT16VImJiUpOTrY7DuAYlo/416xZo7y8PB07dkw333yz1qxZ0+U4l8ulQCCgw4cPU/ro8Mc/vqRx\n4yZr7tyHNWXKdD3ySNe/PwD6nuUvYklNTdVrr70mj8ej+vp6+f1+vf/++53GXXfddTp48KCuvfba\nLrbyVQi+iMVRWltbdc01o9XSsltStqR6ud1TVVm5R2lpaXbHAwYMq91peamnoaFBHo9HkuTxeNTQ\n0NBtsNzcXEVHR6u4uFiLFi3qclxJSUnHz36/X36/32o0RLiTJ0/KmCv0ZelL0mjFxEzVRx99RPED\nPQgEAgoEAmFvp8cj/ry8PNXX13d6fNWqVVqwYIFOnz7d8djIkSPV2NjYaWxdXZ0SEhJ08uRJ5eXl\n6YknnlBOTs7FITjid5S2tjaNGjVGZ85skjRbUrWuvHKG3n77PzV+/Hi74wEDxmU54t+3b1+3z329\nxDN69GjV1dVp1KhRXY5LSEiQJMXHx2vu3LmqqqrqVPxwltjYWL300g4VFNyu9vZhOn/+pB5//N8p\nfaCfWD65W1hYqNLSUklSaWmpioqKOo1paWlRU1OTJKm5uVl79+7ln/KQJM2YMUMnTnyst976o+rq\njmvRonvsjgQ4huWTu42NjfrpT3+qv/3tbxe9nfPEiRNatGiRXn75ZX388ce67bbbJEnnz5/XXXfd\npV/84hedQ7DUAwuqq6t14MABxcfHa9asWYqOjrY7EtCvrHan5eLvSxQ/LtXu3bv14x/fraiomZKO\nKjs7UXv2/IHyh6NQ/HCU+Pix+uyzzZJuknReV12Vo2ee+Rf95Cc/sTsa0G+sdicf2YABxxij06fr\nJN3w1SMxamubqrq6OjtjAQMGxY8Bx+VyKTNzhqKjV0u6IOl9RUfv1I033mh3NGBAoPgxIJWVbdHk\nyX9SdPSViou7Qb/+9SpNmzbN7ljAgMAaPwa0lpYWxcXFKSqKYxg4Dyd3AcBhOLkLAAgJxQ8ADkPx\nA4DDUPwA4DAUPwA4DMUPAA5D8QOAw1D8AOAwFD8AOAzFDwAOQ/EDgMNQ/ADgMBQ/ADgMxQ8ADkPx\nA4DDUPwA4DAUPwA4jOXi3759uyZNmqTo6GgdOnSo23EVFRVKTU1VSkqK1q5da3V3AIA+Yrn409LS\ntHPnTn3/+9/vdkx7e7uWLFmiiooKHTlyRFu3btXRo0et7hIA0AdirL4wNTW11zFVVVVKTk5WUlKS\nJGn+/PkqKyvThAkTrO4WABAmy8UfitraWiUmJnbc9/l8evPNN7scW1JS0vGz3++X3++/nNEAYMAJ\nBAIKBAJhb6fH4s/Ly1N9fX2nx1evXq05c+b0unGXyxVykG8WPwCgs28fFK9YscLSdnos/n379lna\n6Ne8Xq+CwWDH/WAwKJ/PF9Y2AQDh6ZO3cxpjunw8KytL1dXVOn78uFpbW7Vt2zYVFhb2xS4BABZZ\nLv6dO3cqMTFRlZWVKigoUH5+viTpxIkTKigokCTFxMRo/fr1mjVrliZOnKh58+ZxYhcAbOYy3R2u\n92cIl6vbfzUAALpmtTu5chcAHIbiBwCHofgBwGEofgBwGIofAByG4gcAh6H4AcBhKH4AcBiKHwAc\nhuIHAIeh+AHAYSj+QezcuXN69913derUKbujAIggFP8gFQgE9N3v/oNmzJgnn2+cNm7cZHckABGC\nT+cchFpbWxUfn6izZ7dIypVUrSuvnKG//vWAkpOT7Y4HoI/w6ZzoUF9fr/PnY/Rl6UtSiq644nt6\n//337YwFIEJQ/IPQqFGj5HJ9Ianyq0dq1Np6mKN9AJIo/kEpLi5OW7eWyu2+RcOH
T1dcXIZKSpYr\nNTXV7mgAIgBr/INYQ0ODPvjgA40ZM0ZJSUl2xwHQx6x2J8UPAAMUJ3cBACGh+AHAYSh+AHAYih8A\nHMZy8W/fvl2TJk1SdHS0Dh061O24pKQkTZkyRZmZmbrhhhus7g4A0EdirL4wLS1NO3fuVHFxcY/j\nXC6XAoGARo4caXVXAIA+ZLn4L+ViIN6qCQCRw3Lxh8rlcik3N1fR0dEqLi7WokWLuhxXUlLS8bPf\n75ff77/c0QBgQAkEAgoEAmFvp8cLuPLy8lRfX9/p8dWrV2vOnDmSpB/84Af65S9/qalTp3a5jbq6\nOiUkJOjkyZPKy8vTE088oZycnItDDNALuJqamvTMM8+osfG0Zs7M04wZM+yOBMBBrHZnj0f8+/bt\nsxzoawkJCZKk+Ph4zZ07V1VVVZ2KfyBqampSZuYM1dam6u9/H691636ijRvX6a677rQ7GgD0qE/e\nztnd3zgtLS1qamqSJDU3N2vv3r1KS0vri13absuWLTpxYpy++OIFGbNSLS079cAD/2Z3LADoleXi\n37lzpxITE1VZWamCggLl5+dLkk6cOKGCggJJX34ufE5OjjIyMpSdna1bbrlFM2fO7JvkNjt79qza\n2q77xiPXqbn5c9vyAECo+JA2i95++21Nnz5T//M/v5c0XnFxD6mgYIh27NhsdzQADsGHtPWzjIwM\nbd/+rMaMeUDDh/+jbr11qEpLn7I7FgD0iiN+ABigOOIHAISE4gcAh6H4AcBhKH4AcBiKHwAchuIH\nAIeh+AHAYSh+AHAYih8AHIbiBwCHofgBwGEofgBwGIofAByG4gcAh6H4AcBhKH4AcBiKHwAchuIH\nAIeh+AHAYSh+AHAYir8PBAIBuyNYNpCzS+S3G/kHJsvF/9BDD2nChAlKT0/Xbbfdps8//7zLcRUV\nFUpNTVVKSorWrl1rOWgkG8i/PAM5u0R+u5F/YLJc/DNnztR7772nd955R+PHj9ejjz7aaUx7e7uW\nLFmiiooKHTlyRFu3btXRo0fDCgwACI/l4s/Ly1NU1Jcvz87OVk1NTacxVVVVSk5OVlJSkmJjYzV/\n/nyVlZVZTwsACJ/pA7fccovZsmVLp8e3b99ufvazn3Xcf+6558ySJUs6jZPEjRs3btws3KyIUQ/y\n8vJUX1/f6fHVq1drzpw5kqRVq1bpiiuu0J133tlpnMvl6mnzHb7sfgBAf+ix+Pft29fji3/729+q\nvLxcr776apfPe71eBYPBjvvBYFA+n89CTABAX7G8xl9RUaHHHntMZWVliouL63JMVlaWqqurdfz4\ncbW2tmrbtm0qLCy0HBYAED7Lxb906VKdO3dOeXl5yszM1OLFiyVJJ06cUEFBgSQpJiZG69ev16xZ\nszRx4kTNmzdPEyZM6JvkAABrLJ0ZCNODDz5oUlNTzZQpU8zcuXPNmTNnuhy3e/duc/3115vk5GSz\nZs2afk7ZvRdeeMFMnDjRREVFmb/85S/djhs7dqxJS0szGRkZZtq0af2YsGeh5o/U+T916pTJzc01\nKSkpJi8vz5w+fbrLcZE0/6HM5dKlS01ycrKZMmWKOXToUD8n7Flv+ffv32+uvvpqk5GRYTIyMswj\njzxiQ8quLVy40IwaNcpMnjy52zGRPPe95bcy97YU/969e017e7sxxpjly5eb5cuXdxpz/vx5M27c\nOPPJJ5+Y1tZWk56ebo4cOdLfUbt09OhR88EHHxi/399jcSYlJZlTp071Y7LQhJI/kuf/oYceMmvX\nrjXGGLNmzZouf3+MiZz5D2UuX375ZZOfn2+MMaaystJkZ2fbEbVLoeTfv3+/mTNnjk0Je/b666+b\nQ4cOdVuckTz3xvSe38rc2/KRDQP9GoDU1FSNHz8+pLEmAt+xFEr+SJ7/Xbt2acGCBZKkBQsW6MUX\nX+x2bCTMfyhz+c3/puzsbJ05c0YNDQ12xO0k
1N+FSJjrruTk5Oiaa67p9vlInnup9/zSpc+97Z/V\n88wzz2j27NmdHq+trVViYmLHfZ/Pp9ra2v6MFjaXy6Xc3FxlZWXpN7/5jd1xLkkkz39DQ4M8Ho8k\nyePxdPuHNFLmP5S57GpMVwdEdgglv8vl0oEDB5Senq7Zs2fryJEj/R3Tskie+1BYmfse384Zjv66\nBuByCSV/b/785z8rISFBJ0+eVF5enlJTU5WTk9PXUbsUbv5Inf9Vq1ZddN/lcnWb1c75/yar17PY\n/f/ga6HkmDp1qoLBoNxut3bv3q2ioiIdO3asH9L1jUid+1BYmfvLVvwD/RqA3vKHIiEhQZIUHx+v\nuXPnqqqqqt+KJ9z8kTz/Ho9H9fX1Gj16tOrq6jRq1Kgux9k5/98Uylx+e0xNTY28Xm+/ZexJKPmH\nDRvW8XN+fr4WL16sxsZGjRw5st9yWhXJcx8KK3Nvy1LPYLoGoLu1tZaWFjU1NUmSmpubtXfvXqWl\npfVntJB0lz+S57+wsFClpaWSpNLSUhUVFXUaE0nzH8pcFhYWavPmzZKkyspKjRgxomM5y26h5G9o\naOj4XaqqqpIxZkCUvhTZcx8KS3Nv9UxzOJKTk82YMWM63n50//33G2OMqa2tNbNnz+4YV15ebsaP\nH2/GjRtnVq9ebUfULv3hD38wPp/PxMXFGY/HY370ox8ZYy7O/9FHH5n09HSTnp5uJk2aNODyGxO5\n83/q1Clz8803d3o7ZyTPf1dzuWHDBrNhw4aOMT//+c/NuHHjzJQpU3p8t5gdesu/fv16M2nSJJOe\nnm5uvPFG88Ybb9gZ9yLz5883CQkJJjY21vh8PrNp06YBNfe95bcy9y5jIvRUPADgsrD9XT0AgP5F\n8QOAw1D8AOAwFD8AOAzFDwAOQ/EDgMP8f6n9W6CSJepGAAAAAElFTkSuQmCC\n",
"text": [
"<matplotlib.figure.Figure at 0x110692150>"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We know that these two vectors are highly correlated:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import scipy.stats\n",
"\n",
"# The pearsonr function returns two values: the correlation coefficient and its p-value\n",
"# Here we just show the first value, the correlation\n",
"scipy.stats.pearsonr( aOne, aCloseToOne )[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"0.98207184801233882"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And the first two (unrelated) vectors are not:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scipy.stats.pearsonr( aOne, aTwo )[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"0.11706616197141602"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Likewise the vector \"close\" to the first has a much lower Euclidean distance from it than the second does:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import scipy.spatial\n",
"\n",
"scipy.spatial.distance.euclidean( aOne, aCloseToOne )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"0.50310625047363144"
]
}
],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scipy.spatial.distance.euclidean( aOne, aTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"2.892793164909699"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But of course we can construct a third vector that remains correlated with the first without being close to it in space:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aCorrelatedWithOne = aCloseToOne + 5\n",
"print( aCorrelatedWithOne )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[ 3.4484946 5.4065892 5.62730359 4.40293446 6.13445511 3.6941472\n",
" 4.51082326 4.86452679 4.85634116 5.56179362]\n"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scatter( aOne, aCorrelatedWithOne )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"<matplotlib.collections.PathCollection at 0x111c3d210>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD9CAYAAABHnDf0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGpNJREFUeJzt3X9UU2eCN/DvhQgYpDi4gkxCtQX8ESshKHLa1WNWxBZa\nqbQe19bpULtals6MTN+33R6nb1c9Y11n5HTH4+winTNa6eysdTizlbHB+qOmVSmihXpmdFrUxS6h\nmFIoPwQcSHjeP+qwxkASQiDJ4/dzDudwcx+Tb5/il+uTe3MVIYQAERFJJcTfAYiIyPdY7kREEmK5\nExFJiOVORCQhljsRkYRY7kREEnJb7u3t7Vi1ahXmzJkDnU6H6upqh/1msxnR0dEwGAwwGAzYtm3b\nmIUlIiLPqNwNKCoqQk5ODsrLy2Gz2dDd3e00ZsmSJaioqBiTgERENHIuy72jowOnTp3C/v37vx2s\nUiE6OtppHK+DIiIKLC7LvaGhAVOnTsW6detw4cIFzJ8/H7t27YJarR4coygKqqqqoNfrodFoUFxc\nDJ1O5/A8iqKMTXoiIsl5ffAsXDh37pxQqVSipqZGCCFEUVGReO211xzGdHZ2iu7ubiGEECaTSSQn\nJzs9j5uXCXibN2/2d4RRCeb8wZxdCOb3t2DPP5rudPmGqlarhVarRXp6OgBg1apVqK2tdRgTFRU1\neCSfnZ2N/v5+tLW1efebhoiIfMJluU+bNg0JCQmor68HABw/fhxz5851GGO1Wgf/2VBTUwMhBGJi\nYsYoLhERecLt2TK7d+/G2rVr0dfXh8TEROzduxelpaUAgIKCApSXl6OkpAQqlQpqtRoHDhwY89Dj\nzWg0+jvCqARz/mDODjC/vwV7/tFQhBj7U10UReEZNUREIzSa7uQVqkREEmK5ExFJiOVORCQhljsR\nkYRY7kREEmK5ExFJiOVORCQhljsRkYRY7kREEmK5ExFJiOVORCQhljsRkYRY7kREEmK5ExFJiOVO\nRCQhljsR0W3Onj2Lhx56GLNmLcSmTZths9n8HckrvFkHEdEt9fX1SEv7W3R3FwNIhlr9Kp591oB/\n+7c3/JKHN+sgIvKBd999F319TwHIB/AQenr24+23/8PfsbzCciciuiU8PBwhIR23PdKOCRPC/JZn\nNFjuRES3PPXUU4iKOgmV6v8C2AO1+km89to/+TuWV7jmTkR0my+//BI///m/4quvvsETT2Rj1aon\n/ZZlNN3JciciClB8Q5WIiBy4Lff29nasWrUKc+bMgU6nQ3V1tdOYjRs3Ijk5GXq9HnV1dWMSlIiI\nPKdyN6CoqAg5OTkoLy+HzWZDd3e3w36TyYQrV67g8uXLOHv2LAoLC4f8BUBEROPHZbl3dHTg1KlT\n2L9//7eDVSpER0c7jKmoqEB+fj4AICMjA+3t7bBarYiLi3MYt2XLlsHvjUYjjEajD+ITEcnDbDbD\nbDb75LlclntDQwOmTp2KdevW4cKFC5g/fz527doFtVo9OKapqQkJCQmD21qtFhaLxWW5ExGRszsP\nfLdu3er1c7lcc7fZbKitrcULL7yA2tpaREZGYseOHU7j7nw3V1EUrwMREdHouSx3rVYLrVaL9PR0\nAMCqVatQW1vrMEaj0aCxsXFw22KxQKPRjEFUIiLylMtynzZtGhISElBfXw8AOH78OObOneswJjc3\nF2VlZQCA6upqTJ482WlJhoiIxpfbi5guXLiA9evXo6+vD4mJidi7dy/eeecdAEBBQQEA4Ic//CGO\nHDmCyMhI7Nu3D2lpaY4vwouYiIhGjFeoEhFJiFeoEhGRA5Y7EZGEWO5ERBJiuRMRSYjlTkQkIZY7\nEZGEWO5ERBJiuRMRSYjlTkQkIZY7EZGEWO5ERBJiuRMRSYjlTkQkIZY7EZGEWO5ERBJyeYNsIrp7\nXb16FYcPH0Z4eDhWr16NmJgYf0eiEeDNOojISU1NDZ
YufRQ225MICfkG0dHnceHCx4iNjfV3tLsK\nb9ZBRD5VVPT/0N29E3/5yx709r6D1tZsFBf/wt+xaARY7kTkpKWlFYBucLu/X4fr11v9F4hGjOVO\nRE5WrMjCxIn/DOArAJ9Brf4FHnssEx9//DFOnDiBzs5Of0ckN7jmTkRO+vr68PzzRTh48D+hUoXj\n1VdfxnvvHUdd3RcICfkbhIc3oqrqOJKSkvwdVWpccycinwoLC8Nbb5Wgp6cdnZ1WhIWpcP58KG7c\n+CM6O0+htXUjnntuo79jkgtuy33GjBlISUmBwWDAwoULnfabzWZER0fDYDDAYDBg27ZtYxKUiPzn\ns8/+G729y/DXs6cHBh7GlStX/RuKXHJ7nruiKDCbzS7PcV2yZAkqKip8GoyIAkdGRip+85s30dPz\nDwCiMGHCrzF/vsHfscgFj5Zl3K35cD2dSG7PPvss/v7v0xAefi8mTtRg5swz2Lt3t79jkQtu31C9\n//77ER0djdDQUBQUFGDDhg0O+z/88EM88cQT0Gq10Gg0KC4uhk6ncxijKAo2b948uG00GmE0Gn33\nX0FE46KlpQU9PT1ISEhASAjfsvM1s9kMs9k8uL1161avD57dlntzczPi4+PR0tKCrKws7N69G4sX\nLx7c39XVhdDQUKjValRWVqKoqAj19fWOL8KzZYiIRmxMz5aJj48HAEydOhV5eXmoqalx2B8VFQW1\nWg0AyM7ORn9/P9ra2rwKQ0REvuGy3Ht6etDV1QUA6O7uxtGjRzFv3jyHMVardfA3S01NDYQQ/IAh\nIiI/c3m2jNVqRV5eHgDAZrNh7dq1WL58OUpLSwEABQUFKC8vR0lJCVQqFdRqNQ4cODD2qYkkdPHi\nRZw7dw7f/e53kZWVBUVR/B2JghivUCUKAL/97X9iw4YfQ1GWQ1Hq8MgjaTh4cD8L/i43mu5kuRP5\nmd1ux6RJMbh58zSAeQBuIjIyDYcO7UZmZqa/45Ef8eMHiIJYT08PbLZ+AA/ceiQCipKC5uZmf8ai\nIMdyJ/KzqKgoTJ+eDEX5VwACwCcYGPgA6enp/o5GQYzlThQAjh79LyQn/wYhIeGIjFyOsrI9mDVr\nlr9jURDjmjtRAOnt7UVERATfSCUAfEOViEhKfEOViIgcsNyJiCTEcicikhDLnYhIQix3IiIJsdyJ\niCTEcicikhDLnYhIQix3IiIJsdyJiCTEcicikhDLnYhIQix3IiIJsdyJiCTEcicikhDLnYhIQix3\nIiIJuS33GTNmICUlBQaDAQsXLhxyzMaNG5GcnAy9Xo+6ujqfhyQiopFRuRugKArMZjNiYmKG3G8y\nmXDlyhVcvnwZZ8+eRWFhIaqrq30elIiIPOfRsoyre/hVVFQgPz8fAJCRkYH29nZYrVbfpCMiIq94\ndOS+bNkyhIaGoqCgABs2bHDY39TUhISEhMFtrVYLi8WCuLg4h3FbtmwZ/N5oNMJoNI4uORGRZMxm\nM8xms0+ey225nzlzBvHx8WhpaUFWVhZmz56NxYsXO4y588heURSn57m93ImIyNmdB75bt271+rnc\nLsvEx8cDAKZOnYq8vDzU1NQ47NdoNGhsbBzctlgs0Gg0XgciIqLRc1nuPT096OrqAgB0d3fj6NGj\nmDdvnsOY3NxclJWVAQCqq6sxefJkpyUZIiIaXy6XZaxWK/Ly8gAANpsNa9euxfLly1FaWgoAKCgo\nQE5ODkwmE5KSkhAZGYl9+/aNfWoiInJJEa5OhfHViyiKyzNuiIjI2Wi6k1eoEhFJiOVO5GNffvkl\nKisr3V6t/cUXX2D58idw//2pWL36WbS1tY1TQrobcFmGyIdOnDiBxx9fA5XKgP7+z/H004/jzTd3\nOZ0efOPGDcycmYqvvnoWdns2wsL2Ys6cT1FbewohITzmom+NpjtZ7kQ+IoTAlCkafPPNbwAsBdCF\nyMgFqKgowdKlSx
3Gnjx5EitXvorOzqpbjwxg4kQNPvvsLO69997xjk4BimvuRAGgr68PHR0tAP7u\n1iNRAB5EQ0OD09iIiAgMDHQAsN96pBcDAzcRHh4+PmFJeix3Ih8JDw9HQsJMAHtvPXINQhxDamqq\n09j09HTMnTsNERGrAJRArc7BypUreY0I+QyXZYh86OLFi8jMXIGurj7YbB34+c93oKjoB0OO7e3t\nxRtv7MKlS1fx0EMG/OM/FiA0NHScE1Mg45o7UQCx2WxoamrClClTMGnSJH/HoSDGcicikhDfUCUi\nIgcsdyIiCbHciYgkxHInIpIQy52ISEIsdyIiCbHciYgkxHInIpIQy52ISEIsdyIiCbHciYgkxHIn\nIpIQy52ISEIsdyIiCXlU7na7HQaDAStWrHDaZzabER0dDYPBAIPBgG3btvk8JAWnnp4enD59GufP\nn4fdbnf/B4jIZ1SeDNq1axd0Oh26urqG3L9kyRJUVFT4NBgFN4vFggcfzERn5z0YGOhESsp0nDhR\ngYiICH9HI7oruD1yt1gsMJlMWL9+/bAfGs8bcdCdnn/+/6C5+Sl0dp7DjRuXUFenxhtv7PJ3LKK7\nhtsj9xdffBE7d+5EZ2fnkPsVRUFVVRX0ej00Gg2Ki4uh0+mcxm3ZsmXwe6PRCKPR6HVoCnyff34Z\ndvs/3doKRW/vI/jTn877NRNRoDObzTCbzT55LpflfvjwYcTGxsJgMAz7gmlpaWhsbIRarUZlZSVW\nrlyJ+vp6p3G3lzvJLy0tBY2N+9HfPx/ATajVB7Fw4Up/xyIKaHce+G7dutXr53J5D9Wf/OQnePvt\nt6FSqXDz5k10dnbiySefRFlZ2bBPeN999+GTTz5BTEzM/74I76F612ltbYXR+CgaGr6E3d6DnJxH\ncPDgfoSGhvo7GlHQGJcbZH/44YcoLi7GH/7wB4fHrVYrYmNjoSgKampqsHr1aly7ds1nASl42e12\nXLt2DeHh4dBqtf6OQxR0RtOdHp0tc/sLAUBpaSkAoKCgAOXl5SgpKYFKpYJarcaBAwe8CkLyCQ0N\nRWJiotd/vqOjA0eOHIEQAsuXL3f41yARuebxkfuoXoRH7jRC169fR1raInR1zQIQiokTL+DcuY8w\nffp0f0cjGjej6U5eoUoB6dVXf4qWljzcuPEebtyoQFvbc3jppX/2dyyioMFyp4D0xRfNsNkWDm7b\n7QvxP//T7MdERMGF5U4BKSvrb6FW7wbQAeAGJk78BbKyFvk7FlHQYLlTQHrppR9jzZp5CA2NRWjo\nFDz+uAabN2/ydyyioME3VCmg9ff3QwiBsLAwf0chGnfjdiok0XibMGGCvyMQBSUuyxARSYjlTkQk\nIZY7EZGEWO5ERBJiuRMRSYjlTkQkIZY7EZGEWO5ERBJiuRMRSYjlTkQkIZY7EZGEWO5ERBJiuRMR\nSYjlTkQkIZY7EZGEWO5ERBJiuRMRScijcrfb7TAYDFixYsWQ+zdu3Ijk5GTo9XrU1dX5NCAREY2c\nR+W+a9cu6HQ6KIritM9kMuHKlSu4fPky3nzzTRQWFvo8JBERjYzbcrdYLDCZTFi/fv2QN2qtqKhA\nfn4+ACAjIwPt7e2wWq2+T0pERB5ze4PsF198ETt37kRnZ+eQ+5uampCQkDC4rdVqYbFYEBcX5zBu\ny5Ytg98bjUYYjUbvEhMRScpsNsNsNvvkuVyW++HDhxEbGwuDweDyBe88oh9q+eb2ciciImd3Hvhu\n3brV6+dyuSxTVVWFiooK3HfffXjqqafwwQcf4Pvf/77DGI1Gg8bGxsFti8UCjUbjdSAiIho9l+W+\nfft2NDY2oqGhAQcOHMDSpUtRVlbmMCY3N3fwserqakyePNlpSYaIiMaX2zX32/11uaW0tBQAUFBQ\ngJycHJhMJiQlJSEyMhL79u3zfUoiIhoRRQx1CoyvX0RRhjzThoiIhjea7uQVqkRE
EmK5ExFJiOVO\nRCQhljsRkYRY7kREEmK5ExFJiOVORCQhljsRkYRY7kREEmK5ExFJiOVORCQhlnsQs9ls+Oyzz3Dt\n2jV+dg8ROWC5B6mvvvoKDzyQgQULsjFnzkI8+eT3YLfb/R2LiAIEyz1IbdjwY1y9akR393/j5s0v\n8P77X6KkZI+/YxFRgGC5B6kLF/4Em+17ABQAE9HTswrnzv3R37GIKECw3IPU7NnJCA09dGurHxMn\nmjBv3ky/ZiKiwMGbdQQpi8WChx5aho4ONez2DqSnz8b77/8XwsLC/B2NiHxkNN3Jcg9ivb29+PTT\nTxEREQG9Xo+QEP5DjEgmLHciIgnxNntEROSA5U5EJCGWOxGRhFjuREQSclnuN2/eREZGBlJTU6HT\n6bBp0yanMWazGdHR0TAYDDAYDNi2bduYhSUiIs+oXO2MiIjAyZMnoVarYbPZsGjRIpw+fRqLFi1y\nGLdkyRJUVFSMaVAiIvKc22UZtVoNAOjr64PdbkdMTIzTGJ7mSEQUWFweuQPAwMAA0tLScPXqVRQW\nFkKn0znsVxQFVVVV0Ov10Gg0KC4udhoDAFu2bBn83mg0wmg0jjo8EZFMzGYzzGazT57L44uYOjo6\n8PDDD2PHjh0OxdzV1YXQ0FCo1WpUVlaiqKgI9fX1ji8SpBcxCSHwu9/9Dn/8458wc2Yy1q5dy6tA\niWjcjMtFTNHR0Xj00Udx/vx5h8ejoqIGl26ys7PR39+PtrY2r8IEmuef34jnnvsXbNumoLDw37F6\ndX5Q/pIioruPy3L/+uuv0d7eDuDbzzE5duwYDAaDwxir1TpYeDU1NRBCDLkuH2wsFgvefvu36O7+\nEMBWdHd/gMpKMy5evOjvaEREbrlcc29ubkZ+fj4GBgYwMDCAZ555BpmZmSgtLQUAFBQUoLy8HCUl\nJVCpVFCr1Thw4MC4BB9rnZ2dmDBhCv7yl3tuPTIRKlU8Ojo6/JqLiMgT/OCwYfT19SEpKQVNTesw\nMPA9KMohTJnyMzQ0XMSkSZP8HY+I7gL84LAxEBYWho8+OoKFC4/jnnvmIzX1AE6dep/FTkRBgUfu\nREQBikfuRETkgOVORCQhljsRkYRY7kREEmK5ExFJiOVORCQhljsRkYRY7kREEmK5ExFJiOVORCQh\nljsRkYRY7kREEmK5ExFJiOVORCQhljsRkYRY7kREEmK5ExFJiOVORCQhljsRkYRY7kREEmK5e8Bs\nNvs7wqgEc/5gzg4wv78Fe/7RcFnuN2/eREZGBlJTU6HT6bBp06Yhx23cuBHJycnQ6/Woq6sbk6D+\nFOw/IMGcP5izA8zvb8GefzRUrnZGRETg5MmTUKvVsNlsWLRoEU6fPo1FixYNjjGZTLhy5QouX76M\ns2fPorCwENXV1WMenIiIhud2WUatVgMA+vr6YLfbERMT47C/oqIC+fn5AICMjAy0t7fDarWOQVQi\nIvKYcMNutwu9Xi8mTZokXn75Zaf9jz32mDhz5szgdmZmpjh//rzDGAD84he/+MUvL7685XJZBgBC\nQkLw6aefoqOjAw8//DDMZjOMRqPDmG/7+38piuJyPxERjS2Pz5aJjo7Go48+ivPnzzs8rtFo0NjY\nOLhtsVig0Wh8l5CIiEbMZbl//fXXaG9vBwD09vbi2LFjMBgMDmNyc3NRVlYGAKiursbkyZMRFxc3\nRnGJiMgTLpdlmpubkZ+fj4GBAQwMDOCZZ55BZmYmSktLAQAFBQXIycmByWRCUlISIiMjsW/fvnEJ\nTkRELni9Wu/CSy+9JGbPni1SUlJEXl6eaG9vH3JcZWWlmDVrlkhKShI7duwYiyheOXjwoNDpdCIk\nJER88sknw46bPn26mDdvnkhNTRXp6enjmNA1T/MH6vy3traKZcuWieTkZJGVlSW++eabIccF2vx7\nMp8/+tGPRFJSkkhJSRG1tbXjnHB47rKfPHlS
3HPPPSI1NVWkpqaKn/70p35IObR169aJ2NhY8cAD\nDww7JlDnXQj3+b2d+zEp96NHjwq73S6EEOKVV14Rr7zyitMYm80mEhMTRUNDg+jr6xN6vV5cunRp\nLOKM2J///Gfx+eefC6PR6LIcZ8yYIVpbW8cxmWc8yR/I8//yyy+Ln/3sZ0IIIXbs2DHkz48QgTX/\nnszne++9J7Kzs4UQQlRXV4uMjAx/RHXiSfaTJ0+KFStW+Cmhax999JGora0dthwDdd7/yl1+b+d+\nTD5+ICsrCyEh3z51RkYGLBaL05iamhokJSVhxowZmDBhAtasWYNDhw6NRZwRmz17NmbOnOnRWBGA\nZwJ5kj+Q5//2ayfy8/Px7rvvDjs2UObfk/kM1GtCPP1ZCJS5vtPixYvxne98Z9j9gTrvf+UuP+Dd\n3I/5Z8vs3bsXOTk5To83NTUhISFhcFur1aKpqWms4/iUoihYtmwZFixYgF/96lf+jjMigTz/Vqt1\n8E35uLi4Yf8iBtL8ezKfQ40Z6sBnvHmSXVEUVFVVQa/XIycnB5cuXRrvmF4L1Hn3lLdz7/Y89+Fk\nZWXh+vXrTo9v374dK1asAAC8/vrrCAsLw9NPPz1kYH/yJL87Z86cQXx8PFpaWpCVlYXZs2dj8eLF\nvo46pNHmD9T5f/311x22FUUZNqs/5/9Ons7nnUdg/v7/4GmGtLQ0NDY2Qq1Wo7KyEitXrkR9ff04\npPONQJx3T3k7916X+7Fjx1zuf+utt2AymXDixIkh9995fnxjYyO0Wq23cUbMXX5PxMfHAwCmTp2K\nvLw81NTUjFu5jDZ/IM9/XFwcrl+/jmnTpqG5uRmxsbFDjvPn/N/Jk/kM1GtCPMkeFRU1+H12djZe\neOEFtLW1OX0cSSAK1Hn3lLdzPybLMkeOHMHOnTtx6NAhREREDDlmwYIFuHz5Mq5du4a+vj688847\nyM3NHYs4ozLcWldPTw+6uroAAN3d3Th69CjmzZs3ntE8Mlz+QJ7/3Nxc7N+/HwCwf/9+rFy50mlM\noM2/J/MZqNeEeJLdarUO/izV1NRACBEUxQ4E7rx7yuu59+bdXXeSkpLEvffeO3jqTmFhoRBCiKam\nJpGTkzM4zmQyiZkzZ4rExESxffv2sYjild///vdCq9WKiIgIERcXJx555BEhhGP+q1evCr1eL/R6\nvZg7d27Q5RcicOe/tbVVZGZmOp0KGejzP9R87tmzR+zZs2dwzA9+8AORmJgoUlJSXJ6JNd7cZf/l\nL38p5s6dK/R6vXjwwQfFxx9/7M+4DtasWSPi4+PFhAkThFarFb/+9a+DZt6FcJ/f27lXhAjQt8CJ\niMhrvBMTEZGEWO5ERBJiuRMRSYjlTkQkIZY7EZGEWO5ERBL6/96hRNFIJoFKAAAAAElFTkSuQmCC\n",
"text": [
"<matplotlib.figure.Figure at 0x11072b510>"
]
}
],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scipy.stats.pearsonr( aOne, aCorrelatedWithOne )[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 16,
"text": [
"0.98207184801233849"
]
}
],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scipy.spatial.distance.euclidean( aOne, aCorrelatedWithOne )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 17,
"text": [
"15.793317833539311"
]
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's investigate how Python allows us to actually compute these similarity measures."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Manhattan distance\n",
"\n",
"Perhaps the simplest pairwise similarity score to compute is the **[Manhattan distance](http://en.wikipedia.org/wiki/Manhattan_distance)**, also known as the **L1 distance** (the L1 norm of the difference between the two vectors) or **city block distance**. Visually, the Manhattan distance between two vectors is the number of steps along a grid it would take to travel between them (as opposed to the length of a straight-line path, as is computed by Euclidean distance):\n",
"\n",
"<center><img src=\"http://upload.wikimedia.org/wikipedia/commons/thumb/0/08/Manhattan_distance.svg/200px-Manhattan_distance.svg.png\"/></center>\n",
"\n",
"The Manhattan distance between these two points is 12, since every shortest path along the grid requires exactly 12 steps. This is how you might walk between locations in a city - hence the name! In plain English, the Manhattan distance between two vectors is the *sum of the absolute values of the differences of their elements*, which translates into the formula:\n",
"\n",
"> $$MD(x, y) = \\sum_i{|x_i - y_i|}$$\n",
"\n",
"Let's think about this like Python would for a second, translating from the formula. This means that Manhattan distance is:\n",
"\n",
"* A function of two input arguments that returns a continuous value.\n",
"\n",
"* Both inputs are lists that must be of the same length.\n",
"\n",
"* To compute the function's result, we must calculate a sum.\n",
"\n",
"* Each element in that sum is equal to an absolute value.\n",
"\n",
"* The contents of that absolute value are the difference between one pair of list elements.\n",
"\n",
"We can translate each of these steps into Python. First, let's define a function of two input arguments:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# \"pass\" is a special Python keyword that indicates an empty block\n",
"\n",
"def ManhattanDistance( first_argument, second_argument ):\n",
"\n",
"    pass"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice all of the syntactical bells and whistles that Python uses to *define* a function: it starts with the `def` keyword, followed by its name, followed by zero or more arguments between parentheses, separated by commas. Its body is a *block*, which is introduced by a colon and consists of one or more indented lines. Although Python itself doesn't care, we can use our variable naming convention to specify the type of the arguments:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"\n",
"    pass"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And since we know we'll have to calculate a continuously valued sum as the return value, we can create it, initialize it to zero, and return it at the end of the function:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"\n",
"    dReturnSum = 0\n",
"    return dReturnSum"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that this is already completely valid Python, which means that we can call the function (with two empty lists as dummy arguments) to return the value zero:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ManhattanDistance( [], [] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 21,
"text": [
"0"
]
}
],
"prompt_number": 21
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Even though Python doesn't know that the function requires lists as inputs, it does know that it requires exactly two arguments, and it'll complain if we break that rule:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ManhattanDistance( \"not a list\", 2 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 22,
"text": [
"0"
]
}
],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ManhattanDistance( \"only one argument\" )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "ManhattanDistance() takes exactly 2 arguments (1 given)",
"output_type": "pyerr",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-23-04a4d40cbd3c>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mManhattanDistance\u001b[0m\u001b[0;34m(\u001b[0m \u001b[0;34m\"only one argument\"\u001b[0m \u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: ManhattanDistance() takes exactly 2 arguments (1 given)"
]
}
],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ManhattanDistance( \"too\", \"many\", \"arguments\" )"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although the function has valid *syntax*, making it behave with the *semantics* we'd like is a bit trickier. We know that we'll have to do *something* with each element of both lists. Since they're required to be of the same length (a precondition that the function itself does not check), this is equivalent to doing something at each position of the first list. \"Each element of the list\" sounds like a `for` loop to me:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"\n",
"    dReturnSum = 0\n",
"    for iIndex in range( len( aListOne ) ):\n",
"        pass\n",
"\n",
"    return dReturnSum"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can always add `print` statements to your code to \"see\" what Python is doing. For example, we can ask Python what the lengths of our input lists are:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"\n",
"    dReturnSum = 0\n",
"    print( len( aListOne ) )\n",
"    print( len( aListTwo ) )\n",
"    for iIndex in range( len( aListOne ) ):\n",
"        pass\n",
"\n",
"    return dReturnSum\n",
"\n",
"ManhattanDistance( [\"three\", \"element\", \"list\"], [1, 2, 3] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3\n",
"3\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 24,
"text": [
"0"
]
}
],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ManhattanDistance( [\"a\", \"four\", \"element\", \"list\"], [4, 3, 2, 1] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4\n",
"4\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 25,
"text": [
"0"
]
}
],
"prompt_number": 25
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or, even though we don't yet know exactly what our loop should be computing, we can watch the list index count up through each element in an input list:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"\n",
"    dReturnSum = 0\n",
"    for iIndex in range( len( aListOne ) ):\n",
"        print( iIndex )\n",
"\n",
"    return dReturnSum\n",
"\n",
"ManhattanDistance( [\"three\", \"element\", \"list\"], [1, 2, 3] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0\n",
"1\n",
"2\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 26,
"text": [
"0"
]
}
],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ManhattanDistance( [\"a\", \"four\", \"element\", \"list\"], [4, 3, 2, 1] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0\n",
"1\n",
"2\n",
"3\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 27,
"text": [
"0"
]
}
],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Why don't we see anything except the return value for an empty list?\n",
"ManhattanDistance( [], [] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 28,
"text": [
"0"
]
}
],
"prompt_number": 28
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But it's actually not so hard to tell Python what we want to compute about each element pair in our lists. After all, it's easy to retrieve the two corresponding elements at a particular list index:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"\n",
"    dReturnSum = 0\n",
"    for iIndex in range( len( aListOne ) ):\n",
"        dElementFromListOne = aListOne[iIndex]\n",
"        dElementFromListTwo = aListTwo[iIndex]\n",
"        print( [dElementFromListOne, dElementFromListTwo] )\n",
"\n",
"    return dReturnSum\n",
"\n",
"ManhattanDistance( [\"three\", \"element\", \"list\"], [1, 2, 3] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"['three', 1]\n",
"['element', 2]\n",
"['list', 3]\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 29,
"text": [
"0"
]
}
],
"prompt_number": 29
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And if we start passing in arguments that contain numbers like they're supposed to, we can compute each pair's difference:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"\n",
"    dReturnSum = 0\n",
"    for iIndex in range( len( aListOne ) ):\n",
"        dElementFromListOne = aListOne[iIndex]\n",
"        dElementFromListTwo = aListTwo[iIndex]\n",
"        dDifference = dElementFromListOne - dElementFromListTwo\n",
"        print( dDifference )\n",
"\n",
"    return dReturnSum\n",
"\n",
"ManhattanDistance( [1, 2, 3], [4, 4.5, 5] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"-3\n",
"-2.5\n",
"-2\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 30,
"text": [
"0"
]
}
],
"prompt_number": 30
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's not exactly what we want, though - we need to take the absolute value of the difference:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"\n",
"    dReturnSum = 0\n",
"    for iIndex in range( len( aListOne ) ):\n",
"        dElementFromListOne = aListOne[iIndex]\n",
"        dElementFromListTwo = aListTwo[iIndex]\n",
"        dDifference = dElementFromListOne - dElementFromListTwo\n",
"        dAbsDiff = abs( dDifference )\n",
"        print( dAbsDiff )\n",
"\n",
"    return dReturnSum\n",
"\n",
"ManhattanDistance( [1, 2, 3], [4, 4.5, 5] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3\n",
"2.5\n",
"2\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 31,
"text": [
"0"
]
}
],
"prompt_number": 31
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And finally, we want to remember the sum of these absolute differences, not just calculate each one in isolation:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"\n",
"    dReturnSum = 0\n",
"    for iIndex in range( len( aListOne ) ):\n",
"        dElementFromListOne = aListOne[iIndex]\n",
"        dElementFromListTwo = aListTwo[iIndex]\n",
"        dDifference = dElementFromListOne - dElementFromListTwo\n",
"        dAbsDiff = abs( dDifference )\n",
"        dReturnSum = dReturnSum + dAbsDiff\n",
"        print( dReturnSum )\n",
"\n",
"    return dReturnSum\n",
"\n",
"ManhattanDistance( [1, 2, 3], [4, 4.5, 5] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3\n",
"5.5\n",
"7.5\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 32,
"text": [
"7.5"
]
}
],
"prompt_number": 32
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hey, all of a sudden that looks right! SciPy's built-in version even agrees with us:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scipy.spatial.distance.cityblock( [1, 2, 3], [4, 4.5, 5] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 33,
"text": [
"7.5"
]
}
],
"prompt_number": 33
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And as long as they contain numbers and are of the same length, we can throw any two vectors we'd like at this function:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ManhattanDistance( aOne, aTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1.97097395723\n",
"2.57674478186\n",
"3.50311723578\n",
"3.88638981674\n",
"4.81949652434\n",
"5.49866607781\n",
"5.823535035\n",
"5.83248201875\n",
"7.08041224695\n",
"7.42185435878\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 34,
"text": [
"7.421854358780827"
]
}
],
"prompt_number": 34
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scipy.spatial.distance.cityblock( aOne, aTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 35,
"text": [
"7.421854358780827"
]
}
],
"prompt_number": 35
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we take out our `print` statement to prevent excessive output, this is true even for very large vectors:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"\n",
"    dReturnSum = 0\n",
"    for iIndex in range( len( aListOne ) ):\n",
"        dElementFromListOne = aListOne[iIndex]\n",
"        dElementFromListTwo = aListTwo[iIndex]\n",
"        dDifference = dElementFromListOne - dElementFromListTwo\n",
"        dAbsDiff = abs( dDifference )\n",
"        dReturnSum = dReturnSum + dAbsDiff\n",
"\n",
"    return dReturnSum"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 36
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aHugeOne = standard_normal( 1000 )\n",
"print( aHugeOne )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[ 0.04742005 0.57400508 -0.27367586 -0.85702948 0.34246131 -1.98481762\n",
" -0.41611678 -0.18161337 -0.43331365 -0.26819452 0.18896425 0.88595997\n",
" 2.18520712 0.55418266 0.3590706 -0.73412195 -0.35401826 -1.56697823\n",
" 1.5647707 2.36693953 -0.86798789 1.50393854 0.12043649 0.97430954\n",
" 0.23870596 1.62641172 0.74616124 1.75143545 -0.06376632 1.14580637\n",
" 1.24674321 0.56051055 -0.25092774 -1.0418408 2.0175478 0.02572649\n",
" -0.39238756 0.73040191 1.49674503 -0.52709693 -0.68098966 -0.29406799\n",
" -2.17217621 0.602795 -0.07958609 -0.87104356 1.82708345 0.74264256\n",
" 0.46625441 -1.59814086 0.44658281 -1.01885764 1.09015436 1.8581048\n",
" -0.07681018 -0.7014339 -0.28218014 1.13540885 -0.30955816 0.30725744\n",
" 0.19844439 0.85796436 -0.50188492 -0.72958526 0.28044421 0.76544876\n",
" -1.63354333 0.10160297 -1.42012678 0.98476933 -0.71635204 -0.14792598\n",
" -0.46418065 1.44389756 1.6256733 -0.23300842 0.45007031 -1.04152085\n",
" 0.6333404 0.10755496 -0.71789185 0.94532341 0.15859704 1.70220624\n",
" -0.36224368 -0.37101814 0.13823027 -0.54816086 0.88950021 -1.57174401\n",
" 1.70387886 -1.36484487 -1.9016048 0.39297136 -0.69541267 0.40117329\n",
" 0.17091817 1.11807321 -0.43682307 0.36721417 -0.86407862 0.2749211\n",
" -1.49917733 0.75529303 -1.22145609 2.39378026 -0.39330553 0.17638119\n",
" -0.49817596 -0.19311772 -0.26883572 0.31079103 0.97057423 0.39860299\n",
" 0.23923581 0.85758594 -2.14713568 1.16182092 0.50516786 0.10136948\n",
" 0.19398883 -0.0334218 -0.14002341 -0.84496706 0.61438872 2.29874078\n",
" 0.13180309 0.34932085 -0.78195252 -1.05589232 0.36697903 -0.63030146\n",
" 0.43016666 1.29616365 0.0067651 1.11693455 0.1839555 -0.8665965\n",
" -0.48355991 -1.69205935 -1.29805974 -0.45507581 3.23706218 2.73841822\n",
" 0.58942424 1.58241088 0.1923382 -0.17104597 0.59156394 0.01624954\n",
" -0.23364195 0.18225109 0.34008518 -0.32017902 0.19526619 -0.4944087\n",
" -0.98539239 1.07371323 -0.72808684 1.20275673 0.18528227 0.7374933\n",
" 0.80752953 0.09963776 -1.83319032 0.176136 0.09848892 -0.17359568\n",
" -0.43944097 -0.96790893 1.06219145 0.33216473 0.51959286 -1.12713195\n",
" -1.07803157 0.08624305 0.81594232 0.21760287 0.09096145 0.40584635\n",
" 0.13559349 0.40559953 0.62556769 -0.13350865 1.60597115 -0.52921547\n",
" 1.75942535 0.29135025 -0.86011784 -1.51962304 -0.51832964 1.64672365\n",
" 0.4549815 -1.52719787 0.34234613 0.24621191 2.13400239 -0.18331773\n",
" -0.7331141 0.16186047 -0.88866296 0.32611393 -0.6366925 0.7185728\n",
" 1.24254965 0.70348224 -0.02081886 -0.55441922 -0.05448582 0.19090875\n",
" 0.61795762 -1.51262264 -0.65828007 0.46667056 -0.07763719 -1.4119087\n",
" -0.55990683 0.56770124 -1.91268477 0.97063386 -0.22378832 0.0693208\n",
" 1.34259795 -0.86788945 1.70549557 -0.68385545 -1.98434822 -1.6455808\n",
" -1.06762993 -0.39243525 -0.8876981 2.29925617 -0.73135546 -0.05625998\n",
" -0.36134817 -0.67856726 0.8463587 -0.14298856 -0.92602444 1.25247678\n",
" -0.05933687 -1.29541268 -1.30050845 0.64200046 -0.70865192 0.88346704\n",
" 0.65130229 0.1133544 2.41468982 -0.05755936 -0.48398067 0.70636843\n",
" -0.14791966 1.08780075 -1.6040276 -0.01613935 -0.885035 0.08708079\n",
" 1.99726204 -0.67907533 -0.60248714 0.32771321 -0.49803202 -0.96364321\n",
" 2.98012892 2.04368494 0.63081518 -0.62602163 0.02875186 -0.21557922\n",
" -0.08661063 0.41507904 -1.53428064 0.19331321 0.5550818 -1.00601844\n",
" 0.5951394 -0.90706423 -0.21583309 0.74441945 -2.43569204 -1.11265161\n",
" -0.10477028 0.77431512 -1.46631278 -0.56387313 -1.62715675 -0.97207001\n",
" -1.13507762 0.56283746 -1.57564004 0.22097946 -0.46956966 0.80284373\n",
" -1.05864003 -1.08971404 -0.88654388 1.14971469 -0.00774497 0.00525554\n",
" -0.98037601 0.10452743 2.26792089 1.5498696 1.22133654 0.18400587\n",
" 0.16075105 0.95328275 -2.45166154 -0.05086994 0.78108209 -0.11442729\n",
" -0.61174338 0.09412205 0.83157717 -0.46007539 -1.81849593 0.44441731\n",
" 0.83735093 0.49106047 -1.09255013 -0.66045876 -1.76901102 -0.69304312\n",
" 0.278254 -1.40589343 0.4034317 -0.68889777 0.37262217 -0.76174552\n",
" 0.25389543 -0.3547181 -1.8652101 0.77338594 1.49099868 0.24388489\n",
" 0.03699143 0.4420177 -0.5498402 -0.67958063 -0.10334337 1.69266209\n",
" -0.49483226 -0.25998228 -1.54800465 0.58034353 -1.60467459 0.0450156\n",
" -1.83948969 -1.71100499 1.26945032 -0.13955009 1.40939772 1.03950819\n",
" -2.02093565 -1.63062715 0.85275181 2.17913606 -1.09415927 -1.37241725\n",
" -0.01632905 -0.8246879 -1.73015584 -0.32755354 0.14087123 -0.69248016\n",
" -0.70022457 -1.65062894 0.58174528 1.19043835 -3.1204179 0.10421369\n",
" 0.02106392 1.28337674 0.46412532 0.00605815 0.50868927 2.48295642\n",
" -0.41931529 -1.31437592 0.24788381 0.85615681 0.28949469 0.14142192\n",
" 0.72456449 0.62995081 1.87928877 1.42476357 -0.27686624 0.67668074\n",
" -0.89845827 -0.38572874 0.80180714 0.72553569 -0.49347101 0.30976984\n",
" 0.78254348 -0.05074371 1.02345968 -1.40641892 1.19114999 -0.1741527\n",
" 0.56019741 0.63883045 1.92055695 0.9003111 1.63033797 0.24888082\n",
" -1.64379629 -1.08798222 -0.9868473 0.33232933 0.69908875 -2.62236054\n",
" -0.73007788 3.47534712 -1.00225392 0.03750563 -0.16038263 0.16550994\n",
" 1.34058866 0.42401525 0.62438113 0.39285362 -0.47349901 0.7321057\n",
" -0.70401504 -0.1697354 0.97394013 0.61244182 0.29379528 0.84567234\n",
" -1.20480759 0.40890639 2.86733068 -0.02757716 0.16963052 1.48486844\n",
" 1.09072934 -1.16192382 -1.92669163 1.24642906 0.73782586 -0.39781142\n",
" 1.19199939 0.37919335 0.74301557 0.67991709 -0.44243208 -2.12394016\n",
" 1.5395241 -0.98715089 -0.33814605 0.21167961 -1.24920373 -0.1650347\n",
" 0.20936001 0.74961795 1.45388009 0.61801033 -0.58167842 0.3227964\n",
" -3.02457119 -0.53443652 1.17309402 -0.15938502 -1.11711756 0.70885364\n",
" -0.16459794 -0.30685091 1.23825581 -1.10753522 -0.66061269 1.29086814\n",
" 0.32083386 1.70208128 0.69433833 -0.20837452 1.22352189 -0.48993192\n",
" 1.15690379 -0.06186818 -2.26294815 -0.10015766 0.52643372 1.43663088\n",
" 1.25602947 -0.77723654 -1.12117284 -1.4406927 0.24010051 0.3831462\n",
" 0.42238631 1.59410386 1.8693582 0.87584654 -0.04181144 0.38370203\n",
" -0.51798357 0.81473843 0.61593878 1.17213114 0.75944854 -0.6858496\n",
" 1.80153205 0.58050984 0.94937935 0.46029442 0.19390924 -1.45408195\n",
" 0.48257247 0.80473137 -0.67038179 0.1353026 0.50966153 -0.43520012\n",
" -0.37342026 1.47774579 -0.09336419 1.01837292 -0.5276257 0.1136697\n",
" -0.87280112 -0.03469607 0.46696442 0.07051096 0.08919989 0.57139414\n",
" -0.38722805 -0.10706207 0.40662579 1.98668868 -1.99006106 0.83056834\n",
" -0.64128287 -0.23248493 0.67221938 1.09282197 -1.44494217 1.20016344\n",
" -0.05372733 -0.45629655 -0.76530423 -1.11149847 -0.55319286 0.20313416\n",
" -0.59840696 0.62648747 0.37781396 -1.41175188 0.44507594 -0.92417606\n",
" -0.67639387 -1.17408923 2.01858596 0.4111741 -1.40581659 -0.55731413\n",
" 0.11475933 0.97333733 0.03136686 0.6696583 -0.25781387 -0.64494996\n",
" -0.11388615 -0.39924721 0.48176801 0.30308378 -0.48178473 1.12677996\n",
" -0.177552 -1.16489842 1.89031281 0.47441672 -0.9924946 0.38428794\n",
" -2.2918928 0.18932457 0.40061968 -0.29462037 -1.61704062 -0.73660339\n",
" -0.42519113 -0.74714423 1.92450476 -0.78779993 -0.07476298 -0.81320971\n",
" 0.31836997 -0.28709351 0.19780651 -0.83606478 -2.03211689 0.99080259\n",
" 0.36543224 -0.14491917 1.28668821 1.17019842 -0.04571117 -1.17450812\n",
" -0.82038515 0.36013671 -0.70561095 1.64420524 -0.56384152 -0.26394597\n",
" -0.08318512 1.67394672 0.2125625 1.35521929 0.95551649 -0.9471054\n",
" 1.49016638 1.60918104 1.43162201 -1.8800063 0.1297648 0.30732241\n",
" 0.75534157 -0.23002525 0.03794654 0.75504893 1.30353781 -0.71554722\n",
" 0.78739722 -0.70082447 0.82935951 0.49605304 -0.04035159 -0.18014564\n",
" -0.28258528 0.34327995 0.97979588 -0.41392033 1.17841949 1.44344648\n",
" -0.58582339 0.30798406 0.01022317 -0.83033029 0.4249539 0.2105601\n",
" -0.47083774 1.10180994 0.06323599 -0.25382229 -1.19906731 -0.15859909\n",
" 1.34053339 1.12920561 -0.0227069 -0.9791688 -0.8450499 -0.65822185\n",
" 0.3549455 -0.36201239 -1.06395557 1.13311066 0.92749385 -0.80247103\n",
" 1.11504198 -0.25183228 -0.87414974 -1.13100896 2.03420901 0.23937079\n",
" 1.3024334 0.20012948 -0.12600132 0.17157396 0.12203301 0.2148241\n",
" -0.37236093 -0.8546389 0.03749878 0.54253621 0.76052872 0.45755315\n",
" -2.74509883 -0.36939284 0.39322045 0.37519622 -1.07525638 0.95540871\n",
" 1.56342043 1.32832159 1.13827467 -1.25246951 -0.97759402 -0.4502375\n",
" 1.16565225 0.20226149 -1.12077214 0.97512394 1.86256047 0.90477651\n",
" -0.52806244 0.69847784 0.18995492 0.85278624 -0.7529672 -1.86127838\n",
" 0.37443971 -1.29312333 0.40189968 -0.5715199 -1.09939342 -0.89309206\n",
" -0.7668074 -1.75586261 -0.31879494 -1.31597762 0.33986565 -0.28445924\n",
" -0.08181546 0.64478561 -0.21155851 -1.25427522 -0.61759436 1.9640582\n",
" 0.66118485 0.73894115 0.66584544 -0.5731105 -0.01177141 -0.03047473\n",
" -0.05681349 0.27095284 1.17265144 0.31941918 -2.10676239 0.21834184\n",
" 1.01869371 -0.59510594 1.78302711 2.79983409 0.05787143 -0.74541654\n",
" -0.6468804 -1.28328677 1.03744758 -1.13294474 -0.26439124 -0.63769469\n",
" -0.29940959 -1.6958559 -1.87797598 0.45156871 1.72036664 -0.04677623\n",
" 0.36568162 0.14170681 1.81477255 0.73721247 -0.51626273 -0.27381526\n",
" -0.30788022 0.05245051 -0.57826001 -1.29155097 0.01350102 -1.62477206\n",
" -0.55813175 -1.32949699 -0.40727693 -1.1629641 0.69156378 2.27882903\n",
" 1.28904511 -1.5613294 0.34003362 0.60497338 0.09125902 2.00429743\n",
" -0.67076332 -1.06286596 0.32219462 -0.50460464 -0.49450995 -0.33701948\n",
" 1.36053036 -0.12860692 -0.63801799 1.20443283 -1.56799984 -0.65971196\n",
" 0.51566127 1.6905551 -1.47841582 0.77138773 0.45710677 0.38293344\n",
" 0.77184961 1.21070173 -0.85598662 1.26087596 -2.47402462 1.08316464\n",
" -0.07681146 -0.31277607 -1.60250208 -0.06939934 2.64139261 -1.01573083\n",
" 0.55396719 -1.42716687 -1.8833136 -0.02609377 0.70086017 -1.4490615\n",
" 0.67036284 0.97243795 1.17384608 0.32820935 1.6820939 0.36396986\n",
" 0.73518335 1.31322261 -0.27655216 0.41948054 0.04726066 -0.41301468\n",
" 0.81858838 1.40074095 0.73425402 1.59089813 -0.76963249 -0.00982439\n",
" -1.13671874 -1.85703733 -0.82178706 0.19788386 -0.17262463 0.90413429\n",
" 1.65545908 -0.29048495 -0.49132198 1.74176089 1.06698183 1.20636727\n",
" -0.18792082 -0.20236782 1.18878376 0.55398636 0.71120497 0.43183935\n",
" 0.04159535 0.54050078 -0.08280448 -0.40012186 -0.76944639 1.62672841\n",
" -0.14274938 -0.98040083 0.5528291 -0.10394194 1.40286985 0.40146633\n",
" 2.27757931 2.10848571 -0.62859083 -1.18203903 -1.10268568 0.54961667\n",
" 0.65000045 0.4923611 -1.15987772 0.50431628 1.59918339 1.04255461\n",
" -0.70062581 -0.07421602 0.215708 1.3852603 -0.70559319 -0.21165903\n",
" 0.68961234 0.51693851 -0.68292413 0.26431785 -0.67630411 -0.76463447\n",
" 0.10473239 -0.50197568 -0.56789493 -0.08935893 0.6982661 1.89538816\n",
" -0.14382057 -1.03797341 1.44581021 -0.22425698 0.04359213 0.89282972\n",
" 1.4670857 0.9959062 0.32384887 1.13041499 0.06349106 -1.17693984\n",
" 0.15916877 -0.1919054 -0.03113675 -0.10158782 0.32651212 -1.16705005\n",
" -0.40149591 1.04733123 -1.11890607 -0.91891794 -0.35362073 -0.30123889\n",
" -1.35967449 -0.19490134 0.31489901 0.78308536 1.30577444 -0.86972553\n",
" -0.45438025 -1.55447594 -0.45958947 0.21168112 1.90101349 -1.20596903\n",
" -0.06039991 -0.69773659 -0.3659275 -0.42951896 -1.41287056 -0.03492711\n",
" -0.67235853 -1.14304682 -0.95604633 -0.73810524 1.03378603 -0.16184625\n",
" 0.75804282 -0.9064536 0.40040867 0.40676789 1.0981376 -0.76528816\n",
" 0.1308808 0.21845082 0.17317996 0.66277978 1.41108271 -1.27710588\n",
" 0.61600397 -0.9445445 -0.82600319 0.08439313 0.61664268 -0.03229835\n",
" -0.89166003 -1.1863402 -0.73129243 -0.19529695 0.38188084 -1.82142964\n",
" 0.11764922 -0.48236709 -0.08952788 -0.8631671 -0.31939653 -1.2427421\n",
" 0.25540859 1.71338612 0.92880496 0.75021221 1.04789286 -0.08647519\n",
" 0.0862445 0.04763068 -1.34380958 -0.39575074 0.43544061 -1.45153839\n",
" 0.69673424 1.24049031 -0.00543123 0.58364582 -0.28169265 1.45126672\n",
" 0.96845645 -1.74362985 0.49546867 0.79051599 0.19246377 -0.37016452\n",
" 0.43272147 0.9415736 0.04817942 0.35502409 -1.14669879 -1.40125106\n",
" 0.8753509 -0.69240689 -1.24693491 0.54681624]\n"
]
}
],
"prompt_number": 37
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aHugeTwo = standard_normal( len( aHugeOne ) )\n",
"ManhattanDistance( aHugeOne, aHugeTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 38,
"text": [
"1153.5833863779235"
]
}
],
"prompt_number": 38
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scipy.spatial.distance.cityblock( aHugeOne, aHugeTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 39,
"text": [
"1153.5833863779235"
]
}
],
"prompt_number": 39
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That looks like a Manhattan distance to me! As a final step, we can eliminate many of the intermediate steps we've taken for the sake of explanation. Remember, in Python as in algebra, you can always replace a variable with the expression that produced its value without changing the result:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ManhattanDistance( aListOne, aListTwo ):\n",
"    \n",
"    dReturnSum = 0\n",
"    for iIndex in xrange( len( aListOne ) ):\n",
"        dReturnSum += abs( aListOne[iIndex] - aListTwo[iIndex] )\n",
"    \n",
"    return dReturnSum\n",
"\n",
"ManhattanDistance( aHugeOne, aHugeTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 40,
"text": [
"1153.5833863779235"
]
}
],
"prompt_number": 40
},
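{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, the same substitution idea can be pushed one step further: Python's built-in `zip` and `sum` let us drop the index variable entirely. This is only a sketch, and `ManhattanDistanceCompact` is a hypothetical name used for illustration rather than something defined elsewhere in this notebook. It assumes both inputs have the same length; note that `zip` silently stops at the end of the shorter list if they do not:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Sketch only: ManhattanDistanceCompact is an illustrative name\n",
"def ManhattanDistanceCompact( aListOne, aListTwo ):\n",
"    \n",
"    # zip pairs up elements by position; sum adds up the absolute differences\n",
"    return sum( abs( dOne - dTwo ) for dOne, dTwo in zip( aListOne, aListTwo ) )\n",
"\n",
"ManhattanDistanceCompact( [1, 2, 3], [4, 4.5, 5] )"
],
"language": "python",
"metadata": {},
"outputs": []
},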
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Euclidean distance\n",
"\n",
"Having seen this derivation of Manhattan distance, it shouldn't be too much of a stretch to modify our function to compute **[Euclidean distance](http://en.wikipedia.org/wiki/Euclidean_distance)** (equivalently, the **L2 norm** of the difference between the two vectors) instead. Visually, this is the length of the straight line between two vectors (points) in space, i.e. the green line in:\n",
"\n",
"<center><img src=\"http://upload.wikimedia.org/wikipedia/commons/thumb/0/08/Manhattan_distance.svg/200px-Manhattan_distance.svg.png\"/></center>\n",
"\n",
"Mathematically, it's the *square root of the sum of squared differences between two vectors' elements*:\n",
"\n",
"> $$ED(x, y) = \\sqrt{\\sum_i{(x_i - y_i)^2}}$$\n",
"\n",
"This is only barely different from the Manhattan distance:\n",
"\n",
"* A function of two input arguments that returns a continuous value.\n",
"\n",
"* Both inputs are lists that must be of the same length.\n",
"\n",
"* To compute the function's result, we must calculate a sum.\n",
"\n",
"* Each element in that sum is equal to a squared value.\n",
"\n",
"* The value that's squared is the difference between one pair of list elements.\n",
"\n",
"* We take the square root of the sum before returning it.\n",
"\n",
"Let's start with the final version of Manhattan distance as defined above, renaming the function but changing nothing else:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def EuclideanDistance( aListOne, aListTwo ):\n",
"    \n",
"    dReturnSum = 0\n",
"    for iIndex in xrange( len( aListOne ) ):\n",
"        dReturnSum += abs( aListOne[iIndex] - aListTwo[iIndex] )\n",
"    \n",
"    return dReturnSum"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 41
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rather than taking an absolute value, let's square our difference (and print out the *running sum* for visualization's sake):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def EuclideanDistance( aListOne, aListTwo ):\n",
"    \n",
"    dReturnSum = 0\n",
"    for iIndex in xrange( len( aListOne ) ):\n",
"        dReturnSum += ( aListOne[iIndex] - aListTwo[iIndex] )**2\n",
"        print( dReturnSum )\n",
"    \n",
"    return dReturnSum\n",
"\n",
"EuclideanDistance( [1, 2, 3], [4, 4.5, 5] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9\n",
"15.25\n",
"19.25\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 42,
"text": [
"19.25"
]
}
],
"prompt_number": 42
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The only other change we should need is to take the sum's square root before returning it:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def EuclideanDistance( aListOne, aListTwo ):\n",
"    \n",
"    dReturnSum = 0\n",
"    for iIndex in xrange( len( aListOne ) ):\n",
"        dReturnSum += ( aListOne[iIndex] - aListTwo[iIndex] )**2\n",
"    \n",
"    return dReturnSum**0.5\n",
"\n",
"EuclideanDistance( [1, 2, 3], [4, 4.5, 5] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 43,
"text": [
"4.387482193696061"
]
}
],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scipy.spatial.distance.euclidean( [1, 2, 3], [4, 4.5, 5] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 44,
"text": [
"4.3874821936960613"
]
}
],
"prompt_number": 44
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"EuclideanDistance( aHugeOne, aHugeTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 45,
"text": [
"45.827697028097326"
]
}
],
"prompt_number": 45
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scipy.spatial.distance.euclidean( aHugeOne, aHugeTwo )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 46,
"text": [
"45.827697028097326"
]
}
],
"prompt_number": 46
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Not bad!"
]
},
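{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, here is a minimal sketch of the same calculation written with NumPy arrays instead of an explicit loop. It imports `numpy` explicitly so that the cell stands alone. `EuclideanDistanceVectorized` is a hypothetical name chosen for illustration, and the inputs are assumed to be numeric sequences of equal length:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy\n",
"\n",
"# Sketch only: EuclideanDistanceVectorized is an illustrative name\n",
"def EuclideanDistanceVectorized( aListOne, aListTwo ):\n",
"    \n",
"    # Convert to float arrays so that subtraction is applied element by element\n",
"    adDiffs = numpy.asarray( aListOne, dtype = float ) - numpy.asarray( aListTwo, dtype = float )\n",
"    # Square every difference, add them up, and take the square root of the total\n",
"    return numpy.sqrt( numpy.sum( adDiffs**2 ) )\n",
"\n",
"EuclideanDistanceVectorized( [1, 2, 3], [4, 4.5, 5] )"
],
"language": "python",
"metadata": {},
"outputs": []
},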
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pairwise similarity exercises\n",
"\n",
"**1.** Modify the function below to compute the mean of a vector. Note that you can add cells to the IPython Notebook to perform additional tests if needed (use the Insert/Cell Below menu above), and you can add `print` statements to the function to \"see\" what's happening."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def Mean( aList ):\n",
"    \n",
"    dSum = 0.0\n",
"    for iIndex in xrange( len( [] ) ):\n",
"        dSum += 0\n",
"    \n",
"    return ( dSum / len( aList ) )"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 47
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This should result in all of the following comparisons producing `True`:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Mean( [1, 2, 3] ) == 2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 48,
"text": [
"False"
]
}
],
"prompt_number": 48
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Built-in version\n",
"Mean( [1, 2, 3] ) == mean( [1, 2, 3] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 49,
"text": [
"False"
]
}
],
"prompt_number": 49
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Mean( aHugeOne ) == mean( aHugeOne )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 50,
"text": [
"False"
]
}
],
"prompt_number": 50
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**2.** Modify the function below to compute Pearson correlation. Recall that Pearson correlation is closely related to the Euclidean distance between the z-scores of two vectors' values. That is, for each value, you subtract the vector mean and divide by its (population) standard deviation; the correlation is then the average product of the paired z-scores, and the squared Euclidean distance between the two z-scored vectors equals $2n(1 - PC)$ for vectors of length $n$. This is equivalent to the formula:\n",
"\n",
"> $$PC(x, y) = \\frac{\\sum_i{(x_i - \\bar{x})(y_i - \\bar{y})}}{\\sqrt{\\sum_i{(x_i - \\bar{x})^2}}\\sqrt{\\sum_i{(y_i - \\bar{y})^2}}}$$\n",
"\n",
"Modify the function below. The earlier notes about testing, adding new cells, and `print` statements apply here too, and each of the comparisons that follow should produce `True` once the function is complete."
]
},
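{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the link between z-scores and correlation concrete without giving away the exercise, the sketch below relies on NumPy's built-in `numpy.corrcoef` rather than our own function. It checks, for one small example, that the squared Euclidean distance between two z-scored vectors equals $2n(1 - PC)$ when the z-scores use the population standard deviation (the default for `numpy.std`). The helper name `ZScores` is hypothetical and used only here:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy\n",
"\n",
"# Sketch only: ZScores is an illustrative helper name\n",
"def ZScores( aList ):\n",
"    \n",
"    adValues = numpy.asarray( aList, dtype = float )\n",
"    # numpy.std defaults to the population standard deviation (ddof = 0)\n",
"    return ( adValues - adValues.mean( ) ) / adValues.std( )\n",
"\n",
"aOne, aTwo = [1, 2, 3], [4, 4.4, 3.4]\n",
"dCorrelation = numpy.corrcoef( aOne, aTwo )[0, 1]\n",
"dSquaredDistance = numpy.sum( ( ZScores( aOne ) - ZScores( aTwo ) )**2 )\n",
"# The two printed values should agree up to floating point error\n",
"print( dSquaredDistance )\n",
"print( 2 * len( aOne ) * ( 1 - dCorrelation ) )"
],
"language": "python",
"metadata": {},
"outputs": []
},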
{
"cell_type": "code",
"collapsed": false,
"input": [
"def PearsonCorrelation( aOne, aTwo ):\n",
"    \n",
"    dMeanOne = Mean( aOne )\n",
"    dMeanTwo = 0\n",
"    dSumPairs = dSumOneSquared = dSumTwoSquared = 0.0\n",
"    for iIndex in xrange( len( aOne ) ):\n",
"        dDiffOne = aOne[iIndex] - dMeanOne\n",
"        dDiffTwo = 0\n",
"        dSumPairs += 0\n",
"        dSumOneSquared += dDiffOne**2\n",
"        dSumTwoSquared += 0\n",
"    \n",
"    return ( dSumPairs / ( dSumOneSquared**0.5 * 0 ) )"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 51
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# We'll round our result to 5 decimal places so the equality is safe\n",
"round( PearsonCorrelation( [1, 2, 3], [4, 5, 6] ), 5 ) == 1"
],
"language": "python",
"metadata": {},
"outputs": [
{
"ename": "ZeroDivisionError",
"evalue": "float division by zero",
"output_type": "pyerr",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mZeroDivisionError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-52-b52f6d20d1ea>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# We'll round our result to 5 decimal places so the equality is safe\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mround\u001b[0m\u001b[0;34m(\u001b[0m \u001b[0mPearsonCorrelation\u001b[0m\u001b[0;34m(\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m6\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m \u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-51-6f329128b82d>\u001b[0m in \u001b[0;36mPearsonCorrelation\u001b[0;34m(aOne, aTwo)\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mdSumTwoSquared\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 13\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0;34m(\u001b[0m \u001b[0mdSumPairs\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;34m(\u001b[0m \u001b[0mdSumOneSquared\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0;36m0.5\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m0\u001b[0m \u001b[0;34m)\u001b[0m \u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mZeroDivisionError\u001b[0m: float division by zero"
]
}
],
"prompt_number": 52
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"round( PearsonCorrelation( [1, 2, 3], [4, 4.4, 3.4] ), 5 ) == -0.59604"
],
"language": "python",
"metadata": {},
"outputs": [
{
"ename": "ZeroDivisionError",
"evalue": "float division by zero",
"output_type": "pyerr",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mZeroDivisionError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-53-9d8dbe85fec6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mround\u001b[0m\u001b[0;34m(\u001b[0m \u001b[0mPearsonCorrelation\u001b[0m\u001b[0;34m(\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4.4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3.4\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m \u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m0.59604\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-51-6f329128b82d>\u001b[0m in \u001b[0;36mPearsonCorrelation\u001b[0;34m(aOne, aTwo)\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mdSumTwoSquared\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 13\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0;34m(\u001b[0m \u001b[0mdSumPairs\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;34m(\u001b[0m \u001b[0mdSumOneSquared\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0;36m0.5\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m0\u001b[0m \u001b[0;34m)\u001b[0m \u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mZeroDivisionError\u001b[0m: float division by zero"
]
}
],
"prompt_number": 53
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"round( PearsonCorrelation( aHugeOne, aHugeTwo ), 5 ) == \\\n",
" round( scipy.stats.pearsonr( aHugeOne, aHugeTwo )[0], 5 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 97,
"text": [
"False"
]
}
],
"prompt_number": 97
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Pairwise gold standard comparisons\n",
"\n",
"Like we discussed earlier, a **performance evaluation** is just a special kind of pairwise similarity measure. It assesses the similarity between a **gold standard** ground truth and a series of **predictions**. These are just two special vectors, the former always a set of binary values indicating whether some hypothesis tests should pass or not:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aGoldStandard = [True, False, False, True, False, True, False, False]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 55
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and the latter comprising any ranked scores, often p-values:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aPredictionsPValues = [0.01, 0.02, 0.5, 0.6, 0.03, 0.04, 0.8, 0.9]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 56
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Most performance evaluations are carried out by converting these predictions into binary values, for example whether or not each p-value is below some **critical value**:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aPredictionsBelow05 = [True, True, False, False, True, True, False, False]\n",
"print( aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[True, True, False, False, True, True, False, False]\n"
]
}
],
"prompt_number": 57
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Or equivalently using Python trickery\n",
"aPredictionsBelow05 = [( d < 0.05 ) for d in aPredictionsPValues]\n",
"print( aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[True, True, False, False, True, True, False, False]\n"
]
}
],
"prompt_number": 58
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Or equivalently using different Python trickery\n",
"aPredictionsBelow05 = array( aPredictionsPValues ) < 0.05\n",
"print( aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[ True True False False True True False False]\n"
]
}
],
"prompt_number": 59
},
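{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `True` and `False` behave like 1 and 0, these binary predictions can be compared to the gold standard with the same distance measures we used earlier. The sketch below restates the two vectors so that the cell stands alone (the variable names are illustrative only) and computes their Manhattan distance, which for binary values is simply the number of positions at which the predictions and the gold standard disagree:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aGold = [True, False, False, True, False, True, False, False]\n",
"aPredicted = [True, True, False, False, True, True, False, False]\n",
"\n",
"# int( True ) is 1 and int( False ) is 0, so each mismatch contributes exactly 1\n",
"iDisagreements = sum( abs( int( fGold ) - int( fPredicted ) )\n",
"    for fGold, fPredicted in zip( aGold, aPredicted ) )\n",
"print( iDisagreements )"
],
"language": "python",
"metadata": {},
"outputs": []
},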
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By adding a little bit of jitter to the points, we can even see how comparing our predictions to the gold standard is exactly the same as the similarity calculations we performed above:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aJitter = standard_normal( len( aGoldStandard ) ) / 10\n",
"scatter( aGoldStandard + aJitter, aPredictionsBelow05 + aJitter )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 60,
"text": [
"<matplotlib.collections.PathCollection at 0x111d30290>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAD9CAYAAAC7iRw+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGshJREFUeJzt3X9wVOXB9vFrIVsJhAf5GWQ3TiQJ+UFI0hYM0UEXLIJp\nCX2FduIzVQoxTX1aJf3DWuu0Jn1aCK3vOJS0SltAW5DRqT9CH0NUkAUFklDgAYtU0RLYBIxEyAsa\ngWS53z/UlJAENnuSnITz/cwws2f33nMuoufi5D5nz7qMMUYAAMcYYHcAAEDvovgBwGEofgBwGIof\nAByG4gcAh6H4AcBhLBX/okWLFB0drUmTJl123K5duxQREaEXXnjByuYAAN3AUvEvXLhQFRUVlx0T\nDAb10EMPafbs2eIjAwBgP0vFP23aNA0fPvyyY1asWKH58+dr9OjRVjYFAOgmET258rq6OpWVlen1\n11/Xrl275HK52o3p6DkAwJWFO4vSoyd3CwsLVVJSIpfLJWNMpyG/eK0//nn00Udtz0B++3OQv//9\n6c/ZjbE2bd6jR/y7d+9Wbm6uJKmhoUEbN26U2+1WTk5OT24WAHAZPVr8//rXv1ofL1y4UHPmzKH0\nAcBmlor/rrvu0tatW9XQ0KCYmBgVFxerublZklRQUNAtAfs6n89ndwRLyG8v8tunP2e3ymWsThZZ\nDfD5/D8AIHRWupNP7gKAw1D8AOAwFD8AOAzFDwAOQ/EDgMNQ/ADgMBQ/ADgMxQ8ADkPxA4DDUPwA\n4DAUPwA4DMUPAA5D8QOAw1D8AOAwFD8AOAzFDwAOQ/EDgMNQ/ADgMBQ/ADiMpeJftGiRoqOjNWnS\npA5fX7dundLT05WWlqabb75Z+/fvt7I5AEA3sFT8CxcuVEVFRaevjx8/Xtu2bdP+/fv1s5/9TN/7\n3vesbA4A0A0irLx52rRpqqmp6fT1rKys1seZmZmqra3tcFxRUVHrY5/PJ5/PZyUWAFx1/H6//H5/\nt6zLZYwxVlZQU1OjOXPm6K233rrsuMcee0zvvvuu/vCHP7QN4HLJYgQAcBwr3WnpiD9UW7Zs0erV\nq7V9+/be2BwA4DJ6vPj379+v/Px8VVRUaPjw4T29OVzk2LFjeuyx5WpoaNS8edmaO3eu3ZEA9AE9\nWvxHjx7VnXfeqbVr1yo+Pr4nN4VLfPjhh8rIyNLJk3cqGJyk55//kR577APdd1+B3dEA2MzSHP9d\nd92lrVu3qqGhQdHR0SouLlZzc7MkqaCgQPfee69efPFFXX/99ZIkt9ut6urqtgGY4+8Rjz/+uB5+\neJ/OnXvq82f2atSo/6MTJ2psTAWgu1jpTssnd62i+HvG0qVL9bOffahg8PHPn6nRsGFZamw8bmsu\nAN3DSnfyyd2r1Ny5c3XNNWslrZNUpcGDF+k///Muu2MB6AM44r+Kbd++XYWFP9epU42aP//r+uUv\nf66IiF65kAtAD2OqBwAchqkeAEDIKH4AcBiKHwAchuIHgG505swZHTp0SJ9++qndUTpF8QNAN1m3\nbr3GjInRV74yS2PHxmrbtm12R+oQV/X0ES0tLdq2bZuampp00003acSIEXZHAtAFR44cUUrKZDU1\nbZGUKulV/cd/3K36+iMaNGhQt2+vz9+dE5d39uxZ3Xprtg4ePCWXa7QGDizQm2++ppSUFLujAQjR\nP//5T7nd6fqs9CXpdgWDkaqrq1NcXJyd0dphqqcP+P3vn9Bbb0XpzJndOn36VTU2PqJFixbbHQtA\nF9xwww06f/4tScc+f+Z/deHC/9PYsWPtjNUhir8PeO+9I/r001v1xX8OY6bryJEj9oYC0CUTJkzQ\nz3/+Y0VGflnDhs1QZOTX9NRTf9SQIUPsjtYOc/x9wLp161RQ8H/1ySebJF0rt/sBZWef0ksvrbM7\nGoAuOnTokGpqapScnCyv19tj2+GWDf2cMUaLF/9YTz
75ew0YcI0mTkzVq6++qJEjR9odDXC8uro6\n7dq1S2PGjFFWVpZcLpfdkSRR/FeNM2fO6NNPP9Xo0aP7zP9cgJNt2bJFc+Z8WwMHZioYfFd33DFV\nzz33dJ/YPyl+AOgBY8eOV339E5JmSTqrqKgsrVtXrJycHLujcZM2AOhuxhidOHFU0vTPnxmklpap\nOnr0qJ2xugXFDwAdcLlcSk6erAEDVnz+zBENGPA/+upXv2prru5gqfgXLVqk6OhoTZo0qdMxDzzw\ngBISEpSenq69e/da2RwA9KoNG57R9dev1qBBo/WlL03UL3/5Y2VlZdkdyzJLc/xvvPGGoqKidM89\n9+itt95q93p5eblKS0tVXl6uqqoqLV68WJWVlW0DMMcPoA+7cOGCPvzwQw0bNkyRkZF2x2ll2xz/\ntGnTNHz48E5f37BhgxYsWCBJyszMVGNjo+rr661sEgB61YABAzR27Ng+VfpW9ei9eurq6hQTE9O6\n7PV6VVtbq+jo6DbjioqKWh/7fD75fL6ejAUA/Y7f75ff7++WdfX4Tdou/VWko+tfLy5+AEB7lx4U\nFxcXh72uHr2qx+PxKBAItC7X1tbK4/H05CYBAFfQo8Wfk5OjP//5z5KkyspKXXvtte2meQAAvcvS\nVM9dd92lrVu3qqGhQTExMSouLlZzc7MkqaCgQNnZ2SovL1d8fLyGDBmiNWvWdEtoAED4uGUDAPRD\n3LIBABAyih8AHIbiBwCHofgBwGEofgBwGIofAByG4gcAh6H4AcBhKH4AcBiKHwAchuIHAIeh+AHA\nYSh+AHAYih8AHIbiBwCHofgBwGEofgBwGIofAByG4gcAh7FU/BUVFUpKSlJCQoKWLVvW7vWGhgbN\nnj1bGRkZSk1N1VNPPWVlcwCAbhD2l60Hg0ElJiZq06ZN8ng8mjJlitavX6/k5OTWMUVFRTp37pyW\nLl2qhoYGJSYmqr6+XhEREf8OwJetA0CX2fJl69XV1YqPj1dsbKzcbrdyc3NVVlbWZsx1112n06dP\nS5JOnz6tkSNHtil9AEDvC7uF6+rqFBMT07rs9XpVVVXVZkx+fr5mzJihcePG6cyZM3ruuec6XFdR\nUVHrY5/PJ5/PF24sALgq+f1++f3+bllX2MXvcrmuOGbJkiXKyMiQ3+/X+++/r5kzZ2rfvn0aOnRo\nm3EXFz8AoL1LD4qLi4vDXlfYUz0ej0eBQKB1ORAIyOv1thmzY8cOfetb35IkxcXF6YYbbtA777wT\n7iYBAN0g7OKfPHmyDh06pJqaGp0/f17PPvuscnJy2oxJSkrSpk2bJEn19fV65513NH78eGuJAQCW\nhD3VExERodLSUs2aNUvBYFB5eXlKTk7WypUrJUkFBQX66U9/qoULFyo9PV0XLlzQr3/9a40YMaLb\nwgMAui7syzm7LQCXcwJAl9lyOScAoH+i+AHAYSh+AHAYih8AHIbiBwCHofgBwGEofgBwGIofAByG\n4gcAh6H4AcBhKH4AcBiKHwAchuIHAIeh+AHAYSh+AHAYih8AHIbiBwCHofgBwGEofgBwGEvFX1FR\noaSkJCUkJGjZsmUdjvH7/fryl7+s1NRU+Xw+K5sDAHSDsL9sPRgMKjExUZs2bZLH49GUKVO0fv16\nJScnt45pbGzUzTffrFdeeUVer1cNDQ0aNWpU2wB82ToAdJktX7ZeXV2t+Ph4xcbGyu12Kzc3V2Vl\nZW3GPPPMM5o3b568Xq8ktSt9AEDviwj3jXV1dYqJiWld9nq9qqqqajPm0KFDam5u1vTp03XmzBkt\nXrxYd999d7t1FRUVtT72+XxMCQHAJfx+v/x+f7esK+zid7lcVxzT3NysPXv2aPPmzWpqalJWVpam\nTp2qhISENuMuLn4AQHuXHhQXFxeHva6wi9/j8SgQCLQuBwKB1imdL8TExGjUqFGKjIxUZGSkbrnl\nFu3bt69d8QMAek
/Yc/yTJ0/WoUOHVFNTo/Pnz+vZZ59VTk5OmzFz587Vm2++qWAwqKamJlVVVSkl\nJcVyaABA+MI+4o+IiFBpaalmzZqlYDCovLw8JScna+XKlZKkgoICJSUlafbs2UpLS9OAAQOUn59P\n8QOAzcK+nLPbAnA5JwB0mS2XcwIA+ieKHwAchuIHAIeh+AHAYSh+AHAYih8AHIbiBwCHofgBwGEo\nfgBwGIofAByG4gcAh6H4AcBhKH4AcBiKHwAchuIHAIeh+AHAYSh+AHAYih8AHIbiBwCHsVT8FRUV\nSkpKUkJCgpYtW9bpuF27dikiIkIvvPCClc0BALpB2MUfDAb1wx/+UBUVFXr77be1fv16HTx4sMNx\nDz30kGbPns2XqgNAHxB28VdXVys+Pl6xsbFyu93Kzc1VWVlZu3ErVqzQ/PnzNXr0aEtBAQDdIyLc\nN9bV1SkmJqZ12ev1qqqqqt2YsrIyvf7669q1a5dcLleH6yoqKmp97PP55PP5wo0FAFclv98vv9/f\nLesKu/g7K/GLFRYWqqSkRC6XS8aYTqd6Li5+AEB7lx4UFxcXh72usIvf4/EoEAi0LgcCAXm93jZj\ndu/erdzcXElSQ0ODNm7cKLfbrZycnHA3CwCwyGXCPOPa0tKixMREbd68WePGjdONN96o9evXKzk5\nucPxCxcu1Jw5c3TnnXe2DfD5bwMAgNBZ6c6wj/gjIiJUWlqqWbNmKRgMKi8vT8nJyVq5cqUkqaCg\nINxVAwB6UNhH/N0WgCN+AOgyK93JJ3cBwGEofgBwGIofAByG4gcAh6H4AcBhKH4AcBiKHwAchuIH\nAIeh+AHAYSh+AHAYih8AHIbiBwCHofgBwGEofgBwGIofAByG4gcAh6H4AcBhKH4AcBiKHwAcxnLx\nV1RUKCkpSQkJCVq2bFm719etW6f09HSlpaXp5ptv1v79+61uEgBggaUvWw8Gg0pMTNSmTZvk8Xg0\nZcoUrV+/XsnJya1jdu7cqZSUFA0bNkwVFRUqKipSZWXlvwPwZesA0GW2fdl6dXW14uPjFRsbK7fb\nrdzcXJWVlbUZk5WVpWHDhkmSMjMzVVtba2WTAACLIqy8ua6uTjExMa3LXq9XVVVVnY5ftWqVsrOz\n2z1fVFTU+tjn88nn81mJBQBXHb/fL7/f3y3rslT8Lpcr5LFbtmzR6tWrtX379navXVz86Bl/+9vf\ntGXLG/J6x+r73/++Bg8ebHckAF1w6UFxcXFx2OuyVPwej0eBQKB1ORAIyOv1thu3f/9+5efnq6Ki\nQsOHD7eySYShpOQx/fd/r1RT0yINGrRdq1c/q7//fasGDRpkdzQANrB0crelpUWJiYnavHmzxo0b\npxtvvLHdyd2jR49qxowZWrt2raZOndo+ACd3e9SFCxcUGTlU588flHS9JKOoKJ+eeuoBzZs3z+54\nAMJkpTstHfFHRESotLRUs2bNUjAYVF5enpKTk7Vy5UpJUkFBgX7xi1/o1KlTuu+++yRJbrdb1dXV\nVjaLLmhpaVFLS7OksZ8/45IxHn388cd2xgJgI0tH/N0SgCP+Hjdjxhxt3z5a588/LOnvioparAMH\n/q7rr7/e7mgAwmTb5ZzoH158ca3mzGnRqFG3a+LE32rTpr9R+oCDccQPAP0QR/wAgJBR/ADgMBQ/\nADgMxQ8ADkPxA4DDUPwA4DAUPwA4DMUPAA5D8QOAw1D8AOAwFD8AOAzFDwAOQ/EDgMNQ/ADgMBQ/\nADgMxQ8ADkPxA4DDWCr+iooKJSUlKSEhQcuWLetwzAMPPKCEhASlp6dr7969VjbXJxlj9Mc/rlJW\n1mzNnHmndu7caXckALg8E6aWlhYTFxdnDh8+bM6fP2/S09PN22+/3WbMyy+/bO644w5jjDGVlZUm\nMzOz3XosROgTli8vNYMHJxnpJSP90QwePMrs3bvX7lgArnJWujPsI/7q6mrFx8cr
NjZWbrdbubm5\nKisrazNmw4YNWrBggSQpMzNTjY2Nqq+vt/LvVJ+zfPmf1NT0J0lzJd2rpqb7tWbNWrtjAUCnIsJ9\nY11dnWJiYlqXvV6vqqqqrjimtrZW0dHRbcYVFRW1Pvb5fPL5fOHG6nUul0tSy0XPNGvgQE6dAOhe\nfr9ffr+/W9YVdvF/VnhXZi75FviO3ndx8fc3Dz98vx54YKGamoolfaghQ57UvfdutTsWgKvMpQfF\nxcXFYa8r7OL3eDwKBAKty4FAQF6v97Jjamtr5fF4wt1kn5SXt1BDh0bpqaf+qqioSD3yyCalpKTY\nHQsAOuUylx6Sh6ilpUWJiYnavHmzxo0bpxtvvFHr169XcnJy65jy8nKVlpaqvLxclZWVKiwsVGVl\nZdsALle73woAAJdnpTvDPuKPiIhQaWmpZs2apWAwqLy8PCUnJ2vlypWSpIKCAmVnZ6u8vFzx8fEa\nMmSI1qxZE+7mAADdJOwj/m4LwBE/AHSZle7k8hNJn3zyiebNu1uDBg3V8OHj9Kc/rbY7EgD0mLCn\neq4mBQWFKi8/p3PnanTuXECLF8/R+PGxmjFjht3RAHQgGAzqpZde0vHjxzV16lRNnjzZ7kj9CsUv\n6ZVXXtPZs5sljZQ0Uk1N+XrllU0UP9AHXbhwQdnZ87VjxzG1tHxFAwYs0YoVS7Ro0XftjtZvMNUj\n6dprR0h6u3X5mmsOasyYkfYFAtCpV199VTt2HNbHH7+ps2efUFPT6/qv/7pfFy5csDtav0HxS3ri\niV9r8OBF+tKXfqjBg+fK4/mH8vPz7Y4FoAMnTpyQlCzJ/fkziWppadbZs2dtTNW/cFXP5w4cOKBX\nX31VUVFRys3N1dChQ+2OBKAD7733ntLTs9TU9LykTA0cuFRJSRX6xz8qr/jeq4mV7qT4AfQ75eXl\nuuee7+vUqePKyLhJZWXr2t054GpH8QNwJGNMyPcNu9pwHT8AR3Jq6VtF8QOAw1D8AOAwFD8AOAzF\nDwAOQ/EDgMNQ/ADgMBQ/ADgMxQ8ADkPxA4DDUPwW+f1+uyNYQn57kd8+/Tm7VWEX/8mTJzVz5kxN\nmDBBt99+uxobG9uNCQQCmj59uiZOnKjU1FT99re/tRS2L+rv//OQ317kt09/zm5V2MVfUlKimTNn\n6t1339Vtt92mkpKSdmPcbrcef/xxHThwQJWVlfrd736ngwcPWgoMALAm7OLfsGGDFixYIElasGCB\nXnrppXZjxo4dq4yMDElSVFSUkpOTdezYsXA3CQDoBmHflnn48OE6deqUpM9ujTpixIjW5Y7U1NTo\n1ltv1YEDBxQVFfXvANxdDwDCEu5tmS/7ZeszZ87UBx980O75X/3qV22WXS7XZQv8448/1vz587V8\n+fI2pS+FHxwAEJ7LFv9rr73W6WvR0dH64IMPNHbsWB0/flxjxozpcFxzc7PmzZun73znO/rmN79p\nLS0AwLKw5/hzcnL09NNPS5KefvrpDkvdGKO8vDylpKSosLAw/JQAgG4T9hz/yZMn9e1vf1tHjx5V\nbGysnnvuOV177bU6duyY8vPz9fLLL+vNN9/ULbfcorS0tNapoKVLl2r27Nnd+pcAAHSBscFHH31k\nvva1r5mEhAQzc+ZMc+rUqXZjjh49anw+n0lJSTETJ040y5cvtyHpv23cuNEkJiaa+Ph4U1JS0uGY\n+++/38THx5u0tDSzZ8+eXk54eVfKv3btWpOWlmYmTZpkbrrpJrNv3z4bUnYulJ+/McZUV1ebgQMH\nmueff74X011ZKPm3bNliMjIyzMSJE82tt97auwGv4Er5T5w4YWbNmmXS09PNxIkTzZo1a3o/ZCcW\nLlxoxowZY1JTUzsd05f33SvlD2fftaX4H3zwQbNs2TJjjDElJSXmoYceajfm+PHjZu/evcYYY86c\nOWMmTJhg3n777V7N+YWWlhYTFxdnDh8+bM6f
P2/S09PbZXn55ZfNHXfcYYwxprKy0mRmZtoRtUOh\n5N+xY4dpbGw0xny2k/e3/F+Mmz59uvn6179u/vrXv9qQtGOh5D916pRJSUkxgUDAGPNZkfYVoeR/\n9NFHzU9+8hNjzGfZR4wYYZqbm+2I2862bdvMnj17Oi3OvrzvGnPl/OHsu7bcsqG/fQagurpa8fHx\nio2NldvtVm5ursrKytqMufjvlJmZqcbGRtXX19sRt51Q8mdlZWnYsGGSPstfW1trR9QOhZJfklas\nWKH58+dr9OjRNqTsXCj5n3nmGc2bN09er1eSNGrUKDuidiiU/Nddd51Onz4tSTp9+rRGjhypiIjL\nXjvSa6ZNm6bhw4d3+npf3nelK+cPZ9+1pfjr6+sVHR0t6bOrg670Q66pqdHevXuVmZnZG/Haqaur\nU0xMTOuy1+tVXV3dFcf0lfIMJf/FVq1apezs7N6IFpJQf/5lZWW67777JPWtz4eEkv/QoUM6efKk\npk+frsmTJ+svf/lLb8fsVCj58/PzdeDAAY0bN07p6elavnx5b8cMW1/ed7sq1H23x/5J7o3PAPSW\nUEvEXHKevK+UT1dybNmyRatXr9b27dt7MFHXhJK/sLBQJSUlcrlcMp9NYfZCstCEkr+5uVl79uzR\n5s2b1dTUpKysLE2dOlUJCQm9kPDyQsm/ZMkSZWRkyO/36/3339fMmTO1b98+DR06tBcSWtdX992u\n6Mq+22PFfzV9BsDj8SgQCLQuBwKB1l/JOxtTW1srj8fTaxkvJ5T8krR//37l5+eroqLisr9a9rZQ\n8u/evVu5ubmSpIaGBm3cuFFut1s5OTm9mrUjoeSPiYnRqFGjFBkZqcjISN1yyy3at29fnyj+UPLv\n2LFDjzzyiCQpLi5ON9xwg9555x1Nnjy5V7OGoy/vu6Hq8r7bbWcguuDBBx9svTJg6dKlHZ7cvXDh\ngrn77rtNYWFhb8drp7m52YwfP94cPnzYnDt37oond3fu3NmnThCFkv/IkSMmLi7O7Ny506aUnQsl\n/8W++93v9qmrekLJf/DgQXPbbbeZlpYW88knn5jU1FRz4MABmxK3FUr+H/3oR6aoqMgYY8wHH3xg\nPB6P+eijj+yI26HDhw+HdHK3r+27X7hc/nD2Xdsu57ztttvaXc5ZV1dnsrOzjTHGvPHGG8blcpn0\n9HSTkZFhMjIyzMaNG+2Ia4wxpry83EyYMMHExcWZJUuWGGOMefLJJ82TTz7ZOuYHP/iBiYuLM2lp\naWb37t12Re3QlfLn5eWZESNGtP6sp0yZYmfcdkL5+X+hrxW/MaHl/81vfmNSUlJMamqq7ZcvX+pK\n+U+cOGG+8Y1vmLS0NJOammrWrVtnZ9w2cnNzzXXXXWfcbrfxer1m1apV/WrfvVL+cPbdsD/ABQDo\nn/gGLgBwGIofAByG4gcAh6H4AcBhKH4AcBiKHwAc5v8DCyCYjy2wZmEAAAAASUVORK5CYII=\n",
"text": [
"<matplotlib.figure.Figure at 0x111cf4b10>"
]
}
],
"prompt_number": 60
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or if you'd prefer, we can visualize this using the original p-values instead; if our predictions are good, the column on the left should be \"higher\" than the column on the right (why?):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scatter( aGoldStandard + aJitter, aPredictionsPValues )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 61,
"text": [
"<matplotlib.collections.PathCollection at 0x111d76d10>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAD9CAYAAAC7iRw+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGIFJREFUeJzt3X1wVPW9x/HPSlbL86PysBsNJIEkhCRMoREfMIDhSU0Z\ngTZ26kWam6ZYing7lnKZW0NbEcbr9KJ4K7aIiIrYag0VSBFkWxWSKCC0wMVIiWwiMEbIBYyXJMvv\n/mFNCUlg2RNyEn7v1wwzObvfnPMhw/nkcPbsWY8xxggAYI2r3A4AAGhdFD8AWIbiBwDLUPwAYBmK\nHwAsQ/EDgGUcFf/3vvc99e3bV8OGDWt2Zs6cOYqPj1dqaqp27drlZHMAgBbgqPhnzpypwsLCZp/f\nsGGDPvroI5WWluqZZ57RrFmznGwOANACHBX/rbfeqp49ezb7/Lp16zRjxgxJUnp6uqqqqnTs2DEn\nmwQAOBR1OVdeUVGh6Ojo+mW/36/y8nL17du3/jGPx3M5IwDAFSvSGy9c9hd3zw/WVNEbY9rtn4cf\nftj1DOR3Pwf529+f9pzdGGd32rmsxe/z+RQMBuuXy8vL5fP5LucmAQAXcVmLPysrS88//7wkqaio\nSD169GhwmgcA0PocneO/55579Oc//1mVlZWKjo7WwoULVVtbK0nKy8vT5MmTtWHDBsXFxalz585a\nuXJli4RuSzIyMtyO4Aj53UV+97Tn7E55jNOTRU4DeDyOz1cBgG2cdCfv3AUAy1D8AGAZih8ALEPx\nA4BlKH4AsAzFDwCWofgBwDIUPwBYhuIHAMtQ/ABgmct6P/725P3339cbb6xXt25ddd9996lXr15u\nRwKAy4J79Uh644039O1v5+j//i9HXm9QvXsX6a9/Lab8AbRZ3KvHoTlzFqi6erXOnl2kM2dWq7Ly\nZv32t791OxYAXBYUv6RTp05KGli/XFs7UCdO/K97gQDgMqL4JU2Zcqc6dnxQUpmkt9Wx43Ldccck\nl1MBwOXBi7uSli37T4VC/6bXX79FnTp10eOP/5duueUWt2MBwGXBi7sA0A7x4i4AIGwUPwBYhuIH\nAMtQ/ABgGYofACxD8QOAZSh+ALAMxQ8AlqH4AcAyFD8AWIbiBwDLOCr+wsJCJSQkKD4+XkuWLGn0\nfGVlpSZOnKi0tDQlJyfrueeec7I5AEALiLj4Q6GQZs+ercLCQu3bt09r1qzR/v37G8wsW7ZMw4cP\n1wcffKBAIKAf//jHqqurcxy6LamtrdX99z+oXr38GjBgsFatet7tSABwQREXf0lJieLi4hQTEyOv\n16vs7GwVFBQ0mOnfv79OnjwpSTp58qR69+6tqKgr607QP/nJf2jVqr/qxIk/68iR53X//Qu0adMm\nt2MBQLMibuGKigpFR0fXL/v9fhUXFzeYyc3N1dixYzVgwACdOnVKr7zySpPrys/Pr/86IyNDGRkZ\nkcZqda+++oaqq1dLipUUq+rqB/SHP6zX+PHj3Y4G4AoSCAQUCARaZF0RF7/H47nozKJFi5SWlqZA\nIKCDBw8qMzNTu3fvVteuXRvMnVv87U23bt0kHZI0XJIUFXVIvXv3djUTgCvP+QfFCxcujHhdEZ/q\n8fl8CgaD9cvBYFB+v7/BzLZt2zR9+nRJUmxsrAYOHKgDBw5Eusk2aenSX6hTpzxdddU8XX31TPXq\ntV6zZ9/vdiwAaFbExT9ixAiVlpaqrKxMNTU1Wrt2rbKyshrMJCQkaPPmzZKkY8eO6cCBAxo0aJCz\nxG3MuHHjtG3bZj38cBc98shQ/e1v76lfv35uxwKAZjn66MWNGzdq7ty5CoVCysnJ0fz587V8+XJJ\nUl5eniorKzVz5kwdPnxYZ8+e1fz58/Wd73ynYQA+ehEALpmT7uQzdwGgHeIzdwEAYaP4AcAyFD8A\nWIbiBwDLUPwAYBmKHwAsQ/EDgGUofgCwDMUPAJah+A
HAMhQ/AFiG4gcAy1D8AGAZih8ALEPxA4Bl\nKH4AsAzFDwCWofgBwDIUPwBYhuIHAMtQ/ABgGYofACxD8QOAZSh+ALAMxQ8AlqH4AcAyFD8AWIbi\nBwDLOCr+wsJCJSQkKD4+XkuWLGlyJhAIaPjw4UpOTlZGRoaTzQEAWoDHGGMi+cZQKKQhQ4Zo8+bN\n8vl8GjlypNasWaPExMT6maqqKt18883605/+JL/fr8rKSvXp06dhAI9HEUYAYKmKigrNm5evjz8+\nottvv0kLFvxEUVFRbsdqVU66M+KfVElJieLi4hQTEyNJys7OVkFBQYPif+mllzR16lT5/X5JalT6\nAHCpqqqq9PWv36rKymyFQndq584nVFp6SC+88Bu3o7UbERd/RUWFoqOj65f9fr+Ki4sbzJSWlqq2\ntlZjxozRqVOn9MADD+jee+9ttK78/Pz6rzMyMjglBKBZmzZt0uefJygUWiRJqq4eq5df7qNnn31K\nV199tcvpLp9AIKBAINAi64q4+D0ez0VnamtrtXPnTm3ZskXV1dUaNWqUbrzxRsXHxzeYO7f4AQCN\nnX9QvHDhwojXFfGLuz6fT8FgsH45GAzWn9L5SnR0tMaPH6+OHTuqd+/eGj16tHbv3h1xWAAYP368\nunT5H0VF/bukAnXqNEXZ2f9yRR/tt7SIi3/EiBEqLS1VWVmZampqtHbtWmVlZTWY+eY3v6l33nlH\noVBI1dXVKi4uVlJSkuPQAOzVo0cPvf/+28rOrtQtt/xG8+bdruee+7XbsdqViE/1REVFadmyZZow\nYYJCoZBycnKUmJio5cuXS5Ly8vKUkJCgiRMnKiUlRVdddZVyc3MpfgCO+Xw+rV79jNsx2q2IL+ds\nsQBczgkAl8xJd/LOXQCwDMUPAJah+AHAMhQ/AFiG4gcAy1D8AGAZih8ALEPxA4BlKH4AsAzFDwCW\nofgtEQqFVFFRoerqarejAHAZxW+B/fv36/rrExQf/3X17NlXv/41N7cCbMZN2iwwaNAwlZXNljF5\nkg6qU6db9c476zV8+HC3owGIEDdpQ7POnDmjjz/+Hxnz/X88EiuPJ1O7du1yNRcA91D8V7irr75a\nXbv2lvT2Px45LalEN9xwg4upALiJ4r/CeTwerV27Sp06TVW3bneoc+cUTZ8+RmPHjnU7GgCXcI7f\nEsFgULt27VK/fv00cuRIeTwetyMBcMBJd1L8ANAO8eIuACBsFD8AWIbiBwDLUPwAYBmKHwAsQ/ED\ngGUofgCwDMUPAJah+AHAMhQ/AFjGUfEXFhYqISFB8fHxWrJkSbNz7733nqKiovTaa6852RwAoAVE\nXPyhUEizZ89WYWGh9u3bpzVr1mj//v1Nzs2bN08TJ07knjwA0AZEXPwlJSWKi4tTTEyMvF6vsrOz\nVVBQ0GjuySef1LRp03Tttdc6CgoAaBlRkX5jRUWFoqOj65f9fr+Ki4sbzRQUFOitt97Se++91+yt\ngPPz8+u/zsjIUEZGRqSxAOCKFAgEFAgEWmRdERd/OPdznzt3rhYvXlx/+9DmTvWcW/wAgMbOPyhe\nuHBhxOuKuPh9Pp+CwWD9cjAYlN/vbzCzY8cOZWdnS5IqKyu1ceNGeb1eZWVlRbpZAIBDEX8QS11d\nnYYMGaItW7ZowIAB+sY3vqE1a9YoMTGxyfmZM2fqrrvu0t13390wAB/EAgCXzEl3RnzEHxUVpWXL\nlmnChAkKhULKyclRYmKili9fLknKy8uLdNUAgMuIj14EgHaIj14EAISN4gcAy1D8AGAZih8ALEPx\nA4BlKH4AsAzFDwCWofgBwDIUPwBYhuIHAMtQ/ABgGYofACxD8QOAZSh+ALAMxQ8AlqH4AcAyFD8A\nWIbiBwDLUPwAYBmKHwAsQ/EDgGUofgCwDMUPAJah+AHAMhQ/AFiG4gcAy1D8AGAZih8ALOO4+AsL\nC5WQkKD4+HgtWb
Kk0fMvvviiUlNTlZKSoptvvll79uxxukkAgAMeY4yJ9JtDoZCGDBmizZs3y+fz\naeTIkVqzZo0SExPrZ7Zv366kpCR1795dhYWFys/PV1FR0T8DeDxyEAEArOSkOx0d8ZeUlCguLk4x\nMTHyer3Kzs5WQUFBg5lRo0ape/fukqT09HSVl5c72SQAwKEoJ99cUVGh6Ojo+mW/36/i4uJm51es\nWKHJkyc3ejw/P7/+64yMDGVkZDiJBQBXnEAgoEAg0CLrclT8Ho8n7NmtW7fq2Wef1bvvvtvouXOL\nHwDQ2PkHxQsXLox4XY6K3+fzKRgM1i8Hg0H5/f5Gc3v27FFubq4KCwvVs2dPJ5sEADjk6Bz/iBEj\nVFpaqrKyMtXU1Gjt2rXKyspqMHP48GHdfffdeuGFFxQXF+coLADAOUdH/FFRUVq2bJkmTJigUCik\nnJwcJSYmavny5ZKkvLw8/fznP9eJEyc0a9YsSZLX61VJSYnz5ACAiDi6nLNFAnA5JwBcMtcu5wQA\ntD8UPwBYhuIHAMtQ/ABgGYofACxD8QOAZSh+ALAMxQ8AlqH4AcAyFD8AWIbiBwDLUPwAYBmKHwAs\nQ/EDgGUofgCwDMUPAJah+AHAMhQ/AFiG4gcAy1D8AGAZih8ALEPxA4BlotwOAABt2caNG7Vp01vq\n1+9azZr1A3Xr1s3tSI55jDHG1QAej1yOAABNeuKJpzR//uOqrs7VNdf8VdHRe/XBB9vUuXNnt6M5\n6k5O9VzBPvzwQ91zz/d0++1365lnVvALFrhE8+f/h6qrCyXN15kzL+rIkQF67bXX3I7lGKd6rlDB\nYFAjR47W6dNzdPZsrIqKHtGnn1ZqwYJ5bkcD2gVjjGpqqiUN+McjHp0969Pp06fdjNUiOOJvIzZs\n2KDk5Js0aFCafvazXygUCjla38svv6wvvpiis2f/XdK39fnna/X440+2TFjAAh6PR5MmTdE11/yr\npAOSfqerripQZmam29Ec44i/Ddi+fbumTZupL774jaR+evzxB3T2rNEvf/mziNf55WmdDuc80oFT\nPcAlWrNmhX7wgwf15pt3qE+fa/X0039QXFyc27Ec44i/DVi79lV98cUcSVmSvqHq6v/WqlUvO1rn\n9OnTdc01v5PH81+S/qhOnb6j++//fkvEBazRuXNnrV79jI4e/Uh/+9t23XLLLRecLyoq0g03JMnr\n/ZqGDRuljz76qJWSXhpHxV9YWKiEhATFx8dryZIlTc7MmTNH8fHxSk1N1a5du5xs7orVuXNHdehQ\nec4jlfra1zo6WufAgQO1fftbuvPOYo0a9d/65S9nOPofBIALq6ys1Pjx39Thw79QXd1n2rv32xoz\n5g7V1dW5Ha0xE6G6ujoTGxtrDh06ZGpqakxqaqrZt29fg5n169ebSZMmGWOMKSoqMunp6Y3W4yDC\nFePw4cOmR4/+pkOHHxvpP03Hjv3NK6/8zu1YAC7Bpk2bTPfuY4xk6v907hxt/v73v1+W7TnpzoiP\n+EtKShQXF6eYmBh5vV5lZ2eroKCgwcy6des0Y8YMSVJ6erqqqqp07NgxJ7+nrkjR0dHavbtIDz7o\n1fe//7HWr39R06dPczsWgEvQu3dv1dUdklT9j0eOqrb2hHr27OlmrCZF/OJuRUWFoqOj65f9fr+K\ni4svOlNeXq6+ffs2mMvPz6//OiMjQxkZGZHGareuv/56PfbYo27HABCh4cOH6667xuiPf7xZtbWj\n5fW+oYcemq8ePXq0yPoDgYACgUCLrCvi4vd4PGHNmfOuJGnq+84tfgBojzwej156aYUKCgp08OBB\nDR/+G40dO7bF1n/+QfHChQsjXlfExe/z+RQMBuuXg8Gg/H7/BWfKy8vl8/ki3SQAtGkej0dTpkxx\nO8ZFRXyOf8SIESotLVVZWZlqamq0du1aZWVlNZjJysrS888/L+nLy5x69OjR6DQP
AKB1RXzEHxUV\npWXLlmnChAkKhULKyclRYmKili9fLknKy8vT5MmTtWHDBsXFxalz585auXJliwUHAESGu3MCQDvE\n3TkBAGGj+AHAMhQ/AFiG4gcAy1D8AGAZih8ALEPxA4BlKH4AsAzFDwCWofgBwDIUPwBYhuIHAMtQ\n/ABgGYofACxD8QOAZSh+ALAMxQ8AlqH4AcAyFD8AWIbiBwDLUPwAYBmKHwAsQ/EDgGUofgCwDMUP\nAJah+AHAMhQ/AFiG4ncoEAi4HcER8ruL/O5pz9mdirj4jx8/rszMTA0ePFjjx49XVVVVo5lgMKgx\nY8Zo6NChSk5O1hNPPOEobFvU3v/xkN9d5HdPe87uVMTFv3jxYmVmZurDDz/UuHHjtHjx4kYzXq9X\nv/rVr7R3714VFRXpqaee0v79+x0FBgA4E3Hxr1u3TjNmzJAkzZgxQ6+//nqjmX79+iktLU2S1KVL\nFyUmJuqTTz6JdJMAgBbgMcaYSL6xZ8+eOnHihCTJGKNevXrVLzelrKxMt912m/bu3asuXbr8M4DH\nE8nmAcB6Eda3oi70ZGZmpo4ePdro8UceeaTBssfjuWCBnz59WtOmTdPSpUsblL4UeXAAQGQuWPxv\nvvlms8/17dtXR48eVb9+/XTkyBFdd911Tc7V1tZq6tSp+u53v6spU6Y4SwsAcCzic/xZWVlatWqV\nJGnVqlVNlroxRjk5OUpKStLcuXMjTwkAaDERn+M/fvy4vvWtb+nw4cOKiYnRK6+8oh49euiTTz5R\nbm6u1q9fr3feeUejR49WSkpK/amgRx99VBMnTmzRvwQA4BIYF3z22Wfm9ttvN/Hx8SYzM9OcOHGi\n0czhw4dNRkaGSUpKMkOHDjVLly51Iek/bdy40QwZMsTExcWZxYsXNznzox/9yMTFxZmUlBSzc+fO\nVk54YRfL/8ILL5iUlBQzbNgwc9NNN5ndu3e7kLJ54fz8jTGmpKTEdOjQwbz66qutmO7iwsm/detW\nk5aWZoYOHWpuu+221g14ERfL/+mnn5oJEyaY1NRUM3ToULNy5crWD9mMmTNnmuuuu84kJyc3O9OW\n992L5Y9k33Wl+B966CGzZMkSY4wxixcvNvPmzWs0c+TIEbNr1y5jjDGnTp0ygwcPNvv27WvVnF+p\nq6szsbGx5tChQ6ampsakpqY2yrJ+/XozadIkY4wxRUVFJj093Y2oTQon/7Zt20xVVZUx5sudvL3l\n/2puzJgx5o477jC///3vXUjatHDynzhxwiQlJZlgMGiM+bJI24pw8j/88MPmpz/9qTHmy+y9evUy\ntbW1bsRt5C9/+YvZuXNns8XZlvddYy6eP5J915VbNrS39wCUlJQoLi5OMTEx8nq9ys7OVkFBQYOZ\nc/9O6enpqqqq0rFjx9yI20g4+UeNGqXu3btL+jJ/eXm5G1GbFE5+SXryySc1bdo0XXvttS6kbF44\n+V966SVNnTpVfr9fktSnTx83ojYpnPz9+/fXyZMnJUknT55U7969FRV1wWtHWs2tt96qnj17Nvt8\nW953pYvnj2TfdaX4jx07pr59+0r68uqgi/2Qy8rKtGvXLqWnp7dGvEYqKioUHR1dv+z3+1VRUXHR\nmbZSnuHkP9eKFSs0efLk1ogWlnB//gUFBZo1a5aktvX+kHDyl5aW6vjx4xozZoxGjBih1atXt3bM\nZoWTPzc3V3v37tWAAQOUmpqqpUuXtnbMiLXlffdShbvvXrZfya3xHoDWEm6JmPNeJ28r5XMpObZu\n3apnn31W77777mVMdGnCyT937lwtXrxYHo9H5stTmK2QLDzh5K+trdXOnTu1ZcsWVVdXa9SoUbrx\nxhsVHx/fCgkvLJz8ixYtUlpamgKBgA4ePKjMzEzt3r1bXbt2bYWEzrXVffdSXMq+e9mK/0p6D4DP\n51MwGKxfDgaD9f8lb26mvLxcPp+v1TJeSDj5
JWnPnj3Kzc1VYWHhBf9r2drCyb9jxw5lZ2dLkior\nK7Vx40Z5vV5lZWW1atamhJM/Ojpaffr0UceOHdWxY0eNHj1au3fvbhPFH07+bdu2acGCBZKk2NhY\nDRw4UAcOHNCIESNaNWsk2vK+G65L3ndb7BWIS/DQQw/VXxnw6KOPNvni7tmzZ829995r5s6d29rx\nGqmtrTWDBg0yhw4dMmfOnLnoi7vbt29vUy8QhZP/448/NrGxsWb79u0upWxeOPnPdd9997Wpq3rC\nyb9//34zbtw4U1dXZz7//HOTnJxs9u7d61LihsLJ/+CDD5r8/HxjjDFHjx41Pp/PfPbZZ27EbdKh\nQ4fCenG3re27X7lQ/kj2Xdcu5xw3blyjyzkrKirM5MmTjTHGvP3228bj8ZjU1FSTlpZm0tLSzMaN\nG92Ia4wxZsOGDWbw4MEmNjbWLFq0yBhjzNNPP22efvrp+pkf/vCHJjY21qSkpJgdO3a4FbVJF8uf\nk5NjevXqVf+zHjlypJtxGwnn5/+Vtlb8xoSX/7HHHjNJSUkmOTnZ9cuXz3ex/J9++qm58847TUpK\niklOTjYvvviim3EbyM7ONv379zder9f4/X6zYsWKdrXvXix/JPtuxG/gAgC0T3wCFwBYhuIHAMtQ\n/ABgGYofACxD8QOAZSh+ALDM/wMc+Fwf3pXtUQAAAABJRU5ErkJggg==\n",
"text": [
"<matplotlib.figure.Figure at 0x111d268d0>"
]
}
],
"prompt_number": 61
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of performing evaluations using correlations or distances, though, typical **performance measures** are other functions of how often our predictions and the gold standard agree. Each test that should pass in the gold standard is called a **positive**, and each test that should fail is called a **negative**. We can thus, for example, count the total number of positives in a gold standard:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def Positives( aGoldStandard ):\n",
"\n",
"    # Count the elements of the gold standard that are True\n",
"    iSum = 0\n",
"    for fValue in aGoldStandard:\n",
"        if fValue:\n",
"            iSum += 1\n",
"\n",
"    return iSum\n",
"\n",
"Positives( [True, False, True] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 62,
"text": [
"2"
]
}
],
"prompt_number": 62
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Note that the local variable aGoldStandard inside the function is\n",
"# different from our global variable above\n",
"print( aGoldStandard )\n",
"Positives( aGoldStandard )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[True, False, False, True, False, True, False, False]\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 63,
"text": [
"3"
]
}
],
"prompt_number": 63
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or equivalently, to show an example Pythonism (a list comprehension passed to `sum`):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def Positives( aGoldStandard ):\n",
"\n",
"    return sum( [( 1 if f else 0 ) for f in aGoldStandard] )\n",
"\n",
"Positives( aGoldStandard )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 64,
"text": [
"3"
]
}
],
"prompt_number": 64
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And of course it's easy to do the same thing for negatives a few different ways:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def Negatives( aGoldStandard ):\n",
"\n",
"    # Count the elements of the gold standard that are False\n",
"    iSum = 0\n",
"    for fValue in aGoldStandard:\n",
"        if not fValue:\n",
"            iSum += 1\n",
"\n",
"    return iSum\n",
"\n",
"Negatives( aGoldStandard )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 65,
"text": [
"5"
]
}
],
"prompt_number": 65
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def Negatives( aGoldStandard ):\n",
"\n",
"    return sum( [( 0 if f else 1 ) for f in aGoldStandard] )\n",
"\n",
"Negatives( aGoldStandard )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 66,
"text": [
"5"
]
}
],
"prompt_number": 66
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def Negatives( aGoldStandard ):\n",
"\n",
"    return ( len( aGoldStandard ) - Positives( aGoldStandard ) )\n",
"\n",
"Negatives( aGoldStandard )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 67,
"text": [
"5"
]
}
],
"prompt_number": 67
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### True positives\n",
"\n",
"A performance evaluation doesn't actually happen until we *compare* our predictions with the gold standard, though. Recall from class that a **true positive** is a prediction that *should* be positive (in the gold standard) and *is* (in our predictions). That is, the element in a particular position in *both* vectors is `True`. Thus the number of true positives shared between a gold standard and some predictions is a similarity measure, not all that different from Manhattan distance as defined above:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def TruePositives( aGoldStandard, aPredictions ):\n",
"\n",
"    # Count positions where both the gold standard and the prediction are True\n",
"    iReturnSum = 0\n",
"    for iIndex in xrange( len( aGoldStandard ) ):\n",
"        if aGoldStandard[iIndex] and aPredictions[iIndex]:\n",
"            iReturnSum += 1\n",
"\n",
"    return iReturnSum\n",
"\n",
"print( aGoldStandard )\n",
"print( aPredictionsBelow05 )\n",
"TruePositives( aGoldStandard, aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[True, False, False, True, False, True, False, False]\n",
"[ True True False False True True False False]\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 68,
"text": [
"2"
]
}
],
"prompt_number": 68
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, the answer's correct - there are two pairs of elements in which both the gold standard and the predictions contain `True`, the first (index 0) and sixth (index 5). We've only made three changes with respect to Manhattan distance:\n",
"\n",
"1. Our return sum is now guaranteed to be an integer, so we change its name to `iReturnSum`.\n",
"\n",
"2. Rather than changing it for each element of the two vectors, we change it only for indices where both of them contain `True`, i.e. the first `and` the second.\n",
"\n",
"3. We change the sum only by one each time (since a subtraction would be meaningless).\n",
"\n",
"Like any other Python function, we can now reuse this prepackaged bag of instructions for any (appropriate) inputs:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"TruePositives( [True, False, True], [False, True, True] )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 69,
"text": [
"1"
]
}
],
"prompt_number": 69
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Some Python shorthand:\n",
"[True] * 3"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 70,
"text": [
"[True, True, True]"
]
}
],
"prompt_number": 70
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"( [True] * 3 ) + ( [False] * 2 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 71,
"text": [
"[True, True, True, False, False]"
]
}
],
"prompt_number": 71
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"TruePositives( [True] * 10, ( [True] * 4 ) + ( [False] * 6 ) )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 72,
"text": [
"4"
]
}
],
"prompt_number": 72
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### False positives, true negatives, and false negatives\n",
"\n",
"From here, it's no stretch to define equivalent functions for the three other possible combinations of gold standard and predicted values: **false positives** (that should be `False` but are predicted `True`), **true negatives** (that should be `False` and are), and **false negatives** (that should be `True` but are predicted `False`):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def FalsePositives( aGoldStandard, aPredictions ):\n",
"\n",
"    # Count positions that should be False but are predicted True\n",
"    iReturnSum = 0\n",
"    for iIndex in xrange( len( aGoldStandard ) ):\n",
"        if ( not aGoldStandard[iIndex] ) and aPredictions[iIndex]:\n",
"            iReturnSum += 1\n",
"\n",
"    return iReturnSum\n",
"\n",
"print( \"FPs occur in the second (index 1) and fifth (index 4) positions:\" )\n",
"print( aGoldStandard )\n",
"print( aPredictionsBelow05 )\n",
"FalsePositives( aGoldStandard, aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"FPs occur in the second (index 1) and fifth (index 4) positions:\n",
"[True, False, False, True, False, True, False, False]\n",
"[ True True False False True True False False]\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 73,
"text": [
"2"
]
}
],
"prompt_number": 73
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def TrueNegatives( aGoldStandard, aPredictions ):\n",
"\n",
"    # Count positions that should be False and are predicted False\n",
"    iReturnSum = 0\n",
"    for iIndex in xrange( len( aGoldStandard ) ):\n",
"        if ( not aGoldStandard[iIndex] ) and ( not aPredictions[iIndex] ):\n",
"            iReturnSum += 1\n",
"\n",
"    return iReturnSum\n",
"\n",
"print( \"TNs occur in the third (index 2), seventh (index 6), and eighth (index 7) positions:\" )\n",
"print( aGoldStandard )\n",
"print( aPredictionsBelow05 )\n",
"TrueNegatives( aGoldStandard, aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"TNs occur in the third (index 2), seventh (index 6), and eighth (index 7) positions:\n",
"[True, False, False, True, False, True, False, False]\n",
"[ True True False False True True False False]\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 74,
"text": [
"3"
]
}
],
"prompt_number": 74
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def FalseNegatives( aGoldStandard, aPredictions ):\n",
"\n",
"    # Count positions that should be True but are predicted False\n",
"    iReturnSum = 0\n",
"    for iIndex in xrange( len( aGoldStandard ) ):\n",
"        if aGoldStandard[iIndex] and ( not aPredictions[iIndex] ):\n",
"            iReturnSum += 1\n",
"\n",
"    return iReturnSum\n",
"\n",
"print( \"FNs occur in only the fourth (index 3) position:\" )\n",
"print( aGoldStandard )\n",
"print( aPredictionsBelow05 )\n",
"FalseNegatives( aGoldStandard, aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"FNs occur in only the fourth (index 3) position:\n",
"[True, False, False, True, False, True, False, False]\n",
"[ True True False False True True False False]\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 75,
"text": [
"1"
]
}
],
"prompt_number": 75
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course most of these values can be calculated as functions of each other, which would allow us to simplify some of the Python if we so chose:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# The trailing backslash tells Python I'm continuing with code on the following line\n",
"# just for visualization's sake (so it doesn't scroll off to the right)\n",
"Positives( aGoldStandard ) == TruePositives( aGoldStandard, aPredictionsBelow05 ) + \\\n",
" FalseNegatives( aGoldStandard, aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 76,
"text": [
"True"
]
}
],
"prompt_number": 76
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Negatives( aGoldStandard ) == TrueNegatives( aGoldStandard, aPredictionsBelow05 ) + \\\n",
" FalsePositives( aGoldStandard, aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 77,
"text": [
"True"
]
}
],
"prompt_number": 77
},
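{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check by hand (a worked sketch using only the counts printed in the cells above for `aGoldStandard` and `aPredictionsBelow05`), both identities hold:\n",
"\n",
"> $$P = TP + FN = 2 + 1 = 3$$\n",
"\n",
"> $$N = TN + FP = 3 + 2 = 5$$\n",
"\n",
"Together these account for all eight elements of the gold standard, since $P + N = 3 + 5 = 8$."
]
},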
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Precision and recall\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These basic similarity measures between predictions and a gold standard allow us to define more sophisticated functions that capture intuitive concepts about how \"accurate\" a hypothesis test is. For example, the [**precision**](http://en.wikipedia.org/wiki/Information_retrieval#Precision) (also [**positive predictive value**](http://en.wikipedia.org/wiki/Positive_predictive_value)) measures the fraction of predicted positives that really are positive in the gold standard; that is, how \"trustworthy\" a positive prediction is:\n",
"\n",
"> $$precision = \\frac{TP}{TP + FP}$$\n",
"\n",
"This is now easy to define in Python as well:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Note that we ***MUST*** convert at least one integer value to a float to ensure that\n",
"# division happens the way we expect it to, i.e. without truncation.\n",
"\n",
"def Precision( aGoldStandard, aPredictions ):\n",
"\n",
"    iTP = TruePositives( aGoldStandard, aPredictions )\n",
"    return ( float(iTP) / ( iTP + FalsePositives( aGoldStandard, aPredictions ) ) )\n",
"\n",
"print( TruePositives( aGoldStandard, aPredictionsBelow05 ) )\n",
"print( FalsePositives( aGoldStandard, aPredictionsBelow05 ) )\n",
"Precision( aGoldStandard, aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2\n",
"2\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 78,
"text": [
"0.5"
]
}
],
"prompt_number": 78
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A complementary measure is the **[recall](http://en.wikipedia.org/wiki/Information_retrieval#Recall)** (also **[true positive rate, TPR](http://en.wikipedia.org/wiki/True_Positive_Rate)** or **[sensitivity](http://en.wikipedia.org/wiki/Sensitivity_(test%29)**), defined as the fraction of all positives in the gold standard that are predicted as such:\n",
"\n",
"> $$recall = \\frac{TP}{P} = \\frac{TP}{TP + FN}$$\n",
"\n",
"Here $P$ is the total number of positives in the gold standard (i.e. `Positives` as defined above). It's typically difficult for a test to have both high precision (which requires it to be careful about which tests produce a positive prediction) and high recall (which requires it to produce a positive prediction for everything that should have one). The two measures are thus most useful in tandem when evaluating a hypothesis test, and recall is again easy to define in Python:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def Recall( aGoldStandard, aPredictions ):\n",
"\n",
"    return ( float(TruePositives( aGoldStandard, aPredictions )) / Positives( aGoldStandard ) )\n",
"\n",
"print( TruePositives( aGoldStandard, aPredictionsBelow05 ) )\n",
"print( Positives( aGoldStandard ) )\n",
"Recall( aGoldStandard, aPredictionsBelow05 )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2\n",
"3\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 79,
"text": [
"0.6666666666666666"
]
}
],
"prompt_number": 79
},
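{
"cell_type": "markdown",
"metadata": {},
"source": [
"Worked by hand from the counts printed above (TP = 2, FP = 2, and FN = 1 for `aGoldStandard` versus `aPredictionsBelow05`), the two measures come out as follows; this is only a check on the arithmetic, not a new definition:\n",
"\n",
"> $$precision = \\frac{TP}{TP + FP} = \\frac{2}{2 + 2} = 0.5$$\n",
"\n",
"> $$recall = \\frac{TP}{TP + FN} = \\frac{2}{2 + 1} \\approx 0.67$$\n",
"\n",
"So only half of our positive predictions are correct, while two thirds of the positives in the gold standard are recovered."
]
},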
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gold standard exercises\n",
"\n",
"**1.** This one should be easy - define a Python function for sensitivity:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def Sensitivity( aGoldStandard, aPredictions ):\n",
"\n",
"    pass"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 80
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**2.** A little harder, but not much - define a Python function for **[specificity](http://en.wikipedia.org/wiki/Specificity_(tests%29)**, or the **[true negative rate, TNR](http://en.wikipedia.org/wiki/True_Negative_Rate)**:\n",
"\n",
"> $$specificity = \\frac{TN}{N} = \\frac{TN}{TN + FP}$$\n",
"\n",
"As before, $N$ is the total number of negatives in the gold standard (i.e. `Negatives`)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def Specificity( aGoldStandard, aPredictions ):\n",
"\n",
"    pass"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 81
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# This should return True once you've filled in Specificity above\n",
"Specificity( aGoldStandard, aPredictionsBelow05 ) == 0.6"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 82,
"text": [
"False"
]
}
],
"prompt_number": 82
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Precision/recall and ROC plots\n",
"\n",
"Finally, these definitions of performance measures for *particular* critical values allow us to visualize performance by plotting them over a *range* of critical values. A **[ROC (Receiver Operating Characteristic)](http://en.wikipedia.org/wiki/Receiver_operating_characteristic)** plot is one of the most common visualizations of the performance of a hypothesis test. It places the false positive rate (equal to one minus specificity) on the X axis as a measure of how \"good\" predictions are (lower is better), and the true positive rate (sensitivity) on the Y axis as a measure of how \"thorough\" they are (higher is better). Thus for any one critical value, a ROC plot is simply a single point:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Ok, so if you read ahead you can cheat on the exercises above...\n",
"Sensitivity = Recall\n",
"\n",
"def Specificity( aGoldStandard, aPredictions ):\n",
"\n",
"    return ( TrueNegatives( aGoldStandard, aPredictions ) / float(Negatives( aGoldStandard )) )\n",
"\n",
"scatter( [1 - Specificity( aGoldStandard, aPredictionsBelow05 )],\n",
"         [Sensitivity( aGoldStandard, aPredictionsBelow05 )] )\n",
"xlabel( \"1 - Specificity (FPR)\" )\n",
"ylabel( \"Sensitivity (TPR)\" )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 83,
"text": [
"<matplotlib.text.Text at 0x111d92ed0>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAZwAAAEMCAYAAADwJwB6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtcVWW+x/EPCCaW6ajEGHsnCigbUNzAdrS84JSXLuIk\nTGkOmaDj8WThdDlWM72y5pxG59SUl5qsMeymw6iVnrGBExSjnBnBxHRmKkFFIzLzEojScNk+5w+P\n68goIcZeKPN9v168Xuy11+X5uXR/fZ717LX8jDEGERERH/Nv7waIiMg/BwWOiIjYQoEjIiK2UOCI\niIgtFDgiImILBY6IiNjCp4GTk5NDVFQUkZGRLFq06JzrFBQU4Ha7iY2NJSkpyVpeVVVFamoqLpeL\n6OhotmzZAsCOHTsYPnw4gwcPJjk5mZqaGl+WICIibcTPV9/D8Xq9DBw4kLy8PEJDQ/F4PKxevRqX\ny2WtU1VVxXXXXUdubi4Oh4PDhw/Tu3dvAKZPn87o0aNJT0+nsbGREydO0L17dzweD7/61a8YOXIk\nWVlZlJeX88QTT/iiBBERaUM+6+EUFxcTERFBWFgYgYGBTJkyhfXr1zdZZ9WqVaSkpOBwOACssKmu\nrmbz5s2kp6cDEBAQQPfu3QEoKytj5MiRANxwww2sW7fOVyWIiEgbCvDVjisrK3E6ndZrh8NBUVFR\nk3XKyspoaGhgzJgx1NTUkJmZSVpaGuXl5QQHBzNjxgx27NhBQkICixcvpmvXrsTExLB+/XomTZrE\nmjVrqKioOOvYfn5+vipLRKRD8+XNZ3zWwzmfD/2GhgZKSkp45513yM3N5ec//zllZWU0NjZSUlLC\nv/7rv1JSUsLll1/OwoULAXj55Zd5/vnnSUxM5Pjx43Tu3Pmc+zbGdNifxx57rN3boNpUn+rreD++\n5rMeTmhoaJPeR0VFhTV0dprT6aR3794EBQURFBTEqFGj2LlzJyNGjMDhcODxeABITU21AmfgwIHk\n5uYCUFpaysaNG31VgoiItCGf9XASExMpKytj37591NfXk52dTXJycpN1Jk2aRGFhIV6vl9raWoqK\ninC5XISEhOB0OiktLQUgLy+PmJgYAA4dOgTAyZMn+fd//3fmzJnjqxJERKQN+ayHExAQwLJlyxg/\nfjxer5eMjAxcLhfLly8HYPbs2URFRTFhwgQGDx6Mv78/s2bNIjo6GoClS5cybdo06uvrCQ8PJysr\nC4DVq1fz3HPPAZCSksJdd93lqxIuWmdOH+9oOnJtoPoudR29Pl/z2bTo9uTn52fLeKSISEfi689O\n3WlARERsocARERFbKHBERMQWChwREbGFAkdERGyhwBEREVsocERExBYKHBERsYUCR0REbKHAERER\nWyhwRETEFgocERGxhQJHRERsocARERFbKHBERMQWChwREbGFAkdERGyhwBEREVsocERExBYKHBER\nsYUCR0REbKHAERERWyhwRETEFgocERGxhQJHRERsocARERFbKHBERMQWChwREbGFAkdERGyhwBER\nEVsocERExBYKHBERsYUCR0REbKHAERERWyhwRETEFj4NnJycHKKiooiMjGTRokXnXKegoAC3201s\nbCxJSUnW8qqqKlJTU3G5XERHR7NlyxYAiouLGTp0KG63G4/Hw9atW31ZgoiItBXjI42NjSY8PNyU\nl5eb+vp6ExcXZz766KMm63z11VcmOjraVFRUGGOMOXTokPXenXfeaVasWGGMMaahocFUVVUZY4wZ\nPXq0ycnJMcYY884775ikpKSzju3DskREOixff3b6rIdTXFxMREQEYWFhBAYGMmXKFNavX99knVWr\nVpGSkoLD4QCgd+/eAFRXV7N582bS09MBCAgIoHv37gD06dOH6upq4FQvKDQ01FcliIhIGwrw1Y4r\nKytxOp3Wa4fDQVFRUZN1ysrKaGhoYMyYMdTU1JCZmU
laWhrl5eUEBwczY8YMduzYQUJCAosXL6Zr\n164sXLiQESNG8MADD3Dy5En+/Oc/n/P4CxYssH5PSkpqMlwnIiKnLmkUFBTYdjyfBY6fn1+L6zQ0\nNFBSUkJ+fj61tbUMHz6cYcOG0djYSElJCcuWLcPj8TBv3jwWLlzIE088QUZGBkuWLOHWW29lzZo1\npKen8+6775617zMDR0REzvaP/xl//PHHfXo8nw2phYaGUlFRYb2uqKiwhs5OczqdjBs3jqCgIHr1\n6sWoUaPYuXMnTqcTh8OBx+MBICUlhZKSEuDUUN2tt94KQGpqKsXFxb4qQURE2pDPAicxMZGysjL2\n7dtHfX092dnZJCcnN1ln0qRJFBYW4vV6qa2tpaioCJfLRUhICE6nk9LSUgDy8/OJiYkBICIigj/+\n8Y8AvPfeewwYMMBXJYiISBvy2ZBaQEAAy5YtY/z48Xi9XjIyMnC5XCxfvhyA2bNnExUVxYQJExg8\neDD+/v7MmjWL6OhoAJYuXcq0adOor68nPDycrKwsAF588UXuvvtu6urqCAoK4sUXX/RVCSIi0ob8\n/m8qXIfi5+dHByxLRMSnfP3ZqTsNiIiILRQ4IiJiCwWOiIjYQoEjIiK2UOCIiIgtFDgiImILBY6I\niNhCgSMiIrZQ4IiIiC0UOCIiYgsFjoiI2EKBIyIitlDgiIiILRQ4IiJiCwWOiIjYQoEjIiK2UOCI\niIgtFDgiImILBY6IiNhCgSMiIrZQ4IiIiC0UOCIiYovzDpy///3v1NXV+bItIiLSgTUbOCdPnuTN\nN9/khz/8IaGhofTr14++ffsSGhpKamoqb731FsYYO9sqIiKXMD/TTGqMGjWKkSNHkpyczJAhQ7js\nsssAqKurY/v27WzYsIHCwkI2bdpka4PPh5+fn8JQRKSVfP3Z2Wzg1NXVWSHTnPNZpz0ocEREWs/X\nn53NDqk1FyTHjx9n0aJF37iOiIjIP2o2cD7//HPuuecebrrpJv7t3/6N48eP88wzzxAVFUVlZaWd\nbRQRkQ4goLk37rzzTkaMGMHNN99MTk4OsbGxDBs2jA8++IDvfve7drZRREQ6gGav4QwZMoQPP/zQ\neu1wONi/fz+dOnWyrXEXStdwRERaz9efnc32cE6ePMnRo0cBMMbQs2dPqqurrfd79uzps0aJiEjH\n02wPJywsDD8/v3Nv5OfH3r17fdqwb0M9HBGR1mu3adGXMgWOiEjrtdu06IMHD5KZmcnNN9/Mww8/\nzLFjx3zWCBER6fiaDZw777yTK664gnvuuYeamhruvfdeO9slIiIdTLNDanFxcezYscN67Xa72b59\nu20N+zY0pCYi0nrtNqRmjOHo0aMcPXqUI0eO4PV6rdenZ6+1JCcnh6ioKCIjI627E/yjgoIC3G43\nsbGxJCUlWcurqqpITU3F5XIRHR1NUVERALfffjtutxu3202/fv1wu92tKFdERNqLz2apeb1eBg4c\nSF5eHqGhoXg8HlavXo3L5bLWqaqq4rrrriM3NxeHw8Hhw4fp3bs3ANOnT2f06NGkp6fT2NjIiRMn\n6N69e5NjPPDAA/To0YOf/exnZ7VPPRwRkdZpt+/h/PGPf6Rv374XvOPi4mIiIiIICwsDYMqUKaxf\nv75J4KxatYqUlBQcDgeAFTbV1dVs3ryZV1555VQjAwLOChtjDL/73e94//33L7iNIiJin2YD59Zb\nb6WkpOSCd1xZWYnT6bReOxwOa1jstLKyMhoaGhgzZgw1NTVkZmaSlpZGeXk5wcHBzJgxgx07dpCQ\nkMDixYvp2rWrte3mzZsJCQkhPDz8nMdfsGCB9XtSUlKT4ToRETl1SaOgoMC24zUbON+2W9XccNyZ\nGhoaKCkpIT8/n9raWoYPH86wYcNobGykpKSEZcuW4fF4mDdvHgsXLuSJJ56wtl29ejV33HFHs/s+\nM3BERORs//if8c
cff9ynx2s2cCorK7n33nvPGTx+fn4sWbLkG3ccGhpKRUWF9bqiosIaOjvN6XTS\nu3dvgoKCCAoKYtSoUezcuZMRI0bgcDjweDwApKamsnDhQmu7xsZG3nrrrW/VAxMREXs1GzhBQUEk\nJCRgjGnSW/nH181JTEykrKyMffv2cfXVV5Odnc3q1aubrDNp0iTmzp2L1+ulrq6OoqIi7rvvPkJC\nQnA6nZSWljJgwADy8vKIiYmxtsvLy8PlcnH11VdfSM0iItIOmg2cnj17Mn369AvfcUAAy5YtY/z4\n8Xi9XjIyMnC5XCxfvhyA2bNnExUVxYQJExg8eDD+/v7MmjWL6OhoAJYuXcq0adOor68nPDycrKws\na9/Z2dlMnTr1gtsmIiL2a3Za9LBhw9iyZYvd7WkTmhYtItJ67fbFzzfeeKPFjffs2dOmjRERkY6r\n2R7OlClTOH78OMnJySQmJtKnTx+MMRw4cIAPPviADRs20K1bN37729/a3eYWqYcjItJ67fp4gt27\nd/Pb3/6W//mf/2H//v0A9O3blxEjRjB16lT69+/vs4Z9GwocEZHW0/NwLoACR0Sk9drtGo6IiEhb\nUuCIiIgtFDgiImKLFgNn8uTJbNy4kZMnT9rRHpEO65lnlvCd71zN5Zf3ZNase2hoaGjvJonYqsXA\nmTNnDm+88QYRERE89NBD7Nq1y452iXQo69at42c/W0pV1bvU1v6VN974hIceeqy9myViqxYDZ+zY\nsaxatYqSkhLCwsK4/vrrufbaa8nKytL/0ETO09tv51Jb+xMgBriar7/+d9avz23vZonY6ryu4Rw5\ncoSVK1fym9/8hvj4eO699162bdvG2LFjfd0+kQ7hqqt6EhBw5ujALnr37tlu7RFpDy1+D+fWW2/l\nk08+IS0tjRkzZtCnTx/rvYSEBLZt2+bzRraWvocjF5svvviCIUOGU109nJMnv0Ng4O/Iz/893/ve\n99q7aSKWdv/i5zvvvMNNN93UZFldXR2XXXaZzxr1bSlw5GJ0+PBhsrOzqaurY+LEiURGRrZ3k0Sa\naPfAcbvdbN++vcmy+Pj4i/rhZwocEZHW8/VnZ7PPwzlw4ACff/45X3/9NSUlJdaD144dO0Ztba3P\nGiQiIh1Ts4GTm5vLK6+8QmVlJffff7+1vFu3bjz55JO2NE5ERDqOFofU1q1bR0pKil3taRMaUhMR\nab12u4bz2muvkZaWxtNPP42fn5+1/PTQ2n333eezRn1bChwRkdZrt2s4p6/T1NTUNAkcERGRC9Hi\nkNqXX37JVVddZVd72oR6OCIirdfuz8MZMWIE48aNY8WKFXz11Vc+a4iIiHRsLQZOaWkpP//5z/nr\nX/9KQkICt9xyC6+99podbRMRkQ6kVY+YPnz4MD/5yU944403LurHFWhITUSk9dp9SK26upqVK1dy\n4403Mnz4cPr06cPWrVt91iAREemYWuzh9OvXj0mTJnH77bczbNiwS2LGmno4IiKt1+73Ujv9vZtL\niQJHRKT12u17OJmZmSxevJjk5ORzNmrDhg0+a5SIiHQ8zQbOnXfeCdDkPmqnXWo9HhERaX/NBk5C\nQgIAH374IfPmzWvy3rPPPsvo0aN92zIREelQWpyl9sorr5y1bOXKlb5oi4iIdGDN9nBWr17NqlWr\nKC8vZ+LEidbympoaevXqZUvjRESk42g2cK699lr69OnDoUOHeOCBB6yZC926dSMuLs62BoqISMfQ\nqjsNXCo0LVpEpPXa7U4D1113HQBXXHEF3bp1a/Jz5ZVX+qxBIiLSMamHIyIiwEVwL7U9e/bw97//\nHYD333+fJUuWUFVV5bMGiYhIx9Ri4EyePJmAgAB2797N7Nmzqaio4I477jivnefk5BAVFUVkZCSL\nFi065zoFBQW43W5iY2NJSkqylldVVZGamorL5SI6OpotW7ZY7y1duhSXy0VsbCzz
588/r7aIiEj7\nanaW2mn+/v4EBATw5ptvcs8993DPPffgdrtb3LHX62Xu3Lnk5eURGhqKx+MhOTkZl8tlrVNVVcXd\nd99Nbm4uDoeDw4cPW+9lZmZy0003sXbtWhobGzlx4gRwqpe1YcMGdu7cSWBgIIcOHbqQukVExGYt\n9nA6d+7MqlWrePXVV7nlllsAaGhoaHHHxcXFREREEBYWRmBgIFOmTGH9+vVN1lm1ahUpKSk4HA4A\nevfuDZx6JMLmzZtJT08HICAggO7duwPw61//mocffpjAwEAAgoODz7dWERFpRy32cF5++WVeeOEF\nfvrTn9KvXz/27t3Lj370oxZ3XFlZidPptF47HA6KioqarFNWVkZDQwNjxoyhpqaGzMxM0tLSKC8v\nJzg4mBkzZrBjxw4SEhJYvHgxXbt2paysjE2bNvHII4/QpUsXnnrqKRITE886/oIFC6zfk5KSmgzX\niYjIqUsaBQUFth3PZ7PU1q1bR05ODi+99BIAr7/+OkVFRSxdutRaZ+7cuZSUlJCfn09tbS3Dhw9n\n48aNVFdXM3z4cP70pz/h8XiYN28eV155JU888QSDBg3i+9//PosXL2br1q3cfvvt7N27t2lRmqUm\nItJq7T5LrbCwkLFjxxIZGUm/fv3o168f/fv3b3HHoaGhVFRUWK8rKiqsobPTnE4n48aNIygoiF69\nejFq1Ch27tyJ0+nE4XDg8XgASElJoaSkBDjVU5o8eTIAHo8Hf39/jhw5cv4Vi4hIu2gxcDIyMrjv\nvvsoLCxk69atbN26leLi4hZ3nJiYSFlZGfv27aO+vp7s7Oyznq0zadIkCgsL8Xq91NbWUlRUhMvl\nIiQkBKfTSWlpKQD5+fnExMQA8IMf/ID33nsPgNLSUurr63VvNxGRS0CL13B69OjBjTfe2PodBwSw\nbNkyxo8fj9frJSMjA5fLxfLlywGYPXs2UVFRTJgwgcGDB+Pv78+sWbOIjo4GTk19njZtGvX19YSH\nh5OVlQVAeno66enpDBo0iM6dO/Pqq6+2um0iImK/Fq/hPPTQQ3i9XiZPnsxll11mLY+Pj/d54y6U\nruGIiLSerz87WwycpKSkcz7h8/333/dZo74tBY6ISOu1e+BcihQ4IiKt1+6z1L744gsyMjKYMGEC\nAB999BErVqzwWYNERKRjajFw7rrrLsaNG8fnn38OQGRkJM8884zPGyYiIh1Li4Fz+PBhbr/9djp1\n6gRAYGAgAQEtTm4TERFposXAueKKK5p8sXLLli3Wfc1ERETOV4tdlaeffpqJEyeyd+9err32Wg4d\nOsTatWvtaJuIiHQg5zVLraGhgV27dgEwcOBA607NFyvNUhMRab12m6VWXFzMgQMHgFPXbbZt28Yj\njzzC/fffz9GjR33WIBER6ZiaDZzZs2dbdxbYtGkTDz30ENOnT+fKK6/kxz/+sW0NFBGRjqHZazgn\nT56kZ8+eAGRnZzN79mxSUlJISUkhLi7OtgaKiEjH0GwPx+v1Wk/2zMvLY8yYMdZ7jY2Nvm+ZiIh0\nKM32cKZOncro0aPp3bs3Xbt2ZeTIkcCpp3T26NHDtgaKiEjH8I2z1P785z/zxRdfMG7cOC6//HLg\n1DNojh8/rrtFi4h0MLp55wVQ4IiItF6737xTRESkLShwRETEFgocERGxhQJHRERsocARERFbKHBE\nRMQWChwREbGFAkdERGyhwBEREVsocERExBYKHBERsYUCR0REbKHAERERWyhwRETEFgocERGxhQJH\nRERsocARERFbKHBERMQWChwREbGFAkdERGzh08DJyckhKiqKyMhIFi1adM51CgoKcLvdxMbGkpSU\nZC2vqqoiNTUVl8tFdHQ0RUVFACxYsACHw4Hb7cbtdpOTk+PLEkREpI34GWOML3bs9XoZOHAgeXl5\nhIaG4vF4WL16NS6Xy1qnqqqK6667jtzcXBwO
B4cPH6Z3794ATJ8+ndGjR5Oenk5jYyMnTpyge/fu\nPP7443Tr1o377ruv+aL8/PBRWSIiHZavPzt91sMpLi4mIiKCsLAwAgMDmTJlCuvXr2+yzqpVq0hJ\nScHhcABYYVNdXc3mzZtJT08HICAggO7du1vbKUxERC49Ab7acWVlJU6n03rtcDisYbHTysrKaGho\nYMyYMdTU1JCZmUlaWhrl5eUEBwczY8YMduzYQUJCAosXL6Zr164ALF26lFdffZXExESefvppevTo\ncdbxFyxYYP2elJTUZLhOREROXdIoKCiw7Xg+G1Jbt24dOTk5vPTSSwC8/vrrFBUVsXTpUmuduXPn\nUlJSQn5+PrW1tQwfPpyNGzdSXV3N8OHD+dOf/oTH42HevHlceeWVPPHEE3z55ZcEBwcD8Oijj3Lg\nwAFWrFjRtCgNqYmItNolO6QWGhpKRUWF9bqiosIaOjvN6XQybtw4goKC6NWrF6NGjWLnzp04nU4c\nDgcejweA1NRUSkpKALjqqqvw8/PDz8+PmTNnUlxc7KsSRESkDfkscBITEykrK2Pfvn3U19eTnZ1N\ncnJyk3UmTZpEYWEhXq+X2tpaioqKcLlchISE4HQ6KS0tBSAvL4+YmBgADhw4YG3/1ltvMWjQIF+V\nICIibchn13ACAgJYtmwZ48ePx+v1kpGRgcvlYvny5QDMnj2bqKgoJkyYwODBg/H392fWrFlER0cD\np67TTJs2jfr6esLDw8nKygJg/vz5fPjhh/j5+dGvXz9rfyIicnHz2TWc9qRrOCIirXfJXsMRERE5\nkwJHRERsocARERFbKHBERMQWChwREbGFAkdERGyhwBEREVsocERExBYKHBERsYUCR0REbKHAERER\nWyhwRETEFgocERGxhQJHRERsocARERFbKHBERMQWChwREbGFAkdERGyhwBEREVsocERExBYKHBER\nsYUCR0REbKHAERERWyhwRETEFgocERGxhQJHRERsocARERFbKHBERMQWChwREbGFAkdERGyhwBER\nEVsocERExBYKHBERsYUCR0REbKHAERERWyhwLkEFBQXt3QSf6ci1geq71HX0+nzNp4GTk5NDVFQU\nkZGRLFq06JzrFBQU4Ha7iY2NJSkpyVpeVVVFamoqLpeL6OhotmzZ0mS7p59+Gn9/f44ePerLEi5K\nHfkvfUeuDVTfpa6j1+drAb7asdfrZe7cueTl5REaGorH4yE5ORmXy2WtU1VVxd13301ubi4Oh4PD\nhw9b72VmZnLTTTexdu1aGhsbOXHihPVeRUUF7777Ln379vVV80VEpI35rIdTXFxMREQEYWFhBAYG\nMmXKFNavX99knVWrVpGSkoLD4QCgd+/eAFRXV7N582bS09MBCAgIoHv37tZ29913H7/85S991XQR\nEfEF4yNr1qwxM2fOtF6/9tprZu7cuU3WmTdvnrn77rtNUlKSSUhIMK+++qoxxpjt27eboUOHmrvu\nusu43W4zc+ZMc+LECWOMMW+//baZN2+eMcaYsLAwc+TIkbOODehHP/rRj34u4MeXfDak5ufn1+I6\nDQ0NlJSUkJ+fT21tLcOHD2fYsGE0NjZSUlLCsmXL8Hg8zJs3j4ULF/Lwww/z5JNP8u6771r7OJUv\nTZ1rmYiItC+fDamFhoZSUVFhva6oqLCGzk5zOp2MGzeOoKAgevXqxahRo9i5cydOpxOHw4HH4wEg\nNTWVkpIS9uzZw759+4iLi6Nfv3589tlnJCQk8OWXX/qqDBERaSM+C5zExETKysrYt28f9fX1ZGdn\nk5yc3GSdSZMmUVhYiNfrpba2lqKiIlwuFyEhITidTkpLSwHIy8sjJiaG2NhYDh48SHl5OeXl5Tgc\nDkpKSrjqqqt8VYaIiLQRnw2pBQQEsGzZMsaPH4/X6yUjIwOXy8Xy5csBmD17NlFRUUyYMIHBgwfj\n7+/PrFmz
iI6OBmDp0qVMmzaN+vp6wsPDycrKOusY5zNsJyIiFwmfXiG6QH/4wx/MwIEDTUREhFm4\ncOFZ77/99ttm8ODBZsiQISY+Pt7k5+db7z377LMmNjbWxMTEmGeffdZaXlRUZDwejxkyZIhJTEw0\nxcXFxhhjysvLTZcuXcyQIUPMkCFDzJw5cy7J+j788EMzbNgwM2jQIDNx4kRz7Ngx670nn3zSRERE\nmIEDB5rc3FzfFmfsre9iPH+nFRcXm06dOpm1a9e2uO2RI0fMDTfcYCIjI83YsWPNV199Zb13sZ2/\n09qiPrvPny9q+93vfmeio6ONv7+/2bZtW5P9dIRz11x9F3LuLrrAaWxsNOHh4aa8vNzU19ebuLg4\n89FHHzVZ5/jx49bvO3fuNOHh4cYYY/7yl7+Y2NhY8/XXX5vGxkZzww03mN27dxtjjBk9erTJyckx\nxhjzzjvvmKSkJGPMqT+02NhYO0ozxviuvsTERLNp0yZjjDEvv/yyefTRR40xxvztb38zcXFxpr6+\n3pSXl5vw8HDj9Xo7TH0X4/k7vd6YMWPMzTffbP2j/qZtH3zwQbNo0SJjjDELFy408+fPN8ZcnOev\nLeuz8/z5qraPP/7Y7Nq1yyQlJTX5QO4o5665+i7k3F10t7Y5n+/vXH755dbvx48ft76/8/HHH/O9\n732PLl260KlTJ0aPHs2bb74JQJ8+faiurgZOfeE0NDTUpoqa8lV9ZWVljBw5EoAbbriBdevWAbB+\n/XqmTp1KYGAgYWFhREREUFxc3GHqs9v51AenhoRTU1MJDg4+r203bNjA9OnTAZg+fTpvv/02cHGe\nv7asz06+qi0qKooBAwactZ+Ocu6aq+9CXHSBU1lZidPptF47HA4qKyvPWu/tt9/G5XJx4403smTJ\nEgAGDRrE5s2bOXr0KLW1tWzcuJHPPvsMgIULF3L//fdzzTXX8OCDD/KLX/zC2ld5eTlut5ukpCQK\nCwsvyfpiYmKsvyBr1qyxZgh+/vnnTWYHNne8S7U+uPjOX2VlJevXr2fOnDnA/19r/KZtDx48SEhI\nCAAhISEcPHgQuDjPX1vWB/adP1/V1pyOcu6+SWvP3UUXOOc7EeAHP/gBH3/8Mf/1X/9FWloacCqJ\n58+fz7hx47jxxhtxu9106tQJgIyMDJYsWcKnn37KM888Y93F4Oqrr6aiooLt27fzq1/9ijvuuIOa\nmhrfFEfb1+fvf+oUvvzyyzz//PMkJiZy/PhxOnfu/K3bcCHsru9iPH+nvzfm5+eHOTVsfc5tjTHn\n3J+fn9/zdEFIAAAIc0lEQVQ3Hqe9z19b1mfn+WvL2nzZBl/uuy3ru5Bz57NZahfqfL6/c6aRI0fS\n2NjIkSNH6NWrF+np6VaYPPLII1xzzTXAqS5jXl4ecOp7PTNnzgSgc+fO1odXfHw84eHhlJWVER8f\nf0nVN3DgQHJzcwEoLS1l48aN5zzeZ5995tPhRLvruxjP37Zt25gyZQoAhw8f5g9/+AOBgYHfeC5C\nQkL44osv+O53v8uBAwesqf4X4/lry/rsPH9tWVtLf6/PdbxL6dydT30XdO5afWXKxxoaGkz//v1N\neXm5qaurO+eFr927d5uTJ08aY4zZtm2b6d+/v/XewYMHjTHG7N+/30RFRZnq6mpjjDFut9sUFBQY\nY4zJy8sziYmJxhhjDh06ZBobG40xxuzZs8eEhoY2mSF0qdT35ZdfGmOM8Xq9Ji0tzWRlZRlj/v/C\nZV1dndm7d6/p37+/te+OUN/FeP7OdNddd5l169a1uO2DDz5ozQz6xS9+cdakgYvp/LVlfXaeP1/V\ndlpSUpL54IMPrNcd5dyd9o/1Xci5u+gCx5hTs8gGDBhgwsPDzZNPPmmMMeaFF14wL7zwgjHGmEWL\nFpmYmBgzZMgQM2LECGuKszHGjBw50kRHR5u4uDjz3nvvWcu3bt1qhg4dau
Li4sywYcNMSUmJMcaY\ndevWWfuKj483v//97y/J+hYvXmwGDBhgBgwYYB5++OEmx/uP//gPEx4ebgYOHGjN1Oso9V2M5+9M\nZ/6jbm5bY05NG77++uvPOS36Yjt/Z/q29dl9/nxR25tvvmkcDofp0qWLCQkJMRMmTLDe6wjnrrn6\n1q5d2+pz52eMbjwmIiK+d9FNGhARkY5JgSMiIrZQ4IiIiC0UOCIiYgsFjlxy0tPTCQkJYdCgQRe0\n/e9//3vi4+MZMmQIMTExvPjii23avscee4z8/HwANm/eTExMDPHx8Xz++ef88Ic//MZtZ82axSef\nfALAk08+2epj19XVMXr0aIwx7Nu3j6CgINxuN263m/j4eBoaGli5ciXBwcG43W5iYmL4zW9+A9Bk\neXR0NM8//7y13yVLlvDaa6+1uj0iTXyrOXgi7WDTpk2mpKTkgm76WF9fb66++mpTWVlpvd61a1db\nN9Eye/Zs8/rrr1/QtldccUWrt1mxYoX55S9/aYxp/uaKK1euNPfcc48x5tT3m4KDg83BgwebLD9y\n5Ii56qqrrO9FHTt2zHg8nguqQ+Q09XDkkjNy5Ei+853vXNC2NTU1NDY20rNnTwACAwOtGxPedddd\n/Mu//Asej4eBAwdadzPwer08+OCDDB06lLi4uCY9okWLFjF48GCGDBnCI488Yu1n3bp1rFixgjVr\n1vDoo4+SlpbG/v37iY2Ntfb5wAMPMGjQIOLi4njuuecASEpKYtu2bTz00EN8/fXXuN1ufvSjH/HY\nY4+xePFi67g//elPrXvQnWn16tVMmjSpxT8H83/fhggODiY8PJz9+/c3Wd6zZ0/69+9vLe/WrRu9\nevXib3/72/n+UYuc5aK7tY2IL/Xs2ZPk5GT69u3L9ddfzy233MLUqVOt+3t9+umnbN26ld27dzNm\nzBh2797NK6+8Qo8ePSguLqauro4RI0Ywbtw4Pv74YzZs2EBxcTFdunShqqoK+P97hWVkZFBYWMjE\niROZPHky+/bts+5Z9eKLL/Lpp5+yY8cO/P39+eqrr5psu3DhQp577jm2b98OwP79+5k8eTKZmZmc\nPHmS7Oxstm7d2qQ2r9fLX//61yZ39t2zZw9utxuAESNGsHTpUitUAPbu3cvevXuJjIxsEib79+9n\n7969hIeHW8uGDh3Kpk2biImJactTIv9EFDjyT+ell14iMzOTvLw8nnrqKd59913ribK33XYbABER\nEfTv359PPvmE//7v/+Yvf/kLa9euBeDYsWOUlZWRn59Peno6Xbp0AaBHjx7nPJ45x3er8/PzmTNn\njnVz0pZ6bH379qVXr158+OGHfPHFF8THx5+1zeHDh+nWrVuTZeHh4VZonSk7O5vCwkIuu+wyXnzx\nRavt2dnZbNq0iU8++YSnnnrK6gnCqZs17t279xvbKfJNFDjS4Xi9XhITEwGYNGkSCxYsOGud2NhY\nYmNjSUtLo1+/fud8hDn8/110ly1bxtixY5u8l5ube84wOV+t3XbmzJlkZWVx8OBB6wanF7JPPz8/\npkyZctaQ3JnLt23bxm233caMGTO44oorrH3rse7ybegajnQ4nTp1Yvv27Wzfvv2ssDlx4gQFBQXW\n6+3btxMWFgac+kBds2YNxhj27NnD3r17iYqKYvz48Tz//PM0NjYCp+5WXVtby9ixY8nKyuLrr78G\nsIbFzsfYsWNZvnw5Xq+32W0DAwOtYwLceuut5OTk8MEHHzB+/Piz1u/duzfHjx9v8djmjNvSN7c8\nISGBiRMnNgmlAwcOWH9WIhdCgSOXnKlTp3LttddSWlqK0+lstndyLsYY/vM//5OoqCjcbjePP/44\nK1euBE79D/+aa65h6NCh3HTTTSxfvpzOnTszc+ZMoqOjiY+PZ9CgQcyZMwev18v48eNJTk4mMTER\nt9vN008/fc5jntkrOP37zJkzueaaa6
wJB6tXrz5rux//+McMHjzYel5QYGAg3//+97ntttvO2dPo\n1KkTsbGx7Nq165zHPnPZ+SyfP38+v/71r6mtrQVOPeLj9FNXRS6Ebt4p8n9mzJhhXeC/GJ08eZKE\nhATWrl3b5GL+mVauXMnBgweZP39+mx772LFjXH/99WdNVBBpDfVwRC4BH330EZGRkdxwww3Nhg3A\nHXfcwcaNG7/VtaVzWblyJZmZmW26T/nnox6OiIjYQj0cERGxhQJHRERsocARERFbKHBERMQWChwR\nEbGFAkdERGzxv/thCMAz5IIDAAAAAElFTkSuQmCC\n",
"text": [
"<matplotlib.figure.Figure at 0x111d46c50>"
]
}
],
"prompt_number": 83
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But what if we're not sure about the right critical value, or we want to examine test performance over a range of possible critical values? We can easily plot the same kind of single point for several critical values - the gold standard doesn't change, but which predictions count as positive does change with each threshold:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aPredictionsBelow025 = [( d < 0.025 ) for d in aPredictionsPValues]\n",
"aPredictionsBelow7 = [( d < 0.7 ) for d in aPredictionsPValues]\n",
"\n",
"aFPRs = [\n",
" 1 - Specificity( aGoldStandard, aPredictionsBelow025 ),\n",
" 1 - Specificity( aGoldStandard, aPredictionsBelow05 ),\n",
" 1 - Specificity( aGoldStandard, aPredictionsBelow7 )]\n",
"aTPRs = [\n",
" Sensitivity( aGoldStandard, aPredictionsBelow025 ),\n",
" Sensitivity( aGoldStandard, aPredictionsBelow05 ),\n",
" Sensitivity( aGoldStandard, aPredictionsBelow7 )]\n",
"scatter( aFPRs, aTPRs )\n",
"xlabel( \"1 - Specificity (FPR)\" )\n",
"ylabel( \"Sensitivity (TPR)\" )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 84,
"text": [
"<matplotlib.text.Text at 0x111e8c710>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEICAYAAABbOlNNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHq1JREFUeJzt3XlYlXX+//HXUU6GSzqINQrkBgGKsoi7JuYoLkmGWdgV\n5RrjXLk0NaOTv7nUqWvStG+5VKNlUrlOWmpT4iQj0WKiSDblLoqESuGSkoVwuH9/mGe6Q+ScA4cD\n+Hxcl9fFfZ/7fO73h4P363zu1WIYhiEAAH5Wz9MFAABqFoIBAGBCMAAATAgGAIAJwQAAMPHydAEV\nsVgsni4BAGolV086rRUjBsMw6uy/WbNmebwG+kff6F/d+1cZtSIYAADVh2AAAJgQDB4WExPj6RLc\nqi73ry73TaJ/NzKLUdmdUW5msVgqvb8MAG40ldl2MmIAAJgQDAAAE4IBAGBCMAAATAgGAIAJwQAA\nMCEYAAAmBAMAwIRgAACYEAwAABOCAQBgQjAAAEwIBgCACcEAADAhGAAAJgQDAMDEbcEwbtw43Xbb\nberUqVO5y0yZMkVBQUEKDw9XVlaWu0oBAJOLFy8qJSVFH374oX766SdPl1PjuC0Yxo4dq5SUlHJf\n/+CDD3TkyBEdPnxYy5Yt06RJk9xVCgDY5eXlKSQkSg888KxGjvx/iojore+//97TZdUoXu5quG/f\nvjp+/Hi5r2/evFmPPPKIJKl79+46f/688vPzddttt5VZdvbs2fafY2JieFYrAJdNmTJD+fkJstme\nlmTo+PGJmjPnWf3f/831dGmVkpaWprS0tCppy23BUJG8vDwFBATYp/39/fXNN99UGAwAUBlHjuTI\nZhv385RFRUUxOnTofY/WVBV+/aV5zpw5Lrfl0YPPv35QtcVi8VAlAG4UvXtH6+abl0oqlnRJDRsm\nq2/fLp4uq0bxWDD4+fkpNzfXPv3NN9/Iz8/PU+UAuEHMn/+0evYs1E03tZDVepuGDfPTE09M83RZ\nNYrHdiXFxcVpyZIlSkhI0Oeff65mzZpdczcSAFSlRo0aKTX1PRUUFKh+/fry8fHxdEk1jtuCYfTo\n0froo49UUFCggIAAzZkzR8XFxZKkpKQkDR06VB988IECAwPVqFEjrVixwl2lAICJxWJRixYtPF1G\njWUxfr2jv4axWCxljkUAAK6vMttOrnwGAJgQDAAAE4IBAGBCMAAATAgGAIAJwQAAMCEYAAAmBAMA\nwIRgAACYEAwAABOCAQBgQjAAAEwIBgCACcEAADAhGAAAJgQDAMCEYAAAmBAMAAATggEAYEIwAABM\nCAYAgAnBAAAwIRgAACYEAwDAhGAAAJgQDAAAE4IBAGBCMAAATAgGAIAJwQAAMCEYAAAmBAMAwIRg\nAACYEAwAABOCAQBgQjAAAEwIBgCACcEAADAhGAAAJgQDAMDErcGQkpKikJAQBQUFad68eWVeLygo\n0ODBgxUREaGwsDAlJye7sxwAgAMshmEYjiz4008/yWKxqEGDBg41bLPZFBwcrG3btsnPz09du3bV\nmjVrFBoaal9m9uzZKioq0rPPPquCggIFBwcrPz9fXl5e/yvQYpGDJQIAflaZbWe5I4bS0lK98847\nGjVqlPz8/NS2bVu1bt1afn5+uu+++/Tuu+9ed6UZGRkKDAxUmzZtZLValZCQoE2bNpmWadmypS5c\nuCBJunDhgpo3b24KBQBA9St3KxwTE6O+ffvqySefVEREhH2kUFRUpKysLG3evFkvvPCC0tPTr/n+\nvLw8BQQE2Kf9/f21c+dO0zITJ07UXXfdpVatWunixYv65z//ec22Zs+ebaorJibG0f4BwA0hLS1N\naWlpVdJWubuSioqKKtxtdL1lNmzYoJSUFL366quSpJUrV2rnzp1avHixfZlnnnlGBQUFevHFF3X0\n6FENHDhQe/fuVZMmTf5XILuSAMBpbtmVVN4Gv7Cw0H
4g+XrB4efnp9zcXPt0bm6u/P39Tct89tln\nGjVqlCSpffv2atu2rQ4ePOh49QCAKlduMJw8eVKTJ0/W0KFD9ec//1mFhYV64YUXFBISory8vAob\njo6O1uHDh3X8+HFdvnxZ69atU1xcnGmZkJAQbdu2TZKUn5+vgwcPql27dpXsEgCgMso9xvDwww+r\nT58+GjZsmFJSUhQWFqYePXpo9+7d+u1vf1txw15eWrJkiWJjY2Wz2TR+/HiFhoZq6dKlkqSkpCQ9\n9dRTGjt2rMLDw1VaWqrnnntOPj4+Vdc7AIDTyj3GEBERoS+++MI+7e/vr5ycHNWvX7/aipM4xgAA\nrqjMtrPcEUNpaanOnj0rSTIMQz4+Pvr+++/tr/PNHgDqpnJHDG3atJHFYrn2mywWZWdnu7WwX66L\nEQMAOKcy206Hr3z2FIIBAJznltNV8/PzNXXqVA0bNkx/+ctf7FcoAwDqtnKD4eGHH1bjxo01efJk\nXbx4UVOmTKnOugAAHlLurqTw8HDt3bvXPh0ZGamsrKxqK+wqdiUBgPPcclaSYRims5JsNpt9WuKs\nJACoqzgrCQDqILeclZSTk6PWrVtXqrCqQDAAgPPcclbSvffe63JBAIDaq9xg4Fs6ANyYyt2VdOut\ntyohIeGaAWGxWLRo0SK3F3d1XYQUADjHLWcleXt7q0uXLjIMw3QQ+tfTAIC6pdwRg6euW/g1RgwA\n4LxqfYIbAKBuKzcYVq1aVeGbjx49WqXFAAA8r9xdSQkJCSosLFRcXJyio6PVsmVLGYahU6dOaffu\n3dq8ebOaNGmitWvXurdAdiUBgNPcdtvtI0eOaO3atfr000+Vk5MjSWrdurX69Omj0aNHV8vzmQkG\nAHAez2MAAJi45eAzAODGRDAAAEwIBgCASYXBEB8fr/fff1+lpaXVUQ8AwMMqDIZJkyZp1apVCgwM\n1IwZM3Tw4MHqqAsA4CEOn5V0/vx5rV27Vs8884xuv/12TZw4UQ899JCsVqt7C+SsJABwmtvPSjpz\n5oySk5P12muvKSoqSlOmTFFmZqYGDhzo0kqB2u7s2bN6/PHpGjHiIS1Z8jK7WlGnVDhiuPfee3Xg\nwAElJiZq7Nixatmypf21Ll26KDMz070FMmJADfPDDz+oU6ceysvrpcuXe6lhw38oMbGb/vGPhZ4u\nDbBz6wVuH3zwgYYOHWqaV1RUVG032SMYUNO8++67evjhxSosTJVkkXReXl6/VWHh99x8EjWGW3cl\nzZw5s8y8nj17urQyoC4oLi6WxdJIV0JBkrxlGJLNZvNkWUCVKfdBPadOndLJkyf1448/as+ePfYH\n9Fy4cEGXLl2qzhqBGmXAgAG66aYnVK/ePJWW9pS390LddVecGjZs6OnSgCpR7q6k5ORkvfHGG9q9\ne7eio6Pt85s0aaIxY8YoPj6+egpkVxJqoKNHj2ry5L/oxIk8xcT01Pz5T8vb29vTZQF2bj3GsGHD\nBo0cOdKlxqsCwQAAznNLMLz11ltKTEzU888/f81nPv/xj390rVpnCyQYAMBpldl2lnuM4epxhIsX\nL5qCAQBQt1W4K+nbb7/VrbfeWl31lMGIAQCc59bTVfv06aNBgwZp+fLlOnfunEsrAQDUHhUGw6FD\nh/T000/rq6++UpcuXXT33Xfrrbfeqo7aAAAe4NSjPQsKCvT4449r1apV1XZvGHYlAYDz3Lor6fvv\nv1dycrKGDBminj17qmXLltq1a5dLKwMA1HwVjhjatm2re+65Rw888IB69OhR7WcoMWIAAOe59QK3\nq9cteArBAADOc8t1DFOnTtXChQsVFxd3zRVu3ry5wsZTUlI0bdo02Ww2TZgwQdOnTy+zTFpamh5/\n/HEVFxfL19dXaWlpzvUAAFClyh0xZGZmqkuXLtfcUFssFvXr1++6DdtsNgUHB2vbtm3y8/NT165d\ntWbNGoWGhtqXOX
/+vHr37q2tW7fK399fBQUF8vX1LbMuRgwA4By3HHzu0qWLJOmLL75QTEyM6V9W\nVlaFDWdkZCgwMFBt2rSR1WpVQkKCNm3aZFpm9erVGjlypPz9/SWpTCgAAKpfubuSrnrjjTc0bdo0\n07zk5OQy834tLy9PAQEB9ml/f3/t3LnTtMzhw4dVXFys/v376+LFi5o6daoSExPLtDV79mz7z1fD\nCQDwP2lpaVW2K77cYFizZo1Wr16tY8eOafjw4fb5Fy9eVPPmzSts2JED1sXFxdqzZ49SU1N16dIl\n9ezZUz169FBQUJBpuV8GAwCgrF9/aZ4zZ47LbZUbDL169VLLli313Xff6cknn7Tvq2rSpInCw8Mr\nbNjPz0+5ubn26dzcXPsuo6sCAgLk6+srb29veXt7684779TevXvLBAMAoPo4deWzM0pKShQcHKzU\n1FS1atVK3bp1K3Pw+cCBA3rssce0detWFRUVqXv37lq3bp06dOjwvwI5+AwATnPL6aq9e/fWp59+\nqsaNG5fZLXT1EZ/XbdjLS0uWLFFsbKxsNpvGjx+v0NBQLV26VJKUlJSkkJAQDR48WJ07d1a9evU0\nceJEUygAAKqf20YMVYURAwA4z633Sjp69Kh++uknSdL27du1aNEinT9/3qWVAQBqvgqDIT4+Xl5e\nXjpy5IiSkpKUm5urBx98sDpqAwB4QIXBUK9ePXl5eemdd97R5MmTNX/+fJ06dao6agMAeECFwXDT\nTTdp9erVevPNN3X33XdLunL9AQCgbqowGF5//XXt2LFDM2fOVNu2bZWdna2HHnqoOmoDAHgAZyUB\nQB3klusYrvrkk080Z84cHT9+XCUlJfYVZmdnu7RCAEDNVuGIITg4WC+++KKioqJUv359+/zquhMq\nIwYAcJ5bRwzNmjXTkCFDXGocAFD7VDhimDFjhmw2m+Lj49WgQQP7/KioKLcXJzFiAABXuPWZzzEx\nMde8hfb27dtdWqGzCAYAcJ5bg8HTCAYAcJ5b75V0+vRpjR8/XoMHD5Yk7du3T8uXL3dpZQCAmq/C\nYBgzZowGDRqkkydPSpKCgoL0wgsvuL0wAIBnVBgMBQUFeuCBB+ynqlqtVnl5VXgyEwCglqowGBo3\nbqwzZ87Ypz///HM1bdrUrUUBADynwq/+zz//vIYPH67s7Gz16tVL3333ndavX18dtQEAPMChs5KK\ni4t18OBBSVeuhLZarW4v7CrOSgIA57nlrKSMjAz7cxesVqsyMzP11FNP6YknntDZs2ddqxQAUOOV\nGwxJSUn2K53T09M1Y8YMPfLII7rlllv06KOPVluBAIDqVe4xhtLSUvn4+EiS1q1bp6SkJI0cOVIj\nR45UeHh4tRUIAKhe5Y4YbDab/Ult27ZtU//+/e2vXb39NgCg7il3xDB69Gj169dPvr6+atiwofr2\n7StJOnz4sJo1a1ZtBQIAqtd1z0rasWOHTp8+rUGDBqlRo0aSpEOHDqmwsJC7qwJADcZN9AAAJm69\niR4A4MZCMAAATAgGAIAJwQAAMCEYAAAmBAMAwIRgAACYEAwAABOCAQBgQjAAAEwIBgCACcEAADAh\nGAAAJgQDAMCEYAAAmLg1GFJSUhQSEqKgoCDNmzev3OV27dolLy8vvfPOO+4sBwDgALcFg81m02OP\nPaaUlBTt27dPa9as0f79+6+53PTp0zV48GAeyAMANYDbgiEjI0OBgYFq06aNrFarEhIStGnTpjLL\nLV68WPfdd59atGjhrlIAAE7wclfDeXl5CggIsE/7+/tr586dZZbZtGmT/vOf/2jXrl2yWCzXbGv2\n7Nn2n2NiYhQTE+OOkgGg1kpLS1NaWlqVtOW2YChvI/9L06ZN09y5c+3PJi1vV9IvgwEAUNavvzTP\nmTPH5bbcFgx+fn7Kzc21T+fm5srf39+0TGZmphISEiRJBQUF2rJli6xWq+Li4txV
FgCgAhbDTUd8\nS0pKFBwcrNTUVLVq1UrdunXTmjVrFBoaes3lx44dq+HDhys+Pt5c4M+jCQCA4yqz7XTbiMHLy0tL\nlixRbGysbDabxo8fr9DQUC1dulSSlJSU5K5VAwAqwW0jhqrCiAEAnFeZbSdXPgMATAgGAIAJwQAA\nMCEYAAAmBAMAwIRgAACYEAwAABOCAQBgQjAAAEwIBg85e/asBgy4R1brzfLx8dPates8XRIASOKW\nGB4zcOAIpae31OXL8yXtl7f3cKWn/0vR0dGeLg1AHcAtMWqh9PRtunz5WUmNJXWVzZagjz76yNNl\nAQDB4ClNm/pK+vrnKUNW69fy9fX1ZEkAIIldSR6zceNGPfjgoyotHSUvrwO6447L2rFjmxo0aODp\n0gDUAZXZdhIMHrR3716lpaWpefPmGjVqFKEAoMoQDAAAEw4+AwCqDMEAADAhGAAAJgQDAMCEYAAA\nmBAMAAATggEAYEIwAABMCAYAgAnBAAAwIRgAACYEAwDAhGAAAJgQDAAAE4IBAGBCMAAATAgGAIAJ\nwQAAMCEYAAAmBAMAwIRgAACYEAwAABOCAQBgQjAAAEwIBg9LS0vzdAluVZf7V5f7JtG/G5lbgyEl\nJUUhISEKCgrSvHnzyry+atUqhYeHq3Pnzurdu7e+/PJLd5ZTI9X1P8663L+63DeJ/t3IvNzVsM1m\n02OPPaZt27bJz89PXbt2VVxcnEJDQ+3LtGvXTunp6WratKlSUlL06KOP6vPPP3dXSQAAB7htxJCR\nkaHAwEC1adNGVqtVCQkJ2rRpk2mZnj17qmnTppKk7t2765tvvnFXOQAAB1kMwzDc0fD69eu1detW\nvfrqq5KklStXaufOnVq8ePE1l1+wYIEOHTqkZcuWmQu0WNxRHgDUea5u3t22K8mZDfr27dv1+uuv\n69NPPy3zmptyCwBQDrcFg5+fn3Jzc+3Tubm58vf3L7Pcl19+qYkTJyolJUW/+c1v3FUOAMBBbjvG\nEB0drcOHD+v48eO6fPmy1q1bp7i4ONMyJ06cUHx8vFauXKnAwEB3lQIAcILbRgxeXl5asmSJYmNj\nZbPZNH78eIWGhmrp0qWSpKSkJP3tb3/TuXPnNGnSJEmS1WpVRkaGu0oCADjCqCG2bNliBAcHG4GB\ngcbcuXPLvL5//36jR48eRoMGDYwFCxZ4oMLKqah/K1euNDp37mx06tTJ6NWrl7F3714PVOm6ivq3\nceNGo3PnzkZERIQRFRVlpKameqBK11TUt6syMjKM+vXrGxs2bKjG6iqvov5t377duOWWW4yIiAgj\nIiLCePrppz1Qpesc+fy2b99uREREGB07djT69etXvQVWUkX9mz9/vv2zCwsLM+rXr2+cO3fuum3W\niGAoKSkx2rdvbxw7dsy4fPmyER4ebuzbt8+0zLfffmvs2rXLmDlzZq0LBkf699lnnxnnz583DOPK\nB929e3dPlOoSR/pXWFho//nLL7802rdvX91lusSRvl1drn///sawYcOM9evXe6BS1zjSv+3btxvD\nhw/3UIWV40j/zp07Z3To0MHIzc01DMMwvvvuO0+U6hJH/z6veu+994wBAwZU2G6NuCWGI9c8tGjR\nQtHR0bJarR6q0nV1/ZoOR/rXqFEj+8+FhYXy9fWt7jJd4kjfJGnx4sW677771KJFCw9U6TpH+2fU\n0rMDHenf6tWrNXLkSPvJMbXlb1Ny/PO7avXq1Ro9enSF7daIYMjLy1NAQIB92t/fX3l5eR6sqGo5\n27/ly5dr6NCh1VFalXC0fxs3blRoaKiGDBmiRYsWVWeJLnOkb3l5edq0aZP9WFltuvbGkf5ZLBZ9\n9tlnCg8P19ChQ7Vv377qLtNljvTv8OHDOnv2rPr376/o6Gi99dZb1V2my5zZtly6dElbt27VyJEj\nK2zXbQefnVGb/iO5oqqu6aipHO3fiBEjNGLE
CH388cdKTEzUwYMH3VxZ5TnSt2nTpmnu3LmyWCwy\nruyerYbKqoYj/YuKilJubq4aNmyoLVu2aMSIETp06FA1VFd5jvSvuLhYe/bsUWpqqi5duqSePXuq\nR48eCgoKqoYKK8eZbct7772nPn36qFmzZhUuWyOCwdFrHmqrun5Nh7OfX9++fVVSUqIzZ86oefPm\n1VGiyxzpW2ZmphISEiRJBQUF2rJli6xWa5nTs2siR/rXpEkT+89DhgzRH/7wB509e1Y+Pj7VVqer\nHOlfQECAfH195e3tLW9vb915553au3dvrQgGZ/7vrV271qHdSJJqxllJxcXFRrt27Yxjx44ZRUVF\n1z2AMmvWrFp38NmR/uXk5Bjt27c3duzY4aEqXedI/44cOWKUlpYahmEYmZmZRrt27TxRqtOc+ds0\nDMMYM2ZMrToryZH+nT592v7Z7dy502jdurUHKnWNI/3bv3+/MWDAAKOkpMT44YcfjLCwMOPrr7/2\nUMXOcfTv8/z584aPj49x6dIlh9qtESMGR655OH36tLp27aoLFy6oXr16Wrhwofbt26fGjRt7uPqK\n1fVrOhzp34YNG/Tmm2/KarWqcePGWrt2rYerdowjfavNHOnf+vXr9corr8jLy0sNGzasNZ+d5Fj/\nQkJCNHjwYHXu3Fn16tXTxIkT1aFDBw9X7hhH/z43btyo2NhYeXt7O9Su226iBwConWrEWUkAgJqD\nYAAAmBAMAAATggEAYEIwwOPGjRun2267TZ06dXLp/f/6178UFRWliIgIdezYscxTACtr1qxZSk1N\nlSR9/PHH6tixo6KionTy5EmNGjXquu+dOHGiDhw4IEn6+9//7vS6i4qK1K9fPxmGoePHj8vb21uR\nkZGKjIxUVFSUiouLlZycrBYtWigyMlIdO3bUa6+9Jkmm+R06dNDLL79sb3fRokW16gpfVLMqPakW\ncEF6erqxZ88eIywszOn3Xr582WjVqpWRl5dnnz548GBVl2iXlJRkrFy50qX3Nm7c2On3LF++3Hju\nuecMwzCMY8eOXfN3lJycbEyePNkwjCs3m2zRooWRn59vmn/mzBnj1ltvNfLz8w3DMIwLFy4YXbt2\ndakfqPsYMcDj+vbt6/KV3hcvXlRJSYn9Klyr1ao77rhDkjRmzBj9/ve/V9euXRUcHKz3339fkmSz\n2fSnP/1J3bp1U3h4uGmEMW/ePHXu3FkRERF66qmn7O1s2LBBy5cv19tvv62//vWvSkxMVE5OjsLC\nwuxtPvnkk+rUqZPCw8P10ksvSZJiYmKUmZmpGTNm6Mcff1RkZKQeeughzZo1SwsXLrSvd+bMmde8\nf9SaNWt0zz33VPh7MH4+67xFixZq3769cnJyTPN9fHzUrl07+/wmTZqoefPm+vrrrx39VeMGUiMu\ncANc5ePjo7i4OLVu3VoDBgzQ3XffrdGjR8tischisejEiRPatWuXjhw5ov79++vIkSN644031KxZ\nM2VkZKioqEh9+vTRoEGDtH//fm3evFkZGRm6+eabdf78eUmytzV+/Hh98sknGj58uOLj43X8+HH7\nvWqWLVumEydOaO/evapXr57OnTtneu/cuXP10ksvKSsrS5KUk5Oj+Ph4TZ06VaWlpVq3bp127dpl\n6pvNZtNXX31lDzpJOnr0qCIjIyVJffr00eLFi033ZsrOzlZ2draCgoJMG/2cnBxlZ2erffv29nnd\nunVTenq6OnbsWJUfCeoAggG13quvvqqpU6dq27ZtWrBggT788EOtWLFCknT//fdLkgIDA9WuXTsd\nOHBA//73v/Xf//5X69evlyRduHBBhw8fVmpqqsaNG6ebb75Zksq92ZhxjWtCU1NTNWnSJNWrd2UQ\nXtEIqHXr1mrevLm++OILnT59WlFRUWXeU1BQYLpPkSS1b9/eHi6/tG7dOn3yySdq0KCBli1bZq99\n3bp1Sk9P
14EDB7RgwQLT/Y1atWql7Ozs69aJGxPBgBrPZrMpOjpaknTPPfdo9uzZZZYJCwtTWFiY\nEhMT1bZtW3sw/NrVb/hLlizRwIEDTa9t3bq1UndGdfa9EyZM0IoVK5Sfn69x48a53KbFYlFCQkKZ\nXVG/nJ+Zman7779fY8eOtd9GxjCMOn9nY7iGYwyo8erXr6+srCxlZWWVCYUffvhBaWlp9umsrCy1\nadNG0pUN39tvvy3DMHT06FFlZ2crJCREsbGxevnll1VSUiJJOnTokC5duqSBAwdqxYoV+vHHHyXJ\nvjvIEQMHDtTSpUtls9nKfa/VarWvU5LuvfdepaSkaPfu3YqNjS2zvK+vrwoLCytct1HOrb5/Ob9L\nly4aPny4KTxOnTpl/10Bv0QwwONGjx6tXr166dChQwoICCj32/61GIah+fPnKyQkRJGRkZozZ46S\nk5MlXfnGfPvtt6tbt24aOnSoli5dqptuukkTJkxQhw4dFBUVpU6dOmnSpEmy2WyKjY1VXFycoqOj\nFRkZqeeff/6a6/zlt+yrP0+YMEG33367/cD1mjVryrzv0UcfVefOnZWYmCjpSlDcdddduv/++6/5\nzb1+/foKCwszPbfiWstdPY5R0fzp06frlVde0aVLlyRdefpX3759r9lH3Ni4iR7qrLFjx9oPFNdE\npaWl6tKli9avX286KPxLycnJys/P1/Tp06t03RcuXNCAAQPKHPAGJEYMgEfs27dPQUFB+t3vfldu\nKEjSgw8+qPfff7/KnwqXnJysqVOnVmmbqDsYMQAATBgxAABMCAYAgAnBAAAwIRgAACYEAwDAhGAA\nAJj8f/hiakEL/3SjAAAAAElFTkSuQmCC\n",
"text": [
"<matplotlib.figure.Figure at 0x111da0150>"
]
}
],
"prompt_number": 84
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A complete ROC plot is simply a curve connecting the dots for *every* possible critical value, i.e. placing one threshold at a time between the two smallest p-values, then between the next two, and so forth:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import sklearn.metrics\n",
"\n",
"# We pass 1 - p rather than the p-values themselves, since this particular\n",
"# built-in function expects higher scores to mean greater confidence.\n",
"aFPR, aTPR, aThresh = sklearn.metrics.roc_curve( aGoldStandard, 1 - array( aPredictionsPValues ) )\n",
"print( aThresh )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[ 0.99 0.98 0.97 0.96 0.5 0.4 0.2 0.1 ]\n"
]
}
],
"prompt_number": 85
},
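{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see where those thresholds come from, here is a minimal by-hand sketch of the same sweep. The data are made up for illustration, and the names `aExampleGold`, `aExamplePValues`, and `ROCPoints` are hypothetical rather than part of the running example. It assumes at least one positive and one negative in the gold standard, and it calls a prediction positive when its p-value is at or below the current threshold:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Hypothetical data for illustration only: True marks a real positive,\n",
"# and smaller p-values mean more confident positive predictions.\n",
"aExampleGold = [True, True, False, True, False, False]\n",
"aExamplePValues = [0.01, 0.02, 0.03, 0.2, 0.5, 0.9]\n",
"\n",
"def ROCPoints( aGold, aPValues ):\n",
"    # One (FPR, TPR) pair per observed p-value, starting from (0, 0).\n",
"    iPositives = sum( 1 for fGold in aGold if fGold )\n",
"    iNegatives = len( aGold ) - iPositives\n",
"    aPoints = [( 0.0, 0.0 )]\n",
"    for dThreshold in sorted( set( aPValues ) ):\n",
"        # Everything at or below the threshold is called positive.\n",
"        aCalled = [fGold for fGold, dP in zip( aGold, aPValues ) if dP <= dThreshold]\n",
"        iTP = sum( 1 for fGold in aCalled if fGold )\n",
"        iFP = len( aCalled ) - iTP\n",
"        aPoints.append( ( float( iFP ) / iNegatives, float( iTP ) / iPositives ) )\n",
"    return aPoints\n",
"\n",
"ROCPoints( aExampleGold, aExamplePValues )"
],
"language": "python",
"metadata": {},
"outputs": []
},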
{
"cell_type": "code",
"collapsed": false,
"input": [
"plot( aFPR, aTPR )\n",
"xlabel( \"1 - Specificity (FPR)\" )\n",
"ylabel( \"Sensitivity (TPR)\" )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 86,
"text": [
"<matplotlib.text.Text at 0x1120be490>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEMCAYAAADAqxFbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X1UVHX+B/D3xEyFwk/Dh1ZmiOeFQXQYBJ+CHCtFLcfU\nHrATJSKxnqNhW7uyevYIux1XXDuVUh1sTdIS2bQWSsVdyMlMZUyJLR8QJWhEJTERieJhuL8/XCev\nzAwjcGcA369zOIc78+XeD1/xvud77/feKxMEQQAREdH/3OHqAoiIqHdhMBARkQiDgYiIRBgMREQk\nwmAgIiIRBgMREYlIFgwLFizAvffei1GjRtls88ILLyA4OBgajQalpaVSlUJERLdAsmBITExEYWGh\nzfd37dqF06dPo6KiAhs2bMCiRYukKoWIiG6BZMEQGxuLe+65x+b7BQUFeO655wAA48aNQ319PWpr\na6Uqh4iIHCR31YZramrg4+NjWVapVDh79izuvfdeUTuZTObs0oiI+oWu3tjCpSefby7aVggIgsAv\nQcDKlStdXkNv+WJf/PoFsC+uf/Hv4tev7nBZMCiVSphMJsvy2bNnoVQqXVUOERH9j8uCQa/XY/Pm\nzQCAQ4cOYfDgwR0OIxERkfNJdo5h3rx5+Pzzz1FXVwcfHx9kZGSgtbUVAJCSkoIZM2Zg165dCAoK\nwsCBA7Fp0yapSuk3dDqdq0voNdgXN9K5uoBeg38XPUMmdPdglMRkMlm3j5cR9WcyGcD/InSz7uw7\neeUzERGJMBiIiEiEwUBERCIMBiIiEmEwEBGRCIOBiIhEGAxERCTCYCAiIhEGAxERiTAYiIhIhMFA\nREQiDAYiIhJhMBARkQiDgYiIRBgMREQkwmAgIiIRBgMREYkwGIiISITBQEREIgwGIiISYTAQEZEI\ng4GIiEQYDEREJCJpMBQWFiI0NBTBwcHIzMzs8P7ly5cxe/ZsaDQajBs3DseOHZOyHCIicoBkwWA2\nm7F48WIUFhbi+PHjyM3NxYkTJ0RtVq1ahcjISJSVlWHz5s1ITU2VqhwiInKQXKoVG41GBAUFwc/P\nDwAQHx+P/Px8qNVqS5sTJ04gLS0NABASEoKqqipcvHgRw4YNE60rPT3d8r1Op4NOp5OqbCKiPslg\nMMBgMPTIuiQLhpqaGvj4+FiWVSoVSkpKRG00Gg0++ugjxMTEwGg0orq6GmfPnrUbDERE1NHNH5oz\nMjK6vC7JDiXJZLJO26SlpaG+vh5arRZZWVnQarVwc3OTqiQiInKAZCMGpVIJk8lkWTaZTFCpVKI2\nnp6eePfddy3L/v7+CAgIkKokIiJygGQjhqioKFRUVKCqqgotLS3Iy8uDXq8Xtbly5QpaWloAAO+8\n8w4mTZoEDw8PqUoiIiIHSDZikMvlyMrKQlxcHMxmM5KSkqBWq5GdnQ0ASElJwfHjxzF//nzIZDKE\nh4dj48aNUpVDREQOkgmCILi6CHtkMhl6eYlELiWTAfwvQjfrzr6TVz4TEZEIg4GIiEQYDEREJMJg\nICIiEQYDERGJMBiIiEiEwUBERCIMBiIiEmEwEBGRCIOBiIhEGAxERCTCYCAiIhEGAxERiTAYiIhI\nhMFAREQiDAYiIhJhMBARkQiDgYiIRBgMREQkwmAgIiIRBgMREYkwGIiISETSYCgsLERoaCiCg4OR\nmZnZ4f26ujpMmzYNERERCA8PR05OjpTlEBGRA2SCIAhSrNhsNiMkJARFRUVQKpWIjo5Gbm4u1Gq1\npU16ejqam5vxt7/9DXV1dQgJCUFtbS3kcvmvBcpkkKhEon5BJgP4X4Ru1p19p2QjBqPRiKCgIPj5\n+UGhUCA+Ph75+fmiNiNGjEBDQwMAoKGhAUOGDBGFAhEROZ9ke+Gamhr4+PhYllUqFUpKSkRtkpOT\n8eCDD8Lb2xtXr17FP//5T6vrSk9Pt3yv0+mg0+mkKJ
mIqM8yGAwwGAw9si7JgkEmk3XaZtWqVYiI\niIDBYMCZM2cwZcoUlJWVwdPTU9TuxmAgIqKObv7QnJGR0eV1SXYoSalUwmQyWZZNJhNUKpWozYED\nB/DEE08AAAIDA+Hv74/y8nKpSiIiIgdIFgxRUVGoqKhAVVUVWlpakJeXB71eL2oTGhqKoqIiAEBt\nbS3Ky8sREBAgVUlEROQAyQ4lyeVyZGVlIS4uDmazGUlJSVCr1cjOzgYApKSkYPny5UhMTIRGo0F7\nezvWrFkDLy8vqUoiIiIHSDZdtadwuiqRfZyuStb0yumqRETUNzEYiIhIhMFAREQiDgfDL7/8gubm\nZilrISKiXsBmMLS3t+Ojjz7CE088AaVSCX9/f/j6+kKpVOLxxx/Hxx9/zJPCRET9kM1ZSQ888ABi\nY2Oh1+sRERGBu+66CwDQ3NyM0tJSFBQUYP/+/di3b5+0BXJWEpFdnJVE1nRn32kzGJqbmy1hYIsj\nbbqLwUBkH4OBrJFkuqqtHX5jY6Pl2QpShwIRETmfzWA4d+4clixZghkzZuCPf/wjGhsb8dprryE0\nNBQ1NTXOrJGIiJzI5i0xnn32WcTExOCRRx5BYWEhwsPDMX78eHz11Vf4zW9+48waiYjIiWyeY4iI\niMDXX39tWVapVKiuroabm5vTigN4joGoMzzHQNZ0Z99pc8TQ3t6OH3/8EQAgCAK8vLxw5coVy/u8\n2R0RUf9kc8Tg5+dn82E7MpkMlZWVkhZ247Y4YiCyjSMGskaS6aq9BYOByD4GA1kjyXTV2tpapKam\n4pFHHsGf/vQnNDQ0dLlAIiLqO2wGw7PPPgsPDw8sWbIEV69exQsvvODMuoiIyEVsHkrSaDQoKyuz\nLGu1WpSWljqtsOt4KInIPh5KImskmZUkCIJoVpLZbLYsA5yVRETUX3FWElEfxxEDWSPJrKTq6mr4\n+vp2q7CewGAgso/BQNZIMitp9uzZXS6IiIj6LpvBwE/pRES3J5uHkoYPH474+HirASGTybBu3TrJ\ni7u+LYYUkW08lETWSDIryd3dHWPGjIEgCKKT0Dcv21NYWIilS5fCbDZj4cKFWLZsmej9tWvX4oMP\nPgAAtLW14cSJE6irq8PgwYO78rsQEVEPsDli6O51C2azGSEhISgqKoJSqUR0dDRyc3OhVquttv/0\n00/x+uuvo6ioSFwgRwxEdnHEQNY49QlujjIajQgKCoKfnx8UCgXi4+ORn59vs/3WrVsxb968bm2T\niIi6z+ahpOuHeOw5c+YMAgMDrb5XU1MDHx8fy7JKpUJJSYnVtk1NTdizZw/eeustq++np6dbvtfp\ndNDpdJ3WRkR0OzEYDDAYDD2yLpvBsGLFCjQ2NkKv1yMqKgojRoyAIAg4f/48vvrqKxQUFMDT0xPb\ntm2z+vOOnocAgE8++QQxMTE2zy3cGAxERNTRzR+aMzIyurwum8Gwbds2nD59Gtu2bcOKFStQXV0N\nAPD19UVMTAzWr1+PgIAAmytWKpUwmUyWZZPJBJVKZXNbPIxERNQ7SPY8hra2NoSEhKC4uBje3t4Y\nO3as1ZPPV65cQUBAAM6ePQt3d/eOBfLkM5FdPPlM1kgyXbW75HI5srKyEBcXB7PZjKSkJKjVamRn\nZwMAUlJSAAD/+te/EBcXZzUUiIjI+fgEN6I+jiMGskaS6apERHR76jQY5syZg507d6K9vd0Z9RAR\nkYt1GgyLFi3CBx98gKCgIKSlpaG8vNwZdRERkYs4fI6hvr4e27ZtwyuvvIL77rsPycnJeOaZZ6BQ\nKKQtkOcYiOziOQayRpIH9dzo0qVL2LJlC95//314e3vj6aefxv79+/Htt9/22JV2NgtkMBDZxWAg\naySdrjp79mycPHkSCQkJ+OSTTzBixAgAQHx8PMaMGdOljRJ1l5cXcPmyq6voHe65x9UVUH/T6Yhh\n165dmDFjhui15u
bmbt9kz1EcMZA1/JRMZJ+k01VXrFjR4bUJEyZ0aWNERNT72TyUdP78eZw7dw4/\n//wzjh49anlAT0NDA5qampxZIxEROZHNYNizZw/ee+891NTU4KWXXrK87unpiVWrVjmlOCIicr5O\nzzHs2LEDc+fOdVY9HfAcA1nDcwxE9kkyXXXLli1ISEjAq6++avWZz7///e+7Vu2tFshgICsYDET2\nSTJd9fp5hKtXr97SQ3eIiKhv6/RQ0g8//IDhw4c7q54OOGIgazhiILJP0umqMTExmDp1KjZu3IjL\nvKKIiKjf6zQYTp06hb/+9a/49ttvMWbMGDz66KPYsmWLM2ojIiIXuKUH9dTV1eHFF1/EBx984LTb\ncPNQElnDQ0lE9kl6KOnKlSvIycnB9OnTMWHCBIwYMQKHDx/u0saIiKj363TE4O/vj1mzZuGpp57C\n+PHjnT5DiSMGsoYjBiL7JL3t9vXrFlyFwUDWMBiI7JPkOobU1FS88cYb0Ov1VjdYUFDQpQ0SEVHv\nZjMYnn32WQAQ3SfpOl7wRkTUf9kMhusP4fn666+xdOlS0Xuvv/46Jk2aJG1lRETkEp3OSnrvvfc6\nvJaTk+PQygsLCxEaGorg4GBkZmZabWMwGKDVahEeHg6dTufQeomISDo2Tz7n5uZi69at+OKLLxAb\nG2t5/erVq3Bzc0NxcbHdFZvNZoSEhKCoqAhKpRLR0dHIzc2FWq22tKmvr8f999+PPXv2QKVSoa6u\nDkOHDhUXyJPPZAVPPhPZJ8nJ54kTJ2LEiBG4ePEiXn75ZcsGPD09odFoOl2x0WhEUFAQ/Pz8AFx7\nRnR+fr4oGLZu3Yq5c+dCpVIBQIdQICIi57MZDL6+vvD19cWhQ4e6tOKamhr4+PhYllUqFUpKSkRt\nKioq0NraismTJ+Pq1atITU1FQkJCh3Wlp6dbvtfpdDzkRER0E4PBAIPB0CPrshkM999/P7788kt4\neHh0mIV0/RGf9jgyc6m1tRVHjx5FcXExmpqaMGHCBIwfPx7BwcGidjcGAxERdXTzh+aMjIwur8tm\nMHz55ZcAgMbGxi6tWKlUwmQyWZZNJpPlkNF1Pj4+GDp0KNzd3eHu7o4HHngAZWVlHYKBiIicp9NZ\nSWfOnMEvv/wCANi7dy/WrVuH+vr6TlccFRWFiooKVFVVoaWlBXl5eR0ulps1axb2798Ps9mMpqYm\nlJSUICwsrIu/ChER9YROg2HOnDmQy+U4ffo0UlJSYDKZ8PTTT3e6YrlcjqysLMTFxSEsLAxPPfUU\n1Go1srOzkZ2dDQAIDQ3FtGnTMHr0aIwbNw7JyckMBiIiF+v0XklarRalpaVYs2YN3N3dsWTJEstr\nTimQ01XJCk5XJbJP0ttu33nnndi6dSs2b96MRx99FMC1k8ZERNQ/dRoM7777Lg4ePIgVK1bA398f\nlZWVeOaZZ5xRGxERucAtPcHNFXgoiazhoSQi+yS58vm6/fv3IyMjA1VVVWhra7NssLKysksbJCKi\n3q3TEUNISAhef/11REZGws3NzfK6s25fwREDWcMRA5F9ko4YBg8ejOnTp3dp5URE1Pd0OmJIS0uD\n2WzGnDlzcNddd1lej4yMlLw4gCMGso4jBiL7JH3ms06ns3rfo71793Zpg7eKwUDWMBiI7JM0GFyN\nwUDWMBiI7JP0ArcLFy4gKSkJ06ZNAwAcP34cGzdu7NLGiIio9+s0GObPn4+pU6fi3LlzAIDg4GC8\n9tprkhdGRESu0Wkw1NXV4amnnrJMVVUoFJDLO53MREREfVSnweDh4YFLly5Zlg8dOoRBgwZJWhQR\nEblOpx/9X331VcycOROVlZWYOHEiLl68iO3btzujNiIicgGHZiW1traivLwcwLUroRUKheSFXcdZ\nSWQNZyUR2SfJrCSj0Yjz588DuHZe4ciRI1i+fDleeukl/Pjjj12rlIiIej2bwZCS
kmK50nnfvn1I\nS0vDc889h//7v//D888/77QCiYjIuWyeY2hvb4eXlxcAIC8vDykpKZg7dy7mzp0LjUbjtAKJiMi5\nbI4YzGaz5UltRUVFmDx5suW967ffJiKi/sfmiGHevHmYNGkShg4digEDBiA2NhYAUFFRgcGDBzut\nQCIici67s5IOHjyICxcuYOrUqRg4cCAA4NSpU2hsbOTdVcmlOCuJyD7eRI9uOwwGIvskvYkeERHd\nXiQNhsLCQoSGhiI4OBiZmZkd3jcYDBg0aBC0Wi20Wi1eeeUVKcshIiIHSHY3PLPZjMWLF6OoqAhK\npRLR0dHQ6/VQq9WidpMmTUJBQYFUZRAR0S2SbMRgNBoRFBQEPz8/KBQKxMfHIz8/v0M7nj8gIupd\nJBsx1NTUwMfHx7KsUqlQUlIiaiOTyXDgwAFoNBoolUqsXbsWYWFhHdaVnp5u+V6n00Gn00lVNhFR\nn2QwGGAwGHpkXZIFg7XnRN8sMjISJpMJAwYMwO7du/HYY4/h1KlTHdrdGAxERNTRzR+aMzIyurwu\nyQ4lKZVKmEwmy7LJZIJKpRK18fT0xIABAwAA06dPR2trK2/QR0TkYpIFQ1RUFCoqKlBVVYWWlhbk\n5eVBr9eL2tTW1lrOMRiNRgiCYLk/ExERuYZkh5LkcjmysrIQFxcHs9mMpKQkqNVqZGdnA7h299bt\n27fj7bffhlwux4ABA7Bt2zapyiEiIgfxymfqk3jlM5F9vPKZiIh6DIOBiIhEGAxERCTCYCAiIhEG\nAxERiTAYiIhIhMFAREQiDAYiIhJhMBARkQiDgYiIRBgMREQkwmAgIiIRBgMREYkwGIiISITBQERE\nIgwGIiISYTAQEZEIg4GIiEQYDEREJMJgICIiEQYDERGJMBiIiEiEwUBERCKSBkNhYSFCQ0MRHByM\nzMxMm+0OHz4MuVyOjz76SMpyiIjIAZIFg9lsxuLFi1FYWIjjx48jNzcXJ06csNpu2bJlmDZtGgRB\nkKocIiJykGTBYDQaERQUBD8/PygUCsTHxyM/P79Du/Xr1+Pxxx/HsGHDpCqFiIhugVyqFdfU1MDH\nx8eyrFKpUFJS0qFNfn4+PvvsMxw+fBgymczqutLT0y3f63Q66HQ6KUomIuqzDAYDDAZDj6xLsmCw\ntZO/0dKlS7F69WrIZDIIgmDzUNKNwUBERB3d/KE5IyOjy+uSLBiUSiVMJpNl2WQyQaVSidocOXIE\n8fHxAIC6ujrs3r0bCoUCer1eqrKIiKgTMkGiM75tbW0ICQlBcXExvL29MXbsWOTm5kKtVlttn5iY\niJkzZ2LOnDniAv83miC6kUwG8M+CyLbu7DslGzHI5XJkZWUhLi4OZrMZSUlJUKvVyM7OBgCkpKRI\ntWkiIuoGyUYMPYUjBrKGIwYi+7qz7+SVz0REJMJgICIiEQYDERGJMBiIiEiEwUBERCIMBiIiEmEw\nEBGRCIOBiIhEGAxERCTCYCAiIhEGAxERiTAYiIhIhMFAREQiDAYiIhJhMBARkYhkD+rpSQ48Pppu\nM/fc4+oKiPqvPhEMfCALEZHz8FASERGJMBiIiEiEwUBERCIMBiIiEmEwEBGRCIOhDzEYDK4uoddg\nX/yKffEr9kXPkDQYCgsLERoaiuDgYGRmZnZ4Pz8/HxqNBlqtFmPGjMFnn30mZTl9Hv/of8W++BX7\n4lfsi54h2XUMZrMZixcvRlFREZRKJaKjo6HX66FWqy1tHn74YcyaNQsA8M0332D27Nk4ffq0VCUR\nEZEDJBsxGI1GBAUFwc/PDwqFAvHx8cjPzxe1GThwoOX7xsZGDB06VKpyiIjIUYJEPvzwQ2HhwoWW\n5S1btgiLFy/u0O7jjz8WQkNDhUGDBgklJSUd3gfAL37xi1/86sJXV0l2KEnm4A2OHnvsMTz22GP4\n4osvkJCQgPLyctH7Au+HQUTkVJIdSlIqlTCZ
TJZlk8kElUpls31sbCza2tpw6dIlqUoiIiIHSBYM\nUVFRqKioQFVVFVpaWpCXlwe9Xi9qc+bMGcuI4OjRowCAIUOGSFUSERE5QLJDSXK5HFlZWYiLi4PZ\nbEZSUhLUajWys7MBACkpKdixYwc2b94MhUIBDw8PbNu2TapyiIjIUV0+O9HDdu/eLYSEhAhBQUHC\n6tWrrbZZsmSJEBQUJIwePVo4evSokyt0ns764v333xdGjx4tjBo1Spg4caJQVlbmgiqdw5G/C0EQ\nBKPRKLi5uQk7duxwYnXO5Uhf7N27V4iIiBBGjhwpTJo0ybkFOlFnfXHx4kUhLi5O0Gg0wsiRI4VN\nmzY5v0gnSExMFIYPHy6Eh4fbbNOV/WavCIa2tjYhMDBQ+O6774SWlhZBo9EIx48fF7XZuXOnMH36\ndEEQBOHQoUPCuHHjXFGq5BzpiwMHDgj19fWCIFz7D3I798X1dpMnTxYeeeQRYfv27S6oVHqO9MXl\ny5eFsLAwwWQyCYJwbefYHznSFytXrhTS0tIEQbjWD15eXkJra6srypXUvn37hKNHj9oMhq7uN3vF\nLTEcueahoKAAzz33HABg3LhxqK+vR21trSvKlZQjfTFhwgQMGjQIwLW+OHv2rCtKlZwjfQEA69ev\nx+OPP45hw4a5oErncKQvtm7dirlz51omefTX64Ic6YsRI0agoaEBANDQ0IAhQ4ZALu8TzyW7JbGx\nsbjHzuMMu7rf7BXBUFNTAx8fH8uySqVCTU1Np2364w7Rkb640caNGzFjxgxnlOZ0jv5d5OfnY9Gi\nRQAcnybd1zjSFxUVFfjxxx8xefJkREVFYcuWLc4u0ykc6Yvk5GQcO3YM3t7e0Gg0eOONN5xdZq/Q\n1f1mr4hQR/8zCzdd09AfdwK38jvt3bsX7777Lr788ksJK3IdR/pi6dKlWL16NWQyGYRrh0adUJnz\nOdIXra2tOHr0KIqLi9HU1IQJEyZg/PjxCA4OdkKFzuNIX6xatQoREREwGAw4c+YMpkyZgrKyMnh6\nejqhwt6lK/vNXhEMjlzzcHObs2fPQqlUOq1GZ3H0+o///ve/SE5ORmFhod2hZF/mSF8cOXIE8fHx\nAIC6ujrs3r0bCoWiw9Tovs6RvvDx8cHQoUPh7u4Od3d3PPDAAygrK+t3weBIXxw4cAArVqwAAAQG\nBsLf3x/l5eWIiopyaq2u1uX9Zo+cAemm1tZWISAgQPjuu++E5ubmTk8+Hzx4sN+ecHWkL6qrq4XA\nwEDh4MGDLqrSORzpixvNnz+/385KcqQvTpw4ITz00ENCW1ub8NNPPwnh4eHCsWPHXFSxdBzpixdf\nfFFIT08XBEEQLly4ICiVSuHSpUuuKFdy3333nUMnn29lv9krRgyOXPMwY8YM7Nq1C0FBQRg4cCA2\nbdrk4qql4Uhf/OUvf8Hly5ctx9UVCgWMRqMry5aEI31xu3CkL0JDQzFt2jSMHj0ad9xxB5KTkxEW\nFubiynueI32xfPlyJCYmQqPRoL29HWvWrIGXl5eLK+958+bNw+eff466ujr4+PggIyMDra2tALq3\n35QJQj89KEtERF3SK2YlERFR78FgICIiEQYDERGJMBiIiEiEwUAut2DBAtx7770YNWpUl37+008/\nRWRkJCIiIjBy5Ehs2LChR+tbuXIliouLAQBffPEFRo4cicjISJw7dw5PPPGE3Z9NTk7GyZMnAVy7\n6OpWNTc3Y9KkSRAEAVVVVXB3d4dWq4VWq0VkZCRaW1uRk5ODYcOGQavVYuTIkfjHP/4BAKLXw8LC\n8NZbb1nWu27dun57ZTT1gJ6ZSUvUdZ3dCMyelpYWwdvbW6ipqbEsl5eX93SJFikpKcL777/fpZ/1\n8PC45Z/ZuHGjsGbNGkEQbM9Xz8nJEZYsWSIIgiD88MMPwrBhw4Ta2lrR65cuXRKGDx8u1NbWCoIg\nCA0NDUJ0
dHSXfg/q/zhiIJfr7EZg9ly9ehVtbW2WOeoKhQK//e1vAQDz58/H7373O0RHRyMkJAQ7\nd+4EAJjNZvzhD3/A2LFjodFoRCOMzMxMjB49GhEREVi+fLllPTt27MDGjRvx4Ycf4s9//jMSEhJQ\nXV2N8PBwyzpffvlljBo1ChqNBm+++SYAQKfT4ciRI0hLS8PPP/8MrVaLZ555BitXrhTdv2fFihVY\nt25dh98vNzcXs2bN6rQfhP/NOh82bBgCAwNRXV0tet3LywsBAQGW1z09PTFkyBAcO3bM0a6m20iv\nuMCNqKu8vLyg1+vh6+uLhx56CI8++ijmzZsHmUwGmUyG77//HocPH8bp06cxefJknD59Gu+99x4G\nDx4Mo9GI5uZmxMTEYOrUqThx4gQKCgpgNBpx9913o76+HgAs60pKSsL+/fsxc+ZMzJkzB1VVVZb7\nzmzYsAHff/89ysrKcMcdd+Dy5cuin129ejXefPNNlJaWAgCqq6sxZ84cpKamor29HXl5eTh8+LDo\ndzObzfj2228tQQdce+qhVqsFAMTExGD9+vWie+FUVlaisrISwcHBop1+dXU1KisrERgYaHlt7Nix\n2LdvH0aOHNmT/yTUDzAYqM975513kJqaiqKiIqxduxb/+c9/LFd4PvnkkwCAoKAgBAQE4OTJk/j3\nv/+Nb775Btu3bwdw7bbMFRUVKC4uxoIFC3D33XcDAAYPHmx1e4KVa0KLi4uxaNEi3HHHtUF4ZyMg\nX19fDBkyBF9//TUuXLiAyMjIDj9TV1fX4aZvgYGBlnC5UV5eHvbv34+77roLGzZssNSel5eHffv2\n4eTJk1i7dq3o6l9vb29UVlbarZNuTwwG6vXMZrPl5mezZs1Cenp6hzbh4eEIDw9HQkIC/P39bV76\nf/0TflZWFqZMmSJ6b8+ePd26O+ut/uzChQuxadMm1NbWYsGCBV1ep0wmQ3x8fIdDUTe+fuTIETz5\n5JNITEyEh4eHZd398Q7F1H08x0C9npubG0pLS1FaWtohFH766ScYDAbLcmlpKfz8/ABc2/F9+OGH\nEAQBZ86cQWVlJUJDQxEXF4e33noLbW1tAIBTp06hqakJU6ZMwaZNm/Dzzz8DgOVwkCOmTJmC7Oxs\nmM1mmz+rUCgs2wSA2bNno7CwEF999RXi4uI6tB86dCgaGxs73bZg43bjN74+ZswYzJw5UxQe58+f\nt/QV0Y0YDORy8+bNw8SJE3Hq1Cn4+Pjc0g0SBUHA3//+d4SGhkKr1SIjIwM5OTkArn1ivu+++zB2\n7FjMmDHwtbw7AAABFUlEQVQD2dnZuPPOO7Fw4UKEhYUhMjISo0aNwqJFi2A2mxEXFwe9Xo+oqCho\ntVq8+uqrVrd546fs698vXLgQ9913n+XEdW5uboefe/755zF69GgkJCQAuBYUDz74IJ588kmrn9zd\n3NwQHh6O8vJyq9u+8TVHXl+2bBnefvttNDU1Abj2JLTY2FirvyPd3ngTPeq3EhMTLSeKe6P29naM\nGTMG27dvF50UvlFOTg5qa2uxbNmyHt12Q0MDHnrooQ4nvIkAjhiIXOL48eMIDg7Gww8/bDMUAODp\np5/Gzp07e/zJdDk5OUhNTe3RdVL/wREDERGJcMRAREQiDAYiIhJhMBARkQiDgYiIRBgMREQkwmAg\nIiKR/weNA0BDAAUhwwAAAABJRU5ErkJggg==\n",
"text": [
"<matplotlib.figure.Figure at 0x111d6d550>"
]
}
],
"prompt_number": 86
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It looks ugly with so few data points because there are only a few thresholds that we can use - that is, only a few points where the FPR and TPR can possibly differ. But with a larger gold standard and prediction set, this becomes the familiar (mostly) smooth curve you know and love (or will soon!) A precision/recall curve uses exactly the same principle, but instead places recall (TPR) on the X axis and precision (PPV) on the Y axis:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aPrecision, aRecall, aThresh = sklearn.metrics.precision_recall_curve(\n",
" aGoldStandard, 1 - array( aPredictionsPValues ) )\n",
"print( aThresh )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[ 0.4 0.5 0.96 0.97 0.98 0.99]\n"
]
}
],
"prompt_number": 87
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"plot( aRecall, aPrecision )\n",
"xlabel( \"Recall\" )\n",
"ylabel( \"Precision\" )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 88,
"text": [
"<matplotlib.text.Text at 0x1121249d0>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEKCAYAAAASByJ7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHW9JREFUeJzt3XtwVPXdx/HPyuaRAOEuVnYzDZA0WQy5QCLSiiyNPAE6\nhos4jU/HCyCmdqjXaWuxM006lhJr+2hNq9EKrVRiEC9BJZEGWZViEipMcABpoIZZwlOUSwxKNMly\nnj+yLoTcNsnZ3Wx4v2YyyWF/nP3mBzmf/M45312LYRiGAACXvMtCXQAAoH8gEAAAkggEAIAXgQAA\nkEQgAAC8CAQAgKQABMKyZct05ZVXasqUKZ2OueeeexQXF6fk5GTt2bPH7BIAAL1geiAsXbpUZWVl\nnT6+ZcsWHTp0SDU1NXrmmWd09913m10CAKAXTA+EmTNnatSoUZ0+vnnzZt1+++2SpOnTp6u+vl7H\njx83uwwAQA9Zg/2EdXV1io6O9m3b7XYdPXpUV155ZZtxFosl2KUBwIDQ2xegCHogSO2L7ezgX1fH\nq2pI0u9+l6sHH8wNdRk9Ehcn/ec/UlSUufvNzc1Vbm6uuTsNU8zFeczFeX35ZTrogWCz2eR2u33b\nR48elc1m63Ds+PHBqqp/i4oKv7m4jPvXgLAT9B/brKwsPf/885KkiooKjRw5st3pIgBA8Jm+Qrjl\nllv0zjvv6MSJE4qOjlZeXp6am5slSTk5OZo/f762bNmi2NhYDR06VOvWrTO7hAHH6XSGuoR+g7k4\nj7k4j7kwh6W/vvy1xWLp9YURhF5UlHTsmPnXEAB0rS/HTs70AgAkEQgAAC8CAQAgiUAAAHgRCAAA\nSQQCAMCLQAAASCIQAABeBAIAQBKBAADwIhAAAJIIBACAF4EAAJBEIAAAvAgEAIAkAgEA4EUgAAAk\nEQgAAC8CAQAgiUAAAHgRCAAASQQCAMCLQAAASApQIJSVlSkhIUFxcXHKz89v9/jp06e1aNEiJScn\na/r06dq3b18gygAA9IDpgeDxeLRy5UqVlZVp//79Kioq0oEDB9qMWb16taZOnarq6mo9//zzuvfe\ne80uAwDQQ1azd1hVVaXY2FjFxMRIkrKzs1VSUiKHw+Ebc+DAAT300EOSpPj4eNXW1urTTz/VFVdc\n0WZfubm5vq+dTqecTqfZ5QJAWHO5XHK5XKbsy/RAqKurU3R0tG/bbrersrKyzZjk5GS98soruu66\n61RVVaUjR47o6NGjXQYCAKC9i39ZzsvL6/W+TD9lZLFYuh3z0EMPqb6+XqmpqSooKFBqaqoGDRpk\ndikAgB4wfYVgs9nkdrt92263W3a7vc2YqKgorV271rc9YcIETZw40exSAAA9YPoKIS0tTTU1Naqt\nrVVTU5OKi4uVlZXVZsxnn32mpqYmSdKzzz6rWbNmadiwYWaXAgDoAdNXCFarVQUFBcrMzJTH49Hy\n5cvlcDhUWFgoScrJydH+/ft1xx13yGKxKDExUc8995zZZQAAeshiGIYR6iI6YrFY1E9Lgx+ioqRj\nx1o/Awievhw76VQGAEgiEAAAXgQCAEASgQAA8CIQAACSCAQAgBeBAACQRCAAALwIBACAJAIBAOBF\nIAAAJBEIAAAvAgEAIIlAAAB4EQgAAEkEAgDAi0AAAEgiEAAAXgQCAEASgQAA8CIQAACSCAQAgFdA\nAqGsrEwJCQmKi4tTfn5+u8dPnDihuXPnKiUlRYmJifrLX/4SiDIAAD1gMQzDMHOHHo9H8fHxKi8v\nl81mU3p6uoqKiuRwOHxjcnNz9dVXX+k3v/mNTpw4ofj4eB0/flxWq/V8YRaLTC4NQRQVJR071voZ\nQPD05dhp+gqhqqpKsbGxiomJUUREhLKzs1VSUtJmzFVXXaWGhgZJUkNDg8aMGdMmDAAAwWf6Ubiu\nrk7R0dG+bbvdrsrKyjZjVqxYoe9+97saP368zpw5o4
0bN3a4r9zcXN/XTqdTTqfT7HIBIKy5XC65\nXC5T9mV6IFgslm7HrF69WikpKXK5XDp8+LDmzJmj6upqRV10fuHCQAAAtHfxL8t5eXm93pfpp4xs\nNpvcbrdv2+12y263txmzc+dO3XzzzZKkSZMmacKECTp48KDZpQAAesD0QEhLS1NNTY1qa2vV1NSk\n4uJiZWVltRmTkJCg8vJySdLx48d18OBBTZw40exSAAA9YPopI6vVqoKCAmVmZsrj8Wj58uVyOBwq\nLCyUJOXk5GjVqlVaunSpkpOTde7cOT366KMaPXq02aUAAHrA9NtOzcJtp+GN206B0OhXt50CAMIT\ngQAAkEQgAAC8CAQAgCQCAQDgRSAAACQRCAAALwIBACCJQAAAeBEIAABJBAIAwItAAABIIhAAAF4E\nAgBAEoEAAPAiEAAAkggEAIAXgQAAkEQgAAC8CAQAgCQCAQDgRSAAACQRCAAAr24DYceOHZozZ47i\n4uI0YcIETZgwQRMnTuzy75SVlSkhIUFxcXHKz89v9/hjjz2m1NRUpaamasqUKbJaraqvr+/9dwEA\n6DOLYRhGVwPi4+P1+OOPa+rUqRo0aJDvz8eOHdvheI/Ho/j4eJWXl8tmsyk9PV1FRUVyOBwdjn/j\njTf0+OOPq7y8vG1hFou6KQ39WFSUdOxY62cAwdOXY6e1uwEjR47UvHnz/N5hVVWVYmNjFRMTI0nK\nzs5WSUlJp4GwYcMG3XLLLX7vHwAQGN0GwuzZs/WTn/xEixcv1uWXX+7786lTp3Y4vq6uTtHR0b5t\nu92uysrKDseePXtWb731lv70pz91+Hhubq7va6fTKafT2V25AHBJcblccrlcpuyr20CoqKiQxWLR\nP//5zzZ/vn379g7HWywWv5/89ddf13XXXaeRI0d2+PiFgQAAaO/iX5bz8vJ6va9uA6GnyWOz2eR2\nu33bbrdbdru9w7Evvvgip4sAoJ/o9i6j+vp63X///Zo2bZqmTZumBx98UJ999lmn49PS0lRTU6Pa\n2lo1NTWpuLhYWVlZ7cZ99tlnevfdd7VgwYK+fQcAAFN0GwjLli3T8OHD9dJLL2njxo2KiorS0qVL\nOx1vtVpVUFCgzMxMTZ48Wd///vflcDhUWFiowsJC37jXXntNmZmZioyMNOc7AQD0Sbe3nSYnJ6u6\nurrbPzO9MG47DWvcdgqERl+Ond2uECIjI/Xee+/5tnfs2KEhQ4b06skAAP1XtxeVn376ad12222+\n6wajRo3SX//614AXBgAIrm5PGX2toaFBkjR8+PCAFvQ1ThmFN04ZAaERkE7l9evX69Zbb9Xvfve7\nNr0FhmHIYrHogQce6NUTAgD6p04D4ezZs5KkM2fOdBgIAICBxe9TRsHGKaPwxikjIDQCepfRT3/6\nUzU0NKi5uVkZGRkaO3as1q9f36snAwD0X90GwltvvaXhw4frjTfeUExMjA4fPqzf/va3wagNABBE\n3QZCS0uLpNb3LViyZIlGjBjBNQQAGIC67UO48cYblZCQoMGDB+upp57SJ598osGDBwejNgBAEPl1\nUfnkyZMaOXKkBg0apC+++EJnzpzRN77xjcAWxkXlsMZFZSA0AtKHsG3bNmVkZOjll1/2nSL6+kks\nFosWL17cqycEAPRPnQbCu+++q4yMDL3++usdXjMgEABgYKEPAQHBKSMgNALah7Bq1SrV19f7tk+f\nPq1f/OIXvXoyAED/1W0gbNmypc17Ho8aNUpvvvlmQIsCAARft4Fw7tw5ffnll77txsZGNTU1BbQo\nAEDwdduH8IMf/EAZGRlatmyZDMPQunXrdNtttwWjNgBAEPl1Ubm0tFTbtm2TJM2ZM0eZmZmBL4yL\nymGNi8pAaASkD+FCDodDVqtVc+bM0dmzZ3XmzBlF8ZMOAANKt9cQnnnmGd1888364Q9/KEk6evSo\nFi5cGPDCAADB1W
0g/PGPf9SOHTt8b535rW99S5988knACwMABFe3gXD55Zfr8ssv9223tLTwaqcA\nMAB1GwizZs3Sr3/9a509e1Z///vfdfPNN+vGG2/s8u+UlZUpISFBcXFxys/P73CMy+VSamqqEhMT\n5XQ6e1U8AMA83d5ldO7cOf35z3/W1q1bJUmZmZm68847O10leDwexcfHq7y8XDabTenp6SoqKpLD\n4fCNqa+v13e+8x299dZbstvtOnHihMaOHdu2MO4yCmvcZQSERsDuMmppaVFiYqI++ugj3XXXXX7t\nsKqqSrGxsYqJiZEkZWdnq6SkpE0gbNiwQTfddJPsdrsktQsDAEDwdRkIVqtV8fHxOnLkiL75zW/6\ntcO6ujpFR0f7tu12uyorK9uMqampUXNzs2bPnq0zZ87o3nvv1a233tpuX7m5ub6vnU4np5YA4CIu\nl0sul8uUfXXbh3Dq1CldffXVuuaaazR06FBJrUuSzZs3dzjenwvOzc3N2r17t7Zt26azZ89qxowZ\nuvbaaxUXF9dm3IWBAABo7+JflvPy8nq9r24D4ZFHHpGkNuekujro22w2ud1u37bb7fadGvpadHS0\nxo4dq8jISEVGRur6669XdXV1u0AAAARPp4HQ2Niop59+WocOHVJSUpKWLVumiIiIbneYlpammpoa\n1dbWavz48SouLlZRUVGbMQsWLNDKlSvl8Xj01VdfqbKyUg888EDfvxsAQK91Ggi33367/uu//ksz\nZ87Uli1btH//fj3xxBPd79BqVUFBgTIzM+XxeLR8+XI5HA4VFhZKknJycpSQkKC5c+cqKSlJl112\nmVasWKHJkyeb910BAHqs09tOp0yZog8//FBS691G6enp2rNnT/AK47bTsMZtp0BoBOQd06xWa4df\nAwAGpk5XCIMGDdKQIUN8242NjYqMjGz9SxaLGhoaAlsYK4SwxgoBCI2ANKZ5PJ5eFwQACD/dvpYR\nAODSQCAAACQRCAAALwIBACCJQAAAeBEIAABJBAIAwItAAABIIhAAAF4EAgBAEoEAAPAiEAAAkggE\nAIAXgQAAkEQgAAC8CAQAgCQCAQDgRSAAACQRCAAALwIBACApQIFQVlamhIQExcXFKT8/v93jLpdL\nI0aMUGpqqlJTU/XII48EogwAQA9Yzd6hx+PRypUrVV5eLpvNpvT0dGVlZcnhcLQZN2vWLG3evNns\npwcA9JLpK4SqqirFxsYqJiZGERERys7OVklJSbtxhmGY/dQAgD4wfYVQV1en6Oho37bdbldlZWWb\nMRaLRTt37lRycrJsNpsee+wxTZ48ud2+cnNzfV87nU45nU6zywVwkYoKyWKRpk8PdSXwh8vlksvl\nkiSdPt23fZkeCBaLpdsxU6dOldvt1pAhQ1RaWqqFCxfqX//6V7txFwYCgMAxDGnbNmn1amnnTmnx\nYmnDhlBXhe40NkpffunU6dNOlZZKZ85IUl6v92f6KSObzSa32+3bdrvdstvtbcZERUVpyJAhkqR5\n8+apublZp06dMrsUAN04d0569dXW1cA990hLl0qFhaGuCl05dEh68klp/nzpyiulX/+69XNxsXTs\nWN/2bfoKIS0tTTU1NaqtrdX48eNVXFysoqKiNmOOHz+ucePGyWKxqKqqSoZhaPTo0WaXAqATLS1S\nUZG0Zo0UGSk9/LC0YIF02WWsDPqbxkbpnXek0lL5VgHz5rWG94YN0siR5j2X6YFgtVpVUFCgzMxM\neTweLV++XA6HQ4XeXztycnK0adMmPfXUU7JarRoyZIhefPFFs8sA0IEvv5TWrZMefVSKiZEef1y6\n4YbWawboPw4dOh8AO3ZIycmtIVBcLKWkBO7fy2L009t9LBYLdyKFsaio1uVrVFSoK4HU+lvl009L\n//u/0rRp0s9/Ln372x2P3bBBeuMNVgrB1NkqYN48ac6cnq0C+nLsNH2FAKD/OHlS
+sMfpD/9qfXA\nUlYmJSWFuipIoVsFdIVAAAagujrp979vPT20ZIn0/vtSbGyoq7q0BfNaQG8RCMAAcuhQ6/WBTZuk\nO+6Q9u6VLrrJD0HUH1cBXSEQgAHgww+l3/xG2rpV+tGPpH/9Sxo7NtRVXXrCYRXQFQIBCGMVFa3N\nZLt2Sfff33rhePjwUFd1aQm3VUBXCAQgzFzYVfzvf0s/+1nrwScyMtSVXRrCfRXQFQIBCBPnzkkl\nJa2nhj7/vPXW0exsKSIi1JUNfANpFdAVAgHo57rqKkZgXLgK2LKlNYAHyiqgKwQC0E/RVRxcna0C\nNm4cWKuArhAIQD9zcVfxCy903lWM3rtUVwFdIRCAfoKu4sBjFdA1AgEIMbqKA4dVQM8QCECI0FUc\nGF2tApKTuRjfFQIBCDK6is3V0Spg7tzWVcALL0ijRoW6wvBBIABBQlexeVgFBAaBAAQQXcXmYBUQ\nHAQCEAB0Ffcdq4DgIxAAE9FV3HusAkKPQABMQFdx77AK6F8IBKAP6CruGVYB/RuBAPQCXcX+YxUQ\nPggEoAfoKu4eq4DwRSAAfqCruGusAgaGgPwzlZWVKSEhQXFxccrPz+903K5du2S1WvXKK68Eogyg\nzz78UPqf/5GuvVb6xjdau4p//3vCoLGx9TTZvfdKcXHSzJnS7t2tq4AjR6T33pNWrZJSUwmDcGL6\nCsHj8WjlypUqLy+XzWZTenq6srKy5HA42o372c9+prlz58owDLPLAPqEruL2WAUMfKYHQlVVlWJj\nYxUTEyNJys7OVklJSbtAePLJJ7VkyRLt2rXL7BKAXqGruC2uBVx6TA+Euro6RUdH+7btdrsqKyvb\njSkpKdHbb7+tXbt2ydLJzdq5ubm+r51Op5xOp9nlAnQVd6C0VLrySlYB4cDlcsnlcpmyL9MDobOD\n+4Xuu+8+rVmzRhaLRYZhdHrK6MJAAMxGV3HHMjOltWslp5NVQDi4+JflvLy8Xu/L9ECw2Wxyu92+\nbbfbLftFV+A++OADZWdnS5JOnDih0tJSRUREKCsry+xygHboKu7amDHSokWhrgKhYHogpKWlqaam\nRrW1tRo/fryKi4tVVFTUZsy///1v39dLly7VjTfeSBgg4OgqBrpmeiBYrVYVFBQoMzNTHo9Hy5cv\nl8PhUGFhoSQpJyfH7KcEukRXMeAfi9FP7/n8+voCwlNUlHTsWOvnULm4q/inP6WrGANfX46dl/jl\nMwxEhw5Jd90lTZnSeivp3r3SM88QBkB3CAQMGHQVA31DICDsVVRIWVnSf/+3lJLS2lT2q1/xxvVA\nT/HidghLdBUD5iMQEFboKgYCh0BAWLiwq3jIkNZX0qSrGDAXgYB+ja5iIHgIBPRLdBUDwUcgoF+h\nqxgIHc7Aol+oq5MefLD13bf+7/9a36t4wwbCAAgmAgEhRVcx0H8QCAgJuoqB/odAQFDRVQz0X1xU\nRsDRVQyEBwIBAXPunPTqq3QVA+GC90NAQERFSVddJY0YQVcxEEx9OXayQkBA3HuvNGsWXcVAOGGF\nAAADCO+YBgDoMwIBACCJQAAAeBEIAABJBEJYcLlcoS6h32AuzmMuzmMuzBGQQCgrK1NCQoLi4uKU\nn5/f7vGSkhIlJycrNTVV06ZN09tvvx2IMgYM/rOfx1ycx1ycx1yYw/Q+BI/Ho5UrV6q8vFw2m03p\n6enKysqSw+Hwjbnhhhu0YMECSdKHH36oRYsW6dChQ2aXAgDoAdNXCFVVVYqNjVVMTIwiIiKUnZ2t\nkpKSNmOGDh3q+/rzzz/XWF7ZDABCzzDZSy+9ZNx5552+7fXr1xsrV65sN+7VV181EhISjBEjRhiV\nlZXtHpfEBx988MFHLz56y/RTRhY/X6dg4cKFWrhwod577z3deuutOnjwYJvHDbqUASCoTD9lZLPZ\n5Ha7fdtut1v2Lt71ZObMmWppadHJkyfNLgUA
0AOmB0JaWppqampUW1urpqYmFRcXKysrq82Yw4cP\n+1YAu3fvliSNGTPG7FIAAD1g+ikjq9WqgoICZWZmyuPxaPny5XI4HCosLJQk5eTk6OWXX9bzzz+v\niIgIDRs2TC+++KLZZQAAeqrXVx9MUlpaasTHxxuxsbHGmjVrOhzz4x//2IiNjTWSkpKM3bt3B7nC\n4OluLv72t78ZSUlJxpQpU4xvf/vbRnV1dQiqDA5//l8YhmFUVVUZgwYNMl5++eUgVhdc/szF9u3b\njZSUFOPqq682Zs2aFdwCg6i7ufj000+NzMxMIzk52bj66quNdevWBb/IIFi6dKkxbtw4IzExsdMx\nvTluhjQQWlpajEmTJhkff/yx0dTUZCQnJxv79+9vM+bNN9805s2bZxiGYVRUVBjTp08PRakB589c\n7Ny506ivrzcMo/UH41Kei6/HzZ492/je975nbNq0KQSVBp4/c3H69Glj8uTJhtvtNgyj9aA4EPkz\nF7/85S+Nhx56yDCM1nkYPXq00dzcHIpyA+rdd981du/e3Wkg9Pa4GdKXrvCnZ2Hz5s26/fbbJUnT\np09XfX29jh8/HopyA8qfuZgxY4ZGjBghqXUujh49GopSA86fuZCkJ598UkuWLNEVV1wRgiqDw5+5\n2LBhg2666SbfzRsDta/Hn7m46qqr1NDQIElqaGjQmDFjZLUOvPcBmzlzpkaNGtXp4709boY0EOrq\n6hQdHe3bttvtqqur63bMQDwQ+jMXF3ruuec0f/78YJQWdP7+vygpKdHdd98tyf/bncONP3NRU1Oj\nU6dOafbs2UpLS9P69euDXWZQ+DMXK1as0L59+zR+/HglJyfriSeeCHaZ/UJvj5shjU5/f4iNi3oS\nBuIPf0++p+3bt2vt2rX6xz/+EcCKQsefubjvvvu0Zs0a37tDXfx/ZKDwZy6am5u1e/dubdu2TWfP\nntWMGTN07bXXKi4uLggVBo8/c7F69WqlpKTI5XLp8OHDmjNnjqqrqxUVFRWECvuX3hw3QxoI/vQs\nXDzm6NGjstlsQasxWPzt39i7d69WrFihsrKyLpeM4cyfufjggw+UnZ0tSTpx4oRKS0sVERHR7hbn\ncOfPXERHR2vs2LGKjIxUZGSkrr/+elVXVw+4QPBnLnbu3KmHH35YkjRp0iRNmDBBBw8eVFpaWlBr\nDbVeHzdNucLRS83NzcbEiRONjz/+2Pjqq6+6vaj8/vvvD9gLqf7MxZEjR4xJkyYZ77//foiqDA5/\n5uJCd9xxx4C9y8ifuThw4ICRkZFhtLS0GF988YWRmJho7Nu3L0QVB44/c3H//fcbubm5hmEYxn/+\n8x/DZrMZJ0+eDEW5Affxxx/7dVG5J8fNkK4Q/OlZmD9/vrZs2aLY2FgNHTpU69atC2XJAePPXPzq\nV7/S6dOnfefNIyIiVFVVFcqyA8KfubhU+DMXCQkJmjt3rpKSknTZZZdpxYoVmjx5cogrN58/c7Fq\n1SotXbpUycnJOnfunB599FGNHj06xJWb75ZbbtE777yjEydOKDo6Wnl5eWpubpbUt+OmxTAG6MlX\nAECP8I5pAABJBAIAwItAAABIIhAAAF4EAi5pgwYNUmpqqpKSkrR48WJ9/vnnpu4/JiZGp06dkiQN\nGzbM1H0DZiMQcEkbMmSI9uzZo71792r48OG+WxjNcmF36EDssMfAQiAAXjNmzNDhw4cltb6J07x5\n85SWlqbrr7/e9xavx48f16JFi5SSkqKUlBRVVFRIkhYtWqS0tDQlJibq2WefDdn3APTFwHsZQKAX\nPB6Ptm7dqoyMDEnSXXfdpcLCQsXGxqqyslI/+tGPtG3bNt1zzz2aPXu2Xn31VZ07d853imnt2rUa\nNWqUGhsbdc0112jJkiUD9qVFMHDRmIZLmtVq1ZQpU1RXV6eYmBhVVFTo7NmzGjdunOLj433jmpqa\ntG/fPo0b
N051dXWKiIhos5/c3Fy99tprkqTa2lpt3bpV11xzjSZMmKAPPvhAo0ePVlRUlM6cORPU\n7w/oCVYIuKRFRkZqz549amxsVGZmpkpKSnTDDTdo5MiR2rNnT4d/5+LfoVwul7Zt26aKigoNHjxY\ns2fP1pdffhmM8gFTcQ0BUGsw/OEPf9DDDz+sYcOGacKECdq0aZOk1gDYu3evJCkjI0NPPfWUpNbT\nTA0NDWpoaNCoUaM0ePBgffTRR77rCkC4IRBwSbvwzp+UlBTFxsZq48aNeuGFF/Tcc88pJSVFiYmJ\n2rx5syTpiSee0Pbt25WUlKS0tDQdOHBAc+fOVUtLiyZPnqyf//znmjFjRrfPBfRHXEMAAEhihQAA\n8CIQAACSCAQAgBeBAACQRCAAALwIBACAJOn/AVxPk+ZosUJgAAAAAElFTkSuQmCC\n",
"text": [
"<matplotlib.figure.Figure at 0x1120cf090>"
]
}
],
"prompt_number": 88
},
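{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of what `precision_recall_curve` is doing conceptually (this is not its actual implementation), the next cell treats every distinct score as a threshold and tallies precision and recall by hand. The toy gold standard and p-values are made up for illustration and are not the ones used above, and the variable names are placeholders:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Hypothetical toy data, NOT the values used earlier in this notebook\n",
"aToyGold = [True, True, False, True, False, False]\n",
"aToyPValues = [0.01, 0.02, 0.03, 0.04, 0.5, 0.6]\n",
"\n",
"# Convert p-values to scores so that larger means \"more confidently positive\"\n",
"aToyScores = [1 - d for d in aToyPValues]\n",
"\n",
"# Each distinct score is a candidate threshold; anything at or above it is called positive\n",
"for dThresh in sorted( set( aToyScores ) ):\n",
"    aCalls = [d >= dThresh for d in aToyScores]\n",
"    iTP = sum( 1 for fCall, fGold in zip( aCalls, aToyGold ) if fCall and fGold )\n",
"    iFP = sum( 1 for fCall, fGold in zip( aCalls, aToyGold ) if fCall and not fGold )\n",
"    iFN = sum( 1 for fCall, fGold in zip( aCalls, aToyGold ) if ( not fCall ) and fGold )\n",
"    # Precision (PPV) = TP / (TP + FP); recall (TPR) = TP / (TP + FN)\n",
"    dPrecision = float( iTP ) / ( iTP + iFP )\n",
"    dRecall = float( iTP ) / ( iTP + iFN )\n",
"    print( \"%g %g %g\" % ( dThresh, dPrecision, dRecall ) )"
],
"language": "python",
"metadata": {},
"outputs": []
},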
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recision/recall and ROC exercises\n",
"\n",
"**1.** What does a hypothesis test determine with respect to a null hypothesis?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"strFillInYourAnswerHere01 = \"\"\"here\"\"\""
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 89
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**2.** Can the null hypothesis be proved by a hypothesis test? Disproved?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"strFillInYourAnswerHere02 = \"\"\"here\"\"\""
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 90
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**3.** Define \"p-value\" and \"power\" in a sentence or two (or three) each."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"strAnswerPValue03 = \"\"\"here\"\"\"\n",
"\n",
"strAnswerPower03 = \"\"\"here\"\"\""
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 91
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**4.** Define \"gold standard\", \"positive example\", and \"negative example\"."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"strAnswerGS04 = \"\"\"here\"\"\"\n",
"\n",
"strAnswerPositive04 = \"\"\"here\"\"\"\n",
"\n",
"strAnswerNegative04 = \"\"\"here\"\"\""
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 92
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**5.** Consider the following precision/recall and ROC plots for the same data:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aGoldStandard = ( [True] * 50 ) + ( [False] * 50 )\n",
"aPredictions = rand( len( aGoldStandard ) )\n",
"# Yes, this permits the \"p-values\" to be slightly negative, but it's convenient\n",
"aPredictions[:25] -= standard_normal( 25 ) / 5\n",
"aPredictions[:5] = rand( 5 ) / 10000\n",
"\n",
"aPrecision, aRecall, aThresh = sklearn.metrics.precision_recall_curve( aGoldStandard, 1 - aPredictions )\n",
"plot( aRecall, aPrecision )\n",
"xlabel( \"Recall\" )\n",
"ylabel( \"Precision\" )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 93,
"text": [
"<matplotlib.text.Text at 0x1121c8350>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEKCAYAAAASByJ7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtYVNXeB/DvxJCi4i3UdIbzosLhIlfDW5kOkS9qSYp6\nxEpNTe1iedRO2eV5go6alJ206ChpWtqBvNQJK0WTnMxjgCmiKSp5xHegRFERDRUY1vvHGlGCgZlh\nzwzg9/M8PjHOYs/Pne4va6+91lIJIQSIiOi2d4ezCyAioqaBgUBERAAYCEREZMJAICIiAAwEIiIy\nYSAQEREAOwTCtGnT0K1bNwQFBZlt8/zzz8PHxwchISHIzs5WugQiIrKB4oEwdepUpKWlmX1/69at\n+OWXX5CXl4cPP/wQTz/9tNIlEBGRDRQPhPvvvx+dOnUy+/6WLVswZcoUAMCAAQNQUlKCoqIipcsg\nIiIrqR39gYWFhfD09Kx+rdVqUVBQgG7dutVop1KpHF0aEVGLYOsCFA4PBKB2seYu/oWFzXNVjenT\ngRkzgJgYZY4XFxeHuLg4ZQ7WzPFc3MRzcRPPxU2N+WHa4YGg0WhgMBiqXxcUFECj0dTZtkcPR1Wl\nLDc3Z1dARGQ9hz92Gh0djXXr1gEAMjIy0LFjx1q3i4iIyPEU7yFMnDgR33//PYqLi+Hp6Yn4+HhU\nVFQAAGbNmoWRI0di69at8Pb2Rtu2bbF27VqlS2hxdDqds0toMngubuK5uInnQhmqprr8tUqlsnlg\nxNliYoDHH1duDIGIyFKNuXZypjIREQFgIBARkQkDgYiIADAQiIjIhIFAREQAGAhERGTCQCAiIgAM\nBCIiMmEgEBERAAYCERGZMBCIiAgAA4GIiEwYCEREBICBQEREJgwEIiICwEAgIiITBgIREQFgIBAR\nkQkDgYiIADAQiIjIhIFAREQAGAhERGTCQCAiIgAMBCIiMmEgEBERAAYCERGZMBCIiAgAA4GIiEzs\nEghpaWnw8/ODj48PEhISar1/8eJFjBkzBiEhIRgwYACOHDlijzKIiMgKigeC0WjE7NmzkZaWhqNH\njyIlJQW5ubk12ixevBh9+/ZFTk4O1q1bhzlz5ihdBhERWUnxQMjKyoK3tze8vLzg6uqK2NhYpKam\n1miTm5uLiIgIAICvry/y8/Nx7tw5pUshIiIrqJU+YGFhITw9Patfa7VaZGZm1mgTEhKCL774AoMH\nD0ZWVhZOnz6NgoICdOnSpUa7uLi46q91Oh10Op3S5RIRNWt6vR56vV6RYykeCCqVqsE2CxYswJw5\ncxAWFoagoCCEhYXBxcWlVrtbA4GIiGr74w/L8fHxNh9L8UDQaDQwGAzVrw0GA7RabY027u7uWLNm\nTfXrnj17olevXkqXQkREVlB8DCE8PBx5eXnIz89HeXk5NmzYgOjo6BptLl26hPLycgDAqlWrMHTo\nULRr107pUoiIyAqKB4JarUZiYiKioqIQEBCACRMmwN/fH0lJSUhKSgIAHD16FEFBQfDz88P27dux\nfPlypctokkpKgMREQAhnV0JEVJtKiKZ5eVKpVGiipTUoJgZ4/HH53xtycoCxY4GTJ4HycsDV1Xn1\nEVHL1ZhrJ2cqO8DHHwMPPgi88QagVnzUhohIGbw82dG1a8DzzwO7dwN6PdCnDzBlirOrIiKqGwPB\nTk6dAu67D+jdG9i3D3B3d3ZFRET14y0jO3n5ZWDSJGDDBoYBETUPHFS2gy1bAA8P4N57a7/n6gqU\nlXFQmYjsozHXTgaCgzEQiMie+JQRERE1GgOBiIgAMBCIiMiEgUBERAAYCEREZMJAICIiAAwEIiIy\nYSAQEREABgIREZkwEIiICAADgYiITBgIREQEgIFAREQmDAQiIgLAQCAiIhMGAhERAWAgEBGRCQOB\niIgAMBCIiMiEgUBERAAYCEREZMJAICIiAHYKhLS0NP
j5+cHHxwcJCQm13i8uLsbw4cMRGhqKwMBA\nfPzxx/Yog4iIrKASQgglD2g0GuHr64udO3dCo9GgX79+SElJgb+/f3WbuLg4XL9+HW+++SaKi4vh\n6+uLoqIiqNXqm4WpVFC4tCbB1RUoK5P/JSJSWmOunYr3ELKysuDt7Q0vLy+4uroiNjYWqampNdp0\n794dpaWlAIDS0lLcddddNcKAiIgcT/GrcGFhITw9Patfa7VaZGZm1mgzY8YMPPDAA+jRowcuX76M\njRs31nmsuLi46q91Oh10Op3S5RIRNWt6vR56vV6RYykeCCqVqsE2ixcvRmhoKPR6PU6ePIlhw4Yh\nJycH7u7uNdrdGghERFTbH39Yjo+Pt/lYit8y0mg0MBgM1a8NBgO0Wm2NNnv37sX48eMBAL1790bP\nnj1x/PhxpUshIiIrKB4I4eHhyMvLQ35+PsrLy7FhwwZER0fXaOPn54edO3cCAIqKinD8+HH06tVL\n6VKIiMgKit8yUqvVSExMRFRUFIxGI6ZPnw5/f38kJSUBAGbNmoVXXnkFU6dORUhICKqqqvDWW2+h\nc+fOSpdCRERWUPyxU6XwsVMiIus1qcdOiYioeWIgEBERAAYCERGZMBCIiAgAA4GIiEwYCEREBICB\nQEREJgwEIiICwEAgIiITBgIREQFgIBARkQkDgYiIAFiw2umePXsQHx+P/Px8VFZWApCLJ/33v/+1\ne3FEROQ4Da526uvri2XLlqFv375wcXGp/n0PDw/7FsbVTomIrNaYa2eDPYSOHTtixIgRNh2ciIia\njwZ7CAsWLIDRaERMTAxatWpV/ft9+/a1b2HsIRARWa0x184GA0Gn00GlUtX6/V27dtn0gZZiIBAR\nWc+ugeAsDAQiIuvZdce0kpISzJ07F/fccw/uuecezJ8/H5cuXbLpw4iIqOlqMBCmTZuG9u3bY9Om\nTdi4cSPc3d0xdepUR9RGREQO1OAto5CQEOTk5DT4e4oXxltGRERWs+stIzc3N/zwww/Vr/fs2YM2\nbdrY9GFERNR0NTgPYeXKlZg8eXL1uEGnTp3wySef2L0wIiJyLIufMiotLQUAtG/f3q4F3cBbRkRE\n1rPLTOX169dj0qRJeOedd2rMQxBCQKVSYd68eTZ9IBERNU1mA6GsrAwAcPny5ToDgYiIWhZOTHMw\n3jIiInuy61NGL774IkpLS1FRUYHIyEh4eHhg/fr1Nn0YERE1XQ0Gwvbt29G+fXt8/fXX8PLywsmT\nJ/H222/X+z1paWnw8/ODj48PEhISar2/dOlShIWFISwsDEFBQVCr1SgpKbH9T0FERI3WYCDc2BTn\n66+/xrhx49ChQ4d6xxCMRiNmz56NtLQ0HD16FCkpKcjNza3R5oUXXkB2djays7Px5ptvQqfToWPH\njo38oxARUWM0GAijRo2Cn58f9u/fj8jISJw9exatW7c22z4rKwve3t7w8vKCq6srYmNjkZqaarZ9\ncnIyJk6caFv1RESkmAYnpi1ZsgR/+9vf0LFjR7i4uKBt27b1XuALCwvh6elZ/Vqr1SIzM7POtmVl\nZdi+fTv++c9/1vl+XFxc9dc6nQ46na6hcomIbit6vR56vV6RY5kNhPT0dERGRuLzzz+vvkV0Y+Ra\npVIhJiamzu+z5pHUr776CoMHDzZ7u+jWQCAiotr++MNyfHy8zccyGwi7d+9GZGQkvvrqqzov8uYC\nQaPRwGAwVL82GAzQarV1tv3ss894u4iIqIlQfB5CZWUlfH19kZ6ejh49eqB///5ISUmBv79/jXaX\nLl1Cr169UFBQADc3t9qFcR4CEZHV7DoP4ZVXXqnxSOjFixfx2muvmW2vVquRmJiIqKgoBAQEYMKE\nCfD390dSUhKSkpKq23355ZeIioqqMwyIiMjxGuwhhIaG4uDBgzV+LywsDNnZ2fYtjD0EIiKr2bWH\nUFVVhWvXrlW/vn
r1KsrLy236MCIiaroafOz0scceQ2RkJKZNmwYhBNauXYvJkyc7ojYiInIgiwaV\nt23bhvT0dADAsGHDEBUVZf/CeMuIiMhqdtkP4Vb+/v5Qq9UYNmwYysrKcPnyZbi7u9v0gURE1DQ1\nOIbw4YcfYvz48XjqqacAAAUFBRg9erTdCyMiIsdqMBA++OAD7Nmzp3rrzD//+c84e/as3QsjIiLH\najAQWrVqhVatWlW/rqys5I5pREQtUIOBMHToUCxatAhlZWX49ttvMX78eIwaNcoRtRERkQM1+JRR\nVVUVVq9ejR07dgAAoqKi8OSTT9q9l8CnjIiIrNeYa2e9gVBZWYnAwEAcO3bM5uJsxUAgIrKe3WYq\nq9Vq+Pr64vTp0zYdnIiImo8G5yFcuHABffr0Qf/+/dG2bVsAMoG2bNli9+KIiMhxGgyEhQsXAkCN\nLgifMiIiannMBsLVq1excuVK/PLLLwgODsa0adPgyhvfREQtltkxhClTpmD//v0IDg7G1q1b8cIL\nLziyLiIicjCzTxkFBQXh8OHDAOTTRv369bP7Hgg1CrsNnzISAuDdOCJqDLs8ZaRWq+v8muxj1Spg\n2jRnV0FEtzOzPQQXFxe0adOm+vXVq1ert7tUqVQoLS21b2G3UQ/h9GkgMBAYPBjYts15tRFR82eX\n5a+NRqPNBZHlhABmzgT69nV2JUR0u2twLSOyr48/BoqLgfnznV0JEd3uODjgRL/+Crz0EvDtt/Jr\nIiJnYg/BSYQAnn4aeOopICTE2dUQEbGH4DQbNgAnTwIbNzbc9vhxwMMDuOsu+9dFRLcv9hCc4Nw5\n4K9/BdasAW7Ze6hORUXAkCGWBQcRUWMwEJzgueeAyZOB/v3rb1dVBTzxBHDlirKff/06cO2assck\nouaPgeAEhw8D8fENt3vvPeDiRWDiROU+u6QEGDgQWL5cuWMSUcvAQHCwdu2Ajz4CTHP8zDp4EFi0\nCEhOVm4znStXgJEjgcJC9hCIqDYOKjtYQQFg2lbCrLIy2StYtgzo1av+tmfPAl27Nvy5164Bo0cD\n/v5AZKTl9RLR7cMuPYS0tDT4+fnBx8cHCQkJdbbR6/UICwtDYGAgdDqdPcpokhoKAwCYNw8IDwce\ne6z+dj/8AGi1MhTqU1EBxMYCnTsDH34I3FHP//XDh+XYBRHdfhTvIRiNRsyePRs7d+6ERqNBv379\nEB0dDX9//+o2JSUlePbZZ7F9+3ZotVoUFxcrXUaz9cUXcqJaQwvL/vqrvMgD8oJvzo2B6YoK+aSS\ni4v5tp98AkydChw4AISGWl06ETVzivcQsrKy4O3tDS8vL7i6uiI2Nhapqak12iQnJ2Ps2LHQarUA\nAA8PD6XLaJZ+/VVOVktOBtq3N9+uvBwYP1627dLFfDshgGeekWMGmzcDd95pvu2aNcCrrwLdu7OH\nQHS7UryHUFhYCE9Pz+rXWq0WmZmZNdrk5eWhoqICERERuHz5MubMmYNJkybVOlZcXFz11zqdrsXf\nWjp0SA4kDxhQf7t58+QktVdeAVasqLuNEHJZjAMHgJ076x/EXr1aPvX03Xc3ex1E1Dzo9Xro9XpF\njqV4IFiy33JFRQUOHDiA9PR0lJWVYdCgQRg4cCB8fHxqtLs1EFo6T095MX7ppfrbrVsH7NgB7NtX\n/1jAu+8CW7cC339ff28jKUmG0HffAX84/UTUDPzxh+V4S55pN0PxQNBoNDAYDNWvDQZD9a2hGzw9\nPeHh4QE3Nze4ublhyJAhyMnJqRUIt5PAQCAlpf422dlyVdRdu4AOHcy3+/JL4J13gB9/rH+5iw8+\nAN56Sx6vd2/b6iailkPxMYTw8HDk5eUhPz8f5eXl2LBhA6Kjo2u0eeSRR7Bnzx4YjUaUlZUhMzMT\nAQEBSpfSoly4AIwdCyQmyvAwZ/9+YMYMIDUV+NOfzLfbuBFYuhTQ6xkGRCQp3kNQ
q9VITExEVFQU\njEYjpk+fDn9/fyQlJQEAZs2aBT8/PwwfPhzBwcG44447MGPGDAZCAxYtkoPIEyaYb2MwAI88Ih8t\nDQ83306tlnMddu0CvLwUL5WImimzW2g6W0vdQtMWzz0n5wfs3Ckv5rfSaICsLDlOMHgwMGkS8MIL\n9R/v4kU56Ny5c+33+vaVg8zcwY2oebLLFprUdLz+unxK6I9hcENlpRyQHjjQsp3XOnVStj4iahkY\nCM1AQ9M05s+XE88SEwELHvIiIqoTF7drAY4dAzZtUm4RPEdISwOGDXN2FUR0K/YQmrnp04Fp0+p/\nDLUpqaoC3nhDzn84f97Z1RDRrRgIzdwbbzi7AsudPw88/jhw9SqQmQl4ezu7IiK6FW8ZkUP89BNw\nzz1yDsXOncDdd9t2nB07gDNnlK2NiCQGAtmVEMCqVXJjnnfeAd5+2/zTUvW5dAmYMgWIigK2b1e+\nTiJiIJAdXb8uZ00vWyb3bhg71rbj/PCDXI67dWsgJkbZGonoJgYC2cXZs3JntgsX5HiBr6/1xygv\nB15+GfjLX+T+0klJcgtSIrIPBgIp7tAhuYR3RITch8GWi3hurpxo9/PPcn/pUaOUr5OIamIgkKJS\nU2XPYPFi4O9/r3+J7roIAaxcCQwZAjz1FLBlC9Ctm31qrc+VK7KO69cd/9lEzsLHTski587J/RWm\nTKn7fSGAhAQ5W/qbb4D+/a3/jEuX5JjDiRPAnj223WZqLCHkMuQvviifZoqIcE4dRM7AHgI16Ndf\ngaFD5Rabdbl2TQbF5s1yvMCWMNi3Ty6o16ULkJFh/UX42jV5IW/MeojZ2cD998unoTZs4LLgdPth\nIFC98vPlRTIysu73L1wA/vd/5QV59265+qo1hJC7uz30kOxhfPCBfJrIGllZQFgY8OijQEmJdd8L\nyN7PrFnAiBHAE0/I4913n/XHIWruGAhk1rFj8l7+vHl1b+35f/8nl9weMAD47DOgTRvrjl9VJfdv\nSEmRvYJx46z7/uvX5b7S0dFAXJz1y3cYjfIWV0CAXE02Nxd48knAxcW64xC1FAwEqlNODvDAA3Jp\njGefrfv9++4DZs6Uk82sHTxWqeT3eHvL8YJevaz7/v375czn3FxZy4QJ1q30um+fvLW1ebPcNW7Z\nMi4LTsRBZaolIwOIj5c/PY8fX/v99HRg4kT5/l/+YttnuLoCv/1W/57P5nz4IfDLL/JW08SJ1gVB\nSYnsVXzxhdxPetIkLhlOdAN7CFTL3/4GrF1bdxgkJ8t79Zs22R4GN9gSBl27yv0hDh6UdVh6MRcC\n+PRTwN9ffp2bC0yezDAguhV7CFTDjTWHIiJqv3fmDLBggewhBAY6vjZA3p6y1okTck5DSQnw5Zdy\nzIOIamMPgWpYuLDuMOjYUa4jtHev88LAWhUVcoLcvffKweusLIYBUX3YQyCLtGsHbNzo7Cos99NP\n8omh7t3lAPT//I8yx62qkreeDh0Cli5V5phETQUDgVqUsjLZK1i/Xl6wH3tMuXGC3bvlI7ilpfI1\nA4FaGt4yohZl4ECgsBA4fFjuzqZEGJw8KZfunjQJmD8f+Oqrxh+TqCliIFCLMWaMXJAuOVkugaGE\n+Hg57nDPPXKinrWPuRI1J7xlRC3GmjXKHq9HDzn7+uefbd/yk6g5YSAQmbFrl7MrIHIs3jIiIiIA\nDAQiIkVUVQG//173e41Zlt2RGAhETVh5efO/yLRk167JjaNmzpRjTtHRN98rK5M7/t2YD5OU5Lw6\nLWWXQEhLS4Ofnx98fHyQkJBQ6329Xo8OHTogLCwMYWFhWLhwoT3KIGq2rl4F3ntPrgI7d27N9y5e\nlDPKu3UDduywbx1CAN99B8yeLWu6VWWlXAqkrsH8q1fl0145OfatzxlKSoB//Uuu9dWtG7BkCeDn\nJ5dVKS4GVq+WwXD33cDy5UBQEDBsmG17dTia
4oPKRqMRs2fPxs6dO6HRaNCvXz9ER0fD39+/Rruh\nQ4diy5YtSn88UbN2Yy/nd96Rj7tOmCA38AHkznXvvisvwNHRcke3G5PklFZSAqxbB6xYIfeHOHVK\n7onh6Sn3wVi9GvjoI8DdXe4lMW2a/L6DB+V7KSlyRduZM4GQEPvU6Ehnzsjw+/e/gR9/lMu7PPKI\n3NCpa1fZ5uef5RyYnTvl48mffHJzSfW69hNpihTvIWRlZcHb2xteXl5wdXVFbGwsUlNTa7UT7O8S\nVbt0Sc6w7t1brrm0fbu8AIWEyCCYOVOuIVVeLrf6XLvW+t3pLHHwoNw9rmdPuW5VUpKc5Ne5M7Bt\nG/Dww3J3upISWWNKiuwNrFwJhIfLi2SXLrLGZ55Rvj5HOnUK+Mc/5CZQ/v7ADz/I/w+//QakpsoQ\nvBEGgPz/U1wsN4uaOLF57q+heA+hsLAQnp6e1a+1Wi0yMzNrtFGpVNi7dy9CQkKg0WiwdOlSBAQE\n1DpWXFxc9dc6nQ46nU7pcokUc+IE8NprcrvRWbMs+55Ll+RthfffB4YPl5v13NqZbtsWOHBALtB3\n4oRc+ltpFRVyf4j33wdOn5a15+bWnHvRpo3smcyaJde0urE73vHjwH//K1fAXbQIePDB+necMxhk\nz+Pzz+VP20qtMaWU48dlbZs3y5/2H3lE7iX+wANAq1bOrq5uer0e336rR36+3CekMRQPBJUF0zj7\n9u0Lg8GANm3aYNu2bRg9ejROnDhRq92tgUDUVBUVyRnNmzbJHeDy8hr+nsuX5RjBsmVyL+e9ewEf\nn9rtYmLk7SFXV+XrPntWbja0YoX87Llz5QVQXcdV4fBh4M47a/++r6/8s9R3sbx2Tf5EvWaN3Klu\nwgQ5DnLunPMDQQjgyBEZAJ9/LvcIj4mRt+YGD26626kKARw9CqSlAWlpOmRk6BAeLmfrZ2XF23xc\nxW8ZaTQaGAyG6tcGgwFarbZGG3d3d7Qx/YgxYsQIVFRU4MKFC0qXQmRXV67IIAgIAFq3lktbjB3b\n8PcsWSJvDeXmyu1D162rOwwAuUyG0mHw00/AlCnyYn76tLwVpNfL2usKA6DuMLihvjDYtAnQaoFV\nq+RnFhbKAKprcyQh5C2rzZut+uNYTQi5Wu1rr8ne2EMPybGYpCTZg3n/fWDo0KYXBpcvy17VzJnA\nn/4k6/7lFzngX1goJ1IuWNC4z1C8hxAeHo68vDzk5+ejR48e2LBhA1JSUmq0KSoqQteuXaFSqZCV\nlQUhBDp37qx0KUR289tvwJ//LAcXf/pJ3nOvT1mZHIBculTefvj++5q3hpTy66+y5/H11/LieuMC\nf+OJoHffBQoK5D7Z//iHbbvWWSoiQgbapEmAl5f5dmfOyKd2PvlE9lruuAMYN07ZWoSQg74bN8pf\n16/Lp4TWrQP69Wua61Pd6AVs3SpDe98+uXjjiBGyN+fnp3zdigeCWq1GYmIioqKiYDQaMX36dPj7\n+yPJ9BDurFmzsHnzZqxYsQJqtRpt2rTBZ599pnQZRHbTrZvsms+ZIxe9q095uXwaZ+FC+Y/5u++A\nPn2Ur+nnn+WTSamp8gJ87BhgNMo5DKtXy596tVp5IRk92nxPQEn33y9/mfP118Drr8vbZaNHyyDz\n9gb696+7/Y2wsGYc5ehROci7aZMM5fHj5dLoTTUEysrk35FvvpFBoFLJXQznzpUB266dnQsQTVQT\nLo3IrLffFmL+fCEqK4VYt06Inj2FiIoSYt8+5T9r7Fgh5s0TYvhwIe6+W4hFi4Q4f16+d+edQjz9\ntBCdOgkxcaIQmZnKf35jPPywEJGR8hxduXLz9wsLheje/ebr338XIjlZiJEjhVCrhXj22YaPnZcn\nxMKFQgQGCqHRCDF3rhAZGUJUVSn/57DUiy8KsWRJ3e+dOiVEYqIQI0YI0a6dEEOHCvHWW0IcOWJb\nzY25dnJx
OyKF7dsnHxft0EE+Hjp0qH0+p21b+VPkCy/Ie8utW998r18/oH17ea/8D0N4TUJ9e0pU\nVcln+T/9VPZ4Bg2Se1s88ID5AXuDQd4K+uwzOU9i/Hjgn/8E7rtP9iqaEqMRyMiQPaSvvpI9nxEj\ngCeekJP5OnZ0Xm0MBCIFde0qb9MkJMiuvj1vS6xeLQc+67rg7dljv8+1pzvvlE9tvfyyDIGEBHmL\nDpBzHW5VXCwHoJOT5ZNCY8YAb74J6HSOuSVmrcxMYPJkOR7QowcwapT8f9ivX9MZwFaZuhhNjkql\n4uQ1otvQ+fN1D3avXAn85z/yp+nkZBl6w4cDjz4KREU13XkCgBww37BBTux7+GH5lJC9NObayUAg\nomZh7Vpgxgx58X/0UTlnwu6DrM0QA4GIWrzycvkUjjPvsTcHDAQiIgLQuGtnExt/JyIiZ2EgEBER\nAAYCERGZMBCIiAgAA4GIiEwYCEREBICBQEREJgwEIiICwEAgIiITBgIREQFgIBARkQkDgYiIADAQ\niIjIhIFAREQAGAhERGTCQCAiIgAMBCIiMmEgEBERAAYCERGZMBCIiAgAA4GIiEwYCM2AXq93dglN\nBs/FTTwXN/FcKMMugZCWlgY/Pz/4+PggISHBbLt9+/ZBrVbjiy++sEcZLQb/st/Ec3ETz8VNPBfK\nUDwQjEYjZs+ejbS0NBw9ehQpKSnIzc2ts91LL72E4cOHQwihdBlERGQlxQMhKysL3t7e8PLygqur\nK2JjY5Gamlqr3fvvv49x48ahS5cuSpdARES2EArbtGmTePLJJ6tfr1+/XsyePbtGm4KCAqHT6URV\nVZV44oknxOeff17rOAD4i7/4i7/4y4ZftlJDYSqVqsE2f/3rX7FkyRKoVCoIIeq8ZVTX7xERkf0o\nHggajQYGg6H6tcFggFarrdFm//79iI2NBQAUFxdj27ZtcHV1RXR0tNLlEBGRhVRC4R/FKysr4evr\ni/T0dPTo0QP9+/dHSkoK/P3962w/depUjBo1CjExMUqWQUREVlK8h6BWq5GYmIioqCgYjUZMnz4d\n/v7+SEpKAgDMmjVL6Y8kIiIl2Dz6oJBt27YJX19f4e3tLZYsWVJnm+eee054e3uL4OBgceDAAQdX\n6DgNnYtPP/1UBAcHi6CgIHHvvfeKnJwcJ1TpGJb8vRBCiKysLOHi4lLngwkthSXnYteuXSI0NFT0\n6dNHDB061LEFOlBD5+LcuXMiKipKhISEiD59+oi1a9c6vkgHmDp1qujatasIDAw028aW66ZTA6Gy\nslL07t14QGS/AAAFnUlEQVRbnDp1SpSXl4uQkBBx9OjRGm2++eYbMWLECCGEEBkZGWLAgAHOKNXu\nLDkXe/fuFSUlJUII+Q/jdj4XN9pFRESIhx56SGzevNkJldqfJefi4sWLIiAgQBgMBiGEvCi2RJac\ni9dff10sWLBACCHPQ+fOnUVFRYUzyrWr3bt3iwMHDpgNBFuvm05dusKSOQtbtmzBlClTAAADBgxA\nSUkJioqKnFGuXVlyLgYNGoQOHToAkOeioKDAGaXaHeey3GTJuUhOTsbYsWOrH97w8PBwRql2Z8m5\n6N69O0pLSwEApaWluOuuu6BWK35n3Onuv/9+dOrUyez7tl43nRoIhYWF8PT0rH6t1WpRWFjYYJuW\neCG05Fzc6qOPPsLIkSMdUZrDWfr3IjU1FU8//TQAyx53bo4sORd5eXm4cOECIiIiEB4ejvXr1zu6\nTIew5FzMmDEDR44cQY8ePRASEoLly5c7uswmwdbrplOj09J/xOIPD0K1xH/81vyZdu3ahTVr1uA/\n//mPHStyHqXmsrQElpyLiooKHDhwAOnp6SgrK8OgQYMwcOBA+Pj4OKBCx7HkXCxevBihoaHQ6/U4\nefIkhg0bhpycHLi7uzugwqbFluumUwPBkjkLf2xTUFAAjUbjsBodxZJzAQ
CHDh3CjBkzkJaWVm+X\nsTnjXJabLDkXnp6e8PDwgJubG9zc3DBkyBDk5OS0uECw5Fzs3bsXr776KgCgd+/e6NmzJ44fP47w\n8HCH1upsNl83FRnhsFFFRYXo1auXOHXqlLh+/XqDg8o//vhjix1IteRcnD59WvTu3Vv8+OOPTqrS\nMSw5F7cyt/xJS2DJucjNzRWRkZGisrJS/P777yIwMFAcOXLESRXbjyXnYu7cuSIuLk4IIcSZM2eE\nRqMR58+fd0a5dnfq1CmLBpWtuW46tYdgyZyFkSNHYuvWrfD29kbbtm2xdu1aZ5ZsN5acizfeeAMX\nL16svm/u6uqKrKwsZ5ZtF5zLcpMl58LPzw/Dhw9HcHAw7rjjDsyYMQMBAQFOrlx5lpyLV155BVOn\nTkVISAiqqqrw1ltvoXPnzk6uXHkTJ07E999/j+LiYnh6eiI+Ph4VFRUAGnfdVHymMhERNU/cMY2I\niAAwEIiIyISBQEREABgIRERkwkCg25qLiwvCwsIQHByMmJgYXLlyRdHje3l54cKFCwCAdu3aKXps\nIqUxEOi21qZNG2RnZ+PQoUNo37599SOMSrl1dmhLnGFPLQsDgchk0KBBOHnyJADg5MmTGDFiBMLD\nwzFkyBAcP34cAFBUVIQxY8YgNDQUoaGhyMjIAACMGTMG4eHhCAwMxKpVq5z2ZyBqjJa3DCCRDYxG\nI3bs2IHIyEgAwMyZM5GUlARvb29kZmbimWeeQXp6Op5//nlERETg3//+N6qqqqpvMa1ZswadOnXC\n1atX0b9/f4wbN67FLi1CLRcnptFtTa1WIygoCIWFhfDy8kJGRgbKysrQtWtX+Pr6VrcrLy/HkSNH\n0LVrVxQWFsLV1bXGceLi4vDll18CAPLz87Fjxw70798fPXv2xP79+9G5c2e4u7vj8uXLDv3zEVmD\nPQS6rbm5uSE7OxtXr15FVFQUUlNT8eCDD6Jjx47Izs6u83v++DOUXq9Heno6MjIy0Lp1a0RERODa\ntWuOKJ9IURxDIIIMhvfeew+vvvoq2rVrh549e2Lz5s0AZAAcOnQIABAZGYkVK1YAkLeZSktLUVpa\nik6dOqF169Y4duxY9bgCUXPDQKDb2q1P/oSGhsLb2xsbN27Ev/71L3z00UcIDQ1FYGAgtmzZAgBY\nvnw5du3aheDgYISHhyM3NxfDhw9HZWUlAgIC8PLLL2PQoEENfhZRU8QxBCIiAsAeAhERmTAQiIgI\nAAOBiIhMGAhERASAgUBERCYMBCIiAgD8P6de16jIcn5DAAAAAElFTkSuQmCC\n",
"text": [
"<matplotlib.figure.Figure at 0x112124dd0>"
]
}
],
"prompt_number": 93
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"aFPR, aTPR, aThresh = sklearn.metrics.roc_curve( aGoldStandard, 1 - aPredictions )\n",
"plot( aFPR, aTPR )\n",
"xlabel( \"1 - Specificity (FPR)\" )\n",
"ylabel( \"Sensitivity (TPR)\" )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 94,
"text": [
"<matplotlib.text.Text at 0x1121f6bd0>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEMCAYAAADAqxFbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHxRJREFUeJzt3X9QVXX+x/EXCpmbpqLmKpCosICi/BB/pom1ilnipP3A\nJjM0Yp1Jra3dXJsdof1Oq/2YSq1d3TUpfxCbtaOzGu3qSq6lYkq25S8UNUSlUAzNQryc7x+sV09w\n5XK5594LPB8zzHDuPfecN5/svO75fM7nHD/DMAwBAPA/rbxdAADAtxAMAAATggEAYEIwAABMCAYA\ngAnBAAAwsSwYpk+frm7duql///4O15k9e7bCw8MVExOjgoICq0oBADSAZcGQmpqq3Nxch+9v3LhR\nhw8fVmFhoZYtW6aZM2daVQoAoAEsC4aRI0eqU6dODt9fv369pk2bJkkaMmSIzp07p9LSUqvKAQA4\nyd9bOy4pKVFISIh9OTg4WCdOnFC3bt1M6/n5+Xm6NABoFly9sYVXB59/WrSjEDAMgx/D0Pz5871e\ng6/80Ba0BW1x/Z/G8FowBAUFqbi42L584sQJBQUFeascAMD/eC0YkpOT9c4770iSduzYoY4dO9bq\nRgIAeJ5lYwxTpkzRxx9/rLKyMoWEhCgzM1NVVVWSpPT0dI0fP14bN25UWFiYbrrpJq1YscKqUpqN\nxMREb5fgM2iLq2iLq2gL9/AzGtsZZTE/P79G95cBQEvTmGMnM58BACYEAwDAhGAAAJgQDAAAE4IB\nAGBCMAAATAgGAIAJwQAAMCEYAAAmBAMAwIRgAACYeO1BPQDQlAUGSuXl3q7CGtxEDwBc4Ocn+fKh\niZvoAQDchmAAAJgQDAAAE4IBABwIDKwZS6jrp1Mnb1dnHQafAcABXx9gvh4GnwEAbkMwAABMCAYA\ngAnBAAAwIRgAACYEAwDAhGAAAJgQDAAAE4IBAGBCMAAATAgGAIAJwQAAMCEYAAAmBAMAwIRgAACY\nEAwAABOCAQBgYmkw5ObmKjIyUuHh4Vq4cGGt98vKyjRu3DjFxsYqOjpaWVlZVpYDAHCCZY/2tNls\nioiI0KZNmxQUFKRBgwYpOztbUVFR9nUyMjJUWVmpP/7xjyorK1NERIRKS0vl7+9/tUAe7QnAS3i0\np5vl5+crLCxMoaGhCggIUEpKitatW2dap3v37qqoqJAkVVRUqHPnzqZQAAB4nmVH4ZKSEoWEhNiX\ng4ODtXPnTtM6aWlpuuOOO9SjRw+dP39ef/vb3+rcVkZGhv33xMREJSYmWlEyADRZeXl5ysvLc8u2\nLAsGPz+/etd54YUXFBsbq7y8PB05ckRjxozR3r171b59e9N61wYDAKC2n35pzszMdHlblnUlBQUF\nqbi42L5cXFys4OBg0zqffvqp7r//fklSnz591KtXLx08eNCqkgAATrAsGBISElRYWKhjx47p0qVL\nysnJUXJysmmdyMhIbdq0SZJUWlqqgwcPqnfv3laVBABwgmVdSf7+/lqyZImSkpJks9k0Y8YMRUVF\naenSpZKk9PR0zZs3T6mpqYqJiVF1dbVefPFFBQYGWlUSAMAJll2u6i5crgqgsQIDpfLyhn+uUyfp\n7Fn31+MJjTl2EgwAmr2mPB/BVT45jwEA0DQRDAAAE4IBQJMSGFjTNdSQn06dvF1108IYA4AmpSWO\nF7iCMQYAgNsQDAAAE4IBAGBCMADwGgaSfRODzwC8hoFk6zD4DABwG4IBAGBCMABoNFfGChgv8F2M\nMQBoNMYKfA9jDAAAtyEYAAAmBAMAwIRgAOA0R4PMDCI3Lww+A3Aag8xNB4PPAAC3IRgAACYEAwCT\n601WYyyhZXB6jOHHH3+Un5+f2rRpY3VNJowxAJ7FOELzYMkYQ3V1tT744APdf//9CgoKUq9evdSz\nZ08FBQXpvvvu09///ncO2ADQDDk8Y7j99ts1cuRIJS
cnKzY21n6mUFlZqYKCAq1fv17btm3T1q1b\nrS2QMwbAozhjaB4ac+x0GAyVlZX1dhs5s05jEQyAZxEMzYMlXUmODvgXLlzQwoULr7sOAKDpchgM\nJ0+e1KxZszR+/Hj99re/1YULF/Tqq68qMjJSJSUlnqwRAOBB/o7eeOSRRzRixAjdfffdys3NVXR0\ntIYOHarPPvtMP//5zz1ZIwDAgxyOMcTGxurzzz+3LwcHB+v48eNq3bq1x4qTGGMAPI0xhuahMcdO\nh2cM1dXVOnv2rCTJMAwFBgbqu+++s78fGBjo0g4BAL7N4RlDaGio/Pz86v6Qn5+KioosLezafXHG\nAHgOZwzNgyWXq/oKggHwLIKhebDkctXS0lLNmTNHd999t373u9+poqLC5QIBAE2Hw2B45JFH1K5d\nO82aNUvnz5/X7NmzPVkXAAtxozxcj8OupJiYGO3du9e+HBcXp4KCAo8VdgVdSYD70V3U/FnSlWQY\nhs6ePauzZ8/qzJkzstls9uUrVyvVJzc3V5GRkQoPD7fPlv6pvLw8xcXFKTo6WomJiS79EQAA97Hs\nqiSbzaaIiAht2rRJQUFBGjRokLKzsxUVFWVf59y5c7rtttv00UcfKTg4WGVlZerSpUutfXHGALgX\nZwzNnyXzGD7++GP17NnT5aLy8/MVFham0NBQSVJKSorWrVtnCoY1a9Zo8uTJCg4OlqRaoQAA8DyH\nwXDvvfdqz549Lm+4pKREISEh9uXg4GDt3LnTtE5hYaGqqqo0evRonT9/XnPmzNHUqVNrbSsjI8P+\ne2JiIl1OaHECA6XycvdtjwHm5icvL095eXlu2ZbDYGhs942jbqhrVVVVac+ePdq8ebMuXryoYcOG\naejQoQoPDzetd20wAC1ReTldP7i+n35pzszMdHlbDoOhpKREs2fPrjMg/Pz8tGjRoutuOCgoSMXF\nxfbl4uJie5fRFSEhIerSpYvatm2rtm3b6vbbb9fevXtrBQMAwHMcBkPbtm01cOBAGYZh+vb/02VH\nEhISVFhYqGPHjqlHjx7KyclRdna2aZ2JEyfqiSeekM1mU2VlpXbu3Klf//rXjfhzAACN5TAYAgMD\nNW3aNNc37O+vJUuWKCkpSTabTTNmzFBUVJSWLl0qSUpPT1dkZKTGjRunAQMGqFWrVkpLS1Pfvn1d\n3ifQlF1vHIExAXiSw8tVhw4dqh07dni6nlq4XBUtBZeQwp0smeC2evXqej985MgRl3YKAPBdDs8Y\nUlJSdOHCBSUnJyshIUHdu3eXYRg6deqUPvvsM61fv17t27fXu+++a22BnDGgheCMAe5k2W23Dx8+\nrHfffVeffPKJjh8/Lknq2bOnRowYoSlTpqh3796uVdyQAgkGtBAEA9yJ5zEAzQDBAHeyZIwBANAy\nEQwAABOCAQBgUm8wTJo0SRs2bFB1dbUn6gEAeFm9wTBz5kytXr1aYWFhmjt3rg4ePOiJugAAXuL0\nVUnnzp3Tu+++q//7v//TrbfeqrS0ND388MMKCAiwtkCuSkILwVVJcCfLL1c9c+aMVq5cqVWrVqlH\njx566KGHtG3bNn355Zduu/+3wwIJBrQQBAPcydJguPfee3XgwAFNnTpVqamp6t69u/29gQMHavfu\n3S7t2OkCCQa0EAQD3MnSYNi4caPGjx9veq2yslJt2rRxaYcNRTCgKXLliWudOklnz1pTD1oeS4Mh\nLi5OBQUFptfi4+Mb9djPhiAY0BTx7R/e1phjp8PnMZw6dUonT57UDz/8oD179tgf0FNRUaGLFy+6\nXCwAwLc5DIaPPvpIb7/9tkpKSvT000/bX2/fvr1eeOEFjxQHAPC8eruS3n//fU2ePNlT9dRCVxLc\nxZV+f1cxXgBvs2SMYeXKlZo6dapeeeWVOp/57KlnMxMMcBf6/dGSWDLGcGUc4fz586ZgAAA0b/V2\nJX3zzTe65ZZbPF
VPLZwxwF04Y0BLYunzGEaMGKGxY8dq+fLlKvdUBy0AwGvqDYZDhw7pD3/4g778\n8ksNHDhQ99xzj1auXOmJ2tDCBQbWfMt310+nTt7+i4CmoUGP9iwrK9NTTz2l1atXe+w23HQltVx0\n/QCus7Qr6bvvvlNWVpbuuusuDRs2TN27d9euXbtc2hkAwPfVe8bQq1cvTZw4UQ8++KCGDh3q8SuU\nOGNouThjAFxn6b2Srsxb8BaCoeUiGADXWTKPYc6cOXr99deVnJxc5w7Xr1/v0g4BAL7NYTA88sgj\nkmS6T9IVTHgDgObLYTAMHDhQkvT555/rySefNL332muvadSoUdZWBgDwinqvSnr77bdrvZaVlWVF\nLQAAH+DwjCE7O1tr1qzR0aNHNWHCBPvr58+fV+fOnT1SHADA8xwGw/Dhw9W9e3d9++23euaZZ+yj\n2+3bt1dMTIzHCgQAeFaDZj57A5ertlxcrgq4zpKZz7fddpskqV27dmrfvr3p5+abb3atUgCAz+OM\nAR7hytPTeAoa4DpL75V05MgR/fjjj5KkLVu2aNGiRTp37pxLO0PLVV5e0y3UkB9CAfCOeoNh0qRJ\n8vf31+HDh5Wenq7i4mI99NBDTm08NzdXkZGRCg8P18KFCx2ut2vXLvn7++uDDz5wvnIAgCXqDYZW\nrVrZD9qzZs3SSy+9pFOnTtW7YZvNpieeeEK5ubnat2+fsrOztX///jrXe/bZZzVu3Di6jADAB9Qb\nDDfccIPWrFmjd955R/fcc48kqaqqqt4N5+fnKywsTKGhoQoICFBKSorWrVtXa73FixfrvvvuU9eu\nXV0oHwDgbg7nMVzx1ltv6c9//rOee+459erVS0VFRXr44Yfr3XBJSYlCQkLsy8HBwdq5c2etddat\nW6d///vf2rVrl8N7MGVkZNh/T0xMVGJiYr37hzVcGUSWeHoaYLW8vDzl5eW5ZVv1BkO/fv20ePFi\n+3Lv3r01d+7cejfszI32nnzySS1YsMA+eu6oK+naYIB3XRlEBuBbfvqlOTMz0+Vt1RsM27ZtU2Zm\npo4dO6bLly9LqjnoFxUVXfdzQUFBKi4uti8XFxcrODjYtM7u3buVkpIiqeaxoR9++KECAgLqvNU3\nAMAz6p3HEBERoddee03x8fFq3bq1/fUuXbpcd8OXL19WRESENm/erB49emjw4MHKzs5WVFRUneun\npqZqwoQJmjRpkrlA5jH4FGYjA02DJQ/quaJjx4666667Gr5hf38tWbJESUlJstlsmjFjhqKiorR0\n6VJJUnp6esOrBQBYrt4zhrlz58pms2nSpElq06aN/fX4+HjLi5M4Y/A1nDEATYOlz3xOTEyscyB5\ny5YtLu2woQgG30IwAE2DpcHgbQSDbyEYgKbB0nslnT59WjNmzNC4ceMkSfv27dPy5ctd2hl8S2Bg\nzYG+IT/MRwCav3qD4dFHH9XYsWN18uRJSVJ4eLheffVVywuD9bixHYC61BsMZWVlevDBB+2XqgYE\nBMjfv96LmQAATVS9wdCuXTudOXPGvrxjxw516NDB0qIAAN5T71f/V155RRMmTFBRUZGGDx+ub7/9\nVmvXrvVEbQAAL3DqqqSqqiodPHhQUs1M6ICAAMsLu4KrkqzDFUZA82XJVUn5+fn25y4EBARo9+7d\nmjdvnp5++mmdZQQSAJoth8GQnp5un+m8detWzZ07V9OmTdPNN9+sxx9/3GMFAgA8y+EYQ3V1tQID\nAyVJOTk5Sk9P1+TJkzV58mTFxMR4rEAAgGc5PGOw2Wz2J7Vt2rRJo0ePtr935fbbAIDmx+EZw5Qp\nUzRq1Ch16dJFP/vZzzRy5EhJUmFhoTp27OixAgEAnnXdq5K2b9+u06dPa+zYsbrpppskSYcOHdKF\nCxe4u2ozwFVJQPPFTfTgEoIBaL4svYkeAKBlIRiauevdQZU7pQKoC11JzRzdRUDL
RFcSAMBtCAYA\ngAkPVvAxgYE1D9BxF8YRADQUYww+hjEBAO7AGAMAwG0IBgCACcEAADAhGLzE0cQzBosBeBuDz17C\nIDMAKzH4DABwG4IBAGBCMAAATAgGAIAJwQAAMCEYAAAmBAMAwIRgsBBPTwPQFFkaDLm5uYqMjFR4\neLgWLlxY6/3Vq1crJiZGAwYM0G233aYvvvjCynI8rry8ZhJbXT9nz3q7OgCom2Uzn202myIiIrRp\n0yYFBQVp0KBBys7OVlRUlH2d7du3q2/fvurQoYNyc3OVkZGhHTt2mAtswjOfmd0MwFt8cuZzfn6+\nwsLCFBoaqoCAAKWkpGjdunWmdYYNG6YOHTpIkoYMGaITJ05YVQ4AwEmWPcGtpKREISEh9uXg4GDt\n3LnT4frLly/X+PHj63wvIyPD/ntiYqISExPdVWajXe+Ja4wjAPCUvLw85eXluWVblgWDn5+f0+tu\n2bJFb731lj755JM63782GHzNlXEEAPCmn35pzszMdHlblgVDUFCQiouL7cvFxcUKDg6utd4XX3yh\ntLQ05ebmqhNfsQHA6ywbY0hISFBhYaGOHTumS5cuKScnR8nJyaZ1vv76a02aNEmrVq1SWFiYVaUA\nABrAsjMGf39/LVmyRElJSbLZbJoxY4aioqK0dOlSSVJ6erqef/55lZeXa+bMmZKkgIAA5efnW1US\nAMAJPKjHCfUNMDMnAYCvacyxk2BwqgYGmAE0LT45jwEA0DQRDAAAE8sGnz3hen3/7sRVtABakiY9\nxkDfPwDUjTEGAIDbEAwAABOCAQBg0iQGnx3dj49BYQBwvyYRDAwwA4Dn0JUEADAhGAAAJgQDAMCE\nYAAAmBAMAAATggEAYEIwAABMCAYAgAnBAAAwIRgAACYEAwDAhGAAAJgQDAAAE4IBAGBCMAAATAgG\nAIAJwQAAMCEYAAAmBAMAwIRgAACYEAwAABOCAQBgQjAAAEwIBgCACcEAADAhGJqQvLw8b5fgM2iL\nq2iLq2gL97A0GHJzcxUZGanw8HAtXLiwznVmz56t8PBwxcTEqKCgwMpymjz+0V9FW1xFW1xFW7iH\nZcFgs9n0xBNPKDc3V/v27VN2drb2799vWmfjxo06fPiwCgsLtWzZMs2cOdOqcgAATrIsGPLz8xUW\nFqbQ0FAFBAQoJSVF69atM62zfv16TZs2TZI0ZMgQnTt3TqWlpVaVBABwgr9VGy4pKVFISIh9OTg4\nWDt37qx3nRMnTqhbt26m9fz8/Kwqs8nJzMz0dgk+g7a4ira4irZoPMuCwdmDuWEY1/3cT98HAFjL\nsq6koKAgFRcX25eLi4sVHBx83XVOnDihoKAgq0oCADjBsmBISEhQYWGhjh07pkuXLiknJ0fJycmm\ndZKTk/XOO+9Iknbs2KGOHTvW6kYCAHiWZV1J/v7+WrJkiZKSkmSz2TRjxgxFRUVp6dKlkqT09HSN\nHz9eGzduVFhYmG666SatWLHCqnIAAM4yfMSHH35oREREGGFhYcaCBQvqXGfWrFlGWFiYMWDAAGPP\nnj0ertBz6muLVatWGQMGDDD69+9vDB8+3Ni7d68XqvQMZ/5dGIZh5OfnG61btzbef/99D1bnWc60\nxZYtW4zY2FijX79+xqhRozxboAfV1xbffvutkZSUZMTExBj9+vUzVqxY4fkiPSA1NdW45ZZbjOjo\naIfruHLc9IlguHz5stGnTx/j6NGjxqVLl4yYmBhj3759pnU2bNhg3HXXXYZhGMaOHTuMIUOGeKNU\nyznTFp9++qlx7tw5wzBq/gdpyW1xZb3Ro0cbd999t7F27VovVGo9Z9qivLzc6Nu3r1FcXGwYRs3B\nsTlypi3mz59vzJ071zCMmnYIDAw0qqqqvFGupbZu3Wrs2bPHYTC4etz0iVtiMOfhKmfaYtiwYerQ\noYOkmrY4ceKEN0q1nDNtIUmLFy/Wfffdp65d
u3qhSs9wpi3WrFmjyZMn2y/y6NKlizdKtZwzbdG9\ne3dVVFRIkioqKtS5c2f5+1vWc+41I0eOVKdOnRy+7+px0yeCoa75DCUlJfWu0xwPiM60xbWWL1+u\n8ePHe6I0j3P238W6devss+ab65wXZ9qisLBQZ8+e1ejRo5WQkKCVK1d6ukyPcKYt0tLS9NVXX6lH\njx6KiYnR66+/7ukyfYKrx02fiFB3zXloDhryN23ZskVvvfWWPvnkEwsr8h5n2uLJJ5/UggUL5Ofn\nJ6Oma9QDlXmeM21RVVWlPXv2aPPmzbp48aKGDRumoUOHKjw83AMVeo4zbfHCCy8oNjZWeXl5OnLk\niMaMGaO9e/eqffv2HqjQt7hy3PSJYGDOw1XOtIUkffHFF0pLS1Nubu51TyWbMmfaYvfu3UpJSZEk\nlZWV6cMPP1RAQECtS6ObOmfaIiQkRF26dFHbtm3Vtm1b3X777dq7d2+zCwZn2uLTTz/Vc889J0nq\n06ePevXqpYMHDyohIcGjtXqby8dNt4yANFJVVZXRu3dv4+jRo0ZlZWW9g8/bt29vtgOuzrTF8ePH\njT59+hjbt2/3UpWe4UxbXOvRRx9ttlclOdMW+/fvN+68807j8uXLxvfff29ER0cbX331lZcqto4z\nbfHUU08ZGRkZhmEYxunTp42goCDjzJkz3ijXckePHnVq8Lkhx02fOGNgzsNVzrTF888/r/Lycnu/\nekBAgPLz871ZtiWcaYuWwpm2iIyM1Lhx4zRgwAC1atVKaWlp6tu3r5crdz9n2mLevHlKTU1VTEyM\nqqur9eKLLyowMNDLlbvflClT9PHHH6usrEwhISHKzMxUVVWVpMYdN/0Mo5l2ygIAXOITVyUBAHwH\nwQAAMCEYAAAmBAMAwIRggNdNnz5d3bp1U//+/V36/D/+8Q/Fx8crNjZW/fr107Jly9xa3/z587V5\n82ZJ0n/+8x/169dP8fHxOnnypO6///7rfjYtLU0HDhyQVDPpqqEqKys1atQoGYahY8eOqW3btoqL\ni1NcXJzi4+NVVVWlrKwsde3aVXFxcerXr5/++te/SpLp9b59++rNN9+0b3fRokXNdmY03MA9V9IC\nrqvvRmDXc+nSJaNHjx5GSUmJffngwYPuLtEuPT3dWLVqlUufbdeuXYM/s3z5cuPFF180DMPx9epZ\nWVnGrFmzDMMwjG+++cbo2rWrUVpaanr9zJkzxi233GKUlpYahmEYFRUVxqBBg1z6O9D8ccYAr6vv\nRmDXc/78eV2+fNl+jXpAQIB+8YtfSJIeffRR/epXv9KgQYMUERGhDRs2SJJsNpt+85vfaPDgwYqJ\niTGdYSxcuFADBgxQbGys5s2bZ9/O+++/r+XLl+u9997T73//e02dOlXHjx9XdHS0fZvPPPOM+vfv\nr5iYGL3xxhuSpMTERO3evVtz587VDz/8oLi4OD388MOaP3++6f49zz33nBYtWlTr78vOztbEiRPr\nbQfjf1edd+3aVX369NHx48dNrwcGBqp3797219u3b6/OnTvrq6++crap0YL4xAQ3wFWBgYFKTk5W\nz549deedd+qee+7RlClT5OfnJz8/P3399dfatWuXDh8+rNGjR+vw4cN6++231bFjR+Xn56uyslIj\nRozQ2LFjtX//fq1fv175+fm68cYbde7cOUmyb2vGjBnatm2bJkyYoEmTJunYsWP2+84sW7ZMX3/9\ntfbu3atWrVqpvLzc9NkFCxbojTfeUEFBgSTp+PHjmjRpkubMmaPq6mrl5ORo165dpr/NZrPpyy+/\ntAedJB05ckRxcXGSpBEjRmjx4sWme+EUFRWpqKhI4eHhpoP+8ePHVVRUpD59+thfGzx4sLZu3ap+\n/fq58z8JmgGCAU3eX/7yF82ZM0ebNm3Syy+/rH/961/2GZ4PPPCAJCksLEy9e/fWgQMH9M9//lP/\n/e9/tXbt
Wkk1t2UuLCzU5s2bNX36dN14442SpI4dO9a5P6OOOaGbN2/WzJkz1apVzUl4fWdAPXv2\nVOfOnfX555/r9OnTio+Pr/WZsrKyWjd969Onjz1crpWTk6Nt27apTZs2WrZsmb32nJwcbd26VQcO\nHNDLL79smv3bo0cPFRUVXbdOtEwEA3yezWaz3/xs4sSJysjIqLVOdHS0oqOjNXXqVPXq1cvh1P8r\n3/CXLFmiMWPGmN776KOPGnV31oZ+9rHHHtOKFStUWlqq6dOnu7xNPz8/paSk1OqKuvb13bt364EH\nHlBqaqratWtn33ZzvEMxGo8xBvi81q1bq6CgQAUFBbVC4fvvv1deXp59uaCgQKGhoZJqDnzvvfee\nDMPQkSNHVFRUpMjISCUlJenNN9/U5cuXJUmHDh3SxYsXNWbMGK1YsUI//PCDJNm7g5wxZswYLV26\nVDabzeFnAwIC7PuUpHvvvVe5ubn67LPPlJSUVGv9Ll266MKFC/Xu23Bwu/FrXx84cKAmTJhgCo9T\np07Z2wq4FsEAr5syZYqGDx+uQ4cOKSQkpEE3SDQMQy+99JIiIyMVFxenzMxMZWVlSar5xnzrrbdq\n8ODBGj9+vJYuXaobbrhBjz32mPr27av4+Hj1799fM2fOlM1mU1JSkpKTk5WQkKC4uDi98sorde7z\n2m/ZV35/7LHHdOutt9oHrrOzs2t97vHHH9eAAQM0depUSTVBcccdd+iBBx6o85t769atFR0drYMH\nD9a572tfc+b1Z599Vn/605908eJFSTVPQhs5cmSdfyNaNm6ih2YrNTXVPlDsi6qrqzVw4ECtXbvW\nNCh8raysLJWWlurZZ591674rKip055131hrwBiTOGACv2Ldvn8LDw/XLX/7SYShI0kMPPaQNGza4\n/cl0WVlZmjNnjlu3ieaDMwYAgAlnDAAAE4IBAGBCMAAATAgGAIAJwQAAMCEYAAAm/w9JKzuJw3ND\nAQAAAABJRU5ErkJggg==\n",
"text": [
"<matplotlib.figure.Figure at 0x11212b190>"
]
}
],
"prompt_number": 94
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**5.** Eyeballing the ROC plot, roughly what would you say the AUC is? Is this better or worse than random (an AUC of 0.5)?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dAUC = 0\n",
"\n",
"fMuchBetterThanRandom = None"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 95
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**6.** Overall, this predictor does a moderately crummy job of getting things right, but it might not be a complete loss: how might you still take advantage of its predictions? (Hint: the ROC curve is nearly vertical in the lower-left corner, and the precision is nearly one at low recall; what does this tell you about the most confident predictions?)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"strAnswer06 = \"\"\"here\"\"\""
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 96
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 96
}
],
"metadata": {}
}
]
}