clarkgrubb/gist:7ac2563fffb27a0fb484

## gistfile1.txt
{
 "metadata": {
  "name": "",
  "signature": "sha256:8e904721f90a0b0c0c0dd648eb9f9f2473c22dc731e07b4f14c14c1cf2c3df9c"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Click Mathematics"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This IPython notebook accompanies [http://clarkgrubb.com/click-math](http://clarkgrubb.com/click-math), which shows how to perform some calculations involving the two fundamental quantities of web metrics: the impression and the click.\n",
      "\n",
      "In this notebook we show how to perform those same calculations in Python."
     ]
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Setup"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In addition to IPython, we are using the Python libraries NumPy and SciPy.  An easy way to get all three of these products is to install the [Anaconda Scientific Python Distribution](http://continuum.io/downloads)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import sys, os, re, math"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
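      "In Python 2.7, the / operator performs floor division when both operands are integers: 7 / 2 evaluates to 3, not 3.5. This matters below, where both `clicks` and `imps` are integer arrays.\n",
      "\n",
      "Importing `division` from `__future__` switches / to true division, which is the default in Python 3 (floor division remains available as //):"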
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from __future__ import division"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 10
    },
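    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A minimal illustration of the two division operators once the import above is in effect (or on Python 3). The numbers are arbitrary examples:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from __future__ import division  # has no effect on Python 3\n",
      "\n",
      "# True division always produces a float.\n",
      "print(7 / 2)    # 3.5\n",
      "# Floor division rounds the quotient down, also for negative operands.\n",
      "print(7 // 2)   # 3\n",
      "print(-7 // 2)  # -4"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },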
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The following module aliases are commonly used in the Scientific Python community:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import scipy as sp\n",
      "import scipy.stats as stats\n",
      "import matplotlib as mpl\n",
      "import matplotlib.pyplot as plt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "NumPy"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "NumPy provides an array type that we will use instead of the built-in Python list.\n",
      "\n",
      "One of the advantages of the NumPy array is that the basic arithmetic operations are vectorized.  This will help us to avoid writing loops.\n",
      "\n",
      "Note the difference in meaning of the + operator:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.array([1, 2, 3]) + np.array([3, 4, 5])\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 2,
       "text": [
        "array([4, 6, 8])"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "[1, 2, 3] + [3, 4, 5]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 3,
       "text": [
        "[1, 2, 3, 3, 4, 5]"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "When an arithmetic operator has a NumPy array and a scalar as operands, the scalar is \"broadcast\" over the entire array, i.e. combined with every element:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.array([1, 2, 3]) * 2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 9,
       "text": [
        "array([2, 4, 6])"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The native Python list does not broadcast. Multiplying a list by an integer repeats the list instead:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "[1, 2, 3] * 2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 8,
       "text": [
        "[1, 2, 3, 1, 2, 3]"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here is an example of Python's list comprehension syntax, which we will use below:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "squares = [n * n for n in range(0, 11)]\n",
      "squares"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 18,
       "text": [
        "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]"
       ]
      }
     ],
     "prompt_number": 18
    },
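    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The same squares can also be computed without a Python-level loop by combining `np.arange` with the vectorized `**` operator. The following short sketch compares the two approaches:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# List comprehension: a Python loop that builds a list.\n",
      "squares_list = [n * n for n in range(0, 11)]\n",
      "\n",
      "# Vectorized: square every element of the array in one operation.\n",
      "squares_array = np.arange(0, 11) ** 2\n",
      "\n",
      "# The two agree element by element.\n",
      "print(np.array_equal(squares_list, squares_array))  # True"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },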
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "CTR"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here is our raw click data.\n",
      "\n",
      "The first element of the clicks array goes with the first element of the impressions array, and so on. For example, one link was displayed 313 times and received 23 clicks."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "imps = np.array([313, 285, 298, 34, 3398, 333, 301])\n",
      "clicks = np.array([23, 20, 8, 2, 128, 15, 11])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "non_clicks = imps - clicks\n",
      "non_clicks"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "array([ 290,  265,  290,   32, 3270,  318,  290])"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ctr = clicks/imps\n",
      "ctr"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "array([ 0.07348243,  0.07017544,  0.02684564,  0.05882353,  0.03766922,\n",
        "        0.04504505,  0.03654485])"
       ]
      }
     ],
     "prompt_number": 11
    },
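    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To summarize all links at once, one option is the pooled CTR: total clicks divided by total impressions. This weights each link by its impressions, so it generally differs from the plain average of the per-link CTRs. Here the largest link (3398 impressions) has a below-average CTR, which pulls the pooled figure down. The sketch below uses the same data:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "imps = np.array([313, 285, 298, 34, 3398, 333, 301])\n",
      "clicks = np.array([23, 20, 8, 2, 128, 15, 11])\n",
      "\n",
      "# Pooled CTR: every impression carries the same weight.\n",
      "pooled_ctr = clicks.sum() / float(imps.sum())\n",
      "\n",
      "# Unweighted mean of the per-link CTRs: every link carries the same weight.\n",
      "mean_of_ctrs = (clicks / imps.astype(float)).mean()\n",
      "\n",
      "print(pooled_ctr)    # about 0.0417\n",
      "print(mean_of_ctrs)  # about 0.0498"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },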
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "P-Values"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "expected_ctr = 0.04\n",
      "expected_clicks = expected_ctr * imps\n",
      "expected_non_clicks = imps - expected_clicks\n",
      "expected_clicks\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 18,
       "text": [
        "array([  12.52,   11.4 ,   11.92,    1.36,  135.92,   13.32,   12.04])"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "p_values = np.array([stats.chisquare([clicks[i], non_clicks[i]], [expected_clicks[i], expected_non_clicks[i]])[1]\n",
      "                     for i\n",
      "                     in range(len(imps))])\n",
      "p_values"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 25,
       "text": [
        "array([ 0.00250367,  0.00933262,  0.24653354,  0.57540302,  0.48809458,\n",
        "        0.63849131,  0.7596781 ])"
       ]
      }
     ],
     "prompt_number": 25
    },
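    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The p-values above can also be computed without a loop. Each link contributes a Pearson statistic: the sum of (observed - expected)**2 / expected over its click and non-click cells. With two cells there is 1 degree of freedom, and the p-value is the upper tail of the chi-square distribution, `stats.chi2.sf`. Alternatively, `stats.chisquare` accepts 2-D input and tests each column separately (its default is axis=0).\n",
      "\n",
      "This is a sketch under the same assumption as the cell above: each link is tested on its own against a fixed expected CTR of 0.04. Both routes should agree with each other and with the p-values printed above:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import scipy.stats as stats\n",
      "\n",
      "imps = np.array([313, 285, 298, 34, 3398, 333, 301], dtype=float)\n",
      "clicks = np.array([23, 20, 8, 2, 128, 15, 11], dtype=float)\n",
      "non_clicks = imps - clicks\n",
      "\n",
      "expected_ctr = 0.04\n",
      "expected_clicks = expected_ctr * imps\n",
      "expected_non_clicks = imps - expected_clicks\n",
      "\n",
      "# Pearson statistic summed over the two cells, one value per link.\n",
      "chi2_stat = ((clicks - expected_clicks) ** 2 / expected_clicks\n",
      "             + (non_clicks - expected_non_clicks) ** 2 / expected_non_clicks)\n",
      "\n",
      "# Two categories give 1 degree of freedom; sf is the upper-tail probability.\n",
      "p_by_hand = stats.chi2.sf(chi2_stat, 1)\n",
      "\n",
      "# Row 0 holds clicks, row 1 non-clicks; each column is one link.\n",
      "observed = np.array([clicks, non_clicks])\n",
      "expected = np.array([expected_clicks, expected_non_clicks])\n",
      "p_vectorized = stats.chisquare(observed, expected)[1]\n",
      "\n",
      "print(p_by_hand)\n",
      "print(np.allclose(p_by_hand, p_vectorized))  # True"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },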
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Python functions have doc strings.  This is how you get the documentation for a function:\n",
      "\n",
      "     print(stats.chisquare.__doc__)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "bonferroni_p_values = [min(1.0, p_values[i] * len(p_values))\n",
      "                       for i\n",
      "                       in range(len(p_values))]\n",
      "bonferroni_p_values"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 28,
       "text": [
        "[0.017525659707906406, 0.065328314692726625, 1.0, 1.0, 1.0, 1.0, 1.0]"
       ]
      }
     ],
     "prompt_number": 28
    },
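    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The Bonferroni adjustment can also be written without a loop. `np.minimum` works elementwise and broadcasts the scalar cap of 1.0 over the array. The sketch below copies in the p-values printed above by hand, so they are rounded:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# p-values from the chi-square cell above, rounded as printed.\n",
      "p_values = np.array([0.00250367, 0.00933262, 0.24653354, 0.57540302,\n",
      "                     0.48809458, 0.63849131, 0.7596781])\n",
      "\n",
      "# Multiply each p-value by the number of tests and cap the result at 1.0.\n",
      "bonferroni_p_values = np.minimum(1.0, p_values * len(p_values))\n",
      "print(bonferroni_p_values)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },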
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "TODO: Holm-Bonferroni.\n",
      "\n",
      "Would it be better to use Pandas for the Holm-Bonferroni example?  We need to sort."
     ]
    },
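    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "On the sorting question: NumPy's `np.argsort` may be enough without Pandas. What follows is a minimal sketch of Holm-Bonferroni adjusted p-values in the usual step-down form. It sorts the m p-values in ascending order and multiplies the k-th smallest by m - k + 1. A running maximum then keeps the adjusted values monotone, and each result is capped at 1. The function name `holm_bonferroni` is a hypothetical helper, not a library API.\n",
      "\n",
      "For the p-values above, only the second link differs from plain Bonferroni. Its adjusted value drops from about 0.065 (7 x 0.00933) to about 0.056 (6 x 0.00933)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "def holm_bonferroni(p_values):\n",
      "    # Hypothetical helper: Holm step-down adjusted p-values, original order.\n",
      "    p = np.asarray(p_values, dtype=float)\n",
      "    m = len(p)\n",
      "    order = np.argsort(p)               # indices of p, smallest first\n",
      "    multipliers = m - np.arange(m)      # m, m - 1, ..., 1\n",
      "    scaled = multipliers * p[order]\n",
      "    # Running maximum keeps adjusted values non-decreasing; cap at 1.\n",
      "    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(scaled))\n",
      "    adjusted = np.empty(m)\n",
      "    adjusted[order] = adjusted_sorted   # undo the sort\n",
      "    return adjusted\n",
      "\n",
      "p_values = np.array([0.00250367, 0.00933262, 0.24653354, 0.57540302,\n",
      "                     0.48809458, 0.63849131, 0.7596781])\n",
      "print(holm_bonferroni(p_values))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },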
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Confidence Intervals"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "alpha = 0.05\n",
      "z = stats.norm.ppf(1 - alpha / 2)\n",
      "z"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 30,
       "text": [
        "1.959963984540054"
       ]
      }
     ],
     "prompt_number": 30
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
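      "def lower_normal_conf(i, c, z):\n",
      "    # Lower bound of the normal-approximation (Wald) interval for c / i.\n",
      "    # c * (i - c) / i ** 3 equals ctr * (1 - ctr) / i.\n",
      "    h = z * np.sqrt(c * (i - c) / (i ** 3))\n",
      "    ctr = c / i\n",
      "    return ctr - h\n",
      "\n",
      "lower_normal_conf(imps, clicks, z)"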
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 31,
       "text": [
        "array([ 0.044576  ,  0.04051902,  0.0084943 , -0.02026613,  0.03126757,\n",
        "        0.02276885,  0.01534691])"
       ]
      }
     ],
     "prompt_number": 31
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def upper_normal_conf(i, c, z):\n",
      "    # Upper bound of the normal-approximation (Wald) interval for c / i.\n",
      "    h = z * np.sqrt(c * (i - c) / (i ** 3))\n",
      "    ctr = c / i\n",
      "    return ctr + h\n",
      "\n",
      "upper_normal_conf(imps, clicks, z)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 32,
       "text": [
        "array([ 0.10238886,  0.09983186,  0.04519697,  0.13791319,  0.04407087,\n",
        "        0.06732124,  0.05774279])"
       ]
      }
     ],
     "prompt_number": 32
    },
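    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The two functions above implement the normal-approximation (Wald) interval. Writing ctr = c / i, the expression `c * (i - c) / i ** 3` equals `ctr * (1 - ctr) / i`, so the bounds are ctr plus or minus z * sqrt(ctr * (1 - ctr) / i). The fourth link (2 clicks in 34 impressions) shows that the lower bound can be negative.\n",
      "\n",
      "The sketch below uses a hypothetical helper, `normal_conf`, that returns both bounds at once and clips them to [0, 1]. Clipping is only one possible convention:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import scipy.stats as stats\n",
      "\n",
      "def normal_conf(i, c, z):\n",
      "    # Hypothetical helper: Wald interval ctr +/- z * sqrt(ctr * (1 - ctr) / i).\n",
      "    i = np.asarray(i, dtype=float)\n",
      "    ctr = np.asarray(c, dtype=float) / i\n",
      "    h = z * np.sqrt(ctr * (1 - ctr) / i)\n",
      "    # Clipping is cosmetic; it does not fix the approximation's\n",
      "    # poor coverage when click counts are small.\n",
      "    return np.clip(ctr - h, 0.0, 1.0), np.clip(ctr + h, 0.0, 1.0)\n",
      "\n",
      "imps = np.array([313, 285, 298, 34, 3398, 333, 301])\n",
      "clicks = np.array([23, 20, 8, 2, 128, 15, 11])\n",
      "z = stats.norm.ppf(1 - 0.05 / 2)\n",
      "\n",
      "lower, upper = normal_conf(imps, clicks, z)\n",
      "print(lower)\n",
      "print(upper)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },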
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def lower_wilson_score_conf(i, c, z):\n",
      "    # Lower bound of the Wilson score interval for c / i.\n",
      "    h = z * np.sqrt(c * (i - c) / (i ** 3) + z ** 2 / (4 * i ** 2))\n",
      "    return (i / (i + z ** 2)) * (c / i + z ** 2 / (2 * i) - h)\n",
      "\n",
      "lower_wilson_score_conf(imps, clicks, z)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 34,
       "text": [
        "array([ 0.04946129,  0.04588384,  0.01366458,  0.01628266,  0.031772  ,\n",
        "        0.02748511,  0.02052648])"
       ]
      }
     ],
     "prompt_number": 34
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def upper_wilson_score_conf(i, c, z):\n",
      "    # Upper bound of the Wilson score interval for c / i.\n",
      "    h = z * np.sqrt(c * (i - c) / (i ** 3) + z ** 2 / (4 * i ** 2))\n",
      "    return (i / (i + z ** 2)) * (c / i + z ** 2 / (2 * i) + h)\n",
      "\n",
      "upper_wilson_score_conf(imps, clicks, z)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 35,
       "text": [
        "array([ 0.10784596,  0.10589998,  0.05207013,  0.19093607,  0.04461059,\n",
        "        0.07298191,  0.06424368])"
       ]
      }
     ],
     "prompt_number": 35
    },
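    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The Wilson bounds are the two proportions p that satisfy (ctr - p)**2 = z**2 * p * (1 - p) / i. In other words, they use the variance at the candidate p rather than at the observed ctr, which is why they stay inside [0, 1] for the small fourth link. The sketch below recomputes both bounds with a hypothetical helper, `wilson_score_conf`, and checks this property numerically:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import scipy.stats as stats\n",
      "\n",
      "def wilson_score_conf(i, c, z):\n",
      "    # Hypothetical helper: (lower, upper) Wilson score bounds for c / i.\n",
      "    i = np.asarray(i, dtype=float)\n",
      "    c = np.asarray(c, dtype=float)\n",
      "    h = z * np.sqrt(c * (i - c) / i ** 3 + z ** 2 / (4 * i ** 2))\n",
      "    center = c / i + z ** 2 / (2 * i)\n",
      "    scale = i / (i + z ** 2)\n",
      "    return scale * (center - h), scale * (center + h)\n",
      "\n",
      "imps = np.array([313, 285, 298, 34, 3398, 333, 301])\n",
      "clicks = np.array([23, 20, 8, 2, 128, 15, 11])\n",
      "z = stats.norm.ppf(1 - 0.05 / 2)\n",
      "ctr = clicks / imps.astype(float)\n",
      "\n",
      "for p in wilson_score_conf(imps, clicks, z):\n",
      "    # Each bound should satisfy (ctr - p)**2 == z**2 * p * (1 - p) / i.\n",
      "    print(np.allclose((ctr - p) ** 2, z ** 2 * p * (1 - p) / imps))  # True"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },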
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Beta Distribution"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "expected_mean = .03\n",
      "expected_stddev = .03"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def beta_a(m, sd):\n",
      "    return (m ** 2 - m ** 3 - m * sd ** 2)/sd ** 2\n",
      "\n",
      "prior_a = beta_a(expected_mean, expected_stddev)\n",
      "prior_a"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 5,
       "text": [
        "0.94"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def beta_b(m, sd):\n",
      "    return ((-1 + m) * (-m + m ** 2 + sd ** 2))/sd ** 2\n",
      "\n",
      "prior_b = beta_b(expected_mean, expected_stddev)\n",
      "prior_b"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "30.39333333333333"
       ]
      }
     ],
     "prompt_number": 7
    },
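    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "`beta_a` and `beta_b` match moments. A Beta(a, b) distribution has mean a / (a + b) and variance a * b / ((a + b)**2 * (a + b + 1)). Solving for a and b gives a = m * k and b = (1 - m) * k, where k = m * (1 - m) / sd**2 - 1. A valid solution requires sd**2 < m * (1 - m).\n",
      "\n",
      "The sketch below uses a hypothetical helper, `beta_params`, written in this factored form. It then checks the round trip with `stats.beta`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import scipy.stats as stats\n",
      "\n",
      "def beta_params(m, sd):\n",
      "    # Hypothetical helper: method-of-moments Beta parameters; a + b = k.\n",
      "    k = m * (1 - m) / sd ** 2 - 1\n",
      "    if k <= 0:\n",
      "        raise ValueError('need sd**2 < m * (1 - m)')\n",
      "    return m * k, (1 - m) * k\n",
      "\n",
      "a, b = beta_params(0.03, 0.03)\n",
      "print(a)  # approximately 0.94\n",
      "print(b)  # approximately 30.3933\n",
      "# The fitted distribution should reproduce the requested mean and stddev.\n",
      "print(stats.beta.mean(a, b))\n",
      "print(stats.beta.std(a, b))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },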
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "imps2 = np.array([1, 100, 100, 100, 100, 10, 10])\n",
      "clicks2 = np.array([1, 10, 4, 5, 3, 0, 10])\n",
      "non_clicks2 = imps2 - clicks2\n",
      "non_clicks2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 6,
       "text": [
        "array([ 0, 90, 96, 95, 97, 10,  0])"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "posterior_a = prior_a + clicks2\n",
      "posterior_a"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "array([  1.94,  10.94,   4.94,   5.94,   3.94,   0.94,  10.94])"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "posterior_b = prior_b + non_clicks2\n",
      "posterior_b"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "array([  30.39333333,  120.39333333,  126.39333333,  125.39333333,\n",
        "        127.39333333,   40.39333333,   30.39333333])"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "posterior_mean = stats.beta.mean(posterior_a, posterior_b)\n",
      "posterior_mean"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "array([ 0.06      ,  0.08329949,  0.03761421,  0.04522843,  0.03      ,\n",
        "        0.02274194,  0.26467742])"
       ]
      }
     ],
     "prompt_number": 15
    },
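    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The beta prior is conjugate to binomially distributed clicks, so the posterior is Beta(prior_a + clicks, prior_b + non-clicks). Its mean therefore has the closed form (prior_a + clicks) / (prior_a + prior_b + impressions). This shrinks each observed CTR toward the prior mean of 0.03, most strongly for links with few impressions. The sketch below checks the closed form against `stats.beta.mean`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import scipy.stats as stats\n",
      "\n",
      "prior_a, prior_b = 0.94, 30.39333333333333  # prior computed above\n",
      "imps2 = np.array([1, 100, 100, 100, 100, 10, 10])\n",
      "clicks2 = np.array([1, 10, 4, 5, 3, 0, 10])\n",
      "\n",
      "posterior_a = prior_a + clicks2\n",
      "posterior_b = prior_b + (imps2 - clicks2)\n",
      "\n",
      "# The mean of Beta(a, b) is a / (a + b).\n",
      "closed_form = (prior_a + clicks2) / (prior_a + prior_b + imps2)\n",
      "print(np.allclose(closed_form, stats.beta.mean(posterior_a, posterior_b)))  # True\n",
      "\n",
      "# Raw CTRs next to posterior means, to show the shrinkage toward 0.03.\n",
      "print(clicks2 / imps2.astype(float))\n",
      "print(closed_form)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },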
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "posterior_stddev = stats.beta.std(posterior_a, posterior_b)\n",
      "posterior_stddev"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "array([ 0.04113393,  0.02402151,  0.01653926,  0.01806429,  0.014829  ,\n",
        "        0.02291274,  0.06780413])"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "random_draws = stats.beta.rvs(posterior_a, posterior_b)\n",
      "random_draws"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 19,
       "text": [
        "array([ 0.02125831,  0.07894332,  0.04574358,  0.040027  ,  0.00985993,\n",
        "        0.01236459,  0.19463317])"
       ]
      }
     ],
     "prompt_number": 19
    },
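    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The posterior also yields an interval estimate comparable to the confidence intervals above. The central 95% credible interval runs from the 2.5% quantile to the 97.5% quantile of each posterior beta, and `stats.beta.ppf` computes these quantiles elementwise. Random draws such as the ones above give a Monte Carlo approximation of the same quantiles.\n",
      "\n",
      "The sketch below compares the two. The sample size of 10000 is an arbitrary choice, and the Monte Carlo values will vary from run to run:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import scipy.stats as stats\n",
      "\n",
      "prior_a, prior_b = 0.94, 30.39333333333333  # prior computed above\n",
      "imps2 = np.array([1, 100, 100, 100, 100, 10, 10])\n",
      "clicks2 = np.array([1, 10, 4, 5, 3, 0, 10])\n",
      "posterior_a = prior_a + clicks2\n",
      "posterior_b = prior_b + (imps2 - clicks2)\n",
      "\n",
      "alpha = 0.05\n",
      "# Exact quantiles from the inverse CDF, one per link.\n",
      "lower = stats.beta.ppf(alpha / 2, posterior_a, posterior_b)\n",
      "upper = stats.beta.ppf(1 - alpha / 2, posterior_a, posterior_b)\n",
      "\n",
      "# Monte Carlo: 10000 draws per link, one column per link.\n",
      "draws = stats.beta.rvs(posterior_a, posterior_b, size=(10000, len(imps2)))\n",
      "mc_lower = np.percentile(draws, 100 * alpha / 2, axis=0)\n",
      "mc_upper = np.percentile(draws, 100 * (1 - alpha / 2), axis=0)\n",
      "\n",
      "print(lower)\n",
      "print(mc_lower)\n",
      "print(upper)\n",
      "print(mc_upper)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },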
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}