vitillo/Power calculation.ipynb

## Power calculation.ipynb
{"nbformat_minor": 0, "cells": [{"source": "Given our critical value C (i.e. the threshold we determined empirically), we can easily compute the probability of a value greater than the threshold assuming that the two sample means don't differ, with the Welch t-test.", "cell_type": "markdown", "metadata": {}}, {"source": "$$Pr(t > C | \\mu_a = \\mu_b) = \\alpha = 1 - F(C) $$ where", "cell_type": "markdown", "metadata": {}}, {"source": "$$t= \\frac{\\overline{X}_1 - \\overline{X}_2}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}} \\sim T_{df} $$", "cell_type": "markdown", "metadata": {}}, {"source": "i.e. *t* is distributed according a Student's t distribution with the following degrees of freedom", "cell_type": "markdown", "metadata": {}}, {"source": "$$ df=\\frac{(\\frac{s_1^2}{n_1} + \\frac{s_2^2}{n_2} )^2} {\\frac{(\\frac{s_1^2}{n_1})^2}{n_1 - 1} + \\frac{(\\frac{s_2^2}{n_2})^2}{n_2 - 1}} $$", "cell_type": "markdown", "metadata": {}}, {"source": "and $$F(x)$$ is the inverse cumulative distribution function of $$T_{df}$$", "cell_type": "markdown", "metadata": {}}, {"source": "As I am lazy though, let's approximate the the t-distribution with the Normal distribution (z-test). To determine the sample size needed to detect a positive difference between the two group means, we need another probability called power:", "cell_type": "markdown", "metadata": {}}, {"source": "$$Power = Pr(\\hat{t} > C|\\mu_a > \\mu_b)$$", "cell_type": "markdown", "metadata": {}}, {"source": "i.e. the probability of detecting a difference when there is actually one. Let's say we know both group means; if $$\\mu_a - \\mu_b = \\Delta > 0 $$ then ", "cell_type": "markdown", "metadata": {}}, {"source": "$$\\hat{t} = \\frac{(\\overline{X}_1 - \\overline{X}_2) - \\Delta}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}} \\sim Normal(0, 1) $$", "cell_type": "markdown", "metadata": {}}, {"source": "So", "cell_type": "markdown", "metadata": {}}, {"source": "$$Pr(\\frac{(\\overline{X}_1 - \\overline{X}_2)}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}} > C) = $$", "cell_type": "markdown", "metadata": {}}, {"source": "which, by subtracting a constant from both sides of the inequality, is equal to", "cell_type": "markdown", "metadata": {}}, {"source": "$$ = Pr(\\frac{(\\overline{X}_1 - \\overline{X}_2) - \\Delta}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}} > C - \\frac{\\Delta}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}}) = $$", "cell_type": "markdown", "metadata": {}}, {"source": "$$= Pr(\\hat{t} > C - \\frac{\\Delta}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}})= Power$$", "cell_type": "markdown", "metadata": {}}, {"source": "Note that:\n- a smaller value of C has a larger power\n- the stronger the difference between the group means, the higher the power\n- the smaller the variances of the groups, the higher the power\n- the higher the number of samples of the groups, the higher the power", "cell_type": "markdown", "metadata": {}}, {"source": "Now suppose we would like to have a power of 80% to detect a certain positive difference between the group means, what is the sample size needed? For simplicity, let's assume that the sample sizes are equal for both groups.", "cell_type": "markdown", "metadata": {}}, {"source": "$$Power = Pr(\\hat{t} > C - \\frac{\\Delta}{\\sqrt{\\frac{s_1+s_2}{n}}}) = 0.8 $$", "cell_type": "markdown", "metadata": {}}, {"source": "Then", "cell_type": "markdown", "metadata": {}}, {"source": "$$ C - \\frac{\\Delta}{\\sqrt{\\frac{s_1+s_2}{n}}} = F^{-1}(1-0.8)$$", "cell_type": "markdown", "metadata": {}}, {"source": "$$n = \\frac{(C - F^{-1}(0.2))^2 (s_1 + s_2)}{\\Delta^2} $$", "cell_type": "markdown", "metadata": {}}, {"source": "Let's see a concrete example, say we have to group A and B", "cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": "code", "source": "from scipy.stats import norm\nfrom numpy import var, mean, sqrt\nfrom __future__ import division", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 2, "cell_type": "code", "source": "A = [3.27, 3.14, 3.15, 3.17, 3.15, 3.24]", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"execution_count": 3, "cell_type": "code", "source": "B = [3.12, 3.13, 3.19, 3.21, 3.26, 3.30]", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "In our z-test we are using the threshold *treshold* to mark group B as a regression. Let't assume we would like a probability of *power* of detecting when the mean of group B is greater than the one of A by *delta*. What is the sample size we need in order to do that?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 4, "cell_type": "code", "source": "treshold = 1.96\npower = 0.9\ndelta = 0.1", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"execution_count": 5, "cell_type": "code", "source": "round(((treshold - norm.ppf(1 - power))**2*(var(A) + var(B)))/delta**2)", "outputs": [{"execution_count": 5, "output_type": "execute_result", "data": {"text/plain": "7.0"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "This means that each group should have 7 samples. In reality the sample size is going to be larger than that as I used the Normal distribution instead of the t-distribution. That said, the same mechanics applies. It's rather easy to get a feeling of the sample size needed with a simple montecarlo simulation.", "cell_type": "markdown", "metadata": {}}, {"execution_count": null, "cell_type": "code", "source": "", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.9", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}