Last active
August 29, 2015 14:21
-
-
Save vitillo/9238bd152037d0c741ed to your computer and use it in GitHub Desktop.
Power calculation for z-test
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{"nbformat_minor": 0, "cells": [{"source": "Given our critical value C (i.e. the threshold we determined empirically), we can easily compute the probability of a value greater than the threshold assuming that the two sample means don't differ, with the Welch t-test.", "cell_type": "markdown", "metadata": {}}, {"source": "$$Pr(t > C | \\mu_a = \\mu_b) = \\alpha = 1 - F(C) $$ where", "cell_type": "markdown", "metadata": {}}, {"source": "$$t= \\frac{\\overline{X}_1 - \\overline{X}_2}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}} \\sim T_{df} $$", "cell_type": "markdown", "metadata": {}}, {"source": "i.e. *t* is distributed according a Student's t distribution with the following degrees of freedom", "cell_type": "markdown", "metadata": {}}, {"source": "$$ df=\\frac{(\\frac{s_1^2}{n_1} + \\frac{s_2^2}{n_2} )^2} {\\frac{(\\frac{s_1^2}{n_1})^2}{n_1 - 1} + \\frac{(\\frac{s_2^2}{n_2})^2}{n_2 - 1}} $$", "cell_type": "markdown", "metadata": {}}, {"source": "and $$F(x)$$ is the inverse cumulative distribution function of $$T_{df}$$", "cell_type": "markdown", "metadata": {}}, {"source": "As I am lazy though, let's approximate the the t-distribution with the Normal distribution (z-test). To determine the sample size needed to detect a positive difference between the two group means, we need another probability called power:", "cell_type": "markdown", "metadata": {}}, {"source": "$$Power = Pr(\\hat{t} > C|\\mu_a > \\mu_b)$$", "cell_type": "markdown", "metadata": {}}, {"source": "i.e. the probability of detecting a difference when there is actually one. Let's say we know both group means; if $$\\mu_a - \\mu_b = \\Delta > 0 $$ then ", "cell_type": "markdown", "metadata": {}}, {"source": "$$\\hat{t} = \\frac{(\\overline{X}_1 - \\overline{X}_2) - \\Delta}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}} \\sim Normal(0, 1) $$", "cell_type": "markdown", "metadata": {}}, {"source": "So", "cell_type": "markdown", "metadata": {}}, {"source": "$$Pr(\\frac{(\\overline{X}_1 - \\overline{X}_2)}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}} > C) = $$", "cell_type": "markdown", "metadata": {}}, {"source": "which, by subtracting a constant from both sides of the inequality, is equal to", "cell_type": "markdown", "metadata": {}}, {"source": "$$ = Pr(\\frac{(\\overline{X}_1 - \\overline{X}_2) - \\Delta}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}} > C - \\frac{\\Delta}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}}) = $$", "cell_type": "markdown", "metadata": {}}, {"source": "$$= Pr(\\hat{t} > C - \\frac{\\Delta}{\\sqrt{\\frac{s_1}{n_1} + \\frac{s_2}{n_2}}})= Power$$", "cell_type": "markdown", "metadata": {}}, {"source": "Note that:\n- a smaller value of C has a larger power\n- the stronger the difference between the group means, the higher the power\n- the smaller the variances of the groups, the higher the power\n- the higher the number of samples of the groups, the higher the power", "cell_type": "markdown", "metadata": {}}, {"source": "Now suppose we would like to have a power of 80% to detect a certain positive difference between the group means, what is the sample size needed? For simplicity, let's assume that the sample sizes are equal for both groups.", "cell_type": "markdown", "metadata": {}}, {"source": "$$Power = Pr(\\hat{t} > C - \\frac{\\Delta}{\\sqrt{\\frac{s_1+s_2}{n}}}) = 0.8 $$", "cell_type": "markdown", "metadata": {}}, {"source": "Then", "cell_type": "markdown", "metadata": {}}, {"source": "$$ C - \\frac{\\Delta}{\\sqrt{\\frac{s_1+s_2}{n}}} = F^{-1}(1-0.8)$$", "cell_type": "markdown", "metadata": {}}, {"source": "$$n = \\frac{(C - F^{-1}(0.2))^2 (s_1 + s_2)}{\\Delta^2} $$", "cell_type": "markdown", "metadata": {}}, {"source": "Let's see a concrete example, say we have to group A and B", "cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": "code", "source": "from scipy.stats import norm\nfrom numpy import var, mean, sqrt\nfrom __future__ import division", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 2, "cell_type": "code", "source": "A = [3.27, 3.14, 3.15, 3.17, 3.15, 3.24]", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"execution_count": 3, "cell_type": "code", "source": "B = [3.12, 3.13, 3.19, 3.21, 3.26, 3.30]", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "In our z-test we are using the threshold *treshold* to mark group B as a regression. Let't assume we would like a probability of *power* of detecting when the mean of group B is greater than the one of A by *delta*. What is the sample size we need in order to do that?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 4, "cell_type": "code", "source": "treshold = 1.96\npower = 0.9\ndelta = 0.1", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"execution_count": 5, "cell_type": "code", "source": "round(((treshold - norm.ppf(1 - power))**2*(var(A) + var(B)))/delta**2)", "outputs": [{"execution_count": 5, "output_type": "execute_result", "data": {"text/plain": "7.0"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "This means that each group should have 7 samples. In reality the sample size is going to be larger than that as I used the Normal distribution instead of the t-distribution. That said, the same mechanics applies. It's rather easy to get a feeling of the sample size needed with a simple montecarlo simulation.", "cell_type": "markdown", "metadata": {}}, {"execution_count": null, "cell_type": "code", "source": "", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.9", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment