Skip to content

Instantly share code, notes, and snippets.

@araastat
Created November 21, 2013 05:24
Show Gist options
  • Save araastat/7576484 to your computer and use it in GitHub Desktop.
Save araastat/7576484 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "Sample Size"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<script type=\"text/javascript\">\n",
" on = \"View code\";\n",
" off = \"Hide code\"\n",
" function onoff(){\n",
" currentvalue = document.getElementById('onoff').value;\n",
" if(currentvalue == off){\n",
" document.getElementById(\"onoff\").value=on;\n",
" $('div.input').hide();\n",
" $('div.output_prompt').hide();\n",
" }else{\n",
" document.getElementById(\"onoff\").value=off;\n",
" $('div.input').show();\n",
" $('div.output_prompt').show();\n",
" }\n",
"}\n",
" window.onload = onoff;\n",
"</script>\n",
"<input type=\"button\" class=\"ui-button ui-widget ui-state-default ui-corner-all ui-button-text-only\" value=\"Hide code\" id=\"onoff\" onclick=\"onoff();\">\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sample size computations\n",
"\n",
"## Two-sample t-test with unequal variances\n",
"\n",
"If we want to detect a difference $\\delta$ between two independent populations with standard deviations $\\sigma_1$ and $\\sigma_2$, with Type I error $\\alpha$ and power $1-\\beta$, then the common sample size can be computed as \n",
"\n",
"$$\n",
"n = (z_{1-\\alpha/2}+z_{1-\\beta})^2 \\frac{\\sigma_1^2+\\sigma_2^2}{\\delta^2}\n",
"$$\n",
"\n",
"This can be computed using the following function:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from scipy.stats import norm\n",
"from math import ceil\n",
"\n",
"def SS2S(delta, s1, s2, alpha=0.05, power=0.9):\n",
" za = norm.ppf(1-alpha/2.)\n",
" zb = norm.ppf(power)\n",
" n = (za+zb)**2 * (s1*s1+s2*s2)/(delta*delta)\n",
" n = ceil(n)\n",
" return n\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sample size computations based on Arnedt et al, 2013\n",
"\n",
"## Insomnia Severity Index\n",
"\n",
"The insomnia severity index showed a post-treatment mean of 5.2 (sd=3.7) in the CBT group against a post-treatment mean of 7.8 (sd=4.9) in the IPC group. We will compute sample sizes to detect differences of 2, 2.5 and 3, using the standard deviations observed, at both 80% and 90% power"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from SScalc import *\n",
"Sample_Sizes = SS2Stable(D=[2,2.5,3], s1=3.7, s2=4.9, alpha=0.05, power = [0.8, 0.9])\n",
"Sample_Sizes"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Sample size per arm</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Difference</th>\n",
" <th>Power</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2</th>\n",
" <th>80.0%</th>\n",
" <td> 74</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 100</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2.5</th>\n",
" <th>80.0%</th>\n",
" <td> 48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 64</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">3</th>\n",
" <th>80.0%</th>\n",
" <td> 33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 45</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 2,
"text": [
" Sample size per arm\n",
"Difference Power \n",
"2 80.0% 74\n",
" 90.0% 100\n",
"2.5 80.0% 48\n",
" 90.0% 64\n",
"3 80.0% 33\n",
" 90.0% 45"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Total Sleep Time\n",
"\n",
"The total sleep time, measured in minutes, shows a post-treatment average of 406.8 (sd=67) in the CBT group and 391.7 (sd=57.6) in the IPC group, i.e. an observed difference of 15.1 minutes. We will compute sample sizes for differences of 10, 15 and 20 minutes."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Sample_Sizes = SS2Stable(D = [10,15,20], s1 = 67, s2=57.6, alpha=0.05, power=[0.8,0.9])\n",
"Sample_Sizes"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Sample size per arm</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Difference</th>\n",
" <th>Power</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">10</th>\n",
" <th>80.0%</th>\n",
" <td> 613</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 821</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">15</th>\n",
" <th>80.0%</th>\n",
" <td> 273</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 365</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">20</th>\n",
" <th>80.0%</th>\n",
" <td> 154</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 206</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 3,
"text": [
" Sample size per arm\n",
"Difference Power \n",
"10 80.0% 613\n",
" 90.0% 821\n",
"15 80.0% 273\n",
" 90.0% 365\n",
"20 80.0% 154\n",
" 90.0% 206"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now look at the kinds of differences we could statistically detect given the sample sizes we have planned for based on the Insomnia Severity Index"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Diffs = Diff2Stable(N = [50,75,100], s1=67, s2=57.6, alpha=0.05, power=[0.8,0.9])\n",
"Diffs"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Detectable difference</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Sample size</th>\n",
" <th>Power</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">50 </th>\n",
" <th>80.0%</th>\n",
" <td> 35.01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 40.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">75 </th>\n",
" <th>80.0%</th>\n",
" <td> 28.58</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 33.07</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">100</th>\n",
" <th>80.0%</th>\n",
" <td> 24.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 28.64</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 4,
"text": [
" Detectable difference\n",
"Sample size Power \n",
"50 80.0% 35.01\n",
" 90.0% 40.50\n",
"75 80.0% 28.58\n",
" 90.0% 33.07\n",
"100 80.0% 24.75\n",
" 90.0% 28.64"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also look at the statistical power we would have to detect differences like we have observed"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Power = Pow2Stable(D = [10,15,20], N = [50,75,100], s1=67, s2=57.6)\n",
"Power"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Power</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Sample size</th>\n",
" <th>Difference</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">50 </th>\n",
" <th>10</th>\n",
" <td> 12.3%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td> 22.4%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td> 36.0%</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">75 </th>\n",
" <th>10</th>\n",
" <td> 16.4%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td> 31.2%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td> 50.0%</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">100</th>\n",
" <th>10</th>\n",
" <td> 20.4%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td> 39.7%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td> 61.9%</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 5,
"text": [
" Power\n",
"Sample size Difference \n",
"50 10 12.3%\n",
" 15 22.4%\n",
" 20 36.0%\n",
"75 10 16.4%\n",
" 15 31.2%\n",
" 20 50.0%\n",
"100 10 20.4%\n",
" 15 39.7%\n",
" 20 61.9%"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sleep efficiency \n",
"\n",
"Sleep efficiency is a percentage, so we would test differences using a two-sample test of proportions. The observed post-treatment efficiency in the IPC group was 83.6 percent. We will start with baseline control group proportions of 80% and 85%, and treatment group proportions of 90% and 95%"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Sample_Sizes = SS2ptable(P1=[.8, .85], P2 = [.9, .95], alpha=0.05, power=[.8, .9])\n",
"Sample_Sizes"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th>Sample size per arm</th>\n",
" </tr>\n",
" <tr>\n",
" <th>P1</th>\n",
" <th>P2</th>\n",
" <th>Power</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">0.80</th>\n",
" <th rowspan=\"2\" valign=\"top\">0.90</th>\n",
" <th>80.0%</th>\n",
" <td> 199</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 266</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">0.95</th>\n",
" <th>80.0%</th>\n",
" <td> 76</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 101</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">0.85</th>\n",
" <th rowspan=\"2\" valign=\"top\">0.90</th>\n",
" <th>80.0%</th>\n",
" <td> 686</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 918</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">0.95</th>\n",
" <th>80.0%</th>\n",
" <td> 141</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 188</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 6,
"text": [
" Sample size per arm\n",
"P1 P2 Power \n",
"0.80 0.90 80.0% 199\n",
" 90.0% 266\n",
" 0.95 80.0% 76\n",
" 90.0% 101\n",
"0.85 0.90 80.0% 686\n",
" 90.0% 918\n",
" 0.95 80.0% 141\n",
" 90.0% 188"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As before, we will see the values of P2 (efficiency in the treatment group) we can detect given a range of sample sizes based on the ISI analysis"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Diffs = Diff2ptable(N=[75,100], P1=[.8,.85], alpha=.05, power=[.8, .9])\n",
"Diffs"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th>Detectable P2</th>\n",
" </tr>\n",
" <tr>\n",
" <th>P1</th>\n",
" <th>Sample size</th>\n",
" <th>Power</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">0.80</th>\n",
" <th rowspan=\"2\" valign=\"top\">75 </th>\n",
" <th>80.0%</th>\n",
" <td> 0.950</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 0.967</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">100</th>\n",
" <th>80.0%</th>\n",
" <td> 0.934</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 0.950</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">0.85</th>\n",
" <th rowspan=\"2\" valign=\"top\">75 </th>\n",
" <th>80.0%</th>\n",
" <td> 0.977</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 0.991</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">100</th>\n",
" <th>80.0%</th>\n",
" <td> 0.964</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90.0%</th>\n",
" <td> 0.977</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 7,
"text": [
" Detectable P2\n",
"P1 Sample size Power \n",
"0.80 75 80.0% 0.950\n",
" 90.0% 0.967\n",
" 100 80.0% 0.934\n",
" 90.0% 0.950\n",
"0.85 75 80.0% 0.977\n",
" 90.0% 0.991\n",
" 100 80.0% 0.964\n",
" 90.0% 0.977"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also look at the statistical power we would have to detect differences like we have observed."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"P = Pow2ptable(N = [50,75,100], P1=[.8, .85], P2 = [.9, .95], alpha=0.05)\n",
"P"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th>Power</th>\n",
" </tr>\n",
" <tr>\n",
" <th>P1</th>\n",
" <th>P2</th>\n",
" <th>Sample size</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"6\" valign=\"top\">0.80</th>\n",
" <th rowspan=\"3\" valign=\"top\">0.90</th>\n",
" <th>50 </th>\n",
" <td> 28.6%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75 </th>\n",
" <td> 40.2%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td> 50.8%</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">0.95</th>\n",
" <th>50 </th>\n",
" <td> 62.4%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75 </th>\n",
" <td> 79.9%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td> 90.0%</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"6\" valign=\"top\">0.85</th>\n",
" <th rowspan=\"3\" valign=\"top\">0.90</th>\n",
" <th>50 </th>\n",
" <td> 11.4%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75 </th>\n",
" <td> 15.0%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td> 18.6%</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">0.95</th>\n",
" <th>50 </th>\n",
" <td> 38.3%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75 </th>\n",
" <td> 53.3%</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td> 65.6%</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 8,
"text": [
" Power\n",
"P1 P2 Sample size \n",
"0.80 0.90 50 28.6%\n",
" 75 40.2%\n",
" 100 50.8%\n",
" 0.95 50 62.4%\n",
" 75 79.9%\n",
" 100 90.0%\n",
"0.85 0.90 50 11.4%\n",
" 75 15.0%\n",
" 100 18.6%\n",
" 0.95 50 38.3%\n",
" 75 53.3%\n",
" 100 65.6%"
]
}
],
"prompt_number": 8
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment