Skip to content

Instantly share code, notes, and snippets.

@EderSantana
Last active March 20, 2016 03:48
Show Gist options
  • Save EderSantana/8250355c529d0e9f141b to your computer and use it in GitHub Desktop.
Save EderSantana/8250355c529d0e9f141b to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "",
"signature": "sha256:b54f13b4a9093bdac425f5ac66f5228fe92cd4b5bc85e6f75c3e47ad8f95f10f"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"%matplotlib inline\n",
"import numpy as np\n",
"import pandas"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 1"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Load the excel file and parse the first dataset\n",
"xl = pandas.ExcelFile('HW2_DataSet.xlsx')\n",
"print xl.sheet_names"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[u'DataSet1', u'DataSet2', u'DataSet3', u'DataSet4']\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"xl = pandas.ExcelFile('HW2_DataSet.xlsx')\n",
"print xl.sheet_names"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[u'DataSet1', u'DataSet2', u'DataSet3', u'DataSet4']\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dataset1 = xl.parse('DataSet1')\n",
"dataset1.T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>...</th>\n",
" <th>20</th>\n",
" <th>21</th>\n",
" <th>22</th>\n",
" <th>23</th>\n",
" <th>24</th>\n",
" <th>25</th>\n",
" <th>26</th>\n",
" <th>27</th>\n",
" <th>28</th>\n",
" <th>29</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Region</th>\n",
" <td> Anterior</td>\n",
" <td> Anterior</td>\n",
" <td> Anterior</td>\n",
" <td> Anterior</td>\n",
" <td> Anterior</td>\n",
" <td> Anterior</td>\n",
" <td> Posterior</td>\n",
" <td> Posterior</td>\n",
" <td> Posterior</td>\n",
" <td> Posterior</td>\n",
" <td>...</td>\n",
" <td> Lateral</td>\n",
" <td> Lateral</td>\n",
" <td> Lateral</td>\n",
" <td> Lateral</td>\n",
" <td> Intermediate Zone</td>\n",
" <td> Intermediate Zone</td>\n",
" <td> Intermediate Zone</td>\n",
" <td> Intermediate Zone</td>\n",
" <td> Intermediate Zone</td>\n",
" <td> Intermediate Zone</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AnimalID</th>\n",
" <td> Animal 1</td>\n",
" <td> Animal 2</td>\n",
" <td> Animal 3</td>\n",
" <td> Animal 4</td>\n",
" <td> Animal 5</td>\n",
" <td> Animal 6</td>\n",
" <td> Animal 1</td>\n",
" <td> Animal 2</td>\n",
" <td> Animal 3</td>\n",
" <td> Animal 4</td>\n",
" <td>...</td>\n",
" <td> Animal 3</td>\n",
" <td> Animal 4</td>\n",
" <td> Animal 5</td>\n",
" <td> Animal 6</td>\n",
" <td> Animal 1</td>\n",
" <td> Animal 2</td>\n",
" <td> Animal 3</td>\n",
" <td> Animal 4</td>\n",
" <td> Animal 5</td>\n",
" <td> Animal 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Stiffness</th>\n",
" <td> 15</td>\n",
" <td> 24</td>\n",
" <td> 38</td>\n",
" <td> 50</td>\n",
" <td> 20</td>\n",
" <td> 35</td>\n",
" <td> 15</td>\n",
" <td> 26</td>\n",
" <td> 41</td>\n",
" <td> 48</td>\n",
" <td>...</td>\n",
" <td> 25</td>\n",
" <td> 34</td>\n",
" <td> 14</td>\n",
" <td> 28</td>\n",
" <td> 8</td>\n",
" <td> 11</td>\n",
" <td> 26</td>\n",
" <td> 32</td>\n",
" <td> 13</td>\n",
" <td> 24</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3 rows \u00d7 30 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
" 0 1 2 3 4 5 \\\n",
"Region Anterior Anterior Anterior Anterior Anterior Anterior \n",
"AnimalID Animal 1 Animal 2 Animal 3 Animal 4 Animal 5 Animal 6 \n",
"Stiffness 15 24 38 50 20 35 \n",
"\n",
" 6 7 8 9 ... 20 \\\n",
"Region Posterior Posterior Posterior Posterior ... Lateral \n",
"AnimalID Animal 1 Animal 2 Animal 3 Animal 4 ... Animal 3 \n",
"Stiffness 15 26 41 48 ... 25 \n",
"\n",
" 21 22 23 24 25 \\\n",
"Region Lateral Lateral Lateral Intermediate Zone Intermediate Zone \n",
"AnimalID Animal 4 Animal 5 Animal 6 Animal 1 Animal 2 \n",
"Stiffness 34 14 28 8 11 \n",
"\n",
" 26 27 28 \\\n",
"Region Intermediate Zone Intermediate Zone Intermediate Zone \n",
"AnimalID Animal 3 Animal 4 Animal 5 \n",
"Stiffness 26 32 13 \n",
"\n",
" 29 \n",
"Region Intermediate Zone \n",
"AnimalID Animal 6 \n",
"Stiffness 24 \n",
"\n",
"[3 rows x 30 columns]"
]
}
],
"prompt_number": 4
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"a) If you were restricted to a 1-way ANOVA, what is the factor and how many treatments are there within this factor? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using 1-way ANOVA, here we consider 'Region' as the factor, thus, we have five treatments: 'Anterior', 'Posterior', 'Medial', 'Lateral' and 'Intermediate Zone'"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"b) Write the hypothesis for this 1-way ANOVA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us call the mean of each treatment mentioned above by a number: \n",
"$\\bar{x}_1:$ 'Anterior' \n",
"$\\bar{x}_2:$ 'Posterior' \n",
"$\\bar{x}_3:$ 'Medial' \n",
"$\\bar{x}_4:$ 'Lateral' \n",
"$\\bar{x}_5:$ 'Intermediate Zone' \n",
"\n",
"At our ANOVA test we have the following hypothesis: \n",
" $H_0:$ The means are equal $\\bar{x}_1 = \\bar{x}_2 = \\bar{x}_3 = \\bar{x}_4 = \\bar{x}_5$, \n",
" $H_a:$ At least one of the means is statistically different from the others,\n",
"\n",
"given a significance of $\\alpha = 0.05$"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"c) In Excel or by hand, calculate Sum of Squares Total and Sum of Squares Treatments for your 1-way ANOVA with region as the factor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For educational purposes, let us conduct the test posed above step by step. First let us state the ANOVA identity and calculate each of its terms.\n",
"\n",
"Identity: $MS_{error} = MS_{total} - MS_{treatments}$,\n",
"\n",
"Now, let us parse each indivual treatment:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Parses treatments\n",
"Anterior = dataset1[ dataset1['Region']=='Anterior' ].Stiffness.values\n",
"Posterior = dataset1[ dataset1['Region']=='Posterior' ].Stiffness.values\n",
"Medial = dataset1[ dataset1['Region']=='Medial' ].Stiffness.values\n",
"Lateral = dataset1[ dataset1['Region']=='Lateral' ].Stiffness.values\n",
"InterZone = dataset1[ dataset1['Region']=='Intermediate Zone' ].Stiffness.values\n",
"\n",
"# Calculate means\n",
"x1 = Anterior.mean()\n",
"x2 = Posterior.mean()\n",
"x3 = Medial.mean()\n",
"x4 = Lateral.mean()\n",
"x5 = InterZone.mean()\n",
"\n",
"# Total mean\n",
"x_total = dataset1['Stiffness'].mean() #we can do this because all treatments have the same number of observations.\n",
"\n",
"# Print results\n",
"t = ['Anterior', 'Posterior', 'Meidal', 'Lateral', 'Intermediate Zone', 'Total']\n",
"means = [x1, x2, x3, x4, x5, x_total]\n",
"for i in range(6):\n",
" print \"The mean of %s is %.3f\" % (t[i], means[i])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"The mean of Anterior is 30.333\n",
"The mean of Posterior is 31.667\n",
"The mean of Meidal is 24.167\n",
"The mean of Lateral is 19.833\n",
"The mean of Intermediate Zone is 19.000\n",
"The mean of Total is 25.000\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the values above, we can calculate $MS_{error}$ and $MS_{treatments}$. The quantities necessaries for our test. Noticing that the number of treatments is $n=5$ and the number of samples per treatment is $k=6$, we have\n",
"\n",
"$MS_{treatments} = \\displaystyle\\frac{k\\sum_{i=1}^n (\\bar{x}_{i.}-\\bar{x}_{..})^2}{n-1}$ :"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"n = 5.\n",
"k = 6.\n",
"MSt = ( k * np.sum(([x1,x2,x3,x4,x5] - x_total)**2) ) / ( n-1 ) \n",
"print 'Mean Square Treatment is %.3f' % MSt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Mean Square Treatment is 204.417\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"d) In Excel or by hand, calculate the Sum of Squares Error using the Sum of Squares Identity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The $MS_{error}$ is calculated as \n",
"\n",
"$MS_{error} = \\displaystyle\\frac{\\sum_{i=1}^k \\sum_{j=1}^n (x_{ij} - \\bar{x}_{i.})^2}{n(k-1)}$:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# The terms involved in the summation of the numerator are\n",
"x_A = np.sum((Anterior - x1)**2)\n",
"x_P = np.sum((Posterior - x2)**2)\n",
"x_M = np.sum((Medial - x3)**2)\n",
"x_L = np.sum((Lateral - x4)**2)\n",
"x_I = np.sum((InterZone - x5)**2)\n",
"\n",
"MSe = (x_A + x_P + x_M + x_L + x_I) / (n*(k-1))\n",
"print 'Mean Square Error is %.3f' % MSe"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Mean Square Error is 128.333\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"e) In Excel or by hand, calculate the F statistic for this 1-way ANOVA "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step is calculating $F_{score} = \\displaystyle\\frac{MS_{treatment}}{MS_{error}}$"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"F_score = MSt / MSe\n",
"print 'The F_score is %.3f' % F_score"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"The F_score is 1.593\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"f) Based upon your F-statistic, will you reject or fail to reject the null"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Given the degrees of freedom $df_1 = n - 1 = 5$ and $df_2 = n(k-1) = 25$ and significance level $\\alpha=0.05$, the critical F value is $F_{critical} = 2.603$. Thus, since $F_{score} < F_{critical}$ we $\\textbf{fail to reject the null hypothesis}$."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"g) In Minitab or Statistica, confirm your work in Excel by running the analogous 1-way ANOVA "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The process described from c) to f) can be simplified using a single command using Scipy's statistics module:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from scipy import stats\n",
"f_value, p_value = stats.f_oneway(Anterior, Posterior, Medial, Lateral, InterZone) \n",
"print 'The F_value is %.3f and the p_value is %.3f' % (f_value, p_value)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"The F_value is 1.593 and the p_value is 0.207\n"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, notice that $p_{value} > \\alpha$, we $\\textbf{fail to reject the null hypothesis}$"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"h)\tAccounting for inter-subject variability, how many factors are in your new ANOVA model "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two factors, the Region and the Animal ID"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"i) Will this be a main effects ANOVA or a full-factorial ANOVA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we will not consider multi-factor interactions (first and foremost because we don't have replicates), our analysis willl be a Main Effects ANOVA."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"j) Describe the hypothesis or hypotheses for this ANOVA model "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We test the following hypothesis:\n",
"\n",
"$H_0:$ The region factor has no effect on the measured Stiffness. $SS_{Stiffness}=0$ \n",
"against \n",
"$H_a:$ The region factor does have an effect. $SS_{Stiffness}\\ne 0$ \n",
"Where $\\alpha=0.05$\n"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"k) In Minitab or Statistica, run the new ANOVA model and report your results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we actually choose to go with the module called 'statsmodels' that works well in conjunction with the other toolboxes that we are using."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from statsmodels.formula.api import ols\n",
"from statsmodels.stats.anova import anova_lm\n",
"formula = 'Stiffness ~ C(Region) + C(AnimalID)'\n",
"lm = ols(formula, dataset1).fit()\n",
"print(anova_lm(lm))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" df sum_sq mean_sq F PR(>F)\n",
"C(Region) 4 817.666667 204.416667 38.448276 4.012137e-09\n",
"C(AnimalID) 5 3102.000000 620.400000 116.689655 4.333069e-14\n",
"Residual 20 106.333333 5.316667 NaN NaN\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since $p \\approx 4\\cdot 10^{-9} < 0.05$ we $\\textbf{reject the null hypothesis}$"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"l) Did including interspecies variation in the ANOVA model change your results? Why or why not? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Including the interspecies variation did change our results. Different animals with different body constitutions may have different reactions to the same treatment. For instance, we could judge the treatment as totally inefficient when it could be just one specie where the treatment has no effect."
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Solving first question without ANOVA"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dataset2 = xl.parse('DataSet2')\n",
"dataset2.T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>...</th>\n",
" <th>50</th>\n",
" <th>51</th>\n",
" <th>52</th>\n",
" <th>53</th>\n",
" <th>54</th>\n",
" <th>55</th>\n",
" <th>56</th>\n",
" <th>57</th>\n",
" <th>58</th>\n",
" <th>59</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Treatment</th>\n",
" <td> Low Glucose DMEM</td>\n",
" <td> Low Glucose DMEM</td>\n",
" <td> Low Glucose DMEM</td>\n",
" <td> Low Glucose DMEM</td>\n",
" <td> Low Glucose DMEM</td>\n",
" <td> Low Glucose DMEM</td>\n",
" <td> Low Glucose DMEM</td>\n",
" <td> Low Glucose DMEM</td>\n",
" <td> Low Glucose DMEM</td>\n",
" <td> Low Glucose DMEM</td>\n",
" <td>...</td>\n",
" <td> High Glucose RPMI</td>\n",
" <td> High Glucose RPMI</td>\n",
" <td> High Glucose RPMI</td>\n",
" <td> High Glucose RPMI</td>\n",
" <td> High Glucose RPMI</td>\n",
" <td> High Glucose RPMI</td>\n",
" <td> High Glucose RPMI</td>\n",
" <td> High Glucose RPMI</td>\n",
" <td> High Glucose RPMI</td>\n",
" <td> High Glucose RPMI</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Millions of Cells after 1 Wk in Culture</th>\n",
" <td> 20</td>\n",
" <td> 19</td>\n",
" <td> 27</td>\n",
" <td> 20</td>\n",
" <td> 19</td>\n",
" <td> 21</td>\n",
" <td> 14</td>\n",
" <td> 29</td>\n",
" <td> 22</td>\n",
" <td> 18</td>\n",
" <td>...</td>\n",
" <td> 49</td>\n",
" <td> 32</td>\n",
" <td> 20</td>\n",
" <td> 21</td>\n",
" <td> 34</td>\n",
" <td> 36</td>\n",
" <td> 26</td>\n",
" <td> 26</td>\n",
" <td> 23</td>\n",
" <td> 32</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2 rows \u00d7 60 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
" 0 1 \\\n",
"Treatment Low Glucose DMEM Low Glucose DMEM \n",
"Millions of Cells after 1 Wk in Culture 20 19 \n",
"\n",
" 2 3 \\\n",
"Treatment Low Glucose DMEM Low Glucose DMEM \n",
"Millions of Cells after 1 Wk in Culture 27 20 \n",
"\n",
" 4 5 \\\n",
"Treatment Low Glucose DMEM Low Glucose DMEM \n",
"Millions of Cells after 1 Wk in Culture 19 21 \n",
"\n",
" 6 7 \\\n",
"Treatment Low Glucose DMEM Low Glucose DMEM \n",
"Millions of Cells after 1 Wk in Culture 14 29 \n",
"\n",
" 8 9 \\\n",
"Treatment Low Glucose DMEM Low Glucose DMEM \n",
"Millions of Cells after 1 Wk in Culture 22 18 \n",
"\n",
" ... 50 \\\n",
"Treatment ... High Glucose RPMI \n",
"Millions of Cells after 1 Wk in Culture ... 49 \n",
"\n",
" 51 52 \\\n",
"Treatment High Glucose RPMI High Glucose RPMI \n",
"Millions of Cells after 1 Wk in Culture 32 20 \n",
"\n",
" 53 54 \\\n",
"Treatment High Glucose RPMI High Glucose RPMI \n",
"Millions of Cells after 1 Wk in Culture 21 34 \n",
"\n",
" 55 56 \\\n",
"Treatment High Glucose RPMI High Glucose RPMI \n",
"Millions of Cells after 1 Wk in Culture 36 26 \n",
"\n",
" 57 58 \\\n",
"Treatment High Glucose RPMI High Glucose RPMI \n",
"Millions of Cells after 1 Wk in Culture 26 23 \n",
"\n",
" 59 \n",
"Treatment High Glucose RPMI \n",
"Millions of Cells after 1 Wk in Culture 32 \n",
"\n",
"[2 rows x 60 columns]"
]
}
],
"prompt_number": 11
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"a) For a given media type, does glucose have an effect on the number of cell divisions? Describe your hypothesis or hypotheses that you are testing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Given a media, we calculate the number of cells after one week. We call this quantity $x$. The mean value of $x$ for a low glucose level is $\\bar{x}_L$, the mean value for a high glucose level is $\\bar{x}_H$,. Thus, our hypothesis would be: \n",
"\n",
"$H_0:$ Glucose does not have an effect on the number of cells after one week: $\\bar{x}_L = \\bar{x}_H$ \n",
"and \n",
"$H_a:$ Glucose does have an effect. $\\bar{x}_L \\ne \\bar{x}_H$ \n",
"Given a significance level of $\\alpha = 0.05$\n",
"\n",
"Assuming that we could not run an ANOVA test, we should run a t-test following the hypothesis above for each media type."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"b) Calculate the power for each test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The power of a test is the probability $P(\\text{Reject }H_0 | H_0\\text{ is False})$. To calculate this probability, we first need to calculate $d = \\displaystyle\\frac{\\bar{x}_1 - \\bar{x}_2}{s}$ for each one of the 3 types of media. Note that since we have two independent measurements we estimate s as $s = \\displaystyle\\sqrt{\\frac{s_1^2 + s_2^2}{2}}$"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"d = np.zeros(3) # initialize values of d\n",
"measurements = dataset2.values[:,1]\n",
"media_type = ['DMEM', 'F12', 'RPMI']\n",
"c = 0\n",
"for i in range(0,6,2):\n",
" m1 = measurements[i*5:(i+1)*5]\n",
" m2 = measurements[(i+1)*5:(i+2)*5]\n",
" d[c] = abs((np.mean(m1) - np.mean(m2))/(np.sqrt(np.std(m1,ddof=1)+np.std(m2,ddof=1))/2 ))\n",
" print \"The value for d for %s is %.3f\" % (media_type[c], d[c])\n",
" c+=1"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"The value for d for DMEM is 0.134\n",
"The value for d for F12 is 2.750\n",
"The value for d for RPMI is 6.371\n"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Given the above effect sizes we can calculate the power of each statistical test as using `statsmodels`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import statsmodels.stats.power as smp\n",
"Powers = np.zeros(3)\n",
"for i in range(3):\n",
" # Caclulate statistical power\n",
" Powers[i] = smp.TTestIndPower().solve_power(d[i], nobs1=10, ratio=1, alpha=0.05, alternative='two-sided')\n",
" # Print message\n",
" print 'The statiscal power for media type %s is %.3f' % (media_type[i], Powers[i])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"The statiscal power for media type DMEM is 0.059\n",
"The statiscal power for media type F12 is 1.000\n",
"The statiscal power for media type RPMI is 1.000\n"
]
}
],
"prompt_number": 13
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"c) Which type of statistical error should you be concerned about in this test? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the statistical Power seems to be pretty high for all kinds fo media and given that Type two error probability is $1-P_{Power}$, we should be more concerned about Type I error."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"d) If you were to correct for compounding type 1 errors using the Bonferroni correction, would your result have changed? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bonferroni correction resets the value of $\\alpha$. The value of $\\beta$ (the probability of Type II error) depends on the $t_{critical,\\alpha/2}$, thus, we would have differnt results in case we had several repeated comparsions."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"e) For a given glucose level, does the media type have an effect on the number of cell divisions? Describe your hypothesis or hypotheses that you are testing "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Hypothesis test would be: \n",
"Given a glucose level, we calculate the number of cells after one week. We call this quantity $x$. The mean value of $x$ for media DMEM is $\\bar{x}_1$, for media F12 we have $\\bar{x}_2$ and $\\bar{x}_3$ for RPMI. Thus, we have to test 3 hypothesis (remember no ANOVA allowed yet): \n",
"\n",
"* Hypothesis 1 \n",
"$H_0:$ DMEM and F12 have the same effect on the number of cells after one week: $\\bar{x}_1=\\bar{x}_2$ \n",
"and \n",
"$H_a:$ Those two different media have different effects. $\\bar{x}_1 \\ne \\bar{x}_2$ \n",
"Given a significance level of $\\alpha = 0.05$\n",
"\n",
"\n",
"* Hypothesis 2 \n",
"$H_0:$ DMEM and RPMI have the same effect on the number of cells after one week: $\\bar{x}_1=\\bar{x}_3$ \n",
"and \n",
"$H_a:$ Those two different media have different effects. $\\bar{x}_1 \\ne \\bar{x}_3$ \n",
"Given a significance level of $\\alpha = 0.05$\n",
"\n",
"\n",
"* Hypothesis 3 \n",
"$H_0:$ F12 and RPMI have the same effect on the number of cells after one week: $\\bar{x}_2=\\bar{x}_3$ \n",
"and \n",
"$H_a:$ Those two different media have different effects. $\\bar{x}_2 \\ne \\bar{x}_3$ \n",
"Given a significance level of $\\alpha = 0.05$\n",
"\n",
"We should run those t-tests for low and high glucose levels."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"f) Calculate the power for each test "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At b) we dealt with dataset format programatically, here we changed the dataset formatting for easier calculations."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"xl2 = pandas.ExcelFile('edited_datasets.xlsx')\n",
"dataset2_v2 = xl2.parse('Sheet1')\n",
"dataset2_v2.T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>...</th>\n",
" <th>50</th>\n",
" <th>51</th>\n",
" <th>52</th>\n",
" <th>53</th>\n",
" <th>54</th>\n",
" <th>55</th>\n",
" <th>56</th>\n",
" <th>57</th>\n",
" <th>58</th>\n",
" <th>59</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>GlucoseLevel</th>\n",
" <td> Low</td>\n",
" <td> Low</td>\n",
" <td> Low</td>\n",
" <td> Low</td>\n",
" <td> Low</td>\n",
" <td> Low</td>\n",
" <td> Low</td>\n",
" <td> Low</td>\n",
" <td> Low</td>\n",
" <td> Low</td>\n",
" <td>...</td>\n",
" <td> High</td>\n",
" <td> High</td>\n",
" <td> High</td>\n",
" <td> High</td>\n",
" <td> High</td>\n",
" <td> High</td>\n",
" <td> High</td>\n",
" <td> High</td>\n",
" <td> High</td>\n",
" <td> High</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MediaType</th>\n",
" <td> DMEM</td>\n",
" <td> DMEM</td>\n",
" <td> DMEM</td>\n",
" <td> DMEM</td>\n",
" <td> DMEM</td>\n",
" <td> DMEM</td>\n",
" <td> DMEM</td>\n",
" <td> DMEM</td>\n",
" <td> DMEM</td>\n",
" <td> DMEM</td>\n",
" <td>...</td>\n",
" <td> RPMI</td>\n",
" <td> RPMI</td>\n",
" <td> RPMI</td>\n",
" <td> RPMI</td>\n",
" <td> RPMI</td>\n",
" <td> RPMI</td>\n",
" <td> RPMI</td>\n",
" <td> RPMI</td>\n",
" <td> RPMI</td>\n",
" <td> RPMI</td>\n",
" </tr>\n",
" <tr>\n",
" <th>NumberOfCellsAfter1wk</th>\n",
" <td> 20</td>\n",
" <td> 19</td>\n",
" <td> 27</td>\n",
" <td> 20</td>\n",
" <td> 19</td>\n",
" <td> 21</td>\n",
" <td> 14</td>\n",
" <td> 29</td>\n",
" <td> 22</td>\n",
" <td> 18</td>\n",
" <td>...</td>\n",
" <td> 49</td>\n",
" <td> 32</td>\n",
" <td> 20</td>\n",
" <td> 21</td>\n",
" <td> 34</td>\n",
" <td> 36</td>\n",
" <td> 26</td>\n",
" <td> 26</td>\n",
" <td> 23</td>\n",
" <td> 32</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3 rows \u00d7 60 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
" 0 1 2 3 4 5 6 7 8 \\\n",
"GlucoseLevel Low Low Low Low Low Low Low Low Low \n",
"MediaType DMEM DMEM DMEM DMEM DMEM DMEM DMEM DMEM DMEM \n",
"NumberOfCellsAfter1wk 20 19 27 20 19 21 14 29 22 \n",
"\n",
" 9 ... 50 51 52 53 54 55 56 \\\n",
"GlucoseLevel Low ... High High High High High High High \n",
"MediaType DMEM ... RPMI RPMI RPMI RPMI RPMI RPMI RPMI \n",
"NumberOfCellsAfter1wk 18 ... 49 32 20 21 34 36 26 \n",
"\n",
" 57 58 59 \n",
"GlucoseLevel High High High \n",
"MediaType RPMI RPMI RPMI \n",
"NumberOfCellsAfter1wk 26 23 32 \n",
"\n",
"[3 rows x 60 columns]"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Powers = np.zeros(6)\n",
"glucose_level = ['Low', 'High']\n",
"c = 0\n",
"x = s = np.zeros((2,3))\n",
"for i in range(2):\n",
" for j in range(3):\n",
" idx1 = dataset2_v2['MediaType'] == media_type[j]\n",
" idx2 = dataset2_v2['GlucoseLevel'] == glucose_level[i]\n",
" m = dataset2_v2[ np.logical_and(idx1 , idx2) ].values[:,2]\n",
" x[i,j] = m.mean()\n",
" s[i,j] = m.std(ddof=1)\n",
" \n",
"for i in range(2):\n",
" for j in range(3):\n",
" for k in range(3):\n",
" if j is not k:\n",
" d = (x[i,j] - x[i,k]) / np.sqrt(( s[i,j] + s[i,k])/2)\n",
" power = smp.TTestIndPower().solve_power(d, nobs1=10, ratio=1, alpha=0.05, alternative='two-sided')\n",
" print 'The power for %s Glucose level for the the test comparing %s and %s is %f' % (glucose_level[i], media_type[j], media_type[k], power)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"The power for Low Glucose level for the the test comparing DMEM and F12 is 0.615480\n",
"The power for Low Glucose level for the the test comparing DMEM and RPMI is 0.054402\n",
"The power for Low Glucose level for the the test comparing F12 and DMEM is 0.615480\n",
"The power for Low Glucose level for the the test comparing F12 and RPMI is 0.540734\n",
"The power for Low Glucose level for the the test comparing RPMI and DMEM is 0.054402\n",
"The power for Low Glucose level for the the test comparing RPMI and F12 is 0.540734\n",
"The power for High Glucose level for the the test comparing DMEM and F12 is 0.080368\n",
"The power for High Glucose level for the the test comparing DMEM and RPMI is 0.314851\n",
"The power for High Glucose level for the the test comparing F12 and DMEM is 0.080368\n",
"The power for High Glucose level for the the test comparing F12 and RPMI is 0.162611\n",
"The power for High Glucose level for the the test comparing RPMI and DMEM is 0.314851\n",
"The power for High Glucose level for the the test comparing RPMI and F12 is 0.162611\n"
]
}
],
"prompt_number": 15
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"g) Which type of statistical error should you be concerned about in this test?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The statistical power here seems to be pretty low, when compared to the ones realized before. Again, since the probability $\\beta = 1-P_{power}$, we should be concerned with type II error."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"h) If you were to correct for compounding type 1 errors using the Bonferroni correction, would your result have changed?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, Bonferroni correction resets the value of $\\alpha$. The value of $\\beta$ (the probability of Type II error) depends on the $t_{critical,\\alpha/2}$, thus, we would have differnt results in case we had several repeated comparsions."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"i) Design and conduct a single ANOVA to test the questions above. For the ANOVA, describe the factors and the treatments within each factor. Will you include interactions? Why or why not? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To condutct our test we consider the Glucose level and the Media type as the factors, thus the different treatments for each media type are either low or high glucose. Given, those 2 glucose levels and 3 media types, the combination of those factors gives us 6 means $\\bar{x}_1, \\bar{x}_2, \\bar{x}_3, \\bar{x}_4, \\bar{x}_5, \\bar{x}_6$. Our hypothesis are\n",
"\n",
"$H_0:$ the factors have no effects on the total number of cells after one week: $\\bar{x}_1=\\bar{x}_2=\\bar{x}_3=\\bar{x}_4=\\bar{x}_5=\\bar{x}_6$ \n",
"and \n",
"$H_a:$ at least one of the factors has an effect. \n",
"With $\\alpha=0.5$\n",
"\n",
"Note that here we include interactions because we want to be sure that the effect on cell division is due to glucose by itself and not by a combination of glucose plus a specific media type. Also, we have replicate measurement that allows us to do so."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"formula = 'NumberOfCellsAfter1wk ~ C(GlucoseLevel) + C(MediaType) + C(GlucoseLevel)*C(MediaType)'\n",
"lm = ols(formula, dataset2_v2).fit()\n",
"print(anova_lm(lm))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" df sum_sq mean_sq F PR(>F)\n",
"C(GlucoseLevel) 1 968.016667 968.016667 22.098034 0.000018\n",
"C(MediaType) 2 4.433333 2.216667 0.050602 0.950702\n",
"C(GlucoseLevel):C(MediaType) 2 435.033333 217.516667 4.965504 0.010481\n",
"Residual 54 2365.500000 43.805556 NaN NaN\n"
]
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Noticing above that $p = 0.000018 < 0.05$ we $\\textbf{reject the null hypothesis}$ that Glucose level has no effect. In case of doubt, note that the p-values are in the right-most column above. NaN are not applied values."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"j) For your ANOVA, does glucose have an effect on the number of cell divisions? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see in the p-values above, $\\textbf{reject the null hypothesis}$ for Glucose level, thus, it has a possible effect on the number of cells, while the media type and the interactions do not."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"k) For your ANOVA, does the media type have an effect on the number of cell divisions?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, we $\\textbf{fail to reject null}$ while considering any effect of the media type factor (p=0.9507)."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"l) In which factor/interactions will you test differences between groups? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We test the differences between LOW and HIGH glucose levels, since this is the factor that seems to present significant effect on the number of cells after one week."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"m) What is the new alpha level for your tests of differences between groups? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Bonferroni correction resets $\\alpha_{new} = \\displaystyle\\frac{\\alpha_{old}}{N_{comparisons}}$. Since we have 3 comparisons, one for each media type, the correction becomes the following:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print \"The new value of alpha is %f\" % (0.05/3)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"The new value of alpha is 0.016667\n"
]
}
],
"prompt_number": 17
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 3"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dataset3 = xl.parse('DataSet3')\n",
"dataset3.T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>Animal 1</th>\n",
" <th>Animal 2</th>\n",
" <th>Animal 3</th>\n",
" <th>Animal 4</th>\n",
" <th>Animal 5</th>\n",
" <th>Animal 6</th>\n",
" <th>Animal 7</th>\n",
" <th>Animal 8</th>\n",
" <th>Animal 9</th>\n",
" <th>Animal 10</th>\n",
" <th>Animal 11</th>\n",
" <th>Animal 12</th>\n",
" <th>Animal 13</th>\n",
" <th>Animal 14</th>\n",
" <th>Animal 15</th>\n",
" <th>Animal 16</th>\n",
" <th>Animal 17</th>\n",
" <th>Animal 18</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th>IL1Ra</th>\n",
" <th>IL1Ra</th>\n",
" <th>IL1Ra</th>\n",
" <th>IL1Ra</th>\n",
" <th>IL1Ra</th>\n",
" <th>IL1Ra</th>\n",
" <th>sIL1RII</th>\n",
" <th>sIL1RII</th>\n",
" <th>sIL1RII</th>\n",
" <th>sIL1RII</th>\n",
" <th>sIL1RII</th>\n",
" <th>sIL1RII</th>\n",
" <th>sIL6R</th>\n",
" <th>sIL6R</th>\n",
" <th>sIL6R</th>\n",
" <th>sIL6R</th>\n",
" <th>sIL6R</th>\n",
" <th>sIL6R</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Day 1</th>\n",
" <td> 2.5</td>\n",
" <td> 2.6</td>\n",
" <td> 2.7</td>\n",
" <td> 2.8</td>\n",
" <td> 2.3</td>\n",
" <td> 2.4</td>\n",
" <td> 2.5</td>\n",
" <td> 2.5</td>\n",
" <td> 2.6</td>\n",
" <td> 2.9</td>\n",
" <td> 2.3</td>\n",
" <td> 2.3</td>\n",
" <td> 2.3</td>\n",
" <td> 2.4</td>\n",
" <td> 2.7</td>\n",
" <td> 2.7</td>\n",
" <td> 2.5</td>\n",
" <td> 2.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Day 3</th>\n",
" <td> 2.0</td>\n",
" <td> 1.9</td>\n",
" <td> 1.8</td>\n",
" <td> 1.9</td>\n",
" <td> 2.0</td>\n",
" <td> 1.9</td>\n",
" <td> 2.3</td>\n",
" <td> 2.2</td>\n",
" <td> 2.3</td>\n",
" <td> 2.0</td>\n",
" <td> 2.1</td>\n",
" <td> 2.0</td>\n",
" <td> 2.3</td>\n",
" <td> 2.4</td>\n",
" <td> 2.3</td>\n",
" <td> 2.2</td>\n",
" <td> 2.3</td>\n",
" <td> 2.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Day 5</th>\n",
" <td> 1.6</td>\n",
" <td> 1.6</td>\n",
" <td> 1.5</td>\n",
" <td> 1.7</td>\n",
" <td> 1.5</td>\n",
" <td> 1.6</td>\n",
" <td> 2.0</td>\n",
" <td> 1.9</td>\n",
" <td> 2.0</td>\n",
" <td> 1.8</td>\n",
" <td> 1.9</td>\n",
" <td> 1.8</td>\n",
" <td> 2.1</td>\n",
" <td> 2.2</td>\n",
" <td> 2.1</td>\n",
" <td> 2.0</td>\n",
" <td> 2.1</td>\n",
" <td> 2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Day 7</th>\n",
" <td> 1.5</td>\n",
" <td> 1.5</td>\n",
" <td> 1.5</td>\n",
" <td> 1.6</td>\n",
" <td> 1.5</td>\n",
" <td> 1.6</td>\n",
" <td> 1.6</td>\n",
" <td> 1.7</td>\n",
" <td> 1.7</td>\n",
" <td> 1.6</td>\n",
" <td> 1.6</td>\n",
" <td> 1.5</td>\n",
" <td> 1.7</td>\n",
" <td> 1.8</td>\n",
" <td> 1.7</td>\n",
" <td> 1.6</td>\n",
" <td> 1.9</td>\n",
" <td> 1.7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 18,
"text": [
" Animal 1 Animal 2 Animal 3 Animal 4 Animal 5 Animal 6 Animal 7 \\\n",
" IL1Ra IL1Ra IL1Ra IL1Ra IL1Ra IL1Ra sIL1RII \n",
"Day 1 2.5 2.6 2.7 2.8 2.3 2.4 2.5 \n",
"Day 3 2.0 1.9 1.8 1.9 2.0 1.9 2.3 \n",
"Day 5 1.6 1.6 1.5 1.7 1.5 1.6 2.0 \n",
"Day 7 1.5 1.5 1.5 1.6 1.5 1.6 1.6 \n",
"\n",
" Animal 8 Animal 9 Animal 10 Animal 11 Animal 12 Animal 13 \\\n",
" sIL1RII sIL1RII sIL1RII sIL1RII sIL1RII sIL6R \n",
"Day 1 2.5 2.6 2.9 2.3 2.3 2.3 \n",
"Day 3 2.2 2.3 2.0 2.1 2.0 2.3 \n",
"Day 5 1.9 2.0 1.8 1.9 1.8 2.1 \n",
"Day 7 1.7 1.7 1.6 1.6 1.5 1.7 \n",
"\n",
" Animal 14 Animal 15 Animal 16 Animal 17 Animal 18 \n",
" sIL6R sIL6R sIL6R sIL6R sIL6R \n",
"Day 1 2.4 2.7 2.7 2.5 2.6 \n",
"Day 3 2.4 2.3 2.2 2.3 2.1 \n",
"Day 5 2.2 2.1 2.0 2.1 2.0 \n",
"Day 7 1.8 1.7 1.6 1.9 1.7 "
]
}
],
"prompt_number": 18
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"a) How many factors do you have in this experiment "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this experiment we could have 3 factors: 'Day', 'Animal Number' and 'Anti-inflammatory'. But since we didn't use all the drugs for all the animals, we stick with TWO FACTORS ONLY: 'Day/Time' and 'Anti-inflammatory'"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"b) For your selected number of factors, can you test interactions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the case of 2 factors, we can test interactions since we do have replicate experiments (through different animals, though)."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"c) Using a 2-way Main Effects ANOVA to test drug effects and time effects. Describe the main effect hypothesis or hypotheses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Naming $\\bar{x}_{1,i}$ the mean value for the i-th 'Day' and $\\bar{x}_{2,j}$ the mean value associated with factor 'Drug' `j`.\n",
"\n",
"$H_0:$ The analyzed factors have no effects. All the $\\bar{x}_{1,i}$ and $\\bar{x}_{2,j}$ means are equal. \n",
"and \n",
"$H_a:$ At least one of the means is different. \n",
"with $\\alpha=0.05$ "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"d) Using Minitab or Statistica, test your main effects hypotheses "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to solve this problem, let us first format the dataset"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"days = [['Day 1'], ['Day 3'], ['Day 5'], ['Day 7']]\n",
"drugs = [6*['IL1Ra'], 6*['sIL1RII'], 6*['sIL6R']]\n",
"drugs = [item for sublist in drugs for item in sublist]\n",
"animals = ['Animal' + str(i) for i in range(1,19)]\n",
"days_list = []\n",
"drugs_list = []\n",
"animals_list = []\n",
"val = np.asarray([])\n",
"for t in days:\n",
" val = np.hstack([val, dataset3[t].values.T[0]])\n",
" animals_list += animals\n",
" drugs_list += drugs\n",
" days_list += 18 * t\n",
" '''\n",
" for d in drugs:\n",
" c = 0\n",
" for a in animals:\n",
" days_list += t\n",
" drugs_list += d\n",
" animals_list += a\n",
" val += [dataset3[t].values[c]] \n",
" c += 1\n",
" '''\n",
"formatted_dataset3 = pandas.DataFrame.from_dict({'Animal': animals_list,\n",
" 'Drug': drugs_list,\n",
" 'Day': days_list,\n",
" 'Values': val})"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"formatted_dataset3.T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>...</th>\n",
" <th>62</th>\n",
" <th>63</th>\n",
" <th>64</th>\n",
" <th>65</th>\n",
" <th>66</th>\n",
" <th>67</th>\n",
" <th>68</th>\n",
" <th>69</th>\n",
" <th>70</th>\n",
" <th>71</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Animal</th>\n",
" <td> Animal1</td>\n",
" <td> Animal2</td>\n",
" <td> Animal3</td>\n",
" <td> Animal4</td>\n",
" <td> Animal5</td>\n",
" <td> Animal6</td>\n",
" <td> Animal7</td>\n",
" <td> Animal8</td>\n",
" <td> Animal9</td>\n",
" <td> Animal10</td>\n",
" <td>...</td>\n",
" <td> Animal9</td>\n",
" <td> Animal10</td>\n",
" <td> Animal11</td>\n",
" <td> Animal12</td>\n",
" <td> Animal13</td>\n",
" <td> Animal14</td>\n",
" <td> Animal15</td>\n",
" <td> Animal16</td>\n",
" <td> Animal17</td>\n",
" <td> Animal18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Day</th>\n",
" <td> Day 1</td>\n",
" <td> Day 1</td>\n",
" <td> Day 1</td>\n",
" <td> Day 1</td>\n",
" <td> Day 1</td>\n",
" <td> Day 1</td>\n",
" <td> Day 1</td>\n",
" <td> Day 1</td>\n",
" <td> Day 1</td>\n",
" <td> Day 1</td>\n",
" <td>...</td>\n",
" <td> Day 7</td>\n",
" <td> Day 7</td>\n",
" <td> Day 7</td>\n",
" <td> Day 7</td>\n",
" <td> Day 7</td>\n",
" <td> Day 7</td>\n",
" <td> Day 7</td>\n",
" <td> Day 7</td>\n",
" <td> Day 7</td>\n",
" <td> Day 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Drug</th>\n",
" <td> IL1Ra</td>\n",
" <td> IL1Ra</td>\n",
" <td> IL1Ra</td>\n",
" <td> IL1Ra</td>\n",
" <td> IL1Ra</td>\n",
" <td> IL1Ra</td>\n",
" <td> sIL1RII</td>\n",
" <td> sIL1RII</td>\n",
" <td> sIL1RII</td>\n",
" <td> sIL1RII</td>\n",
" <td>...</td>\n",
" <td> sIL1RII</td>\n",
" <td> sIL1RII</td>\n",
" <td> sIL1RII</td>\n",
" <td> sIL1RII</td>\n",
" <td> sIL6R</td>\n",
" <td> sIL6R</td>\n",
" <td> sIL6R</td>\n",
" <td> sIL6R</td>\n",
" <td> sIL6R</td>\n",
" <td> sIL6R</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Values</th>\n",
" <td> 2.5</td>\n",
" <td> 2.6</td>\n",
" <td> 2.7</td>\n",
" <td> 2.8</td>\n",
" <td> 2.3</td>\n",
" <td> 2.4</td>\n",
" <td> 2.5</td>\n",
" <td> 2.5</td>\n",
" <td> 2.6</td>\n",
" <td> 2.9</td>\n",
" <td>...</td>\n",
" <td> 1.7</td>\n",
" <td> 1.6</td>\n",
" <td> 1.6</td>\n",
" <td> 1.5</td>\n",
" <td> 1.7</td>\n",
" <td> 1.8</td>\n",
" <td> 1.7</td>\n",
" <td> 1.6</td>\n",
" <td> 1.9</td>\n",
" <td> 1.7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>4 rows \u00d7 72 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
" 0 1 2 3 4 5 6 \\\n",
"Animal Animal1 Animal2 Animal3 Animal4 Animal5 Animal6 Animal7 \n",
"Day Day 1 Day 1 Day 1 Day 1 Day 1 Day 1 Day 1 \n",
"Drug IL1Ra IL1Ra IL1Ra IL1Ra IL1Ra IL1Ra sIL1RII \n",
"Values 2.5 2.6 2.7 2.8 2.3 2.4 2.5 \n",
"\n",
" 7 8 9 ... 62 63 64 \\\n",
"Animal Animal8 Animal9 Animal10 ... Animal9 Animal10 Animal11 \n",
"Day Day 1 Day 1 Day 1 ... Day 7 Day 7 Day 7 \n",
"Drug sIL1RII sIL1RII sIL1RII ... sIL1RII sIL1RII sIL1RII \n",
"Values 2.5 2.6 2.9 ... 1.7 1.6 1.6 \n",
"\n",
" 65 66 67 68 69 70 71 \n",
"Animal Animal12 Animal13 Animal14 Animal15 Animal16 Animal17 Animal18 \n",
"Day Day 7 Day 7 Day 7 Day 7 Day 7 Day 7 Day 7 \n",
"Drug sIL1RII sIL6R sIL6R sIL6R sIL6R sIL6R sIL6R \n",
"Values 1.5 1.7 1.8 1.7 1.6 1.9 1.7 \n",
"\n",
"[4 rows x 72 columns]"
]
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"formula = 'Values ~ C(Day) + C(Drug)'\n",
"lm = ols(formula, formatted_dataset3).fit()\n",
"print(anova_lm(lm,typ=2))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" sum_sq df F PR(>F)\n",
"C(Day) 8.138194 3 128.754495 1.586576e-27\n",
"C(Drug) 0.807778 2 19.169796 2.729785e-07\n",
"Residual 1.390556 66 NaN NaN\n"
]
}
],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thus, since $p<0.05$ we $\\textbf{reject the null hypotheis}$. In case of doubt, note that the p-values are in the right-most column above. NaN are not applied values."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"e) Using a 2-way Full Factorial ANOVA to test drug effects and time effects. Describe the statistical hypothesis or hypotheses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Naming $\\bar{x}_{1,i}$ the mean value for the i-th 'Day' and $\\bar{x}_{2,j}$ the mean value associated with factor 'Drug' `j`.\n",
"\n",
"$H_0:$ The analyzed factors have no interactions, $SS_{interactions} = 0$. \n",
"and \n",
"$H_a:$ The interactions have an effect, $SS_{interactions} \\ne 0$. \n",
"with $\\alpha=0.05$ "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"f) Using Minitab or Statistica, test your hypotheses"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"formula = 'Values ~ C(Day) * C(Drug)'\n",
"lm = ols(formula, formatted_dataset3).fit()\n",
"print(anova_lm(lm))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" df sum_sq mean_sq F PR(>F)\n",
"C(Day) 3 8.138194 2.712731 175.960961 1.095507e-29\n",
"C(Drug) 2 0.807778 0.403889 26.198198 6.636909e-09\n",
"C(Day):C(Drug) 6 0.465556 0.077593 5.033033 3.083779e-04\n",
"Residual 60 0.925000 0.015417 NaN NaN\n"
]
}
],
"prompt_number": 218
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since $p\\approx 3\\cdot 10^{-4} < 0.05$ we $\\textbf{reject the null hypothesis}$. The interactions are significant."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"g) Using a 3-way ANOVA to test drug effects, time effects, and animal effects. Can you test interactions in this statistical design? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No, we cannot since there is not replicates for the combinations of 'animal', 'time' and 'drug'."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"h) Describe the hypothesis or hypotheses to be tested "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$H_0:$ All the mean measurements are for all possible factors (Day, Drug, Animal) are equal, \n",
"and \n",
"$H_a:$ At least one of the means mentioned above is different. \n",
"with $\\alpha=0.05$ "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"i) Using Minitab or Statistica, test your hypotheses "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The toolbox that we used was not able to run the experiment!!! We obtained a 'matrix not aligned warning' due to the incomplete nature of the measurements."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"j) Based upon the statistical tests that you\u2019ve conducted here and above, which would you select for your final analysis? Why?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I would select the way ANOVA with interaction. Since the interaction is considerable, it cannot be discarded from our model. Going up to 3-way ANOVA is not advised here because we don't have experimental values for all the possible factor combination."
]
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment