@clarkgrubb
Created August 20, 2015 23:09
{
"metadata": {
"name": "",
"signature": "sha256:ad9def6cd2562187051a60bccf190018c12b68225779c07fe4622ad0800536cb"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import scipy.stats as stats\n",
"import pandas as pd\n",
"import statsmodels.formula.api as smf"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create some data. We will fit models of `y` and `z` in terms of `x`.\n",
"\n",
"`y` is generated from a single model, while `z` is generated from two different models, one for each half of the data."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"size = 50\n",
"\n",
"x1 = np.array(range(0, size))\n",
"y1 = 2 * x1 + 1 + stats.norm.rvs(0, 1, size)\n",
"z1 = 2 * x1 + 1 + stats.norm.rvs(0, 1, size)\n",
"\n",
"x2 = np.array(range(2 * size, 3 * size))\n",
"y2 = 2 * x2 + 1 + stats.norm.rvs(0, 1, size)\n",
"z2 = 3 * x2 - 1 + stats.norm.rvs(0, 1, size)\n",
"\n",
"x = np.concatenate([x1, x2])\n",
"y = np.concatenate([y1, y2])\n",
"z = np.concatenate([z1, z2])\n",
"\n",
"classifier = [i > size for i in x]  # True for points drawn from the second model\n",
"\n",
"df = pd.DataFrame({'x': x, 'y': y, 'z': z, 'classifier': classifier})"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perform Levene's test on the data to check whether the two samples have the same variance.\n",
"\n",
"The return value is a tuple `(statistic, p-value)`. A small p-value indicates that the samples probably do not have the same variance."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"stats.levene(y1, y2)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"(0.0013988754231943389, 0.97024092488680347)"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"stats.levene(z1, z2)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"(11.156732721202243, 0.0011854998375876872)"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fit an OLS regression on the data set, with `classifier` as a dummy variable and an `x:classifier` interaction term.\n",
"\n",
"We are interested in whether the coefficients of `classifier` and `x:classifier` are significant. If they are, this is evidence that the two halves of the data were generated from different models."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"fit1 = smf.ols(formula='y ~ x * classifier', data=df).fit()  # x * classifier expands to x + classifier + x:classifier\n",
"fit1.summary()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<table class=\"simpletable\">\n",
"<caption>OLS Regression Results</caption>\n",
"<tr>\n",
" <th>Dep. Variable:</th> <td>y</td> <th> R-squared: </th> <td> 1.000</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Model:</th> <td>OLS</td> <th> Adj. R-squared: </th> <td> 1.000</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Method:</th> <td>Least Squares</td> <th> F-statistic: </th> <td>3.873e+05</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Date:</th> <td>Thu, 20 Aug 2015</td> <th> Prob (F-statistic):</th> <td>8.24e-196</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Time:</th> <td>16:05:29</td> <th> Log-Likelihood: </th> <td> -136.39</td> \n",
"</tr>\n",
"<tr>\n",
" <th>No. Observations:</th> <td> 100</td> <th> AIC: </th> <td> 280.8</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Df Residuals:</th> <td> 96</td> <th> BIC: </th> <td> 291.2</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Df Model:</th> <td> 3</td> <th> </th> <td> </td> \n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[95.0% Conf. Int.]</th> \n",
"</tr>\n",
"<tr>\n",
" <th>Intercept</th> <td> 0.9109</td> <td> 0.269</td> <td> 3.384</td> <td> 0.001</td> <td> 0.377 1.445</td>\n",
"</tr>\n",
"<tr>\n",
" <th>classifier[T.True]</th> <td> -3.0054</td> <td> 1.217</td> <td> -2.470</td> <td> 0.015</td> <td> -5.420 -0.591</td>\n",
"</tr>\n",
"<tr>\n",
" <th>x</th> <td> 2.0061</td> <td> 0.009</td> <td> 211.922</td> <td> 0.000</td> <td> 1.987 2.025</td>\n",
"</tr>\n",
"<tr>\n",
" <th>x:classifier[T.True]</th> <td> 0.0188</td> <td> 0.013</td> <td> 1.407</td> <td> 0.163</td> <td> -0.008 0.045</td>\n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <th>Omnibus:</th> <td> 0.486</td> <th> Durbin-Watson: </th> <td> 1.941</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Prob(Omnibus):</th> <td> 0.784</td> <th> Jarque-Bera (JB): </th> <td> 0.621</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Skew:</th> <td> 0.011</td> <th> Prob(JB): </th> <td> 0.733</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Kurtosis:</th> <td> 2.615</td> <th> Cond. No. </th> <td>1.59e+03</td>\n",
"</tr>\n",
"</table>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 19,
"text": [
"<class 'statsmodels.iolib.summary.Summary'>\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: y R-squared: 1.000\n",
"Model: OLS Adj. R-squared: 1.000\n",
"Method: Least Squares F-statistic: 3.873e+05\n",
"Date: Thu, 20 Aug 2015 Prob (F-statistic): 8.24e-196\n",
"Time: 16:05:29 Log-Likelihood: -136.39\n",
"No. Observations: 100 AIC: 280.8\n",
"Df Residuals: 96 BIC: 291.2\n",
"Df Model: 3 \n",
"========================================================================================\n",
" coef std err t P>|t| [95.0% Conf. Int.]\n",
"----------------------------------------------------------------------------------------\n",
"Intercept 0.9109 0.269 3.384 0.001 0.377 1.445\n",
"classifier[T.True] -3.0054 1.217 -2.470 0.015 -5.420 -0.591\n",
"x 2.0061 0.009 211.922 0.000 1.987 2.025\n",
"x:classifier[T.True] 0.0188 0.013 1.407 0.163 -0.008 0.045\n",
"==============================================================================\n",
"Omnibus: 0.486 Durbin-Watson: 1.941\n",
"Prob(Omnibus): 0.784 Jarque-Bera (JB): 0.621\n",
"Skew: 0.011 Prob(JB): 0.733\n",
"Kurtosis: 2.615 Cond. No. 1.59e+03\n",
"==============================================================================\n",
"\n",
"Warnings:\n",
"[1] The condition number is large, 1.59e+03. This might indicate that there are\n",
"strong multicollinearity or other numerical problems.\n",
"\"\"\""
]
}
],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"fit2 = smf.ols(formula='z ~ x * classifier', data=df).fit()  # x * classifier expands to x + classifier + x:classifier\n",
"fit2.summary()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<table class=\"simpletable\">\n",
"<caption>OLS Regression Results</caption>\n",
"<tr>\n",
" <th>Dep. Variable:</th> <td>z</td> <th> R-squared: </th> <td> 1.000</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Model:</th> <td>OLS</td> <th> Adj. R-squared: </th> <td> 1.000</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Method:</th> <td>Least Squares</td> <th> F-statistic: </th> <td>8.636e+05</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Date:</th> <td>Thu, 20 Aug 2015</td> <th> Prob (F-statistic):</th> <td>1.58e-212</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Time:</th> <td>16:06:06</td> <th> Log-Likelihood: </th> <td> -142.57</td> \n",
"</tr>\n",
"<tr>\n",
" <th>No. Observations:</th> <td> 100</td> <th> AIC: </th> <td> 293.1</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Df Residuals:</th> <td> 96</td> <th> BIC: </th> <td> 303.6</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Df Model:</th> <td> 3</td> <th> </th> <td> </td> \n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[95.0% Conf. Int.]</th> \n",
"</tr>\n",
"<tr>\n",
" <th>Intercept</th> <td> 0.7066</td> <td> 0.286</td> <td> 2.468</td> <td> 0.015</td> <td> 0.138 1.275</td>\n",
"</tr>\n",
"<tr>\n",
" <th>classifier[T.True]</th> <td> -2.0824</td> <td> 1.294</td> <td> -1.609</td> <td> 0.111</td> <td> -4.651 0.487</td>\n",
"</tr>\n",
"<tr>\n",
" <th>x</th> <td> 2.0095</td> <td> 0.010</td> <td> 199.543</td> <td> 0.000</td> <td> 1.989 2.029</td>\n",
"</tr>\n",
"<tr>\n",
" <th>x:classifier[T.True]</th> <td> 0.9929</td> <td> 0.014</td> <td> 69.718</td> <td> 0.000</td> <td> 0.965 1.021</td>\n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <th>Omnibus:</th> <td> 3.332</td> <th> Durbin-Watson: </th> <td> 2.180</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Prob(Omnibus):</th> <td> 0.189</td> <th> Jarque-Bera (JB): </th> <td> 3.325</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Skew:</th> <td>-0.412</td> <th> Prob(JB): </th> <td> 0.190</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Kurtosis:</th> <td> 2.653</td> <th> Cond. No. </th> <td>1.59e+03</td>\n",
"</tr>\n",
"</table>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"<class 'statsmodels.iolib.summary.Summary'>\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: z R-squared: 1.000\n",
"Model: OLS Adj. R-squared: 1.000\n",
"Method: Least Squares F-statistic: 8.636e+05\n",
"Date: Thu, 20 Aug 2015 Prob (F-statistic): 1.58e-212\n",
"Time: 16:06:06 Log-Likelihood: -142.57\n",
"No. Observations: 100 AIC: 293.1\n",
"Df Residuals: 96 BIC: 303.6\n",
"Df Model: 3 \n",
"========================================================================================\n",
" coef std err t P>|t| [95.0% Conf. Int.]\n",
"----------------------------------------------------------------------------------------\n",
"Intercept 0.7066 0.286 2.468 0.015 0.138 1.275\n",
"classifier[T.True] -2.0824 1.294 -1.609 0.111 -4.651 0.487\n",
"x 2.0095 0.010 199.543 0.000 1.989 2.029\n",
"x:classifier[T.True] 0.9929 0.014 69.718 0.000 0.965 1.021\n",
"==============================================================================\n",
"Omnibus: 3.332 Durbin-Watson: 2.180\n",
"Prob(Omnibus): 0.189 Jarque-Bera (JB): 3.325\n",
"Skew: -0.412 Prob(JB): 0.190\n",
"Kurtosis: 2.653 Cond. No. 1.59e+03\n",
"==============================================================================\n",
"\n",
"Warnings:\n",
"[1] The condition number is large, 1.59e+03. This might indicate that there are\n",
"strong multicollinearity or other numerical problems.\n",
"\"\"\""
]
}
],
"prompt_number": 20
},
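{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check on the two fits, we can inspect the interaction p-values directly rather than reading the full summary tables. This cell is an illustrative sketch added alongside the original analysis; it assumes `fit1` and `fit2` from the cells above.\n",
"\n",
"A small p-value for `x:classifier[T.True]` suggests the two halves of the data have different slopes, which we expect for `z` but not for `y`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# fit.pvalues is a pandas Series indexed by term name.\n",
"fit1.pvalues['x:classifier[T.True]'], fit2.pvalues['x:classifier[T.True]']"
],
"language": "python",
"metadata": {},
"outputs": []
},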
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}