Statsmodels example: Generalized Least Squares
@vincentarelbundock, created August 26, 2012 23:21
{
"metadata": {
"name": "example_gls"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Generalized Least Squares"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import statsmodels.api as sm\n",
"import numpy as np\n",
"from statsmodels.iolib.table import (SimpleTable, default_txt_fmt)\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Longley dataset is a time series dataset: "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data = sm.datasets.longley.load()\n",
"data.exog = sm.add_constant(data.exog)\n",
"print data.exog[:5]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[[ 83. 234289. 2356. 1590. 107608. 1947. 1. ]\n",
" [ 88.5 259426. 2325. 1456. 108632. 1948. 1. ]\n",
" [ 88.2 258054. 3682. 1616. 109773. 1949. 1. ]\n",
" [ 89.5 284599. 3351. 1650. 110929. 1950. 1. ]\n",
" [ 96.2 328975. 2099. 3099. 112075. 1951. 1. ]]\n"
]
},
{
"output_type": "stream",
"stream": "stderr",
"text": [
"/usr/local/lib/python2.7/dist-packages/statsmodels-0.5.0-py2.7-linux-x86_64.egg/statsmodels/tools/tools.py:306: FutureWarning: The default of `prepend` will be changed to True in 0.5.0, use explicit prepend\n",
" FutureWarning)\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's assume that the data is heteroskedastic and that we know\n",
"the nature of the heteroskedasticity. We can then define `sigma`\n",
"and use it to fit a GLS model.\n",
"\n",
"First, we will obtain the residuals from an OLS fit."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ols_resid = sm.OLS(data.endog, data.exog).fit().resid"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assume that the error terms follow an AR(1) process with a trend:\n",
"\n",
"`resid[i] = beta_0 + rho*resid[i-1] + e[i]`\n",
"\n",
"where `e ~ N(0, some_sigma**2)`,\n",
"\n",
"and that `rho` is simply the correlation of the residuals. A consistent estimator for `rho` is obtained by regressing the residuals on the lagged residuals."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"resid_fit = sm.OLS(ols_resid[1:], sm.add_constant(ols_resid[:-1])).fit()\n",
"print resid_fit.tvalues[0]\n",
"print resid_fit.pvalues[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"-1.43902298398\n",
"0.173784447887\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While we don't have strong evidence that the errors follow an AR(1)\n",
"process, we continue anyway."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"rho = resid_fit.params[0]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An AR(1) process means that near-neighbors have a stronger\n",
"relation, so we can impose this structure with a Toeplitz matrix:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from scipy.linalg import toeplitz\n",
"\n",
"toeplitz(range(5))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 6,
"text": [
"array([[0, 1, 2, 3, 4],\n",
" [1, 0, 1, 2, 3],\n",
" [2, 1, 0, 1, 2],\n",
" [3, 2, 1, 0, 1],\n",
" [4, 3, 2, 1, 0]])"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"order = toeplitz(range(len(ols_resid)))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"so that our error covariance structure is actually `rho**order`,\n",
"which defines an autocorrelation structure:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sigma = rho**order\n",
"gls_model = sm.GLS(data.endog, data.exog, sigma=sigma)\n",
"gls_results = gls_model.fit()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, the exact `rho` in this instance is not known, so it might make more sense to use feasible GLS, which currently only has experimental support.\n",
"\n",
"We can use the GLSAR model with one lag to get a similar result:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"glsar_model = sm.GLSAR(data.endog, data.exog, 1)\n",
"glsar_results = glsar_model.iterative_fit(1)\n",
"print glsar_results.summary()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" GLSAR Regression Results \n",
"==============================================================================\n",
"Dep. Variable: y R-squared: 0.996\n",
"Model: GLSAR Adj. R-squared: 0.992\n",
"Method: Least Squares F-statistic: 295.2\n",
"Date: Sun, 26 Aug 2012 Prob (F-statistic): 6.09e-09\n",
"Time: 20:51:48 Log-Likelihood: -102.04\n",
"No. Observations: 15 AIC: 218.1\n",
"Df Residuals: 8 BIC: 223.0\n",
"Df Model: 6 \n",
"==============================================================================\n",
" coef std err t P>|t| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"x1 34.5568 84.734 0.408 0.694 -160.840 229.953\n",
"x2 -0.0343 0.033 -1.047 0.326 -0.110 0.041\n",
"x3 -1.9621 0.481 -4.083 0.004 -3.070 -0.854\n",
"x4 -1.0020 0.211 -4.740 0.001 -1.489 -0.515\n",
"x5 -0.0978 0.225 -0.435 0.675 -0.616 0.421\n",
"x6 1823.1829 445.829 4.089 0.003 795.100 2851.266\n",
"const -3.468e+06 8.72e+05 -3.979 0.004 -5.48e+06 -1.46e+06\n",
"==============================================================================\n",
"Omnibus: 1.960 Durbin-Watson: 2.554\n",
"Prob(Omnibus): 0.375 Jarque-Bera (JB): 1.423\n",
"Skew: 0.713 Prob(JB): 0.491\n",
"Kurtosis: 2.508 Cond. No. 4.80e+09\n",
"==============================================================================\n",
"\n",
"The condition number is large, 4.8e+09. This might indicate that there are\n",
"strong multicollinearity or other numerical problems.\n"
]
},
{
"output_type": "stream",
"stream": "stderr",
"text": [
"/usr/local/lib/python2.7/dist-packages/scipy/stats/stats.py:1199: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=15\n",
" int(n))\n"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Comparing the GLS and GLSAR results, we see some small differences\n",
"in the parameter estimates and the resulting standard errors. These\n",
"might be due to numerical differences between the algorithms, e.g.\n",
"the treatment of initial conditions, given the small number of\n",
"observations in the Longley dataset."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"comparison = np.vstack([gls_results.params, glsar_results.params, \n",
" gls_results.bse, glsar_results.bse])\n",
"comparison = np.transpose(comparison)\n",
"colnames = ['gls_params', 'glsar_params', 'gls_bse', 'glsar_bse']\n",
"print SimpleTable(comparison, colnames, txt_fmt=default_txt_fmt)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"================================================================\n",
" gls_params glsar_params gls_bse glsar_bse \n",
"----------------------------------------------------------------\n",
" -12.7656454401 34.5567846182 69.4308073335 84.7337145245 \n",
"-0.0380013249817 -0.0343410089663 0.026247682233 0.0328032449964\n",
" -2.18694871107 -1.96214395046 0.382393150849 0.480544864905\n",
" -1.15177649259 -1.00197295929 0.165252691545 0.211383870914\n",
"-0.0680535580455 -0.0978045986166 0.176428333976 0.224774369449\n",
" 1993.95292851 1823.1828867 342.634627565 445.828747793 \n",
" -3797854.90154 -3467960.63254 670688.699307 871584.051696 \n",
"----------------------------------------------------------------\n"
]
}
],
"prompt_number": 10
}
],
"metadata": {}
}
]
}