
josef-pkt/example_gmm_euler.ipynb Last active Dec 23, 2019

Generalized Method of Moments in Python: Estimating Euler Equations
# Estimating an Euler Equation with the Generalized Method of Moments in statsmodels

One of the earliest applications of GMM is the estimation of Euler equations in rational expectations models (Hansen and Singleton 1982). We can read about it in many textbooks, but here I just want to present it as another test case for GMM in statsmodels.

The short form is that optimization by economic agents with rational expectations implies the following orthogonality condition in expectation:

$\hspace{10pt} E [z_t \hspace{3pt} (1 - \beta (1 + r_{t+1}) (c_{t+1} / c_t)^{-\gamma}) ] = 0$

where $\beta$ is the discount factor of the agent, $\gamma$ is the coefficient of constant relative risk aversion, $r$ is a rate of return on assets or an interest rate, and $c$ is consumption. (I have not looked at this part of economics in a long time.)

The main point is that we cannot treat current and future consumption and future interest rates as exogenous when we try to estimate the parameters, since economic agents have more information about those variables than the econometrician. Those variables will be correlated with any residuals in the estimation and give us inconsistent estimates of the parameters if we don't take the endogeneity into account.

## Setup and Data

```python
"""
Created on Tue Oct 08 09:27:31 2013

Author: Josef Perktold
"""

import numpy as np
import pandas as pd

from statsmodels.sandbox.regression import gmm

dta = pd.read_csv('consumption.csv', parse_dates=[0])

print(dta.iloc[:5])
```

```
                   qtr       r       c
0  1947-01-01 00:00:00  0.0038  1017.2
1  1947-04-01 00:00:00  0.0038  1034.0
2  1947-07-01 00:00:00  0.0066  1037.5
3  1947-10-01 00:00:00  0.0085  1037.7
4  1948-01-01 00:00:00  0.0097  1042.6
```

Next, I create the lagged and leading variables for the estimation. As instruments we use the lagged interest rate and current and lagged consumption growth.

```python
dta['c_growth'] = dta['c'] / dta['c'].shift(1)
dta['c_growth_lag1'] = dta['c_growth'].shift(1)
dta['r_lag1'] = dta['r'].shift(1)
dta['r_lag2'] = dta['r'].shift(2)
dta['r_forw1'] = dta['r'].shift(-1)
dta['c_lag1'] = dta['c'].shift(1)
dta['c_forw1'] = dta['c'].shift(-1)
dta['const'] = 1

dta_clean = dta.dropna()

endog_df = dta_clean[['r_forw1', 'c_forw1', 'c']]
exog_df = endog_df
instrument_df = dta_clean[['r_lag1', 'r_lag2', 'c_growth', 'c_growth_lag1',
                           'const']]

endog, exog, instrument = map(np.asarray, [endog_df, exog_df, instrument_df])
```

## The moment equations and GMM - version 1

Currently statsmodels has two ways of specifying moment conditions. The first uses general non-linear functions for the (unconditional) moment condition:

$\hspace{10pt} E[m(params)] = 0$

The second uses an instrumental variables approach with an additive error structure:

$\hspace{10pt} E[z \hspace{3pt} (y - f(x, params))] = 0$

where $z$ are the instruments, $u = y - f(x, params)$ defines an additive residual, and $x$ are explanatory variables that can be endogenous or exogenous.

In the following I use the class NonlinearIVGMM, which implements the second form of the moment conditions. However, our Euler equation fits this form only if we use endog, that is y, as a constant. I'm not showing an example of how to use the first, generic form, the class GMM, nor how we could define the moment conditions directly by subclassing.

**The first version**

In the first version, we use endog = 0 and define u to be the Euler equation without the instrument part.

```python
def moment_consumption1(params, exog):
    beta, gamma = params
    r_forw1, c_forw1, c = exog.T  # unwrap iterable (ndarray)

    # moment condition without instrument
    err = 1 - beta * (1 + r_forw1) * np.power(c_forw1 / c, -gamma)
    return -err
```

```python
endog1 = np.zeros(exog.shape[0])
mod1 = gmm.NonlinearIVGMM(endog1, exog, instrument, moment_consumption1, k_moms=4)
w0inv = np.dot(instrument.T, instrument) / len(endog1)
res1 = mod1.fit([1, -1], maxiter=2, inv_weights=w0inv)
```

```
Optimization terminated successfully.
         Current function value: 0.000539
         Iterations: 6
         Function evaluations: 23
         Gradient evaluations: 23
Optimization terminated successfully.
         Current function value: 0.152609
         Iterations: 6
         Function evaluations: 11
         Gradient evaluations: 11
```

```python
print(res1.summary(yname='Euler Eq', xname=['discount', 'CRRA']))
```

```
                          NonlinearIVGMM Results
==============================================================================
Dep. Variable:               Euler Eq   Hansen J:                        36.47
Model:                 NonlinearIVGMM   Prob (Hansen J):              1.20e-08
Method:                           GMM
Date:                Wed, 25 Dec 2013
Time:                        20:08:33
No. Observations:                 239
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
discount       0.8977      0.017     52.860      0.000         0.864     0.931
CRRA          -6.7953      2.050     -3.315      0.001       -10.813    -2.778
==============================================================================
```

However, the default GMM standard errors of the estimates are only robust to heteroscedasticity, i.e. to variance that differs by observation. Since we are using time series for our estimation, there is a strong chance that the errors, or rather the moment conditions, are correlated over time. In this case, we can use HAC robust standard errors, which are robust to heteroscedasticity as well as autocorrelation.

In the GMM version in statsmodels, we request these through weights_method='hac', wargs={'maxlag': 4}, which uses Newey-West standard errors based on 4 lags.

```python
# We don't need Nelder-Mead in this case, we can use the bfgs default directly
#res1_ = mod1.fit([1, -1], maxiter=0, inv_weights=w0inv, opt_method='nm')

res1_hac4_2s = mod1.fit([1, -1], maxiter=2, inv_weights=w0inv,
                        weights_method='hac', wargs={'maxlag': 4})
print('\n\n')
print(res1_hac4_2s.summary(yname='Euler Eq', xname=['discount', 'CRRA']))
```

```
Optimization terminated successfully.
         Current function value: 0.000539
         Iterations: 6
         Function evaluations: 23
         Gradient evaluations: 23
Optimization terminated successfully.
         Current function value: 0.055395
         Iterations: 5
         Function evaluations: 10
         Gradient evaluations: 10


                          NonlinearIVGMM Results
==============================================================================
Dep. Variable:               Euler Eq   Hansen J:                        13.24
Model:                 NonlinearIVGMM   Prob (Hansen J):               0.00133
Method:                           GMM
Date:                Wed, 25 Dec 2013
Time:                        20:08:33
No. Observations:                 239
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
discount       0.9213      0.014     67.281      0.000         0.894     0.948
CRRA          -4.1136      1.507     -2.729      0.006        -7.068    -1.160
==============================================================================
```

## The moment equations and GMM - version 2

In version 2 of the moment equations, we use the fact that the non-instrument part of the Euler equation has the form 1 - f(x, params). Our moment_consumption2 defines only the f(x, params) part, and we use a vector of ones for endog.

```python
def moment_consumption2(params, exog):
    beta, gamma = params
    r_forw1, c_forw1, c = exog.T  # unwrap iterable (ndarray)

    # 2nd part of moment condition without instrument
    predicted = beta * (1. + r_forw1) * np.power(c_forw1 / c, -gamma)
    return predicted
```

```python
endog2 = np.ones(exog.shape[0])
mod2 = gmm.NonlinearIVGMM(endog2, exog, instrument, moment_consumption2, k_moms=4)
w0inv = np.dot(instrument.T, instrument) / len(endog2)
res2_hac4_2s = mod2.fit([1, -1], maxiter=2, inv_weights=w0inv,
                        weights_method='hac', wargs={'maxlag': 4})
```

```
Optimization terminated successfully.
         Current function value: 0.000539
         Iterations: 6
         Function evaluations: 23
         Gradient evaluations: 23
Optimization terminated successfully.
         Current function value: 0.055395
         Iterations: 5
         Function evaluations: 10
         Gradient evaluations: 10
```

```python
print(res2_hac4_2s.summary(yname='Euler Eq', xname=['discount', 'CRRA']))
```

```
                          NonlinearIVGMM Results
==============================================================================
Dep. Variable:               Euler Eq   Hansen J:                        13.24
Model:                 NonlinearIVGMM   Prob (Hansen J):               0.00133
Method:                           GMM
Date:                Wed, 25 Dec 2013
Time:                        20:08:33
No. Observations:                 239
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
discount       0.9213      0.014     67.281      0.000         0.894     0.948
CRRA          -4.1136      1.507     -2.729      0.006        -7.068    -1.160
==============================================================================
```

## Compare

We should get the same estimates from both versions of specifying the moment conditions.

```python
res1_hac4_2s.params
```

```
array([ 0.92129628, -4.11361243])
```

```python
res2_hac4_2s.params
```

```
array([ 0.92129628, -4.11361243])
```

And the two are the same (up to differences in the numerical precision of the algorithms):

```python
res1_hac4_2s.params - res2_hac4_2s.params, np.max(np.abs(res1_hac4_2s.params - res2_hac4_2s.params))
```

```
(array([  3.27948779e-12,   3.71550790e-10]), 3.7155079013473369e-10)
```

The Stata manual has params [0.9205, -4.2224] and standard errors equal to [0.0135, 1.4739]. Stata doesn't center the moments by default. We can get something closer to the results of Stata by using centered=False as a weights argument:

```python
res_ = mod2.fit([1, -1], maxiter=2, inv_weights=w0inv, weights_method='hac',
                wargs={'maxlag': 4, 'centered': False}, optim_args={'disp': 0})
print(res_.params)
print(res_.bse)
```

```
[ 0.92045263 -4.22337303]
[ 0.01345577  1.47341411]
```

## Extra: Iterated GMM

As a comparison we can also iterate the GMM estimation until convergence.

The results look strange, especially with the HAC robust weighting matrix. In both cases the estimate for the discount factor is large and close to one; however, the preference parameter for relative risk aversion is 0.08 and -6.03 respectively.

This needs verification or an explanation. I have a test case for iterated GMM with heteroscedasticity robust standard errors where statsmodels and Stata agree at several decimals, so there might be something problematic with this dataset.

**with HAC standard errors**

```python
res2_hac_i = mod2.fit([1, -1], maxiter=100, inv_weights=w0inv,
                      weights_method='hac', wargs={'maxlag': 4},
                      optim_args={'disp': 0})
```

```python
#print(res2_hac_i.summary(yname='Euler Eq', xname=['discount', 'CRRA']))
res2_hac_i.params
```

```
array([ 0.95862044,  0.07853072])
```

**with HC standard errors**

```python
res2_i = mod2.fit([1, -1], maxiter=100, inv_weights=w0inv, optim_args={'disp': 0})
```

```python
#print(res2_i.summary(yname='Euler Eq', xname=['discount', 'CRRA']))
print(res2_i.params)
```

```
[ 0.90335842 -6.02767703]
```

## Extra: Problems with Optimization

When I ran several different cases of this example, I sometimes got convergence to different results. Although the objective function is quadratic in the moments, the moment conditions themselves can have more complicated non-linearities, and those could be the cause of local minima.

In the above examples I used the function for the moment conditions as an argument to the class. As an alternative it is also possible to subclass a GMM class directly. This has the advantage that we can override specific methods. For example, it is currently not possible to supply an analytical derivative for the moment conditions, but one can be added by subclassing and replacing the generic forward-difference method gradient_momcond.
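As a small illustration of the `shift`-based lead/lag construction in the data setup above, here is a self-contained sketch on a toy DataFrame. The column names mirror the notebook's; the numbers are made up. `shift(1)` lags a series, `shift(-1)` leads it, and `dropna()` trims the resulting NaN rows at both ends of the sample.

```python
import pandas as pd

# toy quarterly data (synthetic, just to show the mechanics)
dta = pd.DataFrame({'r': [0.01, 0.02, 0.03, 0.04],
                    'c': [100.0, 102.0, 103.0, 105.0]})

dta['c_growth'] = dta['c'] / dta['c'].shift(1)  # c_t / c_{t-1}, NaN in row 0
dta['r_forw1'] = dta['r'].shift(-1)             # r_{t+1}, NaN in the last row

# rows with any NaN (first and last here) are dropped
dta_clean = dta.dropna()
```

After `dropna()` only the two interior rows survive, since the lag is undefined for the first observation and the lead is undefined for the last one.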
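To make the two-step GMM mechanics concrete, here is a minimal sketch built only from NumPy and SciPy — not the statsmodels implementation — for a toy linear model with one endogenous regressor, using the moment condition $E[z(y - x\theta)] = 0$. All data and names here are synthetic; step one uses an identity weighting matrix and step two re-weights with the inverse covariance of the moments, the same two-step idea as `maxiter=2` above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 2))              # two instruments
u = rng.normal(size=n)                   # structural error
x = z @ np.array([1.0, 0.5]) + 0.8 * u   # regressor correlated with u
theta_true = 2.0
y = theta_true * x + u

def moments(theta):
    # sample moment vector g(theta) = (1/n) Z'(y - x*theta)
    return z.T @ (y - x * theta) / n

def objective(theta, W):
    g = moments(theta[0])
    return g @ W @ g

# step 1: identity weights
step1 = minimize(objective, x0=[0.0], args=(np.eye(2),))
# step 2: weight by the inverse covariance of the moments at the step-1 estimate
resid = y - x * step1.x[0]
S = (z * resid[:, None]).T @ (z * resid[:, None]) / n
step2 = minimize(objective, x0=step1.x, args=(np.linalg.inv(S),))
```

Because the instruments are uncorrelated with `u`, the GMM estimate is consistent for `theta_true` even though OLS on `x` would be biased here.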
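The HAC weighting discussed above can be illustrated with a hand-rolled Newey-West covariance for a vector moment series, using Bartlett-kernel weights and `maxlag=4` as in `wargs={'maxlag': 4}`. This is a sketch on synthetic data, not the statsmodels code; the centering step corresponds to the `centered` option mentioned in the Stata comparison.

```python
import numpy as np

def newey_west(g, maxlag=4):
    """Newey-West long-run covariance of an (n, k) moment series g."""
    n, k = g.shape
    g = g - g.mean(axis=0)                # center the moments (Stata would not)
    S = g.T @ g / n                       # lag-0 term
    for lag in range(1, maxlag + 1):
        w = 1.0 - lag / (maxlag + 1.0)    # Bartlett weight, declining in lag
        gamma = g[lag:].T @ g[:-lag] / n  # autocovariance at this lag
        S += w * (gamma + gamma.T)
    return S

# synthetic autocorrelated moment series: an MA(1) in each column
rng = np.random.default_rng(1)
e = rng.normal(size=(500, 2))
g = e + 0.6 * np.vstack([e[:1], e[:-1]])
S = newey_west(g, maxlag=4)
```

Adding the weighted autocovariance terms is what makes the resulting standard errors robust to serial correlation in the moment conditions, not just to heteroscedasticity.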

laozhang1314 commented Feb 13, 2018 • edited

 Hi, Josef! Where can I find the data?

Junga1987 commented Feb 20, 2018

 Hi Josef, how would you modify the code if my Euler equation is the following: $E[I^{-1} \sum_{i=1}^{I} \frac{C_{it}}{C_{i,t-1}} (R_{jt} - R_{ft})]$? In other words, consider the Euler equation at the household level, subscripted by i, and then average over all i's. This is a form of the Euler equation listed in the Brav, Constantinides, and Geczy (2002) Journal of Political Economy paper. Any comment/suggestion on how to model this would be a great help, as I am new to Python programming and am trying to replicate their results. Thank you.