Created
August 26, 2012 23:38
Statsmodels example: Using dates with timeseries models
{
 "metadata": {
  "name": "ex_dates"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Using dates with timeseries models"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import statsmodels.api as sm\n",
      "import numpy as np\n",
      "import pandas"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Getting started"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "data = sm.datasets.sunspots.load()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Currently, an annual date series must consist of datetimes that fall at the end of each year (December 31)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from datetime import datetime\n",
      "dates = sm.tsa.datetools.dates_from_range('1700', length=len(data.endog))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Using pandas\n",
      "\n",
      "Make a pandas Series or DataFrame with a date index."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "endog = pandas.Series(data.endog, index=dates)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Instantiate and fit the model"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ar_model = sm.tsa.AR(endog, freq='A')\n",
      "pandas_ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Out-of-sample prediction"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "pred = pandas_ar_res.predict(start='2005', end='2015')\n",
      "print(pred)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2005-12-31 20.003275\n",
        "2006-12-31 24.703970\n",
        "2007-12-31 20.026113\n",
        "2008-12-31 23.473658\n",
        "2009-12-31 30.858584\n",
        "2010-12-31 61.335478\n",
        "2011-12-31 87.024727\n",
        "2012-12-31 91.321290\n",
        "2013-12-31 79.921658\n",
        "2014-12-31 60.799537\n",
        "2015-12-31 40.374871\n",
        "Freq: A-DEC\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Using explicit dates"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ar_model = sm.tsa.AR(data.endog, dates=dates, freq='A')\n",
      "ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)\n",
      "pred = ar_res.predict(start='2005', end='2015')\n",
      "print(pred)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[ 20.00327492 24.70396965 20.02611309 23.47365775 30.8585841\n",
        " 61.33547797 87.02472654 91.3212902 79.92165773 60.79953659\n",
        " 40.37487104]\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This returns a regular array rather than a pandas object. Because the model still has the date information attached, you can recover the prediction dates in a roundabout way, through the private `_data` attribute."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(ar_res._data.predict_dates)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "<class 'pandas.tseries.index.DatetimeIndex'>\n",
        "[2005-12-31 00:00:00, ..., 2015-12-31 00:00:00]\n",
        "Length: 11, Freq: A-DEC, Timezone: None\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note: This attribute only exists after `predict` has been called. It holds the dates associated with the most recent call to `predict`."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}
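
The notebook relies on two points: annual dates sit at the end of each year, and predictions come back as a bare array when dates are passed explicitly. Both can be illustrated with the standard library alone. This is only a sketch, not statsmodels code: `year_end_dates` is a hypothetical helper written for this example, and the prediction values are copied from the printed output in the notebook.

```python
from datetime import datetime


def year_end_dates(start_year, length):
    """Hypothetical helper: one December 31 datetime per consecutive year."""
    return [datetime(start_year + i, 12, 31) for i in range(length)]


# Values copied from the notebook's printed predictions for 2005 through 2015.
pred = [20.003275, 24.703970, 20.026113, 23.473658, 30.858584,
        61.335478, 87.024727, 91.321290, 79.921658, 60.799537, 40.374871]

# Rebuild the matching year-end dates and pair each one with its prediction.
pred_dates = year_end_dates(2005, len(pred))
labeled = list(zip(pred_dates, pred))

for date, value in labeled:
    print("%s %.6f" % (date.strftime("%Y-%m-%d"), value))
```

Pairing the values with dates by hand like this is what the pandas route does automatically, which is why the pandas-based prediction prints with a date index while the explicit-dates prediction does not.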