Created
July 25, 2020 18:13
Holt-Winters Exponential Smoothing using Python and statsmodels
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing as HWES

# Read the data file. The date column is expected to be in the mm-dd-yyyy format.
df = pd.read_csv(
    'retail_sales_used_car_dealers_us_1992_2020.csv',
    header=0,
    infer_datetime_format=True,
    parse_dates=[0],
    index_col=[0],
)
# Declare the index as month-start ('MS') frequency
df.index.freq = 'MS'

# Plot the data
df.plot()
plt.show()

# Split into training and test data sets. The last 12 periods form the test data.
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]

# Build and train the model on the training data:
# additive trend, multiplicative seasonality, 12-month seasonal cycle
model = HWES(df_train, seasonal_periods=12, trend='add', seasonal='mul')
fitted = model.fit(optimized=True, use_brute=True)

# Print out the training summary
print(fitted.summary())

# Create an out-of-sample forecast for the next 12 steps beyond the final data point in the training data set
sales_forecast = fitted.forecast(steps=12)

# Plot the training data, the test data and the forecast on the same plot
fig = plt.figure()
fig.suptitle('Retail Sales of Used Cars in the US (1992-2020)')
past, = plt.plot(df_train.index, df_train, 'b.-', label='Sales History')
future, = plt.plot(df_test.index, df_test, 'r.-', label='Actual Sales')
predicted_future, = plt.plot(df_test.index, sales_forecast, 'g.-', label='Sales Forecast')
plt.legend(handles=[past, future, predicted_future])
plt.show()
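For readers wondering what `trend='add', seasonal='mul'` asks the model to do, here is a minimal pure-Python sketch of the textbook Holt-Winters recursion with an additive trend and multiplicative seasonality. It is not statsmodels' implementation: the function name `holt_winters_add_mul`, its parameters, and the synthetic series are all made up for illustration, the initial states are passed in by hand rather than estimated, and the smoothing parameters are fixed instead of optimized.

```python
def holt_winters_add_mul(y, m, alpha, beta, gamma,
                         level0, trend0, seasonals0, steps):
    """Textbook Holt-Winters: additive trend, multiplicative seasonality.

    y          -- observed series (list of floats)
    m          -- seasonal period (e.g. 12 for monthly data)
    level0     -- assumed level one step before the first observation
    trend0     -- assumed per-step trend before the first observation
    seasonals0 -- m assumed seasonal factors, indexed by time step modulo m
    steps      -- number of out-of-sample steps to forecast
    """
    level, trend = level0, trend0
    seasonals = list(seasonals0)
    for t, obs in enumerate(y):
        s_prev = seasonals[t % m]            # factor from one season ago
        prev_level, prev_trend = level, trend
        # Level: deseasonalized observation blended with the previous projection
        level = alpha * (obs / s_prev) + (1 - alpha) * (prev_level + prev_trend)
        # Trend: latest change in level blended with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        # Seasonal factor: observed ratio blended with the previous factor
        seasonals[t % m] = (gamma * (obs / (prev_level + prev_trend))
                            + (1 - gamma) * s_prev)
    n = len(y)
    # h-step-ahead forecast: projected level times the matching seasonal factor
    return [(level + h * trend) * seasonals[(n + h - 1) % m]
            for h in range(1, steps + 1)]


# Synthetic quarterly series: linear level (100 + 2t) times a fixed seasonal pattern
season = [0.8, 1.1, 1.3, 0.8]
y = [(100 + 2 * t) * season[t % 4] for t in range(12)]

forecast = holt_winters_add_mul(y, m=4, alpha=0.3, beta=0.1, gamma=0.2,
                                level0=98.0, trend0=2.0, seasonals0=season,
                                steps=4)
actual_next = [(100 + 2 * t) * season[t % 4] for t in range(12, 16)]
print(forecast)
print(actual_next)
```

Because the synthetic series follows the model exactly and the initial states are exact, the forecast should match the continuation of the series up to floating-point rounding. On real data such as the car sales series, the fit depends on how the initial states and smoothing parameters are estimated, which is the part `model.fit(optimized=True)` handles in the script above.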
Hi Sachin ~
Here is our output of 'df.index'.
DatetimeIndex(['1992-01-01', '1992-01-02', '1992-01-03', '1992-01-04',
'1992-01-05', '1992-01-06', '1992-01-07', '1992-01-08',
'1992-01-09', '1992-01-10',
...
'2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
'2019-01-11', '2019-01-12', '2020-01-01', '2020-01-02',
'2020-01-03', '2020-01-04'],
dtype='datetime64[ns]', name='DATE', length=340, freq=None)
Maybe the problem lies here.
Thanks,
Mark
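One possible reading of the index above, offered as a guess rather than a confirmed diagnosis: every date falls in January with the day running from 1 to 12, which is what you would see if the month and day fields were swapped during parsing. The sketch below uses only the standard library and a hypothetical cell value (`raw` is made up, not taken from the actual CSV) to show how the same string yields two different dates depending on the assumed field order.

```python
from datetime import datetime

# Hypothetical value from the DATE column (an assumption for illustration)
raw = "01-02-1992"

# Interpreted as mm-dd-yyyy: January 2nd
month_first = datetime.strptime(raw, "%m-%d-%Y")
# Interpreted as dd-mm-yyyy: February 1st
day_first = datetime.strptime(raw, "%d-%m-%Y")

print(month_first.date())  # 1992-01-02
print(day_first.date())    # 1992-02-01


def all_month_starts(dates):
    """Return True if every date falls on the first day of its month."""
    return all(d.day == 1 for d in dates)


# A monthly series should pass this check; a swapped parse does not
print(all_month_starts([day_first]))    # True
print(all_month_starts([month_first]))  # False
```

If the file at hand does turn out to store the day first, passing `dayfirst=True` to `pd.read_csv` is one thing to try before setting `df.index.freq = 'MS'`.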