@sachinsdate
Created July 25, 2020 18:13
Holt-Winters Exponential Smoothing using Python and statsmodels
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing as HWES
#read the data file. The date column is expected to be in the mm-dd-yyyy format.
#infer_datetime_format is omitted: it is deprecated since pandas 2.0, where strict format inference is the default.
df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0, parse_dates=[0], index_col=[0])
df.index.freq = 'MS'
#plot the data
df.plot()
plt.show()
#split between the training and the test data sets. The last 12 periods form the test data
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]
#build and train the model on the training data
model = HWES(df_train, seasonal_periods=12, trend='add', seasonal='mul')
fitted = model.fit(optimized=True, use_brute=True)
#print out the training summary
print(fitted.summary())
#create an out-of-sample forecast for the next 12 steps beyond the final data point in the training data set
sales_forecast = fitted.forecast(steps=12)
#plot the training data, the test data and the forecast on the same plot
fig = plt.figure()
fig.suptitle('Retail Sales of Used Cars in the US (1992-2020)')
past, = plt.plot(df_train.index, df_train, 'b.-', label='Sales History')
future, = plt.plot(df_test.index, df_test, 'r.-', label='Actual Sales')
predicted_future, = plt.plot(df_test.index, sales_forecast, 'g.-', label='Sales Forecast')
plt.legend(handles=[past, future, predicted_future])
plt.show()
@MarkStoneLin

Hi Sachin ~
Here is our output of 'df.index'.

DatetimeIndex(['1992-01-01', '1992-01-02', '1992-01-03', '1992-01-04',
'1992-01-05', '1992-01-06', '1992-01-07', '1992-01-08',
'1992-01-09', '1992-01-10',
...
'2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
'2019-01-11', '2019-01-12', '2020-01-01', '2020-01-02',
'2020-01-03', '2020-01-04'],
dtype='datetime64[ns]', name='DATE', length=340, freq=None)

Maybe the problem lies here.

Thanks,
Mark
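
One possible reading of the index above, offered as a sketch rather than a confirmed diagnosis: the day runs from 01 to 12 while the month stays at 01, which is what you would see if a day-first file (dd-mm-yyyy) were parsed month-first. The snippet below uses a made-up four-row sample (the `SALES` column name, the values, and the `load_monthly` helper are all hypothetical) to show how an explicit `format` in `pd.to_datetime` changes the result, and how `pd.infer_freq` can check that the index is really monthly before `freq` is set to 'MS'.

```python
import io
import pandas as pd

# Hypothetical four-row sample standing in for the real CSV file.
# The SALES column name and the values are invented for illustration.
csv_text = (
    "DATE,SALES\n"
    "01-01-1992,100\n"
    "01-02-1992,110\n"
    "01-03-1992,120\n"
    "01-04-1992,130\n"
)

def load_monthly(text, date_format):
    """Read the sample and parse the index with an explicit date format."""
    df = pd.read_csv(io.StringIO(text), header=0, index_col=0)
    df.index = pd.to_datetime(df.index, format=date_format)
    return df

# Same text, two interpretations of the date column.
month_first = load_monthly(csv_text, "%m-%d-%Y")  # mm-dd-yyyy
day_first = load_monthly(csv_text, "%d-%m-%Y")    # dd-mm-yyyy

print(month_first.index)  # consecutive days in January 1992
print(day_first.index)    # first day of January through April 1992

# infer_freq reports the frequency the dates actually follow.
print(pd.infer_freq(month_first.index))
print(pd.infer_freq(day_first.index))
```

If the real file turns out to be day-first, passing the matching format (or `dayfirst=True` to `pd.read_csv`) should yield month-start dates, after which `df.index.freq = 'MS'` is expected to be accepted.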
