@sachinsdate
Created July 25, 2020 18:13
Holt-Winters Exponential Smoothing using Python and statsmodels
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing as HWES
#Read the data file. The date column is expected to be in the mm-dd-yyyy format.
df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
df.index.freq = 'MS'
#plot the data
df.plot()
plt.show()
#split the data into training and test sets. The last 12 periods form the test data
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]
#build and train the model on the training data
model = HWES(df_train, seasonal_periods=12, trend='add', seasonal='mul')
fitted = model.fit(optimized=True, use_brute=True)
#print out the training summary
print(fitted.summary())
#create an out-of-sample forecast for the next 12 steps beyond the final data point in the training data set
sales_forecast = fitted.forecast(steps=12)
#plot the training data, the test data and the forecast on the same plot
fig = plt.figure()
fig.suptitle('Retail Sales of Used Cars in the US (1992-2020)')
past, = plt.plot(df_train.index, df_train, 'b.-', label='Sales History')
future, = plt.plot(df_test.index, df_test, 'r.-', label='Actual Sales')
predicted_future, = plt.plot(df_test.index, sales_forecast, 'g.-', label='Sales Forecast')
plt.legend(handles=[past, future, predicted_future])
plt.show()
@sachinsdate
Author

Hi Ron,
Here is the output I get when I execute the following 2 statements:

df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
df.index
DatetimeIndex(['1992-01-01', '1992-02-01', '1992-03-01', '1992-04-01',
'1992-05-01', '1992-06-01', '1992-07-01', '1992-08-01',
'1992-09-01', '1992-10-01',
...
'2019-07-01', '2019-08-01', '2019-09-01', '2019-10-01',
'2019-11-01', '2019-12-01', '2020-01-01', '2020-02-01',
'2020-03-01', '2020-04-01'],
dtype='datetime64[ns]', name='DATE', length=340, freq=None)

df.index.freq = 'MS'
df.index
DatetimeIndex(['1992-01-01', '1992-02-01', '1992-03-01', '1992-04-01',
'1992-05-01', '1992-06-01', '1992-07-01', '1992-08-01',
'1992-09-01', '1992-10-01',
...
'2019-07-01', '2019-08-01', '2019-09-01', '2019-10-01',
'2019-11-01', '2019-12-01', '2020-01-01', '2020-02-01',
'2020-03-01', '2020-04-01'],
dtype='datetime64[ns]', name='DATE', length=340, freq='MS')

Can you compare this output with what you are seeing? It might give you some leads on your problem.

You could also try using the following statement for changing the frequency to Start of Month (MS):
df = df.asfreq('MS')

Best,
Sachin
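
The difference between the two suggestions above can be sketched on a toy index. This is a minimal illustration, not the gist's data: the dates and the `sales` column are made up. Assigning `freq` only labels an existing index after pandas checks that the dates conform, while `asfreq` builds a new month-start index and reindexes the data onto it.

```python
import pandas as pd

# Hypothetical toy index with month-start dates (not the gist's CSV).
idx = pd.DatetimeIndex(['2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01'])

# Assigning freq attaches a label; pandas first validates that the
# existing dates really fall on month starts.
idx.freq = 'MS'
freq_after_assignment = idx.freq.freqstr

# asfreq generates a fresh month-start range between the first and last
# dates and reindexes the rows onto it.
monthly = pd.DataFrame({'sales': [10.0, 12.0, 11.0, 13.0]}, index=idx)
monthly_asfreq = monthly.asfreq('MS')

print(freq_after_assignment)
print(monthly_asfreq)
```

When the dates already fall on month starts, both routes end up with the same data; they differ only when the index does not conform.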

@vedular

vedular commented Nov 27, 2020

Hi,
Here is the error I am getting.
df.index.freq = 'MS'

ValueError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\arrays\datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
892 if not np.array_equal(index.asi8, on_freq.asi8):
--> 893 raise ValueError
894 except ValueError as e:

ValueError:

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in
2 #df.index = pd.to_datetime(df.index)
3
----> 4 df.index.freq='MS'
5 #df.index.freq='MS'

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\datetimelike.py in freq(self, value)
98 def freq(self, value):
99 # validation is handled by _data setter
--> 100 self._data.freq = value
101
102 @property

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\arrays\datetimelike.py in freq(self, value)
829 if value is not None:
830 value = frequencies.to_offset(value)
--> 831 self._validate_frequency(self, value)
832
833 self._freq = value

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\arrays\datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
905 "Inferred frequency {infer} from passed values "
906 "does not conform to passed frequency {passed}".format(
--> 907 infer=inferred, passed=freq.freqstr
908 )
909 )

ValueError: Inferred frequency None from passed values does not conform to passed frequency MS

If I use df = df.asfreq('MS') instead,
the following statements produce a blank plot:
df.plot()
plt.show()

Is this due to the date field in the input file? I tried another dataset, and
df.index.freq = 'MS' works there.

Can you please let us know whether changing the input file will fix this?

Thanks
Ram
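
One plausible explanation for both symptoms, sketched on made-up data: if the dates in the index are not all month starts, assigning `freq = 'MS'` fails validation, and `asfreq('MS')` silently reindexes onto month starts and fills the rows it cannot match with NaN. A line plot of mostly-NaN data can look empty. The index values and the `sales` column below are hypothetical.

```python
import pandas as pd

# Hypothetical index whose dates are mostly NOT month starts.
bad_idx = pd.DatetimeIndex(['1992-01-01', '1992-01-02', '1992-01-03', '1993-01-01'])
bad = pd.DataFrame({'sales': [1.0, 2.0, 3.0, 4.0]}, index=bad_idx)

# Assigning a month-start frequency is validated against the dates and fails.
try:
    bad_idx.freq = 'MS'
    freq_error = None
except ValueError as exc:
    freq_error = str(exc)

# asfreq does not raise: it builds a month-start range from the first to the
# last date and keeps only the rows whose dates match it exactly.
reindexed = bad.asfreq('MS')

print(freq_error)
print(reindexed['sales'].isna().sum(), 'of', len(reindexed), 'rows are NaN')
```

Checking `df.isna().sum()` after `asfreq` is a quick way to see whether this is what happened.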

@MarkStoneLin

Hi Sachin,
Here is our output of df.index:

DatetimeIndex(['1992-01-01', '1992-01-02', '1992-01-03', '1992-01-04',
'1992-01-05', '1992-01-06', '1992-01-07', '1992-01-08',
'1992-01-09', '1992-01-10',
...
'2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
'2019-01-11', '2019-01-12', '2020-01-01', '2020-01-02',
'2020-01-03', '2020-01-04'],
dtype='datetime64[ns]', name='DATE', length=340, freq=None)

Maybe the problem lies here: the dates appear to have been parsed with the day and month swapped.

Thanks,
Mark
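
The index above runs through consecutive days of January rather than consecutive months, which is what a day/month swap during parsing would produce. A minimal sketch of one way to rule that out, assuming the file really stores dates as mm-dd-yyyy as the gist's comment says: parse the column with an explicit format instead of letting pandas guess. The inline CSV text and the `SALES` column name are hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical CSV excerpt standing in for the real file.
csv_text = "DATE,SALES\n01-01-1992,100\n02-01-1992,110\n03-01-1992,120\n"

df = pd.read_csv(io.StringIO(csv_text), header=0)

# Parse explicitly as month-day-year so '02-01-1992' becomes 1 February,
# not 2 January.
df['DATE'] = pd.to_datetime(df['DATE'], format='%m-%d-%Y')

# With correctly parsed month-start dates, asfreq keeps every row.
df = df.set_index('DATE').asfreq('MS')

print(df.index)
```

If the file actually uses a different date layout, the `format` string would need to change to match it.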
