
@sachinsdate
Created July 25, 2020 18:13
Holt-Winters Exponential Smoothing using Python and statsmodels
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing as HWES
#read the data file. The date column is expected to be in the mm-dd-yyyy format (pandas parses ambiguous dates month first by default).
df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0, parse_dates=[0], index_col=[0])
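#declare a month-start ('MS') frequency; pandas rejects this unless the parsed dates are regular month starts
df.index.freq = 'MS'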
#plot the data
df.plot()
plt.show()
#split the data into training and test sets. The last 12 periods form the test data
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]
#build and train the model on the training data
model = HWES(df_train, seasonal_periods=12, trend='add', seasonal='mul')
fitted = model.fit(optimized=True, use_brute=True)
#print out the training summary
print(fitted.summary())
#create an out-of-sample forecast for the next 12 steps beyond the final data point in the training data set
sales_forecast = fitted.forecast(steps=12)
#plot the training data, the test data and the forecast on the same plot
fig = plt.figure()
fig.suptitle('Retail Sales of Used Cars in the US (1992-2020)')
past, = plt.plot(df_train.index, df_train, 'b.-', label='Sales History')
future, = plt.plot(df_test.index, df_test, 'r.-', label='Actual Sales')
predicted_future, = plt.plot(df_test.index, sales_forecast, 'g.-', label='Sales Forecast')
plt.legend(handles=[past, future, predicted_future])
plt.show()
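To make `trend='add', seasonal='mul'` more concrete, here is a minimal sketch of the additive-trend, multiplicative-seasonal Holt-Winters recurrences in plain NumPy. It is not statsmodels' implementation: `fit(optimized=True)` estimates the smoothing parameters α, β, γ and the initial states, whereas this sketch assumes fixed, hand-picked parameters and a crude initialization from the first two seasons. The function name `holt_winters_add_mul` is hypothetical.

```python
import numpy as np


def holt_winters_add_mul(y, m, alpha, beta, gamma, h):
    """Hypothetical helper: additive-trend, multiplicative-seasonal Holt-Winters
    with FIXED smoothing parameters (no optimization, unlike statsmodels)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Crude initialization from the first two seasons (an assumption;
    # statsmodels estimates the initial states instead).
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] / level)  # one multiplicative factor per period

    for t in range(m, n):
        s_prev = season[t - m]  # factor from the same period one season ago
        prev_level, prev_trend = level, trend
        # level:  l_t = alpha * y_t / s_{t-m} + (1 - alpha) * (l_{t-1} + b_{t-1})
        level = alpha * y[t] / s_prev + (1 - alpha) * (prev_level + prev_trend)
        # trend:  b_t = beta * (l_t - l_{t-1}) + (1 - beta) * b_{t-1}
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        # season: s_t = gamma * y_t / (l_{t-1} + b_{t-1}) + (1 - gamma) * s_{t-m}
        season.append(gamma * y[t] / (prev_level + prev_trend) + (1 - gamma) * s_prev)

    # k steps ahead: (l_n + k * b_n) times the seasonal factor for that period
    last_season = season[-m:]
    return np.array([(level + k * trend) * last_season[(k - 1) % m]
                     for k in range(1, h + 1)])


# Usage: a series that repeats the same 6-period pattern has no trend,
# so the forecast simply reproduces the pattern.
pattern = np.array([8.0, 9.0, 10.0, 11.0, 12.0, 10.0])
forecast = holt_winters_add_mul(np.tile(pattern, 4), m=6, alpha=0.3, beta=0.1, gamma=0.2, h=6)
print(forecast)
```

The multiplicative seasonal factors scale the level-plus-trend, which is why `seasonal='mul'` suits series whose seasonal swings grow with the overall level.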

vedular commented Nov 27, 2020

Hi,
Here is the error I am getting.
df.index.freq = 'MS'

ValueError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\arrays\datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
892 if not np.array_equal(index.asi8, on_freq.asi8):
--> 893 raise ValueError
894 except ValueError as e:

ValueError:

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in
2 #df.index = pd.to_datetime(df.index)
3
----> 4 df.index.freq='MS'
5 #df.index.freq='MS'

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\datetimelike.py in freq(self, value)
98 def freq(self, value):
99 # validation is handled by _data setter
--> 100 self._data.freq = value
101
102 @property

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\arrays\datetimelike.py in freq(self, value)
829 if value is not None:
830 value = frequencies.to_offset(value)
--> 831 self._validate_frequency(self, value)
832
833 self._freq = value

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\arrays\datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
905 "Inferred frequency {infer} from passed values "
906 "does not conform to passed frequency {passed}".format(
--> 907 infer=inferred, passed=freq.freqstr
908 )
909 )

ValueError: Inferred frequency None from passed values does not conform to passed frequency MS

If I use df = df.asfreq('MS') instead, the following statements produce a blank plot:
df.plot()
plt.show()

Is this due to the date field in the input file? I tried another dataset, and there df.index.freq='MS' works.

Can you please let us know whether changing the input file will fix this?

Thanks
Ram
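Both symptoms have the same likely cause. `df.index.freq = 'MS'` only succeeds when the parsed dates already form a regular month-start sequence, while `df.asfreq('MS')` reindexes the data onto such a sequence and fills every date that is missing from the original index with NaN, which leaves almost nothing to draw. The sketch below reproduces both on a small, made-up index (an assumption: it mimics dates whose month number was parsed into the day field).

```python
import pandas as pd

# Made-up example: 24 monthly values whose month number was parsed into the
# day field, so "Feb 1992" became 1992-01-02, "Dec 1992" became 1992-01-12, etc.
bad_index = pd.to_datetime([f"{y}-01-{m:02d}" for y in (1992, 1993) for m in range(1, 13)])
df = pd.DataFrame({"sales": range(24)}, index=bad_index)

# pandas cannot infer a month-start frequency from these dates
print(pd.infer_freq(df.index))

# Declaring the frequency therefore fails validation, as in the traceback above
freq_rejected = False
try:
    df.index.freq = "MS"
except ValueError as err:
    freq_rejected = True
    print(err)

# asfreq() reindexes onto 1992-01-01, 1992-02-01, ..., 1993-01-01; only the two
# dates that happen to be month starts keep their values, the rest become NaN
regular = df.asfreq("MS")
print(len(regular), regular["sales"].notna().sum())
```

A line plot of a column that is mostly NaN has almost no consecutive points to connect, which is consistent with the blank plot described above.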

@MarkStoneLin

Hi Sachin ~
Here is our output of 'df.index'.

DatetimeIndex(['1992-01-01', '1992-01-02', '1992-01-03', '1992-01-04',
'1992-01-05', '1992-01-06', '1992-01-07', '1992-01-08',
'1992-01-09', '1992-01-10',
...
'2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
'2019-01-11', '2019-01-12', '2020-01-01', '2020-01-02',
'2020-01-03', '2020-01-04'],
dtype='datetime64[ns]', name='DATE', length=340, freq=None)

Maybe the problem lies here.

Thanks,
Mark
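The printed `DatetimeIndex` (1992-01-01, 1992-01-02, ..., 2019-01-12, 2020-01-01, ...) does look as if the month ended up in the day position. Under that assumption only, one repair is to rebuild each date from its year and its day field, as sketched below on a made-up stand-in index. The raw date format in the CSV on a given machine is not known here; if the file itself stores day-first dates, passing `dayfirst=True` to `pd.read_csv` (or an explicit `date_format=`, available in pandas 2.0 and later) at load time is the cleaner fix.

```python
import pandas as pd

# Made-up stand-in for the mis-parsed index: the month number sits in the day field
bad_index = pd.to_datetime([f"{y}-01-{m:02d}" for y in (1992, 1993) for m in range(1, 13)])
df = pd.DataFrame({"sales": range(24)}, index=bad_index)

# Rebuild each date as (year, month=old day field, day=1), i.e. the first of the month
fixed = pd.to_datetime(pd.DataFrame({"year": df.index.year,
                                     "month": df.index.day,
                                     "day": 1}))
df.index = pd.DatetimeIndex(fixed, name="DATE")

# The dates are now regular month starts, so the gist's frequency line succeeds
df.index.freq = "MS"
print(df.index[:3])
```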
