import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing as HWES
# Read the data file. The date column is expected to be in the mm-dd-yyyy format.
df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
# Mark the index as month-start ('MS') data. This raises a ValueError if the parsed dates are not all month starts.
df.index.freq = 'MS'
# Plot the data
df.plot()
plt.show()
# Split the data into training and test sets. The last 12 periods form the test set.
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]
# Build and train the model on the training data
model = HWES(df_train, seasonal_periods=12, trend='add', seasonal='mul')
fitted = model.fit(optimized=True, use_brute=True)
# Print the training summary
print(fitted.summary())
# Create an out-of-sample forecast for the 12 steps following the final data point in the training set
sales_forecast = fitted.forecast(steps=12)
# Plot the training data, the test data and the forecast on the same plot
fig = plt.figure()
fig.suptitle('Retail Sales of Used Cars in the US (1992-2020)')
past, = plt.plot(df_train.index, df_train, 'b.-', label='Sales History')
future, = plt.plot(df_test.index, df_test, 'r.-', label='Actual Sales')
predicted_future, = plt.plot(df_test.index, sales_forecast, 'g.-', label='Sales Forecast')
plt.legend(handles=[past, future, predicted_future])
plt.show()
Hi Ron,
Here is the output I get when I execute the following 2 statements:
df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
df.index
DatetimeIndex(['1992-01-01', '1992-02-01', '1992-03-01', '1992-04-01',
'1992-05-01', '1992-06-01', '1992-07-01', '1992-08-01',
'1992-09-01', '1992-10-01',
...
'2019-07-01', '2019-08-01', '2019-09-01', '2019-10-01',
'2019-11-01', '2019-12-01', '2020-01-01', '2020-02-01',
'2020-03-01', '2020-04-01'],
dtype='datetime64[ns]', name='DATE', length=340, freq=None)
df.index.freq = 'MS'
df.index
DatetimeIndex(['1992-01-01', '1992-02-01', '1992-03-01', '1992-04-01',
'1992-05-01', '1992-06-01', '1992-07-01', '1992-08-01',
'1992-09-01', '1992-10-01',
...
'2019-07-01', '2019-08-01', '2019-09-01', '2019-10-01',
'2019-11-01', '2019-12-01', '2020-01-01', '2020-02-01',
'2020-03-01', '2020-04-01'],
dtype='datetime64[ns]', name='DATE', length=340, freq='MS')
Can you compare this output with what you are seeing? It might give you some leads on your problem.
You could also try using the following statement for changing the frequency to Start of Month (MS):
df = df.asfreq('MS')
Best,
Sachin
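For anyone comparing their own index against the output above, here is a minimal sketch of the same check on made-up data. The dates and variable names are illustrative assumptions, not values read from the CSV. It uses `pd.infer_freq` to see what spacing pandas detects before the frequency is assigned:

```python
import pandas as pd

# Illustrative month-start dates, similar in shape to the index printed above.
good = pd.DatetimeIndex(["1992-01-01", "1992-02-01", "1992-03-01", "1992-04-01"])
# Illustrative dates that do not all fall on the first of a month.
bad = pd.DatetimeIndex(["1992-01-01", "1992-01-02", "1992-01-03", "1993-01-01"])

good_inferred = pd.infer_freq(good)  # regular month-start spacing is detected
bad_inferred = pd.infer_freq(bad)    # no regular spacing is detected

# Accepted, because every value sits on the month-start grid.
good.freq = "MS"

# Rejected, because the values do not conform to 'MS'.
try:
    bad.freq = "MS"
    bad_accepted = True
except ValueError as err:
    bad_accepted = False
    print(err)
```

If `pd.infer_freq(df.index)` returns `None` on your data, the assignment `df.index.freq = 'MS'` will probably fail in the same way, and the parsed dates are the first thing to inspect.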
Hi,
Here is the error I am getting.
df.index.freq = 'MS'
ValueError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\arrays\datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
892 if not np.array_equal(index.asi8, on_freq.asi8):
--> 893 raise ValueError
894 except ValueError as e:
ValueError:
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
in
2 #df.index = pd.to_datetime(df.index)
3
----> 4 df.index.freq='MS'
5 #df.index.freq='MS'
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\datetimelike.py in freq(self, value)
98 def freq(self, value):
99 # validation is handled by _data setter
--> 100 self._data.freq = value
101
102 @property
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\arrays\datetimelike.py in freq(self, value)
829 if value is not None:
830 value = frequencies.to_offset(value)
--> 831 self._validate_frequency(self, value)
832
833 self._freq = value
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\arrays\datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
905 "Inferred frequency {infer} from passed values "
906 "does not conform to passed frequency {passed}".format(
--> 907 infer=inferred, passed=freq.freqstr
908 )
909 )
ValueError: Inferred frequency None from passed values does not conform to passed frequency MS
If I use df = df.asfreq('MS'),
then the following statements give me a blank plot. Nothing is shown.
df.plot()
plt.show()
Is this due to the date field in the input file? I tried another dataset, and there
df.index.freq = 'MS' works.
Can you please let us know whether changing the input file will fix this?
Thanks
Ram
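One plausible explanation for the blank plot, sketched on made-up data (the dates, column name and values below are assumptions for illustration only): `asfreq('MS')` reindexes the frame onto a month-start grid between the first and last date, so any row whose date is not a month start is dropped and every missing month is filled with NaN.

```python
import pandas as pd

# Illustrative index where the dates fall on days 1-12 of January in two years,
# as would happen if day and month were swapped while parsing.
dates = [f"{year}-01-{day:02d}" for year in (1992, 1993) for day in range(1, 13)]
df = pd.DataFrame(
    {"sales": range(len(dates))},
    index=pd.DatetimeIndex(dates, name="DATE"),
)

# Reindex onto the month-start grid spanning the first to the last date.
on_grid = df.asfreq("MS")

n_rows = len(on_grid)                        # one row per month start in the range
n_valid = int(on_grid["sales"].notna().sum())  # rows that matched an original date
print(on_grid)
```

In this sketch only the two January 1st rows survive and the rest are NaN. A line plot of isolated points separated by NaN has no segments to draw, which would look like an empty chart.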
Hi Sachin ~
Here is our output of 'df.index'.
DatetimeIndex(['1992-01-01', '1992-01-02', '1992-01-03', '1992-01-04',
'1992-01-05', '1992-01-06', '1992-01-07', '1992-01-08',
'1992-01-09', '1992-01-10',
...
'2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
'2019-01-11', '2019-01-12', '2020-01-01', '2020-01-02',
'2020-01-03', '2020-01-04'],
dtype='datetime64[ns]', name='DATE', length=340, freq=None)
Maybe the problem lies here.
Thanks,
Mark
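The index above contains 1992-01-02 where the author's output contains 1992-02-01, which is what a day/month swap during parsing would produce. A hedged sketch of one way to remove the ambiguity: read the date column as text and convert it with an explicit format string. The in-memory CSV, its `SALES` column and the choice of `%d-%m-%Y` are assumptions for illustration. Use whichever format string matches how the dates are actually written in your copy of the file.

```python
import io

import pandas as pd

# Hypothetical three-row CSV standing in for the real file.
csv_text = "DATE,SALES\n01-01-1992,100\n01-02-1992,110\n01-03-1992,120\n"

# Read the date column as plain strings so nothing is guessed.
raw = pd.read_csv(io.StringIO(csv_text), dtype={"DATE": str})

# The same strings give different dates depending on the format string.
month_first = pd.to_datetime(raw["DATE"], format="%m-%d-%Y")  # Jan 1, Jan 2, Jan 3
day_first = pd.to_datetime(raw["DATE"], format="%d-%m-%Y")    # Jan 1, Feb 1, Mar 1

# Build the frame with the interpretation that yields month starts.
df = raw.drop(columns="DATE")
df.index = pd.DatetimeIndex(day_first, name="DATE")

# This now succeeds because every date is the first of a month.
df.index.freq = "MS"
print(df.index)
```

Only one of the two interpretations lands on month starts, so checking `pd.infer_freq` on each result is a quick way to tell which format the file uses.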
Hi Sachin ~ I was trying to walk through this (I'm a noob) and everything was fine except the line df.index.freq = 'MS'. I'm getting the error message "Inferred frequency None from passed values does not conform to passed frequency MS".
I've tried troubleshooting on my end, but nothing seems to work. Obviously your code as written works, so it must be something on my side, but I don't know where to look.
Thank you for posting this out there, and for helping out.
Ron