library(forecast)
fc <- function(y, h, xreg, newxreg) {
  fit <- auto.arima(y, xreg=xreg)
  forecast(fit, xreg=newxreg, h=h)
}
y <- ts(rnorm(100))
x <- matrix(ts(rnorm(100)),ncol=1)
tsCV(y, fc, xreg=x, h=1)
robjhyndman
commented
May 19, 2021
- Using a window works with this code. Just set window=20 (or whatever value you like).
- Why should it be small? The window is the amount of data in each training set. If it is too small you won't have enough data to fit a model. I normally do not use a window as I want to use as much training data as possible. You would have to define what you mean by optimal. It is not obvious how to define optimal in this context.
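A minimal sketch of the windowed call, reusing the fc() function from the gist. The seed, the simulated data, the window of 20 and the rmse() helper are illustrative assumptions, not recommendations.

```r
library(forecast)

# Forecast function from the gist: fit on the training slice,
# then forecast using the future values of the regressor.
fc <- function(y, h, xreg, newxreg) {
  fit <- auto.arima(y, xreg = xreg)
  forecast(fit, xreg = newxreg, h = h)
}

set.seed(1)  # assumption: fixed seed so the run is repeatable
y <- ts(rnorm(100))
x <- matrix(rnorm(100), ncol = 1)

# Expanding training set (no window) versus a rolling window of 20 observations
e_expanding <- tsCV(y, fc, xreg = x, h = 1)
e_rolling   <- tsCV(y, fc, xreg = x, h = 1, window = 20)

# Hypothetical helper: cross-validated RMSE, ignoring the NA entries
rmse <- function(e) sqrt(mean(e^2, na.rm = TRUE))
c(expanding = rmse(e_expanding), rolling = rmse(e_rolling))

# Origins with fewer than `window` observations are never fitted, so they stay NA
sum(is.na(e_rolling))
```

Comparing the two RMSE values is one way to judge a window size, though on pure noise like this the difference means little.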
Thanks for your fast response once again Rob, greatly appreciated!
The rolling window does indeed work; it seems I set my value too low, which caused the NAs to occur.
To give more context on the window size: my objective is to make my forecast accuracy as high as possible. I am afraid that if I make the window too large, the model fit will become increasingly good but the future forecasts might get worse. In my case, an optimal window would be one that optimizes forecast accuracy.
For context, I am modelling the number of hospital appointments on a weekly basis, and my goal is to predict the number of appointments in the next week(s) with high accuracy.
Kind regards
Increased training data does not lead to overfitting.
Hi Rob,
For these 2 functions:
arima_xreg <- function(x, h, xreg, newxreg) {
forecast(Arima(x, order = c(1, 1, 0), xreg = xreg), xreg = newxreg)
}
tsCV(Y, arima_xreg, xreg = X_all)
The call to arima_xreg in tsCV has an xreg argument but no newxreg argument. Does tsCV automatically split the xreg data (when supplied) in the same way that Y is split, and pass the two pieces to arima_xreg as xreg and newxreg?
Complicated but clever if so!
Appreciate the help.
Yes, the data splitting is all handled within tsCV()
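To see the splitting in action, here is a small sketch in which the forecast function records the size of each piece it receives. The `seen` list, the simulated Y and X_all, and the settings h = 2 and initial = 10 are assumptions made for illustration.

```r
library(forecast)

seen <- list()  # hypothetical log of what tsCV() passes in at each origin

arima_xreg <- function(x, h, xreg, newxreg) {
  # Record the sizes before fitting, so a failed fit is still logged
  seen[[length(seen) + 1]] <<- c(n_train = length(x),
                                 n_xreg  = NROW(xreg),
                                 n_new   = NROW(newxreg))
  forecast(Arima(x, order = c(1, 1, 0), xreg = xreg), xreg = newxreg)
}

set.seed(42)
Y <- ts(cumsum(rnorm(40)))
X_all <- matrix(rnorm(40), ncol = 1)

e <- tsCV(Y, arima_xreg, xreg = X_all, h = 2, initial = 10)

# One row per forecast origin: training length, rows of xreg, rows of newxreg
head(do.call(rbind, seen), 3)
```

Each row should show xreg with as many rows as the training series and newxreg with h rows, which is why only xreg needs to be supplied to tsCV().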
Thanks for the quick response!
Hi Rob,
I have 2 data sets, one from the intraday electricity market (dependent variable) and one from the day-ahead electricity market (explanatory variable). I also noticed that there is seasonality, so I included Fourier terms to deal with it. To find the appropriate ARIMA model I used the auto.arima function. I have also split my data set into training and test sets. The training set contains around 14k observations and the test set around 3.5k observations. The results from the code below suggest using regression with ARIMA(5,0,2) errors and Fourier terms with K = 7.
bestfit <- list(aicc=Inf)
for(i in 1:25) {
  fit <- auto.arima(elbas_training, xreg=cbind(as.matrix(ts_elspot),fourier(gas, K=i)), seasonal=FALSE)
  print(i)
  if(fit$aicc < bestfit$aicc)
    bestfit <- fit
  else break;
}
My question concerns the following code. Which time series should I pass to the tsCV function as "y"? Should it be the whole data set (training + test), the time series that I used for training the model, or the test set, if I want to forecast the next 3 periods?
arima_xreg <- function(x, h, xreg, newxreg) {
  forecast(Arima(x, order = c(5, 0, 2), xreg = xreg), xreg = newxreg)
}
tsCV(Y, arima_xreg, xreg = X_all, h=3)
First, here is how to do it with auto.arima(), to answer the previous question now deleted.
library(forecast)
fc <- function(y, h, xreg, newxreg) {
fit <- auto.arima(y, xreg=xreg)
forecast(fit, xreg=newxreg, h=h)
}
y <- ts(rnorm(100))
x <- matrix(ts(rnorm(100)),ncol=1)
tsCV(y, fc, xreg=x, h=1)
Second, tsCV() does its own training and test splits. If you want to do a specific training and test split, don't use tsCV(). See https://otexts.com/fpp2/accuracy.html
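For a fixed split, a sketch along the following lines may help. It fits on the training part, forecasts the test part using the test-period regressor values, and scores the result with accuracy(). The simulated data, the 100/20 split and the ARIMA(1,0,0) order are assumptions chosen only to keep the example small.

```r
library(forecast)

set.seed(123)
n <- 120
# Simulated regressor and response (assumption: stand-ins for real data)
x <- matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "x"))
y <- ts(2 * x[, 1] + as.numeric(arima.sim(list(ar = 0.5), n = n)))

# One fixed split: first 100 observations for training, last 20 for testing
y_train <- subset(y, end = 100)
y_test  <- subset(y, start = 101)
x_train <- x[1:100, , drop = FALSE]
x_test  <- x[101:120, , drop = FALSE]

# Fit on the training set only, then forecast over the test period
fit <- Arima(y_train, order = c(1, 0, 0), xreg = x_train)
fc  <- forecast(fit, xreg = x_test)

# Training-set and test-set accuracy measures in one table
accuracy(fc, y_test)
```

The forecast horizon here is set by the number of rows in x_test, so no h argument is passed.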
Thanks a lot for your efforts to provide this excellent function. I have tried to apply tsCV to multiple regression (one response and a number of predictors), but unfortunately it didn't work out; I just got "NA". Could you please help me? Thanks in advance.
Please post a reproducible example at http://rstd.io/forecast-package
Thank you so much for the quick response. Sorry, something went wrong when I tried to use the RStudio Community, as this is my first time using it. The problem is simply that I used your code from the example with exogenous predictors, but with the lm function instead of ARIMA, and I got NA.
x1 <- c(23, 43, 42, 65, 20, 23, 20, 16, 14, 72, 12, 14, 22, 14, 98, 11, 20, 22, 24, 44)
X2 <- c(27, 31, 40, 41, 40, 45, 41, 35, 36, 27, 25, 27, 37, 31, 29, 90, 37, 41, 24, 56)
y <- c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 66, 44, 76, 88, 99, 55, 33)
X_matrix <- cbind(X1, X2)
fu <- function(x, h, xreg, newxreg) {
forecast(lm(formula,data, xreg = xreg), xreg = newxreg)
}
e<-tsCV(y, fu, xreg = X_matrix)
For a linear model, use tslm, not lm.
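As a debugging aid for all-NA results: tsCV() wraps each call to the forecast function in try(), so an error inside the function shows up only as NA in the output. The sketch below uses a deliberately broken function (`bad`, a hypothetical name) to show the effect, then calls it directly to surface the message.

```r
library(forecast)

# Hypothetical forecast function that fails on every call
bad <- function(y, h, ...) stop("this forecast function always fails")

set.seed(1)
y <- ts(rnorm(20))

# tsCV() catches the error at every origin, so the result is all NA
e <- tsCV(y, bad, h = 1)
all(is.na(e))

# Calling the function directly on one training slice reveals the real error
res <- try(bad(subset(y, end = 10), h = 1), silent = TRUE)
cat(res)
```

Running your own forecast function this way on a single slice, outside tsCV(), usually makes the cause of the NAs visible.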
Dear Rob,
I am doing multi-step-ahead ex-ante forecasts with a model with explanatory variables. For instance, for a 7-days-ahead forecast of Y, I need 1- to 7-days-ahead forecasts of X to be used recursively in the 1- to 7-days-ahead forecasts of Y. I use a modified version of tsCV that accepts xreg_actual and xreg_forecast, with the following code:
far <- function(x, h, xreg, newxreg) {
forecast(Arima(x, order=c(0,0,0), xreg=xreg), xreg=newxreg)
}
xreg_actual <- data.frame(X1=X1_actual)
xreg_forecast <- data.frame(X1=X1_onestep, X1=X1_twostep,....., X1=X7_sevenstep)
So xreg_actual includes the actual values of X1, and xreg_forecast includes X1_onestep to X7_sevenstep, which are forecast values of X1 for the validation and test period, obtained from another tsCV run. What I expect is that each column of the xreg_forecast data frame is used to forecast y at the matching horizon, from one to seven steps ahead. For example, if h=7, tsCV starts from h=1 and works recursively up to h=7, and I want X1_onestep to be used for h=1, X2_twostep for h=2, and so on.
So, I made the following loop:
for (i in 1:ncol(xreg_forecast)) {
  error[[i]] <- data.frame(mytsCV(y, far, h=7, window = 365, initial =10, xreg_actual=xreg_actual,
                                  xreg_forecast=xreg_forecast[[i]]))
}
The loop generates the errors as it should; however, I am not sure whether it is actually doing what I intended.
I am also not sure how to extend the loop to models with more than one explanatory variable.
Any help is highly appreciated.
Thanks
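One possible way to check the horizon matching is to write the rolling-origin loop by hand instead of going through a modified tsCV. The sketch below is not the mytsCV function from the question. It assumes a hypothetical layout in which x_fc[t, j] holds the j-step-ahead forecast of X1 for time t, and it simulates those forecasts as the actual values plus noise. The data, h = 3 and the first origin of 30 are also illustrative.

```r
library(forecast)

set.seed(7)
n <- 60
h <- 3
x_actual <- rnorm(n)
y <- ts(1.5 * x_actual + rnorm(n))

# Assumed layout: x_fc[t, j] = j-step-ahead forecast of X1 for time t
# (simulated here as the actual value plus noise that grows with j)
x_fc <- sapply(1:h, function(j) x_actual + rnorm(n, sd = 0.2 * j))

initial <- 30                       # first forecast origin (assumption)
e <- matrix(NA_real_, nrow = n, ncol = h)

for (i in initial:(n - h)) {
  # Train on actual regressor values up to the origin i
  xtrain <- matrix(x_actual[1:i], ncol = 1, dimnames = list(NULL, "X1"))
  fit <- Arima(y[1:i], order = c(0, 0, 0), xreg = xtrain)

  # Horizon j uses the j-step forecast of X1 for time i + j
  newx <- matrix(x_fc[cbind(i + 1:h, 1:h)], ncol = 1,
                 dimnames = list(NULL, "X1"))
  fc <- forecast(fit, xreg = newx)

  e[i, ] <- y[i + 1:h] - fc$mean    # errors for horizons 1..h
}

# Mean squared error by horizon
colMeans(e^2, na.rm = TRUE)
```

With more than one explanatory variable, the same idea would need one forecast matrix per variable, with newx built by binding one column per variable.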