@robjhyndman
Last active July 14, 2022 10:00
tsCV with xreg
library(forecast)
fc <- function(y, h, xreg, newxreg) {
  fit <- auto.arima(y, xreg=xreg)
  forecast(fit, xreg=newxreg, h=h)
}
y <- ts(rnorm(100))
x <- matrix(ts(rnorm(100)),ncol=1)
tsCV(y, fc, xreg=x, h=1)
@pgbutler

Hi Rob,

For these 2 functions:

arima_xreg <- function(x, h, xreg, newxreg) {
forecast(Arima(x, order = c(1, 1, 0), xreg = xreg), xreg = newxreg)
}

tsCV(Y, arima_xreg, xreg = X_all)

The call to tsCV passes an xreg argument but no newxreg argument. Does tsCV automatically split the xreg data (when supplied) in the same way that Y is split, and then pass the two parts to arima_xreg as xreg and newxreg?

Complicated but clever if so!

Appreciate the help.

@robjhyndman
Author

Yes, the data splitting is all handled within tsCV()
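
To make the splitting concrete, here is a minimal base-R sketch of a rolling-origin scheme of the kind tsCV() applies. The helper `rolling_splits()` is hypothetical (it is not part of the forecast package) and assumes an expanding training window with no `window` argument: for each forecast origin i, rows 1..i of y and xreg are used for fitting, and rows i+1..i+h of xreg are passed on as newxreg.

```r
# Hypothetical helper, not part of the forecast package: list the row
# indices that a rolling-origin scheme would use at each forecast origin.
rolling_splits <- function(n, h = 1, initial = 0) {
  origins <- seq(1 + initial, n - 1)
  lapply(origins, function(i) {
    list(
      train  = seq_len(i),      # rows of y and xreg used to fit the model
      future = i + seq_len(h)   # rows of xreg that would be passed as newxreg
    )
  })
}

splits <- rolling_splits(n = 5, h = 1)
splits[[1]]  # first origin: train on row 1, forecast row 2
splits[[4]]  # last origin: train on rows 1 to 4, forecast row 5
```

With h > 1, the future rows near the end of the series run past the last observation, so no forecast errors can be computed for those horizons.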

@pgbutler

Thanks for the quick response!

@markolucic97

Hi Rob,

I have 2 data sets, one from the intraday electricity market (dependent variable) and one from the day-ahead electricity market (explanatory variable). I also noticed that there is seasonality, so I included Fourier terms to deal with it. To find the appropriate ARIMA model I used the auto.arima function. I have also split my data set into training and test sets. The training set contains around 14k observations and the test set around 3.5k observations. The results from the code below suggest using regression with ARIMA(5,0,2) errors and Fourier terms with K = 7.

bestfit <- list(aicc=Inf)
for(i in 1:25) {
  fit <- auto.arima(elbas_training, xreg=cbind(as.matrix(ts_elspot),fourier(gas, K=i)), seasonal=FALSE)
  print(i)
  if(fit$aicc < bestfit$aicc) bestfit <- fit else break;
}

My question concerns the following code. Which time series should I pass to tsCV as "y"? Should it be the whole data set (training + test), the time series that I used for training the model, or the test set, if I want to forecast the next 3 periods?

arima_xreg <- function(x, h, xreg, newxreg) {
forecast(Arima(x, order = c(5, 0, 2), xreg = xreg), xreg = newxreg)
}

tsCV(Y, arima_xreg, xreg = X_all, h=3)

@robjhyndman
Author

robjhyndman commented Aug 11, 2021

First, here is how to do it with auto.arima(), to answer the previous question now deleted.

library(forecast)

fc <- function(y, h, xreg, newxreg) {
  fit <- auto.arima(y, xreg=xreg)
  forecast(fit, xreg=newxreg, h=h)
}

y <- ts(rnorm(100))
x <- matrix(ts(rnorm(100)),ncol=1)
tsCV(y, fc, xreg=x, h=1)

@robjhyndman
Author

Second, tsCV() does its own training and test splits. If you want to do a specific training and test split, don't use tsCV(). See https://otexts.com/fpp2/accuracy.html
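
For a fixed training and test split with a regressor, a minimal sketch might look like the following. The simulated data, the 80/20 split point, the column name x1 and the ARIMA order c(1, 0, 0) are all illustrative assumptions, not values taken from this thread.

```r
library(forecast)

set.seed(1)
y <- ts(rnorm(100))
x <- matrix(rnorm(100), ncol = 1, dimnames = list(NULL, "x1"))

# Illustrative split: first 80 observations for training, last 20 for testing
y_train <- window(y, end = 80)
y_test  <- window(y, start = 81)
x_train <- x[1:80, , drop = FALSE]
x_test  <- x[81:100, , drop = FALSE]

# Fit on the training data only, then forecast over the test period
# using the test-period values of the regressor
fit <- Arima(y_train, order = c(1, 0, 0), xreg = x_train)
fc  <- forecast(fit, xreg = x_test)

# Training-set and test-set accuracy measures
accuracy(fc, y_test)
```

The forecast horizon is taken from the number of rows in x_test, so no separate h is given here.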

@khmahmood

Thanks a lot for your efforts to provide this excellent function. I have tried to apply tsCV to multiple regression (one response and a number of predictors), but unfortunately it didn't work: I just got "NA". Could you please help me? Thanks in advance.

@robjhyndman
Author

Please post a reproducible example at http://rstd.io/forecast-package

@khmahmood

Thank you so much for the quick response. Sorry, something went wrong when I tried to use the RStudio Community, as this is my first time using it. The problem is simply that I used your code from the example with exogenous predictors, but with the lm function instead of ARIMA, and I got NA.

X1 <- c(23, 43, 42, 65, 20, 23, 20, 16, 14, 72, 12, 14, 22, 14, 98, 11, 20, 22, 24, 44)
X2 <- c(27, 31, 40, 41, 40, 45, 41, 35, 36, 27, 25, 27, 37, 31, 29, 90, 37, 41, 24, 56)
y <- c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 66, 44, 76, 88, 99, 55, 33)
X_matrix <- cbind(X1, X2)

fu <- function(x, h, xreg, newxreg) {
forecast(lm(formula,data, xreg = xreg), xreg = newxreg)
}

e<-tsCV(y, fu, xreg = X_matrix)

@robjhyndman
Author

For a linear model, use tslm, not lm.

@Niloojan

Niloojan commented Jul 1, 2022

Dear Rob,
I am doing multi-step-ahead ex-ante forecasting with a model that has explanatory variables. For instance, for a 7-day-ahead forecast of Y, I need 1- to 7-day-ahead forecasts of X to use recursively in the 1- to 7-day-ahead forecasts of Y. I use a modified version of tsCV that accepts xreg_actual and xreg_forecast, together with this code:

far <- function(x, h, xreg, newxreg) {
forecast(Arima(x, order=c(0,0,0), xreg=xreg), xreg=newxreg)
}

xreg_actual <- data.frame(X1=X1_actual)
xreg_forecast <- data.frame(X1=X1_onestep, X1=X1_twostep,....., X1=X7_sevenstep)

So xreg_actual contains the actual values of X1, and xreg_forecast contains X1_onestep to X7_sevenstep, which are forecast values of X1 for the validation and test period, obtained from another tsCV run. What I expect is that each element of the xreg_forecast data frame is used in forecasting y from one to seven steps ahead. For example, if h=7, tsCV starts from h=1 and then recursively reaches h=7, and I want X1_onestep to be used for h=1, X2_twostep for h=2, and so on.
So, I made the following loop:

for (i in 1:ncol(xreg_forecast)) {
error[[i]]<- data.frame(mytsCV(y, far, h=7, window = 365, initial =10, xreg_actual=xreg_actual,
xreg_forecast=xreg_forecast[[i]]))
}

The loop generates the errors as it should; however, I am not sure whether it is actually doing what I intended.
I'm also not sure how to extend the loop to models with more than one explanatory variable.
Any help is highly appreciated.
Thanks
