library(forecast)
fc <- function(y, h, xreg, newxreg) {
  fit <- auto.arima(y, xreg=xreg)
  forecast(fit, xreg=newxreg, h=h)
}
y <- ts(rnorm(100))
x <- matrix(ts(rnorm(100)),ncol=1)
tsCV(y, fc, xreg=x, h=1)
Hi Rob,
For these 2 functions:
arima_xreg <- function(x, h, xreg, newxreg) {
forecast(Arima(x, order = c(1, 1, 0), xreg = xreg), xreg = newxreg)
}
tsCV(Y, arima_xreg, xreg = X_all)
The call to arima_xreg in tsCV has an xreg argument but no newxreg argument. Does tsCV automatically split the xreg data (when supplied) in the same way that Y is split, and then pass the two pieces to arima_xreg as xreg and newxreg?
Complicated but clever if so!
Appreciate the help.
Yes, the data splitting is all handled within tsCV()
Thanks for the quick response!
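To make the splitting concrete, here is a simplified sketch of a rolling-origin loop with a regressor. It is not the forecast package's source code; it uses only base R (stats::arima and predict) and hypothetical names (y_train, xreg_train, newxreg, errors). The starting origin of 10 is an assumption chosen so that every fit has enough data.

```r
# Simplified illustration of rolling-origin splitting with a regressor.
# NOT the actual tsCV() implementation; base R only.
set.seed(1)
n <- 30
h <- 1
x <- matrix(rnorm(n), ncol = 1)   # one explanatory variable
y <- ts(2 * x[, 1] + rnorm(n))    # response depends on x

errors <- rep(NA_real_, n)
for (i in 10:(n - h)) {           # start at 10 (assumption) to have enough data
  y_train    <- y[1:i]                           # response up to origin i
  xreg_train <- x[1:i, , drop = FALSE]           # regressors aligned with y_train
  newxreg    <- x[(i + 1):(i + h), , drop = FALSE]  # regressors for the forecast period
  fit  <- arima(y_train, order = c(0, 0, 0), xreg = xreg_train)
  pred <- predict(fit, n.ahead = h, newxreg = newxreg)$pred
  errors[i] <- y[i + h] - pred[h]                # h-step forecast error at origin i
}
print(errors)
```

At each origin i, the regressor rows 1 to i play the role of xreg and rows i+1 to i+h play the role of newxreg, which is why the user only supplies the full xreg matrix.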
Hi Rob,
I have 2 data sets, one from the intraday electricity market (dependent variable) and one from the day-ahead electricity market (explanatory variable). I also noticed that there is seasonality, so I included Fourier terms to deal with it. To find the appropriate ARIMA model I used the auto.arima function. I have also split my data set into training and test sets. The training set contains around 14k observations and the test set around 3.5k observations. The results from the code below suggest using regression with ARIMA(5,0,2) errors and Fourier terms with K = 7.
bestfit <- list(aicc=Inf)
for(i in 1:25) {
  fit <- auto.arima(elbas_training, xreg=cbind(as.matrix(ts_elspot),fourier(gas, K=i)), seasonal=FALSE)
  print(i)
  if(fit$aicc < bestfit$aicc)
    bestfit <- fit
  else break;
}
My question concerns the following code. Which time series should I pass to the tsCV function as "y"? Should it be the whole data set (training + test), the time series that I used for training the model, or the test set, if I want to forecast the next 3 periods?
arima_xreg <- function(x, h, xreg, newxreg) {
  forecast(Arima(x, order = c(5, 0, 2), xreg = xreg), xreg = newxreg)
}
tsCV(Y, arima_xreg, xreg = X_all, h=3)
First, here is how to do it with auto.arima(), to answer the previous question now deleted.
library(forecast)
fc <- function(y, h, xreg, newxreg) {
fit <- auto.arima(y, xreg=xreg)
forecast(fit, xreg=newxreg, h=h)
}
y <- ts(rnorm(100))
x <- matrix(ts(rnorm(100)),ncol=1)
tsCV(y, fc, xreg=x, h=1)
Second, tsCV() does its own training and test splits. If you want to do a specific training and test split, don't use tsCV(). See https://otexts.com/fpp2/accuracy.html
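As a rough sketch of the fixed-split alternative, the code below fits on a training window and scores the forecasts on a held-out window with accuracy(). The simulated data, the 80/20 split point, and the column name "x1" are assumptions for illustration only.

```r
library(forecast)

# Simulated data (assumption): 100 observations, one regressor
set.seed(123)
y <- ts(rnorm(100))
x <- matrix(rnorm(100), ncol = 1, dimnames = list(NULL, "x1"))

# Fixed split: first 80 observations for training, last 20 for testing
y_train <- window(y, end = 80)
y_test  <- window(y, start = 81)
x_train <- x[1:80, , drop = FALSE]
x_test  <- x[81:100, , drop = FALSE]

fit <- auto.arima(y_train, xreg = x_train)
fc  <- forecast(fit, xreg = x_test)   # horizon is the number of rows in x_test

# Training-set and test-set accuracy measures
acc <- accuracy(fc, y_test)
print(acc)
```

Here the split is chosen once by the user, whereas tsCV() re-fits the model at every forecast origin.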
Thanks a lot for your efforts in providing this excellent function. I have tried to apply tsCV to a multiple regression (one response and a number of predictors), but unfortunately it didn't work; I just got "NA". Could you please help me? Thanks in advance.
Please post a reproducible example at http://rstd.io/forecast-package
Thank you so much for the quick response. Sorry, something went wrong when I tried to use RStudio Community, as this is my first time using it. The problem is simply that I used your code for the example with exogenous predictors, but with the lm function instead of ARIMA, and I got NA.
X1 <- c(23, 43, 42, 65, 20, 23, 20, 16, 14, 72, 12, 14, 22, 14, 98, 11, 20, 22, 24, 44)
X2 <- c(27, 31, 40, 41, 40, 45, 41, 35, 36, 27, 25, 27, 37, 31, 29, 90, 37, 41, 24, 56)
y <- c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 66, 44, 76, 88, 99, 55, 33)
X_matrix <- cbind(X1, X2)
fu <- function(x, h, xreg, newxreg) {
forecast(lm(formula,data, xreg = xreg), xreg = newxreg)
}
e<-tsCV(y, fu, xreg = X_matrix)
For a linear model, use tslm, not lm.
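As a minimal sketch of the tslm interface, the code below fits a time series linear model to the data posted above and forecasts the last five observations from held-out predictor values. The 15/5 split and the object names (dat, train, future) are assumptions; this shows a single fixed split and is not wired into tsCV().

```r
library(forecast)

# Data from the post above
X1 <- c(23, 43, 42, 65, 20, 23, 20, 16, 14, 72, 12, 14, 22, 14, 98, 11, 20, 22, 24, 44)
X2 <- c(27, 31, 40, 41, 40, 45, 41, 35, 36, 27, 25, 27, 37, 31, 29, 90, 37, 41, 24, 56)
y  <- c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 66, 44, 76, 88, 99, 55, 33)

# tslm() expects time series data, so bind everything into one multivariate ts
dat   <- ts(cbind(y = y, X1 = X1, X2 = X2))
train <- window(dat, end = 15)   # first 15 observations (assumed split)

fit <- tslm(y ~ X1 + X2, data = train)

# Future predictor values are supplied via newdata, not xreg
future <- data.frame(X1 = X1[16:20], X2 = X2[16:20])
fc <- forecast(fit, newdata = future)

# Out-of-sample forecast errors
errors <- y[16:20] - as.numeric(fc$mean)
print(errors)
```

Note that forecast() for a tslm fit takes the new predictor values through newdata as a data frame with matching column names, rather than through an xreg argument.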
Dear Rob,
I am doing multi-step-ahead ex-ante forecasting with a model that has explanatory variables. For instance, for a 7-day-ahead forecast of Y, I need 1- to 7-day-ahead forecasts of X to be used recursively in the 1- to 7-day-ahead forecasts of Y. I use a modified version of tsCV that accepts xreg_actual and xreg_forecast, and then I use this code:
far <- function(x, h, xreg, newxreg) {
forecast(Arima(x, order=c(0,0,0), xreg=xreg), xreg=newxreg)
}
xreg_actual <- data.frame(X1=X1_actual)
xreg_forecast <- data.frame(X1=X1_onestep, X1=X1_twostep,....., X1=X7_sevenstep)
So xreg_actual includes the actual values of X1, and xreg_forecast includes X1_onestep to X7_sevenstep, which are forecast values of X1 for the validation and test period, obtained from another tsCV forecast. What I expect is that each element of the xreg_forecast data frame is used in forecasting y from one to seven steps ahead. For example, if h=7, tsCV starts from h=1 and then recursively reaches h=7, and I want X1_onestep to be used for h=1, X2_twostep to be used for h=2, and so on.
So, I made the following loop:
for (i in 1:ncol(xreg_forecast)) {
error[[i]]<- data.frame(mytsCV(y, far, h=7, window = 365, initial =10, xreg_actual=xreg_actual,
xreg_forecast=xreg_forecast[[i]]))
}
The loop is generating the errors as it should; however, I am not sure whether it is actually doing what I intended.
Also, I'm not sure how to extend the loop to models with more than one explanatory variable.
Any help is highly appreciated.
Thanks
Increased training data does not lead to overfitting.