No you're spot on. I see where we're differing in nomenclature now. The second example is very close to what you're thinking about, some kind of regression model with GARCH errors. The model was ARIMA there, but there's no reason why it couldn't be anything you like. For example, we could have some exogenous indicators, and use a linear regression for the mean model:
def step(*args):
    # It's not necessary to do this args thing, but we will need to pass all of these to
    # collect_default_updates in the end, so it makes it a bit more compact.
    y_true, X_exog, sigma_sq_tm1, epsilon_tm1, beta_exog, ω, α, β, μ = args

    # E[y] model as a linear regression
    μ = μ + X_exog @ beta_exog

    # GARCH process
    σ2 = ω + α * epsilon_tm1 ** 2 + β * sigma_sq_tm1

    # y = E[y] + epsilon implies y ~ N(E[y], sigma)
    y = pm.Normal.dist(mu=μ, sigma=pt.sqrt(σ2))

    # back out the value of epsilon from y and mu
    epsilon = y_true - μ

    return (σ2, epsilon, y), collect_default_updates(args, [y])

[sigmas, epsilons, y_hat], updates = pytensor.scan(
    fn=step,
    # Note the new data. Pay attention to timing though; here I'm saying that
    # exog_data operates on y contemporaneously (i.e. E[y_t] = mu + beta X_t)
    # as opposed to with a lag. That's a modeling choice!
    sequences=[data, exog_data],
    outputs_info=[sigma_sq_init, epsilon_init, None],
    # Note the inclusion of a new vector of parameters for the linear mean model
    non_sequences=[beta_exog, omega, alpha, beta, mu],
    strict=True,
    mode=get_mode(None),  # use get_mode("JAX") if you try to sample with numpyro
)
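To make the data-generating process behind that scan concrete, here is a minimal NumPy sketch that simulates a linear regression with GARCH(1,1) errors. It is not the PyMC model itself: the function name `simulate_regression_garch`, the parameter values, and the choice to start the recursion at the unconditional variance are all assumptions made for illustration.

```python
import numpy as np

def simulate_regression_garch(X, beta_exog, mu, omega, alpha, beta, seed=0):
    """Simulate y_t = mu + X_t @ beta_exog + eps_t with GARCH(1,1) errors."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mean = mu + X @ beta_exog  # contemporaneous mean model, as in the scan
    sigma_sq = np.empty(n)
    eps = np.empty(n)
    # Assumption: start at the unconditional variance with a zero lagged shock.
    # The gist puts priors on these initial values instead.
    sigma_sq_tm1 = omega / (1.0 - alpha - beta)
    eps_tm1 = 0.0
    for t in range(n):
        # Same recursion as the scan step function
        sigma_sq[t] = omega + alpha * eps_tm1**2 + beta * sigma_sq_tm1
        eps[t] = rng.normal(0.0, np.sqrt(sigma_sq[t]))
        sigma_sq_tm1, eps_tm1 = sigma_sq[t], eps[t]
    return mean + eps, sigma_sq, eps

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
beta_true = np.array([0.5, -1.0])  # hypothetical regression coefficients
y, sigma_sq, eps = simulate_regression_garch(
    X, beta_exog=beta_true, mu=2.0, omega=0.1, alpha=0.2, beta=0.7
)
```

Data simulated this way can be handy for checking that the PyMC model recovers known parameters before running it on real data.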
In principle I don't see any reason why you couldn't write anything you want for the mean model, including adding a link function for e.g. logistic regression. You could put a whole neural network there if you were really feeling frisky.
Another practical question I think you're asking is, can I just do the whole mean regression first then run the GARCH model directly on the residuals of that regression, or must I do everything together in steps? There might be an algebraic equivalence between the two, you would have to sit down and carefully look at it. I know that there are subtleties in this area, see this note on how to write a SARIMAX regression for some idea of what can "go wrong".
Personally I will recommend doing it the way I've presented because I know 100% that it's right. But if you experiment and find a way that is faster or more clear, please report back so I can do it too!
If you did do the regression first, the mean of the residuals will be zero by construction, so you could just omit the mean model from inside the scan entirely and replace `mu` with `0` everywhere. You wouldn't need the 0th tap for `y_true`, you could just use the -1 tap as `epsilon_tm1` directly. There would be some other subtleties with setting up the likelihood too, because you can't use the residuals from the first regression as observed in the second regression. You would need to provide the actual data as observed, with `mu=regression_mu` and `sigma=GARCH_sigmas`. Not positive, you'll have to play around with it.
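As a rough illustration of the two-step idea, the NumPy sketch below fits the mean by ordinary least squares and then runs the GARCH recursion on the residuals with the mean replaced by 0. The helper `garch_variances`, the fixed parameter values, and the zero initial shock are assumptions for illustration; in a real model the GARCH parameters would be estimated, and whether this matches the joint model is exactly the open question above.

```python
import numpy as np

def garch_variances(resid, omega, alpha, beta, sigma_sq_init):
    """Conditional variances from known residuals (mean model replaced by 0)."""
    sigma_sq = np.empty(len(resid))
    sigma_sq_tm1, eps_tm1 = sigma_sq_init, 0.0  # assumption: initial shock is 0
    for t, eps_t in enumerate(resid):
        # Only the -1 tap of the residual series is needed
        sigma_sq[t] = omega + alpha * eps_tm1**2 + beta * sigma_sq_tm1
        sigma_sq_tm1, eps_tm1 = sigma_sq[t], eps_t
    return sigma_sq

rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 features
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.5, size=n)

# Step 1: ordinary least squares for the mean model
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
regression_mu = X @ coef
resid = y - regression_mu  # mean zero because X contains an intercept

# Step 2: GARCH recursion on the residuals (illustrative fixed parameters)
garch_sigmas = np.sqrt(garch_variances(resid, 0.05, 0.1, 0.8, resid.var()))
```

The likelihood would then still be written on `y` itself, with `regression_mu` as the mean and `garch_sigmas` as the standard deviation, rather than on `resid`.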
Yes, the idea is to model both the regression and the GARCH sigmas simultaneously. Then the model "ideally" attempts to explain as much of the observed data as possible with the regression, and then (hopefully) uses the GARCH sigma to absorb any unexplained variance.
I have had good success with this type of model (in particular for, say, modeling sales at a retailer): prior to COVID19, the GARCH sigmas are really small, since the regression has the "right" features to explain the observed data, but during COVID19 (or at least the start of it) you see a large increase in the GARCH std because the model "doesn't know" how to explain the observed data (the features don't explain it).
I just want to ensure that I am proceeding on sound footing here. (The reasoning behind a lot of this is that I am in the process of converting my Stan model into PyMC, and want to replicate what I have in loops in Stan as `scan` in the PyMC code.)
That said, I really appreciate your comments!
Ah ok! Yeah it definitely makes sense to have some kind of COVID adjustment. I saw a paper on LinkedIn that did this kind of adjustment in a Bayesian VAR (I dug around for the post but couldn't find it). Basically they just used an indicator variable for COVID/not COVID, then modeled the residuals of that regression with a normal VAR. You would basically be doing the same thing in a GARCH framework by having your `exog_data` be a COVID indicator?
So we had no indicator originally. And COVID is a really simple example (we absolutely knew what was going on) but generically, if you see large `std` in certain time regimes, it is a motivator to go see why the `std` is growing (i.e. why aren't the features capturing the variance?). Then you can apply the COVID indicator (for example) and (hopefully) see the variance diminish during the same time period (when retrained). Then you can be confident that the uncaptured variance was in fact due to COVID (and not a bunch of other things).
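A toy version of that diagnostic workflow might look like the sketch below. It uses plain least squares as a stand-in for the full regression-plus-GARCH model, and everything in it (the simulated data, the shock window, the helper `regime_rms`) is hypothetical. It compares the size of the residuals inside the shock window before and after adding the indicator.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
feature = rng.normal(size=n)
shock = np.zeros(n)
shock[120:150] = 1.0  # hypothetical "COVID" window
y = 1.0 + 2.0 * feature - 5.0 * shock + rng.normal(scale=0.3, size=n)

def regime_rms(X, y, mask):
    """Fit OLS and return the root-mean-square residual inside the masked regime."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return np.sqrt(np.mean(resid[mask] ** 2))

mask = shock == 1.0
X_without = np.column_stack([np.ones(n), feature])       # no indicator
X_with = np.column_stack([np.ones(n), feature, shock])   # indicator added

rms_without = regime_rms(X_without, y, mask)
rms_with = regime_rms(X_with, y, mask)
```

In the GARCH model those large residuals would feed into the variance recursion through the squared lagged error, which is why the estimated `std` grows in that window and should shrink once the indicator is included.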
Alright, sorry to keep pestering you @jessegrabowski but one last question (maybe):
Here is my implementation, but I don't think things are correct, for two reasons: when I try to run with the `JAX` backend, it complains about numpy generators, and if I don't run with `JAX`, it uses a bunch of samplers other than NUTS for different parameters (which should all be able to be sampled by NUTS since they are continuous).
Anyway, here is the relevant part of the model (I dropped the part that computes `obs_mean`):
# GARCH parameters
# Initial values for scan
sigma_sq_init = pm.Exponential('sigma_sq_init', 1)
epsilon_init = pm.Normal('epsilon_init')
omega = pm.Uniform('omega', 0, 1)
alpha1 = pm.Uniform('alpha1', 0, 1)
beta1 = pm.Uniform('beta1', 0, (1 - alpha1))

def step(*args):
    epsilon_tm1, sigma_sq_tm1, omega, alpha1, beta1 = args
    # GARCH process
    sig2 = omega + alpha1 * at.square(epsilon_tm1) + beta1 * sigma_sq_tm1
    return sig2, collect_default_updates(args, [sig2])

sigmas, updates = pytensor.scan(
    fn=step,
    sequences=[{'input': at.concatenate([epsilon_init[None], (obs_data - obs_mean)]), 'taps': [-1]}],
    outputs_info=[sigma_sq_init],
    non_sequences=[omega, alpha1, beta1],
    strict=True,
    mode=get_mode(None),  # use get_mode("JAX") if you try to sample with numpyro
)
sig = pm.Deterministic('sig', at.sqrt(sigmas))
lik = pm.Normal(name='lik', mu=obs_mean, sigma=sig, observed=obs_data, shape=obs_data.shape)
where `obs_data` is the observed data and `obs_mean` is the calculated mean from the model (same shape as `obs_data`).
Can you see anything amiss here?
The way you've written it, I don't think you need to collect updates at all in the scan function. This is because you're not actually making any random variables inside the scan, so you don't need to do anything special with the underlying random number generator (this is what `collect_default_updates` does).
Ah yes, thank you for the clarification. This is much clearer in my mind now. Your second comment is exactly why I stated we would "know" the epsilons when we compute a time-varying `mu` outside of the scan. Philosophically, I agree that you do not observe the error. Practically though, if I compute
Then `std` is capturing the volatility of the model (i.e. the errors that `mu` is not absorbing when regressing `observed`). Does this interpretation make sense?
In particular, if I use some type of logistic regression (or something else) to produce `mu`, then I would like `std` to capture those times that my features do not explain `observed`. To your point, there may be some underlying baseline error that is not captured at all by the features, so that would be a lower bound on the size of `std`. Maybe this is an abuse of the GARCH model formulation to think of it like this?
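On the lower-bound point, the GARCH(1,1) recursion used in this thread does imply a floor on the conditional variance. Assuming $\omega > 0$ and $\alpha, \beta \ge 0$ (which the Uniform priors above enforce), both recursive terms are non-negative, so a short derivation gives:

```latex
\sigma_t^2 = \omega + \alpha\,\epsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2 \;\ge\; \omega
\quad\Longrightarrow\quad
\sigma_t \;\ge\; \sqrt{\omega},
\qquad
\mathbb{E}\!\left[\sigma_t^2\right] = \frac{\omega}{1 - \alpha - \beta}
\quad \text{when } \alpha + \beta < 1 .
```

So $\sqrt{\omega}$ is a hard floor on `std`, while $\omega / (1 - \alpha - \beta)$ is the long-run average variance, which is one way to read the "baseline error" that the features never explain.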