First of all, thank you @AustinRochford for a wonderful demo.
Like others, I also had trouble reproducing the result for the first hazard regression (i.e. not the time-varying coefficient model, which I found to be reproducible). I was able to get similar results with a different model specification that is perhaps more typical for Bayesian regression:
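import pymc3 as pm
from theano import tensor as T

# n_intervals, df, exposure and death are defined earlier in the notebook
with pm.Model() as model:
    # Baseline hazard, one value per time interval
    lambda0 = pm.Gamma('lambda0', 0.01, 0.01, shape=n_intervals)
    # Hierarchical prior on the regression coefficient
    tau = pm.Gamma('tau', 10., 10.)
    mu_beta = pm.Normal('mu_beta', 0., 10 ** -2)
    beta = pm.Normal('beta', mu_beta, tau)
    # Proportional hazards: each subject's multiplier times the baseline hazard
    lambda_ = pm.Deterministic('lambda_', T.outer(T.exp(beta * df.metastized), lambda0))
    mu = pm.Deterministic('mu', exposure * lambda_)
    obs = pm.Poisson('obs', mu, observed=death)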
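For anyone unsure what the outer product in lambda_ does, here is a small NumPy sketch. The values of beta, lambda0 and the metastized indicator below are made up for illustration and are not taken from the data or from any fitted model.

```python
import numpy as np

# Hypothetical values, for illustration only
beta = 0.8                                      # regression coefficient
metastized = np.array([0, 1, 1])                # indicator for 3 subjects
lambda0 = np.array([0.01, 0.02, 0.015, 0.03])   # baseline hazard in 4 intervals

# One row per subject, one column per interval
lambda_ = np.outer(np.exp(beta * metastized), lambda0)

print(lambda_.shape)
print(lambda_[1] / lambda0)  # constant hazard ratio across intervals
```

A subject with metastized = 0 gets the baseline hazard unchanged, and a subject with metastized = 1 gets it multiplied by exp(beta) in every interval.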
The gamma prior on tau produces a distribution for beta that looks reasonable for a regression model (i.e. centered on zero with a fair amount of density between -2 and 2). Since beta is generated by a Gaussian random walk with fixed tau=1 in the time-varying model, this can explain why someone running this demo could have a problem with the first but not second example. Incidentally, a gamma distribution with parameters (a=10, b=10) as in the code written above produces a distribution for tau with a mean of 1.
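A quick way to check these prior claims without running the sampler is to simulate from the priors with NumPy. This is only a rough sketch, and it rests on one assumption worth flagging: that the second positional argument to pm.Normal is read as a standard deviation. If your pymc3 version reads it as a precision, the numbers for beta will differ.

```python
import numpy as np

rng = np.random.RandomState(0)
n_draws = 100000

# Gamma(a=10, b=10) with b as a rate; NumPy uses scale = 1 / rate
a, b = 10.0, 10.0
tau = rng.gamma(shape=a, scale=1.0 / b, size=n_draws)

# Assumption: the second argument of each Normal is a standard deviation
mu_beta = rng.normal(0.0, 10 ** -2, size=n_draws)
beta = rng.normal(mu_beta, tau)

tau_mean = tau.mean()                       # analytic mean is a / b = 1
frac_within_2 = np.mean(np.abs(beta) < 2)   # prior mass between -2 and 2

print("mean of tau:", tau_mean)
print("mean of beta:", beta.mean())
print("fraction of beta draws in (-2, 2):", frac_within_2)
```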
Of course, what I can't explain is why the model specification as it appears in the notebook worked in the first place. Anyway, hope this helps anyone else struggling with it.
And FWIW I'm using Python 3.6.1 on Mac OS X, pymc3 3.1, Theano 0.9.0, numpy 1.12.1
Same problem:
np.exp(trace['beta'].mean()) = 1.0404104237122036
The beta plot, autocorrelation plot, cumulative hazard, and survival function are different from your notebook (although consistent with each other).
All results from the "Time varying effects" section are identical to yours.
python version: 2.7.13 |Anaconda custom (x86_64)| (default, Dec 20 2016, 23:05:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
numpy version: 1.12.1
pymc3 version: 3.0
statsmodels version: 0.8.0
theano version: 0.9.0