@ahartikainen @canyon289
I think the ppc can be calculated in 2 ways:
- posterior_predictive = pm.sample_posterior_predictive(trace, samples=1000, model=model) : (the user will probably calculate it like this)
- y_ppc_values = data.posterior["intercept"] + data.posterior["slope"] * data.constant_data["x"] :
How do the two ppc values differ? I believed both were the same. But
y_ppc= idata.posterior["Intercept"] + idata.posterior["slope"] * idata.constant_data["mom_iq"]
plt.scatter(idata.constant_data["mom_iq"], idata.observed_data["kid_score"])
az.plot_hdi(x=idata.constant_data["mom_iq"], y=idata.posterior_predictive["kid_score"], color='c')
az.plot_hdi(idata.constant_data["mom_iq"], y_ppc, color='r')
produces:
How can this be explained?
- posterior_predictive = pm.sample_posterior_predictive(trace, samples=1000, model=model) : (the user will probably calculate it like this)
Should I not compute ppc like this?
Maybe posterior_predictive is more suitable for plotting uncertainty in points rather than uncertainty in lines 🤔
Both are suitable and preferable in different circumstances. One is a plot of the uncertainty of the mean (in red), the other is of the uncertainty of the data (in blue). I haven't had a look through the notebook yet, but it's on my TODO list. Thank you for putting this together!
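To make the distinction concrete, here is a minimal sketch using only the standard library. The posterior draws, the noise scale `sigma`, and the helper `central_width` are made up for illustration (they are not from the notebook), and the interval is a simple central interval rather than ArviZ's HDI.

```python
import random

rng = random.Random(0)
x = 100.0  # a single hypothetical predictor value

# Hypothetical posterior draws for the regression parameters
intercepts = [rng.gauss(26.0, 2.0) for _ in range(2000)]
slopes = [rng.gauss(0.6, 0.02) for _ in range(2000)]
sigma = 18.0  # assumed observation noise scale

# Uncertainty of the mean: only the parameter draws vary
mu = [a + b * x for a, b in zip(intercepts, slopes)]

# Uncertainty of the data: the mean plus observation noise
y_pred = [m + rng.gauss(0.0, sigma) for m in mu]


def central_width(draws, mass=0.94):
    """Width of the central interval holding roughly `mass` of the draws."""
    s = sorted(draws)
    lo = s[int((1 - mass) / 2 * len(s))]
    hi = s[int((1 + mass) / 2 * len(s)) - 1]
    return hi - lo


mean_width = central_width(mu)
pred_width = central_width(y_pred)
print(mean_width, pred_width)
```

The band for the data comes out much wider than the band for the mean, because it also carries the observation noise. That would be consistent with the two plot_hdi bands above not coinciding.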
Got it, Thanks!
@canyon289
I tried to implement what you suggested by extending @ahartikainen's function. I can see some bugs though. But it might be a fair start.
Approach to parse the y_model:
- rhs, lhs = y_model.split()
- terms = rhs.split('+') : gives the terms of the expression.
- var_names = term.split('*') for each term : gives the variable names to be searched in data.
- Iterate over the terms:
  - calculate the value of each term
  - add it to the y_model_values
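The steps above could be sketched roughly as follows. This is not the gist's actual code: the `~` separator, the function name `evaluate_y_model`, and the plain-float `values` dict (standing in for InferenceData variables) are assumptions for illustration.

```python
from math import prod


def evaluate_y_model(y_model, values):
    """Evaluate a string such as "y ~ a + b * x" term by term.

    Assumes "~" separates the response name from the expression and that
    the expression only uses "+" between terms and "*" within a term.
    """
    lhs, rhs = (part.strip() for part in y_model.split("~"))
    total = 0.0
    for term in rhs.split("+"):
        # Each term is a product of one or more named variables
        var_names = [name.strip() for name in term.split("*")]
        total = total + prod(values[name] for name in var_names)
    return lhs, total


# Hypothetical values; in practice these would be looked up in the data groups
values = {"Intercept": 26.0, "slope": 0.5, "mom_iq": 100.0}
response, y_model_values = evaluate_y_model(
    "kid_score ~ Intercept + slope * mom_iq", values
)
print(response, y_model_values)
```

A parser this simple would not handle subtraction, parentheses, or numeric literals, which may be where some of the bugs come from.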
Seems like the right approach! Any guidance you need from us?
A brief review would be great.
- Now that it supports plotting simple linear regression models, what should be the immediate next goal of the function?
- Any bugs you see in the function?
- What other things need to be done for simple regression?
- Should I start adding the function to the library?
If you feel like you're done prototyping, the next thing you should do is move this into a PR in ArviZ, notably adding many tests. This answers the first question and the last question.
For finding bugs, writing tests will help both you and us find bugs if any exist.
At a glance it looks good, but I have some questions about the API. This, however, will be easier to assess with a full test suite and when it is part of a PR.
None of this is to say this isn't great work! IMO you're right on track, and I really appreciate the way you're doing this.
Thank you! Reviews are very helpful and encouraging for me be they positive or negative.
Not sure if prototyping is enough but since functions output something, I am gonna open a PR.
I'll get more reviews there 😋
Not sure if this has been abandoned. But I tried it in colab with just the prior predictive.
It failed due to this:
groups = ["posterior_predictive", "prior_predictive"]
I removed that and modified the loop to check for groups with _predictive in its name:
for group in groups:
    if '_predictive' in group:
        item = getattr(data, group)
        if y_ppc in item and y_ppc_group is None:
            y_ppc_group = group
        elif y_ppc in item:
            print("Warning, duplicate variable names for y_ppc, "
                  "using variable from group {}".format(y_ppc_group))
Unfortunately it crashed.
Hi @mattiasthalen! Did you try the plot_lm function? This gist is just a prototype.
@utkarsh-maheshwari no, I stumbled on this when googling. But I did now and it seems to work well :)
If the posterior_predictive group is present in the input InferenceData, then I guess we don't need y_model from the user? The y_ppc name will be assumed to be the same as the name of y unless stated otherwise. We just need x (str or sequence), y (str only), and idata in this case.
If there is no posterior_predictive group, then the user must provide y_model (str or sequence).
If y_model : str ---> we need idata only; x and y can be extracted.
If y_model : seq ---> x and y are needed; they can be str or seq.
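The rules above could be written down as a small helper. This is only a sketch of the proposal: `required_inputs` is a hypothetical name, and `groups` is a plain list of group names standing in for the InferenceData object.

```python
def required_inputs(groups, y_model=None):
    """Return the arguments the user must supply, following the rules above."""
    if "posterior_predictive" in groups:
        # y_ppc defaults to the name of y, so y_model is not needed
        return ["x", "y", "idata"]
    if y_model is None:
        raise ValueError(
            "y_model is required when there is no posterior_predictive group"
        )
    if isinstance(y_model, str):
        # x and y can be extracted from the model string
        return ["idata"]
    # y_model given as a sequence of values: x and y must be passed explicitly
    return ["x", "y"]


print(required_inputs(["posterior", "posterior_predictive"]))
print(required_inputs(["posterior"], y_model="y ~ a + b * x"))
print(required_inputs(["posterior"], y_model=[1.0, 2.0, 3.0]))
```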