@utkarsh-maheshwari
Last active September 8, 2022 07:14
@utkarsh-maheshwari
Author

If the posterior_predictive group is present in the input InferenceData, then I guess we don't need y_model from the user. The y_ppc name will be assumed to be the same as the name of y unless stated otherwise. In this case we just need x (str or sequence), y (str only), and idata.

If there is no posterior_predictive group, then the user must provide y_model (str or sequence).

  • If y_model : str ---> we need idata only; x and y can be extracted.

  • If y_model : seq ---> x and y are needed, and each can be str or seq.

      If x : str or y : str ---> idata must be provided.
      else ---> no idata is required
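
The rules above can be sketched as a small check. This is only an illustration of the proposed logic, not code from the prototype; the function name `needs_idata` and the `has_posterior_predictive` flag are hypothetical.

```python
def needs_idata(x, y, y_model=None, has_posterior_predictive=False):
    """Return True when the rules above require an InferenceData object.

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    if has_posterior_predictive:
        # y (a name) and the matching y_ppc are looked up in idata.
        return True
    if y_model is None:
        raise ValueError(
            "y_model is required when there is no posterior_predictive group"
        )
    if isinstance(y_model, str):
        # A named y_model has to be resolved against idata.
        return True
    # y_model is a sequence of values: idata is needed only if x or y is a name.
    return isinstance(x, str) or isinstance(y, str)


print(needs_idata("mom_iq", "kid_score", has_posterior_predictive=True))
print(needs_idata([1.0, 2.0], [3.0, 4.0], y_model=[3.1, 3.9]))
```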
    

@utkarsh-maheshwari
Author

@ahartikainen @canyon289
I think ppc can be calculated in 2 ways:

  1. posterior_predictive = pm.sample_posterior_predictive(trace, samples=1000, model=model) : (User will probably calculate like this)
  2. y_ppc_values = data.posterior["intercept"] + data.posterior["slope"] * data.constant_data["x"]

How will the two sets of ppc values differ? I believed both would be the same. But

import arviz as az
import matplotlib.pyplot as plt

# idata: the InferenceData object from the notebook
y_ppc = idata.posterior["Intercept"] + idata.posterior["slope"] * idata.constant_data["mom_iq"]
plt.scatter(idata.constant_data["mom_iq"], idata.observed_data["kid_score"])
az.plot_hdi(x=idata.constant_data["mom_iq"], y=idata.posterior_predictive["kid_score"], color='c')
az.plot_hdi(idata.constant_data["mom_iq"], y_ppc, color='r')

produces:

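[image: scatter plot of kid_score against mom_iq with two HDI bands, cyan for posterior_predictive and red for y_ppc]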

How can this be explained?

@utkarsh-maheshwari
Author

  1. posterior_predictive = pm.sample_posterior_predictive(trace, samples=1000, model=model) : (User will probably calculate like this)

Should I not compute ppc like this?

@utkarsh-maheshwari
Author

Maybe posterior_predictive is more suitable for plotting uncertainty in points rather than uncertainty in lines 🤔

@canyon289

canyon289 commented Jun 9, 2021

Both are suitable and preferable in different circumstances. One is a plot of the uncertainty of the mean, in red; the other is of the data, in blue. Haven't had a look through the notebook yet but it's on my TODO list. Thank you for putting this together!
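
A minimal sketch of that distinction, using made-up numbers rather than the notebook's fitted model: the manual formula gives draws of the mean at a given x, while posterior predictive sampling also adds observation noise around that mean, so its interval comes out wider. All parameter values below are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)
x = 100.0        # a single predictor value (hypothetical)
n_draws = 4000

mean_draws = []
predictive_draws = []
for _ in range(n_draws):
    # Stand-ins for posterior draws of the parameters (made-up numbers).
    intercept = random.gauss(26.0, 2.0)
    slope = random.gauss(0.6, 0.02)
    sigma = 18.0
    # Uncertainty of the mean: what intercept + slope * x computes.
    mu = intercept + slope * x
    mean_draws.append(mu)
    # Uncertainty of the data: the mean plus observation noise.
    predictive_draws.append(random.gauss(mu, sigma))

mean_spread = statistics.stdev(mean_draws)
predictive_spread = statistics.stdev(predictive_draws)
print(mean_spread < predictive_spread)
```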

@utkarsh-maheshwari
Author

Got it, Thanks!

@utkarsh-maheshwari
Author

@canyon289
I tried to implement what you suggested by extending @ahartikainen's function. I can see some bugs though. But it might be a fair start.

@utkarsh-maheshwari
Author

Approach to parse the y_model

lhs, rhs = y_model.split() : separates the two sides of the model expression.
terms = rhs.split('+') : gives terms of the expression.
var_names = term.split('*') : for each term, gives variable names to be searched in data

Iterate over terms
calculate value of each term
add it to the y_model_values
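
The steps above could look roughly like the following sketch. It assumes the model string has the form "y ~ a + b * x" with "~" as the separator, which the comment does not specify, and it looks names up in a plain dict instead of an InferenceData object. The function name `evaluate_y_model` is hypothetical.

```python
from functools import reduce
import operator


def evaluate_y_model(y_model, data):
    """Evaluate a model string such as "y ~ intercept + slope * x".

    `data` maps variable names to values (floats here; objects that
    support + and * would work the same way). The "~" separator is an
    assumption made for this sketch.
    """
    lhs, rhs = (side.strip() for side in y_model.split("~"))
    y_model_values = 0
    for term in rhs.split("+"):
        # Each term is a product of one or more named variables.
        names = [name.strip() for name in term.split("*")]
        term_value = reduce(operator.mul, (data[name] for name in names))
        y_model_values = y_model_values + term_value
    return lhs, y_model_values


data = {"intercept": 26.0, "slope": 0.5, "x": 100.0}
name, values = evaluate_y_model("y ~ intercept + slope * x", data)
print(name, values)
```

This only handles sums of products of named variables; numeric literals, subtraction, and parentheses would need a real expression parser.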

@canyon289

Seems like the right approach! Any guidance you need from us?

@utkarsh-maheshwari
Author

Seems like the right approach! Any guidance you need from us?

A brief review would be great.

  • Now that it supports plotting simple linear regression models, what should be the immediate next goal of the function?
  • Do you see any bugs in the function?
  • What else needs to be done for simple regression?
  • Should I start adding the function to the library?

@canyon289

If you feel like you're done prototyping, the next thing you should do is move this into a PR in ArviZ, notably adding many tests. This answers the first question and the last question.

For finding bugs, writing tests will help both you and us find bugs if any exist.

At a glance it looks good, but I have some questions about the API. This, however, will be easier to assess with a full test suite and as part of a PR.

None of this is to say this isn't great work! IMO you're right on track, and I really appreciate the way you're doing this.

@utkarsh-maheshwari
Author

Thank you! Reviews are very helpful and encouraging for me, be they positive or negative.
Not sure if the prototyping is enough, but since the function outputs something, I am gonna open a PR.
I'll get more reviews there 😋

@mattiasthalen

Not sure if this has been abandoned. But I tried it in colab with just the prior predictive.

It failed due to this:

groups = ["posterior_predictive", "prior_predictive"]

I removed that and modified the loop to check for groups with _predictive in their names:

for group in groups:
    if '_predictive' in group:
        item = getattr(data, group)
        if y_ppc in item and y_ppc_group is None:
            y_ppc_group = group
        elif y_ppc in item:
            print("Warning, duplicate variable names for y_ppc, "
            "using variable from group {}".format(y_ppc_group))

Unfortunately it crashed.
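
For reference, a defensive version of that lookup might look like the sketch below. The cause of the crash is not stated in this thread; one possibility is that `getattr` fails when a listed group is missing from the data, so this version passes a default and skips absent groups. The helper name `find_predictive_group` is hypothetical, and a `SimpleNamespace` stands in for the InferenceData object.

```python
import warnings
from types import SimpleNamespace


def find_predictive_group(data, group_names, y_ppc):
    """Return the first "*_predictive" group that contains `y_ppc`, or None."""
    y_ppc_group = None
    for group in group_names:
        if "_predictive" not in group:
            continue
        # Skip groups that are not present instead of raising AttributeError.
        item = getattr(data, group, None)
        if item is None or y_ppc not in item:
            continue
        if y_ppc_group is None:
            y_ppc_group = group
        else:
            warnings.warn(
                f"duplicate variable name {y_ppc!r} in {group}; "
                f"using variable from group {y_ppc_group}"
            )
    return y_ppc_group


# Stand-in for an InferenceData object holding only a prior predictive group.
data = SimpleNamespace(prior_predictive={"kid_score": [1.0, 2.0]})
groups = ["posterior_predictive", "prior_predictive"]
print(find_predictive_group(data, groups, "kid_score"))
```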

@utkarsh-maheshwari
Author

Hi @mattiasthalen! Did you try the plot_lm function? This gist is just a prototype.

@mattiasthalen

@utkarsh-maheshwari no, I stumbled on this when googling. But I did now and it seems to work well :)
