Skip to content

Instantly share code, notes, and snippets.

@dlakelan
Created January 4, 2023 04:54
Show Gist options
  • Save dlakelan/c064185efe44f4db9358438a5b2785e8 to your computer and use it in GitHub Desktop.
Save dlakelan/c064185efe44f4db9358438a5b2785e8 to your computer and use it in GitHub Desktop.
When the "true" distribution isn't in the model space, does Bayes give a reasonable answer?
using Distributions, Random, StatsPlots, Turing
# We have a pallet of 100 jugs of Orange Juice, each is a 2L jug.
# they are filled by a machine which may have a flaw. we would like to find
# the avg amount of OJ in a jug by taking a sample of 20
# this is a finite population, and the actual distribution of volumes is just given
# by 100 numbers. we'll say they are something like this:
Random.set_global_seed!(1234)
realdata = shuffle!(vcat(repeat([0.125],20),rand(Normal(1.90,0.12),60),
rand(Uniform(1.15,1.5),20)))
h = histogram(realdata)
display(h)
density(realdata; bandwidth=0.1)
# This looks nothing like an exponential distribution... but the bayesian data analyst
# knows that the volume must be a non-negative number with an average... the maximum
# entropy distribution for such a thing is exponential, so will model the distribution of
# weights as exponential with a given average. The "true" distribution is nothing like
# the ones within the model family, so there can be no "true" parameter. Nevertheless
# Bayes will converge to the right answer because exponential distributions with
# average near the true average will put the data in a high density region of the
# predictive distribution.
@model function inferexp(d)
m ~ Uniform(0.0,2.5)
# the average must be something in the range 0 to some value you'd get if you overfilled a bottle as much as possible, definitely less than 2.5 liters
d ~ filldist(Exponential(m),length(d))
end
takesamp = sample(realdata,40;replace=false)
sm = sample(inferexp(takesamp),NUTS(200,.8),200)
p = density(get(sm,:m).m.data; title="Inferred mean value",xlim=(0,2.5),label="Posterior estimate of mean volume")
plot!(p,[mean(realdata),mean(realdata)],[0.0,2.0]; label="True mean value of 100 bottles")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment