The Bayesian approach to model selection is a subject you'll
like. The basic idea is to compute the "Bayes Factor":
http://en.wikipedia.org/wiki/Bayes_factor .
As the page says "Bayesian inference has been put forward as a
theoretical justification for and generalization of Occam's
razor".
( http://en.wikipedia.org/wiki/Occam%27s_razor )
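Spelled out (just restating the definition from that page), the
Bayes factor comparing models M1 and M2 on data D is, in LaTeX:

  K = \frac{p(D \mid M_1)}{p(D \mid M_2)}
    = \frac{\int p(\theta_1 \mid M_1)\, p(D \mid \theta_1, M_1)\, d\theta_1}
           {\int p(\theta_2 \mid M_2)\, p(D \mid \theta_2, M_2)\, d\theta_2}

i.e., each model's parameters get integrated out rather than
maximized over, which is where the automatic Occam penalty
comes from.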
The Bayes factor can be approximated under some assumptions,
leading to a simple penalized maximum likelihood called the
"BIC" (basically chi-squared, which goes down as the model gets
more complicated, plus a term linear in the number of
parameters, which compensates and produces a "best" value for
the number of free parameters):
http://en.wikipedia.org/wiki/Bayesian_information_criterion
The paper it comes from is very short!
http://www.andrew.cmu.edu/user/kk3n/simplicity/schwarzbic.pdf
Here's a more pedagogical derivation by my friend Harish Bhat:
http://nscs00.ucmerced.edu/~nkumar4/BhatKumarBIC.pdf
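To see BIC in action, here's a minimal sketch (a toy setup of my
own, not from either paper): polynomial fits with Gaussian
noise, scored by BIC = k*ln(n) - 2*ln(Lhat) and minimized over
the degree.

  import numpy as np

  def bic_for_poly(x, y, degree):
      # maximum-likelihood fit and ML estimate of the noise variance
      n = len(x)
      coeffs = np.polyfit(x, y, degree)
      resid = y - np.polyval(coeffs, x)
      sigma2 = np.mean(resid ** 2)
      # Gaussian log-likelihood at the ML parameters
      loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
      k = degree + 2  # polynomial coefficients plus the noise variance
      return k * np.log(n) - 2 * loglik

  rng = np.random.default_rng(0)
  x = np.linspace(-1, 1, 50)
  y = 1 - 2 * x + 0.5 * x ** 2 + 0.1 * rng.standard_normal(50)
  for d in range(6):
      print(d, bic_for_poly(x, y, d))  # minimum should land near the true degree, 2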
David MacKay, in his PhD thesis (1991), interpreted the Bayes
factor as a function of the data and suggested a curve that
explains the Bayesian approach to model selection graphically:
http://authors.library.caltech.edu/13792/1/MACnc92a.pdf
Here's a great paper about "Bayesian Occam's razor" based on
actually computing and plotting p(D|M) -- the curve you get when
you integrate over parameters, as a function of the data:
http://mlg.eng.cam.ac.uk/zoubin/papers/05occam/occam.pdf
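You can get the flavor of that plot in a few lines (a toy
coin-flip example of my own, not from the paper): a flexible
model spreads its p(D|M) across every possible dataset, so on
unsurprising data the simpler model wins the Bayes factor.

  import numpy as np
  from scipy.stats import binom
  from scipy.special import betaln, gammaln

  n = 20
  h = np.arange(n + 1)  # all possible datasets: number of heads in n flips
  # M0: a fair coin, no free parameters
  p_d_m0 = binom.pmf(h, n, 0.5)
  # M1: theta ~ Uniform(0,1), integrated out analytically;
  # p(D|M1) = C(n,h) * B(h+1, n-h+1) = 1/(n+1), flat over datasets
  log_choose = gammaln(n + 1) - gammaln(h + 1) - gammaln(n - h + 1)
  p_d_m1 = np.exp(log_choose + betaln(h + 1, n - h + 1))
  print(p_d_m0.sum(), p_d_m1.sum())       # both normalize to ~1 over datasets
  print(p_d_m0[n // 2] / p_d_m1[n // 2])  # Bayes factor at h=10: ~3.7 for M0

Both curves sum to one over datasets, so the flexible model pays
for its spread exactly where the simple model concentrates --
that's the Occam effect.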
This whole approach can also be used for causality. The idea is
to find p(D|M) for different choices of M (model), where each
model is a possible causal graph of which variables influence
which other variables. To make this something you can compute,
you have to choose a particular case. Heckerman illustrated how
this can be done with categorical variables, in which case all
the relations among the random variables are conditional
probability tables, over which one integrates to compute Bayes
factors. His 1995 tutorial is awesome and oft-cited:
http://research.microsoft.com/pubs/69588/tr-95-06.pdf
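Here's a sketch of the kind of integral involved (uniform
Dirichlet(1) priors, i.e. a K2-style metric rather than
Heckerman's exact BDe prior; the function name log_ml_node and
the simulated data are mine): for categorical variables the
marginal likelihood factorizes over nodes and parent
configurations, and each factor comes out in closed form via
gamma functions.

  import numpy as np
  from scipy.special import gammaln

  def log_ml_node(child, parents=None):
      # log marginal likelihood of one node given its parents,
      # with a Dirichlet(1,...,1) prior on each conditional table row
      r = child.max() + 1  # number of child states
      configs = np.zeros(len(child), dtype=int) if parents is None else parents
      total = 0.0
      for j in range(configs.max() + 1):
          counts = np.bincount(child[configs == j], minlength=r)
          total += gammaln(r) - gammaln(r + counts.sum())
          total += gammaln(1 + counts).sum()  # gammaln(1) terms vanish
      return total

  rng = np.random.default_rng(1)
  x = rng.integers(0, 2, 500)
  y = (x ^ (rng.random(500) < 0.2)).astype(int)  # y copies x, flipped 20% of the time

  # model A: x and y independent; model B: the graph x -> y
  log_ml_indep = log_ml_node(x) + log_ml_node(y)
  log_ml_arrow = log_ml_node(x) + log_ml_node(y, parents=x)
  print(log_ml_arrow - log_ml_indep)  # strongly positive: the data favor x -> y

The log Bayes factor between two graphs is just the difference
of these per-node sums, which is what makes scoring and
searching over structures tractable.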