The Bayesian approach to model selection is a subject you'll
like. The basic idea is to compute the "Bayes Factor":
http://en.wikipedia.org/wiki/Bayes_factor .
As the page says "Bayesian inference has been put forward as a
theoretical justification for and generalization of Occam's
razor".
( http://en.wikipedia.org/wiki/Occam%27s_razor )
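Spelled out (just restating the definition from that page), the
Bayes factor comparing models M1 and M2 on data D is, in LaTeX:

  K = \frac{p(D \mid M_1)}{p(D \mid M_2)}
    = \frac{\int p(\theta_1 \mid M_1)\, p(D \mid \theta_1, M_1)\, d\theta_1}
           {\int p(\theta_2 \mid M_2)\, p(D \mid \theta_2, M_2)\, d\theta_2}

i.e., each model's parameters get integrated out rather than
maximized over, which is where the automatic Occam penalty
comes from.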
The Bayes factor can be approximated under some assumptions,
leading to a simple penalized maximum likelihood called the
"BIC" (basically chi-squared, which goes down as the model gets
more complicated, plus a term linear in the number of
parameters, which compensates and produces a "best" value for
the number of free parameters):
http://en.wikipedia.org/wiki/Bayesian_information_criterion
The paper it comes from is very short!
http://www.andrew.cmu.edu/user/kk3n/simplicity/schwarzbic.pdf
Here's a more pedagogical derivation by my friend Harish Bhat:
http://nscs00.ucmerced.edu/~nkumar4/BhatKumarBIC.pdf
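To see BIC in action, here's a minimal sketch (a toy setup of my
own, not from either paper): polynomial fits with Gaussian
noise, scored by BIC = k*ln(n) - 2*ln(Lhat) and minimized over
the degree.

  import numpy as np

  def bic_for_poly(x, y, degree):
      # maximum-likelihood fit and ML estimate of the noise variance
      n = len(x)
      coeffs = np.polyfit(x, y, degree)
      resid = y - np.polyval(coeffs, x)
      sigma2 = np.mean(resid ** 2)
      # Gaussian log-likelihood at the ML parameters
      loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
      k = degree + 2  # polynomial coefficients plus the noise variance
      return k * np.log(n) - 2 * loglik

  rng = np.random.default_rng(0)
  x = np.linspace(-1, 1, 50)
  y = 1 - 2 * x + 0.5 * x ** 2 + 0.1 * rng.standard_normal(50)
  for d in range(6):
      print(d, bic_for_poly(x, y, d))  # minimum should land near the true degree, 2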
David MacKay, in his PhD thesis (1991), interpreted the Bayes
factor as a function of the data and suggested a curve that
explains the Bayesian approach to model selection graphically:
http://authors.library.caltech.edu/13792/1/MACnc92a.pdf
Here's a great paper about "Bayesian Occam's razor" based on
actually computing and plotting p(D|M) -- the curve you get when
you integrate over parameters, as a function of the data:
http://mlg.eng.cam.ac.uk/zoubin/papers/05occam/occam.pdf
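You can get the flavor of that plot in a few lines (a toy
coin-flip example of my own, not from the paper): a flexible
model spreads its p(D|M) across every possible dataset, so on
unsurprising data the simpler model wins the Bayes factor.

  import numpy as np
  from scipy.stats import binom
  from scipy.special import betaln, gammaln

  n = 20
  h = np.arange(n + 1)  # all possible datasets: number of heads in n flips
  # M0: a fair coin, no free parameters
  p_d_m0 = binom.pmf(h, n, 0.5)
  # M1: theta ~ Uniform(0,1), integrated out analytically;
  # p(D|M1) = C(n,h) * B(h+1, n-h+1) = 1/(n+1), flat over datasets
  log_choose = gammaln(n + 1) - gammaln(h + 1) - gammaln(n - h + 1)
  p_d_m1 = np.exp(log_choose + betaln(h + 1, n - h + 1))
  print(p_d_m0.sum(), p_d_m1.sum())       # both normalize to ~1 over datasets
  print(p_d_m0[n // 2] / p_d_m1[n // 2])  # Bayes factor at h=10: ~3.7 for M0

Both curves sum to one over datasets, so the flexible model pays
for its spread exactly where the simple model concentrates --
that's the Occam effect.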
This whole approach can also be used for causality. The idea is
to find p(D|M) for different choices of M (model), where each
model is a possible causal graph of which variables influence
which other variables. To make this something you can compute,
you have to choose a particular case. Heckerman illustrated how
this can be done with categorical variables, in which case all
the relations among the random variables are conditional
probability tables, over which one integrates to compute Bayes
factors. His 1995 tutorial is awesome and oft-cited:
http://research.microsoft.com/pubs/69588/tr-95-06.pdf
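Here's a sketch of the kind of integral involved (uniform
Dirichlet(1) priors, i.e. a K2-style metric rather than
Heckerman's exact BDe prior; the function name log_ml_node and
the simulated data are mine): for categorical variables the
marginal likelihood factorizes over nodes and parent
configurations, and each factor comes out in closed form via
gamma functions.

  import numpy as np
  from scipy.special import gammaln

  def log_ml_node(child, parents=None):
      # log marginal likelihood of one node given its parents,
      # with a Dirichlet(1,...,1) prior on each conditional table row
      r = child.max() + 1  # number of child states
      configs = np.zeros(len(child), dtype=int) if parents is None else parents
      total = 0.0
      for j in range(configs.max() + 1):
          counts = np.bincount(child[configs == j], minlength=r)
          total += gammaln(r) - gammaln(r + counts.sum())
          total += gammaln(1 + counts).sum()  # gammaln(1) terms vanish
      return total

  rng = np.random.default_rng(1)
  x = rng.integers(0, 2, 500)
  y = (x ^ (rng.random(500) < 0.2)).astype(int)  # y copies x, flipped 20% of the time

  # model A: x and y independent; model B: the graph x -> y
  log_ml_indep = log_ml_node(x) + log_ml_node(y)
  log_ml_arrow = log_ml_node(x) + log_ml_node(y, parents=x)
  print(log_ml_arrow - log_ml_indep)  # strongly positive: the data favor x -> y

The log Bayes factor between two graphs is just the difference
of these per-node sums, which is what makes scoring and
searching over structures tractable.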