Method decision tree for parameter estimation and model comparison

  • Write down your problem mathematically/statistically, stripping out astronomy-specific details from your model (e.g. pre-compute redshifts, weights, etc.).
  • Ideally, reduce it to mathematical and statistical primitives that can be implemented in any language.

For the following decision tree, keep in mind that typically, one first thinks

  • I just want to quickly find the best fit.
  • Oh, I actually need errors on that! (parameter estimation)
  • Oh, that problem actually has some structure / degeneracies / multiple solutions.
  • Oh, I want to extend the model to more parameters. (a high/medium-dimensional problem)
  • And now I want to compare it to another model. (model comparison)

So, let's think ahead. For exploring a likelihood / parameter space with multiple methods (optimizers such as levmar, cobyla and minuit; MCMC; differential evolution; MultiNest), jbopt provides a uniform interface in Python.
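As a sketch of the first step above (quickly finding a best fit), here is a minimal example using scipy.optimize directly rather than jbopt's interface; the straight-line model and all numbers are invented for illustration:

```python
# Quick best fit on a toy problem: a straight line with Gaussian errors.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.size)  # synthetic data

def neg_log_likelihood(theta):
    slope, intercept = theta
    resid = y - (slope * x + intercept)
    return 0.5 * np.sum((resid / 0.5) ** 2)  # Gaussian errors, sigma = 0.5

result = minimize(neg_log_likelihood, x0=[1.0, 0.0], method="Nelder-Mead")
print(result.x)  # best-fit (slope, intercept), close to (2, 1)
```

This gives a point estimate only; the rest of the decision tree is about the errors, degeneracies and model comparison that follow.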

  • If you need only parameter estimation, not model comparison:

    • Parameter space has only one solution
      • Number of parameters is low (<20)
        • Use MultiNest -- it frees you from worrying about convergence, and is generally fast
      • Number of parameters is high (20-200)
        • Try to write the problem in Stan -- it is very fast and converges quickly.
        • Alternatively, try emcee with many walkers. emcee is somewhat overhyped; it cannot handle non-linear degeneracies well.
      • Number of parameters is very high (200-10000)
        • Use Stan -- it is very fast and converges quickly. You have to write your problem in the Stan language. Export all necessary, pre-computed constants into data arrays.
    • Parameter space possibly has multiple solutions?
      • Number of parameters is low (<20)
        • Use MultiNest -- it frees you from worrying about convergence, is generally fast and handles multiple solutions.
      • Number of parameters is high (20-200)
        • This is an unsolved problem.
        • Perhaps you can split the parameter space manually, and analyse each solution.
        • Cautiously try Stan with multiple chains, or emcee.
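To illustrate why multiple solutions are dangerous, here is a hand-rolled sketch (plain Metropolis in numpy, not the recommended tools) of two chains started near different modes of a toy bimodal target; each gets stuck in its own solution, which is exactly what running multiple dispersed chains is meant to reveal:

```python
# Two Metropolis chains on a bimodal target: each explores only one mode.
import numpy as np

def log_prob(x):
    # toy target: mixture of two well-separated unit Gaussians at -5 and +5
    return np.logaddexp(-0.5 * (x - 5) ** 2, -0.5 * (x + 5) ** 2)

def run_chain(start, n_steps=2000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    x, lp = start, log_prob(start)
    for i in range(n_steps):
        prop = x + rng.normal(0, step)
        lp_prop = log_prob(prop)
        if np.log(rng.random()) < lp_prop - lp:  # Metropolis acceptance
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

chains = [run_chain(s, seed=i) for i, s in enumerate([-5.0, 5.0])]
means = [c[500:].mean() for c in chains]  # discard burn-in
print(means)  # roughly [-5, +5]: each chain is stuck in its own mode
```

A single chain here would report one solution and miss the other entirely, without any error message.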
  • Need model comparison

    • Parameter space has only one solution
      • Number of parameters is low (<20)
        • Use MultiNest -- it computes the integral over the parameter space, and handles degeneracies well.
      • Number of parameters is high (20-200)
        • This is not a solved problem.
        • Perhaps PolyChord can be a solution, but that software needs further testing.
      • Number of parameters is very high (200-10000)
        • This is an unsolved problem.
    • Parameter space possibly has multiple solutions?
      • Number of parameters is low (<20)
        • Use MultiNest -- it computes the integral over the parameter space, is generally fast and handles multiple solutions.
      • Number of parameters is high (20-200)
        • This is an unsolved problem.
        • Perhaps PolyChord can be a solution, but that software needs further testing.
      • Number of parameters is very high (200-10000)
        • This is an unsolved problem.
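To make "computes the integral over the parameter space" concrete, here is a one-parameter toy (all numbers invented) where the evidence Z = ∫ L(θ) π(θ) dθ can be computed by brute-force quadrature; MultiNest estimates the same quantity in many dimensions:

```python
# Evidence integral and Bayes factor on a toy Gaussian-mean problem.
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(0.5, 1.0, size=30)  # synthetic data, true mean 0.5

def log_likelihood(mu):
    # Gaussian likelihood with known unit variance
    return -0.5 * np.sum((data - mu) ** 2) - 0.5 * data.size * np.log(2 * np.pi)

# Model A: the mean is a free parameter with a uniform prior on [-5, 5]
theta = np.linspace(-5, 5, 2001)
like = np.exp(np.array([log_likelihood(t) for t in theta]))
Z_A = np.sum(like * (1 / 10.0)) * (theta[1] - theta[0])  # Riemann sum of L * prior

# Model B: the mean is fixed at 0; with no free parameter, Z is the likelihood
Z_B = np.exp(log_likelihood(0.0))

log_bayes_factor = np.log(Z_A / Z_B)
print(log_bayes_factor)
```

Grid quadrature like this becomes hopeless beyond a few dimensions, which is why nested sampling exists.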
  • Why not use information criteria like the AIC for model comparison?

    • AIC is a useful criterion for deciding which model represents or stores the data best (information). It assumes the large-data limit.
    • BIC approximates the integral in the large-data limit, and is thus not more useful than computing the integral with other methods.
    • Information criteria like the AIC & BIC do not incorporate uncertainties in the parameters. To include those, replace them with the WAIC and WBIC. Use the AICc or WAIC instead of the AIC whenever you can. The DIC has been superseded by the WBIC.
    • Information criteria based on best-fits are generally unstable in high-dimensional problems.
    • Information criteria do not convey a probability.
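For reference, the criteria discussed above in their standard forms (k fitted parameters, n data points, lnL the maximized log-likelihood):

```python
# Standard information criteria.
import math

def aic(lnL, k):
    return 2 * k - 2 * lnL

def aicc(lnL, k, n):
    # small-sample correction; diverges as n approaches k + 1
    return aic(lnL, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(lnL, k, n):
    return k * math.log(n) - 2 * lnL

print(aic(-100.0, 3))       # 206.0
print(aicc(-100.0, 3, 50))  # ~206.52
print(bic(-100.0, 3, 50))   # ~211.74
```

Note that all three depend only on a single best-fit lnL value, which is the instability mentioned above: they ignore the shape of the likelihood around the peak.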
  • What if I have empirical models, should I use Bayesian model comparison?

    • No. Empirical models are known to be wrong at the time of construction; they are only helpers towards building a meaningful physical model.
    • Use AIC to compare different empirical models, as that compares which "summary of the data" conveys the most information. Keep in mind though that ultimately, the physical model should also be fitted to the data.
    • Explore different physical models until you find one (or more!) with similar likelihoods as the best-fit empirical model. Use the empirical model to discover where the data differs, and where strong differences in the likelihood originate.
    • Apply Bayesian model comparison between the physical models.
  • Why not use my awesome home-made MCMC sampler, it is super-adaptive?

    • There is no test for convergence, only tests that detect (some forms of) non-convergence. So you can never be certain that your sampler is done.
    • Nested sampling does not have that problem.
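As an example of such a non-convergence test, here is a minimal sketch of the Gelman-Rubin R-hat diagnostic: a value far above 1 flags non-convergence, but a value near 1 does not prove convergence:

```python
# Gelman-Rubin R-hat: compares between-chain and within-chain variance.
import numpy as np

def gelman_rubin(chains):
    # chains: array of shape (n_chains, n_samples)
    chains = np.asarray(chains)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_plus = (n - 1) / n * W + B / n
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(0)
good = rng.normal(0, 1, size=(4, 1000))              # chains agreeing on the target
bad = good + np.array([[0.0], [0.0], [5.0], [5.0]])  # two chains stuck elsewhere
print(gelman_rubin(good))  # close to 1
print(gelman_rubin(bad))   # well above 1: non-convergence detected
```

The catch: if all chains are stuck in the same wrong region, R-hat is near 1 and reports nothing.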
  • How about likelihood-free methods?

    • Likelihood-free methods are useful if the model cannot be expressed as a computable likelihood; model comparison is then only meaningful when comparing the predictions of two models.
    • Likelihood-free methods do not convey a probability for comparing models.
  • Why not use ABC?

    • ABC has a very limited range of applicability. If it can be applied to your problem, great!
  • How about Kalman filters?

    • The formalism of Kalman filters is limited to Gaussian observations that arrive serially (independent samples, ordered in time). If that is your problem, go for it!
    • Kalman filters are a special solution to a more general problem: hierarchical Bayesian models. These can be written down and solved efficiently by tools like JAGS and Stan.
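For concreteness, a scalar Kalman filter fits in a few lines; this toy (all numbers invented) tracks a constant signal from noisy Gaussian observations arriving one at a time:

```python
# Minimal scalar Kalman filter: Gaussian observations, processed serially.
import numpy as np

def kalman_1d(observations, obs_var, process_var, x0=0.0, p0=1e6):
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in observations:
        p = p + process_var    # predict: state may drift
        k = p / (p + obs_var)  # Kalman gain
        x = x + k * (z - x)    # update with the new observation
        p = (1 - k) * p
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(1)
obs = 3.0 + rng.normal(0, 1.0, size=200)  # noisy measurements of a constant
est = kalman_1d(obs, obs_var=1.0, process_var=0.0)
print(est[-1])  # close to 3.0 (with zero process noise it is a running mean)
```

Everything here hinges on the Gaussian/serial assumptions; the general hierarchical-model tools drop them.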
  • How about JAGS?

    • JAGS is a Gibbs sampler performing MCMC. It requires writing the problem in the BUGS language (also used by OpenBUGS), a custom modeling language similar in spirit to Stan's.
    • Stan typically shows better performance, because it also takes advantage of automatic derivatives of the model.
  • How about Importance Sampling?

    • The accuracy of importance sampling strongly depends on the chosen proposal function, and can vary by orders of magnitude.
    • Importance Sampling is inefficient in high-dimensional problems.
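A small numpy experiment (toy target and proposals invented for illustration) shows how strongly the result depends on the proposal: a too-narrow proposal leaves few effective samples and an unreliable estimate:

```python
# Self-normalized importance sampling of E[x^2] under a standard normal.
import numpy as np

def importance_estimate(proposal_sigma, n=10000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0, proposal_sigma, size=n)  # draws from the proposal
    # log importance weight = log target N(0,1) - log proposal N(0, sigma)
    log_w = -0.5 * x**2 + 0.5 * (x / proposal_sigma) ** 2 + np.log(proposal_sigma)
    w = np.exp(log_w - log_w.max())
    estimate = np.sum(w * x**2) / np.sum(w)  # true answer is 1
    n_eff = np.sum(w) ** 2 / np.sum(w**2)    # effective sample size
    return estimate, n_eff

est_wide, neff_wide = importance_estimate(2.0)      # proposal covers the target
est_narrow, neff_narrow = importance_estimate(0.3)  # proposal misses the tails
print(est_wide, neff_wide)      # estimate near 1, large effective sample size
print(est_narrow, neff_narrow)  # few effective samples: unreliable estimate
```

The narrow-proposal weights have infinite variance, so no amount of samples makes that estimate trustworthy; the same pathology appears quietly in high dimensions.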
  • How about Variational Bayes / Population Monte Carlo?

    • These methods can be used for parameter estimation. Parameter space integration can be performed massively parallel with importance sampling (but see above).
    • Population Monte Carlo (PMC) requires a good understanding of the underlying distribution. It can easily misbehave in multi-solution and medium-dimensional problems; you will probably need to hand-tune the algorithm to your problem.
    • Variational Bayes requires a good initial guess of the underlying distribution. It performs slightly better than PMC, but also requires verification of the results, as the discovery of peaks can be unreliable.
    • The performance of PMC / Variational Bayes limits them to low-dimensional problems (up to ~20 parameters).