A simple Python script that uses NLTK to build an N-gram language model from the King James Bible and generate new phrases from it.
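A minimal sketch of the N-gram idea, written in plain Python rather than with NLTK so that it runs without downloading corpus data. It assumes whitespace/punctuation tokenization and uses the opening verses of Genesis as a stand-in corpus. With NLTK installed and its data downloaded, the full text is available as `nltk.corpus.gutenberg.words('bible-kjv.txt')`. The helper names `build_ngram_model` and `generate` are hypothetical, not the script's own functions.

```python
import random
import re
from collections import defaultdict

# Stand-in corpus (KJV Genesis 1:1-3); the real script trains on the whole Bible.
TEXT = ("In the beginning God created the heaven and the earth. "
        "And the earth was without form, and void; and darkness was upon "
        "the face of the deep. And the Spirit of God moved upon the face "
        "of the waters. And God said, Let there be light: and there was light.")

# Hypothetical tokenizer: lowercase words plus a few punctuation marks as tokens.
tokens = re.findall(r"[a-z]+|[.,;:]", TEXT.lower())

def build_ngram_model(tokens, n=2):
    """Map each (n-1)-token context to every token observed right after it."""
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model

def generate(model, seed, length=15, rng=None):
    """Extend `seed` one token at a time by sampling an observed continuation.

    Duplicates in each follower list make the sampling frequency-weighted.
    """
    rng = rng or random.Random()
    context_len = len(next(iter(model)))
    out = list(seed)
    for _ in range(length):
        followers = model.get(tuple(out[-context_len:]))
        if not followers:  # dead end: this context was never followed by a token
            break
        out.append(rng.choice(followers))
    return out

model = build_ngram_model(tokens, n=2)
phrase = generate(model, ("and",), length=12, rng=random.Random(1))
print(" ".join(phrase))
```

Raising `n` makes the output stick closer to the training text, while lowering it gives more novel (and less grammatical) phrases.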
A test of the new GMM routines in https://github.com/bthirion/scikit-learn/tree/gmm-fixes.
By changing the line
GMM = mixture.GMM
at the top of the file (for example, to mixture.DPGMM or mixture.VBGMM), we can plot the BIC and AIC for each GMM variant. Standard GMM works beautifully: it settles on 3 components, which describe the data well. DPGMM and VBGMM produce some unexpected results.
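A minimal sketch of the BIC/AIC model-selection loop, using only NumPy. Note that `mixture.GMM`, `mixture.DPGMM` and `mixture.VBGMM` were later removed from scikit-learn in favour of `GaussianMixture` and `BayesianGaussianMixture`; `GaussianMixture` exposes `.bic(X)` and `.aic(X)` directly. To stay dependency-light, this sketch swaps in a hand-written EM fit for 1-D data. The synthetic three-cluster data and the helper names are assumptions, not the gist's actual data or code. For a 1-D mixture with k components the free-parameter count is p = 3k - 1 (k means, k variances, k - 1 weights), giving BIC = p ln n - 2 ln L and AIC = 2p - 2 ln L.

```python
import numpy as np

def fit_gmm_1d(x, k, n_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture with k components by EM; return the log-likelihood."""
    n = x.size
    # Deterministic init: means at evenly spaced quantiles, pooled variance, equal weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(weight * normal density) for every (point, component) pair.
        log_p = (np.log(weights)
                 - 0.5 * np.log(2 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        ll = log_norm.sum()
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6  # floor
        if ll - prev_ll < tol:  # EM log-likelihood has stopped improving
            break
        prev_ll = ll
    return ll

def information_criteria(x, k):
    """Return (BIC, AIC) for a k-component 1-D mixture fitted to x."""
    ll = fit_gmm_1d(x, k)
    p = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return p * np.log(x.size) - 2 * ll, 2 * p - 2 * ll

# Hypothetical data: three well-separated unit-variance clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(loc, 1.0, 300) for loc in (-5.0, 0.0, 5.0)])

scores = {k: information_criteria(x, k) for k in range(1, 7)}
for k, (bic, aic) in scores.items():
    print(f"k={k}: BIC={bic:.1f}  AIC={aic:.1f}")
best_k = min(scores, key=lambda k: scores[k][0])
print("lowest BIC at k =", best_k)
```

On data like this the lowest BIC is expected near k = 3. Because AIC's penalty (2 per parameter) is lighter than BIC's (ln n per parameter), AIC tends to tolerate extra components more readily.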