{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Workshop 8: \n", | |
"## Approximate Likelihood guides for MH inference: An abbreviation model example" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"MH:\n", | |
" Metropolis-Hastings algorithm, high-level summary:\n", | |
" - a specific type of MCMC (Markov chain Monte Carlo)\n", | |
" - used to sample from an intractable posterior via a simpler proposal distribution\n", | |
" - guaranteed to converge to the true posterior in the long run \n", | |
" - Monte Carlo: generate random numbers\n", | |
" - Markov chain: the next state depends only on the current state\n", | |
" - acceptance criteria: \n", | |
" - corrects for the fact that the proposal distribution has no relation to the exact posterior (the distribution we care about but cannot compute)\n", | |
" - the proposal distribution simply proposes new sites/areas to sample\n", | |
" - we simply want to accept more samples from high-density regions of the posterior \n", | |
" - a.k.a. in proportion to the posterior's density\n", | |
"\n", | |
"For further reading:\n", | |
"MH example with animation:\n", | |
"https://blog.stata.com/2016/11/15/introduction-to-bayesian-statistics-part-2-mcmc-and-the-metropolis-hastings-algorithm/\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from hyphenate import hyphenate_word \n", | |
"import matplotlib.pyplot as plt\n", | |
"%matplotlib inline\n", | |
"import math\n", | |
"import Levenshtein\n", | |
"import numpy as np" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The following demonstrates a simple model for abbreviation. It assumes a naive model of abbreviation with two key components:\n", | |
"\n", | |
"* truncation: probabilistically drop all but the first syllable. This relies on an external module that attempts to break words up by syllable (e.g. 'foobar' -> 'foo-bar').\n", | |
"* letter dropping: drop letters from a word to create an abbreviation. Vowels and consonants could be dropped at different rates.\n", | |
"\n", | |
"This is a naive (wide) model; additional components could be added to build a better abbreviation model.\n", | |
"\n", | |
"The goal here is to demonstrate the use of approximate likelihood functions to guide MH inference." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'trun'" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"def truncate(word):\n", | |
" \"\"\"\n", | |
" A function for truncating a word, returning only the first syllable\n", | |
" \"\"\"\n", | |
" return hyphenate_word(word)[0]\n", | |
"\n", | |
"# small illustrative example\n", | |
"truncate('truncate') " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[False, 0, 1, 0]\n", | |
"lae\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"<particle.core.Trace at 0x106224208>" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"class Drop(Dist):\n", | |
" def __init__(self, word):\n", | |
" \"\"\"\n", | |
" A model for the dropping of letters from a word.\n", | |
" \"\"\"\n", | |
" self.vowels = ['a', 'e', 'i', 'o', 'u']\n", | |
" self.word = word\n", | |
" \n", | |
" def concat(self, drop_idx):\n", | |
" \"\"\"\n", | |
" Take the letters in word where the corresponding element in drop_idx\n", | |
" is False, and concatenate them into one string.\n", | |
" \"\"\"\n", | |
" joined = \"\".join([self.word[i] for i in range(len(drop_idx)) if not drop_idx[i]])\n", | |
" return joined\n", | |
"\n", | |
" def run(self, env): \n", | |
" word = self.word\n", | |
" ln = len(word)\n", | |
" candidate_rates_vowels = [0, .5, .9]\n", | |
" # Feel free to experiment with different drop rates,\n", | |
" # as in most real examples consonants are kept more often than vowels\n", | |
" # candidate_rates_consonant = [0, .1, 0.3]\n", | |
" vowel_drop_rate = env.bind(\"vowel_drop_rate\", UniformDiscrete(candidate_rates_vowels))\n", | |
" consonant_drop_rate = env.bind(\"consonant_drop_rate\", UniformDiscrete(candidate_rates_vowels))\n", | |
" drop_letters = [False]\n", | |
" \n", | |
" for i in range(1, ln):\n", | |
" if word[i] in self.vowels:\n", | |
" drop_letters.append(env.bind('drop_{}'.format(i), Flip(vowel_drop_rate)))\n", | |
" else:\n", | |
" drop_letters.append(env.bind('drop_{}'.format(i), Flip(consonant_drop_rate)))\n", | |
" print(drop_letters)\n", | |
" final = self.concat(drop_letters)\n", | |
" print(final)\n", | |
"\n", | |
" return final\n", | |
" \n", | |
"# a small illustrative example of the word lane being abbreviated\n", | |
"Drop('lane').generate()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"vocab = ['dog', 'cat', 'mice', 'utah', 'sam'] \n", | |
"# simpler words as an identity test (to see if anything is wrong with the code)\n", | |
"vocab = ['chicken', 'chucking', 'cooking', 'chickling', 'crackling'] # more realistic words\n", | |
"prior = {word: 1 for word in vocab} # make it a uniform categorical over candidate words" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"I give these a uniform prior, and condition on the abbreviation being 'cken'." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The `Abbreviator` model makes use of the truncate function and the `Drop` model to create a generative model of an abbreviation. It takes a `prior` argument, which is a dictionary where keys are possible words and values are weights.\n", | |
"\n", | |
"This demo model assumes we have a vocab of possible words to abbreviate before we start, but we could generalize down the road by incorporating a probability of the abbreviation coming from an unseen word, and how those abbreviations would be generated, etc. (try this yourself for some extra fun)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[False, 1, 0, 0, 1]\n", | |
"cic\n" | |
] | |
} | |
], | |
"source": [ | |
"class Abbreviator(Dist):\n", | |
" def __init__(self, prior, test):\n", | |
" self.prior = prior\n", | |
" self.test = test\n", | |
" \n", | |
" def run(self, env): \n", | |
" # choose a word based on discrete prior\n", | |
" word = env.bind(\"word\", Discrete(self.prior))\n", | |
" # the simplest abbreviation is the word itself\n", | |
" abbreviation = word\n", | |
" truncation_rate = 0.5 # 50% of the time we only keep the first syllable\n", | |
" is_truncated = env.bind('truncate', Flip(truncation_rate)) # Flip draws from a Bernoulli distribution\n", | |
" if is_truncated:\n", | |
" abbreviation = truncate(abbreviation) \n", | |
" # get the abbreviation after turning off characters\n", | |
" abbreviation = env.inline(\"drop\", Drop(abbreviation)).return_value\n", | |
" # add result to state (the env object)\n", | |
" \n", | |
" ''' \n", | |
" Note that if we used a Dirac (the limit of a Gaussian as its standard deviation goes to 0) \n", | |
" to bind the abbreviation,\n", | |
" MH would be very unlikely to sample and \"stay in\" the high density region. (Feel free to test it out.)\n", | |
" A good analogy is \"finding a needle in a haystack\".\n", | |
" '''\n", | |
" # print('abv',abbreviation)\n", | |
" # abbreviation = env.bind(\"abbreviation\", Dirac(abbreviation))\n", | |
" \n", | |
" \n", | |
" # Therefore we use a softened/relaxed approximate likelihood function based on the Levenshtein distance, \n", | |
" # which helps MH converge. \n", | |
" # We effectively get to the right answer faster by solving a slightly different problem:\n", | |
" # we are essentially spreading the pdf out from a very tall, pointy distribution into a shorter, fatter version.\n", | |
" # The key thing is that the peak of the distribution stays in the same place.\n", | |
"\n", | |
" env.add_logmass('abbreviation_likelihood', -10 * Levenshtein.distance(abbreviation, self.test))\n", | |
" return abbreviation\n", | |
" \n", | |
"part = Abbreviator(prior, ' ').generate() # generates one sample according to the model, as an illustration" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As an example of inference, consider the challenge of disambiguating the abbreviation \"cken\" into one of the candidate words.\n", | |
"\n", | |
"\n", | |
"The MH solver does the following:\n", | |
"- generate a sample from the probabilistic model\n", | |
"- score how similar the generated abbreviation is to the data point \"cken\" using the Levenshtein distance\n", | |
"- decide to keep or reject the sample in proportion to the probability density\n", | |
"- choose a slightly different set of random variables and repeat\n", | |
"- at the end of the 1000 samples, we have a fairly low-fidelity approximation of the actual distribution\n", | |
"- an analogy is this pin-art toy on Amazon: [link](https://www.amazon.com/Glantop-Sculpture-Impression-Office-Desktop/dp/B0743D2WKH/ref=sr_1_6?ie=UTF8&qid=1542074924&sr=8-6&keywords=push+pin+toy)\n", | |
"- we are approximating the real surface of the distribution with 1000 \"pins\"\n", | |
" \n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 57, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[False, 1, 1, 1, 1, 1, 1]\n", | |
"c\n", | |
"[False, 1, 1, 1, 0, 1, 1]\n", | |
"ci\n", | |
"[False, 1, 1, 1, 1, 0, 0]\n", | |
"cng\n", | |
"[False, 0, 1, 1, 1, 0, 0]\n", | |
"cong\n", | |
"[False, 0, 1, 1, 0, 1, 0, 1]\n", | |
"chkn\n", | |
"[False, 0, 1, 1, 1, 1, 0, 1]\n", | |
"chn\n", | |
"[False, 0, 1, 1, 1, 1, 1, 1]\n", | |
"ch\n", | |
"[False, 0, 1, 1, 1]\n", | |
" ... \n", | |
"ck\n", | |
"[False, 1, 1, 1, 1, 0, 0]\n", | |
"cen\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 0, 0, 1]\n", | |
"cke\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 0, 0, 1]\n", | |
"cke\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 1, 0, 0]\n", | |
"cen\n", | |
"[False, 0, 1, 0, 0, 0, 0]\n", | |
"chcken\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 0, 0, 0]\n", | |
"cing\n", | |
"[False, 0, 1, 1, 0, 0, 0]\n", | |
"chken\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 0, 0, 0, 0]\n", | |
"cking\n", | |
"[False, 1, 1, 0, 0, 0, 0]\n", | |
"ccken\n", | |
"[False, 0, 1, 1, 0, 0, 0]\n", | |
"chken\n", | |
"[False, 1, 1, 1, 0, 0, 0]\n", | |
"cing\n", | |
"[False, 0, 1, 1, 0, 0, 0]\n", | |
"chken\n", | |
"[False, 1, 1, 1, 1, 0, 0]\n", | |
"cen\n", | |
"[False, 1, 1, 1, 0, 0, 0, 0, 1]\n", | |
"cklin\n", | |
"[False, 1, 1, 0, 0, 0, 0]\n", | |
"ccken\n", | |
"[False, 1, 1, 1, 0, 1, 0]\n", | |
"ckn\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 0, 0, 1]\n", | |
"cke\n", | |
"[False, 1, 1, 1, 1, 0, 0]\n", | |
" ... \n", | |
"cen\n", | |
"[False, 1, 1, 0, 1, 0, 0]\n", | |
"ccen\n", | |
"[False, 1, 1, 1, 0, 0, 0]\n", | |
"cing\n", | |
"[False, 0, 1, 1, 0, 0, 0]\n", | |
"chken\n", | |
"[False, 1, 1, 1, 0, 0, 1]\n", | |
"cke\n", | |
"[False, 1, 1, 1, 0]\n", | |
"ck\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"{'cooking': 0.003, 'chucking': 0.016, 'chicken': 0.981}" | |
] | |
}, | |
"execution_count": 57, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"cond_prog = Abbreviator(prior, 'cken') \n", | |
"'''\n", | |
"Here we are saying: given that we just saw the abbreviation 'cken',\n", | |
"re-apply the abbreviator model with the same uniform prior over words, but take the observed data point into account.\n", | |
"\n", | |
"Q: if we changed this to 'chn', how would you expect the results to differ?\n", | |
"'''\n", | |
"\n", | |
"\n", | |
"solver = MH(cond_prog, num_samples=1000)\n", | |
"result = solver.run()\n", | |
"\n", | |
"\n", | |
"\n", | |
"# uncomment the line below to check whether logmasses are -INF or not increasing, which would indicate MH is just randomly bouncing\n", | |
"# print([res.logmass for res in result]) \n", | |
"result.marginal('word')\n", | |
"'''\n", | |
"Did you have a guess for the Q above? \n", | |
"I hope it was something along the lines of:\n", | |
"\"the result would be less certain because it could be an abbreviation of cooking, chucking, or chicken\"\n", | |
"When reasoning about probabilistic programs it's helpful to ask yourself:\n", | |
"\"How would I as a human approach this problem; what's my intuition?\"\n", | |
"'''\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The following figure plots the total_logmass of the states being explored. This helps us diagnose whether the MH algorithm is converging. (We typically expect to see a rising curve that flattens out.)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 59, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[<matplotlib.lines.Line2D at 0x107462080>]" | |
] | |
}, | |
"execution_count": 59, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAD8CAYAAAB0IB+mAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFklJREFUeJzt3XuQXOV55/HvoxmNbhCuAnGTJYLABgoTe8KC13bhNRjiJbCJ7S1cqYK1yarYkHWSypYdSqk4mxS1SeE42cRZB+0tm4TEyYK5BNvhtg5xZYMdaZGJuBlxMwIZhAEJkNBopp/9o0/3tHHPjGZOj3r6nO+nqqv7XOac9/Tp+fXb73n77chMJEnVt6jfBZAkHRwGviTVhIEvSTVh4EtSTRj4klQTBr4k1YSBL0k1YeBLUk0Y+JJUE8P9LkCno48+OtesWdPvYkjSQNm8efNLmblypvUWVOCvWbOGTZs29bsYkjRQIuKZA1nPJh1JqgkDX5JqwsCXpJow8CWpJgx8SaoJA1+SasLAl6SaWFD98KW6efR7u/nqgzv6XYxSliwe4t+8Zw0rlgzz7Mt7uGnzdgbxp1M/dMYqzjzhsFLb2Dc+wR///dO8sW981n976qpDueSs40vtfyYGvtRHG+97ki8/8BwR/S7J3LRy/ZRjDuGiM1bxZ/c/ww1/9+TAHU8mbNv5Ov/lZ95dajubn3mF//S1RwFm/RxcctbxBr5UZfvGG/zoyhXc+8vn97soc/L0S29w/uf+tl2jfX3fOEcfMsKmX72wzyWbnX/1h3/P6/smSm/njWIbd/z795b+tDAfbMOX+mj/RIPFQ4P7b7h8ZAiAPWPNoNs7NsHSxUP9LNKcLFs8xN6x2TfDvNWeYhsL9TkY3FeaVAGDHvjLisDfWwT+nrGJ9pvAIFk+MtR+0yqj9Tws1OdgcF9pUgWMN5LhoQFr8O6wfKTZKtwKyz37J1g2MngtxctGhtphXcYeA1/SVMbGB7uGP7QoGBlexN79rSadcZYv0OaM6fSshl88D8sWaOCXeiuOiI8Bvw68AzgnMzd1LLsWuAqYAD6VmXeW2ZdUReONZOniwQ18aIblQ8/v4tYHnmPHrjc57dhD+12kWVs+MszuN/dz6wPPldrOlmdfbb4JLtA38bKfvbYCPw3c0DkzIk4HLgfOAI4H7omIUzOz/FuoVCHjEw0WLx28JpBOxx22jG88/hLfePwlAM4/bcbf4VhwjjtsKXvGJvjFv9xSelsnHrGMWKD9Uku90jLzEaDbwV0GfCkz9wFPRcQ24BzgH8rsT6qasYlkeNHCrA0eqJuuPo8XX9vXnj7piGV9LM3crH//yVx85ioaPfi+2NGHjJTfyDyZr6rFCcD9HdPbi3mSOoxPNBgZXpi1wQO1Yskwa5cM9qeUiOBtR63odzHm3YxnKSLuAVZ1WbQhM2+b6s+6zOv63hkR64H1AKtXr56pOFKl7J9oDHwNX4NjxsDPzAvmsN3twEkd0ycCz0+x/Y3ARoDR0dHBG4BDKmH/RA50Lx0Nlvl6pd0OXB4RSyJiLbAO+NY87UsaWM0vXg12k44GR9lumT8F/AGwEvhKRGzJzIsy86GI+CvgYWAcuGYh99DJTB574TXGxhv9LsqcBcFpqw5lZNjaYhkH+7Xw5v6Jgf7ilQZL2V46twC3TLHsOuC6Mts/WO5++AXW/+nmfhejtJ87/0f59MVv73cxBtq9j7zIz/7JpplX7KFDly4+qPtTfQ32pfUeeaHoUvb5f/1ODls2mP98n77pwR/oGqe5eeG1NwH4nY+9k8OXz/9rIQJ+fM2R874fCQx8gPYoeReefuzA1rYOW764J2OB1F3rObzg9GMH9s1fmooNvsDesWZ77fIBHPSpZfnIUHscD83dQh/tUCrDwAf27B9nZHgRQ4sG9+LZ8sXD7bG4NXd79k+weCjsKqlK8lVNs1Y36DW6Xg3vWnd7xy
ZYNoCjPUoHYnDbMHrolT37B/6ffPnIENtfGef1Ofx4sibt3rt/oJv2pOnU/pX96Pd289fffp6Tjhy8AZ86Hbp0mCd2vsGZn3UU6rJOOeaQfhdBmhe1D/wdu5rd8K44d01/C1LSNR84hXXHDN445AvR2asP73cRpHlR+8BvDen242sHuy/0245awb99/8n9LoakBaz2F20b2Uz8we2fI0kHpvaBX+Q9ixboL9RIUq/UPvDbNXzzXlLF1T7wHYBfUl0Y+EUN3yYdSVVn4BdVfPNeUtUZ+MW9NXxJVVf7wPeiraS6qH3gT3bL7G85JGm+1T7wWzV8v3olqepqH/gtNulIqrraB37DbpmSaqL2gd/ultnfYkjSvDPwHUtHUk3UPvDtlimpLmof+O0+Oga+pIoz8Ns1fBNfUrWVCvyI+FhEPBQRjYgY7Zi/JiL2RsSW4vZH5Ys6P7xoK6kuyv7E4Vbgp4Ebuix7IjPPLrn9edfwoq2kmigV+Jn5CAx2c0jiRVtJ9TCfbfhrI+KBiLgvIt43j/spxeGRJdXFjDX8iLgHWNVl0YbMvG2KP9sBrM7M70fEu4FbI+KMzNzdZfvrgfUAq1evPvCS90j7oq2t+JIqbsbAz8wLZrvRzNwH7Cseb46IJ4BTgU1d1t0IbAQYHR096L84ODke/sHesyQdXPPSpBMRKyNiqHh8MrAOeHI+9lVWo2G3TEn1ULZb5k9FxHbgPOArEXFnsej9wIMR8W3gJuDqzHy5XFHnh4MjS6qLsr10bgFu6TL/ZuDmMts+WOyWKaku/Kat37ySVBO1D/wWL9pKqrraB37DsXQk1UTtA98fMZdUF7UP/Ea7Cd/El1RttQ98x9KRVBcGvmPpSKoJA9+xdCTVhIHvRVtJNVH7wG9ftLVNR1LF1T7wWxdtreFLqrraB741fEl1UfvAbzfiS1LF1T7wG2lzjqR6qH3gJ2lzjqRaMPCt4UuqidoHfiP90pWkeqh94DebdPpdCkmafwZ+Oo6OpHow8DNt0pFUC7UPfLtlSqqL2gd+s0nHxJdUfQa+F20l1YSBn9iCL6kWDPxMFtmIL6kGhvtdgINh+yt7uPeRF9u/btXpkR2vWcOXVAulAj8irgd+EhgDngA+kZmvFsuuBa4CJoBPZeadJcs6a3vHJvj6Yy/yv/7v03zzqZenXO+M43/kIJZKkvqjbA3/buDazByPiN8GrgU+ExGnA5cDZwDHA/dExKmZOVFyf7Ny+7ef4zM3/xMAHz9nNZ++6LSu6x2ytBYfdCTVXKmky8y7OibvBz5aPL4M+FJm7gOeiohtwDnAP5TZ32ztHWu+v9zyc+/hzBMOY/FQ7S9ZSKqxXibgJ4GvFY9PAJ7tWLa9mHdQtVrs1xy1wrCXVHsz1vAj4h5gVZdFGzLztmKdDcA4cGPrz7qs3/WnpSJiPbAeYPXq1QdQ5AOX7Z8v7OlmJWkgzRj4mXnBdMsj4krgEuCDOdkNZjtwUsdqJwLPT7H9jcBGgNHR0Z7+3mBrY46VI0klm3Qi4mLgM8ClmbmnY9HtwOURsSQi1gLrgG+V2ddctN5/wtYcSSrdS+cLwBLg7mI8mvsz8+rMfCgi/gp4mGZTzzUHu4dOJ+v3klS+l84p0yy7DriuzPbLmmzDN/IlqdKNHVm04hv3klT1wLeXjiS1VTvwi3t76UhSxQO/0eqlY95LUrUDv8vgmJJUW5UO/BZr+JJU8cBvf/HKNnxJqnrgN++t4UtS1QO/uDfvJanqge83bSWprdqB7zdtJamt2oFvG74ktVU78It7m3QkqeKB7zevJGlSpQM/sTlHklqqHfjpBVtJaql24JO230tSodqBbw1fktqqHfjYhi9JLdUO/HTgNElqqXbgY5uOJLVUOvDNe0maVOnAT2CRjfiSBFQ98DO9aCtJhYoHvk06ktRS7cDHgdMkqaVU4EfE9RHxaEQ8GBG3RMThxfw1EbE3IrYUtz/qTXFnxxq+JE0qW8O/GzgzM88Cvg
Nc27Hsicw8u7hdXXI/c2K3TEmaVCrwM/OuzBwvJu8HTixfpN6xhi9Jk3rZhv9J4Gsd02sj4oGIuC8i3tfD/RywZi8dI1+SAIZnWiEi7gFWdVm0ITNvK9bZAIwDNxbLdgCrM/P7EfFu4NaIOCMzd3fZ/npgPcDq1avndhRTcCwdSZo0Y+Bn5gXTLY+IK4FLgA9mNn9iKjP3AfuKx5sj4gngVGBTl+1vBDYCjI6O9vQnqmzSkaRJZXvpXAx8Brg0M/d0zF8ZEUPF45OBdcCTZfY1F46HL0mTZqzhz+ALwBLg7iJY7y965Lwf+I2IGAcmgKsz8+WS+5o1a/iSNKlU4GfmKVPMvxm4ucy2e8E2fEmaVO1v2iZYx5ekpkoHPjh4miS1VDrwbcOXpEnVD3wTX5KAqgc+6W/aSlKh2oFvDV+S2qod+NiGL0kt1Q789AdQJKml2oFPT4fmkaSBVunAxzZ8SWqrdOA7tIIkTap24KfdMiWppdqBjzV8SWqpduAnLDLxJQmoeuBjP3xJaql24Dt6miS1VSbwn/n+G1z4+fv4wOf+ln98uvnjWtbwJWlSZQL/8Rde5/EXX+epl95gy3dfbc70m7aS1FaZwO/8Tu2b+yeKeWkNX5IK1Qn8nIz8vUXgNxp2y5SkluoEfsfjN/c3inl+8UqSWqoT+F1q+I6HL0mTKhT4k4/3tdvwJUkt1Qn8jsc/WMO3ii9JUKHAbxRV/KWLF/HAd1/lv/7dk2AvHUlqq0zgt5p0Pvj2Y9kzNs71dz1mG74kdSgd+BHxmxHxYERsiYi7IuL4Yn5ExO9HxLZi+bvKF3dqrSadX7pwHZ9871rGxhtMZBr4klToRQ3/+sw8KzPPBu4Afq2Y/xPAuuK2HvhiD/Y1pVYvnYhg+cgQAHvGJuyWKUmF0oGfmbs7JlcwWdm+DPiTbLofODwijiu7v6nL0bwPYNnIMAB7xsat4UtSYbgXG4mI64ArgF3AB4rZJwDPdqy2vZi3oxf7fKvWD5ZHBMsXFzX8fRMcurQnhyhJA++AavgRcU9EbO1yuwwgMzdk5knAjcDPt/6sy6Z+qGt8RKyPiE0RsWnnzp1zPY52DX9R0G7SeWNs3Ku2klQ4oOpvZl5wgNv7c+ArwGdp1uhP6lh2IvB8l21vBDYCjI6Ozvm7Uo12k06wtAj8V/fs57jDls11k5JUKb3opbOuY/JS4NHi8e3AFUVvnXOBXZk5L8050HnRFtYctYLlI0PsG29wyjGHzNcuJWmg9KKB+7ci4jSgATwDXF3M/yrwYWAbsAf4RA/2NaXOjwZrj17B1l+/CLBFR5JaSgd+Zn5kivkJXFN2+wdekObdokXxA/eSpKbKfNO2NbSCMS9J3VUm8FtNOjbhSFJ31Qn8drdME1+SuqlO4GOTjiRNpzKB32i36fS1GJK0YFUm8GlftDXxJambygR+q4Jvb0xJ6q4ygd9oTA6eJkn6YZUJfJvwJWl61Qn81uBpJr4kdVWdwC/ubdKRpO6qE/gdo2VKkn5YhQK/eW/eS1J31Qn8olHHoRUkqbvKBH7Di7aSNK3KBH52/MShJOmHVSfw8aKtJE2nOoFvk44kTatCge/gaZI0nQoFfvPeGr4kdVedwC/u7ZYpSd1VJ/D94pUkTasygd9waAVJmlZlAt/B0yRpepUJfDKt3UvSNCoT+I20/V6SplMq8CPiNyPiwYjYEhF3RcTxxfzzI2JXMX9LRPxab4o7tSRtzpGkaZSt4V+fmWdl5tnAHUBnsH8jM88ubr9Rcj8zSmv4kjStUoGfmbs7Jlcwee30oEvsgy9J0xkuu4GIuA64AtgFfKBj0XkR8W3geeA/ZOZDZfc1lb1jEzz8/G7GJhrztQtJGngz1vAj4p6I2NrldhlAZm7IzJOAG4GfL/7s/wFvy8x3An8A3DrN9tdHxKaI2LRz5845HcRjL7zGfd+Z299KUl3MGPiZeUFmntnldttbVv1z4CPF3+zOzNeLx18FFkfE0VNsf2
Nmjmbm6MqVK+d0EIuHbMqRpJmU7aWzrmPyUuDRYv6qKLrMRMQ5xX6+X2Zf0xkZqkzvUkmaN2Xb8H8rIk4DGsAzwNXF/I8C/y4ixoG9wOXZGr94HowMG/iSNJNSgZ+ZH5li/heAL5TZ9mwstoYvSTOqRFIa+JI0s0okpU06kjSzSiSlF20laWaVSEq7ZUrSzCoR+MPW8CVpRialJNWEgS9JNWHgS1JNlB4tc6H41X/5jn4XQZIWtMoE/s++7+R+F0GSFjSbdCSpJgx8SaoJA1+SasLAl6SaMPAlqSYMfEmqCQNfkmrCwJekmoh5/KnZWYuInTR/G3eujgZe6lFxBkHdjhc85rrwmGfnbZm5cqaVFlTglxURmzJztN/lOFjqdrzgMdeFxzw/bNKRpJow8CWpJqoW+Bv7XYCDrG7HCx5zXXjM86BSbfiSpKlVrYYvSZpCJQI/Ii6OiMciYltE/Eq/y9MrEXFSRHw9Ih6JiIci4heK+UdGxN0R8Xhxf0QxPyLi94vn4cGIeFd/j2BuImIoIh6IiDuK6bUR8c3ieP8yIkaK+UuK6W3F8jX9LHcZEXF4RNwUEY8W5/u8GpznXype11sj4i8iYmnVznVE/I+IeDEitnbMm/V5jYgri/Ufj4gr51qegQ/8iBgC/hD4CeB04OMRcXp/S9Uz48AvZ+Y7gHOBa4pj+xXg3sxcB9xbTEPzOVhX3NYDXzz4Re6JXwAe6Zj+beB3i+N9BbiqmH8V8EpmngL8brHeoPrPwN9k5tuBd9I8/sqe54g4AfgUMJqZZwJDwOVU71z/MXDxW+bN6rxGxJHAZ4F/BpwDfLb1JjFrmTnQN+A84M6O6WuBa/tdrnk61tuAC4HHgOOKeccBjxWPbwA+3rF+e71BuQEnFv8E/wK4AwiaX0YZfuv5Bu4EziseDxfrRb+PYQ7H/CPAU28te8XP8wnAs8CRxbm7A7ioiucaWANsnet5BT4O3NAx/wfWm81t4Gv4TL5wWrYX8yql+Aj7Y8A3gWMzcwdAcX9MsVoVnovfAz4NNIrpo4BXM3O8mO48pvbxFst3FesPmpOBncD/LJqy/ltErKDC5zkznwM+B3wX2EHz3G2m+ucaZn9ee3a+qxD40WVepboeRcQhwM3AL2bm7ulW7TJvYJ6LiLgEeDEzN3fO7rJqHsCyQTIMvAv4Ymb+GPAGkx/zuxn44y6aJC4D1gLHAytoNmm8VdXO9XSmOsaeHXsVAn87cFLH9InA830qS89FxGKaYX9jZn65mP1CRBxXLD8OeLGYP+jPxT8HLo2Ip4Ev0WzW+T3g8IgYLtbpPKb28RbLDwNePpgF7pHtwPbM/GYxfRPNN4CqnmeAC4CnMnNnZu4Hvgy8h+qfa5j9ee3Z+a5C4P8jsK64uj9C88LP7X0uU09ERAD/HXgkMz/fseh2oHWl/kqabfut+VcUV/vPBXa1PjoOgsy8NjNPzMw1NM/j/8nMnwG+Dny0WO2tx9t6Hj5arD9wtb7M/B7wbEScVsz6IPAwFT3Phe8C50bE8uJ13jrmSp/rwmzP653AhyLiiOKT0YeKebPX7wsaPboo8mHgO8ATwIZ+l6eHx/Vemh/dHgS2FLcP02y7vBd4vLg/slg/aPZYegL4J5o9IPp+HHM89vOBO4rHJwPfArYB/xtYUsxfWkxvK5af3O9ylzjes4FNxbm+FTii6ucZ+I/Ao8BW4E+BJVU718Bf0LxGsZ9mTf2quZxX4JPFsW8DPjHX8vhNW0mqiSo06UiSDoCBL0k1YeBLUk0Y+JJUEwa+JNWEgS9JNWHgS1JNGPiSVBP/HxEoBxSaek2tAAAAAElFTkSuQmCC\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"logmasses = list(result.get_total_logmasses()) \n", | |
"# logmasses = log of the probability at iteration x (x-axis); the highest possible log probability is 0\n", | |
"# this illustrates how \"likely is this sampled data point, assuming my model is correct\"\n", | |
"plt.plot(logmasses)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Other practical notes:\n", | |
"- MH results depend on initialization (for a given number of iterations)\n", | |
" => combine results from multiple initializations, or\n", | |
" => set a burn-in period to reduce the influence of the initial values\n", | |
" \n", | |
"- MH samples are not i.i.d. (the state at t+1 depends on the state at t)\n", | |
" => set a skip variable to reduce auto-correlation (a skip of s means we keep every s-th sampled point)\n", | |
" => verify with a correlogram\n", | |
" \n", | |
"- MH may not be suitable for bi-modal / multi-modal distributions.\n", | |
" => intuitively, once MH finds a high-density area there is little incentive for it to bounce through a low-density area to reach another peak; this is similar to getting stuck in a local minimum in gradient descent in deep learning\n", | |
" => Hamiltonian Monte Carlo might be a better choice.\n", | |
"\n", | |
"- MH may be slow to converge for high-dimensional posteriors. Consider Gibbs sampling if the posterior is not multimodal." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python classifier2", | |
"language": "python", | |
"name": "classifers2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
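The Metropolis-Hastings loop summarized in the opening markdown cell can be sketched in plain Python. This is a minimal illustration under assumptions of my own: a toy unnormalized log-density over the integers and a symmetric random-walk proposal. The names `log_density`, `acceptance_prob`, and `mh_sample` are illustrative, not part of the notebook's `particle` framework.

```python
import math
import random

def log_density(x):
    # Unnormalized log-density of a toy discrete target peaked at 0.
    return -abs(x)

def acceptance_prob(log_p_new, log_p_old):
    # MH acceptance rule for a symmetric proposal:
    # accept with probability min(1, p_new / p_old), computed in log space.
    return min(1.0, math.exp(log_p_new - log_p_old))

def mh_sample(num_samples, init=0, seed=0):
    rng = random.Random(seed)
    x = init
    samples = []
    for _ in range(num_samples):
        proposal = x + rng.choice([-1, 1])  # symmetric random-walk proposal
        if rng.random() < acceptance_prob(log_density(proposal), log_density(x)):
            x = proposal  # accept: move to the proposed state
        samples.append(x)  # on rejection we keep (and re-count) the current state
    return samples
```

An uphill move (higher density) is always accepted; a downhill move is accepted with probability equal to the density ratio, which is exactly the correction described under "acceptance criteria" above.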
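The softened/relaxed likelihood idea commented inside `Abbreviator.run` can be illustrated on its own. The sketch below uses a small pure-Python edit-distance function as a stand-in for `Levenshtein.distance`, plus a hypothetical helper `soft_weights` (my own, not part of the notebook's framework) that turns the `-10 * distance` log-masses into normalized weights, showing why near misses still receive probability mass where a Dirac would give them none.

```python
import math

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance
    # (a pure-Python stand-in for Levenshtein.distance).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def soft_weights(candidates, observed, scale=10.0):
    # Softened likelihood: assign each candidate a log-mass of
    # -scale * distance, then exponentiate and normalize into weights.
    logmass = {c: -scale * edit_distance(c, observed) for c in candidates}
    z = sum(math.exp(v) for v in logmass.values())
    return {c: math.exp(v) / z for c, v in logmass.items()}
```

For the observed abbreviation 'cken', 'chicken' (distance 3) outweighs 'cooking' (distance 4), consistent with the marginal the notebook reports.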