@Prtfw
Last active November 16, 2018 15:48
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Workshop 8: \n",
"## Approximate Likelihood guides for MH inference: An abbreviation model example"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"MH:\n",
" Metropolis Hastings algorithm high level summary:\n",
" - a specific type of MCMC (Markovchain Monte Carlo)\n",
" - used to sample from a intractable posterior via a simpler proposal distribution\n",
" - guaranteed to converge to true posterior in the long run \n",
" - monte carlo: generate random numbers\n",
" - markov chain: the next state only depends on the current state\n",
" - acceptance criteria: \n",
" - correct for the fact that proposal distribution has no relation to the exact posterior (the distribution we care about but can't compute)\n",
" - the proposal distribution simply proposes new sites/area to sample\n",
" - we simply want to accept more samples from high density region of the posterior \n",
" - a.k.a in proportion to the posterior's density\n",
"\n",
"For further reading:\n",
"MH example with animation:\n",
"https://blog.stata.com/2016/11/15/introduction-to-bayesian-statistics-part-2-mcmc-and-the-metropolis-hastings-algorithm/\n"
]
},
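{
"cell_type": "markdown",
"metadata": {},
"source": [
"The accept/reject loop summarized above can be sketched in a few lines. This is a minimal, self-contained illustration (assuming a 1-D standard-normal target; `mh_sample`, `target_logpdf`, and `proposal_scale` are illustrative names, not part of the workshop library used below):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def mh_sample(target_logpdf, x0, n_samples, proposal_scale=1.0, seed=0):\n",
"    rng = np.random.default_rng(seed)\n",
"    x = x0\n",
"    samples = []\n",
"    for _ in range(n_samples):\n",
"        # Markov chain: the proposal depends only on the current state\n",
"        x_prop = x + proposal_scale * rng.normal()\n",
"        # symmetric proposal => accept with probability min(1, p(x') / p(x))\n",
"        if np.log(rng.uniform()) < target_logpdf(x_prop) - target_logpdf(x):\n",
"            x = x_prop\n",
"        samples.append(x)\n",
"    return np.array(samples)\n",
"\n",
"# target: standard normal, log density up to an additive constant\n",
"chain = mh_sample(lambda x: -0.5 * x ** 2, x0=10.0, n_samples=5000)"
]
},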
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from hyphenate import hyphenate_word \n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import math\n",
"import Levenshtein\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following demonstrates a simple model for abbreviation. It assumes a naive model of abbreviation with two key components:\n",
"\n",
"* truncation: Probabilistically drop all but the first syllable. This relies on an external module that attempts to break up words by syllable (eg. 'foobar' -> 'foo-bar').\n",
"* letter dropping: Drop letters from a word to create an abbreviation. Vowels and consonants could be dropped at different rates.\n",
"\n",
"This is a naive (wide) model, additional components could be added to make a better abbreviate model.\n",
"\n",
"The goal here is to demonstrate the use of approximiate likelihood functions to guide MH inference."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'trun'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def truncate(word):\n",
" \"\"\"\n",
" A function for truncating a word, returning only the first syllable\n",
" \"\"\"\n",
" return(hyphenate_word(word)[0])\n",
"\n",
"# small illustrative example\n",
"truncate('truncate') "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[False, 0, 1, 0]\n",
"lae\n"
]
},
{
"data": {
"text/plain": [
"<particle.core.Trace at 0x106224208>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Drop(Dist):\n",
" def __init__(self, word):\n",
" \"\"\"\n",
" A model of for the dropping of letters from a word.\n",
" \"\"\"\n",
" self.vowels = ['a', 'e', 'i', 'o', 'u']\n",
" self.word = word\n",
" \n",
" def concat(self, drop_idx):\n",
" \"\"\"\n",
" Take the letters in word where the corresponding element in drop_idx\n",
" is False, and concatenate them into one string.\n",
" \"\"\"\n",
" joined = \"\".join([self.word[i] for i in range(len(drop_idx)) if not drop_idx[i]])\n",
" return joined\n",
"\n",
" def run(self, env): \n",
" word = self.word\n",
" ln = len(word)\n",
" candidate_rates_vowels = [0, .5, .9]\n",
" # Feel free to experimented with having different drop rates\n",
" # as in most real examples constants are kept more often than vowels\n",
" # candidate_rates_consonant = [0, .1, 0.3]\n",
" vowel_drop_rate = env.bind(\"vowel_drop_rate\", UniformDiscrete(candidate_rates_vowels))\n",
" consonant_drop_rate = env.bind(\"consonant_drop_rate\", UniformDiscrete(candidate_rates_vowels))\n",
" drop_letters = [False]\n",
" \n",
" for i in range(1, ln):\n",
" if word[i] in self.vowels:\n",
" drop_letters.append(env.bind('drop_{}'.format(i), Flip(vowel_drop_rate)))\n",
" else:\n",
" drop_letters.append(env.bind('drop_{}'.format(i), Flip(consonant_drop_rate)))\n",
" print(drop_letters)\n",
" final = self.concat(drop_letters)\n",
" print(final)\n",
"\n",
" return final\n",
" \n",
"# a small illustrative example of the word lane being abbreviated\n",
"Drop('lane').generate()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"vocab = ['dog', 'cat', 'mice', 'utah', 'sam'] \n",
"# simpler words as a identity test (to see if anything is wrong with the code)\n",
"vocab = ['chicken', 'chucking', 'cooking', 'chickling', 'crackling'] # more realistic words\n",
"prior = {word: 1 for word in vocab} # make it a uniform categorical over candidate words"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I give these a uniform prior, and condition on the abbreviation being 'ckn'."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Abbreviator` model makes use of the truncate function and the `Drop` model to create a generative model of an abbreviation. It takes a `prior` argument, which is a dictionary where keys are possible words and values are weights.\n",
"\n",
"This demo model assumes we have a vocab of possible words to abbreviate before we start but we could generalize down the road by incorporating a probability of the abbreviation coming from a unseen word and how those abbreviations would be generated etc... (try this your self for some extra fun)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[False, 1, 0, 0, 1]\n",
"cic\n"
]
}
],
"source": [
"class Abbreviator(Dist):\n",
" def __init__(self, prior, test):\n",
" self.prior = prior\n",
" self.test = test\n",
" \n",
" def run(self, env): \n",
" # choose a word based on discrete prior\n",
" word = env.bind(\"word\", Discrete(self.prior))\n",
" # the simplist abbreviation is the word itself\n",
" abbreviation = word\n",
" truncation_rate = 0.5 # 50% of the time we only keep the first syllable\n",
" is_truncated = env.bind('truncate', Flip(truncation_rate)) # Flip draws from a binomial distribution\n",
" if is_truncated:\n",
" abbreviation = truncate(abbreviation) \n",
" # get the abbreviation after turning off characters\n",
" abbreviation = env.inline(\"drop\", Drop(abbreviation)).return_value\n",
" # add result to state (the env object)\n",
" \n",
" ''' \n",
" Note that if we use a Dirac (approximately equal to a gaussian with sigma/standard deviation=>0) \n",
" to bind the abbreviation.\n",
" MH is very unlikely to sample and \"stay in\" the high density region. (Feel free to test it out).\n",
" An good analogy is \"finding a needle in a hay stack\"\n",
" '''\n",
" # print('abv',abbreviation)\n",
" # abbreviation = env.bind(\"abbreviation\", Dirac(abbreviation))\n",
" \n",
" \n",
" # Therefore we use a softened/relaxed approximate likelihood function by using the Levenshtein distance. \n",
" # Which helps MH converge. \n",
" # We effectively get to the right answer faster by solving a slightly different problem.\n",
" # We are essentially spreading out the pdf from a very tall pointy distribution into a fatter shorter version\n",
" # The key thing is the peak of the distribution remains the same.\n",
"\n",
" env.add_logmass('abbreviation_likelihood', -10 * Levenshtein.distance(abbreviation, self.test))\n",
" return abbreviation\n",
" \n",
"part=Abbreviator(prior,' ').generate() # this generates 1 sample according to the model, here is an illustration"
]
},
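{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how the relaxed likelihood spreads probability mass, we can evaluate the logmass term `-10 * Levenshtein.distance(...)` for a few candidate strings (a quick illustrative check using the `Levenshtein` module imported above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"obs = 'cken'\n",
"for cand in ['cken', 'ckn', 'chkn', 'chicken']:\n",
"    d = Levenshtein.distance(cand, obs)\n",
"    # an exact match gets logmass 0 (mass 1); each edit costs a factor of e^-10\n",
"    print(cand, d, math.exp(-10 * d))"
]
},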
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an example of inference, consider the challenge of disambiguating the abbrevation \"cken\" into one of the following words.\n",
"\n",
"\n",
"the MH solver is doing the following:\n",
"- generate an example from the probabilistic model\n",
"- score how similar this generated abbrevation is to the data point \"cken\" using the Levenstein distance\n",
"- decide to keep or reject the sample in porportion to the probability density\n",
"- choose a slightly different set of random variables and repeat\n",
"- at the end of the 1000 samples, we have a fairly low fidelity approximation of the actual distribution\n",
"- an analogy is this toy on amazon: [link](https://www.amazon.com/Glantop-Sculpture-Impression-Office-Desktop/dp/B0743D2WKH/ref=sr_1_6?ie=UTF8&qid=1542074924&sr=8-6&keywords=push+pin+toy)\n",
"- we are approximating the real surface of the distribution with 1000 \"pins\"\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[False, 1, 1, 1, 1, 1, 1]\n",
"c\n",
"[False, 1, 1, 1, 0, 1, 1]\n",
"ci\n",
"[False, 1, 1, 1, 1, 0, 0]\n",
"cng\n",
"[False, 0, 1, 1, 1, 0, 0]\n",
"cong\n",
"[False, 0, 1, 1, 0, 1, 0, 1]\n",
"chkn\n",
"[False, 0, 1, 1, 1, 1, 0, 1]\n",
"chn\n",
"[False, 0, 1, 1, 1, 1, 1, 1]\n",
"ch\n",
"[False, 0, 1, 1, 1]\n",
" ... \n",
"ck\n",
"[False, 1, 1, 1, 1, 0, 0]\n",
"cen\n",
"[False, 1, 0, 1, 0, 0, 0]\n",
"ciken\n",
"[False, 1, 1, 1, 0, 0, 1]\n",
"cke\n",
"[False, 1, 0, 1, 0, 0, 0]\n",
"ciken\n",
"[False, 1, 1, 1, 0, 0, 1]\n",
"cke\n",
"[False, 1, 0, 1, 0, 0, 0]\n",
"ciken\n",
"[False, 1, 1, 1, 1, 0, 0]\n",
"cen\n",
"[False, 0, 1, 0, 0, 0, 0]\n",
"chcken\n",
"[False, 1, 0, 1, 0, 0, 0]\n",
"ciken\n",
"[False, 1, 1, 1, 0, 0, 0]\n",
"cing\n",
"[False, 0, 1, 1, 0, 0, 0]\n",
"chken\n",
"[False, 1, 0, 1, 0, 0, 0]\n",
"ciken\n",
"[False, 1, 1, 1, 0, 0, 0, 0]\n",
"cking\n",
"[False, 1, 1, 0, 0, 0, 0]\n",
"ccken\n",
"[False, 0, 1, 1, 0, 0, 0]\n",
"chken\n",
"[False, 1, 1, 1, 0, 0, 0]\n",
"cing\n",
"[False, 0, 1, 1, 0, 0, 0]\n",
"chken\n",
"[False, 1, 1, 1, 1, 0, 0]\n",
"cen\n",
"[False, 1, 1, 1, 0, 0, 0, 0, 1]\n",
"cklin\n",
"[False, 1, 1, 0, 0, 0, 0]\n",
"ccken\n",
"[False, 1, 1, 1, 0, 1, 0]\n",
"ckn\n",
"[False, 1, 0, 1, 0, 0, 0]\n",
"ciken\n",
"[False, 1, 0, 1, 0, 0, 0]\n",
"ciken\n",
"[False, 1, 1, 1, 0, 0, 1]\n",
"cke\n",
"[False, 1, 1, 1, 1, 0, 0]\n",
" ... \n",
"cen\n",
"[False, 1, 1, 0, 1, 0, 0]\n",
"ccen\n",
"[False, 1, 1, 1, 0, 0, 0]\n",
"cing\n",
"[False, 0, 1, 1, 0, 0, 0]\n",
"chken\n",
"[False, 1, 1, 1, 0, 0, 1]\n",
"cke\n",
"[False, 1, 1, 1, 0]\n",
"ck\n",
"[False, 1, 0, 1, 0, 0, 0]\n",
"ciken\n"
]
},
{
"data": {
"text/plain": [
"{'cooking': 0.003, 'chucking': 0.016, 'chicken': 0.981}"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cond_prog = Abbreviator(prior, 'cken') \n",
"'''\n",
"Here we are saying, based on the fact that we saw the abbreviation 'cken'. \n",
"Re-apply the abbreviatior model with the same uniform prior over words but take into the data point we just saw.\n",
"\n",
"Q: what if we change this to 'chn', how would you expect the results to differ?\n",
"'''\n",
"\n",
"\n",
"solver = MH(cond_prog, num_samples=1000)\n",
"result = solver.run()\n",
"\n",
"\n",
"\n",
"# uncomment line below to check if logmasses are -INF, or not increasing which indicates MH is just randomly bouncing\n",
"# print([res.logmass for res in result]) \n",
"result.marginal('word')\n",
"'''\n",
"Did you have an a guess to the Q on line 1? \n",
"I hope it was something along the lines of:\n",
"\"the result would be less certain because it could be an abbreviation of cooking, chucking and chicken\"\n",
"When reasoning about probabilistic programs it's helpful to ask yourself:\n",
"\"How would I as a human approach this problem; what's my intuition?\"\n",
"'''\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following figure illustrates the total_logmass of the states being explored. This helps us diagnose if the MH algorithm is converging. (We typically expect to see a rising curve that flattens out.)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x107462080>]"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAD8CAYAAAB0IB+mAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFklJREFUeJzt3XuQXOV55/HvoxmNbhCuAnGTJYLABgoTe8KC13bhNRjiJbCJ7S1cqYK1yarYkHWSypYdSqk4mxS1SeE42cRZB+0tm4TEyYK5BNvhtg5xZYMdaZGJuBlxMwIZhAEJkNBopp/9o0/3tHHPjGZOj3r6nO+nqqv7XOac9/Tp+fXb73n77chMJEnVt6jfBZAkHRwGviTVhIEvSTVh4EtSTRj4klQTBr4k1YSBL0k1YeBLUk0Y+JJUE8P9LkCno48+OtesWdPvYkjSQNm8efNLmblypvUWVOCvWbOGTZs29bsYkjRQIuKZA1nPJh1JqgkDX5JqwsCXpJow8CWpJgx8SaoJA1+SasLAl6SaWFD98KW6efR7u/nqgzv6XYxSliwe4t+8Zw0rlgzz7Mt7uGnzdgbxp1M/dMYqzjzhsFLb2Dc+wR///dO8sW981n976qpDueSs40vtfyYGvtRHG+97ki8/8BwR/S7J3LRy/ZRjDuGiM1bxZ/c/ww1/9+TAHU8mbNv5Ov/lZ95dajubn3mF//S1RwFm/RxcctbxBr5UZfvGG/zoyhXc+8vn97soc/L0S29w/uf+tl2jfX3fOEcfMsKmX72wzyWbnX/1h3/P6/smSm/njWIbd/z795b+tDAfbMOX+mj/RIPFQ4P7b7h8ZAiAPWPNoNs7NsHSxUP9LNKcLFs8xN6x2TfDvNWeYhsL9TkY3FeaVAGDHvjLisDfWwT+nrGJ9pvAIFk+MtR+0yqj9Tws1OdgcF9pUgWMN5LhoQFr8O6wfKTZKtwKyz37J1g2MngtxctGhtphXcYeA1/SVMbGB7uGP7QoGBlexN79rSadcZYv0OaM6fSshl88D8sWaOCXeiuOiI8Bvw68AzgnMzd1LLsWuAqYAD6VmXeW2ZdUReONZOniwQ18aIblQ8/v4tYHnmPHrjc57dhD+12kWVs+MszuN/dz6wPPldrOlmdfbb4JLtA38bKfvbYCPw3c0DkzIk4HLgfOAI4H7omIUzOz/FuoVCHjEw0WLx28JpBOxx22jG88/hLfePwlAM4/bcbf4VhwjjtsKXvGJvjFv9xSelsnHrGMWKD9Uku90jLzEaDbwV0GfCkz9wFPRcQ24BzgH8rsT6qasYlkeNHCrA0eqJuuPo8XX9vXnj7piGV9LM3crH//yVx85ioaPfi+2NGHjJTfyDyZr6rFCcD9HdPbi3mSOoxPNBgZXpi1wQO1Yskwa5cM9qeUiOBtR63odzHm3YxnKSLuAVZ1WbQhM2+b6s+6zOv63hkR64H1AKtXr56pOFKl7J9oDHwNX4NjxsDPzAvmsN3twEkd0ycCz0+x/Y3ARoDR0dHBG4BDKmH/RA50Lx0Nlvl6pd0OXB4RSyJiLbAO+NY87UsaWM0vXg12k44GR9lumT8F/AGwEvhKRGzJzIsy86GI+CvgYWAcuGYh99DJTB574TXGxhv9LsqcBcFpqw5lZNjaYhkH+7Xw5v6Jgf7ilQZL2V46twC3TLHsOuC6Mts/WO5++AXW/+nmfhejtJ87/0f59MVv73cxBtq9j7zIz/7JpplX7KFDly4+qPtTfQ32pfUeeaHoUvb5f/1ODls2mP98n77pwR/oGqe5eeG1NwH4nY+9k8OXz/9rIQJ+fM2R874fCQx8gPYoeReefuzA1rYOW764J2OB1F3rObzg9GMH9s1fmooNvsDesWZ77fIBHPSpZfnIUHscD83dQh/tUCrDwAf27B9nZHgRQ4sG9+LZ8sXD7bG4NXd79k+weCjsKqlK8lVNs1Y36DW6Xg3vWnd7xy
ZYNoCjPUoHYnDbMHrolT37B/6ffPnIENtfGef1Ofx4sibt3rt/oJv2pOnU/pX96Pd289fffp6Tjhy8AZ86Hbp0mCd2vsGZn3UU6rJOOeaQfhdBmhe1D/wdu5rd8K44d01/C1LSNR84hXXHDN445AvR2asP73cRpHlR+8BvDen242sHuy/0245awb99/8n9LoakBaz2F20b2Uz8we2fI0kHpvaBX+Q9ixboL9RIUq/UPvDbNXzzXlLF1T7wHYBfUl0Y+EUN3yYdSVVn4BdVfPNeUtUZ+MW9NXxJVVf7wPeiraS6qH3gT3bL7G85JGm+1T7wWzV8v3olqepqH/gtNulIqrraB37DbpmSaqL2gd/ultnfYkjSvDPwHUtHUk3UPvDtlimpLmof+O0+Oga+pIoz8Ns1fBNfUrWVCvyI+FhEPBQRjYgY7Zi/JiL2RsSW4vZH5Ys6P7xoK6kuyv7E4Vbgp4Ebuix7IjPPLrn9edfwoq2kmigV+Jn5CAx2c0jiRVtJ9TCfbfhrI+KBiLgvIt43j/spxeGRJdXFjDX8iLgHWNVl0YbMvG2KP9sBrM7M70fEu4FbI+KMzNzdZfvrgfUAq1evPvCS90j7oq2t+JIqbsbAz8wLZrvRzNwH7Cseb46IJ4BTgU1d1t0IbAQYHR096L84ODke/sHesyQdXPPSpBMRKyNiqHh8MrAOeHI+9lVWo2G3TEn1ULZb5k9FxHbgPOArEXFnsej9wIMR8W3gJuDqzHy5XFHnh4MjS6qLsr10bgFu6TL/ZuDmMts+WOyWKaku/Kat37ySVBO1D/wWL9pKqrraB37DsXQk1UTtA98fMZdUF7UP/Ea7Cd/El1RttQ98x9KRVBcGvmPpSKoJA9+xdCTVhIHvRVtJNVH7wG9ftLVNR1LF1T7wWxdtreFLqrraB741fEl1UfvAbzfiS1LF1T7wG2lzjqR6qH3gJ2lzjqRaMPCt4UuqidoHfiP90pWkeqh94DebdPpdCkmafwZ+Oo6OpHow8DNt0pFUC7UPfLtlSqqL2gd+s0nHxJdUfQa+F20l1YSBn9iCL6kWDPxMFtmIL6kGhvtdgINh+yt7uPeRF9u/btXpkR2vWcOXVAulAj8irgd+EhgDngA+kZmvFsuuBa4CJoBPZeadJcs6a3vHJvj6Yy/yv/7v03zzqZenXO+M43/kIJZKkvqjbA3/buDazByPiN8GrgU+ExGnA5cDZwDHA/dExKmZOVFyf7Ny+7ef4zM3/xMAHz9nNZ++6LSu6x2ytBYfdCTVXKmky8y7OibvBz5aPL4M+FJm7gOeiohtwDnAP5TZ32ztHWu+v9zyc+/hzBMOY/FQ7S9ZSKqxXibgJ4GvFY9PAJ7tWLa9mHdQtVrs1xy1wrCXVHsz1vAj4h5gVZdFGzLztmKdDcA4cGPrz7qs3/WnpSJiPbAeYPXq1QdQ5AOX7Z8v7OlmJWkgzRj4mXnBdMsj4krgEuCDOdkNZjtwUsdqJwLPT7H9jcBGgNHR0Z7+3mBrY46VI0klm3Qi4mLgM8ClmbmnY9HtwOURsSQi1gLrgG+V2ddctN5/wtYcSSrdS+cLwBLg7mI8mvsz8+rMfCgi/gp4mGZTzzUHu4dOJ+v3klS+l84p0yy7DriuzPbLmmzDN/IlqdKNHVm04hv3klT1wLeXjiS1VTvwi3t76UhSxQO/0eqlY95LUrUDv8vgmJJUW5UO/BZr+JJU8cBvf/HKNnxJqnrgN++t4UtS1QO/uDfvJanqge83bSWprdqB7zdtJamt2oFvG74ktVU78It7m3QkqeKB7zevJGlSpQM/sTlHklqqHfjpBVtJaql24JO230tSodqBbw1fktqqHfjYhi9JLdUO/HTgNElqqXbgY5uOJLVUOvDNe0maVOnAT2CRjfiSBFQ98DO9aCtJhYoHvk06ktRS7cDHgdMkqaVU4EfE9RHxaEQ8GBG3RMThxfw1EbE3IrYUtz/qTXFnxxq+JE0qW8O/GzgzM88Cvg
Nc27Hsicw8u7hdXXI/c2K3TEmaVCrwM/OuzBwvJu8HTixfpN6xhi9Jk3rZhv9J4Gsd02sj4oGIuC8i3tfD/RywZi8dI1+SAIZnWiEi7gFWdVm0ITNvK9bZAIwDNxbLdgCrM/P7EfFu4NaIOCMzd3fZ/npgPcDq1avndhRTcCwdSZo0Y+Bn5gXTLY+IK4FLgA9mNn9iKjP3AfuKx5sj4gngVGBTl+1vBDYCjI6O9vQnqmzSkaRJZXvpXAx8Brg0M/d0zF8ZEUPF45OBdcCTZfY1F46HL0mTZqzhz+ALwBLg7iJY7y965Lwf+I2IGAcmgKsz8+WS+5o1a/iSNKlU4GfmKVPMvxm4ucy2e8E2fEmaVO1v2iZYx5ekpkoHPjh4miS1VDrwbcOXpEnVD3wTX5KAqgc+6W/aSlKh2oFvDV+S2qod+NiGL0kt1Q789AdQJKml2oFPT4fmkaSBVunAxzZ8SWqrdOA7tIIkTap24KfdMiWppdqBjzV8SWqpduAnLDLxJQmoeuBjP3xJaql24Dt6miS1VSbwn/n+G1z4+fv4wOf+ln98uvnjWtbwJWlSZQL/8Rde5/EXX+epl95gy3dfbc70m7aS1FaZwO/8Tu2b+yeKeWkNX5IK1Qn8nIz8vUXgNxp2y5SkluoEfsfjN/c3inl+8UqSWqoT+F1q+I6HL0mTKhT4k4/3tdvwJUkt1Qn8jsc/WMO3ii9JUKHAbxRV/KWLF/HAd1/lv/7dk2AvHUlqq0zgt5p0Pvj2Y9kzNs71dz1mG74kdSgd+BHxmxHxYERsiYi7IuL4Yn5ExO9HxLZi+bvKF3dqrSadX7pwHZ9871rGxhtMZBr4klToRQ3/+sw8KzPPBu4Afq2Y/xPAuuK2HvhiD/Y1pVYvnYhg+cgQAHvGJuyWKUmF0oGfmbs7JlcwWdm+DPiTbLofODwijiu7v6nL0bwPYNnIMAB7xsat4UtSYbgXG4mI64ArgF3AB4rZJwDPdqy2vZi3oxf7fKvWD5ZHBMsXFzX8fRMcurQnhyhJA++AavgRcU9EbO1yuwwgMzdk5knAjcDPt/6sy6Z+qGt8RKyPiE0RsWnnzp1zPY52DX9R0G7SeWNs3Ku2klQ4oOpvZl5wgNv7c+ArwGdp1uhP6lh2IvB8l21vBDYCjI6Ozvm7Uo12k06wtAj8V/fs57jDls11k5JUKb3opbOuY/JS4NHi8e3AFUVvnXOBXZk5L8050HnRFtYctYLlI0PsG29wyjGHzNcuJWmg9KKB+7ci4jSgATwDXF3M/yrwYWAbsAf4RA/2NaXOjwZrj17B1l+/CLBFR5JaSgd+Zn5kivkJXFN2+wdekObdokXxA/eSpKbKfNO2NbSCMS9J3VUm8FtNOjbhSFJ31Qn8drdME1+SuqlO4GOTjiRNpzKB32i36fS1GJK0YFUm8GlftDXxJambygR+q4Jvb0xJ6q4ygd9oTA6eJkn6YZUJfJvwJWl61Qn81uBpJr4kdVWdwC/ubdKRpO6qE/gdo2VKkn5YhQK/eW/eS1J31Qn8olHHoRUkqbvKBH7Di7aSNK3KBH52/MShJOmHVSfw8aKtJE2nOoFvk44kTatCge/gaZI0nQoFfvPeGr4kdVedwC/u7ZYpSd1VJ/D94pUkTasygd9waAVJmlZlAt/B0yRpepUJfDKt3UvSNCoT+I20/V6SplMq8CPiNyPiwYjYEhF3RcTxxfzzI2JXMX9LRPxab4o7tSRtzpGkaZSt4V+fmWdl5tnAHUBnsH8jM88ubr9Rcj8zSmv4kjStUoGfmbs7Jlcwee30oEvsgy9J0xkuu4GIuA64AtgFfKBj0XkR8W3geeA/ZOZDZfc1lb1jEzz8/G7GJhrztQtJGngz1vAj4p6I2NrldhlAZm7IzJOAG4GfL/7s/wFvy8x3An8A3DrN9tdHxKaI2LRz5845HcRjL7zGfd+Z299KUl3MGPiZeUFmntnldttbVv1z4CPF3+zOzNeLx18FFkfE0VNsf2
Nmjmbm6MqVK+d0EIuHbMqRpJmU7aWzrmPyUuDRYv6qKLrMRMQ5xX6+X2Zf0xkZqkzvUkmaN2Xb8H8rIk4DGsAzwNXF/I8C/y4ixoG9wOXZGr94HowMG/iSNJNSgZ+ZH5li/heAL5TZ9mwstoYvSTOqRFIa+JI0s0okpU06kjSzSiSlF20laWaVSEq7ZUrSzCoR+MPW8CVpRialJNWEgS9JNWHgS1JNlB4tc6H41X/5jn4XQZIWtMoE/s++7+R+F0GSFjSbdCSpJgx8SaoJA1+SasLAl6SaMPAlqSYMfEmqCQNfkmrCwJekmoh5/KnZWYuInTR/G3eujgZe6lFxBkHdjhc85rrwmGfnbZm5cqaVFlTglxURmzJztN/lOFjqdrzgMdeFxzw/bNKRpJow8CWpJqoW+Bv7XYCDrG7HCx5zXXjM86BSbfiSpKlVrYYvSZpCJQI/Ii6OiMciYltE/Eq/y9MrEXFSRHw9Ih6JiIci4heK+UdGxN0R8Xhxf0QxPyLi94vn4cGIeFd/j2BuImIoIh6IiDuK6bUR8c3ieP8yIkaK+UuK6W3F8jX9LHcZEXF4RNwUEY8W5/u8GpznXype11sj4i8iYmnVznVE/I+IeDEitnbMm/V5jYgri/Ufj4gr51qegQ/8iBgC/hD4CeB04OMRcXp/S9Uz48AvZ+Y7gHOBa4pj+xXg3sxcB9xbTEPzOVhX3NYDXzz4Re6JXwAe6Zj+beB3i+N9BbiqmH8V8EpmngL8brHeoPrPwN9k5tuBd9I8/sqe54g4AfgUMJqZZwJDwOVU71z/MXDxW+bN6rxGxJHAZ4F/BpwDfLb1JjFrmTnQN+A84M6O6WuBa/tdrnk61tuAC4HHgOOKeccBjxWPbwA+3rF+e71BuQEnFv8E/wK4AwiaX0YZfuv5Bu4EziseDxfrRb+PYQ7H/CPAU28te8XP8wnAs8CRxbm7A7ioiucaWANsnet5BT4O3NAx/wfWm81t4Gv4TL5wWrYX8yql+Aj7Y8A3gWMzcwdAcX9MsVoVnovfAz4NNIrpo4BXM3O8mO48pvbxFst3FesPmpOBncD/LJqy/ltErKDC5zkznwM+B3wX2EHz3G2m+ucaZn9ee3a+qxD40WVepboeRcQhwM3AL2bm7ulW7TJvYJ6LiLgEeDEzN3fO7rJqHsCyQTIMvAv4Ymb+GPAGkx/zuxn44y6aJC4D1gLHAytoNmm8VdXO9XSmOsaeHXsVAn87cFLH9InA830qS89FxGKaYX9jZn65mP1CRBxXLD8OeLGYP+jPxT8HLo2Ip4Ev0WzW+T3g8IgYLtbpPKb28RbLDwNePpgF7pHtwPbM/GYxfRPNN4CqnmeAC4CnMnNnZu4Hvgy8h+qfa5j9ee3Z+a5C4P8jsK64uj9C88LP7X0uU09ERAD/HXgkMz/fseh2oHWl/kqabfut+VcUV/vPBXa1PjoOgsy8NjNPzMw1NM/j/8nMnwG+Dny0WO2tx9t6Hj5arD9wtb7M/B7wbEScVsz6IPAwFT3Phe8C50bE8uJ13jrmSp/rwmzP653AhyLiiOKT0YeKebPX7wsaPboo8mHgO8ATwIZ+l6eHx/Vemh/dHgS2FLcP02y7vBd4vLg/slg/aPZYegL4J5o9IPp+HHM89vOBO4rHJwPfArYB/xtYUsxfWkxvK5af3O9ylzjes4FNxbm+FTii6ucZ+I/Ao8BW4E+BJVU718Bf0LxGsZ9mTf2quZxX4JPFsW8DPjHX8vhNW0mqiSo06UiSDoCBL0k1YeBLUk0Y+JJUEwa+JNWEgS9JNWHgS1JNGPiSVBP/HxEoBxSaek2tAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"logmasses = list(map(lambda x: x, result.get_total_logmasses())) \n",
"# logmasses = log of probability at iteration x (x-axis), the highest possible value of log probabilty is 1\n",
"# this illustrates how \"likely is this sampled data point assuming my model is correct\"\n",
"plt.plot(logmasses)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other Practical Notes:\n",
"- MH results are dependent on initiation (for a given number of iteration)\n",
" => combine results from multiple initializations or\n",
" => set a burn in period to reduce the influence of initial values\n",
" \n",
"- MH results are not i.i.d (state at t+1 is dependent on state at t)\n",
" => set skip variable to reduce auto-correlation (Skip of s means we keep every s data point sampled).\n",
" => verify with correlogram\n",
" \n",
"- MH may not be suitable for bi-modal / multi-modal distributions.\n",
" => intuitivly, if MH finds a high density area, there is little incentive for it to bounce/sample through a low density area to another peak, this is similar to getting stuck in a bowl/local minima in gradient decent in deep learning\n",
" => Hamiltonian Monte Carlo might be a better choice.\n",
"\n",
"- MH may be slow to converge for high dimensional posteriors. Consider Gibbs-sampling if not multimodal."
]
},
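{
"cell_type": "markdown",
"metadata": {},
"source": [
"The burn-in and skip ideas above can be sketched as a small post-processing step (an illustrative helper, not part of the workshop library; the `burn_in` and `skip` values are placeholders to tune per problem):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def postprocess(samples, burn_in=500, skip=5):\n",
"    # drop the burn-in prefix, then keep every skip-th sample\n",
"    # to reduce autocorrelation between successive MH states\n",
"    return np.asarray(samples)[burn_in::skip]\n",
"\n",
"fake_chain = np.random.default_rng(0).normal(size=5000)  # stand-in for an MH chain\n",
"thinned = postprocess(fake_chain)\n",
"len(thinned)  # (5000 - 500) / 5 = 900 retained samples"
]
},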
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python classifier2",
"language": "python",
"name": "classifers2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}