{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Workshop 8: \n", | |
"## Approximate Likelihood guides for MH inference: An abbreviation model example" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"MH:\n", | |
" Metropolis-Hastings algorithm, high-level summary:\n", | |
" - a specific type of MCMC (Markov chain Monte Carlo)\n", | |
" - used to sample from an intractable posterior via a simpler proposal distribution\n", | |
" - guaranteed to converge to the true posterior in the long run \n", | |
" - Monte Carlo: generate random numbers\n", | |
" - Markov chain: the next state depends only on the current state\n", | |
" - acceptance criteria: \n", | |
" - corrects for the fact that the proposal distribution has no relation to the exact posterior (the distribution we care about but cannot compute)\n", | |
" - the proposal distribution simply proposes new sites/areas to sample\n", | |
" - we simply want to accept more samples from high-density regions of the posterior \n", | |
" - a.k.a. in proportion to the posterior's density\n", | |
"\n", | |
"For further reading:\n", | |
"MH example with animation:\n", | |
"https://blog.stata.com/2016/11/15/introduction-to-bayesian-statistics-part-2-mcmc-and-the-metropolis-hastings-algorithm/\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from hyphenate import hyphenate_word \n", | |
"import matplotlib.pyplot as plt\n", | |
"%matplotlib inline\n", | |
"import math\n", | |
"import Levenshtein\n", | |
"import numpy as np" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The following demonstrates a simple model for abbreviation. It assumes a naive model of abbreviation with two key components:\n", | |
"\n", | |
"* truncation: probabilistically drop all but the first syllable. This relies on an external module that attempts to break words up by syllable (e.g. 'foobar' -> 'foo-bar').\n", | |
"* letter dropping: drop letters from a word to create an abbreviation. Vowels and consonants could be dropped at different rates.\n", | |
"\n", | |
"This is a naive (wide) model; additional components could be added to build a better abbreviation model.\n", | |
"\n", | |
"The goal here is to demonstrate the use of approximate likelihood functions to guide MH inference." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'trun'" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"def truncate(word):\n", | |
" \"\"\"\n", | |
" A function for truncating a word, returning only the first syllable\n", | |
" \"\"\"\n", | |
" return hyphenate_word(word)[0]\n", | |
"\n", | |
"# small illustrative example\n", | |
"truncate('truncate') " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[False, 0, 1, 0]\n", | |
"lae\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"<particle.core.Trace at 0x106224208>" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"class Drop(Dist):\n", | |
" def __init__(self, word):\n", | |
" \"\"\"\n", | |
" A model for the dropping of letters from a word.\n", | |
" \"\"\"\n", | |
" self.vowels = ['a', 'e', 'i', 'o', 'u']\n", | |
" self.word = word\n", | |
" \n", | |
" def concat(self, drop_idx):\n", | |
" \"\"\"\n", | |
" Take the letters in word where the corresponding element in drop_idx\n", | |
" is False, and concatenate them into one string.\n", | |
" \"\"\"\n", | |
" joined = \"\".join([self.word[i] for i in range(len(drop_idx)) if not drop_idx[i]])\n", | |
" return joined\n", | |
"\n", | |
" def run(self, env): \n", | |
" word = self.word\n", | |
" ln = len(word)\n", | |
" candidate_rates_vowels = [0, .5, .9]\n", | |
" # Feel free to experiment with different drop rates,\n", | |
" # as in most real examples consonants are kept more often than vowels\n", | |
" # candidate_rates_consonant = [0, .1, 0.3]\n", | |
" vowel_drop_rate = env.bind(\"vowel_drop_rate\", UniformDiscrete(candidate_rates_vowels))\n", | |
" consonant_drop_rate = env.bind(\"consonant_drop_rate\", UniformDiscrete(candidate_rates_vowels))\n", | |
" drop_letters = [False]\n", | |
" \n", | |
" for i in range(1, ln):\n", | |
" if word[i] in self.vowels:\n", | |
" drop_letters.append(env.bind('drop_{}'.format(i), Flip(vowel_drop_rate)))\n", | |
" else:\n", | |
" drop_letters.append(env.bind('drop_{}'.format(i), Flip(consonant_drop_rate)))\n", | |
" print(drop_letters)\n", | |
" final = self.concat(drop_letters)\n", | |
" print(final)\n", | |
"\n", | |
" return final\n", | |
" \n", | |
"# a small illustrative example of the word lane being abbreviated\n", | |
"Drop('lane').generate()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"vocab = ['dog', 'cat', 'mice', 'utah', 'sam'] \n", | |
"# simpler words as an identity test (to see if anything is wrong with the code)\n", | |
"vocab = ['chicken', 'chucking', 'cooking', 'chickling', 'crackling'] # more realistic words\n", | |
"prior = {word: 1 for word in vocab} # make it a uniform categorical over candidate words" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"I give these a uniform prior, and condition on the abbreviation being 'cken'." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The `Abbreviator` model makes use of the truncate function and the `Drop` model to create a generative model of an abbreviation. It takes a `prior` argument, which is a dictionary where keys are possible words and values are weights.\n", | |
"\n", | |
"This demo model assumes we have a vocab of possible words to abbreviate before we start, but we could generalize down the road by incorporating a probability of the abbreviation coming from an unseen word, and how those abbreviations would be generated, etc. (try this yourself for some extra fun)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[False, 1, 0, 0, 1]\n", | |
"cic\n" | |
] | |
} | |
], | |
"source": [ | |
"class Abbreviator(Dist):\n", | |
" def __init__(self, prior, test):\n", | |
" self.prior = prior\n", | |
" self.test = test\n", | |
" \n", | |
" def run(self, env): \n", | |
" # choose a word based on discrete prior\n", | |
" word = env.bind(\"word\", Discrete(self.prior))\n", | |
" # the simplest abbreviation is the word itself\n", | |
" abbreviation = word\n", | |
" truncation_rate = 0.5 # 50% of the time we only keep the first syllable\n", | |
" is_truncated = env.bind('truncate', Flip(truncation_rate)) # Flip draws from a Bernoulli distribution\n", | |
" if is_truncated:\n", | |
" abbreviation = truncate(abbreviation) \n", | |
" # get the abbreviation after turning off characters\n", | |
" abbreviation = env.inline(\"drop\", Drop(abbreviation)).return_value\n", | |
" # add result to state (the env object)\n", | |
" \n", | |
" ''' \n", | |
" Note that if we used a Dirac (the limit of a Gaussian as its standard deviation goes to 0) \n", | |
" to bind the abbreviation,\n", | |
" MH would be very unlikely to sample and \"stay in\" the high density region. (Feel free to test it out.)\n", | |
" A good analogy is \"finding a needle in a haystack\".\n", | |
" '''\n", | |
" # print('abv',abbreviation)\n", | |
" # abbreviation = env.bind(\"abbreviation\", Dirac(abbreviation))\n", | |
" \n", | |
" \n", | |
" # Therefore we use a softened/relaxed approximate likelihood function based on the Levenshtein distance, \n", | |
" # which helps MH converge. \n", | |
" # We effectively get to the right answer faster by solving a slightly different problem:\n", | |
" # we are essentially spreading the pdf out from a very tall, pointy distribution into a shorter, fatter version.\n", | |
" # The key thing is that the peak of the distribution stays in the same place.\n", | |
"\n", | |
" env.add_logmass('abbreviation_likelihood', -10 * Levenshtein.distance(abbreviation, self.test))\n", | |
" return abbreviation\n", | |
" \n", | |
"part = Abbreviator(prior, ' ').generate() # generates one sample according to the model, as an illustration" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As an example of inference, consider the challenge of disambiguating the abbreviation \"cken\" into one of the candidate words.\n", | |
"\n", | |
"\n", | |
"The MH solver does the following:\n", | |
"- generate a sample from the probabilistic model\n", | |
"- score how similar the generated abbreviation is to the data point \"cken\" using the Levenshtein distance\n", | |
"- decide to keep or reject the sample in proportion to the probability density\n", | |
"- choose a slightly different set of random variables and repeat\n", | |
"- at the end of the 1000 samples, we have a fairly low-fidelity approximation of the actual distribution\n", | |
"- an analogy is this pin-art toy on Amazon: [link](https://www.amazon.com/Glantop-Sculpture-Impression-Office-Desktop/dp/B0743D2WKH/ref=sr_1_6?ie=UTF8&qid=1542074924&sr=8-6&keywords=push+pin+toy)\n", | |
"- we are approximating the real surface of the distribution with 1000 \"pins\"\n", | |
" \n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 57, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[False, 1, 1, 1, 1, 1, 1]\n", | |
"c\n", | |
"[False, 1, 1, 1, 0, 1, 1]\n", | |
"ci\n", | |
"[False, 1, 1, 1, 1, 0, 0]\n", | |
"cng\n", | |
"[False, 0, 1, 1, 1, 0, 0]\n", | |
"cong\n", | |
"[False, 0, 1, 1, 0, 1, 0, 1]\n", | |
"chkn\n", | |
"[False, 0, 1, 1, 1, 1, 0, 1]\n", | |
"chn\n", | |
"[False, 0, 1, 1, 1, 1, 1, 1]\n", | |
"ch\n", | |
"[False, 0, 1, 1, 1]\n", | |
" ... \n", | |
"ck\n", | |
"[False, 1, 1, 1, 1, 0, 0]\n", | |
"cen\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 0, 0, 1]\n", | |
"cke\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 0, 0, 1]\n", | |
"cke\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 1, 0, 0]\n", | |
"cen\n", | |
"[False, 0, 1, 0, 0, 0, 0]\n", | |
"chcken\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 0, 0, 0]\n", | |
"cing\n", | |
"[False, 0, 1, 1, 0, 0, 0]\n", | |
"chken\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 0, 0, 0, 0]\n", | |
"cking\n", | |
"[False, 1, 1, 0, 0, 0, 0]\n", | |
"ccken\n", | |
"[False, 0, 1, 1, 0, 0, 0]\n", | |
"chken\n", | |
"[False, 1, 1, 1, 0, 0, 0]\n", | |
"cing\n", | |
"[False, 0, 1, 1, 0, 0, 0]\n", | |
"chken\n", | |
"[False, 1, 1, 1, 1, 0, 0]\n", | |
"cen\n", | |
"[False, 1, 1, 1, 0, 0, 0, 0, 1]\n", | |
"cklin\n", | |
"[False, 1, 1, 0, 0, 0, 0]\n", | |
"ccken\n", | |
"[False, 1, 1, 1, 0, 1, 0]\n", | |
"ckn\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n", | |
"[False, 1, 1, 1, 0, 0, 1]\n", | |
"cke\n", | |
"[False, 1, 1, 1, 1, 0, 0]\n", | |
" ... \n", | |
"cen\n", | |
"[False, 1, 1, 0, 1, 0, 0]\n", | |
"ccen\n", | |
"[False, 1, 1, 1, 0, 0, 0]\n", | |
"cing\n", | |
"[False, 0, 1, 1, 0, 0, 0]\n", | |
"chken\n", | |
"[False, 1, 1, 1, 0, 0, 1]\n", | |
"cke\n", | |
"[False, 1, 1, 1, 0]\n", | |
"ck\n", | |
"[False, 1, 0, 1, 0, 0, 0]\n", | |
"ciken\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"{'cooking': 0.003, 'chucking': 0.016, 'chicken': 0.981}" | |
] | |
}, | |
"execution_count": 57, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"cond_prog = Abbreviator(prior, 'cken') \n", | |
"'''\n", | |
"Here we are saying: given that we just saw the abbreviation 'cken',\n", | |
"re-apply the abbreviator model with the same uniform prior over words, but take the observed data point into account.\n", | |
"\n", | |
"Q: if we changed this to 'chn', how would you expect the results to differ?\n", | |
"'''\n", | |
"\n", | |
"\n", | |
"solver = MH(cond_prog, num_samples=1000)\n", | |
"result = solver.run()\n", | |
"\n", | |
"\n", | |
"\n", | |
"# uncomment the line below to check whether logmasses are -INF or not increasing, which would indicate MH is just randomly bouncing\n", | |
"# print([res.logmass for res in result]) \n", | |
"result.marginal('word')\n", | |
"'''\n", | |
"Did you have a guess for the Q above? \n", | |
"I hope it was something along the lines of:\n", | |
"\"the result would be less certain because it could be an abbreviation of cooking, chucking, or chicken\"\n", | |
"When reasoning about probabilistic programs it's helpful to ask yourself:\n", | |
"\"How would I as a human approach this problem; what's my intuition?\"\n", | |
"'''\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The following figure plots the total_logmass of the states being explored. This helps us diagnose whether the MH algorithm is converging. (We typically expect to see a rising curve that flattens out.)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 59, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[<matplotlib.lines.Line2D at 0x107462080>]" | |
] | |
}, | |
"execution_count": 59, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAD8CAYAAAB0IB+mAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFklJREFUeJzt3XuQXOV55/HvoxmNbhCuAnGTJYLABgoTe8KC13bhNRjiJbCJ7S1cqYK1yarYkHWSypYdSqk4mxS1SeE42cRZB+0tm4TEyYK5BNvhtg5xZYMdaZGJuBlxMwIZhAEJkNBopp/9o0/3tHHPjGZOj3r6nO+nqqv7XOac9/Tp+fXb73n77chMJEnVt6jfBZAkHRwGviTVhIEvSTVh4EtSTRj4klQTBr4k1YSBL0k1YeBLUk0Y+JJUE8P9LkCno48+OtesWdPvYkjSQNm8efNLmblypvUWVOCvWbOGTZs29bsYkjRQIuKZA1nPJh1JqgkDX5JqwsCXpJow8CWpJgx8SaoJA1+SasLAl6SaWFD98KW6efR7u/nqgzv6XYxSliwe4t+8Zw0rlgzz7Mt7uGnzdgbxp1M/dMYqzjzhsFLb2Dc+wR///dO8sW981n976qpDueSs40vtfyYGvtRHG+97ki8/8BwR/S7J3LRy/ZRjDuGiM1bxZ/c/ww1/9+TAHU8mbNv5Ov/lZ95dajubn3mF//S1RwFm/RxcctbxBr5UZfvGG/zoyhXc+8vn97soc/L0S29w/uf+tl2jfX3fOEcfMsKmX72wzyWbnX/1h3/P6/smSm/njWIbd/z795b+tDAfbMOX+mj/RIPFQ4P7b7h8ZAiAPWPNoNs7NsHSxUP9LNKcLFs8xN6x2TfDvNWeYhsL9TkY3FeaVAGDHvjLisDfWwT+nrGJ9pvAIFk+MtR+0yqj9Tws1OdgcF9pUgWMN5LhoQFr8O6wfKTZKtwKyz37J1g2MngtxctGhtphXcYeA1/SVMbGB7uGP7QoGBlexN79rSadcZYv0OaM6fSshl88D8sWaOCXeiuOiI8Bvw68AzgnMzd1LLsWuAqYAD6VmXeW2ZdUReONZOniwQ18aIblQ8/v4tYHnmPHrjc57dhD+12kWVs+MszuN/dz6wPPldrOlmdfbb4JLtA38bKfvbYCPw3c0DkzIk4HLgfOAI4H7omIUzOz/FuoVCHjEw0WLx28JpBOxx22jG88/hLfePwlAM4/bcbf4VhwjjtsKXvGJvjFv9xSelsnHrGMWKD9Uku90jLzEaDbwV0GfCkz9wFPRcQ24BzgH8rsT6qasYlkeNHCrA0eqJuuPo8XX9vXnj7piGV9LM3crH//yVx85ioaPfi+2NGHjJTfyDyZr6rFCcD9HdPbi3mSOoxPNBgZXpi1wQO1Yskwa5cM9qeUiOBtR63odzHm3YxnKSLuAVZ1WbQhM2+b6s+6zOv63hkR64H1AKtXr56pOFKl7J9oDHwNX4NjxsDPzAvmsN3twEkd0ycCz0+x/Y3ARoDR0dHBG4BDKmH/RA50Lx0Nlvl6pd0OXB4RSyJiLbAO+NY87UsaWM0vXg12k44GR9lumT8F/AGwEvhKRGzJzIsy86GI+CvgYWAcuGYh99DJTB574TXGxhv9LsqcBcFpqw5lZNjaYhkH+7Xw5v6Jgf7ilQZL2V46twC3TLHsOuC6Mts/WO5++AXW/+nmfhejtJ87/0f59MVv73cxBtq9j7zIz/7JpplX7KFDly4+qPtTfQ32pfUeeaHoUvb5f/1ODls2mP98n77pwR/oGqe5eeG1NwH4nY+9k8OXz/9rIQJ+fM2R874fCQx8gPYoeReefuzA1rYOW764J2OB1F3rObzg9GMH9s1fmooNvsDesWZ77fIBHPSpZfnIUHscD83dQh/tUCrDwAf27B9nZHgRQ4sG9+LZ8sXD7bG4NXd79k+weCjsKqlK8lVNs1Y36DW6Xg3vWnd7xy
ZYNoCjPUoHYnDbMHrolT37B/6ffPnIENtfGef1Ofx4sibt3rt/oJv2pOnU/pX96Pd289fffp6Tjhy8AZ86Hbp0mCd2vsGZn3UU6rJOOeaQfhdBmhe1D/wdu5rd8K44d01/C1LSNR84hXXHDN445AvR2asP73cRpHlR+8BvDen242sHuy/0245awb99/8n9LoakBaz2F20b2Uz8we2fI0kHpvaBX+Q9ixboL9RIUq/UPvDbNXzzXlLF1T7wHYBfUl0Y+EUN3yYdSVVn4BdVfPNeUtUZ+MW9NXxJVVf7wPeiraS6qH3gT3bL7G85JGm+1T7wWzV8v3olqepqH/gtNulIqrraB37DbpmSaqL2gd/ultnfYkjSvDPwHUtHUk3UPvDtlimpLmof+O0+Oga+pIoz8Ns1fBNfUrWVCvyI+FhEPBQRjYgY7Zi/JiL2RsSW4vZH5Ys6P7xoK6kuyv7E4Vbgp4Ebuix7IjPPLrn9edfwoq2kmigV+Jn5CAx2c0jiRVtJ9TCfbfhrI+KBiLgvIt43j/spxeGRJdXFjDX8iLgHWNVl0YbMvG2KP9sBrM7M70fEu4FbI+KMzNzdZfvrgfUAq1evPvCS90j7oq2t+JIqbsbAz8wLZrvRzNwH7Cseb46IJ4BTgU1d1t0IbAQYHR096L84ODke/sHesyQdXPPSpBMRKyNiqHh8MrAOeHI+9lVWo2G3TEn1ULZb5k9FxHbgPOArEXFnsej9wIMR8W3gJuDqzHy5XFHnh4MjS6qLsr10bgFu6TL/ZuDmMts+WOyWKaku/Kat37ySVBO1D/wWL9pKqrraB37DsXQk1UTtA98fMZdUF7UP/Ea7Cd/El1RttQ98x9KRVBcGvmPpSKoJA9+xdCTVhIHvRVtJNVH7wG9ftLVNR1LF1T7wWxdtreFLqrraB741fEl1UfvAbzfiS1LF1T7wG2lzjqR6qH3gJ2lzjqRaMPCt4UuqidoHfiP90pWkeqh94DebdPpdCkmafwZ+Oo6OpHow8DNt0pFUC7UPfLtlSqqL2gd+s0nHxJdUfQa+F20l1YSBn9iCL6kWDPxMFtmIL6kGhvtdgINh+yt7uPeRF9u/btXpkR2vWcOXVAulAj8irgd+EhgDngA+kZmvFsuuBa4CJoBPZeadJcs6a3vHJvj6Yy/yv/7v03zzqZenXO+M43/kIJZKkvqjbA3/buDazByPiN8GrgU+ExGnA5cDZwDHA/dExKmZOVFyf7Ny+7ef4zM3/xMAHz9nNZ++6LSu6x2ytBYfdCTVXKmky8y7OibvBz5aPL4M+FJm7gOeiohtwDnAP5TZ32ztHWu+v9zyc+/hzBMOY/FQ7S9ZSKqxXibgJ4GvFY9PAJ7tWLa9mHdQtVrs1xy1wrCXVHsz1vAj4h5gVZdFGzLztmKdDcA4cGPrz7qs3/WnpSJiPbAeYPXq1QdQ5AOX7Z8v7OlmJWkgzRj4mXnBdMsj4krgEuCDOdkNZjtwUsdqJwLPT7H9jcBGgNHR0Z7+3mBrY46VI0klm3Qi4mLgM8ClmbmnY9HtwOURsSQi1gLrgG+V2ddctN5/wtYcSSrdS+cLwBLg7mI8mvsz8+rMfCgi/gp4mGZTzzUHu4dOJ+v3klS+l84p0yy7DriuzPbLmmzDN/IlqdKNHVm04hv3klT1wLeXjiS1VTvwi3t76UhSxQO/0eqlY95LUrUDv8vgmJJUW5UO/BZr+JJU8cBvf/HKNnxJqnrgN++t4UtS1QO/uDfvJanqge83bSWprdqB7zdtJamt2oFvG74ktVU78It7m3QkqeKB7zevJGlSpQM/sTlHklqqHfjpBVtJaql24JO230tSodqBbw1fktqqHfjYhi9JLdUO/HTgNElqqXbgY5uOJLVUOvDNe0maVOnAT2CRjfiSBFQ98DO9aCtJhYoHvk06ktRS7cDHgdMkqaVU4EfE9RHxaEQ8GBG3RMThxfw1EbE3IrYUtz/qTXFnxxq+JE0qW8O/GzgzM88Cvg
Nc27Hsicw8u7hdXXI/c2K3TEmaVCrwM/OuzBwvJu8HTixfpN6xhi9Jk3rZhv9J4Gsd02sj4oGIuC8i3tfD/RywZi8dI1+SAIZnWiEi7gFWdVm0ITNvK9bZAIwDNxbLdgCrM/P7EfFu4NaIOCMzd3fZ/npgPcDq1avndhRTcCwdSZo0Y+Bn5gXTLY+IK4FLgA9mNn9iKjP3AfuKx5sj4gngVGBTl+1vBDYCjI6O9vQnqmzSkaRJZXvpXAx8Brg0M/d0zF8ZEUPF45OBdcCTZfY1F46HL0mTZqzhz+ALwBLg7iJY7y965Lwf+I2IGAcmgKsz8+WS+5o1a/iSNKlU4GfmKVPMvxm4ucy2e8E2fEmaVO1v2iZYx5ekpkoHPjh4miS1VDrwbcOXpEnVD3wTX5KAqgc+6W/aSlKh2oFvDV+S2qod+NiGL0kt1Q789AdQJKml2oFPT4fmkaSBVunAxzZ8SWqrdOA7tIIkTap24KfdMiWppdqBjzV8SWqpduAnLDLxJQmoeuBjP3xJaql24Dt6miS1VSbwn/n+G1z4+fv4wOf+ln98uvnjWtbwJWlSZQL/8Rde5/EXX+epl95gy3dfbc70m7aS1FaZwO/8Tu2b+yeKeWkNX5IK1Qn8nIz8vUXgNxp2y5SkluoEfsfjN/c3inl+8UqSWqoT+F1q+I6HL0mTKhT4k4/3tdvwJUkt1Qn8jsc/WMO3ii9JUKHAbxRV/KWLF/HAd1/lv/7dk2AvHUlqq0zgt5p0Pvj2Y9kzNs71dz1mG74kdSgd+BHxmxHxYERsiYi7IuL4Yn5ExO9HxLZi+bvKF3dqrSadX7pwHZ9871rGxhtMZBr4klToRQ3/+sw8KzPPBu4Afq2Y/xPAuuK2HvhiD/Y1pVYvnYhg+cgQAHvGJuyWKUmF0oGfmbs7JlcwWdm+DPiTbLofODwijiu7v6nL0bwPYNnIMAB7xsat4UtSYbgXG4mI64ArgF3AB4rZJwDPdqy2vZi3oxf7fKvWD5ZHBMsXFzX8fRMcurQnhyhJA++AavgRcU9EbO1yuwwgMzdk5knAjcDPt/6sy6Z+qGt8RKyPiE0RsWnnzp1zPY52DX9R0G7SeWNs3Ku2klQ4oOpvZl5wgNv7c+ArwGdp1uhP6lh2IvB8l21vBDYCjI6Ozvm7Uo12k06wtAj8V/fs57jDls11k5JUKb3opbOuY/JS4NHi8e3AFUVvnXOBXZk5L8050HnRFtYctYLlI0PsG29wyjGHzNcuJWmg9KKB+7ci4jSgATwDXF3M/yrwYWAbsAf4RA/2NaXOjwZrj17B1l+/CLBFR5JaSgd+Zn5kivkJXFN2+wdekObdokXxA/eSpKbKfNO2NbSCMS9J3VUm8FtNOjbhSFJ31Qn8drdME1+SuqlO4GOTjiRNpzKB32i36fS1GJK0YFUm8GlftDXxJambygR+q4Jvb0xJ6q4ygd9oTA6eJkn6YZUJfJvwJWl61Qn81uBpJr4kdVWdwC/ubdKRpO6qE/gdo2VKkn5YhQK/eW/eS1J31Qn8olHHoRUkqbvKBH7Di7aSNK3KBH52/MShJOmHVSfw8aKtJE2nOoFvk44kTatCge/gaZI0nQoFfvPeGr4kdVedwC/u7ZYpSd1VJ/D94pUkTasygd9waAVJmlZlAt/B0yRpepUJfDKt3UvSNCoT+I20/V6SplMq8CPiNyPiwYjYEhF3RcTxxfzzI2JXMX9LRPxab4o7tSRtzpGkaZSt4V+fmWdl5tnAHUBnsH8jM88ubr9Rcj8zSmv4kjStUoGfmbs7Jlcwee30oEvsgy9J0xkuu4GIuA64AtgFfKBj0XkR8W3geeA/ZOZDZfc1lb1jEzz8/G7GJhrztQtJGngz1vAj4p6I2NrldhlAZm7IzJOAG4GfL/7s/wFvy8x3An8A3DrN9tdHxKaI2LRz5845HcRjL7zGfd+Z299KUl3MGPiZeUFmntnldttbVv1z4CPF3+zOzNeLx18FFkfE0VNsf2
Nmjmbm6MqVK+d0EIuHbMqRpJmU7aWzrmPyUuDRYv6qKLrMRMQ5xX6+X2Zf0xkZqkzvUkmaN2Xb8H8rIk4DGsAzwNXF/I8C/y4ixoG9wOXZGr94HowMG/iSNJNSgZ+ZH5li/heAL5TZ9mwstoYvSTOqRFIa+JI0s0okpU06kjSzSiSlF20laWaVSEq7ZUrSzCoR+MPW8CVpRialJNWEgS9JNWHgS1JNlB4tc6H41X/5jn4XQZIWtMoE/s++7+R+F0GSFjSbdCSpJgx8SaoJA1+SasLAl6SaMPAlqSYMfEmqCQNfkmrCwJekmoh5/KnZWYuInTR/G3eujgZe6lFxBkHdjhc85rrwmGfnbZm5cqaVFlTglxURmzJztN/lOFjqdrzgMdeFxzw/bNKRpJow8CWpJqoW+Bv7XYCDrG7HCx5zXXjM86BSbfiSpKlVrYYvSZpCJQI/Ii6OiMciYltE/Eq/y9MrEXFSRHw9Ih6JiIci4heK+UdGxN0R8Xhxf0QxPyLi94vn4cGIeFd/j2BuImIoIh6IiDuK6bUR8c3ieP8yIkaK+UuK6W3F8jX9LHcZEXF4RNwUEY8W5/u8GpznXype11sj4i8iYmnVznVE/I+IeDEitnbMm/V5jYgri/Ufj4gr51qegQ/8iBgC/hD4CeB04OMRcXp/S9Uz48AvZ+Y7gHOBa4pj+xXg3sxcB9xbTEPzOVhX3NYDXzz4Re6JXwAe6Zj+beB3i+N9BbiqmH8V8EpmngL8brHeoPrPwN9k5tuBd9I8/sqe54g4AfgUMJqZZwJDwOVU71z/MXDxW+bN6rxGxJHAZ4F/BpwDfLb1JjFrmTnQN+A84M6O6WuBa/tdrnk61tuAC4HHgOOKeccBjxWPbwA+3rF+e71BuQEnFv8E/wK4AwiaX0YZfuv5Bu4EziseDxfrRb+PYQ7H/CPAU28te8XP8wnAs8CRxbm7A7ioiucaWANsnet5BT4O3NAx/wfWm81t4Gv4TL5wWrYX8yql+Aj7Y8A3gWMzcwdAcX9MsVoVnovfAz4NNIrpo4BXM3O8mO48pvbxFst3FesPmpOBncD/LJqy/ltErKDC5zkznwM+B3wX2EHz3G2m+ucaZn9ee3a+qxD40WVepboeRcQhwM3AL2bm7ulW7TJvYJ6LiLgEeDEzN3fO7rJqHsCyQTIMvAv4Ymb+GPAGkx/zuxn44y6aJC4D1gLHAytoNmm8VdXO9XSmOsaeHXsVAn87cFLH9InA830qS89FxGKaYX9jZn65mP1CRBxXLD8OeLGYP+jPxT8HLo2Ip4Ev0WzW+T3g8IgYLtbpPKb28RbLDwNePpgF7pHtwPbM/GYxfRPNN4CqnmeAC4CnMnNnZu4Hvgy8h+qfa5j9ee3Z+a5C4P8jsK64uj9C88LP7X0uU09ERAD/HXgkMz/fseh2oHWl/kqabfut+VcUV/vPBXa1PjoOgsy8NjNPzMw1NM/j/8nMnwG+Dny0WO2tx9t6Hj5arD9wtb7M/B7wbEScVsz6IPAwFT3Phe8C50bE8uJ13jrmSp/rwmzP653AhyLiiOKT0YeKebPX7wsaPboo8mHgO8ATwIZ+l6eHx/Vemh/dHgS2FLcP02y7vBd4vLg/slg/aPZYegL4J5o9IPp+HHM89vOBO4rHJwPfArYB/xtYUsxfWkxvK5af3O9ylzjes4FNxbm+FTii6ucZ+I/Ao8BW4E+BJVU718Bf0LxGsZ9mTf2quZxX4JPFsW8DPjHX8vhNW0mqiSo06UiSDoCBL0k1YeBLUk0Y+JJUEwa+JNWEgS9JNWHgS1JNGPiSVBP/HxEoBxSaek2tAAAAAElFTkSuQmCC\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"logmasses = list(result.get_total_logmasses()) \n", | |
"# logmasses = log of the probability at iteration x (x-axis); the highest possible log probability is 0\n", | |
"# this illustrates how \"likely is this sampled data point, assuming my model is correct\"\n", | |
"plt.plot(logmasses)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Other practical notes:\n", | |
"- MH results depend on initialization (for a given number of iterations)\n", | |
" => combine results from multiple initializations, or\n", | |
" => set a burn-in period to reduce the influence of the initial values\n", | |
" \n", | |
"- MH samples are not i.i.d. (the state at t+1 depends on the state at t)\n", | |
" => set a skip variable to reduce auto-correlation (a skip of s means we keep every s-th sampled point)\n", | |
" => verify with a correlogram\n", | |
" \n", | |
"- MH may not be suitable for bi-modal / multi-modal distributions.\n", | |
" => intuitively, once MH finds a high-density area there is little incentive for it to bounce through a low-density area to reach another peak; this is similar to getting stuck in a local minimum in gradient descent in deep learning\n", | |
" => Hamiltonian Monte Carlo might be a better choice.\n", | |
"\n", | |
"- MH may be slow to converge for high-dimensional posteriors. Consider Gibbs sampling if the posterior is not multimodal." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python classifier2", | |
"language": "python", | |
"name": "classifers2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
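The Metropolis-Hastings loop summarized in the opening markdown cell can be sketched in plain Python. This is a minimal illustration under assumptions of my own: a toy unnormalized log-density over the integers and a symmetric random-walk proposal. The names `log_density`, `acceptance_prob`, and `mh_sample` are illustrative, not part of the notebook's `particle` framework.

```python
import math
import random

def log_density(x):
    # Unnormalized log-density of a toy discrete target peaked at 0.
    return -abs(x)

def acceptance_prob(log_p_new, log_p_old):
    # MH acceptance rule for a symmetric proposal:
    # accept with probability min(1, p_new / p_old), computed in log space.
    return min(1.0, math.exp(log_p_new - log_p_old))

def mh_sample(num_samples, init=0, seed=0):
    rng = random.Random(seed)
    x = init
    samples = []
    for _ in range(num_samples):
        proposal = x + rng.choice([-1, 1])  # symmetric random-walk proposal
        if rng.random() < acceptance_prob(log_density(proposal), log_density(x)):
            x = proposal  # accept: move to the proposed state
        samples.append(x)  # on rejection we keep (and re-count) the current state
    return samples
```

An uphill move (higher density) is always accepted; a downhill move is accepted with probability equal to the density ratio, which is exactly the correction described under "acceptance criteria" above.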
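The softened/relaxed likelihood idea commented inside `Abbreviator.run` can be illustrated on its own. The sketch below uses a small pure-Python edit-distance function as a stand-in for `Levenshtein.distance`, plus a hypothetical helper `soft_weights` (my own, not part of the notebook's framework) that turns the `-10 * distance` log-masses into normalized weights, showing why near misses still receive probability mass where a Dirac would give them none.

```python
import math

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance
    # (a pure-Python stand-in for Levenshtein.distance).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def soft_weights(candidates, observed, scale=10.0):
    # Softened likelihood: assign each candidate a log-mass of
    # -scale * distance, then exponentiate and normalize into weights.
    logmass = {c: -scale * edit_distance(c, observed) for c in candidates}
    z = sum(math.exp(v) for v in logmass.values())
    return {c: math.exp(v) / z for c, v in logmass.items()}
```

For the observed abbreviation 'cken', 'chicken' (distance 3) outweighs 'cooking' (distance 4), consistent with the marginal the notebook reports.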