rasmusbergpalm/bayes-nb.ipynb

## bayes-nb.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bayesian Naive bayes\n",
    "\n",
    "The data set $D$ consists of $n$ pairs of $x$ and $y$\n",
    "\n",
    "$$ D = ((x^{(1)}, y_1), ..., (x^{(n)}, y_n)) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$x^{(i)}$ is the counts of words in document $i$. There are $d$ different words in the data set.\n",
    "\n",
    "$$ x^{(i)} = (x^{(i)}_1, ..., x^{(i)}_d) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$y_i$ is the sender of document $i$. There are $m$ different senders in the data set.\n",
    "\n",
    "$$ y_i \\in \\{1,...,m\\} $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The distribution for the sender $y$ is a categorical distribution, parameterized by $\\theta$ probabilities of $y$ being each of the $m$ different senders\n",
    "\n",
    "$$ \n",
    "p(y|\\theta) =\n",
    "Cat(y|\\theta) =\n",
    "\\prod_{k=1}^m \\theta_k^{[y=k]}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The distribution for the counts of words $x$ in a document is the multinomial distribution parameterized by $\\phi_{yj}$, the probability of word $j$ appearing in documents from sender $y$.\n",
    "\n",
    "$$ \n",
    "p(x|y,\\phi) =\n",
    "Mult(x|y,\\phi) =\n",
    "\\frac\n",
    "{\\left(\\sum_{j=1}^V x_j\\right)!}\n",
    "{\\prod_{j=1}^V x_j!}\n",
    "\\prod_{j=1}^d \\phi_{yj}^{x_j} \n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reasons that are not at all obvious now, but will come in handy later, we'll re-write this as\n",
    "\n",
    "$$ p(x|y,\\phi) =\n",
    "\\frac\n",
    "{\\left(\\sum_{j=1}^V x_j\\right)!}\n",
    "{\\prod_{j=1}^V x_j!}\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{x_j[k=y]} \n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the priors on $\\theta$ and $\\phi$ we'll use dirichlet distributions as they're the conjugate priors for both the  categorical and the multinomial distribution. The priors are parameterized by the hyper parameters $\\alpha$ and $\\beta$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \n",
    "p(\\theta) = \n",
    "Dir(\\theta|\\alpha) \\propto\n",
    "\\prod_{k=1}^m \\theta_k^{\\alpha_k-1} \n",
    "$$\n",
    "\n",
    "$$ \n",
    "p(\\phi) =\n",
    "\\prod_{k=1}^m p(\\phi_k) =\n",
    "\\prod_{k=1}^m Dir(\\phi_k|\\beta_k) \\propto\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{\\beta_{kj}-1} \n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll find the posterior of the parameters first\n",
    "\n",
    "$$\n",
    "p(\\theta, \\phi | D) \\propto p(\\theta)p(\\phi)p(D|\\theta, \\phi)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assuming the data is i.i.d\n",
    "\n",
    "$$\n",
    "p(\\theta, \\phi | D) \\propto p(\\theta)p(\\phi)\\prod_{i=1}^n p(x^{(i)}|y_i, \\phi)p(y_i|\\theta)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Re-arranging\n",
    "\n",
    "$$\n",
    "p(\\theta, \\phi | D) \\propto \n",
    "\\left[\n",
    "p(\\theta)\n",
    "\\prod_{i=1}^n \n",
    "p(y_i|\\theta)\n",
    "\\right]\n",
    "\\left[\n",
    "p(\\phi)\n",
    "\\prod_{i=1}^n \n",
    "p(x^{(i)}|y_i, \\phi)\n",
    "\\right]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll handle the leftmost bracket first. Inserting the definitions.\n",
    "\n",
    "\n",
    "$$\n",
    "p(\\theta, \\phi | D) \\propto \n",
    "\\left[\n",
    "\\prod_{k=1}^m \\theta_k^{\\alpha_k-1}\n",
    "\\prod_{i=1}^n \n",
    "\\prod_{k=1}^m \\theta_k^{[y=k]}\n",
    "\\right]\n",
    "\\left[\n",
    "p(\\phi)\n",
    "\\prod_{i=1}^n \n",
    "p(x^{(i)}|y_i, \\phi)\n",
    "\\right]\n",
    "$$\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Introducing $c_k = \\sum_{i=1}^n[y_i=k]$, the count of sender $k$ in the data set\n",
    "\n",
    "$$\n",
    "p(\\theta, \\phi | D) \\propto \n",
    "\\left[\n",
    "\\prod_{k=1}^m \\theta_k^{\\alpha_k-1}\n",
    "\\prod_{k=1}^m \\theta_k^{c_k}\n",
    "\\right]\n",
    "\\left[\n",
    "p(\\phi)\n",
    "\\prod_{i=1}^n \n",
    "p(x^{(i)}|y_i, \\phi)\n",
    "\\right]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Joining the products\n",
    "\n",
    "$$\n",
    "p(\\theta, \\phi | D) \\propto \n",
    "\\left[\n",
    "\\prod_{k=1}^m \n",
    "\\theta_k^{\\alpha_k + c_k -1}\n",
    "\\right]\n",
    "\\left[\n",
    "p(\\phi)\n",
    "\\prod_{i=1}^n \n",
    "p(x^{(i)}|y_i, \\phi)\n",
    "\\right]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now the right hand bracket. Inserting the definitions\n",
    "\n",
    "$$\n",
    "p(\\theta, \\phi | D) \\propto \n",
    "\\left[\n",
    "\\prod_{k=1}^m \n",
    "\\theta_k^{\\alpha_k + c_k -1}\n",
    "\\right]\n",
    "\\left[\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{\\beta_{kj}-1} \n",
    "\\prod_{i=1}^n\n",
    "\\frac\n",
    "{\\left(\\sum_{j=1}^V x^{(i)}_j\\right)!}\n",
    "{\\prod_{j=1}^V x^{(i)}_j!}\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{x^{(i)}_j[k=y_i]} \n",
    "\\right]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dropping the constant term\n",
    "\n",
    "$$\n",
    "p(\\theta, \\phi | D) \\propto \n",
    "\\left[\n",
    "\\prod_{k=1}^m \n",
    "\\theta_k^{\\alpha_k + c_k -1}\n",
    "\\right]\n",
    "\\left[\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{\\beta_{kj}-1} \n",
    "\\prod_{i=1}^n\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{x^{(i)}_j[k=y_i]} \n",
    "\\right]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Introducing $w_{kj} = \\sum_{i=1}^n x^{(i)}_j[k=y_i]$, the sum of occurences of word $j$ in documents from sender $k$\n",
    "\n",
    "$$\n",
    "p(\\theta, \\phi | D) \\propto \n",
    "\\left[\n",
    "\\prod_{k=1}^m \n",
    "\\theta_k^{\\alpha_k + c_k -1}\n",
    "\\right]\n",
    "\\left[\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{\\beta_{kj}-1} \n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{w_{kj}} \n",
    "\\right]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Joining the products\n",
    "\n",
    "$$\n",
    "p(\\theta, \\phi | D) \\propto \n",
    "\\left[\n",
    "\\prod_{k=1}^m \n",
    "\\theta_k^{\\alpha_k + c_k -1}\n",
    "\\right]\n",
    "\\left[\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{\\beta_{kj} + w_{kj} - 1} \n",
    "\\right]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We're interested in the posterior predictive distribution, i.e. given a new input $\\tilde{x}$ what is the distribution of senders $\\tilde{y}$, conditioned on the data set $D$.\n",
    "\n",
    "$$ p(\\tilde{y} | \\tilde{x}, D) \\propto \\int \\int p(\\tilde{x}, \\tilde{y} | \\theta, \\phi, D) p(\\theta, \\phi | D) d\\theta d\\phi $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Re-arranging\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "\\int \n",
    "p(\\tilde{y} | \\theta)\n",
    "p(\\theta | D) \n",
    "d\\theta \n",
    "\\int \n",
    "p(\\tilde{x} | \\tilde{y}, \\phi)\n",
    "p(\\phi | D) \n",
    "d\\phi\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll handle the integral over $\\theta$ first. Inserting the definitions\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "\\int \n",
    "\\prod_{k=1}^m \\theta_k^{[\\tilde{y}=k]}\n",
    "\\prod_{k=1}^m \n",
    "\\theta_k^{\\alpha_k + c_k -1}\n",
    "d\\theta \n",
    "\\int \n",
    "p(\\tilde{x} | \\tilde{y}, \\phi)\n",
    "p(\\phi | D) \n",
    "d\\phi\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Joining the products\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "\\int \n",
    "\\prod_{k=1}^m \n",
    "\\theta_k^{\\alpha_k + c_k + [\\tilde{y}=k] - 1}\n",
    "d\\theta \n",
    "\\int \n",
    "p(\\tilde{x} | \\tilde{y}, \\phi)\n",
    "p(\\phi | D) \n",
    "d\\phi\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is an integral over the un-normalized $Dir(\\theta|\\alpha_k + c_k + [\\tilde{y}=k])$ distribution.\n",
    "\n",
    "Using $\\int \\frac{1}{Z}p(x)dx = 1 \\implies \\int p(x) dx = Z$\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "\\frac\n",
    "{\\prod_{k=1}^m\\Gamma(\\alpha_k + c_k + [\\tilde{y}=k])}\n",
    "{\\Gamma \\left( \\sum_{k=1}^m \\alpha_k + c_k + [\\tilde{y}=k] \\right)}\n",
    "\\int \n",
    "p(\\tilde{x} | \\tilde{y}, \\phi)\n",
    "p(\\phi | D) \n",
    "d\\phi\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using $\\Gamma(x+1) = x\\Gamma(x)$\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "\\frac\n",
    "{(\\alpha_\\tilde{y} + c_\\tilde{y})}\n",
    "{(\\sum_{k=1}^m \\alpha_k + c_k)}\n",
    "\\frac\n",
    "{\\prod_{k=1}^m\\Gamma(\\alpha_k + c_k)}\n",
    "{\\Gamma \\left( \\sum_{k=1}^m \\alpha_k + c_k \\right)}\n",
    "\\int \n",
    "p(\\tilde{x} | \\tilde{y}, \\phi)\n",
    "p(\\phi | D) \n",
    "d\\phi\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dropping the constant terms\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\int \n",
    "p(\\tilde{x} | \\tilde{y}, \\phi)\n",
    "p(\\phi | D) \n",
    "d\\phi\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now for the integral over $\\phi$. Inserting the definitions\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\int \n",
    "\\left[\n",
    "\\frac\n",
    "{\\left(\\sum_{j=1}^V \\tilde{x}_j\\right)!}\n",
    "{\\prod_{j=1}^V \\tilde{x}_j!}\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{\\tilde{x}_j[k=\\tilde{y}]} \n",
    "\\right]\n",
    "\\left[\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{\\beta_{kj} + w_{kj} - 1} \n",
    "\\right]\n",
    "d\\phi\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dropping the constant term\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\int \n",
    "\\left[\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{\\tilde{x}_j[k=\\tilde{y}]} \n",
    "\\right]\n",
    "\\left[\n",
    "\\prod_{k=1}^m\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{\\beta_{kj} + w_{kj} - 1} \n",
    "\\right]\n",
    "d\\phi\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Joining the products, and moving the product over $k$ outside the integrals\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\prod_{k=1}^m\n",
    "\\int \n",
    "\\left[\n",
    "\\prod_{j=1}^d \n",
    "\\phi_{kj}^{\\beta_{kj} + w_{kj} + \\tilde{x}_j[k=\\tilde{y}] - 1} \n",
    "\\right]\n",
    "d\\phi\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is an integral over the un-normalized $Dir(\\theta_k|\\beta_{k} + w_{k} + \\tilde{x}_j[k=\\tilde{y}])$ distribution\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\prod_{k=1}^m\n",
    "\\frac\n",
    "{\\prod_{j=1}^d\\Gamma \\left( \\beta_{kj} + w_{kj} + \\tilde{x}_j[k=\\tilde{y}] \\right)}\n",
    "{\\Gamma \\left( \\sum_{j=1}^d \\beta_{kj} + w_{kj} + \\tilde{x}_j[k=\\tilde{y}] \\right)}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Splitting the sum in the denominator\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\prod_{k=1}^m\n",
    "\\frac\n",
    "{\\prod_{j=1}^d\\Gamma \\left( \\beta_{kj} + w_{kj} + \\tilde{x}_j[k=\\tilde{y}] \\right)}\n",
    "{\\Gamma \\left( \\sum_{j=1}^d (\\beta_{kj} + w_{kj}) + \\sum_{j=1}^d \\tilde{x}_j[k=\\tilde{y}] \\right)}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using $\\Gamma(x+n) = x^{(n)}\\Gamma(x)$, where $x^{(n)} = x(x+1)...x(x+n-1)$ denotes the rising factorial\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\prod_{k=1}^m\n",
    "\\frac\n",
    "{\\prod_{j=1}^d (\\beta_{kj} + w_{kj})^{(\\tilde{x}_j[k=\\tilde{y}])}}\n",
    "{\\left(\\sum_{j=1}^d \\beta_{kj} + w_{kj}\\right)^{(\\sum_{j=1}^d \\tilde{x}_j[k=\\tilde{y}])}}\n",
    "\\frac\n",
    "{\\prod_{j=1}^d \\Gamma \\left( \\beta_{kj} + w_{kj} \\right)}\n",
    "{\\Gamma \\left( \\sum_{j=1}^d \\beta_{kj} + w_{kj} \\right)}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dropping the constant terms\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\prod_{k=1}^m\n",
    "\\frac\n",
    "{\\prod_{j=1}^d(\\beta_{kj} + w_{kj})^{(\\tilde{x}_j[k=\\tilde{y}])}}\n",
    "{\\left(\\sum_{j=1}^d \\beta_{kj} + w_{kj}\\right)^{(\\sum_{j=1}^d \\tilde{x}_j[k=\\tilde{y}])}}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using $x^{(0)} = 1$ by definition\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\frac\n",
    "{\\prod_{j=1}^d(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j})^{(\\tilde{x}_j)}}\n",
    "{\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)^{(\\sum_{j=1}^d\\tilde{x}_j)}}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using $x^{(n)} = \\frac{\\Gamma(x+n)}{\\Gamma(x)}$\n",
    "\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\frac\n",
    "{\n",
    "  \\prod_{j=1}^d\n",
    "  \\frac{\\Gamma((\\beta_{\\tilde{y}j} + w_{\\tilde{y}j}) + \\tilde{x}_j)}{\\Gamma((\\beta_{\\tilde{y}j} + w_{\\tilde{y}j}))}\n",
    "}\n",
    "{\n",
    "  \\frac{\\Gamma\\left(\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right) + \\sum_{j=1}^d\\tilde{x}_j\\right)}{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)}\n",
    "}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Simplifying \n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\left[\n",
    "\\prod_{j=1}^d\n",
    "\\frac{\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j)}{\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j})}\n",
    "\\right]\n",
    "\\frac{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)}{\\Gamma(\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right) + \\sum_{j=1}^d\\tilde{x}_j)}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Simplifying \n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\left[\n",
    "\\prod_{j=1}^d\n",
    "\\frac{\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j)}{\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j})}\n",
    "\\right]\n",
    "\\frac{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)}\n",
    "{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j\\right)}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Simplifying \n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\frac\n",
    "{\\prod_{j=1}^d\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j)}\n",
    "{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j\\right)}\n",
    "\\frac\n",
    "{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)}\n",
    "{\\prod_{j=1}^d\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j})}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Simplifying, where $B$ is the multivariate beta function over $j$\n",
    "\n",
    "$$\n",
    "p(\\tilde{y} | \\tilde{x}, D) \\propto \n",
    "(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
    "\\frac\n",
    "{B\\left( \\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j \\right)}\n",
    "{B\\left(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Bayesian Naive bayes\n",
	"\n",
	"The data set $D$ consists of $n$ pairs of $x$ and $y$\n",
	"\n",
	"$$ D = ((x^{(1)}, y_1), ..., (x^{(n)}, y_n)) $$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"$x^{(i)}$ is the counts of words in document $i$. There are $d$ different words in the data set.\n",
	"\n",
	"$$ x^{(i)} = (x^{(i)}_1, ..., x^{(i)}_d) $$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"$y_i$ is the sender of document $i$. There are $m$ different senders in the data set.\n",
	"\n",
	"$$ y_i \\in \\{1,...,m\\} $$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"The distribution for the sender $y$ is a categorical distribution, parameterized by $\\theta$ probabilities of $y$ being each of the $m$ different senders\n",
	"\n",
	"$$ \n",
	"p(y\|\\theta) =\n",
	"Cat(y\|\\theta) =\n",
	"\\prod_{k=1}^m \\theta_k^{[y=k]}\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"The distribution for the counts of words $x$ in a document is the multinomial distribution parameterized by $\\phi_{yj}$, the probability of word $j$ appearing in documents from sender $y$.\n",
	"\n",
	"$$ \n",
	"p(x\|y,\\phi) =\n",
	"Mult(x\|y,\\phi) =\n",
	"\\frac\n",
	"{\\left(\\sum_{j=1}^V x_j\\right)!}\n",
	"{\\prod_{j=1}^V x_j!}\n",
	"\\prod_{j=1}^d \\phi_{yj}^{x_j} \n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"For reasons that are not at all obvious now, but will come in handy later, we'll re-write this as\n",
	"\n",
	"$$ p(x\|y,\\phi) =\n",
	"\\frac\n",
	"{\\left(\\sum_{j=1}^V x_j\\right)!}\n",
	"{\\prod_{j=1}^V x_j!}\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{x_j[k=y]} \n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"For the priors on $\\theta$ and $\\phi$ we'll use dirichlet distributions as they're the conjugate priors for both the categorical and the multinomial distribution. The priors are parameterized by the hyper parameters $\\alpha$ and $\\beta$."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"$$ \n",
	"p(\\theta) = \n",
	"Dir(\\theta\|\\alpha) \\propto\n",
	"\\prod_{k=1}^m \\theta_k^{\\alpha_k-1} \n",
	"$$\n",
	"\n",
	"$$ \n",
	"p(\\phi) =\n",
	"\\prod_{k=1}^m p(\\phi_k) =\n",
	"\\prod_{k=1}^m Dir(\\phi_k\|\\beta_k) \\propto\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{\\beta_{kj}-1} \n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We'll find the posterior of the parameters first\n",
	"\n",
	"$$\n",
	"p(\\theta, \\phi \| D) \\propto p(\\theta)p(\\phi)p(D\|\\theta, \\phi)\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Assuming the data is i.i.d\n",
	"\n",
	"$$\n",
	"p(\\theta, \\phi \| D) \\propto p(\\theta)p(\\phi)\\prod_{i=1}^n p(x^{(i)}\|y_i, \\phi)p(y_i\|\\theta)\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Re-arranging\n",
	"\n",
	"$$\n",
	"p(\\theta, \\phi \| D) \\propto \n",
	"\\left[\n",
	"p(\\theta)\n",
	"\\prod_{i=1}^n \n",
	"p(y_i\|\\theta)\n",
	"\\right]\n",
	"\\left[\n",
	"p(\\phi)\n",
	"\\prod_{i=1}^n \n",
	"p(x^{(i)}\|y_i, \\phi)\n",
	"\\right]\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We'll handle the leftmost bracket first. Inserting the definitions.\n",
	"\n",
	"\n",
	"$$\n",
	"p(\\theta, \\phi \| D) \\propto \n",
	"\\left[\n",
	"\\prod_{k=1}^m \\theta_k^{\\alpha_k-1}\n",
	"\\prod_{i=1}^n \n",
	"\\prod_{k=1}^m \\theta_k^{[y=k]}\n",
	"\\right]\n",
	"\\left[\n",
	"p(\\phi)\n",
	"\\prod_{i=1}^n \n",
	"p(x^{(i)}\|y_i, \\phi)\n",
	"\\right]\n",
	"$$\n",
	"\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Introducing $c_k = \\sum_{i=1}^n[y_i=k]$, the count of sender $k$ in the data set\n",
	"\n",
	"$$\n",
	"p(\\theta, \\phi \| D) \\propto \n",
	"\\left[\n",
	"\\prod_{k=1}^m \\theta_k^{\\alpha_k-1}\n",
	"\\prod_{k=1}^m \\theta_k^{c_k}\n",
	"\\right]\n",
	"\\left[\n",
	"p(\\phi)\n",
	"\\prod_{i=1}^n \n",
	"p(x^{(i)}\|y_i, \\phi)\n",
	"\\right]\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Joining the products\n",
	"\n",
	"$$\n",
	"p(\\theta, \\phi \| D) \\propto \n",
	"\\left[\n",
	"\\prod_{k=1}^m \n",
	"\\theta_k^{\\alpha_k + c_k -1}\n",
	"\\right]\n",
	"\\left[\n",
	"p(\\phi)\n",
	"\\prod_{i=1}^n \n",
	"p(x^{(i)}\|y_i, \\phi)\n",
	"\\right]\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"And now the right hand bracket. Inserting the definitions\n",
	"\n",
	"$$\n",
	"p(\\theta, \\phi \| D) \\propto \n",
	"\\left[\n",
	"\\prod_{k=1}^m \n",
	"\\theta_k^{\\alpha_k + c_k -1}\n",
	"\\right]\n",
	"\\left[\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{\\beta_{kj}-1} \n",
	"\\prod_{i=1}^n\n",
	"\\frac\n",
	"{\\left(\\sum_{j=1}^V x^{(i)}_j\\right)!}\n",
	"{\\prod_{j=1}^V x^{(i)}_j!}\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{x^{(i)}_j[k=y_i]} \n",
	"\\right]\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Dropping the constant term\n",
	"\n",
	"$$\n",
	"p(\\theta, \\phi \| D) \\propto \n",
	"\\left[\n",
	"\\prod_{k=1}^m \n",
	"\\theta_k^{\\alpha_k + c_k -1}\n",
	"\\right]\n",
	"\\left[\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{\\beta_{kj}-1} \n",
	"\\prod_{i=1}^n\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{x^{(i)}_j[k=y_i]} \n",
	"\\right]\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Introducing $w_{kj} = \\sum_{i=1}^n x^{(i)}_j[k=y_i]$, the sum of occurences of word $j$ in documents from sender $k$\n",
	"\n",
	"$$\n",
	"p(\\theta, \\phi \| D) \\propto \n",
	"\\left[\n",
	"\\prod_{k=1}^m \n",
	"\\theta_k^{\\alpha_k + c_k -1}\n",
	"\\right]\n",
	"\\left[\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{\\beta_{kj}-1} \n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{w_{kj}} \n",
	"\\right]\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Joining the products\n",
	"\n",
	"$$\n",
	"p(\\theta, \\phi \| D) \\propto \n",
	"\\left[\n",
	"\\prod_{k=1}^m \n",
	"\\theta_k^{\\alpha_k + c_k -1}\n",
	"\\right]\n",
	"\\left[\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{\\beta_{kj} + w_{kj} - 1} \n",
	"\\right]\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We're interested in the posterior predictive distribution, i.e. given a new input $\\tilde{x}$ what is the distribution of senders $\\tilde{y}$, conditioned on the data set $D$.\n",
	"\n",
	"$$ p(\\tilde{y} \| \\tilde{x}, D) \\propto \\int \\int p(\\tilde{x}, \\tilde{y} \| \\theta, \\phi, D) p(\\theta, \\phi \| D) d\\theta d\\phi $$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Re-arranging\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"\\int \n",
	"p(\\tilde{y} \| \\theta)\n",
	"p(\\theta \| D) \n",
	"d\\theta \n",
	"\\int \n",
	"p(\\tilde{x} \| \\tilde{y}, \\phi)\n",
	"p(\\phi \| D) \n",
	"d\\phi\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We'll handle the integral over $\\theta$ first. Inserting the definitions\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"\\int \n",
	"\\prod_{k=1}^m \\theta_k^{[\\tilde{y}=k]}\n",
	"\\prod_{k=1}^m \n",
	"\\theta_k^{\\alpha_k + c_k -1}\n",
	"d\\theta \n",
	"\\int \n",
	"p(\\tilde{x} \| \\tilde{y}, \\phi)\n",
	"p(\\phi \| D) \n",
	"d\\phi\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Joining the products\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"\\int \n",
	"\\prod_{k=1}^m \n",
	"\\theta_k^{\\alpha_k + c_k + [\\tilde{y}=k] - 1}\n",
	"d\\theta \n",
	"\\int \n",
	"p(\\tilde{x} \| \\tilde{y}, \\phi)\n",
	"p(\\phi \| D) \n",
	"d\\phi\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"This is an integral over the un-normalized $Dir(\\theta\|\\alpha_k + c_k + [\\tilde{y}=k])$ distribution.\n",
	"\n",
	"Using $\\int \\frac{1}{Z}p(x)dx = 1 \\implies \\int p(x) dx = Z$\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"\\frac\n",
	"{\\prod_{k=1}^m\\Gamma(\\alpha_k + c_k + [\\tilde{y}=k])}\n",
	"{\\Gamma \\left( \\sum_{k=1}^m \\alpha_k + c_k + [\\tilde{y}=k] \\right)}\n",
	"\\int \n",
	"p(\\tilde{x} \| \\tilde{y}, \\phi)\n",
	"p(\\phi \| D) \n",
	"d\\phi\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Using $\\Gamma(x+1) = x\\Gamma(x)$\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"\\frac\n",
	"{(\\alpha_\\tilde{y} + c_\\tilde{y})}\n",
	"{(\\sum_{k=1}^m \\alpha_k + c_k)}\n",
	"\\frac\n",
	"{\\prod_{k=1}^m\\Gamma(\\alpha_k + c_k)}\n",
	"{\\Gamma \\left( \\sum_{k=1}^m \\alpha_k + c_k \\right)}\n",
	"\\int \n",
	"p(\\tilde{x} \| \\tilde{y}, \\phi)\n",
	"p(\\phi \| D) \n",
	"d\\phi\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Dropping the constant terms\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\int \n",
	"p(\\tilde{x} \| \\tilde{y}, \\phi)\n",
	"p(\\phi \| D) \n",
	"d\\phi\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Now for the integral over $\\phi$. Inserting the definitions\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\int \n",
	"\\left[\n",
	"\\frac\n",
	"{\\left(\\sum_{j=1}^V \\tilde{x}_j\\right)!}\n",
	"{\\prod_{j=1}^V \\tilde{x}_j!}\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{\\tilde{x}_j[k=\\tilde{y}]} \n",
	"\\right]\n",
	"\\left[\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{\\beta_{kj} + w_{kj} - 1} \n",
	"\\right]\n",
	"d\\phi\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Dropping the constant term\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\int \n",
	"\\left[\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{\\tilde{x}_j[k=\\tilde{y}]} \n",
	"\\right]\n",
	"\\left[\n",
	"\\prod_{k=1}^m\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{\\beta_{kj} + w_{kj} - 1} \n",
	"\\right]\n",
	"d\\phi\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Joining the products, and moving the product over $k$ outside the integrals\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\prod_{k=1}^m\n",
	"\\int \n",
	"\\left[\n",
	"\\prod_{j=1}^d \n",
	"\\phi_{kj}^{\\beta_{kj} + w_{kj} + \\tilde{x}_j[k=\\tilde{y}] - 1} \n",
	"\\right]\n",
	"d\\phi\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"This is an integral over the un-normalized $Dir(\\theta_k\|\\beta_{k} + w_{k} + \\tilde{x}_j[k=\\tilde{y}])$ distribution\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\prod_{k=1}^m\n",
	"\\frac\n",
	"{\\prod_{j=1}^d\\Gamma \\left( \\beta_{kj} + w_{kj} + \\tilde{x}_j[k=\\tilde{y}] \\right)}\n",
	"{\\Gamma \\left( \\sum_{j=1}^d \\beta_{kj} + w_{kj} + \\tilde{x}_j[k=\\tilde{y}] \\right)}\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Splitting the sum in the denominator\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\prod_{k=1}^m\n",
	"\\frac\n",
	"{\\prod_{j=1}^d\\Gamma \\left( \\beta_{kj} + w_{kj} + \\tilde{x}_j[k=\\tilde{y}] \\right)}\n",
	"{\\Gamma \\left( \\sum_{j=1}^d (\\beta_{kj} + w_{kj}) + \\sum_{j=1}^d \\tilde{x}_j[k=\\tilde{y}] \\right)}\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Using $\\Gamma(x+n) = x^{(n)}\\Gamma(x)$, where $x^{(n)} = x(x+1)...x(x+n-1)$ denotes the rising factorial\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\prod_{k=1}^m\n",
	"\\frac\n",
	"{\\prod_{j=1}^d (\\beta_{kj} + w_{kj})^{(\\tilde{x}_j[k=\\tilde{y}])}}\n",
	"{\\left(\\sum_{j=1}^d \\beta_{kj} + w_{kj}\\right)^{(\\sum_{j=1}^d \\tilde{x}_j[k=\\tilde{y}])}}\n",
	"\\frac\n",
	"{\\prod_{j=1}^d \\Gamma \\left( \\beta_{kj} + w_{kj} \\right)}\n",
	"{\\Gamma \\left( \\sum_{j=1}^d \\beta_{kj} + w_{kj} \\right)}\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Dropping the constant terms\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\prod_{k=1}^m\n",
	"\\frac\n",
	"{\\prod_{j=1}^d(\\beta_{kj} + w_{kj})^{(\\tilde{x}_j[k=\\tilde{y}])}}\n",
	"{\\left(\\sum_{j=1}^d \\beta_{kj} + w_{kj}\\right)^{(\\sum_{j=1}^d \\tilde{x}_j[k=\\tilde{y}])}}\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Using $x^{(0)} = 1$ by definition\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\frac\n",
	"{\\prod_{j=1}^d(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j})^{(\\tilde{x}_j)}}\n",
	"{\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)^{(\\sum_{j=1}^d\\tilde{x}_j)}}\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Using $x^{(n)} = \\frac{\\Gamma(x+n)}{\\Gamma(x)}$\n",
	"\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\frac\n",
	"{\n",
	" \\prod_{j=1}^d\n",
	" \\frac{\\Gamma((\\beta_{\\tilde{y}j} + w_{\\tilde{y}j}) + \\tilde{x}_j)}{\\Gamma((\\beta_{\\tilde{y}j} + w_{\\tilde{y}j}))}\n",
	"}\n",
	"{\n",
	" \\frac{\\Gamma\\left(\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right) + \\sum_{j=1}^d\\tilde{x}_j\\right)}{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)}\n",
	"}\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Simplifying \n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\left[\n",
	"\\prod_{j=1}^d\n",
	"\\frac{\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j)}{\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j})}\n",
	"\\right]\n",
	"\\frac{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)}{\\Gamma(\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right) + \\sum_{j=1}^d\\tilde{x}_j)}\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Simplifying \n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\left[\n",
	"\\prod_{j=1}^d\n",
	"\\frac{\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j)}{\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j})}\n",
	"\\right]\n",
	"\\frac{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)}\n",
	"{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j\\right)}\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Simplifying \n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\frac\n",
	"{\\prod_{j=1}^d\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j)}\n",
	"{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j\\right)}\n",
	"\\frac\n",
	"{\\Gamma\\left(\\sum_{j=1}^d \\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)}\n",
	"{\\prod_{j=1}^d\\Gamma(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j})}\n",
	"$$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Simplifying, where $B$ is the multivariate beta function over $j$\n",
	"\n",
	"$$\n",
	"p(\\tilde{y} \| \\tilde{x}, D) \\propto \n",
	"(\\alpha_\\tilde{y} + c_\\tilde{y})\n",
	"\\frac\n",
	"{B\\left( \\beta_{\\tilde{y}j} + w_{\\tilde{y}j} + \\tilde{x}_j \\right)}\n",
	"{B\\left(\\beta_{\\tilde{y}j} + w_{\\tilde{y}j}\\right)}\n",
	"$$"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 2",
	"language": "python",
	"name": "python2"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 2
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython2",
	"version": "2.7.11"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 0
	}