@rasmusbergpalm
Created February 17, 2017 08:10
VAE approximate posterior
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We assume the following generative process\n",
"\\begin{align}\n",
"z_i &\\sim p(z) \\\\\n",
"x_i &\\sim p(x|z_i)\n",
"\\end{align}"
]
},
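{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a concrete toy instance of this generative process (an assumption made purely for illustration, not part of the derivation), take $p(z)$ to be a standard normal and $p(x|z)$ a normal whose mean is a fixed linear function of $z$. The sketch below just draws a few joint samples under that assumption."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0)\n",
"\n",
"# Assumed toy instance of the generative process:\n",
"# z_i ~ N(0, 1) and x_i | z_i ~ N(2*z_i + 1, 0.5^2)\n",
"n = 5\n",
"z = np.random.randn(n)                        # z_i ~ p(z)\n",
"x = 2.0 * z + 1.0 + 0.5 * np.random.randn(n)  # x_i ~ p(x|z_i)\n",
"print(z)\n",
"print(x)"
]
},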
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Giving the following joint distribution\n",
"\n",
"$$\n",
"p(x,z) = p(x|z)p(z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using bayes rule to find $p(z|x)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\\begin{align}\n",
"p(z|x) &= \\frac{p(x|z)p(z)}{p(x)} \\\\\n",
"p(z|x) &= \\frac{p(x|z)p(z)}{\\int_zp(x|z)p(z)}\n",
"\\end{align}"
]
},
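{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, on a small discrete model (a hypothetical example, not part of the derivation) the denominator is just a sum over the values of $z$, so the posterior can be computed exactly by enumeration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical discrete toy model: z takes 3 values, x takes 2 values\n",
"p_z = np.array([0.5, 0.3, 0.2])        # p(z)\n",
"p_x_given_z = np.array([[0.9, 0.1],    # p(x|z=0)\n",
"                        [0.4, 0.6],    # p(x|z=1)\n",
"                        [0.2, 0.8]])   # p(x|z=2)\n",
"\n",
"x = 1                                  # observed value of x\n",
"joint = p_x_given_z[:, x] * p_z        # p(x|z) p(z) for each z\n",
"p_x = joint.sum()                      # p(x) = sum_z p(x|z) p(z)\n",
"posterior = joint / p_x                # p(z|x) via Bayes' rule\n",
"print(posterior, posterior.sum())"
]
},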
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, summing over $z$ in this way is slow and cumbersome. Instead we'll approximate it with a new distribution $q(z|x)$. The difference between $p(z|x)$ and $q(z|x)$ can be measured with the KL divergence."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x)\\log\\frac{q(z|x)}{p(z|x)} \n",
"$$"
]
},
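{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick numeric illustration (with two made-up discrete distributions), the KL divergence is the expectation under $q$ of $\\log\\frac{q}{p}$; it is non-negative and zero exactly when the two distributions are equal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def kl(q, p):\n",
"    # KL(q || p) = sum_z q(z) log(q(z) / p(z)) for discrete distributions\n",
"    return np.sum(q * np.log(q / p))\n",
"\n",
"q = np.array([0.7, 0.2, 0.1])   # made-up q(z|x)\n",
"p = np.array([0.5, 0.3, 0.2])   # made-up p(z|x)\n",
"print(kl(q, p))                 # positive\n",
"print(kl(q, q))                 # zero"
]
},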
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inserting the definition of $p(z|x)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x)\\log\\frac{q(z|x)p(x)}{p(x|z)p(z)}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using $\\log(ab) = \\log(a)+\\log(b)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x) \\left[ \\log \\frac{q(z|x)}{p(z)} - \\log p(x|z) + \\log p(x) \\right]\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Splitting up the terms\n",
"\n",
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x) \\log \\frac{q(z|x)}{p(z)} - \\int_z q(z|x) \\log p(x|z) + \\int_z q(z|x) \\log p(x)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first term is the $KL(q(z|x)||p(z))$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = KL(q(z|x)||p(z)) - \\int_z q(z|x) \\log p(x|z) + \\int_z q(z|x) \\log p(x)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second term is the expectation of $\\log p(x|z)$ with samples of $z$ drawn from $q(z|x)$\n",
"\n",
"$$\n",
"KL(q(z|x)||p(z|x)) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z) + \\int_z q(z|x) \\log p(x)\n",
"$$"
]
},
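{
"cell_type": "markdown",
"metadata": {},
"source": [
"This expectation is the quantity a VAE estimates by sampling $z$ from $q(z|x)$. Below is a small Monte Carlo sketch on the same kind of made-up discrete model as above, comparing the exact expectation to a sampled estimate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0)\n",
"\n",
"# Made-up q(z|x) and p(x|z) (for the observed x) over 3 values of z\n",
"q = np.array([0.6, 0.3, 0.1])\n",
"log_lik = np.log(np.array([0.1, 0.6, 0.8]))        # log p(x|z) for each z\n",
"\n",
"exact = np.sum(q * log_lik)                        # E_{z~q(z|x)}[log p(x|z)]\n",
"\n",
"z_samples = np.random.choice(3, size=10000, p=q)   # z ~ q(z|x)\n",
"mc = np.mean(log_lik[z_samples])                   # Monte Carlo estimate\n",
"print(exact, mc)"
]
},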
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the third term, $\\log p(x)$ does not depend on $z$ and can be moved outside the integral. Further the integral over $q(z|x)$ is 1, as it is the integral of a probability distribution\n",
"\n",
"$$\n",
"KL(q(z|x)||p(z|x)) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z) + \\log p(x)\n",
"$$"
]
},
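{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can sanity-check this identity numerically on the discrete toy model from before (purely illustrative): the left hand side $KL(q(z|x)||p(z|x))$, computed from the exact posterior, should equal the three terms on the right."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical discrete toy model as before: z in {0,1,2}, observed x\n",
"p_z = np.array([0.5, 0.3, 0.2])\n",
"lik = np.array([0.1, 0.6, 0.8])            # p(x|z) for the observed x\n",
"q = np.array([0.6, 0.3, 0.1])              # arbitrary q(z|x)\n",
"\n",
"p_x = np.sum(lik * p_z)                    # p(x)\n",
"post = lik * p_z / p_x                     # exact p(z|x)\n",
"\n",
"lhs = np.sum(q * np.log(q / post))         # KL(q(z|x) || p(z|x))\n",
"rhs = (np.sum(q * np.log(q / p_z))         # KL(q(z|x) || p(z))\n",
"       - np.sum(q * np.log(lik))           # - E_{z~q(z|x)}[log p(x|z)]\n",
"       + np.log(p_x))                      # + log p(x)\n",
"print(lhs, rhs)                            # the two sides should match"
]
},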
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Re-arranging\n",
"$$\n",
"KL(q(z|x)||p(z|x)) - \\log p(x) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since $\\log p(x) \\leq 0$, if we remove it then \n",
"$$\n",
"KL(q(z|x)||p(z|x)) \\leq KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**If we minimize the right hand side, we're minimizing $KL(q(z|x)||p(z|x))$**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we go back, one step\n",
"$$\n",
"KL(q(z|x)||p(z|x)) - \\log p(x) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And remove $KL(q(z|x)||p(z|x)) \\ge 0$ instead\n",
"$$\n",
"- \\log p(x) \\leq KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
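{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again on the discrete toy model (illustrative only), we can check that the right hand side really is an upper bound on $-\\log p(x)$, and that the gap between them is exactly $KL(q(z|x)||p(z|x))$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical discrete toy model as before\n",
"p_z = np.array([0.5, 0.3, 0.2])\n",
"lik = np.array([0.1, 0.6, 0.8])            # p(x|z) for the observed x\n",
"q = np.array([0.6, 0.3, 0.1])              # q(z|x)\n",
"\n",
"p_x = np.sum(lik * p_z)                    # p(x)\n",
"post = lik * p_z / p_x                     # exact p(z|x)\n",
"\n",
"rhs = np.sum(q * np.log(q / p_z)) - np.sum(q * np.log(lik))\n",
"print(-np.log(p_x), rhs)                               # -log p(x) <= rhs\n",
"print(rhs + np.log(p_x), np.sum(q * np.log(q / post))) # gap = KL(q(z|x)||p(z|x))"
]
},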
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**If we minimize the right hand side, we're also minimizing the negative log likelihood of the data. Equivialently we're maximizing the log probability of the data.**"
]
},
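{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a standard VAE this right hand side is used directly as the training loss: with a diagonal Gaussian $q(z|x)$ and a standard normal $p(z)$, the KL term has a closed form, and the expectation is estimated with a single reparameterised sample. Below is a minimal sketch of that loss under those (standard, but here assumed) choices; `mu`, `log_var` and `log_lik_fn` are hypothetical stand-ins for the encoder outputs and the decoder log-likelihood."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0)\n",
"\n",
"def neg_elbo(mu, log_var, log_lik_fn):\n",
"    # Closed-form KL(q(z|x) || p(z)) for q = N(mu, diag(exp(log_var))), p = N(0, I)\n",
"    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))\n",
"    # One-sample Monte Carlo estimate of E_{z~q(z|x)}[log p(x|z)]\n",
"    # via the reparameterisation z = mu + sigma * eps\n",
"    eps = np.random.randn(*mu.shape)\n",
"    z = mu + np.exp(0.5 * log_var) * eps\n",
"    return kl - log_lik_fn(z)\n",
"\n",
"# Hypothetical stand-ins for encoder output and decoder log-likelihood\n",
"mu = np.array([0.1, -0.2])\n",
"log_var = np.array([-1.0, -0.5])\n",
"log_lik_fn = lambda z: -0.5 * np.sum((z - 1.0)**2)   # toy log p(x|z)\n",
"print(neg_elbo(mu, log_var, log_lik_fn))"
]
},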
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}