Created
February 17, 2017 08:10
Gist: rasmusbergpalm/b6b092815e23c1366c2d9eb50ed41d09
VAE approximate posterior
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We assume the following generative process\n",
"\\begin{align}\n",
"z_i &\\sim p(z) \\\\\n",
"x_i &\\sim p(x|z_i)\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Giving the following joint distribution\n",
"\n",
"$$\n",
"p(x,z) = p(x|z)p(z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using Bayes' rule to find $p(z|x)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\\begin{align}\n",
"p(z|x) &= \\frac{p(x|z)p(z)}{p(x)} \\\\\n",
"p(z|x) &= \\frac{p(x|z)p(z)}{\\int_zp(x|z)p(z)}\n",
"\\end{align}"
]
},
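{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small worked example (the numbers are assumptions chosen for illustration, not from the original), take a binary latent $z \\in \\{0, 1\\}$ with $p(z=0) = p(z=1) = 0.5$, and a fixed observation $x$ with $p(x|z=0) = 0.2$ and $p(x|z=1) = 0.6$. The integral then becomes a sum\n",
"\n",
"\\begin{align}\n",
"p(x) &= 0.2 \\cdot 0.5 + 0.6 \\cdot 0.5 = 0.4 \\\\\n",
"p(z=1|x) &= \\frac{0.6 \\cdot 0.5}{0.4} = 0.75 \\\\\n",
"p(z=0|x) &= \\frac{0.2 \\cdot 0.5}{0.4} = 0.25\n",
"\\end{align}"
]
},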
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, evaluating the integral over $z$ in the denominator is slow and cumbersome, and often intractable. Instead we'll approximate $p(z|x)$ with a new distribution $q(z|x)$. The difference between $p(z|x)$ and $q(z|x)$ can be measured with the KL divergence."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x)\\log\\frac{q(z|x)}{p(z|x)} \n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inserting the definition of $p(z|x)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x)\\log\\frac{q(z|x)p(x)}{p(x|z)p(z)}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using $\\log(ab) = \\log(a)+\\log(b)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x) \\left[ \\log \\frac{q(z|x)}{p(z)} - \\log p(x|z) + \\log p(x) \\right]\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Splitting up the terms\n",
"\n",
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x) \\log \\frac{q(z|x)}{p(z)} - \\int_z q(z|x) \\log p(x|z) + \\int_z q(z|x) \\log p(x)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first term is $KL(q(z|x)||p(z))$, the KL divergence between $q(z|x)$ and the prior $p(z)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = KL(q(z|x)||p(z)) - \\int_z q(z|x) \\log p(x|z) + \\int_z q(z|x) \\log p(x)\n",
"$$"
]
},
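{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, under the common choice (assumed here, not stated above) of a standard normal prior $p(z) = \\mathcal{N}(0, I)$ and a diagonal Gaussian $q(z|x) = \\mathcal{N}(\\mu, \\mathrm{diag}(\\sigma^2))$ over a $d$-dimensional $z$, where $\\mu$ and $\\sigma$ depend on $x$, the first term has a closed form\n",
"\n",
"$$\n",
"KL(q(z|x)||p(z)) = \\frac{1}{2} \\sum_{j=1}^{d} \\left( \\mu_j^2 + \\sigma_j^2 - \\log \\sigma_j^2 - 1 \\right)\n",
"$$"
]
},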
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second term is the expectation of $\\log p(x|z)$ with samples of $z$ drawn from $q(z|x)$\n",
"\n",
"$$\n",
"KL(q(z|x)||p(z|x)) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z) + \\int_z q(z|x) \\log p(x)\n",
"$$"
]
},
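{
"cell_type": "markdown",
"metadata": {},
"source": [
"In practice this expectation is usually estimated by sampling. A sketch, assuming $L$ samples (a symbol introduced here for illustration) drawn from $q(z|x)$\n",
"\n",
"$$\n",
"E_{z \\sim q(z|x)} \\log p(x|z) \\approx \\frac{1}{L} \\sum_{l=1}^{L} \\log p(x|z^{(l)}), \\quad z^{(l)} \\sim q(z|x)\n",
"$$\n",
"\n",
"If $q(z|x)$ is further assumed to be a diagonal Gaussian $\\mathcal{N}(\\mu, \\mathrm{diag}(\\sigma^2))$, the samples can be written as $z^{(l)} = \\mu + \\sigma \\odot \\epsilon^{(l)}$ with $\\epsilon^{(l)} \\sim \\mathcal{N}(0, I)$."
]
},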
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the third term, $\\log p(x)$ does not depend on $z$ and can be moved outside the integral. Further, the integral over $q(z|x)$ is 1, as it is the integral of a probability distribution\n",
"\n",
"$$\n",
"KL(q(z|x)||p(z|x)) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z) + \\log p(x)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Re-arranging\n",
"$$\n",
"KL(q(z|x)||p(z|x)) - \\log p(x) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
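{
"cell_type": "markdown",
"metadata": {},
"source": [
"The identity can be checked by hand on a toy case. The numbers are assumptions chosen for illustration: binary $z$ with $p(z) = (0.5, 0.5)$, $p(x|z=0) = 0.2$ and $p(x|z=1) = 0.6$, so that $p(x) = 0.4$ and $p(z|x) = (0.25, 0.75)$, together with $q(z|x) = (0.5, 0.5)$. Using natural logarithms\n",
"\n",
"\\begin{align}\n",
"KL(q(z|x)||p(z|x)) &= 0.5 \\log \\frac{0.5}{0.25} + 0.5 \\log \\frac{0.5}{0.75} \\approx 0.1438 \\\\\n",
"-\\log p(x) &= -\\log 0.4 \\approx 0.9163 \\\\\n",
"KL(q(z|x)||p(z)) &= 0 \\\\\n",
"-E_{z \\sim q(z|x)} \\log p(x|z) &= -(0.5 \\log 0.2 + 0.5 \\log 0.6) \\approx 1.0601\n",
"\\end{align}\n",
"\n",
"Both sides come to approximately $1.0601$."
]
},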
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For discrete $x$, $p(x) \\leq 1$ and hence $\\log p(x) \\leq 0$, so dropping the non-negative term $-\\log p(x)$ from the left hand side gives\n",
"$$\n",
"KL(q(z|x)||p(z|x)) \\leq KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**If we minimize the right hand side, we're minimizing an upper bound on $KL(q(z|x)||p(z|x))$. Since $\\log p(x)$ does not depend on $q$, minimizing over $q$ minimizes $KL(q(z|x)||p(z|x))$ itself.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we go back one step\n",
"$$\n",
"KL(q(z|x)||p(z|x)) - \\log p(x) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And remove $KL(q(z|x)||p(z|x)) \\ge 0$ instead\n",
"$$\n",
"- \\log p(x) \\leq KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
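{
"cell_type": "markdown",
"metadata": {},
"source": [
"The step above relies on the KL divergence being non-negative. A short justification, sketched here with Jensen's inequality for the concave $\\log$ and assuming $p(z|x) > 0$ wherever $q(z|x) > 0$\n",
"\n",
"$$\n",
"-KL(q(z|x)||p(z|x)) = \\int_z q(z|x) \\log \\frac{p(z|x)}{q(z|x)} \\leq \\log \\int_z q(z|x) \\frac{p(z|x)}{q(z|x)} \\leq \\log \\int_z p(z|x) = 0\n",
"$$"
]
},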
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**If we minimize the right hand side, we're also minimizing an upper bound on the negative log likelihood of the data. Equivalently, we're maximizing a lower bound on the log probability of the data.**"
]
},
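{
"cell_type": "markdown",
"metadata": {},
"source": [
"The bound is tight when $q(z|x) = p(z|x)$, since the gap between the two sides is $KL(q(z|x)||p(z|x))$. As an illustrative check with assumed numbers (binary $z$, $p(z) = (0.5, 0.5)$, $p(x|z=0) = 0.2$, $p(x|z=1) = 0.6$, hence $p(x) = 0.4$), setting $q(z|x) = p(z|x) = (0.25, 0.75)$ gives, with natural logarithms\n",
"\n",
"\\begin{align}\n",
"KL(q(z|x)||p(z)) &= 0.25 \\log \\frac{0.25}{0.5} + 0.75 \\log \\frac{0.75}{0.5} \\approx 0.1308 \\\\\n",
"-E_{z \\sim q(z|x)} \\log p(x|z) &= -(0.25 \\log 0.2 + 0.75 \\log 0.6) \\approx 0.7855 \\\\\n",
"-\\log p(x) &= -\\log 0.4 \\approx 0.9163\n",
"\\end{align}\n",
"\n",
"so the right hand side, $0.1308 + 0.7855 = 0.9163$, matches $-\\log p(x)$."
]
},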
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}