@rasmusbergpalm
Created February 17, 2017 08:10
VAE approximate posterior
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We assume the following generative process\n",
"\\begin{align}\n",
"z_i &\\sim p(z) \\\\\n",
"x_i &\\sim p(x|z_i)\n",
"\\end{align}"
]
},
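{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a concrete toy instance of this generative process (an assumption made purely for illustration, not part of the derivation), take $p(z)$ to be a standard normal and $p(x|z)$ a normal whose mean is a fixed linear function of $z$. The sketch below just draws a few joint samples under that assumption."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0)\n",
"\n",
"# Assumed toy instance of the generative process:\n",
"# z_i ~ N(0, 1) and x_i | z_i ~ N(2*z_i + 1, 0.5^2)\n",
"n = 5\n",
"z = np.random.randn(n)                        # z_i ~ p(z)\n",
"x = 2.0 * z + 1.0 + 0.5 * np.random.randn(n)  # x_i ~ p(x|z_i)\n",
"print(z)\n",
"print(x)"
]
},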
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Giving the following joint distribution\n",
"\n",
"$$\n",
"p(x,z) = p(x|z)p(z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using bayes rule to find $p(z|x)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\\begin{align}\n",
"p(z|x) &= \\frac{p(x|z)p(z)}{p(x)} \\\\\n",
"p(z|x) &= \\frac{p(x|z)p(z)}{\\int_zp(x|z)p(z)}\n",
"\\end{align}"
]
},
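{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, on a small discrete model (a hypothetical example, not part of the derivation) the denominator is just a sum over the values of $z$, so the posterior can be computed exactly by enumeration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical discrete toy model: z takes 3 values, x takes 2 values\n",
"p_z = np.array([0.5, 0.3, 0.2])        # p(z)\n",
"p_x_given_z = np.array([[0.9, 0.1],    # p(x|z=0)\n",
"                        [0.4, 0.6],    # p(x|z=1)\n",
"                        [0.2, 0.8]])   # p(x|z=2)\n",
"\n",
"x = 1                                  # observed value of x\n",
"joint = p_x_given_z[:, x] * p_z        # p(x|z) p(z) for each z\n",
"p_x = joint.sum()                      # p(x) = sum_z p(x|z) p(z)\n",
"posterior = joint / p_x                # p(z|x) via Bayes' rule\n",
"print(posterior, posterior.sum())"
]
},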
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, summing over $z$ in this way is slow and cumbersome. Instead we'll approximate it with a new distribution $q(z|x)$. The difference between $p(z|x)$ and $q(z|x)$ can be measured with the KL divergence."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x)\\log\\frac{q(z|x)}{p(z|x)} \n",
"$$"
]
},
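{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick numeric illustration (with two made-up discrete distributions), the KL divergence is the expectation under $q$ of $\\log\\frac{q}{p}$; it is non-negative and zero exactly when the two distributions are equal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def kl(q, p):\n",
"    # KL(q || p) = sum_z q(z) log(q(z) / p(z)) for discrete distributions\n",
"    return np.sum(q * np.log(q / p))\n",
"\n",
"q = np.array([0.7, 0.2, 0.1])   # made-up q(z|x)\n",
"p = np.array([0.5, 0.3, 0.2])   # made-up p(z|x)\n",
"print(kl(q, p))                 # positive\n",
"print(kl(q, q))                 # zero"
]
},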
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inserting the definition of $p(z|x)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x)\\log\\frac{q(z|x)p(x)}{p(x|z)p(z)}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using $\\log(ab) = \\log(a)+\\log(b)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x) \\left[ \\log \\frac{q(z|x)}{p(z)} - \\log p(x|z) + \\log p(x) \\right]\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Splitting up the terms\n",
"\n",
"$$\n",
"KL(q(z|x)||p(z|x)) = \\int_z q(z|x) \\log \\frac{q(z|x)}{p(z)} - \\int_z q(z|x) \\log p(x|z) + \\int_z q(z|x) \\log p(x)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first term is the $KL(q(z|x)||p(z))$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"KL(q(z|x)||p(z|x)) = KL(q(z|x)||p(z)) - \\int_z q(z|x) \\log p(x|z) + \\int_z q(z|x) \\log p(x)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second term is the expectation of $\\log p(x|z)$ with samples of $z$ drawn from $q(z|x)$\n",
"\n",
"$$\n",
"KL(q(z|x)||p(z|x)) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z) + \\int_z q(z|x) \\log p(x)\n",
"$$"
]
},
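{
"cell_type": "markdown",
"metadata": {},
"source": [
"This expectation is the quantity a VAE estimates by sampling $z$ from $q(z|x)$. Below is a small Monte Carlo sketch on the same kind of made-up discrete model as above, comparing the exact expectation to a sampled estimate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0)\n",
"\n",
"# Made-up q(z|x) and p(x|z) (for the observed x) over 3 values of z\n",
"q = np.array([0.6, 0.3, 0.1])\n",
"log_lik = np.log(np.array([0.1, 0.6, 0.8]))        # log p(x|z) for each z\n",
"\n",
"exact = np.sum(q * log_lik)                        # E_{z~q(z|x)}[log p(x|z)]\n",
"\n",
"z_samples = np.random.choice(3, size=10000, p=q)   # z ~ q(z|x)\n",
"mc = np.mean(log_lik[z_samples])                   # Monte Carlo estimate\n",
"print(exact, mc)"
]
},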
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the third term, $\\log p(x)$ does not depend on $z$ and can be moved outside the integral. Further the integral over $q(z|x)$ is 1, as it is the integral of a probability distribution\n",
"\n",
"$$\n",
"KL(q(z|x)||p(z|x)) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z) + \\log p(x)\n",
"$$"
]
},
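{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can sanity-check this identity numerically on the discrete toy model from before (purely illustrative): the left hand side $KL(q(z|x)||p(z|x))$, computed from the exact posterior, should equal the three terms on the right."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical discrete toy model as before: z in {0,1,2}, observed x\n",
"p_z = np.array([0.5, 0.3, 0.2])\n",
"lik = np.array([0.1, 0.6, 0.8])            # p(x|z) for the observed x\n",
"q = np.array([0.6, 0.3, 0.1])              # arbitrary q(z|x)\n",
"\n",
"p_x = np.sum(lik * p_z)                    # p(x)\n",
"post = lik * p_z / p_x                     # exact p(z|x)\n",
"\n",
"lhs = np.sum(q * np.log(q / post))         # KL(q(z|x) || p(z|x))\n",
"rhs = (np.sum(q * np.log(q / p_z))         # KL(q(z|x) || p(z))\n",
"       - np.sum(q * np.log(lik))           # - E_{z~q(z|x)}[log p(x|z)]\n",
"       + np.log(p_x))                      # + log p(x)\n",
"print(lhs, rhs)                            # the two sides should match"
]
},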
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Re-arranging\n",
"$$\n",
"KL(q(z|x)||p(z|x)) - \\log p(x) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since $\\log p(x) \\leq 0$, if we remove it then \n",
"$$\n",
"KL(q(z|x)||p(z|x)) \\leq KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**If we minimize the right hand side, we're minimizing $KL(q(z|x)||p(z|x))$**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we go back, one step\n",
"$$\n",
"KL(q(z|x)||p(z|x)) - \\log p(x) = KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And remove $KL(q(z|x)||p(z|x)) \\ge 0$ instead\n",
"$$\n",
"- \\log p(x) \\leq KL(q(z|x)||p(z)) - E_{z \\sim q(z|x)} \\log p(x|z)\n",
"$$"
]
},
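{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again on the discrete toy model (illustrative only), we can check that the right hand side really is an upper bound on $-\\log p(x)$, and that the gap between them is exactly $KL(q(z|x)||p(z|x))$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical discrete toy model as before\n",
"p_z = np.array([0.5, 0.3, 0.2])\n",
"lik = np.array([0.1, 0.6, 0.8])            # p(x|z) for the observed x\n",
"q = np.array([0.6, 0.3, 0.1])              # q(z|x)\n",
"\n",
"p_x = np.sum(lik * p_z)                    # p(x)\n",
"post = lik * p_z / p_x                     # exact p(z|x)\n",
"\n",
"rhs = np.sum(q * np.log(q / p_z)) - np.sum(q * np.log(lik))\n",
"print(-np.log(p_x), rhs)                               # -log p(x) <= rhs\n",
"print(rhs + np.log(p_x), np.sum(q * np.log(q / post))) # gap = KL(q(z|x)||p(z|x))"
]
},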
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**If we minimize the right hand side, we're also minimizing the negative log likelihood of the data. Equivialently we're maximizing the log probability of the data.**"
]
},
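{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a standard VAE this right hand side is used directly as the training loss: with a diagonal Gaussian $q(z|x)$ and a standard normal $p(z)$, the KL term has a closed form, and the expectation is estimated with a single reparameterised sample. Below is a minimal sketch of that loss under those (standard, but here assumed) choices; `mu`, `log_var` and `log_lik_fn` are hypothetical stand-ins for the encoder outputs and the decoder log-likelihood."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0)\n",
"\n",
"def neg_elbo(mu, log_var, log_lik_fn):\n",
"    # Closed-form KL(q(z|x) || p(z)) for q = N(mu, diag(exp(log_var))), p = N(0, I)\n",
"    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))\n",
"    # One-sample Monte Carlo estimate of E_{z~q(z|x)}[log p(x|z)]\n",
"    # via the reparameterisation z = mu + sigma * eps\n",
"    eps = np.random.randn(*mu.shape)\n",
"    z = mu + np.exp(0.5 * log_var) * eps\n",
"    return kl - log_lik_fn(z)\n",
"\n",
"# Hypothetical stand-ins for encoder output and decoder log-likelihood\n",
"mu = np.array([0.1, -0.2])\n",
"log_var = np.array([-1.0, -0.5])\n",
"log_lik_fn = lambda z: -0.5 * np.sum((z - 1.0)**2)   # toy log p(x|z)\n",
"print(neg_elbo(mu, log_var, log_lik_fn))"
]
},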
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}