{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hamiltonian Monte Carlo\n",
"\n",
"This notebook reproduces plots and code from my [blog post](https://tcbegley.com/blog/mcmc-part-2) on Hamiltonian Monte Carlo. If you're interested in the theory behind the code below, check out the post.\n",
"\n",
"Note that all implementations here are designed for simplicity and clarity rather than performance.\n",
"\n",
"To make the animations, you will need to install `ffmpeg`. See [here](https://ffmpeg.org/download.html) for installation options, or use your favourite package manager.\n",
"\n",
"You will also need `matplotlib` and `numpy` installed. You can run the cell below to install if you don't already have them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"!{sys.executable} -m pip install -U matplotlib numpy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from IPython.display import Video\n",
"from matplotlib.animation import FFMpegWriter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the basic Metropolis algorithm, it's very simple. Accumulating samples by appending to a list is sub-optimal, but my goal here is clarity / simplicity rather than performance.\n",
"\n",
"- `target` is the target probability density, it should accept a sample as its only argument and return the probability density of that sample.\n",
"- `initial` is a function that returns the starting point in parameter space. It is a callable which takes no arguments.\n",
"- `proposal` is the proposal distribution, it takes the current sample as its argument and returns a proposal for the next sample."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def metropolis(target, initial, proposal, iterations=100_000):\n",
" samples = [initial()]\n",
"\n",
" for _ in range(iterations):\n",
" current = samples[-1]\n",
" proposed = proposal(current)\n",
" if np.random.random() < target(proposed) / target(current):\n",
" samples.append(proposed)\n",
" else:\n",
" samples.append(current)\n",
"\n",
" return samples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A standard, simple choice of proposal is a normally distributed jump from the current location. This gives rise to Random Walk Metropolis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class NormalProposal:\n",
" def __init__(self, scale):\n",
" self.scale = scale\n",
"\n",
" def __call__(self, sample):\n",
" jump = np.random.normal(scale=self.scale, size=sample.shape)\n",
" return sample + jump"
]
},
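{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (this cell is an addition, not part of the original post), we can run random walk Metropolis on a 1-D standard normal target and confirm that the sample mean and standard deviation come out roughly 0 and 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sanity check: sample a 1-D standard normal with random walk Metropolis\n",
"normal_samples = np.array(\n",
"    metropolis(\n",
"        lambda x: np.exp(-(x ** 2).sum() / 2),  # unnormalised standard normal density\n",
"        lambda: np.zeros(1),\n",
"        NormalProposal(1.0),\n",
"        iterations=5_000,\n",
"    )\n",
")\n",
"print(f\"mean: {normal_samples.mean():.2f}, std: {normal_samples.std():.2f}\")"
]
},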
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll start by sampling from a radially symmetric donut shaped distribution. The density decays exponentially with the distance from a circle of fixed radius. We can implement the target distribution as a callable object by writing a custom class and implementing the `__call__` magic method. In this case `__call__` returns an unnormalised form of the donut shaped density."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class DonutPDF:\n",
" def __init__(self, radius=3, sigma2=0.05):\n",
" self.radius = radius\n",
" self.sigma2 = sigma2\n",
"\n",
" def __call__(self, x):\n",
" r = np.linalg.norm(x)\n",
" return np.exp(-((r - self.radius) ** 2) / self.sigma2)"
]
},
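{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick check (added here for illustration, not part of the original post): the density is largest on the circle of radius 3 and falls away rapidly on either side of it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"donut_check = DonutPDF()\n",
"\n",
"# evaluate the unnormalised density on the circle, inside it, and at the origin\n",
"for point in [np.array([3.0, 0.0]), np.array([0.0, 2.5]), np.array([0.0, 0.0])]:\n",
"    print(point, donut_check(point))"
]
},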
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can adjust the scale of the normal distribution that determines the jumps, but both small and large scales have their problems which the below plot shows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"target = DonutPDF()\n",
"samples05 = np.array(metropolis(target, lambda: np.array([3, 0]), NormalProposal(0.05)))\n",
"samples1 = np.array(metropolis(target, lambda: np.array([3, 0]), NormalProposal(1)))"
]
},
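{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before animating the chains, we can compute the overall acceptance rate of each run (an added summary; the animation below tracks the same quantity frame by frame). A repeated sample is counted as a rejection."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# fraction of iterations on which the proposal was accepted (repeats count as rejections)\n",
"accept05_total = 1 - (samples05[1:] == samples05[:-1]).all(axis=1).mean()\n",
"accept1_total = 1 - (samples1[1:] == samples1[:-1]).all(axis=1).mean()\n",
"print(f\"scale 0.05: {accept05_total:.1%}, scale 1.0: {accept1_total:.1%}\")"
]
},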
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"f, ax = plt.subplots(figsize=(16, 8), ncols=2)\n",
"ax[0].tick_params(axis=\"both\", which=\"major\", labelsize=14)\n",
"ax[1].tick_params(axis=\"both\", which=\"major\", labelsize=14)\n",
"\n",
"points = np.linspace(-4, 4, 301)\n",
"x = np.tile(points, len(points))\n",
"y = np.repeat(points, len(points))\n",
"\n",
"z = np.exp(np.apply_along_axis(target, 1, np.column_stack([x, y])))\n",
"\n",
"ax[0].contour(\n",
" points,\n",
" points,\n",
" z.reshape(len(points), len(points)),\n",
" cmap=\"plasma\",\n",
" alpha=0.3,\n",
" levels=5,\n",
")\n",
"\n",
"ax[1].contour(\n",
" points,\n",
" points,\n",
" z.reshape(len(points), len(points)),\n",
" cmap=\"plasma\",\n",
" alpha=0.3,\n",
" levels=5,\n",
")\n",
"\n",
"(traj05,) = ax[0].plot([], [])\n",
"(traj1,) = ax[1].plot([], [])\n",
"\n",
"title05 = ax[0].set_title(\"Acceptance rate: \", fontsize=18)\n",
"title1 = ax[1].set_title(\"Acceptance rate: \", fontsize=18)\n",
"\n",
"writer = FFMpegWriter(fps=25)\n",
"\n",
"with writer.saving(f, \"donut_mcmc.mp4\", 50):\n",
" # initial empty frames\n",
" for _ in range(25):\n",
" writer.grab_frame()\n",
"\n",
" for i in range(2, 2500, 2):\n",
" traj05.set_data(*(samples05[: i + 1].T))\n",
" traj1.set_data(*(samples1[: i + 1].T))\n",
"\n",
" accept05 = 1 - (samples05[1 : i + 1] == samples05[:i]).all(axis=1).mean()\n",
" accept1 = 1 - (samples1[1 : i + 1] == samples1[:i]).all(axis=1).mean()\n",
" title05.set_text(f\"Acceptance rate: {accept05 * 100:.1f}%\")\n",
" title1.set_text(f\"Acceptance rate: {accept1 * 100:.1f}%\")\n",
"\n",
" writer.grab_frame()\n",
"\n",
" # closing empty frames\n",
" for _ in range(25):\n",
" writer.grab_frame()\n",
"\n",
"\n",
"plt.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Video(\"donut_mcmc.mp4\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The donut is not pathological\n",
"\n",
"The donut example might seem contrived, but a thin shell where the samples are concentrated is actually somehow generic in high dimensions, as samples from a standard normal in increasing dimensions shows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def generate_sample(d, N=10_000):\n",
" return np.random.standard_normal(size=(N, d))\n",
"\n",
"\n",
"x1 = generate_sample(1)\n",
"x10 = generate_sample(10)\n",
"x100 = generate_sample(100)\n",
"\n",
"f, ax = plt.subplots(figsize=(22, 7), ncols=3)\n",
"\n",
"ax[0].tick_params(axis=\"both\", which=\"major\", labelsize=18)\n",
"ax[1].tick_params(axis=\"both\", which=\"major\", labelsize=18)\n",
"ax[2].tick_params(axis=\"both\", which=\"major\", labelsize=18)\n",
"\n",
"ax[0].hist(np.linalg.norm(x1, axis=1), density=True, bins=30)\n",
"ax[1].hist(np.linalg.norm(x10, axis=1), density=True, bins=30)\n",
"ax[2].hist(np.linalg.norm(x100, axis=1), density=True, bins=30)\n",
"\n",
"ax[0].set_xlim(left=0)\n",
"ax[1].set_xlim(left=0)\n",
"ax[2].set_xlim(left=0)\n",
"\n",
"ax[0].set_title(\"Norms of 1-D normal sample\", fontsize=22)\n",
"ax[1].set_title(\"Norms of 10-D normal sample\", fontsize=22)\n",
"ax[2].set_title(\"Norms of 100-D normal sample\", fontsize=22)\n",
"\n",
"f.tight_layout(pad=3)\n",
"\n",
"f.savefig(\"normal_sample_radius.png\")"
]
},
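{
"cell_type": "markdown",
"metadata": {},
"source": [
"The concentration seen above is easy to explain (a short added derivation): if $x \\sim \\mathcal{N}(0, I_d)$ then\n",
"\n",
"$$\n",
"\\|x\\|^2 = \\sum_{i=1}^d x_i^2 \\sim \\chi^2_d,\n",
"$$\n",
"\n",
"which has mean $d$ and standard deviation $\\sqrt{2d}$. So $\\|x\\|$ concentrates around $\\sqrt{d}$ with fluctuations that stay $O(1)$ as $d$ grows, and almost none of the probability mass sits near the mode at the origin."
]
},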
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The leapfrog integrator\n",
"\n",
"The leapfrog integrator is a symplectic integrator that doesn't accumulate errors in the same way that alternative algorithms do. We compare it to three alternatives with the simple Hamiltonian\n",
"\n",
"$$\n",
" H(q, p) = \\frac{q^2 + p^2}{2}\n",
"$$"
]
},
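{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder (added here for reference), for a Hamiltonian of the form $H(q, p) = U(q) + \\tfrac{1}{2}\\|p\\|^2$ and step size $\\varepsilon$, one leapfrog step is\n",
"\n",
"$$\n",
"p_{n+1/2} = p_n - \\tfrac{\\varepsilon}{2} \\nabla U(q_n), \\qquad\n",
"q_{n+1} = q_n + \\varepsilon \\, p_{n+1/2}, \\qquad\n",
"p_{n+1} = p_{n+1/2} - \\tfrac{\\varepsilon}{2} \\nabla U(q_{n+1}).\n",
"$$\n",
"\n",
"For the Hamiltonian above, $U(q) = q^2 / 2$ so $\\nabla U(q) = q$, which is exactly what the leapfrog branch of the next cell computes."
]
},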
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"step_size = 0.3\n",
"\n",
"# Euler's method\n",
"q, p = np.array([0, 1])\n",
"qs = [q]\n",
"ps = [p]\n",
"\n",
"for i in range(100):\n",
" q_new = q + step_size * p\n",
" p_new = p - step_size * q\n",
" qs.append(q_new)\n",
" ps.append(p_new)\n",
" q, p = q_new, p_new\n",
"\n",
"# modified Euler's method\n",
"q, p = np.array([0, 1])\n",
"qs_mod = [q]\n",
"ps_mod = [p]\n",
"\n",
"for i in range(100):\n",
" q_new = q + step_size * p\n",
" p_new = p - step_size * q_new\n",
" qs_mod.append(q_new)\n",
" ps_mod.append(p_new)\n",
" q, p = q_new, p_new\n",
"\n",
"# the leapfrog integrator\n",
"q, p = np.array([0, 1])\n",
"qs_lf = [q]\n",
"ps_lf = [p]\n",
"\n",
"for i in range(100):\n",
" p_new = p - 0.5 * step_size * q\n",
" q_new = q + step_size * p_new\n",
" p_new = p_new - 0.5 * step_size * q_new\n",
" qs_lf.append(q_new)\n",
" ps_lf.append(p_new)\n",
" q, p = q_new, p_new"
]
},
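{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the exact solution conserves $H$, we can quantify how each scheme drifts by evaluating $H = (q^2 + p^2) / 2$ at the final step (a quick added check; with $q(0) = 0$, $p(0) = 1$ the exact value is $0.5$ throughout)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# energy after 100 steps for each scheme; the exact trajectory has H = 0.5\n",
"for name, (q_vals, p_vals) in [\n",
"    (\"Euler\", (qs, ps)),\n",
"    (\"modified Euler\", (qs_mod, ps_mod)),\n",
"    (\"leapfrog\", (qs_lf, ps_lf)),\n",
"]:\n",
"    h_final = (q_vals[-1] ** 2 + p_vals[-1] ** 2) / 2\n",
"    print(f\"{name}: H = {h_final:.4f}\")"
]
},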
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell makes a simple animation comparing each approximation scheme with the analytic solution (a circle)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"f, ax = plt.subplots(ncols=3, figsize=(22, 7))\n",
"\n",
"t = np.linspace(0, 2 * np.pi, 1001)\n",
"x = np.cos(t)\n",
"y = np.sin(t)\n",
"\n",
"for a in ax:\n",
" a.plot(x, y, c=\"#333333\", alpha=0.3)\n",
"\n",
"\n",
"for a in ax:\n",
" a.set_xlim(-2, 2)\n",
" a.set_ylim(-2, 2)\n",
"\n",
"(euler,) = ax[0].plot([], [])\n",
"(mod_euler,) = ax[1].plot([], [])\n",
"(lf,) = ax[2].plot([], [])\n",
"\n",
"\n",
"ax[0].set_title(\"Euler's method\", fontsize=18)\n",
"ax[1].set_title(\"Modified Euler's method\", fontsize=18)\n",
"ax[2].set_title(\"Leapfrog Integrator\", fontsize=18)\n",
"\n",
"f.tight_layout(pad=3)\n",
"\n",
"writer = FFMpegWriter(fps=5)\n",
"\n",
"with writer.saving(f, \"integrator.mp4\", 50):\n",
" # initial empty frames\n",
" for _ in range(5):\n",
" writer.grab_frame()\n",
"\n",
" for i in range(1, 25):\n",
" euler.set_data(qs[:i], ps[:i])\n",
" mod_euler.set_data(qs_mod[:i], ps_mod[:i])\n",
" lf.set_data(qs_lf[:i], ps_lf[:i])\n",
"\n",
" writer.grab_frame()\n",
"\n",
"\n",
"plt.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Video(\"integrator.mp4\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hamiltonian Monte Carlo\n",
"\n",
"The algorithm is as follows:\n",
"\n",
"1. Choose a starting point $q^{(0)}$ in parameter space, and fix a step size $\\varepsilon$ and a path length $L$.\n",
"2. Given parameters $q^{(n)}$ we sample $p^{(n)} \\sim \\mathcal N(0, I)$. More generally we could sample from $\\mathcal{N}(0, M)$ where $M \\approx \\mathrm{cov}(q)$. We will however keep things simple and just use the identity matrix.\n",
"3. We set $(q_0, p_0) = (q^{(n)}, p^{(n)})$. Evolve $(q_0, p_0)$ for $L$ steps using the leapfrog integrator to obtain $(q_L, p_L)$, an approximation of $(q(\\varepsilon L), p(\\varepsilon L))$. \n",
"4. Sample $r \\sim \\mathrm{Unif}(0, 1)$ and let $q^{(n+1)} = q_L$ if $\\pi(q_L, p_L) / \\pi(q^{(n)}, p^{(n)}) < r$, or $q^{(n+1)} = q^{(n)}$ otherwise.\n",
"5. Repeat\n",
"\n",
"First the leapfrog integrator."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def leapfrog(q0, p0, target, L, step_size):\n",
" q = q0.copy()\n",
" p = p0.copy()\n",
"\n",
" for i in range(L):\n",
" p += target.grad_log_density(q) * step_size / 2\n",
" q += p * step_size\n",
" p += target.grad_log_density(q) * step_size / 2\n",
"\n",
" return q, p"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a basic implementation of Hamiltonian Monte Carlo"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def hmc(target, initial, iterations=10_000, L=50, step_size=0.1):\n",
" samples = [initial()]\n",
"\n",
" for i in range(iterations):\n",
" q0 = samples[-1]\n",
" p0 = np.random.standard_normal(size=q0.size)\n",
"\n",
" q_star, p_star = leapfrog(q0, p0, target, L, step_size)\n",
"\n",
" h0 = -target.log_density(q0) + (p0 * p0).sum() / 2\n",
" h = -target.log_density(q_star) + (p_star * p_star).sum() / 2\n",
" log_accept_ratio = h0 - h\n",
"\n",
" if np.random.random() < np.exp(log_accept_ratio):\n",
" samples.append(q_star)\n",
" else:\n",
" samples.append(q0)\n",
"\n",
" return samples"
]
},
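{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on the acceptance step (added for clarity): the joint density is $\\pi(q, p) \\propto e^{-H(q, p)}$ with $H(q, p) = -\\log \\pi(q) + \\tfrac{1}{2}\\|p\\|^2$, so the ratio in step 4 of the algorithm is\n",
"\n",
"$$\n",
"\\frac{\\pi(q_L, p_L)}{\\pi(q^{(n)}, p^{(n)})} = e^{H(q^{(n)}, p^{(n)}) - H(q_L, p_L)},\n",
"$$\n",
"\n",
"which is exactly `np.exp(log_accept_ratio)` with `log_accept_ratio = h0 - h` as computed above."
]
},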
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We make a slight modification to the donut target density as we require the log density and the gradients of the log density rather than the density itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class DonutPDF:\n",
" def __init__(self, radius=3, sigma2=0.05):\n",
" self.radius = radius\n",
" self.sigma2 = sigma2\n",
"\n",
" def log_density(self, x):\n",
" r = np.linalg.norm(x)\n",
" return -((r - self.radius) ** 2) / self.sigma2\n",
"\n",
" def grad_log_density(self, x):\n",
" r = np.linalg.norm(x)\n",
" if r == 0:\n",
" return np.zeros_like(x)\n",
" return 2 * x * (self.radius / r - 1) / self.sigma2"
]
},
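{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's easy to get gradients wrong, so here is a quick finite-difference check of `grad_log_density` against `log_density` (an added sanity check, not part of the original post)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# compare the analytic gradient with central finite differences at a test point\n",
"donut_test = DonutPDF()\n",
"x_test = np.array([1.0, 2.0])\n",
"eps = 1e-6\n",
"\n",
"fd_grad = np.array(\n",
"    [\n",
"        (donut_test.log_density(x_test + eps * e) - donut_test.log_density(x_test - eps * e))\n",
"        / (2 * eps)\n",
"        for e in np.eye(2)\n",
"    ]\n",
")\n",
"\n",
"print(\"finite difference:\", fd_grad)\n",
"print(\"analytic:         \", donut_test.grad_log_density(x_test))"
]
},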
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Draw samples with HMC"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hmc_samples = hmc(DonutPDF(), lambda: np.array([3, 0.0]));"
]
},
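{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can check the overall HMC acceptance rate in the same way as for random walk Metropolis, counting repeated samples as rejections (an added check)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hmc_array = np.array(hmc_samples)\n",
"hmc_accept = 1 - (hmc_array[1:] == hmc_array[:-1]).all(axis=1).mean()\n",
"print(f\"HMC acceptance rate: {hmc_accept:.1%}\")"
]
},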
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make an animation of what's going on here, we need to make a modified version of the leapfrog integrator that also saves the trajectory that lead to each proposal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def leapfrog_traj(q0, p0, target, L, step_size):\n",
" q = q0.copy()\n",
" p = p0.copy()\n",
" traj = [q.copy()]\n",
"\n",
" for i in range(L):\n",
" p += target.grad_log_density(q) * step_size / 2\n",
" q += p * step_size\n",
" p += target.grad_log_density(q) * step_size / 2\n",
" traj.append(q.copy())\n",
"\n",
" return q, p, traj"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Manually run HMC while making an animation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"f, ax = plt.subplots(figsize=(7, 7))\n",
"\n",
"points = np.linspace(-4, 4, 101)\n",
"x = np.tile(points, len(points))\n",
"y = np.repeat(points, len(points))\n",
"\n",
"donut = DonutPDF()\n",
"z = np.exp(np.apply_along_axis(donut.log_density, 1, np.column_stack([x, y])))\n",
"\n",
"ax.contour(\n",
" points,\n",
" points,\n",
" z.reshape(len(points), len(points)),\n",
" cmap=\"plasma\",\n",
" alpha=0.5,\n",
")\n",
"\n",
"(samps,) = ax.plot([], [], \"o\", alpha=0.7)\n",
"(traj,) = ax.plot([], [], \"k\")\n",
"\n",
"writer = FFMpegWriter(fps=15)\n",
"\n",
"with writer.saving(f, \"donut_hmc.mp4\", 50):\n",
" # initial empty frame\n",
" writer.grab_frame()\n",
"\n",
" samples = [np.array([3.0, 0.0])]\n",
" samps.set_data(*(np.array(samples).T))\n",
"\n",
" writer.grab_frame()\n",
"\n",
" for i in range(100):\n",
" q0 = samples[-1]\n",
" p0 = np.random.normal(scale=2.0, size=q0.size)\n",
"\n",
" q_star, p_star, trajectory = leapfrog_traj(q0, p0, donut, 20, 0.1)\n",
"\n",
" h0 = -donut.log_density(q0) + (p0 * p0).sum() / 2\n",
" h = -donut.log_density(q_star) + (p_star * p_star).sum() / 2\n",
" log_accept_ratio = h0 - h\n",
"\n",
" # animate the trajectory\n",
" for j in range(len(trajectory)):\n",
" traj.set_data(*np.array(trajectory[: j + 1]).T)\n",
" writer.grab_frame()\n",
"\n",
" # flash green or red to signal acceptance / rejection\n",
" if np.random.random() < np.exp(log_accept_ratio):\n",
" samples.append(q_star)\n",
" traj.set_color(\"tab:green\")\n",
" else:\n",
" samples.append(q0)\n",
" traj.set_color(\"tab:red\")\n",
"\n",
" for _ in range(2):\n",
" writer.grab_frame()\n",
"\n",
" traj.set_color(\"black\")\n",
" traj.set_data([], [])\n",
" samps.set_data(*(np.array(samples).T))\n",
" writer.grab_frame()\n",
"\n",
"plt.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Video(\"donut_hmc.mp4\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparing algorithms\n",
"\n",
"To finish, let's compare HMC to the two variants of MCMC. We use the samples from earlier in the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"f, ax = plt.subplots(ncols=3, figsize=(18, 7))\n",
"\n",
"hmc_samples = np.array(hmc_samples)\n",
"\n",
"points = np.linspace(-4, 4, 101)\n",
"x = np.tile(points, len(points))\n",
"y = np.repeat(points, len(points))\n",
"\n",
"donut = DonutPDF()\n",
"z = np.exp(np.apply_along_axis(donut.log_density, 1, np.column_stack([x, y])))\n",
"\n",
"for a in ax:\n",
" a.contour(\n",
" points,\n",
" points,\n",
" z.reshape(len(points), len(points)),\n",
" cmap=\"plasma\",\n",
" alpha=0.1,\n",
" )\n",
"\n",
"(mcmc05_scat,) = ax[0].plot([], [], \".\", alpha=0.2, ms=10)\n",
"(mcmc1_scat,) = ax[1].plot([], [], \".\", alpha=0.2, ms=10)\n",
"(hmc_scat,) = ax[2].plot([], [], \".\", alpha=0.2, ms=10)\n",
"\n",
"title05 = ax[0].set_title(\n",
" \"RW-Metropolis - scale = 0.05\\n# samples: 0\\nAccept rate: N/A\", fontsize=18\n",
")\n",
"title1 = ax[1].set_title(\n",
" \"RW-Metropolis - scale = 1.0\\n# samples: 0\\nAccept rate: N/A\", fontsize=18\n",
")\n",
"title_hmc = ax[2].set_title(\"HMC\\n# samples: 0\\nAccept rate: N/A\", fontsize=18)\n",
"\n",
"for a in ax:\n",
" a.set_xlim(-4, 4)\n",
" a.set_ylim(-4, 4)\n",
"\n",
"f.tight_layout(pad=4)\n",
"\n",
"writer = FFMpegWriter(fps=25)\n",
"\n",
"with writer.saving(f, \"donut_compare.mp4\", 50):\n",
" # initial empty frame\n",
" writer.grab_frame()\n",
"\n",
" for i in range(1, 1000):\n",
" mcmc05_scat.set_data(*samples05[: i + 1].T)\n",
" mcmc1_scat.set_data(*samples1[: i + 1].T)\n",
" hmc_scat.set_data(*hmc_samples[: i + 1].T)\n",
"\n",
" accept05 = 1 - (samples05[1 : i + 1] == samples05[:i]).all(axis=1).mean()\n",
" accept1 = 1 - (samples1[1 : i + 1] == samples1[:i]).all(axis=1).mean()\n",
" accept_hmc = 1 - (hmc_samples[1 : i + 1] == hmc_samples[:i]).all(axis=1).mean()\n",
"\n",
" title05.set_text(\n",
" f\"RW-Metropolis - scale = 0.05\\n# samples: {i+1}\\nAccept rate: {accept05 * 100:.2f}%\"\n",
" )\n",
" title1.set_text(\n",
" f\"RW-Metropolis - scale = 1.0\\n# samples: {i+1}\\nAccept rate: {accept1 * 100:.2f}%\"\n",
" )\n",
" title_hmc.set_text(\n",
" f\"HMC\\n# samples: {i+1}\\nAccept rate: {accept_hmc * 100:.2f}%\"\n",
" )\n",
"\n",
" writer.grab_frame()\n",
"\n",
"plt.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Video(\"donut_compare.mp4\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}