Skip to content

Instantly share code, notes, and snippets.

@tcbegley
Created January 8, 2021 20:53
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tcbegley/e7b1f22a5d3fffed35d3f1dbffc8daf3 to your computer and use it in GitHub Desktop.
Save tcbegley/e7b1f22a5d3fffed35d3f1dbffc8daf3 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Metropolis algorithm\n",
"\n",
"This notebook reproduces plots and code from my [blog post](https://tcbegley.com/blog/mcmc-part-1) on Markov Chain Monte Carlo and the Metropolis algorithm. If you're interested in the theory behind the code below, check out the post.\n",
"\n",
"To make the animations, you will need to install `ffmpeg`. See [here](https://ffmpeg.org/download.html) for installation options, or use your favourite package manager.\n",
"\n",
"You will also need `matplotlib` and `numpy` installed. You can run the cell below to install if you don't already have them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"!{sys.executable} -m pip install -U matplotlib numpy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from matplotlib.animation import FFMpegWriter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the basic Metropolis algorithm, it's very simple. Accumulating samples by appending to a list is sub-optimal, but my goal here is clarity / simplicity rather than performance.\n",
"\n",
"- `target` is the target probability density, it should accept a sample as its only argument and return the probability density of that sample.\n",
"- `initial` is a function that returns the starting point in parameter space. It is a callable which takes no arguments.\n",
"- `proposal` is the proposal distribution, it takes the current sample as its argument and returns a proposal for the next sample."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def metropolis(target, initial, proposal, iterations=100_000):\n",
" samples = [initial()]\n",
" \n",
" for _ in range(iterations):\n",
" current = samples[-1]\n",
" proposed = proposal(current)\n",
" if np.random.random() < target(proposed) / target(current):\n",
" samples.append(proposed)\n",
" else:\n",
" samples.append(current)\n",
" \n",
" return samples "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A standard, simple choice of proposal is a normally distributed jump from the current location. This gives rise to Random Walk Metropolis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class NormalProposal:\n",
" def __init__(self, scale):\n",
" self.scale = scale\n",
" \n",
" def __call__(self, sample):\n",
" jump = np.random.normal(\n",
" scale=self.scale, size=sample.shape\n",
" )\n",
" return sample + jump"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll start by sampling from a multi-variate normal distribution. We can implement the target distribution as a callable object by writing a custom class and implementing the `__call__` magic method. In this case `__call__` returns an unnormalised form of the multivariate normal density."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class MultivariateNormalPDF:\n",
" def __init__(self, mean, variance):\n",
" self.mean = mean\n",
" self.variance = variance\n",
" self.inv_variance = np.linalg.inv(self.variance)\n",
" \n",
" def __call__(self, sample):\n",
" return np.exp(\n",
" -0.5\n",
" * (sample - self.mean).T\n",
" @ self.inv_variance\n",
" @ (sample - self.mean)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll run 4 chains starting from different locations. I'm deliberately initialising each chain quite far out of distribution so that we can see the bias in early samples and so that we can watch the chains converge and mix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"N = 50_000\n",
"\n",
"target = MultivariateNormalPDF(\n",
" np.array([0, 0]), np.array([[1, 0.8], [0.8, 1]])\n",
")\n",
"\n",
"samples = []\n",
"\n",
"for x, y in zip([5, 5, -5, -5], [5, -5, 5, -5]):\n",
" samples.append(\n",
" np.array(\n",
" metropolis(\n",
" target,\n",
" lambda: np.array([x, y]),\n",
" NormalProposal(0.1),\n",
" iterations=N,\n",
" )\n",
" )\n",
" )\n",
"\n",
"f, ax = plt.subplots()\n",
"\n",
"for s in samples:\n",
" plt.plot(s[:, 0], s[:, 1], alpha=0.8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below remakes this plot as an animation, with a contour plot of the target underneath to show what we are sampling from."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"f, ax = plt.subplots(figsize=(6, 6))\n",
"ax.tick_params(axis='both', which='major', labelsize=14)\n",
"\n",
"points = np.linspace(-6, 6, 101)\n",
"x = np.tile(points, len(points))\n",
"y = np.repeat(points, len(points))\n",
"\n",
"z = np.exp(np.apply_along_axis(target, 1, np.column_stack([x, y])))\n",
"\n",
"ax.contour(\n",
" points,\n",
" points,\n",
" z.reshape(len(points), len(points)),\n",
" cmap=\"plasma\",\n",
" alpha=0.5,\n",
" levels=5,\n",
")\n",
"\n",
"traj1, = ax.plot([], [], alpha=0.7)\n",
"traj2, = ax.plot([], [], alpha=0.7)\n",
"traj3, = ax.plot([], [], alpha=0.7)\n",
"traj4, = ax.plot([], [], alpha=0.7)\n",
"\n",
"writer = FFMpegWriter(fps=25)\n",
"\n",
"with writer.saving(f, \"normal_mcmc.mp4\", 200):\n",
" # initial empty frames\n",
" for _ in range(50):\n",
" writer.grab_frame()\n",
" \n",
" for i in range(0, 2500, 2):\n",
" traj1.set_data(*(samples[0][:i+1].T))\n",
" traj2.set_data(*(samples[1][:i+1].T))\n",
" traj3.set_data(*(samples[2][:i+1].T))\n",
" traj4.set_data(*(samples[3][:i+1].T))\n",
" \n",
" writer.grab_frame()\n",
" \n",
" # closing empty frames\n",
" for _ in range(50):\n",
" writer.grab_frame()\n",
"\n",
" \n",
"plt.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import Video\n",
"\n",
"Video(\"normal_mcmc.mp4\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Solving Logistic regression with the Metropolis algorithm\n",
"\n",
"Next let's try solving a more interesting problem with the Metropolis algorithm. We'll create a simple dataset and then try to fit a Bayesian logistic regression model to the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(42)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def sigmoid(arr):\n",
" return 1 / (1 + np.exp(-arr))\n",
"\n",
"def logit(arr):\n",
" return np.log(arr) - np.log(1 - arr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We generate data according to the model\n",
"\n",
"$$\n",
" y \\sim \\mathrm{Bernoulli}\\left(\\frac{1}{1 + \\exp(-(\\alpha + \\beta^T x))}\\right)\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"alpha = -1.5\n",
"beta = np.array([1.5, -1.6])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = np.random.multivariate_normal(np.zeros(2), np.array([[4, -2], [-2, 4]]), size=10)\n",
"labels = np.random.binomial(1, p=sigmoid(data @ beta + alpha))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll make a grid for plotting heatmaps of predictions in a minute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"N = 50\n",
"points = np.linspace(-7, 7, N)\n",
"grid = np.column_stack([np.tile(points, N), np.repeat(points, N)])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def heatmap(probs, display_heatmap=True):\n",
" f, ax = plt.subplots(figsize=(11, 10))\n",
" ax.tick_params(axis='both', which='major', labelsize=16)\n",
"\n",
" if display_heatmap:\n",
" mesh = ax.pcolormesh(\n",
" points,\n",
" points,\n",
" probs,\n",
" cmap=\"RdBu\",\n",
" )\n",
" cbar = ax.figure.colorbar(mesh)\n",
" cbar.ax.set_ylabel('Predicted probability', rotation=90, fontsize=16)\n",
" cbar.ax.tick_params(labelsize=12)\n",
" else:\n",
" ax.set_xlim(points.min(), points.max())\n",
" ax.set_ylim(points.min(), points.max())\n",
"\n",
" mask = labels == 1\n",
" ax.scatter(\n",
" data[mask, 0], \n",
" data[mask, 1], \n",
" c=\"#0055ff\",\n",
" edgecolors=\"white\", \n",
" s=200, \n",
" lw=2,\n",
" )\n",
" ax.scatter(\n",
" data[~mask, 0],\n",
" data[~mask, 1],\n",
" c=\"#ff5500\", \n",
" edgecolors=\"white\",\n",
" s=200,\n",
" lw=2,\n",
" )\n",
" return ax"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First let's make a simple visualisation of the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ax = heatmap(sigmoid(grid @ beta + alpha).reshape(N, N), False)\n",
"ax.figure.savefig(\"data.png\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Bayesian logistic regression model is\n",
"\n",
"$$\n",
" \\alpha \\sim \\mathcal{N}(0, 5^2) \\\\\n",
" \\beta_i \\sim \\mathcal{N}(0, 5^2) \\\\\n",
" y \\sim \\mathrm{Bernoulli}\\left(\\frac{1}{1 + \\exp(-(\\alpha + \\beta^T x))}\\right)\n",
"$$\n",
"\n",
"We want to sample from $p(\\alpha, \\beta | \\mathbf y, \\mathbf x)$. Let \n",
"\n",
"$$\n",
" \\eta_i = \\frac{1}{1 + \\exp(-(\\alpha + \\beta^T x_i))}\n",
"$$\n",
"\n",
"By Bayes rule we have\n",
"\n",
"$$\n",
" p(\\alpha, \\beta | \\mathbf y, \\mathbf x) \\propto \\exp\\left( \\frac{-\\alpha^2 - \\|\\beta\\|^2}{50} \\right) \\prod_i \\eta_i^{y_i} (1 - \\eta_i)^{1 - y_i}\n",
"$$\n",
"\n",
"So we just need to implement the right hand side as the target density (note it's unnormalised, but that's ok for Metropolis)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class LogisticPDF:\n",
" def __init__(self, x, y, prior_scale=5):\n",
" self.x = x\n",
" self.y = y\n",
" self.prior_scale = prior_scale\n",
" \n",
" @staticmethod\n",
" def _likelihood(x, y, alpha, beta):\n",
" eta = 1 / (1 + np.exp(-(alpha + np.dot(beta, x))))\n",
" if y == 1:\n",
" return eta\n",
" return (1 - eta)\n",
" \n",
" \n",
" def __call__(self, sample):\n",
" alpha, beta = sample\n",
" prior = np.exp(\n",
" -(alpha ** 2 + np.linalg.norm(beta) ** 2) / (2 * self.prior_scale ** 2)\n",
" )\n",
" likelihood = 1\n",
" for x, y in zip(self.x, self.y):\n",
" likelihood *= self._likelihood(x, y, alpha, beta)\n",
" \n",
" return prior * likelihood"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since each sample is now a pair `(alpha, beta)` we need to slightly rewrite the proposal distribution."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class NormalProposalLogistic:\n",
" def __init__(self, scale):\n",
" self.scale = scale\n",
" \n",
" def __call__(self, sample):\n",
" alpha, beta = sample\n",
" alpha_jump = np.random.normal(\n",
" scale=self.scale, size=alpha.shape\n",
" )\n",
" beta_jump = np.random.normal(\n",
" scale=self.scale, size=beta.shape\n",
" )\n",
" return alpha + alpha_jump, beta + beta_jump"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now sample as before"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"target = LogisticPDF(data, labels)\n",
"proposal = NormalProposalLogistic(0.1)\n",
"samples = metropolis(target, lambda: (np.array(0), np.array([0, 0])), proposal)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We extrack the parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"alpha_samp = np.array([a for a, _ in samples])\n",
"beta_samp = np.array([b for _, b in samples])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Average predictions made with each set of parameters (discarding the first 10,000 biased samples)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred = sigmoid(beta_samp[10_000:] @ grid.T + alpha_samp[10_000:, None])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And finally plot the heatmap of predicted probabilities with the data overlaid."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ax = heatmap(pred.mean(axis=0).reshape(N, N))\n",
"ax.figure.savefig(\"logistic-heatmap.png\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Not too bad!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment