Skip to content

Instantly share code, notes, and snippets.

@AustinRochford
Last active September 25, 2022 19:11
Show Gist options
  • Save AustinRochford/3e6fde7eaf9d5b93d27ba898bdac7f0f to your computer and use it in GitHub Desktop.
Save AustinRochford/3e6fde7eaf9d5b93d27ba898bdac7f0f to your computer and use it in GitHub Desktop.
A Closed-form Solution for the Cholesky Decomposition of the Covariance Matrix of Exchangeable Normal Variables
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "c3dd6161-4d55-4760-b316-ba9509c31d7c",
"metadata": {},
"source": [
"In a [previous post](https://austinrochford.com/posts/exch-param-pymc.html) I compared several parameterizations for estimating the covariance parameter of latent [exchangeable normal random variables](https://en.wikipedia.org/wiki/Exchangeable_random_variables) using [PyMC](http://pymc.io/). These parameterizations were necesssary because for quite some time before the [official release of PyMC version 4.0](https://www.pymc.io/blog/v4_announcement.html), the beta lacked support for multivariate random variables. Support for these distributions has since [been added](https://www.pymc.io/projects/docs/en/stable/api/distributions/multivariate.html), rendering the workaround parameterizations in that post superfluous. Even so, after writing that post I remained curious about the closed-form solution for the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of the covariance matrix for exchangeable normal random variables, and eventually was able to derive it. This post presents that closed-form solution along with a numerical verification of its correctness.\n",
"\n",
"Recall that a vector of random variables is exchangeable if every permutation of its entries has the same joint distribution. A set of exchangeable normal random variables, $X_1, \\ldots, X_T \\sim N(\\mu, \\sigma^2)$ is parameterized by their marginal mean, $\\mu$, marginal scale, $\\sigma$, and the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)\n",
"$$\\rho = \\frac{\\mathbb{Cov}(X_i, X_j)}{\\sigma^2}.$$\n",
"\n",
"In this post we will assume $\\mu = 0$ and $\\sigma = 1$ for simplicity; the results are straightforward to generalize. With this assumption, the covariance matrix is\n",
"\n",
"$$\n",
"\\Sigma_{\\rho} = \\begin{pmatrix}\n",
" 1 & \\rho & \\cdots & \\rho & \\rho \\\\\n",
" \\rho & 1 & \\cdots & \\rho & \\rho \\\\\n",
" \\rho & \\rho & \\cdots & 1 & \\rho \\\\\n",
" \\rho & \\rho & \\cdots & \\rho & 1\n",
"\\end{pmatrix} \\in \\mathbb{R}^{T \\times T}.\n",
"$$\n",
"\n",
"The previous post used [SymPy](https://www.sympy.org/en/index.html) to calculate the Cholesky decomposition of this matrix for small values of $T$. It was obvious that there was a pattern, but the we did not pursue it at the time, as the SymPy factorization could be [lambdified](https://docs.sympy.org/latest/modules/utilities/lambdify.html) and the result applied to [Aesara](https://aesara.readthedocs.io/en/latest/) tensors to facilitate inference in PyMC. This post will prove and numerically verify the following closed-form solution for the Cholesky decomposition of $\\Sigma_{\\rho}.$\n",
"\n",
"Let $-\\frac{1}{T - 1} \\leq \\rho < 1$. Define $d_1 = 1$ and $\\ell_1 = \\rho$. For $1 < j \\leq T$, let\n",
"$$d_j = \\sqrt{d_{j - 1}^2 - \\ell_{j - 1}^2}$$\n",
"and\n",
"$$\\ell_j = \\frac{\\rho - 1}{d_j} + d_j.$$\n",
"\n",
"Finally, let\n",
"$$\n",
"L_T = \\begin{pmatrix}\n",
"d_1 & 0 & 0 & \\cdots & 0 \\\\\n",
"\\ell_1 & d_2 & 0 & \\cdots & 0 \\\\\n",
"\\ell_1 & \\ell_2 & d_3 & \\cdots & 0 \\\\\n",
" & & & \\ddots & \\\\\n",
"\\ell_1 & \\ell_2 & \\ell_3 & \\cdots & d_T\n",
"\\end{pmatrix}\n",
"\\in \\mathbb{R}^{T \\times T}.\n",
"$$\n",
"\n",
"**Claim** $\\Sigma_T = L_T L_T^{\\top}$.\n",
"\n",
"**Proof (by induction):**\n",
"\n",
"When $T = 1$ or $T = 2$, the claim is easily verified manually.\n",
"\n",
"Assume, for all $1 \\leq n \\leq T$, that $\\Sigma_n = L_n L_n^{\\top}$.\n",
"\n",
"It will be useful to block partition $L_n$ as\n",
"$$L_n = \\begin{pmatrix}\n",
"L_{n - 1} & 0 \\\\\n",
"v_n & d_n\n",
"\\end{pmatrix},$$\n",
"where $v_n = \\begin{pmatrix}\\ell_1 & \\ell_2 & \\cdots & \\ell_{n - 1}\\end{pmatrix}$ are the vector of lower triangular elements of $L_n$. The inductive hypothesis then becomes\n",
"$$\\begin{align*}\n",
"\\Sigma_n\n",
" & = L_n L_n^{\\top} \\\\\n",
" & = \\begin{pmatrix}\n",
" L_{n - 1} & 0 \\\\\n",
" v_n & d_n\n",
" \\end{pmatrix} \\begin{pmatrix}\n",
" L_{n - 1}^{\\top} & v_n^{\\top} \\\\\n",
" 0 & d_n\n",
" \\end{pmatrix} \\\\\n",
" & = \\begin{pmatrix}\n",
" \\Sigma_{n - 1} & L_{n - 1} v_n^{\\top} \\\\\n",
" v_n L_{n - 1}^{\\top} & v_n v_n^{\\top} + d_n^2\n",
" \\end{pmatrix}.\n",
"\\end{align*}$$\n",
"Equating the bottom right element of these two matrices gives\n",
"$$1 = (\\Sigma_n)_{n, n} = v_n v_n^{\\top} + d_n^2.$$\n",
"Restated, we have that $v_n v_n^{\\top} = 1 - d_n^2$.\n",
"Equating the off-diagonal elements in the final column gives, for $1 \\leq i < n$,\n",
"$$\\rho = (\\Sigma_n)_{i, n} = (L_{n - 1} v_n^{\\top})_i,$$\n",
"so $L_{n - 1} v_n^{\\top} = \\rho \\vec{1}$.\n",
"\n",
"Now,\n",
"$$L_{T + 1} L_{T + 1}^{\\top} = \\begin{pmatrix}\n",
" \\Sigma_T & L_T v_{T + 1}^{\\top} \\\\\n",
" v_{T + 1} L_T^{\\top} & v_{T + 1} v_{T + 1}^{\\top} + d_{T + 1}^2\n",
"\\end{pmatrix},\n",
"$$\n",
"so it suffices to show that\n",
"$$L_T v_{T + 1}^{\\top} = \\rho \\vec{1}$$\n",
"and\n",
"$$v_{T + 1} v_{T + 1}^{\\top} + d_{T + 1}^2 = 1.$$\n",
"\n",
"First, we have that\n",
"$$\\begin{align*}\n",
"L_T v_{T + 1}^{\\top}\n",
" & = \\begin{pmatrix}\n",
" L_{T - 1} & 0\\\\\n",
" v_T & d_T\n",
" \\end{pmatrix} \\begin{pmatrix}\n",
" v_T^{\\top} \\\\\n",
" \\ell_T\n",
" \\end{pmatrix} \\\\\n",
" & = \\begin{pmatrix}\n",
" L_{T - 1} v_T^{\\top} \\\\\n",
" v_T v_T^{\\top} + d_T \\cdot \\ell_T\n",
" \\end{pmatrix}.\n",
"\\end{align*}$$\n",
"From the inductive hypothesis, $L_{T - 1} v_T^{\\top} = \\rho \\vec{1}$. For the final entry,\n",
"$$\\begin{align*}\n",
"v_T v_T^{\\top} + d_T \\cdot \\ell_T\n",
" & = 1 - d_T^2 + d_T \\left(\\frac{\\rho - 1}{d_T} + d_T\\right) \\\\\n",
" & = \\rho.\n",
"\\end{align*}$$\n",
"\n",
"Second, we have that\n",
"$$\\begin{align*}\n",
"v_{T + 1} v_{T + 1}^{\\top} + d_{T + 1}^2\n",
" & = v_T v_T^{\\top} + \\ell_T^2 + d_{T + 1}^2 \\\\\n",
" & = 1 - d_T^2 + \\ell_T^2 + \\left(\\sqrt{d_T^2 - \\ell_T^2}\\right)^2 \\\\\n",
" & = 1.\n",
"\\end{align*}$$\n",
"\n",
"**QED**"
]
},
{
"cell_type": "markdown",
"id": "e451e475-a037-41ff-ac8a-f029bb1354c2",
"metadata": {},
"source": [
"To validate these calculations numerically we will use [Hypothesis](https://hypothesis.readthedocs.io/en/latest/), a Python library for generative testing. First we make the necessary Python imports."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "276824ae-cea5-4b80-b6b0-e764215cf068",
"metadata": {},
"outputs": [],
"source": [
"from hypothesis import assume, given\n",
"from hypothesis.strategies import integers, floats"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6aec495e-0b02-4960-b54a-78386428a3f4",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "84d90ce3-f704-4b90-951e-1a684046b2f4",
"metadata": {},
"source": [
"The following function generates the covariance matrix, $\\Sigma_{\\rho}$, for given values of $\\rho$ and $T$."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a9b02f66-2f38-4d75-881b-994923576f17",
"metadata": {},
"outputs": [],
"source": [
"def get_cov_mat(ρ, T):\n",
" return np.eye(T) + ρ * (1 - np.eye(T))"
]
},
{
"cell_type": "markdown",
"id": "9fd548f0-28be-4509-bff6-6eba27f919c4",
"metadata": {},
"source": [
"The next function implements the closed-form solution for the Cholesky decomposition of $\\Sigma_{\\rho}$ presented above."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0be83b6f-7f39-4315-8087-efef83436c40",
"metadata": {},
"outputs": [],
"source": [
"def get_cov_mat_chol(ρ, T):\n",
" diag = np.empty(T)\n",
" tril = np.empty(T)\n",
" \n",
" diag[0] = 1\n",
" tril[0] = ρ\n",
"\n",
" for j in range(1, T):\n",
" diag[j] = np.sqrt(diag[j - 1]**2 - tril[j - 1]**2)\n",
" tril[j] = (ρ - 1) / diag[j] + diag[j] if j < T - 1 else 0\n",
" \n",
" return np.diag(diag) + np.tril(tril, k=-1)"
]
},
{
"cell_type": "markdown",
"id": "2bd57e26-61fc-49b4-b1dd-8524ecdf3e27",
"metadata": {},
"source": [
"This final function is a Hypothesis test that verifies that we have, in fact, computed the Cholesky decomposition of $\\Sigma_{\\rho}$."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "5c884cca-5097-4369-bb3d-12025aec35cc",
"metadata": {
"jupyter": {
"source_hidden": true
},
"tags": []
},
"outputs": [],
"source": [
"@given(floats(-0.5, 1), integers(2, 1000))\n",
"def test_cov_mat_chol(ρ, T):\n",
" assume(-1 / (T - 1) <= ρ < 1)\n",
" \n",
" Σ = get_cov_mat(ρ, T)\n",
" chol = get_cov_mat_chol(ρ, T)\n",
"\n",
" # check that chol is lower diagonal\n",
" assert (chol == np.tril(chol)).all()\n",
" \n",
" # check that \"chol is a factorization of Σ\n",
" assert np.allclose(Σ, chol @ chol.T)"
]
},
{
"cell_type": "markdown",
"id": "ff81a694-7764-4031-8187-7596ba366226",
"metadata": {},
"source": [
"The [`given`](https://hypothesis.readthedocs.io/en/latest/details.html#hypothesis.given) decorator tells Hypothesis to generate floating point values in the interval $\\left[-\\frac{1}{2}, 1\\right]$ for $\\rho$ and integers between 2 and 1,000 for $T$. The upper bound on values generated for $T$ is not necessary but ensures the test runs in a reasonable amount of time. The [`assume`](https://hypothesis.readthedocs.io/en/latest/details.html#hypothesis.assume) call ensures that $\\rho$ is in the appropriate interval based on the size of the random vector. Given this guidance, Hypothesis will generate random values of $\\rho$ and $T$ to run the test with."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b2b97060-f7ce-405a-9c7e-319ec1b61295",
"metadata": {},
"outputs": [],
"source": [
"test_cov_mat_chol()"
]
},
{
"cell_type": "markdown",
"id": "c9cf9cc2-aba7-4701-b128-45db1948d95e",
"metadata": {},
"source": [
"The test passes, numerically validating our closed-form solution for the Cholesky decomposition.\n",
"\n",
"This post is available as a Jupyter notebook [here](https://nbviewer.org/gist/AustinRochford/3e6fde7eaf9d5b93d27ba898bdac7f0f)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e0df5c57-9e4b-4017-88fd-9b269685693c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Last updated: Sun Sep 25 2022\n",
"\n",
"Python implementation: CPython\n",
"Python version : 3.10.6\n",
"IPython version : 8.4.0\n",
"\n",
"hypothesis: 6.54.6\n",
"\n",
"numpy: 1.23.2\n",
"\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -n -u -v -iv -p hypothesis"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment