@baogorek
Last active February 3, 2023 04:29
Gist version of Miscellaneous/methods/kalman-filter/kalman-derivations.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The (Simplified) Kalman Filter Model\n",
"The following is a simplified representation of the state space and measurement models underlying the Kalman Filter:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"x_t &= Ax_{t - 1} + q_t,\\\\\n",
"y_t &= H x_t + r_t,\n",
"\\end{align*}\n",
"$$\n",
"\n",
"where $q_t \\sim N(0, Q)$ and $r_t \\sim N(0, R)$. It is simplified in that it only uses time subscripting where absolutely neccessary and does not feature a control input, two things that do not change the argument to come."
]
},
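{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration (an addition here, not part of the original derivation), the model above can be simulated in a few lines of numpy. The matrices `A`, `H`, `Q`, and `R` below are made-up values for a two-dimensional state and a one-dimensional measurement."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"\n",
"# Illustrative (made-up) system matrices\n",
"A = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition\n",
"H = np.array([[1.0, 0.0]])              # measurement matrix\n",
"Q = 0.01 * np.eye(2)                    # state noise covariance, q_t ~ N(0, Q)\n",
"R = np.array([[0.5]])                   # measurement noise covariance, r_t ~ N(0, R)\n",
"\n",
"T = 50\n",
"x = np.zeros(2)\n",
"xs, ys = [], []\n",
"for t in range(T):\n",
"    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)  # x_t = A x_{t-1} + q_t\n",
"    y = H @ x + rng.multivariate_normal(np.zeros(1), R)  # y_t = H x_t + r_t\n",
"    xs.append(x)\n",
"    ys.append(y)\n",
"xs, ys = np.array(xs), np.array(ys)"
]
},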
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bayesian Setup for Filtering\n",
"The goal of \"filtering\" in\n",
"[Recursive Bayesian Estimation](https://en.wikipedia.org/wiki/Recursive_Bayesian_estimation#Sequential_Bayesian_filtering)\n",
"is \"estimating the current value given past and current observations,\" and thus it is helpful to create notation that separates the past and current observations. At any time $t > 1$, let $y_t$ be the current observation and $\\mathcal{Y}_{t-1}$ represent $(y_{t-1}, y_{t-2}, \\ldots, y_1)$.\n",
"\n",
"The following is the setup for inference about the unknown state, $x_t$ given $y_t, \\mathcal{Y}_{t-1}$.\n",
"\n",
"## Prior\n",
"$$x_t \\, | \\, \\mathcal{Y}_{t-1}$$\n",
"\n",
"## Likelihood\n",
"$$y_t \\, | \\, x_t, \\mathcal{Y}_{t-1} \\text{ which simplifies to } y_t \\, | \\, x_t$$\n",
"\n",
"## Posterior:\n",
"$$x_t \\, | \\, y_t, \\mathcal{Y}_{t - 1}$$\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Prior derivation\n",
"The conditional \"prior\" distribution $x_t | \\mathcal{Y}_{t - 1}$ depends on both knowledge of the state at the previous time point (given all *past* observations) and of the movement of the state since that time point. The following expansion of the prior pdf illustrates this:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"f(x_t \\, | \\, \\mathcal{Y}_{t - 1}) &= \\int f(x_t \\, | \\, x_{t - 1}, \\mathcal{Y}_{t - 1}) f(x_{t-1} \\, | \\, \\mathcal{Y}_{t - 1}) \\, d x_{t - 1} \\\\\n",
"&= \\int f(x_t \\, | \\, x_{t - 1}) f(x_{t-1} \\, | \\, \\mathcal{Y}_{t - 1}) \\, d x_{t - 1}\n",
"\\end{align*}\n",
"$$\n",
" \n",
"This expansion has a name, the \"Chapman-Kolmogorov equation.\" Note that the distribution of $x_t \\, | \\, x_{t - 1}$ is specified by the state-space model and is simply $N(A x_{t - 1}, Q)$. The distribution of $x_{t - 1} \\, | \\, \\mathcal{Y}_{t - 1}$ *is* the posterior distribution if we step back one unit in time. This is where the recursion comes in. Since it is the posterior distribution from the last time step, you can assume that you *have it*. In this article, we will call the previous time point's posterior mean and variance $\\mu_{t - 1}$ and $\\Sigma_{t - 1}$, respectively. We will show later that the posterior is also gaussian, and hence $x_{t-1} \\, | \\, \\mathcal{Y}_{t - 1} \\sim N(\\mu_{t - 1}, \\Sigma_{t - 1})$.\n",
"\n",
"There is a potentially high-dimensional integral involved. Fortunately, because the densities in that equation correspond to multivariate gaussians, we won't actually have to compute it. Consider the following result about multivariate gaussians:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"&\\textbf{If } \\, r \\sim N(\\mu_r, \\Sigma_r) \\, \\text{ and } \\, s|r \\sim N(W \\mu_r, \\Sigma_{s|r}), \\\\\n",
" & \\\\\n",
"& \\textbf{Then } \\, \\left(\\begin{matrix} \n",
"r \\\\\n",
"s \n",
"\\end{matrix}\\right) \\sim \n",
"N\\left[\\left(\\begin{matrix} \n",
"\\mu_r \\\\\n",
"W \\mu_r \n",
"\\end{matrix}\\right),\n",
"\\left(\\begin{matrix} \n",
"\\Sigma_r & \\Sigma_r W^T \\\\\n",
"W \\Sigma_r & W \\Sigma_r W^T + \\Sigma_{s | r} \n",
"\\end{matrix}\\right) \\right]. \\; \\; (*)\n",
"\\end{align*}\n",
"$$\n",
"\n",
"With $r$ corresponding to $x_{t-1} \\, | \\, \\mathcal{Y}_{t-1}$ and $s \\, | \\, r$ to $x_t \\, | \\, x_{t - 1}, \\mathcal{Y}_{t-1}$ , we have\n",
"\n",
"$$\n",
"\\left(\\begin{matrix} \n",
"x_{t-1}\\, | \\, \\mathcal{Y}_{t-1} \\\\\n",
"x_{t} \\, | \\, \\mathcal{Y}_{t-1}\n",
"\\end{matrix}\\right) \\sim \n",
"N\\left[\\left(\\begin{matrix} \n",
"\\mu_{t - 1} \\\\\n",
"A \\mu_{t - 1} \n",
"\\end{matrix}\\right),\n",
"\\left(\\begin{matrix} \n",
"\\Sigma_{t - 1} & \\Sigma_{t - 1} A^T \\\\\n",
"A \\Sigma_{t - 1} & A \\Sigma_{t - 1} A^T + Q\n",
"\\end{matrix}\\right) \\right].\n",
"$$\n",
"\n",
"The marginal distribution $x_t \\, | \\, \\mathcal{Y}_{t-1}$ is thus\n",
"$$\n",
"x_t \\, | \\, \\mathcal{Y}_{t-1} \\sim N(A \\mu_{t - 1}, A \\Sigma_{t - 1} A^T + Q)\n",
"$$\n",
"and is a coherent prior for the current state given past measurements and knowledge of the system dynamics. It does assume we have a posterior in hand from the time step before. Since the covariance matrix of this prior will be used in expressions to come, we will give it a name: $$P_t = A\\Sigma_{t - 1} A^T + Q$$.\n"
]
},
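{
"cell_type": "markdown",
"metadata": {},
"source": [
"The prior (prediction) step translates directly into code. A minimal sketch, assuming the `A` and `Q` from the simulation cell above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def predict(mu, Sigma, A, Q):\n",
"    # Prior for x_t | Y_{t-1} ~ N(A mu_{t-1}, A Sigma_{t-1} A^T + Q),\n",
"    # where (mu, Sigma) is the previous time step's posterior\n",
"    mu_prior = A @ mu\n",
"    P = A @ Sigma @ A.T + Q  # P_t\n",
"    return mu_prior, P"
]
},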
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Likelihood derivation\n",
"\n",
"From the measurement model,\n",
" \n",
"$$y_t | x_t, \\mathcal{Y}_{t - 1} = y_t | x_t \\sim N(H x_t, R).$$ That $\\mathcal{Y}_{t - 1}$ is conditionally independent of $y_t$ given $x_t$ is a reminder of the importance of getting the state formulation right - given its value, measurements from the past carry no additional information.\n"
]
},
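{
"cell_type": "markdown",
"metadata": {},
"source": [
"For completeness (another addition, not in the original gist), the likelihood can be evaluated numerically with scipy; a sketch assuming `H`, `R`, `xs`, and `ys` from the cells above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import multivariate_normal\n",
"\n",
"def likelihood(y, x, H, R):\n",
"    # Density of y_t | x_t ~ N(H x_t, R), evaluated at the observed y_t\n",
"    return multivariate_normal(mean=H @ x, cov=R).pdf(y)\n",
"\n",
"likelihood(ys[0], xs[0], H, R)  # likelihood of the first simulated observation"
]
},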
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Posterior derivation\n",
"\n",
"Reusing the result $(*)$ from from the prior derivation with prior $x_t \\, | \\, \\mathcal{Y}_{t-1}$ as the marginal gaussian and $y_t \\, | \\, x_t, \\mathcal{Y}_{t-1}$ as the conditional gaussian, we arrive at the joint distribution:\n",
"\n",
"$$\n",
"\\left(\\begin{matrix} \n",
"x_t \\, | \\, \\mathcal{Y}_{t-1} \\\\\n",
"y_t \\, | \\, \\mathcal{Y}_{t-1} \n",
"\\end{matrix}\\right) \\sim \n",
"N\\left[ \\left(\\begin{matrix} \n",
"A \\mu_{t - 1} \\\\\n",
"H A \\mu_{t - 1}\n",
"\\end{matrix}\\right),\n",
"\\left(\\begin{matrix} \n",
"P_t & P_t H^T \\\\\n",
"H P_t & H P_t H^T + R \n",
"\\end{matrix}\\right) \\right].$$\n",
"\n",
"Now that we have a joint multivariate gaussian, we can use this result for conditional distributions to get $x_t \\, | \\, y_t, \\mathcal{Y}_{t-1}$:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"&\\textbf{If} \\quad \\quad\n",
"\\left(\\begin{matrix} \n",
"r \\\\\n",
"s \n",
"\\end{matrix}\\right) \\sim\n",
"N\\left[\\left(\\begin{matrix} \n",
"\\mu_r \\\\\n",
"\\mu_s \n",
"\\end{matrix}\\right),\n",
"\\left(\\begin{matrix} \n",
"\\Sigma_r & C \\\\\n",
"C^T & \\Sigma_s \n",
"\\end{matrix}\\right) \\right], \\\\\n",
"& \\\\\n",
"&\\textbf{Then } \\quad \\, r|s \\; \\, \\sim N\\left[\\mu_r + C \\Sigma_s^{-1} (s - \\mu_s), \\; \\Sigma_r - C \\Sigma_s^{-1}C^T\\right].\n",
"\\end{align*}\n",
"$$\n",
"\n",
"Renaming the variance of $y_t \\, | \\, \\mathcal{Y}_{t-1}$ as\n",
"\n",
"$$\n",
"S_t = H P_t H^T + R,\n",
"$$ we arrive at the posterior\n",
"\n",
"$$\n",
"x_t \\, | \\, y_t, \\mathcal{Y}_{t-1} \\sim N(A \\mu_{t - 1} + P_t H^T S_t^{-1} (y_t - H A \\mu_{t - 1}), \\;\n",
" P_t - P_t H^T S_t^{-1} H P_t).$$\n",
" "
]
}
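{
"cell_type": "markdown",
"metadata": {},
"source": [
"Putting the pieces together in code: the update (posterior) step below, plus a filtering loop over the simulated data. This is a minimal sketch, not the original notebook's code; it assumes the simulation and `predict` cells above have been run, and the initial `mu` and `Sigma` are arbitrary starting values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def update(mu_prior, P, y, H, R):\n",
"    # Posterior of x_t | y_t, Y_{t-1}, read off the joint Gaussian above\n",
"    S = H @ P @ H.T + R             # S_t = H P_t H^T + R\n",
"    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain K_t = P_t H^T S_t^{-1}\n",
"    mu_post = mu_prior + K @ (y - H @ mu_prior)\n",
"    Sigma_post = P - K @ H @ P      # = P_t - P_t H^T S_t^{-1} H P_t\n",
"    return mu_post, Sigma_post\n",
"\n",
"mu, Sigma = np.zeros(2), np.eye(2)  # arbitrary initial posterior\n",
"filtered = []\n",
"for y in ys:\n",
"    mu_prior, P = predict(mu, Sigma, A, Q)\n",
"    mu, Sigma = update(mu_prior, P, y, H, R)\n",
"    filtered.append(mu)\n",
"filtered = np.array(filtered)"
]
}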
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@khaiyichin

This was helpful. Can you point me to where I can find a proof for (*)?

@baogorek
Author

baogorek commented Aug 13, 2022

Hi @khaiyichin ,

I was just about to give up, and then (I think) I found it. Check out Section 4 here: https://davidrosenberg.github.io/mlcourse/in-prep/multivariate-gaussian.pdf

Let me know if that's not it or if it gives you any insights. I think I trusted the slides that motivated the Medium article where I showed this gist.

Ben
