{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# GMRES Notes and Code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let $A \\in \\mathbf{R}^{M \\times M}$ be an arbitrary matrix and $b \\in \\mathbf{R}^M$ an arbitrary vector. We want to find a solution to\n",
"$$Ax = b$$\n",
"for $x \\in \\mathbf{R}^M$.\n",
"\n",
"Directly this means $x = A^{-1} b$.\n",
"\n",
"Since $A$ must have a minimal polynomial $m_A(t)$ such that\n",
"$$m_A(A) = \\mathbf{0}_{M \\times M}$$\n",
"If we could find a polynomial $I_A(t)$ such that\n",
"$$t I_A(t) \\equiv 1 \\bmod{m_A(t)}$$\n",
"then\n",
"$$A^{-1} = I_A(A)$$\n",
"since\n",
"$$A I_A(A) = I_M + q(A) m_A(A) = I_M$$\n",
"for some quotient polynomial $q(t)$.\n",
"\n",
"**NOTE**: This is problematic if $m_A(0) = 0$, so we assume $A$ is invertible (it's not too harmful if we hope to uniquely solve $Ax = b$). If not, we can perfom the Euclidean algorithm on $t, m_A(t)$ to find\n",
"$$I_A(t) \\cdot t + q(t) \\cdot m_A(t) = 1$$\n",
"since $t$ and $m_A(t)$ must be coprime when $m_A(0) \\neq 0$."
]
},
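{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this concrete, here is a minimal numerical sketch (assuming `numpy`; the matrix, vector, and seed are arbitrary). It uses the characteristic polynomial in place of the minimal polynomial (the latter divides the former, so Cayley-Hamilton still applies) to express $A^{-1} b$ as a polynomial in $A$ applied to $b$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Sketch: express A^{-1} b as a polynomial in A applied to b. We use\n",
"# the characteristic polynomial p_A (the minimal polynomial divides\n",
"# it), so Cayley-Hamilton gives\n",
"#     A^{-1} = -(A^{M-1} + c_{M-1} A^{M-2} + ... + c_1 I) / c_0.\n",
"rng = np.random.default_rng(42)  # arbitrary small example\n",
"M = 4\n",
"A = rng.standard_normal((M, M))\n",
"b = rng.standard_normal(M)\n",
"\n",
"coeffs = np.poly(A)  # [1, c_{M-1}, ..., c_1, c_0] of p_A(t)\n",
"I_A = np.zeros_like(A)\n",
"for c in coeffs[:-1]:  # Horner evaluation of I_A(A)\n",
"    I_A = I_A @ A + c * np.eye(M)\n",
"I_A /= -coeffs[-1]\n",
"\n",
"print(np.allclose(I_A @ b, np.linalg.solve(A, b)))  # expect: True"
]
},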
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In light of the above, we see that\n",
"$$A^{-1} b \\in \\mathcal{K}_M\\left(A, b\\right) = \\operatorname{Span}\\left\\{b, Ab, A^2 b, \\ldots, A^{M - 1}b \\right\\}.$$\n",
"\n",
"In order to approximate the solution, we start with a guess $x_0$ (typically $x_0 = \\mathbf{0}_M$, but we may have extra information).\n",
"\n",
"In this generic case we want\n",
"$$A(x_0 + z) = b \\Longleftrightarrow A z = b - A x_0 = r_0$$\n",
"so that $z \\in \\mathcal{K}_M\\left(A, r_0\\right)$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Approximations\n",
"\n",
"Then we want to find\n",
"$$x_k = x_0 + z_k, \\qquad z_k \\in \\mathcal{K}_k\\left(A, r_0\\right)$$\n",
"such that\n",
"$$\\left\\lVert b - Ax_k\\right\\rVert_2 = \\min_{z_k \\in \\mathcal{K}_k\\left(A, r_0\\right)} \\left\\lVert r_0 - A z_k\\right\\rVert_2.$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Restarts\n",
"\n",
"Note that if we restart after $m$ steps then\n",
"$$x_0 \\in \\mathcal{K}_m\\left(A, b\\right) \\Longrightarrow r_0 \\in \\mathcal{K}_{m + 1}\\left(A, b\\right) \\Longrightarrow \\mathcal{K}_{\\ell}\\left(A, r_0\\right) \\subseteq \\mathcal{K}_{m + \\ell}\\left(A, b\\right).$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Algorithm\n",
"\n",
"We start with\n",
"$$q_0 = \\frac{r_0}{\\left\\lVert r_0 \\right\\rVert_2}.$$\n",
"To find $q_1$ with $\\left\\langle q_0, q_1\\right\\rangle_2 = 0$ we write\n",
"$$A q_0 = h_{00} q_0 + h_{10} q_1$$\n",
"and solve via\n",
"$$\\begin{align*}\n",
"w &= A q_0 \\\\\n",
"\\Rightarrow h_{00} &= \\left\\langle w, q_0\\right\\rangle_2 \\\\\n",
"\\Rightarrow w &= w - h_{00} q_0 \\\\\n",
"\\Rightarrow h_{10} &= \\|w\\|_2 \\\\\n",
"\\Rightarrow q_1 &= \\frac{w}{h_{10}}\n",
"\\end{align*}$$\n",
"\n",
"After each step we write\n",
"$$A q_j = h_{0j} q_0 + h_{1j} q_1 + \\cdots + h_{j+1,j} q_{j + 1}$$\n",
"and recursively knock off each coefficient in $w = A q_j$ as above (utilizing the fact that $\\left\\langle q_j, q_k\\right\\rangle_2 = \\delta_{jk}$). (This process is very similar to modified Gram-Schmidt.)"
]
},
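{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of this orthogonalization process (the Arnoldi iteration) assuming `numpy`; the function name `arnoldi` is just for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def arnoldi(A, r0, n):\n",
"    # Build Q = [q_0 | ... | q_n] with orthonormal columns and the\n",
"    # (n + 1) x n Hessenberg H such that A Q[:, :n] = Q H.\n",
"    M = A.shape[0]\n",
"    Q = np.zeros((M, n + 1))\n",
"    H = np.zeros((n + 1, n))\n",
"    Q[:, 0] = r0 / np.linalg.norm(r0)\n",
"    for j in range(n):\n",
"        w = A @ Q[:, j]\n",
"        for k in range(j + 1):  # knock off each coefficient in turn\n",
"            H[k, j] = w @ Q[:, k]\n",
"            w -= H[k, j] * Q[:, k]\n",
"        H[j + 1, j] = np.linalg.norm(w)\n",
"        Q[:, j + 1] = w / H[j + 1, j]\n",
"    return Q, H"
]
},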
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The method we've construced above is equivalent to a modified QR algorithm\n",
"$$AQ_n = Q_{n+1} H_n, \\qquad Q_n \\in \\mathbf{R}^{M \\times n}, H_n \\in \\mathbf{R}^{(n + 1) \\times n}$$\n",
"where\n",
"$$Q_j = \\left[ \\begin{array}{c c c c} q_0 & q_1 & \\cdots & q_{j - 1}\\end{array}\\right].$$\n",
"More importantly, by construction we have\n",
"$$\\mathcal{K}_j\\left(A, r_0\\right) = \\operatorname{Span}\\left\\{q_0, q_1, \\ldots, q_{j-1}\\right\\}$$"
]
},
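{
"cell_type": "markdown",
"metadata": {},
"source": [
"Continuing the sketch above (this cell reuses `arnoldi()`; the test matrix and sizes are arbitrary), we can numerically verify both the factorization and the spanning property:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rng = np.random.default_rng(7)  # arbitrary example, reusing arnoldi()\n",
"M, n = 10, 5\n",
"A = rng.standard_normal((M, M))\n",
"r0 = rng.standard_normal(M)\n",
"Q, H = arnoldi(A, r0, n)\n",
"\n",
"# A Q_n = Q_{n+1} H_n:\n",
"print(np.allclose(A @ Q[:, :n], Q @ H))  # expect: True\n",
"\n",
"# The columns of the Krylov matrix lie in the span of q_0, ..., q_{n-1},\n",
"# i.e. projecting them onto that span changes nothing:\n",
"K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(n)])\n",
"print(np.allclose(Q[:, :n] @ (Q[:, :n].T @ K), K))  # expect: True"
]
},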
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Due to the spanning property above\n",
"$$y \\in \\mathbf{R}^n \\Longrightarrow Q_n y \\in \\mathcal{K}_n\\left(A, r_0\\right).$$\n",
"More importantly, we see that \n",
"$$\\operatorname{Image}\\left(Q_n\\right) = \\mathcal{K}_n\\left(A, r_0\\right).$$\n",
"This means that we want to find\n",
"$$\\min_{y \\in \\mathbf{R}^n} \\left\\lVert r_0 - A Q_n y\\right\\rVert_2 = \\min_{y \\in \\mathbf{R}^n} \\left\\lVert r_0 - Q_{n + 1} H_n y\\right\\rVert_2 = \\min_{y \\in \\mathbf{R}^n} \\left\\lVert Q_{n + 1}^{\\ast} r_0 - H_n y\\right\\rVert_2 = \\min_{y \\in \\mathbf{R}^n} \\left\\lVert \\|r_0\\|_2 \\cdot e_0 - H_n y\\right\\rVert_2$$\n",
"where $e_0$ is the first standard basis vector (this is since the first column of $Q_{n+1}$ is parallel to $r_0$ and the rest are perpendicular by construction)."
]
},
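{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check (again a sketch, reusing `arnoldi()` and the test matrix from above, and taking $x_0 = \\mathbf{0}_M$ so that $r_0 = b$), we can solve the small least-squares problem for each $n$ and watch the true residual shrink:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = rng.standard_normal(M)\n",
"r0 = b  # x_0 = 0, so r_0 = b\n",
"beta = np.linalg.norm(r0)\n",
"\n",
"for n_curr in range(1, M + 1):\n",
"    Q, H = arnoldi(A, r0, n_curr)  # wasteful: rebuilt from scratch each time\n",
"    rhs = np.zeros(n_curr + 1)\n",
"    rhs[0] = beta  # beta * e_0\n",
"    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)\n",
"    x = Q[:, :n_curr] @ y  # x_n = x_0 + Q_n y\n",
"    print(n_curr, np.linalg.norm(b - A @ x))"
]
},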
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing Norm-Minimizing $y$\n",
"\n",
"In order to find a minimizer of $\\beta e_0 - H_n y$, we apply a Givens rotation for each of the $n$ non-zero elements below the diagonal in $H_n$. For example, to remove the $h_{1,0}$ element (in column $0$) we apply\n",
"$$G_0 = \\left[ \\begin{array}{c c c c c} \\cos \\theta_0 & - \\sin \\theta_0 & & & \\\\\n",
"\\sin \\theta_0 & \\cos \\theta_0 & & & \\\\\n",
"& & 1 & & \\\\\n",
"& & & \\ddots & \\\\\n",
"& & & & 1\n",
"\\end{array}\\right] \\in \\mathbf{R}^{(n + 1) \\times (n + 1)}$$\n",
"where\n",
"$$\\cos \\theta_0 = \\frac{h_{0,0}}{\\sqrt{h_{0,0}^2 + h_{1,0}^2}}, \\qquad \\sin \\theta_0 = \\frac{- h_{1,0}}{\\sqrt{h_{0,0}^2 + h_{1,0}^2}}$$\n",
"\n",
"Writing each update as $G_0, G_1, \\ldots$ we compose\n",
"$$G^{(n)} = G_{n - 1} \\cdots G_1 G_0 \\in \\mathbf{R}^{(n + 1) \\times (n + 1)}.$$\n",
"In words, this means we zero out the subdiagonal element in the first column ($G_0$), then the second column ($G_1$) and so on."
]
},
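{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of constructing these rotations and applying a stored sequence of them (keeping only the pairs $(\\cos \\theta_j, \\sin \\theta_j)$, a point we return to below); the helper names are just for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def givens(a, b):\n",
"    # Return (c, s) such that [[c, -s], [s, c]] @ [a, b] = [r, 0].\n",
"    r = np.hypot(a, b)\n",
"    return a / r, -b / r\n",
"\n",
"def apply_rotations(cs, v):\n",
"    # Apply G_0, then G_1, ... to v; rotation j touches entries j, j + 1.\n",
"    v = v.copy()\n",
"    for j, (c, s) in enumerate(cs):\n",
"        v[j], v[j + 1] = c * v[j] - s * v[j + 1], s * v[j] + c * v[j + 1]\n",
"    return v\n",
"\n",
"c, s = givens(3.0, 4.0)\n",
"print(np.array([[c, -s], [s, c]]) @ np.array([3.0, 4.0]))  # approx. [5, 0]"
]
},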
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Upper Triangular System\n",
"\n",
"After removing the below diagonal elements in the Hessenberg $H_n \\in \\mathbf{R}^{(n + 1) \\times n}$, we have\n",
"$$G^{(n)} H_n = \\left[ \\begin{array}{c} R_n \\\\ \\hline \\mathbf{0}_{1 \\times n} \\end{array}\\right]$$\n",
"for $R_n \\in \\mathbf{R}^{n \\times n}$ and upper triangular. Then we want to minimize the error in\n",
"$$G^{(n)} H_n y - \\beta G^{(n)} e_0$$\n",
"since $G^{(n)}$ is unitary and the two norm is unitarily invariant.\n",
"\n",
"Noting that $e_0 \\in \\mathbf{R}^{n + 1}$, we write\n",
"$$\\beta G^{(n)} e_0 = \\left[ \\begin{array}{c} g_n \\\\ \\hline \\gamma_n \\end{array}\\right]$$\n",
"for $g_n \\in \\mathbf{R}^n, \\gamma_n \\in \\mathbf{R}$.\n",
"\n",
"This is minimized when $R_n y_n = g_n$ and the minimum residual is $\\gamma_n$."
]
},
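{
"cell_type": "markdown",
"metadata": {},
"source": [
"Putting these two steps together (a sketch assuming `scipy` for the triangular solve, and reusing `arnoldi()`, `givens()`, $A$, $b$, and $r_0$ from the cells above): rotate $H_n$ and $\\beta e_0$, then back-substitute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy.linalg import solve_triangular\n",
"\n",
"n_curr = 5\n",
"Q, H = arnoldi(A, r0, n_curr)\n",
"R = H.copy()\n",
"rhs = np.zeros(n_curr + 1)\n",
"rhs[0] = np.linalg.norm(r0)  # beta * e_0\n",
"for j in range(n_curr):  # zero out each subdiagonal entry in turn\n",
"    c, s = givens(R[j, j], R[j + 1, j])\n",
"    R[j:j + 2, :] = np.array([[c, -s], [s, c]]) @ R[j:j + 2, :]\n",
"    rhs[j], rhs[j + 1] = c * rhs[j] - s * rhs[j + 1], s * rhs[j] + c * rhs[j + 1]\n",
"\n",
"g, gamma = rhs[:n_curr], rhs[n_curr]\n",
"y = solve_triangular(R[:n_curr, :], g)  # R_n y_n = g_n\n",
"x = Q[:, :n_curr] @ y\n",
"print(abs(gamma), np.linalg.norm(b - A @ x))  # minimum residual, two ways"
]
},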
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Incremental Updates: $H_n$\n",
"\n",
"Going from $n$ to $n + 1$, many terms are preserved and we can re-use many computations. By the process for computing successive values $q_j$ and filling out columns of $H_n$ we see\n",
"$$H_{n + 1} = \\left[ \\begin{array}{c | c} H_n & h^{(n + 1)} \\\\ \\hline \\mathbf{0}_{1 \\times n} & h_{n + 1, n} \\end{array}\\right] \\in \\mathbf{R}^{(n + 2) \\times (n + 1)}$$\n",
"First to re-use the existing givens rotations we have\n",
"$$\\left[ \\begin{array}{c | c} G^{(n)} & \\mathbf{0}_{(n + 1) \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times (n + 1)} & 1 \\end{array}\\right] H_{n + 1} = \\left[ \\begin{array}{c | c} R_n & \\widehat{h}^{(n + 1)} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\nu_{n+1} \\\\ \\hline \\mathbf{0}_{1 \\times n} & h_{n + 1, n} \\end{array}\\right]$$\n",
"where\n",
"$$G^{(n)} h^{(n + 1)} = \\left[ \\begin{array}{c} \\widehat{h}^{(n + 1)} \\\\ \\hline \\nu_{n+1} \\end{array}\\right], \\qquad \\widehat{h}^{(n + 1)} \\in \\mathbf{R}^{n}, \\quad \\nu_{n + 1} \\in \\mathbf{R}.$$\n",
"\n",
"From the above, we see the final Givens rotation ($G_n$) required to zero out the below diagonal part in the final column is given by\n",
"$$\\cos \\theta_n = \\frac{\\nu_{n + 1}}{\\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2}}, \\qquad \\sin \\theta_n = \\frac{- h_{n + 1, n}}{\\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2}}$$\n",
"\n",
"With this\n",
"$$G^{(n + 1)} = \\left[ \\begin{array}{c | c c} I_{n \\times n} & \\mathbf{0}_{n \\times 1} & \\mathbf{0}_{n \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\cos \\theta_n & - \\sin \\theta_n \\\\ \\mathbf{0}_{1 \\times n} & \\sin \\theta_n & \\cos \\theta_n \\end{array}\\right] \\left[ \\begin{array}{c | c} G^{(n)} & \\mathbf{0}_{(n + 1) \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times (n + 1)} & 1 \\end{array}\\right]$$\n",
"\n",
"Somewhat importantly, **we don't need to perform a matrix multiply**. We only need to update two entries:\n",
"$$\\nu_{n + 1} \\longmapsto \\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2}, \\qquad h_{n + 1, n} \\longmapsto 0.$$\n",
"\n",
"Given all this we have\n",
"$$R_{n + 1} = \\left[ \\begin{array}{c | c} R_n & \\widehat{h}^{(n + 1)} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2} \\end{array}\\right].$$"
]
},
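{
"cell_type": "markdown",
"metadata": {},
"source": [
"One step of this incremental update, as a sketch (the helper name `extend` is illustrative; `givens()`, `apply_rotations()`, `H`, `R`, and `n_curr` come from the cells above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def extend(cs, R_prev, h_col, h_sub):\n",
"    # h_col = h^(n+1) in R^{n+1}, h_sub = h_{n+1,n}; cs stores G^(n).\n",
"    v = apply_rotations(cs, np.append(h_col, h_sub))\n",
"    c, s = givens(v[-2], v[-1])  # the one new rotation G_n\n",
"    v[-2], v[-1] = np.hypot(v[-2], v[-1]), 0.0  # only two entries change\n",
"    n = R_prev.shape[0]\n",
"    R_new = np.zeros((n + 1, n + 1))  # R_{n+1}: R_n plus one new column\n",
"    R_new[:n, :n] = R_prev\n",
"    R_new[:, n] = v[:-1]\n",
"    return cs + [(c, s)], R_new\n",
"\n",
"# Rebuild R_n one column at a time; compare with the batch-rotated R.\n",
"cs2, R2 = [], np.zeros((0, 0))\n",
"for j in range(n_curr):\n",
"    cs2, R2 = extend(cs2, R2, H[:j + 1, j], H[j + 1, j])\n",
"print(np.allclose(R2, np.triu(R[:n_curr, :])))  # expect: True"
]
},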
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Incremental Updates: $\\widehat{h}^{(n + 1)}$\n",
"\n",
"We have to store each $q_j \\in \\mathbf{R}^M$ in order to compute:\n",
"$$h^{(n + 1)} = \\left[ \\begin{array}{c} h_{0, n} \\\\ h_{1, n} \\\\ \\vdots \\\\ h_{n, n} \\end{array}\\right] \\in \\mathbf{R}^{n + 1} \\quad \\text{and} \\quad h_{n + 1, n} \\in \\mathbf{R}$$\n",
"\n",
"If we keep around the values\n",
"$$c_j = \\cos \\theta_j, \\quad s_j = \\sin \\theta_j$$\n",
"then we can transform\n",
"$$\\left[ \\begin{array}{c} h_{0, n} \\\\ h_{1, n} \\\\ \\vdots \\\\ h_{n, n} \\end{array}\\right] \\longmapsto \\left[ \\begin{array}{c} c_0 h_{0, n} - s_0 h_{1, n} \\\\ s_0 h_{0, n} + c_0 h_{1, n} \\\\ \\vdots \\\\ h_{n, n} \\end{array}\\right] = \\left[ \\begin{array}{c} A \\\\ B \\\\ C \\\\ \\vdots \\end{array}\\right] \\longmapsto \\left[ \\begin{array}{c} A \\\\ c_1 B - s_1 C \\\\ s_1 B + c_1 C \\\\ \\vdots \\end{array}\\right]$$\n",
"\n",
"This allows us to **store** $G^{(n)}$ with $\\mathcal{O}\\left(n\\right)$ instead of $\\mathcal{O}\\left(n^2\\right)$ entries and to **apply** $G^{(n)}$ to a **vector** in $\\mathcal{O}\\left(n\\right)$ instead of $\\mathcal{O}\\left(n^2\\right)$ operations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Incremental Updates: $g_n$\n",
"\n",
"By the above definition $e_0 \\in \\mathbf{R}^{n + 1}$\n",
"$$\\beta G^{(n)} e_0 = \\left[ \\begin{array}{c} g_n \\\\ \\hline \\gamma_n \\end{array}\\right], \\qquad\n",
"\\beta G^{(n + 1)} \\left[ \\begin{array}{c} e_0 \\\\ \\hline 0 \\end{array}\\right] = \\left[ \\begin{array}{c} g_{n + 1} \\\\ \\hline \\gamma_{n + 1} \\end{array}\\right].$$\n",
"for $g_n \\in \\mathbf{R}^n, \\gamma_n \\in \\mathbf{R}$.\n",
"\n",
"By the above definition of $G^{(n + 1)}$\n",
"$$\\begin{align*}\n",
"\\left[ \\begin{array}{c} g_{n + 1} \\\\ \\hline \\gamma_{n + 1} \\end{array}\\right] &= \\beta \\left[ \\begin{array}{c | c c} I_{n \\times n} & \\mathbf{0}_{n \\times 1} & \\mathbf{0}_{n \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\cos \\theta_n & - \\sin \\theta_n \\\\ \\mathbf{0}_{1 \\times n} & \\sin \\theta_n & \\cos \\theta_n \\end{array}\\right] \\left[ \\begin{array}{c | c} G^{(n)} & \\mathbf{0}_{(n + 1) \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times (n + 1)} & 1 \\end{array}\\right] \\left[ \\begin{array}{c} e_0 \\\\ \\hline 0 \\end{array}\\right] \\\\\n",
"&= \\left[ \\begin{array}{c | c c} I_{n \\times n} & \\mathbf{0}_{n \\times 1} & \\mathbf{0}_{n \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\cos \\theta_n & - \\sin \\theta_n \\\\ \\mathbf{0}_{1 \\times n} & \\sin \\theta_n & \\cos \\theta_n \\end{array}\\right] \\left[ \\begin{array}{c} g_n \\\\ \\hline \\gamma_n \\\\ \\hline 0 \\end{array}\\right] \\\\\n",
"&= \\left[ \\begin{array}{c} g_n \\\\ \\hline \\cos \\theta_n \\, \\gamma_n \\\\ \\hline \\sin \\theta_n \\, \\gamma_n \\end{array}\\right]\n",
"\\end{align*}$$\n",
"\n",
"So we see\n",
"$$g_{n + 1} = \\left[ \\begin{array}{c} g_n \\\\ \\hline \\cos \\theta_n \\, \\gamma_n \\end{array}\\right], \\qquad \\gamma_{n + 1} = \\sin \\theta_n \\, \\gamma_n.$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Huge Win\n",
"\n",
"From the above, we see that with linear work we can update our residual error, never having to actually compute $y_n$ or $x_n$.\n",
"\n",
"Once we have\n",
"$$\\left|\\gamma_n\\right| \\leq \\varepsilon_{\\text{threshold}}$$\n",
"we can then compute\n",
"$$y_n = R_n^{-1} g_n$$\n",
"\n",
"Recall that we used\n",
"$$b - A x_k = r_0 - A z_k = r_0 - A Q_n y = r_0 - Q_{n + 1} H_n y$$\n",
"so we get\n",
"$$x_n = x_0 + z_n = x_0 + Q_n R_n^{-1} g_n.$$"
]
},
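{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assembling all of the pieces: below is a complete GMRES sketch in `numpy` (unpreconditioned, no restarts; the names, tolerance, and test problem are illustrative, not production code). The residual norm $\\left|\\gamma_n\\right|$ is tracked incrementally, and $y_n$, $x_n$ are only formed at the end:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def gmres(A, b, x0=None, tol=1e-10, max_iter=None):\n",
"    # Minimal GMRES sketch: Arnoldi plus incremental Givens updates, so\n",
"    # the residual norm |gamma| costs O(n) extra work per step.\n",
"    M = A.shape[0]\n",
"    max_iter = M if max_iter is None else max_iter\n",
"    x0 = np.zeros(M) if x0 is None else x0\n",
"    r0 = b - A @ x0\n",
"    beta = np.linalg.norm(r0)\n",
"    Q = [r0 / beta]  # orthonormal Krylov basis q_0, q_1, ...\n",
"    R = []           # columns of the triangular factor R_n\n",
"    cs = []          # stored rotation pairs (cos theta_j, sin theta_j)\n",
"    g = []           # entries of the rotated right-hand side g_n\n",
"    gamma = beta     # current residual norm (up to sign)\n",
"    for n in range(max_iter):\n",
"        # Arnoldi step: new Hessenberg column and basis vector q_{n+1}.\n",
"        w = A @ Q[n]\n",
"        h = np.zeros(n + 2)\n",
"        for k in range(n + 1):\n",
"            h[k] = w @ Q[k]\n",
"            w -= h[k] * Q[k]\n",
"        h[n + 1] = np.linalg.norm(w)\n",
"        Q.append(w / h[n + 1])\n",
"        # Apply the stored rotations to the new column.\n",
"        for j, (c, s) in enumerate(cs):\n",
"            h[j], h[j + 1] = c * h[j] - s * h[j + 1], s * h[j] + c * h[j + 1]\n",
"        # One new rotation zeros out the subdiagonal entry h_{n+1,n};\n",
"        # only two entries of the column change.\n",
"        r = np.hypot(h[n], h[n + 1])\n",
"        c, s = h[n] / r, -h[n + 1] / r\n",
"        cs.append((c, s))\n",
"        h[n], h[n + 1] = r, 0.0\n",
"        R.append(h[:n + 1])\n",
"        # g_{n+1} = [g_n; cos(theta_n) gamma_n], gamma_{n+1} = sin(theta_n) gamma_n.\n",
"        g.append(c * gamma)\n",
"        gamma = s * gamma\n",
"        if abs(gamma) <= tol:\n",
"            break\n",
"    # Only now form y_n (back-substitution) and x_n = x_0 + Q_n y_n.\n",
"    n = len(g)\n",
"    y = np.zeros(n)\n",
"    for i in range(n - 1, -1, -1):\n",
"        y[i] = (g[i] - sum(R[j][i] * y[j] for j in range(i + 1, n))) / R[i][i]\n",
"    return x0 + np.column_stack(Q[:n]) @ y\n",
"\n",
"# Usage: a well-conditioned random test problem (illustrative only).\n",
"rng = np.random.default_rng(0)\n",
"A = rng.standard_normal((30, 30)) + 20 * np.eye(30)  # eigenvalues away from 0\n",
"b = rng.standard_normal(30)\n",
"x = gmres(A, b)\n",
"print(np.linalg.norm(b - A @ x))  # expect roughly tol or smaller"
]
},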
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bad Idea; Incremental Updates: $y_n$\n",
"\n",
"By the above observation, we don't need to compute $y_n$ at every step. We'll see here that doing so would require a **new triangular solve**, which requires $\\mathcal{O}\\left(n^2\\right)$ work.\n",
"\n",
"----\n",
"\n",
"Now that we see $g_{n + 1}$ contains $g_n$ and $R_{n + 1}$ contains $R_n$, it appears that\n",
"$$y_n = R_n^{-1} g_n, \\qquad y_{n + 1} = R_{n + 1}^{-1} g_{n + 1}$$\n",
"will be closely related.\n",
"\n",
"To solve the new system, we write\n",
"$$\\left[ \\begin{array}{c | c} R_n & \\widehat{h}^{(n + 1)} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2} \\end{array}\\right] \\left[ \\begin{array}{c} \\widehat{y_n} \\\\ \\hline \\alpha_n \\end{array}\\right] = \\left[ \\begin{array}{c} g_n \\\\ \\hline \\cos \\theta_n \\, \\gamma_n \\end{array}\\right]$$\n",
"for $\\alpha_n \\in \\mathbf{R}, \\widehat{y_n} \\in \\mathbf{R}^n.$\n",
"\n",
"This means that\n",
"$$\\alpha_n = \\frac{\\cos \\theta_n \\, \\gamma_n}{\\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2}}$$\n",
"and\n",
"$$R_n \\, \\widehat{y_n} = g_n - \\alpha_n \\widehat{h}^{(n + 1)} \\Longrightarrow R_n \\left(\\widehat{y_n} - y_n\\right) = -\\alpha_n \\widehat{h}^{(n + 1)}.$$\n",
"\n",
"As mentioned above, computing the update $\\boxed{\\widehat{y_n} - y_n}$ requires a triangular solve."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}