# GMRES Notes and Code

*Notes by dhermes (gist `d72e36c40626bd93a4a02704ee79c7d1`), last active September 17, 2018.*

Let $A \in \mathbf{R}^{M \times M}$ be an arbitrary matrix and $b \in \mathbf{R}^M$ an arbitrary vector. We want to find a solution to
$$Ax = b$$
for $x \in \mathbf{R}^M$.

Directly, this means $x = A^{-1} b$.

The matrix $A$ has a minimal polynomial $m_A(t)$ such that
$$m_A(A) = \mathbf{0}_{M \times M}.$$
If we could find a polynomial $I_A(t)$ such that
$$t I_A(t) \equiv 1 \bmod{m_A(t)}$$
then
$$A^{-1} = I_A(A)$$
since
$$A I_A(A) = I_M + q(A) m_A(A) = I_M$$
for some quotient polynomial $q(t)$.

**NOTE**: This is problematic if $m_A(0) = 0$, so we assume $A$ is invertible (not too harmful an assumption, since we hope to uniquely solve $Ax = b$). Under that assumption, we can perform the Euclidean algorithm on $t$ and $m_A(t)$ to find
$$I_A(t) \cdot t + q(t) \cdot m_A(t) = 1$$
since $t$ and $m_A(t)$ must be coprime when $m_A(0) \neq 0$.
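As a quick numerical sanity check (a sketch of ours, not part of the original notes: it uses the characteristic polynomial via `np.poly`, which is a multiple of $m_A(t)$ and serves the same purpose), we can confirm that $A^{-1} b$ is a polynomial in $A$ applied to $b$:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal((M, M)) + M * np.eye(M)  # comfortably invertible
b = rng.standard_normal(M)

# Characteristic polynomial p(t) = c[0] t^M + c[1] t^(M-1) + ... + c[M], c[0] = 1.
# Cayley-Hamilton gives p(A) = 0, hence
#     A^(-1) = -(c[0] A^(M-1) + c[1] A^(M-2) + ... + c[M-1] I) / c[M],
# so A^(-1) b is a combination of b, A b, ..., A^(M-1) b.
c = np.poly(A)

krylov = [b]                      # b, A b, ..., A^(M-1) b
for _ in range(M - 1):
    krylov.append(A @ krylov[-1])

x = -sum(c[j] * krylov[M - 1 - j] for j in range(M)) / c[M]
print(np.allclose(x, np.linalg.solve(A, b)))  # True
```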

----


In light of the above, we see that
$$A^{-1} b \in \mathcal{K}_M\left(A, b\right) = \operatorname{Span}\left\{b, Ab, A^2 b, \ldots, A^{M - 1}b \right\}.$$

In order to approximate the solution, we start with a guess $x_0$ (typically $x_0 = \mathbf{0}_M$, but we may have extra information).

In the general case we want
$$A(x_0 + z) = b \Longleftrightarrow A z = b - A x_0 = r_0$$
so that $z \in \mathcal{K}_M\left(A, r_0\right)$.
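A tiny numeric illustration of this shift (the random, diagonally dominated test matrix is our own setup): the exact correction $z$ solves $A z = r_0$, and $x_0 + z$ recovers the true solution. GMRES will approximate $z$ from the Krylov space of $r_0$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)  # comfortably invertible
b = rng.standard_normal(5)

x0 = rng.standard_normal(5)     # an arbitrary initial guess
r0 = b - A @ x0                 # residual of the guess
z = np.linalg.solve(A, r0)      # exact correction; GMRES approximates this
print(np.allclose(x0 + z, np.linalg.solve(A, b)))  # True
```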

### Approximations

Then we want to find
$$x_k = x_0 + z_k, \qquad z_k \in \mathcal{K}_k\left(A, r_0\right)$$
such that
$$\left\lVert b - Ax_k\right\rVert_2 = \min_{z_k \in \mathcal{K}_k\left(A, r_0\right)} \left\lVert r_0 - A z_k\right\rVert_2.$$

### Restarts

Note that if we restart after $m$ steps then
$$x_0 \in \mathcal{K}_m\left(A, b\right) \Longrightarrow r_0 \in \mathcal{K}_{m + 1}\left(A, b\right) \Longrightarrow \mathcal{K}_{\ell}\left(A, r_0\right) \subseteq \mathcal{K}_{m + \ell}\left(A, b\right).$$

## Basic Algorithm

We start with
$$q_0 = \frac{r_0}{\left\lVert r_0 \right\rVert_2}.$$
To find $q_1$ with $\left\langle q_0, q_1\right\rangle_2 = 0$ we write
$$A q_0 = h_{00} q_0 + h_{10} q_1$$
and solve via
$$\begin{align*}
w &= A q_0 \\
\Rightarrow h_{00} &= \left\langle w, q_0\right\rangle_2 \\
\Rightarrow w &= w - h_{00} q_0 \\
\Rightarrow h_{10} &= \|w\|_2 \\
\Rightarrow q_1 &= \frac{w}{h_{10}}
\end{align*}$$

After each step we write
$$A q_j = h_{0j} q_0 + h_{1j} q_1 + \cdots + h_{j+1,j} q_{j + 1}$$
and recursively knock off each coefficient in $w = A q_j$ as above (utilizing the fact that $\left\langle q_j, q_k\right\rangle_2 = \delta_{jk}$). This process, the Arnoldi iteration, is essentially modified Gram-Schmidt applied to the Krylov vectors.
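The orthonormalization loop above can be sketched in NumPy as follows (the `arnoldi` helper and the random test data are illustrative choices of ours, and we assume no breakdown, i.e. $h_{j+1,j} \neq 0$):

```python
import numpy as np

def arnoldi(A, r0, n):
    """Run n steps of Arnoldi; return Q (M x (n+1)) and Hessenberg H ((n+1) x n)."""
    M = r0.shape[0]
    Q = np.zeros((M, n + 1))
    H = np.zeros((n + 1, n))
    Q[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(n):
        w = A @ Q[:, j]
        for k in range(j + 1):            # knock off each coefficient in turn
            H[k, j] = w @ Q[:, k]
            w = w - H[k, j] * Q[:, k]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]     # assumes no breakdown
    return Q, H

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
r0 = rng.standard_normal(8)
Q, H = arnoldi(A, r0, 5)
# The relation A Q_n = Q_{n+1} H_n and orthonormality of the columns:
print(np.allclose(A @ Q[:, :5], Q @ H))   # True
print(np.allclose(Q.T @ Q, np.eye(6)))    # True
```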

----


The method we've constructed above is equivalent to a modified QR algorithm
$$AQ_n = Q_{n+1} H_n, \qquad Q_n \in \mathbf{R}^{M \times n}, H_n \in \mathbf{R}^{(n + 1) \times n}$$
(with $H_n$ upper Hessenberg) where
$$Q_j = \left[ \begin{array}{c c c c} q_0 & q_1 & \cdots & q_{j - 1}\end{array}\right].$$
More importantly, by construction we have
$$\mathcal{K}_j\left(A, r_0\right) = \operatorname{Span}\left\{q_0, q_1, \ldots, q_{j-1}\right\}.$$

----


Due to the spanning property above,
$$y \in \mathbf{R}^n \Longrightarrow Q_n y \in \mathcal{K}_n\left(A, r_0\right).$$
More importantly, we see that
$$\operatorname{Image}\left(Q_n\right) = \mathcal{K}_n\left(A, r_0\right).$$
This means that we want to find
$$\min_{y \in \mathbf{R}^n} \left\lVert r_0 - A Q_n y\right\rVert_2 = \min_{y \in \mathbf{R}^n} \left\lVert r_0 - Q_{n + 1} H_n y\right\rVert_2 = \min_{y \in \mathbf{R}^n} \left\lVert Q_{n + 1}^{\ast} r_0 - H_n y\right\rVert_2 = \min_{y \in \mathbf{R}^n} \left\lVert \|r_0\|_2 \cdot e_0 - H_n y\right\rVert_2$$
where $e_0$ is the first standard basis vector (the last equality holds since the first column of $Q_{n+1}$ is parallel to $r_0$ and the rest are perpendicular to it by construction).
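We can sanity-check this chain of equalities numerically (the `arnoldi` helper below is a sketch of the construction in these notes, not library code): the small $(n+1) \times n$ least-squares problem attains the same minimal residual norm as the full $M \times n$ one.

```python
import numpy as np

def arnoldi(A, r0, n):
    """n steps of Arnoldi: Q (M x (n+1)) with orthonormal columns, H ((n+1) x n)."""
    M = r0.shape[0]
    Q = np.zeros((M, n + 1))
    H = np.zeros((n + 1, n))
    Q[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(n):
        w = A @ Q[:, j]
        for k in range(j + 1):
            H[k, j] = w @ Q[:, k]
            w = w - H[k, j] * Q[:, k]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 10))
r0 = rng.standard_normal(10)
n = 4
Q, H = arnoldi(A, r0, n)

e0 = np.zeros(n + 1)
e0[0] = np.linalg.norm(r0)                      # ||r_0||_2 * e_0
y_small = np.linalg.lstsq(H, e0, rcond=None)[0]
y_big = np.linalg.lstsq(A @ Q[:, :n], r0, rcond=None)[0]

res_small = np.linalg.norm(e0 - H @ y_small)
res_big = np.linalg.norm(r0 - A @ Q[:, :n] @ y_big)
print(np.isclose(res_small, res_big))           # True
```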

## Computing Norm-Minimizing $y$

In order to find a minimizer of $\beta e_0 - H_n y$, we apply a Givens rotation for each of the $n$ non-zero elements below the diagonal in $H_n$. For example, to remove the $h_{1,0}$ element (in column $0$) we apply
$$G_0 = \left[ \begin{array}{c c c c c} \cos \theta_0 & - \sin \theta_0 & & & \\
\sin \theta_0 & \cos \theta_0 & & & \\
& & 1 & & \\
& & & \ddots & \\
& & & & 1
\end{array}\right] \in \mathbf{R}^{(n + 1) \times (n + 1)}$$
where
$$\cos \theta_0 = \frac{h_{0,0}}{\sqrt{h_{0,0}^2 + h_{1,0}^2}}, \qquad \sin \theta_0 = \frac{- h_{1,0}}{\sqrt{h_{0,0}^2 + h_{1,0}^2}}.$$

Writing each update as $G_0, G_1, \ldots$ we compose
$$G^{(n)} = G_{n - 1} \cdots G_1 G_0 \in \mathbf{R}^{(n + 1) \times (n + 1)}.$$
In words, this means we zero out the subdiagonal element in the first column ($G_0$), then the second column ($G_1$), and so on.
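A single rotation of this form can be sketched as follows (the `givens` helper name is ours):

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, -s], [s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return a / r, -b / r

h = np.array([3.0, 4.0])        # (h_00, h_10) in the first column
c, s = givens(h[0], h[1])
G = np.array([[c, -s], [s, c]])
print(G @ h)                     # [5. 0.]
```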

### Upper Triangular System

After removing the below-diagonal elements in the Hessenberg $H_n \in \mathbf{R}^{(n + 1) \times n}$, we have
$$G^{(n)} H_n = \left[ \begin{array}{c} R_n \\ \hline \mathbf{0}_{1 \times n} \end{array}\right]$$
for $R_n \in \mathbf{R}^{n \times n}$ upper triangular. Then we can equivalently minimize the norm of
$$G^{(n)} H_n y - \beta G^{(n)} e_0$$
since $G^{(n)}$ is unitary and the two-norm is unitarily invariant.

Noting that $e_0 \in \mathbf{R}^{n + 1}$, we write
$$\beta G^{(n)} e_0 = \left[ \begin{array}{c} g_n \\ \hline \gamma_n \end{array}\right]$$
for $g_n \in \mathbf{R}^n, \gamma_n \in \mathbf{R}$.

This is minimized when $R_n y_n = g_n$, and the minimum residual norm is $\left|\gamma_n\right|$.
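A sketch of this reduction on a random Hessenberg matrix (variable names are our own): after applying the $n$ rotations, a triangular solve on $R_n$ gives the minimizer, and $|\gamma_n|$ matches the residual reported by a direct least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
H = np.triu(rng.standard_normal((n + 1, n)), k=-1)  # upper Hessenberg
beta = 2.0
rhs = np.zeros(n + 1)
rhs[0] = beta                                        # beta * e_0

Hg, g = H.copy(), rhs.copy()
for j in range(n):                                   # apply G_0, ..., G_{n-1}
    a, b = Hg[j, j], Hg[j + 1, j]
    r = np.hypot(a, b)
    c, s = a / r, -b / r
    G = np.array([[c, -s], [s, c]])
    Hg[j:j + 2, :] = G @ Hg[j:j + 2, :]
    g[j:j + 2] = G @ g[j:j + 2]

R, gn, gamma = Hg[:n, :], g[:n], g[n]
y = np.linalg.solve(R, gn)                           # triangular solve R_n y = g_n
best = np.linalg.lstsq(H, rhs, rcond=None)[0]
lstsq_res = np.linalg.norm(rhs - H @ best)
print(np.isclose(abs(gamma), lstsq_res))             # True
```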

### Incremental Updates: $H_n$

Going from $n$ to $n + 1$, many terms are preserved and we can re-use many computations. By the process for computing successive values $q_j$ and filling out columns of $H_n$ we see
$$H_{n + 1} = \left[ \begin{array}{c | c} H_n & h^{(n + 1)} \\ \hline \mathbf{0}_{1 \times n} & h_{n + 1, n} \end{array}\right] \in \mathbf{R}^{(n + 2) \times (n + 1)}.$$
First, to re-use the existing Givens rotations we have
$$\left[ \begin{array}{c | c} G^{(n)} & \mathbf{0}_{(n + 1) \times 1} \\ \hline \mathbf{0}_{1 \times (n + 1)} & 1 \end{array}\right] H_{n + 1} = \left[ \begin{array}{c | c} R_n & \widehat{h}^{(n + 1)} \\ \hline \mathbf{0}_{1 \times n} & \nu_{n+1} \\ \hline \mathbf{0}_{1 \times n} & h_{n + 1, n} \end{array}\right]$$
where
$$G^{(n)} h^{(n + 1)} = \left[ \begin{array}{c} \widehat{h}^{(n + 1)} \\ \hline \nu_{n+1} \end{array}\right], \qquad \widehat{h}^{(n + 1)} \in \mathbf{R}^{n}, \quad \nu_{n + 1} \in \mathbf{R}.$$

From the above, we see the final Givens rotation ($G_n$) required to zero out the below-diagonal part in the final column is given by
$$\cos \theta_n = \frac{\nu_{n + 1}}{\sqrt{\nu_{n + 1}^2 + h_{n + 1, n}^2}}, \qquad \sin \theta_n = \frac{- h_{n + 1, n}}{\sqrt{\nu_{n + 1}^2 + h_{n + 1, n}^2}}.$$

With this
$$G^{(n + 1)} = \left[ \begin{array}{c | c c} I_{n \times n} & \mathbf{0}_{n \times 1} & \mathbf{0}_{n \times 1} \\ \hline \mathbf{0}_{1 \times n} & \cos \theta_n & - \sin \theta_n \\ \mathbf{0}_{1 \times n} & \sin \theta_n & \cos \theta_n \end{array}\right] \left[ \begin{array}{c | c} G^{(n)} & \mathbf{0}_{(n + 1) \times 1} \\ \hline \mathbf{0}_{1 \times (n + 1)} & 1 \end{array}\right]$$

Somewhat importantly, **we don't need to perform a matrix multiply**. We only need to update two entries:
$$\nu_{n + 1} \longmapsto \sqrt{\nu_{n + 1}^2 + h_{n + 1, n}^2}, \qquad h_{n + 1, n} \longmapsto 0.$$

Given all this we have
$$R_{n + 1} = \left[ \begin{array}{c | c} R_n & \widehat{h}^{(n + 1)} \\ \hline \mathbf{0}_{1 \times n} & \sqrt{\nu_{n + 1}^2 + h_{n + 1, n}^2} \end{array}\right].$$

### Incremental Updates: $\widehat{h}^{(n + 1)}$

We have to store each $q_j \in \mathbf{R}^M$ in order to compute
$$h^{(n + 1)} = \left[ \begin{array}{c} h_{0, n} \\ h_{1, n} \\ \vdots \\ h_{n, n} \end{array}\right] \in \mathbf{R}^{n + 1} \quad \text{and} \quad h_{n + 1, n} \in \mathbf{R}.$$

If we keep around the values
$$c_j = \cos \theta_j, \quad s_j = \sin \theta_j$$
then we can transform
$$\left[ \begin{array}{c} h_{0, n} \\ h_{1, n} \\ \vdots \\ h_{n, n} \end{array}\right] \longmapsto \left[ \begin{array}{c} c_0 h_{0, n} - s_0 h_{1, n} \\ s_0 h_{0, n} + c_0 h_{1, n} \\ \vdots \\ h_{n, n} \end{array}\right] = \left[ \begin{array}{c} A \\ B \\ C \\ \vdots \end{array}\right] \longmapsto \left[ \begin{array}{c} A \\ c_1 B - s_1 C \\ s_1 B + c_1 C \\ \vdots \end{array}\right]$$

This allows us to **store** $G^{(n)}$ with $\mathcal{O}\left(n\right)$ instead of $\mathcal{O}\left(n^2\right)$ entries and to **apply** $G^{(n)}$ to a **vector** in $\mathcal{O}\left(n\right)$ instead of $\mathcal{O}\left(n^2\right)$ operations.
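Storing only the pairs $(c_j, s_j)$, the transform above applies all previous rotations to a new column in linear time. A sketch (the helper name is ours), checked against explicit rotation matrices:

```python
import numpy as np

def apply_stored_rotations(cs, h):
    """Apply G_{k-1} ... G_1 G_0 (stored as (c_j, s_j) pairs) to the vector h."""
    h = h.copy()
    for j, (c, s) in enumerate(cs):   # each rotation touches entries j, j + 1 only
        hj, hj1 = h[j], h[j + 1]
        h[j] = c * hj - s * hj1
        h[j + 1] = s * hj + c * hj1
    return h

# Compare against the explicit (n+1) x (n+1) composed rotation matrix.
rng = np.random.default_rng(3)
n = 4
cs = []
G_full = np.eye(n + 1)
for j in range(n - 1):
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    cs.append((c, s))
    Gj = np.eye(n + 1)
    Gj[j:j + 2, j:j + 2] = [[c, -s], [s, c]]
    G_full = Gj @ G_full              # G_0 is applied first

h = rng.standard_normal(n + 1)
print(np.allclose(apply_stored_rotations(cs, h), G_full @ h))  # True
```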

### Incremental Updates: $g_n$

With $e_0 \in \mathbf{R}^{n + 1}$ as above, write
$$\beta G^{(n)} e_0 = \left[ \begin{array}{c} g_n \\ \hline \gamma_n \end{array}\right], \qquad
\beta G^{(n + 1)} \left[ \begin{array}{c} e_0 \\ \hline 0 \end{array}\right] = \left[ \begin{array}{c} g_{n + 1} \\ \hline \gamma_{n + 1} \end{array}\right]$$
for $g_n \in \mathbf{R}^n, \gamma_n \in \mathbf{R}$.

By the above definition of $G^{(n + 1)}$,
$$\begin{align*}
\left[ \begin{array}{c} g_{n + 1} \\ \hline \gamma_{n + 1} \end{array}\right] &= \beta \left[ \begin{array}{c | c c} I_{n \times n} & \mathbf{0}_{n \times 1} & \mathbf{0}_{n \times 1} \\ \hline \mathbf{0}_{1 \times n} & \cos \theta_n & - \sin \theta_n \\ \mathbf{0}_{1 \times n} & \sin \theta_n & \cos \theta_n \end{array}\right] \left[ \begin{array}{c | c} G^{(n)} & \mathbf{0}_{(n + 1) \times 1} \\ \hline \mathbf{0}_{1 \times (n + 1)} & 1 \end{array}\right] \left[ \begin{array}{c} e_0 \\ \hline 0 \end{array}\right] \\
&= \left[ \begin{array}{c | c c} I_{n \times n} & \mathbf{0}_{n \times 1} & \mathbf{0}_{n \times 1} \\ \hline \mathbf{0}_{1 \times n} & \cos \theta_n & - \sin \theta_n \\ \mathbf{0}_{1 \times n} & \sin \theta_n & \cos \theta_n \end{array}\right] \left[ \begin{array}{c} g_n \\ \hline \gamma_n \\ \hline 0 \end{array}\right] \\
&= \left[ \begin{array}{c} g_n \\ \hline \cos \theta_n \, \gamma_n \\ \hline \sin \theta_n \, \gamma_n \end{array}\right]
\end{align*}$$

So we see
$$g_{n + 1} = \left[ \begin{array}{c} g_n \\ \hline \cos \theta_n \, \gamma_n \end{array}\right], \qquad \gamma_{n + 1} = \sin \theta_n \, \gamma_n.$$

# Huge Win

From the above, we see that with linear work per step we can update our residual norm, never having to actually compute $y_n$ or $x_n$.

Once we have
$$\left|\gamma_n\right| \leq \varepsilon_{\text{threshold}}$$
we can then compute
$$y_n = R_n^{-1} g_n.$$

Recall that we used
$$b - A x_n = r_0 - A z_n = r_0 - A Q_n y_n = r_0 - Q_{n + 1} H_n y_n$$
so we get
$$x_n = x_0 + z_n = x_0 + Q_n R_n^{-1} g_n.$$
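Putting all the pieces together, here is a compact GMRES sketch built from the recurrences above (assumptions of ours: dense NumPy arrays, a crude guard against Arnoldi breakdown, and names of our choosing; production code would reach for `scipy.sparse.linalg.gmres`):

```python
import numpy as np

def gmres(A, b, x0=None, tol=1e-10, max_iter=None):
    M = b.shape[0]
    max_iter = max_iter or M
    x0 = np.zeros(M) if x0 is None else x0
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    Q = np.zeros((M, max_iter + 1))
    H = np.zeros((max_iter + 1, max_iter))
    Q[:, 0] = r0 / beta
    cs = []                             # stored (cos, sin) pairs
    g = np.zeros(max_iter + 1)
    g[0] = beta                         # beta * e_0
    for n in range(max_iter):
        # Arnoldi step: new column of H, new basis vector q_{n+1}.
        w = A @ Q[:, n]
        for k in range(n + 1):
            H[k, n] = w @ Q[:, k]
            w = w - H[k, n] * Q[:, k]
        H[n + 1, n] = np.linalg.norm(w)
        if H[n + 1, n] > 1e-14:         # guard against (lucky) breakdown
            Q[:, n + 1] = w / H[n + 1, n]
        # Apply the stored rotations to the new column: O(n) work.
        for j, (c, s) in enumerate(cs):
            hj, hj1 = H[j, n], H[j + 1, n]
            H[j, n] = c * hj - s * hj1
            H[j + 1, n] = s * hj + c * hj1
        # New rotation zeroing H[n + 1, n]; incremental updates of R and g.
        nu, h = H[n, n], H[n + 1, n]
        r = np.hypot(nu, h)
        c, s = nu / r, -h / r
        cs.append((c, s))
        H[n, n], H[n + 1, n] = r, 0.0
        g[n + 1] = s * g[n]             # gamma_{n+1} = sin(theta_n) * gamma_n
        g[n] = c * g[n]
        if abs(g[n + 1]) <= tol:        # |gamma_{n+1}| is the residual norm
            break
    m = n + 1
    y = np.linalg.solve(H[:m, :m], g[:m])   # solve R_m y = g_m
    return x0 + Q[:, :m] @ y

rng = np.random.default_rng(4)
A = rng.standard_normal((20, 20)) + 20 * np.eye(20)
b = rng.standard_normal(20)
x = gmres(A, b)
print(np.allclose(x, np.linalg.solve(A, b)))  # True
```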

### Bad Idea; Incremental Updates: $y_n$

By the above observation, we don't need to compute $y_n$ at every step. We'll see here that doing so would require a **new triangular solve**, which requires $\mathcal{O}\left(n^2\right)$ work.

----

Now that we see $g_{n + 1}$ contains $g_n$ and $R_{n + 1}$ contains $R_n$, it appears that
$$y_n = R_n^{-1} g_n, \qquad y_{n + 1} = R_{n + 1}^{-1} g_{n + 1}$$
will be closely related.

To solve the new system, we write
$$\left[ \begin{array}{c | c} R_n & \widehat{h}^{(n + 1)} \\ \hline \mathbf{0}_{1 \times n} & \sqrt{\nu_{n + 1}^2 + h_{n + 1, n}^2} \end{array}\right] \left[ \begin{array}{c} \widehat{y_n} \\ \hline \alpha_n \end{array}\right] = \left[ \begin{array}{c} g_n \\ \hline \cos \theta_n \, \gamma_n \end{array}\right]$$
for $\alpha_n \in \mathbf{R}, \widehat{y_n} \in \mathbf{R}^n$.

This means that
$$\alpha_n = \frac{\cos \theta_n \, \gamma_n}{\sqrt{\nu_{n + 1}^2 + h_{n + 1, n}^2}}$$
and
$$R_n \, \widehat{y_n} = g_n - \alpha_n \widehat{h}^{(n + 1)} \Longrightarrow R_n \left(\widehat{y_n} - y_n\right) = -\alpha_n \widehat{h}^{(n + 1)}.$$

As mentioned above, computing the update $\boxed{\widehat{y_n} - y_n}$ requires a triangular solve.
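A quick illustration of why the minimizers are not simply nested (test data of our choosing): the leading entries of $y_{n+1}$ generally differ from $y_n$, so extending $y_n$ by one entry is not enough.

```python
import numpy as np

rng = np.random.default_rng(5)
H = np.triu(rng.standard_normal((7, 6)), k=-1)  # Hessenberg, as in the notes
rhs = np.zeros(7)
rhs[0] = 1.0                                     # beta * e_0 with beta = 1

y4 = np.linalg.lstsq(H[:5, :4], rhs[:5], rcond=None)[0]
y5 = np.linalg.lstsq(H[:6, :5], rhs[:6], rcond=None)[0]
# Generically the first four entries of y5 do NOT coincide with y4.
print(np.allclose(y5[:4], y4))
```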