{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# GMRES Notes and Code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let $A \\in \\mathbf{R}^{M \\times M}$ be an arbitrary matrix and $b \\in \\mathbf{R}^M$ an arbitrary vector. We want to find a solution to\n",
"$$Ax = b$$\n",
"for $x \\in \\mathbf{R}^M$.\n",
"\n",
"Directly this means $x = A^{-1} b$.\n",
"\n",
"Since $A$ must have a minimal polynomial $m_A(t)$ such that\n",
"$$m_A(A) = \\mathbf{0}_{M \\times M}$$\n",
"If we could find a polynomial $I_A(t)$ such that\n",
"$$t I_A(t) \\equiv 1 \\bmod{m_A(t)}$$\n",
"then\n",
"$$A^{-1} = I_A(A)$$\n",
"since\n",
"$$A I_A(A) = I_M + q(A) m_A(A) = I_M$$\n",
"for some quotient polynomial $q(t)$.\n",
"\n",
"**NOTE**: This is problematic if $m_A(0) = 0$, so we assume $A$ is invertible (it's not too harmful if we hope to uniquely solve $Ax = b$). If not, we can perfom the Euclidean algorithm on $t, m_A(t)$ to find\n",
"$$I_A(t) \\cdot t + q(t) \\cdot m_A(t) = 1$$\n",
"since $t$ and $m_A(t)$ must be coprime when $m_A(0) \\neq 0$."
]
},
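{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this concrete, here is a minimal numerical sketch (assuming `numpy`; the matrix, vector, and seed are arbitrary). It uses the characteristic polynomial in place of the minimal polynomial (the latter divides the former, so Cayley-Hamilton still applies) to express $A^{-1} b$ as a polynomial in $A$ applied to $b$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Sketch: express A^{-1} b as a polynomial in A applied to b. We use\n",
"# the characteristic polynomial p_A (the minimal polynomial divides\n",
"# it), so Cayley-Hamilton gives\n",
"#     A^{-1} = -(A^{M-1} + c_{M-1} A^{M-2} + ... + c_1 I) / c_0.\n",
"rng = np.random.default_rng(42)  # arbitrary small example\n",
"M = 4\n",
"A = rng.standard_normal((M, M))\n",
"b = rng.standard_normal(M)\n",
"\n",
"coeffs = np.poly(A)  # [1, c_{M-1}, ..., c_1, c_0] of p_A(t)\n",
"I_A = np.zeros_like(A)\n",
"for c in coeffs[:-1]:  # Horner evaluation of I_A(A)\n",
"    I_A = I_A @ A + c * np.eye(M)\n",
"I_A /= -coeffs[-1]\n",
"\n",
"print(np.allclose(I_A @ b, np.linalg.solve(A, b)))  # expect: True"
]
},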
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In light of the above, we see that\n",
"$$A^{-1} b \\in \\mathcal{K}_M\\left(A, b\\right) = \\operatorname{Span}\\left\\{b, Ab, A^2 b, \\ldots, A^{M - 1}b \\right\\}.$$\n",
"\n",
"In order to approximate the solution, we start with a guess $x_0$ (typically $x_0 = \\mathbf{0}_M$, but we may have extra information).\n",
"\n",
"In this generic case we want\n",
"$$A(x_0 + z) = b \\Longleftrightarrow A z = b - A x_0 = r_0$$\n",
"so that $z \\in \\mathcal{K}_M\\left(A, r_0\\right)$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Approximations\n",
"\n",
"Then we want to find\n",
"$$x_k = x_0 + z_k, \\qquad z_k \\in \\mathcal{K}_k\\left(A, r_0\\right)$$\n",
"such that\n",
"$$\\left\\lVert b - Ax_k\\right\\rVert_2 = \\min_{z_k \\in \\mathcal{K}_k\\left(A, r_0\\right)} \\left\\lVert r_0 - A z_k\\right\\rVert_2.$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Restarts\n",
"\n",
"Note that if we restart after $m$ steps then\n",
"$$x_0 \\in \\mathcal{K}_m\\left(A, b\\right) \\Longrightarrow r_0 \\in \\mathcal{K}_{m + 1}\\left(A, b\\right) \\Longrightarrow \\mathcal{K}_{\\ell}\\left(A, r_0\\right) \\subseteq \\mathcal{K}_{m + \\ell}\\left(A, b\\right).$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Algorithm\n",
"\n",
"We start with\n",
"$$q_0 = \\frac{r_0}{\\left\\lVert r_0 \\right\\rVert_2}.$$\n",
"To find $q_1$ with $\\left\\langle q_0, q_1\\right\\rangle_2 = 0$ we write\n",
"$$A q_0 = h_{00} q_0 + h_{10} q_1$$\n",
"and solve via\n",
"$$\\begin{align*}\n",
"w &= A q_0 \\\\\n",
"\\Rightarrow h_{00} &= \\left\\langle w, q_0\\right\\rangle_2 \\\\\n",
"\\Rightarrow w &= w - h_{00} q_0 \\\\\n",
"\\Rightarrow h_{10} &= \\|w\\|_2 \\\\\n",
"\\Rightarrow q_1 &= \\frac{w}{h_{10}}\n",
"\\end{align*}$$\n",
"\n",
"After each step we write\n",
"$$A q_j = h_{0j} q_0 + h_{1j} q_1 + \\cdots + h_{j+1,j} q_{j + 1}$$\n",
"and recursively knock off each coefficient in $w = A q_j$ as above (utilizing the fact that $\\left\\langle q_j, q_k\\right\\rangle_2 = \\delta_{jk}$). (This process is very similar to modified Gram-Schmidt.)"
]
},
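{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of this orthogonalization process (the Arnoldi iteration) assuming `numpy`; the function name `arnoldi` is just for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def arnoldi(A, r0, n):\n",
"    # Build Q = [q_0 | ... | q_n] with orthonormal columns and the\n",
"    # (n + 1) x n Hessenberg H such that A Q[:, :n] = Q H.\n",
"    M = A.shape[0]\n",
"    Q = np.zeros((M, n + 1))\n",
"    H = np.zeros((n + 1, n))\n",
"    Q[:, 0] = r0 / np.linalg.norm(r0)\n",
"    for j in range(n):\n",
"        w = A @ Q[:, j]\n",
"        for k in range(j + 1):  # knock off each coefficient in turn\n",
"            H[k, j] = w @ Q[:, k]\n",
"            w -= H[k, j] * Q[:, k]\n",
"        H[j + 1, j] = np.linalg.norm(w)\n",
"        Q[:, j + 1] = w / H[j + 1, j]\n",
"    return Q, H"
]
},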
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The method we've construced above is equivalent to a modified QR algorithm\n",
"$$AQ_n = Q_{n+1} H_n, \\qquad Q_n \\in \\mathbf{R}^{M \\times n}, H_n \\in \\mathbf{R}^{(n + 1) \\times n}$$\n",
"where\n",
"$$Q_j = \\left[ \\begin{array}{c c c c} q_0 & q_1 & \\cdots & q_{j - 1}\\end{array}\\right].$$\n",
"More importantly, by construction we have\n",
"$$\\mathcal{K}_j\\left(A, r_0\\right) = \\operatorname{Span}\\left\\{q_0, q_1, \\ldots, q_{j-1}\\right\\}$$"
]
},
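{
"cell_type": "markdown",
"metadata": {},
"source": [
"Continuing the sketch above (this cell reuses `arnoldi()`; the test matrix and sizes are arbitrary), we can numerically verify both the factorization and the spanning property:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rng = np.random.default_rng(7)  # arbitrary example, reusing arnoldi()\n",
"M, n = 10, 5\n",
"A = rng.standard_normal((M, M))\n",
"r0 = rng.standard_normal(M)\n",
"Q, H = arnoldi(A, r0, n)\n",
"\n",
"# A Q_n = Q_{n+1} H_n:\n",
"print(np.allclose(A @ Q[:, :n], Q @ H))  # expect: True\n",
"\n",
"# The columns of the Krylov matrix lie in the span of q_0, ..., q_{n-1},\n",
"# i.e. projecting them onto that span changes nothing:\n",
"K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(n)])\n",
"print(np.allclose(Q[:, :n] @ (Q[:, :n].T @ K), K))  # expect: True"
]
},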
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Due to the spanning property above\n",
"$$y \\in \\mathbf{R}^n \\Longrightarrow Q_n y \\in \\mathcal{K}_n\\left(A, r_0\\right).$$\n",
"More importantly, we see that \n",
"$$\\operatorname{Image}\\left(Q_n\\right) = \\mathcal{K}_n\\left(A, r_0\\right).$$\n",
"This means that we want to find\n",
"$$\\min_{y \\in \\mathbf{R}^n} \\left\\lVert r_0 - A Q_n y\\right\\rVert_2 = \\min_{y \\in \\mathbf{R}^n} \\left\\lVert r_0 - Q_{n + 1} H_n y\\right\\rVert_2 = \\min_{y \\in \\mathbf{R}^n} \\left\\lVert Q_{n + 1}^{\\ast} r_0 - H_n y\\right\\rVert_2 = \\min_{y \\in \\mathbf{R}^n} \\left\\lVert \\|r_0\\|_2 \\cdot e_0 - H_n y\\right\\rVert_2$$\n",
"where $e_0$ is the first standard basis vector (this is since the first column of $Q_{n+1}$ is parallel to $r_0$ and the rest are perpendicular by construction)."
]
},
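{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check (again a sketch, reusing `arnoldi()` and the test matrix from above, and taking $x_0 = \\mathbf{0}_M$ so that $r_0 = b$), we can solve the small least-squares problem for each $n$ and watch the true residual shrink:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = rng.standard_normal(M)\n",
"r0 = b  # x_0 = 0, so r_0 = b\n",
"beta = np.linalg.norm(r0)\n",
"\n",
"for n_curr in range(1, M + 1):\n",
"    Q, H = arnoldi(A, r0, n_curr)  # wasteful: rebuilt from scratch each time\n",
"    rhs = np.zeros(n_curr + 1)\n",
"    rhs[0] = beta  # beta * e_0\n",
"    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)\n",
"    x = Q[:, :n_curr] @ y  # x_n = x_0 + Q_n y\n",
"    print(n_curr, np.linalg.norm(b - A @ x))"
]
},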
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing Norm-Minimizing $y$\n",
"\n",
"In order to find a minimizer of $\\beta e_0 - H_n y$, we apply a Givens rotation for each of the $n$ non-zero elements below the diagonal in $H_n$. For example, to remove the $h_{1,0}$ element (in column $0$) we apply\n",
"$$G_0 = \\left[ \\begin{array}{c c c c c} \\cos \\theta_0 & - \\sin \\theta_0 & & & \\\\\n",
"\\sin \\theta_0 & \\cos \\theta_0 & & & \\\\\n",
"& & 1 & & \\\\\n",
"& & & \\ddots & \\\\\n",
"& & & & 1\n",
"\\end{array}\\right] \\in \\mathbf{R}^{(n + 1) \\times (n + 1)}$$\n",
"where\n",
"$$\\cos \\theta_0 = \\frac{h_{0,0}}{\\sqrt{h_{0,0}^2 + h_{1,0}^2}}, \\qquad \\sin \\theta_0 = \\frac{- h_{1,0}}{\\sqrt{h_{0,0}^2 + h_{1,0}^2}}$$\n",
"\n",
"Writing each update as $G_0, G_1, \\ldots$ we compose\n",
"$$G^{(n)} = G_{n - 1} \\cdots G_1 G_0 \\in \\mathbf{R}^{(n + 1) \\times (n + 1)}.$$\n",
"In words, this means we zero out the subdiagonal element in the first column ($G_0$), then the second column ($G_1$) and so on."
]
},
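{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of constructing these rotations and applying a stored sequence of them (keeping only the pairs $(\\cos \\theta_j, \\sin \\theta_j)$, a point we return to below); the helper names are just for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def givens(a, b):\n",
"    # Return (c, s) such that [[c, -s], [s, c]] @ [a, b] = [r, 0].\n",
"    r = np.hypot(a, b)\n",
"    return a / r, -b / r\n",
"\n",
"def apply_rotations(cs, v):\n",
"    # Apply G_0, then G_1, ... to v; rotation j touches entries j, j + 1.\n",
"    v = v.copy()\n",
"    for j, (c, s) in enumerate(cs):\n",
"        v[j], v[j + 1] = c * v[j] - s * v[j + 1], s * v[j] + c * v[j + 1]\n",
"    return v\n",
"\n",
"c, s = givens(3.0, 4.0)\n",
"print(np.array([[c, -s], [s, c]]) @ np.array([3.0, 4.0]))  # approx. [5, 0]"
]
},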
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Upper Triangular System\n",
"\n",
"After removing the below diagonal elements in the Hessenberg $H_n \\in \\mathbf{R}^{(n + 1) \\times n}$, we have\n",
"$$G^{(n)} H_n = \\left[ \\begin{array}{c} R_n \\\\ \\hline \\mathbf{0}_{1 \\times n} \\end{array}\\right]$$\n",
"for $R_n \\in \\mathbf{R}^{n \\times n}$ and upper triangular. Then we want to minimize the error in\n",
"$$G^{(n)} H_n y - \\beta G^{(n)} e_0$$\n",
"since $G^{(n)}$ is unitary and the two norm is unitarily invariant.\n",
"\n",
"Noting that $e_0 \\in \\mathbf{R}^{n + 1}$, we write\n",
"$$\\beta G^{(n)} e_0 = \\left[ \\begin{array}{c} g_n \\\\ \\hline \\gamma_n \\end{array}\\right]$$\n",
"for $g_n \\in \\mathbf{R}^n, \\gamma_n \\in \\mathbf{R}$.\n",
"\n",
"This is minimized when $R_n y_n = g_n$ and the minimum residual is $\\gamma_n$."
]
},
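{
"cell_type": "markdown",
"metadata": {},
"source": [
"Putting these two steps together (a sketch assuming `scipy` for the triangular solve, and reusing `arnoldi()`, `givens()`, $A$, $b$, and $r_0$ from the cells above): rotate $H_n$ and $\\beta e_0$, then back-substitute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy.linalg import solve_triangular\n",
"\n",
"n_curr = 5\n",
"Q, H = arnoldi(A, r0, n_curr)\n",
"R = H.copy()\n",
"rhs = np.zeros(n_curr + 1)\n",
"rhs[0] = np.linalg.norm(r0)  # beta * e_0\n",
"for j in range(n_curr):  # zero out each subdiagonal entry in turn\n",
"    c, s = givens(R[j, j], R[j + 1, j])\n",
"    R[j:j + 2, :] = np.array([[c, -s], [s, c]]) @ R[j:j + 2, :]\n",
"    rhs[j], rhs[j + 1] = c * rhs[j] - s * rhs[j + 1], s * rhs[j] + c * rhs[j + 1]\n",
"\n",
"g, gamma = rhs[:n_curr], rhs[n_curr]\n",
"y = solve_triangular(R[:n_curr, :], g)  # R_n y_n = g_n\n",
"x = Q[:, :n_curr] @ y\n",
"print(abs(gamma), np.linalg.norm(b - A @ x))  # minimum residual, two ways"
]
},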
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Incremental Updates: $H_n$\n",
"\n",
"Going from $n$ to $n + 1$, many terms are preserved and we can re-use many computations. By the process for computing successive values $q_j$ and filling out columns of $H_n$ we see\n",
"$$H_{n + 1} = \\left[ \\begin{array}{c | c} H_n & h^{(n + 1)} \\\\ \\hline \\mathbf{0}_{1 \\times n} & h_{n + 1, n} \\end{array}\\right] \\in \\mathbf{R}^{(n + 2) \\times (n + 1)}$$\n",
"First to re-use the existing givens rotations we have\n",
"$$\\left[ \\begin{array}{c | c} G^{(n)} & \\mathbf{0}_{(n + 1) \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times (n + 1)} & 1 \\end{array}\\right] H_{n + 1} = \\left[ \\begin{array}{c | c} R_n & \\widehat{h}^{(n + 1)} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\nu_{n+1} \\\\ \\hline \\mathbf{0}_{1 \\times n} & h_{n + 1, n} \\end{array}\\right]$$\n",
"where\n",
"$$G^{(n)} h^{(n + 1)} = \\left[ \\begin{array}{c} \\widehat{h}^{(n + 1)} \\\\ \\hline \\nu_{n+1} \\end{array}\\right], \\qquad \\widehat{h}^{(n + 1)} \\in \\mathbf{R}^{n}, \\quad \\nu_{n + 1} \\in \\mathbf{R}.$$\n",
"\n",
"From the above, we see the final Givens rotation ($G_n$) required to zero out the below diagonal part in the final column is given by\n",
"$$\\cos \\theta_n = \\frac{\\nu_{n + 1}}{\\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2}}, \\qquad \\sin \\theta_n = \\frac{- h_{n + 1, n}}{\\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2}}$$\n",
"\n",
"With this\n",
"$$G^{(n + 1)} = \\left[ \\begin{array}{c | c c} I_{n \\times n} & \\mathbf{0}_{n \\times 1} & \\mathbf{0}_{n \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\cos \\theta_n & - \\sin \\theta_n \\\\ \\mathbf{0}_{1 \\times n} & \\sin \\theta_n & \\cos \\theta_n \\end{array}\\right] \\left[ \\begin{array}{c | c} G^{(n)} & \\mathbf{0}_{(n + 1) \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times (n + 1)} & 1 \\end{array}\\right]$$\n",
"\n",
"Somewhat importantly, **we don't need to perform a matrix multiply**. We only need to update two entries:\n",
"$$\\nu_{n + 1} \\longmapsto \\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2}, \\qquad h_{n + 1, n} \\longmapsto 0.$$\n",
"\n",
"Given all this we have\n",
"$$R_{n + 1} = \\left[ \\begin{array}{c | c} R_n & \\widehat{h}^{(n + 1)} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2} \\end{array}\\right].$$"
]
},
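{
"cell_type": "markdown",
"metadata": {},
"source": [
"One step of this incremental update, as a sketch (the helper name `extend` is illustrative; `givens()`, `apply_rotations()`, `H`, `R`, and `n_curr` come from the cells above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def extend(cs, R_prev, h_col, h_sub):\n",
"    # h_col = h^(n+1) in R^{n+1}, h_sub = h_{n+1,n}; cs stores G^(n).\n",
"    v = apply_rotations(cs, np.append(h_col, h_sub))\n",
"    c, s = givens(v[-2], v[-1])  # the one new rotation G_n\n",
"    v[-2], v[-1] = np.hypot(v[-2], v[-1]), 0.0  # only two entries change\n",
"    n = R_prev.shape[0]\n",
"    R_new = np.zeros((n + 1, n + 1))  # R_{n+1}: R_n plus one new column\n",
"    R_new[:n, :n] = R_prev\n",
"    R_new[:, n] = v[:-1]\n",
"    return cs + [(c, s)], R_new\n",
"\n",
"# Rebuild R_n one column at a time; compare with the batch-rotated R.\n",
"cs2, R2 = [], np.zeros((0, 0))\n",
"for j in range(n_curr):\n",
"    cs2, R2 = extend(cs2, R2, H[:j + 1, j], H[j + 1, j])\n",
"print(np.allclose(R2, np.triu(R[:n_curr, :])))  # expect: True"
]
},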
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Incremental Updates: $\\widehat{h}^{(n + 1)}$\n",
"\n",
"We have to store each $q_j \\in \\mathbf{R}^M$ in order to compute:\n",
"$$h^{(n + 1)} = \\left[ \\begin{array}{c} h_{0, n} \\\\ h_{1, n} \\\\ \\vdots \\\\ h_{n, n} \\end{array}\\right] \\in \\mathbf{R}^{n + 1} \\quad \\text{and} \\quad h_{n + 1, n} \\in \\mathbf{R}$$\n",
"\n",
"If we keep around the values\n",
"$$c_j = \\cos \\theta_j, \\quad s_j = \\sin \\theta_j$$\n",
"then we can transform\n",
"$$\\left[ \\begin{array}{c} h_{0, n} \\\\ h_{1, n} \\\\ \\vdots \\\\ h_{n, n} \\end{array}\\right] \\longmapsto \\left[ \\begin{array}{c} c_0 h_{0, n} - s_0 h_{1, n} \\\\ s_0 h_{0, n} + c_0 h_{1, n} \\\\ \\vdots \\\\ h_{n, n} \\end{array}\\right] = \\left[ \\begin{array}{c} A \\\\ B \\\\ C \\\\ \\vdots \\end{array}\\right] \\longmapsto \\left[ \\begin{array}{c} A \\\\ c_1 B - s_1 C \\\\ s_1 B + c_1 C \\\\ \\vdots \\end{array}\\right]$$\n",
"\n",
"This allows us to **store** $G^{(n)}$ with $\\mathcal{O}\\left(n\\right)$ instead of $\\mathcal{O}\\left(n^2\\right)$ entries and to **apply** $G^{(n)}$ to a **vector** in $\\mathcal{O}\\left(n\\right)$ instead of $\\mathcal{O}\\left(n^2\\right)$ operations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Incremental Updates: $g_n$\n",
"\n",
"By the above definition $e_0 \\in \\mathbf{R}^{n + 1}$\n",
"$$\\beta G^{(n)} e_0 = \\left[ \\begin{array}{c} g_n \\\\ \\hline \\gamma_n \\end{array}\\right], \\qquad\n",
"\\beta G^{(n + 1)} \\left[ \\begin{array}{c} e_0 \\\\ \\hline 0 \\end{array}\\right] = \\left[ \\begin{array}{c} g_{n + 1} \\\\ \\hline \\gamma_{n + 1} \\end{array}\\right].$$\n",
"for $g_n \\in \\mathbf{R}^n, \\gamma_n \\in \\mathbf{R}$.\n",
"\n",
"By the above definition of $G^{(n + 1)}$\n",
"$$\\begin{align*}\n",
"\\left[ \\begin{array}{c} g_{n + 1} \\\\ \\hline \\gamma_{n + 1} \\end{array}\\right] &= \\beta \\left[ \\begin{array}{c | c c} I_{n \\times n} & \\mathbf{0}_{n \\times 1} & \\mathbf{0}_{n \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\cos \\theta_n & - \\sin \\theta_n \\\\ \\mathbf{0}_{1 \\times n} & \\sin \\theta_n & \\cos \\theta_n \\end{array}\\right] \\left[ \\begin{array}{c | c} G^{(n)} & \\mathbf{0}_{(n + 1) \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times (n + 1)} & 1 \\end{array}\\right] \\left[ \\begin{array}{c} e_0 \\\\ \\hline 0 \\end{array}\\right] \\\\\n",
"&= \\left[ \\begin{array}{c | c c} I_{n \\times n} & \\mathbf{0}_{n \\times 1} & \\mathbf{0}_{n \\times 1} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\cos \\theta_n & - \\sin \\theta_n \\\\ \\mathbf{0}_{1 \\times n} & \\sin \\theta_n & \\cos \\theta_n \\end{array}\\right] \\left[ \\begin{array}{c} g_n \\\\ \\hline \\gamma_n \\\\ \\hline 0 \\end{array}\\right] \\\\\n",
"&= \\left[ \\begin{array}{c} g_n \\\\ \\hline \\cos \\theta_n \\, \\gamma_n \\\\ \\hline \\sin \\theta_n \\, \\gamma_n \\end{array}\\right]\n",
"\\end{align*}$$\n",
"\n",
"So we see\n",
"$$g_{n + 1} = \\left[ \\begin{array}{c} g_n \\\\ \\hline \\cos \\theta_n \\, \\gamma_n \\end{array}\\right], \\qquad \\gamma_{n + 1} = \\sin \\theta_n \\, \\gamma_n.$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Huge Win\n",
"\n",
"From the above, we see that with linear work we can update our residual error, never having to actually compute $y_n$ or $x_n$.\n",
"\n",
"Once we have\n",
"$$\\left|\\gamma_n\\right| \\leq \\varepsilon_{\\text{threshold}}$$\n",
"we can then compute\n",
"$$y_n = R_n^{-1} g_n$$\n",
"\n",
"Recall that we used\n",
"$$b - A x_k = r_0 - A z_k = r_0 - A Q_n y = r_0 - Q_{n + 1} H_n y$$\n",
"so we get\n",
"$$x_n = x_0 + z_n = x_0 + Q_n R_n^{-1} g_n.$$"
]
},
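{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assembling all of the pieces: below is a complete GMRES sketch in `numpy` (unpreconditioned, no restarts; the names, tolerance, and test problem are illustrative, not production code). The residual norm $\\left|\\gamma_n\\right|$ is tracked incrementally, and $y_n$, $x_n$ are only formed at the end:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def gmres(A, b, x0=None, tol=1e-10, max_iter=None):\n",
"    # Minimal GMRES sketch: Arnoldi plus incremental Givens updates, so\n",
"    # the residual norm |gamma| costs O(n) extra work per step.\n",
"    M = A.shape[0]\n",
"    max_iter = M if max_iter is None else max_iter\n",
"    x0 = np.zeros(M) if x0 is None else x0\n",
"    r0 = b - A @ x0\n",
"    beta = np.linalg.norm(r0)\n",
"    Q = [r0 / beta]  # orthonormal Krylov basis q_0, q_1, ...\n",
"    R = []           # columns of the triangular factor R_n\n",
"    cs = []          # stored rotation pairs (cos theta_j, sin theta_j)\n",
"    g = []           # entries of the rotated right-hand side g_n\n",
"    gamma = beta     # current residual norm (up to sign)\n",
"    for n in range(max_iter):\n",
"        # Arnoldi step: new Hessenberg column and basis vector q_{n+1}.\n",
"        w = A @ Q[n]\n",
"        h = np.zeros(n + 2)\n",
"        for k in range(n + 1):\n",
"            h[k] = w @ Q[k]\n",
"            w -= h[k] * Q[k]\n",
"        h[n + 1] = np.linalg.norm(w)\n",
"        Q.append(w / h[n + 1])\n",
"        # Apply the stored rotations to the new column.\n",
"        for j, (c, s) in enumerate(cs):\n",
"            h[j], h[j + 1] = c * h[j] - s * h[j + 1], s * h[j] + c * h[j + 1]\n",
"        # One new rotation zeros out the subdiagonal entry h_{n+1,n};\n",
"        # only two entries of the column change.\n",
"        r = np.hypot(h[n], h[n + 1])\n",
"        c, s = h[n] / r, -h[n + 1] / r\n",
"        cs.append((c, s))\n",
"        h[n], h[n + 1] = r, 0.0\n",
"        R.append(h[:n + 1])\n",
"        # g_{n+1} = [g_n; cos(theta_n) gamma_n], gamma_{n+1} = sin(theta_n) gamma_n.\n",
"        g.append(c * gamma)\n",
"        gamma = s * gamma\n",
"        if abs(gamma) <= tol:\n",
"            break\n",
"    # Only now form y_n (back-substitution) and x_n = x_0 + Q_n y_n.\n",
"    n = len(g)\n",
"    y = np.zeros(n)\n",
"    for i in range(n - 1, -1, -1):\n",
"        y[i] = (g[i] - sum(R[j][i] * y[j] for j in range(i + 1, n))) / R[i][i]\n",
"    return x0 + np.column_stack(Q[:n]) @ y\n",
"\n",
"# Usage: a well-conditioned random test problem (illustrative only).\n",
"rng = np.random.default_rng(0)\n",
"A = rng.standard_normal((30, 30)) + 20 * np.eye(30)  # eigenvalues away from 0\n",
"b = rng.standard_normal(30)\n",
"x = gmres(A, b)\n",
"print(np.linalg.norm(b - A @ x))  # expect roughly tol or smaller"
]
},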
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bad Idea; Incremental Updates: $y_n$\n",
"\n",
"By the above observation, we don't need to compute $y_n$ at every step. We'll see here that doing so would require a **new triangular solve**, which requires $\\mathcal{O}\\left(n^2\\right)$ work.\n",
"\n",
"----\n",
"\n",
"Now that we see $g_{n + 1}$ contains $g_n$ and $R_{n + 1}$ contains $R_n$, it appears that\n",
"$$y_n = R_n^{-1} g_n, \\qquad y_{n + 1} = R_{n + 1}^{-1} g_{n + 1}$$\n",
"will be closely related.\n",
"\n",
"To solve the new system, we write\n",
"$$\\left[ \\begin{array}{c | c} R_n & \\widehat{h}^{(n + 1)} \\\\ \\hline \\mathbf{0}_{1 \\times n} & \\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2} \\end{array}\\right] \\left[ \\begin{array}{c} \\widehat{y_n} \\\\ \\hline \\alpha_n \\end{array}\\right] = \\left[ \\begin{array}{c} g_n \\\\ \\hline \\cos \\theta_n \\, \\gamma_n \\end{array}\\right]$$\n",
"for $\\alpha_n \\in \\mathbf{R}, \\widehat{y_n} \\in \\mathbf{R}^n.$\n",
"\n",
"This means that\n",
"$$\\alpha_n = \\frac{\\cos \\theta_n \\, \\gamma_n}{\\sqrt{\\nu_{n + 1}^2 + h_{n + 1, n}^2}}$$\n",
"and\n",
"$$R_n \\, \\widehat{y_n} = g_n - \\alpha_n \\widehat{h}^{(n + 1)} \\Longrightarrow R_n \\left(\\widehat{y_n} - y_n\\right) = -\\alpha_n \\widehat{h}^{(n + 1)}.$$\n",
"\n",
"As mentioned above, computing the update $\\boxed{\\widehat{y_n} - y_n}$ requires a triangular solve."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}