Last active
October 27, 2021 14:52
-
-
Save hamelin/b510387996f0d66ec7c8660bc854bb98 to your computer and use it in GitHub Desktop.
A discussion of optimization with linear equality constrains applied to minimal transport problems
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "c2b9aa89-4fc2-4205-8df4-041f129d7eea", | |
"metadata": {}, | |
"source": [ | |
"# Linearly constrained optimization\n", | |
"\n", | |
"Consider the problem\n", | |
"\n", | |
"$$\n", | |
"\\min_{T \\in \\Pi_{p,q}} \\left<D, T\\right>_F\n", | |
"$$\n", | |
"\n", | |
"where $\\left<\\cdot, \\cdot\\right>_F$ is the Frobenius norm, and\n", | |
"\n", | |
"$$\n", | |
"\\Pi_{p,q} = \\left\\{T \\in \\mathbb{R}_{+}^{m,n} \\right|\\left. T{\\mathbb{1}_n} = p\\text{ and }T^\\mathrm{t}\\mathbb{1}_m = q\\right\\}\n", | |
"$$\n", | |
"\n", | |
"This problem encodes many questions related to minimal transport to make two distributions as close to each other as possible. Taking $d \\in \\mathbb{R}^{m\\times n}$ as the vectorization of matrix $D$ above, this problem is equivalent to\n", | |
"\n", | |
"$$\n", | |
"\\begin{array}{ll}\n", | |
" \\min_{t \\in \\mathbb{R}^{m \\times n}} & f(t) \\stackrel{\\text{def}}{=} \\left<d, t\\right> \\\\\n", | |
" \\text{subject to} & \\sum_{j=1}^n t_{ij} = p \\text{ for }1 \\leq i \\leq m\\\\\n", | |
" & \\sum_{i=1}^m t_{ij} = q \\text{ for }1 \\leq j \\leq n\\\\\n", | |
" & t \\geq 0,\n", | |
"\\end{array}\n", | |
"$$\n", | |
"\n", | |
"where $\\left<\\cdot,\\cdot\\right>$ denotes the inner vector product. Constraints involved are linear equality constraints and bound constraints. The former can be handled by a change of variables. First, we assess that summations to $p$ can be rewritten as the matrix-vector product\n", | |
"\n", | |
"$$\n", | |
"\\left[\n", | |
"\\begin{array}{cccccccccccccc}\n", | |
" 1 & 0 & 0 & \\ldots & 0 & 1 & 0 & 0 & \\ldots & 0 & \\ldots & 1 & 0 & 0 & \\ldots & 0 \\\\\n", | |
" 0 & 1 & 0 & \\ldots & 0 & 0 & 1 & 0 & \\ldots & 0 & \\ldots & 0 & 1 & 0 & \\ldots & 0 \\\\\n", | |
" 0 & 0 & 1 & \\ldots & 0 & 0 & 0 & 1 & \\ldots & 0 & \\ldots & 0 & 0 & 1 & \\ldots & 0 \\\\\n", | |
" \\vdots & \\vdots & \\vdots & \\ddots & \\vdots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots & \\ddots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", | |
" 0 & 0 & 0 & \\ldots & 1 & 0 & 0 & 0 & \\ldots & 1 & \\ldots & 0 & 0 & 0 & \\ldots & 1\n", | |
"\\end{array}\n", | |
"\\right]t \\stackrel{\\text{def}}{=} \\mathbf{\\mathrm{G_1}}t = p\n", | |
"$$\n", | |
"\n", | |
"Similarly, summations to $q$ can be rewritten as\n", | |
"\n", | |
"$$\n", | |
"\\left[\n", | |
"\\begin{array}{}\n", | |
" 1 & 1 & \\ldots & 1 & 0 & 0 & \\ldots & 0 & \\ldots & 0 & 0 & \\ldots & 0 \\\\\n", | |
" 0 & 0 & \\ldots & 0 & 1 & 1 & \\ldots & 1 & \\ldots & 0 & 0 & \\ldots & 0 \\\\\n", | |
" \\vdots & \\vdots & \\ddots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots & \\ddots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", | |
" 0 & 0 & \\ldots & 0 & 0 & 0 & \\ldots & 0 & \\ldots & 1 & 1 & \\ldots & 1\n", | |
"\\end{array}\n", | |
"\\right]t \\stackrel{\\text{def}}{=} \\mathbf{\\mathrm{G_2}}t = q\n", | |
"$$\n", | |
"\n", | |
"This enables the reformulation of the optimization problem as\n", | |
"\n", | |
"$$\n", | |
"\\begin{array}{ll}\n", | |
" \\min_{t \\in \\mathbb{R}^{m \\times n}} & f(t) \\\\\n", | |
" \\text{subject to} & \\mathbf{\\mathrm{G}}t = h \\\\\n", | |
" & t \\geq 0\n", | |
"\\end{array}\n", | |
"$$\n", | |
"\n", | |
"where\n", | |
"\n", | |
"$$\\mathbf{\\mathrm{G}} = \\left[\\begin{array}{c} \\mathbf{\\mathrm{G_1}} \\\\ \\mathbf{\\mathrm{G_2}} \\end{array}\\right]\\text{ and } h = \\left[\\begin{array}{c} p \\\\ q \\end{array}\\right].$$\n", | |
"\n", | |
"The rank of matrix $\\mathbf{\\mathrm{G}}$ is less than full: in addition to having merely $m+n$ rows to its $mn$ columns, the sum of the rows of its component parts $\\mathbf{\\mathrm{A}}$ and $\\mathbf{\\mathrm{B}}$ is respectively the same. Thus, we can find a linear transformation $\\mathbf{\\mathrm{T}}$ such that $\\mathbf{\\mathrm{G}}' = \\mathbf{\\mathrm{TG}}$ is full-rank, and such that $\\mathbf{\\mathrm{G'}}t = \\mathbf{\\mathrm{T}}h \\stackrel{\\mathrm{def}}{=} h' \\Leftrightarrow \\mathbf{\\mathrm{G}}t = h$.\n", | |
"\n", | |
"Second, the first-order optimality conditions for a solution $t^*$ to this problem are\n", | |
"\n", | |
"$$\n", | |
"\\nabla f(t^*) = 0 \\\\\n", | |
"\\mathbf{\\mathrm{G'}} t^* = h' \\\\\n", | |
"t^* \\geq 0.\n", | |
"$$\n", | |
"\n", | |
"A *feasible point* algorithm to solve this problem presumes the generation of a sequence of iterates $t_1, t_2\\ldots$ converging to a stationary point $t^*$ such that every iterate satisfies $\\mathbf{\\mathrm{G}}t_i = h$. Given one solution $\\hat{t}$ to this system, our iterates $t_i$ must belong to set $\\ker(\\mathbf{\\mathrm{G'}}) + \\{\\hat{t}\\}$. Assuming one forms a matrix $\\mathbf{\\mathrm{C}}$ whose columns for the basis of $\\ker(\\mathbf{\\mathrm{G'}})$, we can effect the change of variables\n", | |
"\n", | |
"$$\n", | |
"t = \\mathbf{\\mathrm{C}}v + \\hat{t},\n", | |
"$$\n", | |
"\n", | |
"and rewrite the optimization problem as\n", | |
"\n", | |
"$$\n", | |
"\\begin{array}{ll}\n", | |
" \\min_{v \\in \\mathbb{R}^{k}} & f(\\mathbf{\\mathrm{C}}v + \\hat{t}) \\\\\n", | |
" \\text{subject to} & v \\geq 0,\n", | |
"\\end{array}\n", | |
"$$\n", | |
"\n", | |
"where $k = \\dim(\\ker(\\mathbf{\\mathrm{G'}}))$. Optimality conditions for this last version of the problem are stated as\n", | |
"\n", | |
"$$\n", | |
"\\mathbf{\\mathrm{C}}^\\mathrm{t}\\nabla f(\\mathbf{\\mathrm{C}}v^* + \\hat{t}) = 0 \\\\\n", | |
"v^* \\geq 0.\n", | |
"$$\n", | |
"\n", | |
"<!-- Second, the optimality conditions that a solution $t^*$ to the problem satisfy are\n", | |
"\n", | |
"$$\n", | |
"\\nabla f(t^*) = 0 \\\\\n", | |
"\\mathbf{\\mathrm{G'}}t^* = h' \\\\\n", | |
"t^* \\geq 0.\n", | |
"$$\n", | |
"\n", | |
"The various components of the solution may live either on the boundary of the feasible set, or in its interior. As such, a useful representation of the solution involves the partitioning of its components in three subsets:\n", | |
"\n", | |
"1. The *non-basis components*, which are null; these correspond as a submatrix $\\mathbf{\\mathrm{N}}$ of $\\mathbf{\\mathrm{G'}}$.\n", | |
"1. The *in-basis components*, expressed as a linear combination of the basis of an inversible submatrix $\\mathrm{\\mathbf{B}}$ of $\\mathbf{\\mathrm{G'}}$.\n", | |
"1. The *out-basis components*, expressed as a linear combination of other columns of $\\mathbf{\\mathrm{G'}}$.\n", | |
" -->\n", | |
"<!-- \n", | |
"\n", | |
" -->" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "root (Python) *", | |
"language": "python", | |
"name": "conda-root-py" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.8.11" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment