GSOC Proposal

Improve nonlinear least squares minimization functionality in SciPy

Suborganization: scipy

Student information

Name: Nikolay Mayorov
E-mail: n59_ru@hotmail.com
GitHub: nmayorov

I'll add other officially required info later.

University information

University: Lomonosov Moscow State University
Major: Applied Mechanics / Navigation and Control
Current Year: 2nd year
Expected Graduation Date: May 2015
Degree: Master

Proposal information

Title: Improve nonlinear least squares minimization functionality in SciPy

Abstract:

The project consists of two major parts.

  1. Finish porting Levenberg-Marquardt algorithm from MINPACK (Fortran) to Python / Cython implementation.
  2. Implement a modification of the dogleg method with sparse Jacobian and box constraints support.

Details:

  1. Current scipy functionality for nonlinear least squares minimization is limited to a wrapper of the MINPACK routines lmder / lmdif. These functions implement the trust-region formulation of the Levenberg-Marquardt algorithm with a near-exact solution of the trust-region subproblem at each iteration. This heavy-duty procedure is described in [1].

A reasonable approach to this code is not to change the algorithmic part at all, since it was developed by excellent specialists and has been well tested over time. On the other hand, working with Fortran code from Python is rather inconvenient; luckily, a substantial part of porting lmder / lmdif to Python was already done in scipy/scipy#90. The remaining challenges are to choose and implement a good API and to extend the test suite.

I also suggest one possible extension hinted at in [2, p. 255]. When the number of observations $m$ is very large and the number of parameters to fit $n$ is moderately large, constructing the Jacobian explicitly might be infeasible. Since the MINPACK implementation uses a QR decomposition of $J$ to solve the equivalent least squares problem instead of directly solving the normal equation, it cannot handle such a case. On the other hand, a user can compute \begin{align} J^T J &= \sum_{j = 1}^m (\nabla r_j) (\nabla r_j)^T, \\ J^T r &= \sum_{j = 1}^m r_j \nabla r_j, \end{align} and the normal equation $(J^T J + \lambda D) p = -J^T r$ can be solved directly, say by Cholesky factorization. Note that this implies modifying only one step of the MINPACK algorithm, but it poses the new problem of providing a well-designed interface.
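As illustration only, a minimal sketch of such a Cholesky-based step; the helper name `lm_step_from_gradients` and the way the per-residual gradients are supplied are my assumptions, not an existing scipy API:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lm_step_from_gradients(residuals_and_grads, n, lam, D):
    """Accumulate J^T J and J^T r from per-residual gradients without forming J,
    then solve (J^T J + lam * D) p = -J^T r by Cholesky factorization.

    `residuals_and_grads` yields pairs (r_j, grad_j); `D` is an n x n positive
    diagonal scaling matrix."""
    JTJ = np.zeros((n, n))
    JTr = np.zeros(n)
    for r_j, grad_j in residuals_and_grads:
        JTJ += np.outer(grad_j, grad_j)   # (grad r_j)(grad r_j)^T
        JTr += r_j * grad_j               # r_j * grad r_j
    c, low = cho_factor(JTJ + lam * D)
    return cho_solve((c, low), -JTr)
```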

One simple LM algorithm for a large sparse $J$ is described in [3]. The authors suggest using the LSQR algorithm to approximately solve the normal equation and controlling the parameter $\lambda$ directly. In theory this could be included as a step of the MINPACK algorithm, but I suggest a perhaps better approach below.
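As a hedged illustration of that idea (not the algorithm of [3] itself): scipy.sparse.linalg.lsqr already accepts a `damp` parameter, which corresponds to the regularized problem with $D = I$ and never forms $J^T J$:

```python
import numpy as np
from scipy.sparse.linalg import lsqr

def inexact_lm_step(J, r, lam):
    """Approximately solve (J^T J + lam * I) p = -J^T r.

    lsqr minimizes ||J p + r||^2 + damp^2 ||p||^2, so damp = sqrt(lam) gives
    the Levenberg-Marquardt step with D = I; J may be sparse or a LinearOperator."""
    return lsqr(J, -r, damp=np.sqrt(lam))[0]
```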

  2. The dogleg method [2, p. 73] finds an approximate solution to the quadratic trust-region subproblem: $$ \min_{p \in R^n} m(p) = f + g^T p + \frac{1}{2} p^T B p, \text{ s.t. } ||D p|| \leq \Delta $$

It seeks the solution $p^*$ as a combination of the unconstrained minimizer along the steepest descent direction (the Cauchy point) $$p^U = - \frac{g^T g}{g^T B g} g,$$ and the Newton step $$ p^B = -B^{-1} g, $$ in the following manner: \begin{equation} p(\tau) = \begin{cases} \tau p^U, & 0 \leq \tau \leq 1, \\ p^U + (\tau - 1) (p^B - p^U), & 1 < \tau \leq 2. \end{cases} \end{equation} The trajectory has the shape of a dogleg, hence the name of the method. We find the $\tau$ which minimizes the model $m(p(\tau))$ subject to the trust-region bound (this requires solving at most a single-variable equation). The important result is that if $B$ is positive definite, then the dogleg path intersects the trust-region boundary at most once. In the nonlinear least squares problem we have \begin{align} g &= J^T r, \\ B &= J^T J, \end{align} so $B$ is positive definite as long as $m \ge n$ and $J$ has full column rank.
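A minimal sketch of this step computation, assuming $D = I$; `B_dot` (a function applying $B$ to a vector) and the precomputed Newton step `p_B` are assumed inputs:

```python
import numpy as np

def dogleg_step(g, p_B, B_dot, Delta):
    """Dogleg step for gradient g, Newton step p_B and trust-region radius Delta."""
    if np.linalg.norm(p_B) <= Delta:
        return p_B                                   # full Newton step is feasible
    p_U = -np.dot(g, g) / np.dot(g, B_dot(g)) * g    # Cauchy point
    norm_U = np.linalg.norm(p_U)
    if norm_U >= Delta:
        return Delta / norm_U * p_U                  # truncated steepest descent step
    # Find t = tau - 1 in (0, 1] where the second leg crosses the boundary:
    # ||p_U + t * (p_B - p_U)|| = Delta.
    d = p_B - p_U
    a, b, c = np.dot(d, d), 2 * np.dot(p_U, d), np.dot(p_U, p_U) - Delta**2
    t = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return p_U + t * d
```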

This method has the following advantages:

  1. It's straightforward to implement. In fact its basic version is already in scipy.
  2. At each iteration we need to compute $p^U$ only once, and $p^B$ at most once. Then the trust-region radius is adjusted until a sufficient reduction of the objective function is achieved. That means it could work faster than LM, see [4].
  3. It is well suited for nonlinear least squares, since the matrix $B = J^T J$ is positive definite in practical cases.
  4. We can apply it when $J$ is large and sparse.
  5. It allows a rather easy extension to box-constrained nonlinear least squares.

More on point 4.

In order to apply the method we need to compute two vectors: $p^U$ and $p^B$. It's easy to see that computing $p^U$ only requires several matrix-vector and scalar-vector products. The explicit form of $J$ is not even required: in scipy it could be a LinearOperator.

To compute the Gauss-Newton step we need to solve the equation \begin{equation} J^T J p^B = -J^T r. \end{equation} But this is equivalent to solving \begin{equation} J p^B = -r \end{equation} in the least squares sense. For large sparse matrices this problem can be readily solved by the routines lsqr or lsmr available in scipy.
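A short sketch under these assumptions (J given as anything `aslinearoperator` accepts; the function name is illustrative only):

```python
import numpy as np
from scipy.sparse.linalg import aslinearoperator, lsmr

def cauchy_and_gauss_newton_steps(J, r):
    """Compute p^U from matrix-vector products only, and p^B from a sparse
    least-squares solve of J p = -r."""
    J_op = aslinearoperator(J)                # dense, sparse or LinearOperator input
    g = J_op.rmatvec(r)                       # g = J^T r
    Jg = J_op.matvec(g)                       # so g^T B g = ||J g||^2 with B = J^T J
    p_U = -np.dot(g, g) / np.dot(Jg, Jg) * g  # Cauchy point
    p_B = lsmr(J, -r)[0]                      # Gauss-Newton step
    return p_U, p_B
```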

More on point 5.

I admit that this point requires further research. I found three sources where some kind of gradient projection method is proposed.

In [5] the authors suggest using a rectangular trust region. At each step the intersection of the trust region and the constraint box forms an effective rectangular trust region in which every point is feasible, and the standard approach described above is applied. If the constraint on some variable is active and the gradient along this variable has the appropriate sign, that variable is eliminated from the optimization at the current step.
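A hedged sketch of that bookkeeping (the variable names and the exact activity test are my assumptions, not taken from [5]):

```python
import numpy as np

def effective_box_and_free_set(x, g, lb, ub, Delta):
    """Intersect an infinity-norm trust region of radius Delta with the bounds
    and mark which variables stay free at the current step."""
    lo = np.maximum(lb - x, -Delta)   # effective lower bounds on the step
    hi = np.minimum(ub - x, Delta)    # effective upper bounds on the step
    # A variable is fixed when it sits on a bound and the descent direction
    # (-g) points outside the feasible region.
    fixed = (np.isclose(x, lb) & (g > 0)) | (np.isclose(x, ub) & (g < 0))
    return lo, hi, ~fixed
```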

In [6] the author suggests using the projected gradient and the reduced Hessian, then applying the standard dogleg procedure with the modified matrices to find the optimal step, and finally projecting it onto the feasible region.

In [7] the authors suggest first finding the projection of the Gauss-Newton step onto the feasible region. For the gradient direction they suggest finding the generalized Cauchy point (the first minimizer of the quadratic model along the gradient, constrained to lie in the trust region). The optimal step is then sought as a feasible convex combination $p(\tau) = p^U + (\tau - 1) (p^B - p^U)$, $1 \leq \tau \leq 2$. To my understanding the initial segment $\tau p^U$ must be added too, since both $p^U$ and $p^B$ might lie outside the trust region. To summarize: we search along the dogleg path, but $p^U$ and $p^B$ are first adjusted to lie in the feasible region. This approach is a specific instance of the gradient projection method described in [2, p. 486]. The main idea: taking steps to generalized Cauchy points already gives global convergence, and mixing in Gauss-Newton steps improves the rate of convergence.
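A purely illustrative sketch of such a projected dogleg search (the coarse sampling of $\tau$ and all names are my assumptions, not the procedure of [7]):

```python
import numpy as np

def projected_dogleg_step(x, p_U, p_B, lb, ub, model, num=20):
    """Search the dogleg path through the box-projected p^U and p^B and return
    the feasible step with the smallest quadratic model value model(p)."""
    def project(p):                        # projection onto the box, expressed as a step
        return np.clip(x + p, lb, ub) - x
    p_U, p_B = project(p_U), project(p_B)
    best_p = np.zeros_like(x)
    best_m = model(best_p)
    for tau in np.linspace(0.0, 2.0, num):
        p = tau * p_U if tau <= 1.0 else p_U + (tau - 1.0) * (p_B - p_U)
        m = model(p)
        if m < best_m:
            best_m, best_p = m, p
    return best_p
```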

Links

[1] Jorge J. Moré. The Levenberg-Marquardt Algorithm: Implementation and Theory.
[2] Jorge Nocedal and Stephen J. Wright. Numerical Optimization, 2nd edition.
[3] S. J. Wright and J. N. Holt. An Inexact Levenberg-Marquardt Algorithm for Large Sparse Nonlinear Least Squares.
[4] http://users.ics.forth.gr/~argyros/mypapers/2005_10_iccv_levenberg.pdf
[5] http://cs.uoi.gr/~voglis/dogbox.ps
[6] http://www.math.umt.edu/bardsley/papers/EMopt04.pdf
[7] http://www.optimization-online.org/DB_FILE/2009/12/2504.pdf

