GSOC Proposal v.2
# Improve nonlinear least squares minimization functionality in SciPy
## Suborganization: scipy
## Student information
**Name**: Nikolay Mayorov
**e-mail**: n59_ru@hotmail.com
**github**: nmayorov
*I'll add other officially required info later.*
## University information
**University**: Lomonosov Moscow State University
**Major**: Applied Mechanics / Navigation and Control
**Current Year**: 2nd year
**Expected Graduation date**: May of 2015
**Degree**: Master
## Proposal information
**Title**: Improve nonlinear least squares minimization functionality in SciPy
**Abstract**: The main goal is to implement a new solver for nonlinear least squares minimization based on a modified *dogleg* approach. The solver will be able to work with large sparse Jacobian matrices and efficiently handle bound constraints on some of the variables. The second goal is to implement a *separable nonlinear least squares* algorithm. This algorithm is applicable when the approximating function is sought as a linear combination of basis functions, and it usually enjoys much faster convergence.
**Details**:
1) The current scipy functionality for nonlinear least squares minimization is limited to a wrapper of the MINPACK routines `lmder` / `lmdif` written in Fortran. These functions implement a trust-region formulation of the Levenberg-Marquardt algorithm with a near-exact solution of the trust-region subproblem at each iteration, as described in [1].
The approach selected in MINPACK of solving the *normal equation* via QR decomposition, while robust, is not always justified and can be unnecessarily slow when the number of observations $m$ is much higher than the number of variables to fit $n$. It is also not suitable for working with large and sparse Jacobian matrices. Another feature that is lacking yet in high demand from the community is support for bound constraints.
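For reference, this is roughly how the current wrapper is used through `scipy.optimize.leastsq`; the model and data below are made up purely for illustration:

```python
import numpy as np
from scipy.optimize import leastsq

# Synthetic data for an exponential decay model (made up for illustration).
t = np.linspace(0, 10, 200)
y = 2.5 * np.exp(-1.3 * t) + 0.05 * np.random.randn(t.size)

def residuals(params, t, y):
    a, b = params
    return a * np.exp(-b * t) - y

# lmdif is used under the hood because no analytical Jacobian is supplied;
# there is no way to pass a sparsity pattern or bound constraints.
x0 = np.array([1.0, 1.0])
x, ier = leastsq(residuals, x0, args=(t, y))
```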
Overall I think that the functions from MINPACK should remain as they are now, for the following reasons:
1. They were developed by experts in optimization.
2. They do their job well in their domain of applicability.
3. They are already quite complicated, i.e. a big effort was put into implementing an advanced and robust version of the Levenberg-Marquardt algorithm. As a result the code is not very open to modification.
4. To my understanding it is not the best framework for solving large problems, as each iteration (with a fixed trust-region radius) requires solving the normal equation multiple times before the required parameter $\lambda$ is found. This is because the MINPACK routines search for the exact solution of the trust-region subproblem with a given radius; a schematic illustration is given after this list.
5. Things become even more complicated if we want to add support for bound constraints. I haven't found a decent explanation of how to handle bound constraints in the LM algorithm. There is a paper [2], but it is quite vague about how to handle situations when the function reduction achieved by an LM step is not sufficient, and the implementation the authors used for testing differs from what they describe in the paper.
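To illustrate point 4, here is a schematic sketch (not MINPACK's actual implementation) of the inner loop hidden in every LM iteration: the damped normal equation is re-solved for several trial values of $\lambda$ until the step length roughly matches the trust-region radius. The function, its tolerances and its update rule are simplified assumptions.

```python
import numpy as np

def lm_step(J, r, D, Delta, max_iter=10, tol=0.1):
    """Schematic search for the LM parameter lam such that the step p solving
    (J^T J + lam * D^T D) p = -J^T r satisfies ||D p|| ~= Delta.  MINPACK uses
    a safeguarded rootfinding scheme on QR factorizations, but the repeated
    solves of the normal equation are the same in spirit."""
    A, g, DtD = J.T @ J, J.T @ r, D.T @ D
    lam = 0.0
    for _ in range(max_iter):
        p = np.linalg.solve(A + lam * DtD, -g)
        if np.linalg.norm(D @ p) <= (1 + tol) * Delta:
            break  # the step fits (approximately) inside the trust region
        # Crude increase of lam; the real algorithm updates it much more
        # carefully, but still has to re-solve the system each time.
        lam = max(2 * lam, 1e-3)
    return p, lam
```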
In my opinion the most that should be done with the MINPACK routines is to port them "as is" to Python / Cython; this was mostly done in https://github.com/scipy/scipy/pull/90. I want to point out that the same approach is taken in MATLAB [3]: the LM algorithm is one of the methods for nonlinear least squares, but it supports neither sparse Jacobians nor bound constraints. Porting to Python could be finished if I have time left after the main part of the proposal.
2) Instead of tinkering with the MINPACK functions I propose to implement a new solver for nonlinear least squares. Unfortunately there is no established textbook algorithm specifically for solving nonlinear least squares with bound constraints. As an illustration: the discussion in https://github.com/scipy/scipy/issues/3129 clearly lacks concrete ideas and references. Most of the papers on this subject are written by mathematicians, and it is often hard to "get to the point" and understand the things that matter for an implementation. After some research I chose the paper [4] as the main reference. It is appealing from three perspectives:
1. It is written by mathematicians and contains all the necessary proofs, which means it is more or less trustworthy from the theoretical point of view.
2. It contains enough details and is not very difficult to follow.
3. It provides accompanying MATLAB code at http://codosol.de.unifi.it/
Obviously the last point is particularly important.
I'm not going to retell the whole algorithm here (please read the paper), but will mention the points important for adapting it to scipy:
1. The authors apply their algorithm to solving a nonlinear system, i.e. their case is $n = m$. But since the problem they solve is nevertheless nonlinear least squares, we will only need to modify the termination conditions; a good reference point for that are the termination conditions of the Levenberg-Marquardt implementation in MINPACK.
2. The step which determines the efficiency of the iterations is solving the equation $J p = -r$ for the Gauss-Newton step. Note that in the general case $m \neq n$ this equation is solved in a least squares sense, or in other words the equivalent *normal equation* needs to be solved. Here we should implement several approaches (a rough sketch is given after this list):
- Use the standard solver `numpy.linalg.lstsq` --- suitable for medium-scale dense problems.
- Directly form and solve the normal equation $J^T J p = - J^T r$ --- this can be particularly useful when $m \gg n$. In this scenario storing $J$ in memory might be infeasible, but $J^T J$ and $J^T r$ can be accumulated as suggested in [5, p. 255].
- In case of the Jacobian being sparse or a `LinearOperator`, use `lsqr` or `lsmr` from `scipy.sparse.linalg`.
3. The algorithm won't work when $m < n$; in practice this is not a restrictive assumption. The same restriction is imposed in MATLAB for its advanced nonlinear least squares solver.
4. The sparsity pattern of the finite difference approximation of the Jacobian could be specified in the way described in [3].
5. Several options for the gradient scaling matrix $D$ and the trust-region shape matrix $G$ exist. These options should be investigated to include only the useful ones in the final API.
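As a rough illustration of point 2, a hypothetical helper could dispatch between the three strategies; the function name and the threshold used to switch to the explicit normal equations are arbitrary placeholders, not part of the proposed API:

```python
import numpy as np
from scipy.sparse import issparse
from scipy.sparse.linalg import LinearOperator, lsmr

def gauss_newton_step(J, r):
    """Solve J p = -r in the least squares sense.  Hypothetical helper
    illustrating the three strategies discussed above."""
    if issparse(J) or isinstance(J, LinearOperator):
        # Large sparse Jacobian or matrix-free operator: iterative solver.
        return lsmr(J, -r)[0]
    m, n = J.shape
    if m > 10 * n:
        # Many more observations than variables: form the (n x n) normal
        # equations explicitly; J^T J and J^T r could be accumulated in
        # chunks if J does not fit in memory.
        return np.linalg.solve(J.T @ J, -J.T @ r)
    # Medium-scale dense problem: standard least squares solver.
    return np.linalg.lstsq(J, -r, rcond=None)[0]
```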
3) This idea was suggested by Gregor Thalhammer on the scipy mailing list. Consider the problem of fitting a function to data $\{t_i, y_i\}$, $\:i = 1, \ldots, m$, in the form
\begin{equation}
y_i \approx c_1 f_1(t_i, x) + c_2 f_2(t_i, x) + \ldots + c_p f_p(t_i, x)
\end{equation}
The unknown parameters are $c_1, c_2, \ldots, c_p$ and the vector $x$, so the model is linear in $c_i$ and nonlinear in $x$. It turns out that this problem can be solved more efficiently by alternating nonlinear and linear least squares steps as described in [6]. Such a method is called *separable nonlinear least squares*, and it is demonstrated to often converge in significantly fewer iterations than the standard nonlinear approach. Seeking an approximating function in the form of a linear combination (where the basis functions themselves depend on unknown parameters) is quite common in engineering and scientific practice, so it would be useful to provide a high-quality implementation of this method to scipy users.
The algorithm will be implemented as a separate function, where the nonlinear step will be handled by one of the available solvers: Levenberg-Marquardt or constrained dogleg. The provided reference [6] contains enough details to implement the method confidently.
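A minimal sketch of such an alternating scheme, assuming a user-supplied `basis(t, x)` callable that returns the matrix of basis function values (the names are hypothetical and this is an illustration, not the exact algorithm of [6]):

```python
import numpy as np
from scipy.optimize import leastsq

def separable_fit(basis, t, y, x0, n_outer=20):
    """Alternate linear and nonlinear least squares steps for the model
    y ~ sum_j c_j * f_j(t, x).  `basis(t, x)` returns the (m x p) matrix
    whose columns are the basis functions evaluated at t."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_outer):
        # Linear step: with x fixed, the coefficients c solve an ordinary
        # linear least squares problem.
        F = basis(t, x)
        c, *_ = np.linalg.lstsq(F, y, rcond=None)

        # Nonlinear step: with c fixed, refine x with a few iterations of a
        # standard nonlinear least squares solver.
        def residuals(x_var):
            return basis(t, x_var) @ c - y

        x, _ = leastsq(residuals, x, maxfev=20)
    return c, x
```

In the proposed implementation the nonlinear step would of course use the new constrained dogleg solver (or the existing Levenberg-Marquardt one) rather than `leastsq`, and proper termination criteria would replace the fixed number of outer iterations.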
**References**
[1] Jorge J. Moré. *The Levenberg-Marquardt Algorithm: Implementation and Theory*
[2] Christian Kanzow, Nobuo Yamashita, Masao Fukushima. *Levenberg–Marquardt methods with strong local convergence properties for solving nonlinear equations with convex constraints*
[3] http://www.mathworks.com/help/optim/ug/lsqcurvefit.html
[4] S. Bellavia, M. Macconi, S. Pieraccini. *Constrained dogleg methods for nonlinear systems with simple bounds*, http://www.optimization-online.org/DB_FILE/2009/12/2504.pdf
[5] Jorge Nocedal and Stephen J. Wright. *Numerical Optimization, 2nd edition*
[6] Hans Bruun Nielsen, *Separable Nonlinear Least Squares*
> Written with [StackEdit](https://stackedit.io/).