# Improve nonlinear least squares minimization functionality in SciPy
## Suborganization: scipy
## Student information
**Name**: Nikolay Mayorov
**e-mail**: n59_ru@hotmail.com
**github**: nmayorov
*I'll add other officially required info later.*
## University information
**University**: Lomonosov Moscow State University
**Major**: Applied Mechanics / Navigation and Control
**Current Year**: 2nd year
**Expected Graduation date**: May of 2015
**Degree**: Master
## Proposal information
**Title**: Improve nonlinear least squares minimization functionality in SciPy
**Abstract**: The main goal is to implement a new solver for nonlinear least squares minimization based on a modified *dogleg* approach. The solver will be able to work with large sparse Jacobian matrices and efficiently handle bound constraints on some of the variables. The second goal is to implement a *separable nonlinear least squares* algorithm. This algorithm is applicable when an approximating function is sought as a linear combination of basis functions, and it usually enjoys much faster convergence.
**Details**:
1) The current scipy functionality for nonlinear least squares minimization is limited to a wrapper of the MINPACK routines `lmder` / `lmdif` written in Fortran. These functions implement the Levenberg-Marquardt algorithm in a trust-region formulation, with a near-exact solution of the trust-region subproblem at each iteration, as described in [1].
The approach taken in MINPACK of solving the *normal equation* via QR decomposition, despite being robust, is not always justified and can be unnecessarily slow when the number of observations $m$ is much higher than the number of variables to fit $n$. It is also not suitable for working with large sparse Jacobian matrices. Another missing feature, in high demand from the community, is support for bound constraints.
Overall I think the functions from MINPACK should remain as they are now. They do their job well in their domain of applicability, they are already quite complicated, and they are not very open to modification.
So in my opinion the most that should be done with the MINPACK routines is to port them "as is" to Python / Cython; this was mostly done here https://github.com/scipy/scipy/pull/90. I want to point out that the same approach is taken in MATLAB [3]: the LM algorithm is one of the methods for nonlinear least squares, but it supports neither sparse Jacobians nor bound constraints. Porting to Python could be finished if I have time left after the main part of the proposal.
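For reference, a minimal example of how the existing wrapper is typically used today (the model and data are made up purely for illustration); it also shows what is missing: there is no way to pass bounds or a sparse Jacobian.
```python
import numpy as np
from scipy.optimize import leastsq

# Synthetic data from a decaying exponential (illustrative only).
t = np.linspace(0, 10, 50)
y = 3.0 * np.exp(-0.5 * t) + 0.05 * np.random.randn(t.size)

def residuals(params, t, y):
    a, b = params
    return a * np.exp(-b * t) - y

x0 = np.array([1.0, 1.0])
popt, ier = leastsq(residuals, x0, args=(t, y))
# popt holds the estimated (a, b); bounds or a sparsity structure cannot be specified.
```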
2) Instead of tinkering with the MINPACK functions I propose to implement a new solver for nonlinear least squares. Unfortunately there is no established textbook algorithm specifically for solving nonlinear least squares with bound constraints. As an illustration: the discussion here https://github.com/scipy/scipy/issues/3129 clearly lacks concrete ideas and references. Most of the papers on this subject are written by mathematicians, and it is often hard to "get to the point" and extract the details needed for an implementation. After some research I chose the paper [4] as the main reference. It is appealing from three perspectives:
1. It is written by mathematicians and contains all the necessary proofs, which means it is more or less trustworthy from the theoretical point of view.
2. It contains enough details and is not very difficult to follow.
3. It provides accompanying MATLAB code at http://codosol.de.unifi.it/
Obviously the last point is particularly important.
I'm not going to retell the whole algorithm here (please read the paper), but I will mention the points important for adapting it to scipy:
1. The authors apply their algorithm to solving a nonlinear system, i.e. their case is $m = n$. But as the problem they solve is nevertheless nonlinear least squares, we'll only need to modify the termination conditions; a good reference point for that is the termination conditions of Levenberg-Marquardt in MINPACK.
2. The step which determines the efficiency of an iteration is solving the equation $J p = -r$ for the Gauss-Newton step. Note that in the general case $m \neq n$ this equation is solved in a least-squares sense, or in other words the equivalent *normal equation* needs to be solved. Here we should implement several approaches (a short sketch follows this list):
- Use the standard solver `numpy.linalg.lstsq` --- suitable for medium scale dense problems.
- Directly form and solve the normal equation $J^T J p = - J^T r$ --- this can be particularly useful when $m \gg n$. In this scenario storing $J$ in memory might be infeasible, but $J^T J$ and $J^T r$ can be accumulated as suggested in [4, p. 255].
- In case the Jacobian is sparse or a `LinearOperator`, use `lsqr` or `lsmr` from `scipy.sparse.linalg`.
3. The algorithm won't work when $m < n$; in practice this isn't a restrictive assumption. The same restriction is imposed in MATLAB for its advanced nonlinear least squares solver.
4. The sparsity pattern for the finite difference approximation of the Jacobian could be specified in the way described in [3].
5. Several options for the gradient scaling matrix $D$ and the trust-region shape matrix $G$ exist. These options should be investigated so that only the useful ones are included in the final API.
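To make the second point concrete, here is a rough sketch of how the Gauss-Newton step could be dispatched between the approaches listed above. The function names are hypothetical and the code only illustrates the idea, not the final implementation (the real solver wraps this in the dogleg machinery):
```python
import numpy as np
from scipy.sparse import issparse
from scipy.sparse.linalg import LinearOperator, lsmr

def gauss_newton_step(J, r):
    """Solve J p = -r in the least-squares sense, choosing a method
    based on how the Jacobian is represented (illustrative sketch)."""
    if issparse(J) or isinstance(J, LinearOperator):
        # Large sparse Jacobian or matrix-free operator: iterative solver.
        return lsmr(J, -r)[0]
    # Dense Jacobian: standard dense least-squares solver.
    return np.linalg.lstsq(J, -r, rcond=None)[0]

def gauss_newton_step_normal_eq(blocks):
    """Variant for m >> n: accumulate J^T J and J^T r from blocks of rows
    of (J, r) without ever storing the full Jacobian."""
    JTJ, JTr = None, None
    for J_block, r_block in blocks:
        if JTJ is None:
            n = J_block.shape[1]
            JTJ, JTr = np.zeros((n, n)), np.zeros(n)
        JTJ += J_block.T @ J_block
        JTr += J_block.T @ r_block
    return np.linalg.solve(JTJ, -JTr)
```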
3) This idea was suggested by Gregor Thalhammer on the scipy mailing list. Consider the problem of fitting a function to data $\{t_i, y_i\}$, $i = 1, \ldots, m$, in the form
\begin{equation}
y_i \approx c_1 f_1(t_i, x) + c_2 f_2(t_i, x) + \ldots + c_p f_p(t_i, x)
\end{equation}
The unknown parameters are $c_1, c_2, \ldots, c_p$ and the vector $x$. The model is linear in the $c_i$ and nonlinear in $x$. It turns out that this problem can be solved more efficiently by alternating nonlinear and linear least squares steps as described in [6]. Such a method is called *separable nonlinear least squares*, and it is demonstrated to often converge in significantly fewer iterations than the standard nonlinear approach. Seeking an approximating function in the form of a linear combination (where the basis functions depend on unknown parameters) is quite common in engineering and scientific practice, so it would be useful to provide a high quality implementation of this method to scipy users.
The algorithm will be implemented as a separate function, where the nonlinear step will be handled by one of the available solvers: Levenberg-Marquardt or constrained dogleg. The provided reference [6] contains enough details to confidently implement the method.
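A minimal sketch of the alternating idea (not the exact scheme of [6]; the API and names below are hypothetical): for fixed $x$ the coefficients $c$ follow from a linear least-squares solve, after which $x$ is updated by a nonlinear least-squares step with $c$ held fixed.
```python
import numpy as np
from scipy.optimize import leastsq

def separable_fit(basis, t, y, x0, n_outer=20):
    """Illustrative alternating scheme for y ~ sum_j c_j * f_j(t, x).

    basis(t, x) is assumed to return an (m, p) matrix whose columns are
    the basis functions f_j evaluated at the data points.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_outer):  # a real implementation would test convergence
        # Linear step: with x fixed, the optimal c solves a linear
        # least-squares problem.
        F = basis(t, x)
        c, *_ = np.linalg.lstsq(F, y, rcond=None)

        # Nonlinear step: with c fixed, update x by a nonlinear
        # least-squares solve on the residuals.
        def residuals(x_var):
            return basis(t, x_var) @ c - y
        x, _ = leastsq(residuals, x)
    return c, x
```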
4) Testing and benchmarking. A substantial part of my work will be devoted to creating test and benchmark suites. I'm going to collect a set of different problems: small dense problems, large sparse problems, constrained / unconstrained problems, problems suitable for separable least squares. The starting point will be [7], which contains a nice set of small least squares problems.
I'm going to create several benchmark classes similar to `BenchSmoothUnbounded` for use with http://spacetelescope.github.io/asv/index.html
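As an illustration, a benchmark class along these lines might look as follows. The class name and problem set are placeholders; only the structure (a `setup` method plus `time_*` methods parametrized over problems) follows asv's conventions.
```python
import numpy as np
from scipy.optimize import leastsq

class BenchLeastSquaresSmall:
    """Hypothetical asv benchmark over small dense least squares problems."""

    params = ["rosenbrock", "powell_badly_scaled"]
    param_names = ["problem"]

    def setup(self, problem):
        # In the real suite this would look up the residual function and
        # starting point of the named problem from a problem collection.
        if problem == "rosenbrock":
            self.fun = lambda x: np.array([10 * (x[1] - x[0] ** 2), 1 - x[0]])
            self.x0 = np.array([-1.2, 1.0])
        else:
            self.fun = lambda x: np.array([1e4 * x[0] * x[1] - 1,
                                           np.exp(-x[0]) + np.exp(-x[1]) - 1.0001])
            self.x0 = np.array([0.0, 1.0])

    def time_leastsq(self, problem):
        leastsq(self.fun, self.x0)
```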
5) Provided I have time left after finishing 2), 3) and 4), the work could be extended in the direction of general large-scale bound-constrained optimization by implementing the *spectral projected gradient method* [8]; an implementation in R will be available for reference. It is not hard to implement, and this type of algorithm might be a useful addition to scipy.optimize.
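For completeness, the core of the iteration looks roughly like the simplified projected-gradient sketch below with the Barzilai-Borwein (spectral) step length; the full SPG method of [8] additionally uses a nonmonotone line search, which is omitted here, and all names are illustrative.
```python
import numpy as np

def spg_sketch(grad, project, x0, alpha0=1.0, maxiter=100, tol=1e-8):
    """Simplified spectral projected gradient iteration on a convex set
    defined by `project` (e.g. clipping to box bounds)."""
    x = project(np.asarray(x0, dtype=float))
    g = grad(x)
    alpha = alpha0
    for _ in range(maxiter):
        x_new = project(x - alpha * g)        # projected gradient step
        if np.linalg.norm(x_new - x) < tol:
            break
        g_new = grad(x_new)
        s, yk = x_new - x, g_new - g
        denom = s @ yk
        # Barzilai-Borwein (spectral) step length for the next iteration.
        alpha = (s @ s) / denom if denom > 0 else alpha0
        x, g = x_new, g_new
    return x

# Example: minimize ||x - a||^2 subject to 0 <= x <= 1.
a = np.array([1.5, -0.3, 0.4])
grad = lambda x: 2 * (x - a)
project = lambda x: np.clip(x, 0.0, 1.0)
print(spg_sketch(grad, project, np.zeros(3)))  # -> [1.  0.  0.4]
```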
**Approximate schedule**
Community bonding period: I'll spend this time searching for suitable problems to include in the future benchmark suite, especially large sparse problems. I'm also going to run the code from http://codosol.de.unifi.it/ and try to better understand all the details of the algorithm I'm going to implement.
Week 1: Create the first benchmark suite with small dense problems from [7]. Make sure everything works with the existing `leastsq` (wrapper over MINPACK).
Weeks 2-3: Implement the general framework of the constrained dogleg algorithm. At this stage sparse problems will be out of consideration. Create a benchmark suite with small constrained problems (some are presented in [7], some are referenced in [4]) and ensure the new algorithm works as expected. Naturally, it should also work as an unconstrained optimizer on the test suite from Week 1.
Weeks 4-5: Add support for sparse Jacobians and the option to solve the normal equation directly. Add corresponding problems to the benchmark and test suites.
Week 6: I think this week will be required to fit all the previous work together, review the interface, improve documentation, etc.
Weeks 7-8: Implement the basic version of separable nonlinear least squares. The goal is to provide a first working version, probably not feature-rich (variable constraints and sparsity support will be omitted for now). Simple testing problems can be taken from [7].
Weeks 9-10: Based on the experience from the two previous weeks, decide which features can be included in separable least squares and implement them. Try to deliver a "production" version of this algorithm by the end of these weeks.
Remaining time: Consider providing a general interface to least squares minimization functions. Write narrative documentation about the new methods. Try to eliminate all remaining deficiencies.
If everything is in order and there is some time left: implement the spectral projected gradient algorithm.
## Additional information
For the last two years I have been studying at the mechanics department of the Faculty of Mechanics and Mathematics at MSU. But I have put most of my effort into the School of Data Analysis organized by Yandex, the largest Russian search-engine operator. There I took an extensive class on algorithms and data structures in C++ with quite a lot of coding and code reviews. For all data analysis and machine learning classes we used Python and its scientific libraries as our main tools. Naturally, I developed a strong appreciation of the language and became interested in open source development. [Here](https://github.com/scikit-learn/scikit-learn/pulls?q=is%3Apr+author%3Anmayorov+is%3Aclosed) are my first attempts to contribute to the scikit-learn project. This semester I'm taking an "Optimization in Machine Learning" class, so I'll get a decent amount of practice implementing optimization algorithms in Python and become more familiar with the `scipy.optimize` module as well.
In general, my motivation for participating in GSoC is the following:
1. Make a solid contribution to the Python scientific environment, i.e. be helpful to other people.
2. Improve my skills.
3. Become more involved in Python open-source development community.
My **patches contributed to scipy** can be found [here](https://github.com/scipy/scipy/pulls?q=is%3Apr+author%3Anmayorov+is%3Aclosed).
## Information about my schedule
My final exam will be on May 19. There is a slight possibility that I'll have to spend another week finishing some business, but I'll try to be done with my studies exactly on May 19.
**References**
[1] Jorge J. More. *The Levenberg-Marquardt Algorithm: Implementation and Theory*
[2] Christian Kanzow, Nobuo Yamashita, Masao Fukushima. *Levenberg–Marquardt methods with strong local convergence properties for solving nonlinear equations with convex constraints*
[3] http://www.mathworks.com/help/optim/ug/lsqcurvefit.html
[4] S. Bellavia, M. Macconi, S. Pieraccini. *Constrained dogleg methods for nonlinear systems with simple bounds*, http://www.optimization-online.org/DB_FILE/2009/12/2504.pdf
[5] Jorge Nocedal and Stephen J. Wright. *Numerical Optimization, 2nd edition*
[6] Hans Bruun Nielsen, *Separable Nonlinear Least Squares*
[7] Brett M. Averick, Richard G. Carter, Jorge J. More, and Guo-Liang Xue. *The MINPACK-2 test problem collection*.
[8] E. G. Birgin, J. M. Martinez, M. Raydan. *Spectral Projected Gradient Methods: Review and Perspectives*