GSOC Proposal v.2

Improve nonlinear least squares minimization functionality in SciPy

Suborganization: scipy

Student information

Name: Nikolay Mayorov
e-mail: n59_ru@hotmail.com
github: nmayorov

I'll add other officially required info later.

University information

University: Lomonosov Moscow State University
Major: Applied Mechanics / Navigation and Control
Current Year: 2nd year
Expected Graduation Date: May 2015
Degree: Master

Proposal information

Title: Improve nonlinear least squares minimization functionality in SciPy

Abstract: The main goal is to implement a new solver for nonlinear least squares minimization based on a modified dogleg approach. The solver will be able to work with large sparse Jacobians and to efficiently handle box constraints on some of the variables.

Details:

  1. The current scipy functionality for nonlinear least squares minimization is limited to a wrapper of the MINPACK routines lmder / lmdif written in Fortran. These functions implement the trust-region formulation of the Levenberg-Marquardt algorithm with a near-exact solution of the trust-region subproblem at each iteration, as described in [1]. The chosen approach of solving the normal equation via QR decomposition, while robust, is not always justified and can be unnecessarily slow when the number of observations $m$ is much larger than the number of variables to fit $n$. It is also not suitable for working with large sparse Jacobian matrices. Another missing feature is support for box constraints.
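For reference, this is roughly what the existing interface looks like (scipy.optimize.leastsq, the wrapper around lmdif / lmder); the exponential model and the synthetic data below are made up purely for illustration:

```python
import numpy as np
from scipy.optimize import leastsq

# Synthetic data for the model y = a * exp(-b * t) + c.
rng = np.random.RandomState(0)
t = np.linspace(0, 10, 50)
y = 2.0 * np.exp(-0.8 * t) + 0.5 + 0.05 * rng.randn(t.size)


def residuals(params, t, y):
    """Residual vector whose sum of squares is minimized."""
    a, b, c = params
    return a * np.exp(-b * t) + c - y


x0 = np.ones(3)  # initial guess
x, cov_x, infodict, mesg, ier = leastsq(residuals, x0, args=(t, y), full_output=True)
print(x, infodict["nfev"])
```

The Jacobian, if supplied via Dfun, must be dense, and there is no way to pass bounds, which is exactly the limitation described above.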

Overall I think that the functions from MINPACK should remain as they are now, for the following reasons:

  1. They were developed by experts in optimization.
  2. They do their job well in their domain of applicability.
  3. They are quite complicated even in their current state, i.e. a big effort was put into implementing an advanced and robust version of the Levenberg-Marquardt algorithm. As a result the code is not very open to modification.
  4. To my understanding it is not the best framework for solving large problems, as each iteration (with a fixed trust-region radius) requires solving the normal equation multiple times before the required parameter $\lambda$ is found. This is because the MINPACK routines search for the exact solution of the trust-region subproblem with a given radius.
  5. Things become even more complicated if we want to add support for bound constraints. I haven't found a decent explanation of how to handle bound constraints in the LM algorithm. There is a paper [2], but it is quite vague about how to handle situations when the function reduction achieved by the LM step is not sufficient. Moreover, the implementation the authors used for testing differs from what they describe in the paper.

So in my opinion the most that should be done with the MINPACK routines is to port them "as is" to Python / Cython; this was mostly done in scipy/scipy#90. I want to point out that the same approach is taken in MATLAB: the LM algorithm is one of the methods for nonlinear least squares, but it supports neither sparse Jacobians nor bound constraints. Porting to Python could be done if I have time left after finishing the main part of the proposal.

  2. Instead of tinkering with the MINPACK functions I propose to implement a new solver for nonlinear least squares. Unfortunately there is no established textbook algorithm for solving nonlinear least squares with bound constraints. As an illustration: the discussion in scipy/scipy#3129 clearly lacks concrete ideas and references. Most of the papers on this subject are written by mathematicians and it is often hard to "get to the point" and understand the things important for an implementation. After some research I chose the paper [3] as the main reference. It is appealing from three perspectives:
    1. It is written by mathematicians and contains all the necessary proofs, etc. This means it is more or less trustworthy from the theoretical point of view.
    2. It contains enough details and is not very difficult to follow.
    3. It provides accompanying MATLAB code: http://codosol.de.unifi.it/

Obviously the last point is particularly important.

I'm not going to retell the whole algorithm here (please read the paper), but I will mention the points important for adapting it for scipy:

  1. The authors apply their algorithm to solving a nonlinear system of equations, i.e. their case is $m = n$. But since the problem they solve is nevertheless a nonlinear least squares problem, we will only need to modify the termination conditions; a good reference point for these is the termination conditions of the Levenberg-Marquardt implementation in MINPACK.
  2. The step which determines the efficiency of each iteration is solving the equation $F' p = -F$ for the Gauss-Newton step. Note that in the general case $m \neq n$ this equation is solved in a least squares sense, or in other words the equivalent normal equation needs to be solved. Here we should implement several approaches (a sketch of these options is given after this list):
    • Use the standard solver numpy.linalg.lstsq --- suitable for medium-scale dense problems.
    • Directly form and solve the normal equation $(F')^T F' p = -(F')^T F$ --- this can be useful when $m \gg n$. In this scenario storing $F'$ in memory might be infeasible, but $(F')^T F'$ and $(F')^T F$ can be computed as suggested in [4, p. 255].
    • In the case of a sparse Jacobian use lsqr or lsmr from scipy.sparse.linalg.
  3. Several options for the gradient scaling matrix $D$ and the trust-region shape matrix $G$ exist. These options should be investigated so that only the useful ones are included in the final API.
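To make point 2 above concrete, here is a minimal sketch of the three options for computing the Gauss-Newton step. The function name gauss_newton_step and the method switch are mine, purely for illustration, and the normal-equation branch assumes $F'$ has full column rank:

```python
import numpy as np
from scipy.sparse.linalg import lsqr


def gauss_newton_step(J, f, method="lstsq"):
    """Solve J p = -f in the least squares sense for the Gauss-Newton step.

    J is the Jacobian F' (dense ndarray or sparse matrix), f is the
    residual vector F.
    """
    if method == "lstsq":
        # Medium-scale dense problems: SVD-based least squares solver.
        p, *_ = np.linalg.lstsq(J, -f, rcond=None)
        return p
    elif method == "normal":
        # m >> n: form and solve the n x n normal equation
        # (F')^T F' p = -(F')^T F.  J^T J and J^T f could also be
        # accumulated in row blocks without storing the full Jacobian,
        # as suggested in [4, p. 255].
        return np.linalg.solve(J.T.dot(J), -J.T.dot(f))
    elif method == "lsqr":
        # Large sparse Jacobians: iterative solver from scipy.sparse.linalg.
        return lsqr(J, -f)[0]
    raise ValueError("unknown method")
```

In the actual solver the choice between these strategies would likely be exposed as an option or inferred from whether the Jacobian is supplied as a sparse matrix.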
  3. Another helpful method to add would be separable nonlinear least squares. Consider the problem of fitting a function to data $(t_i, y_i)$, $i = 1, \ldots, m$ in the form $y_i \approx c_1 f_1(t_i, x) + c_2 f_2(t_i, x) + \ldots + c_p f_p(t_i, x)$. The unknown parameters are $c_1, c_2, \ldots, c_p$ and the vector $x$, so the model is linear in the $c_i$ and nonlinear in $x$. It turns out that this problem can be solved more efficiently by alternating nonlinear and linear least squares steps as described in [5]. The paper demonstrates that using separable nonlinear least squares can significantly reduce the required number of iterations and improve the robustness of the estimation.

This algorithm will be implemented as a separate function, where the nonlinear step will be handled by one of the available solvers: Levenberg-Marquardt or the constrained dogleg method.
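A minimal sketch of the alternating scheme, assuming a user-supplied basis(t, x) that returns the m x p matrix of basis functions evaluated at the data points; the function name, signature, and the fixed number of alternations are illustrative only, and leastsq stands in here for whichever nonlinear solver is eventually used:

```python
import numpy as np
from scipy.optimize import leastsq


def separable_fit(basis, t, y, x0, n_iter=10):
    """Alternate linear and nonlinear least squares steps (see [5])."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        # Linear step: with x fixed, the optimal coefficients c solve a
        # plain linear least squares problem.
        A = basis(t, x)
        c, *_ = np.linalg.lstsq(A, y, rcond=None)

        # Nonlinear step: with c fixed, refine x by ordinary nonlinear
        # least squares.
        def residuals(x_var):
            return basis(t, x_var).dot(c) - y

        x, _ = leastsq(residuals, x)
    return c, x
```

For example, for a two-exponential model $y \approx c_1 e^{-x_1 t} + c_2 e^{-x_2 t}$ the basis function would be lambda t, x: np.column_stack([np.exp(-x[0] * t), np.exp(-x[1] * t)]).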

References

[1] Jorge J. Moré. The Levenberg-Marquardt Algorithm: Implementation and Theory.
[2] Christian Kanzow, Nobuo Yamashita, Masao Fukushima. Levenberg-Marquardt methods with strong local convergence properties for solving nonlinear equations with convex constraints.
[3] S. Bellavia, M. Macconi, S. Pieraccini. Constrained dogleg methods for nonlinear systems with simple bounds. http://www.optimization-online.org/DB_FILE/2009/12/2504.pdf
[4] Jorge Nocedal and Stephen J. Wright. Numerical Optimization, 2nd edition.
[5] Hans Bruun Nielsen. Separable Nonlinear Least Squares.

Written with StackEdit.
