Qmin: minimize a noisy function by fitting quadratics.

Purpose: short, clear code for

  • fitting quadratics to data, aka quadratic regression
  • iterating quad fits to a local minimum of a noisy function.

This code is for students of programming and optimization to read and try out, not for professionals.

Keywords: optimization, quadratic, parabola, python, numpy, Iteratively reweighted least squares, IRLS weighted regression, derivative-free optimization, DFO

Plot

This shows qmin bouncing around on a parabolic hill + ripple -- no surprise:

(figure: qmin-15mar)

Overview

  1. lsq.py is just a thin wrapper for np.linalg.lstsq

  2. quad.py: quad_fit( X, z ) adds cross-product columns to X, then lstsq
    quad_min( X, z ) fits a quadratic z ~ x'Ax + b x + c and solves for xmin = - b / 2A -- like a parabola in 1d

  3. qmin.py:
    Start with points X, y: X_i n-dim vectors, y_i = func( X_i ) 1d .
    Fit a quadratic x'Ax + b x + c to the points by least squares, up-weighting the best ones so far
    Add the new point xmin = - b / 2A, ymin = func( xmin ) to X, y .
    Iterate these two steps.

  4. test-qmin.py: a trivial 2d test to plot; a minimal usage sketch follows this overview.

(Instead of classes, I use records (namedtuple in python) such as

Lsqfit = namedtuple( "Lsqfit", "coef X y weight fit res rank sing" )

-- a matter of taste.)
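
For concreteness, a minimal driver in the spirit of test-qmin.py below -- a sketch, not one of the gist's files; the toy function, seed and point counts are made up:

    import numpy as np
    from qmin import qmin

    def noisybowl( x ):                               # toy: a quadratic bowl + a little noise
        return (x**2).sum() + 0.01 * np.random.randn()

    np.random.seed( 1 )
    X0 = np.random.uniform( -5, 5, size=(10, 2) )     # 10 random 2d start points
    y0 = np.array([ noisybowl(x) for x in X0 ])
    X, y, qrecs = qmin( noisybowl, X0, y0, maxiter=5, yweight=5, verbose=0 )
    print "best y: %.3g at %s" % (y.min(), X[ y.argmin() ])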

Why (not) quadratic approximations ?

A quadratic approximation to your function landscape may, of course, be pretty good, or may be terrible. The most common problem in my experience is simply not looking at the real and fitted landscapes. Another is scaling; see the excellent scipy.optimize.least_squares on this. A third is swamping: with 10 variables, the 55 pure quadratic terms a00 x0^2 + a01 x0 x1 + ... + a99 x9^2 can swamp the 11 linear terms b0 x0 + ... + b9 x9 + c . A simple way to damp this effect is to fit linear, fit quadratic, and average the two.
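
A rough sketch of that damping idea, using lsq_fit and quad_fit from the files below (the 50/50 blend and the test surface are arbitrary, for illustration only):

    import numpy as np
    import lsq, quad

    np.random.seed( 0 )
    X = np.random.uniform( -1, 1, size=(80, 10) )     # 80 points in 10d
    z = (X**2).sum(axis=1) + X.sum(axis=1)            # a smooth test surface

    linfit  = lsq.lsq_fit( X, z, verbose=0 )          # 11 coefs: c + b.x
    quadfit = quad.quad_fit( X, z, verbose=0 )        # 66 coefs: c + b.x + x'Ax
    blend = (linfit.fit + quadfit.fit) / 2            # damp the pure quadratic terms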

Notes

There are dozens of methods for optimizing noisy, expensive functions without gradients. Why ? For one thing, there are zillions of different problems in the universe; for another, it's easy to invent a new method that works on one or two problems, and publish it. Thus we have more papers than reproducible test cases, beyond Rosenbrock. Nonetheless, here are two starting points, each with a dozen or so algorithms:

NLopt: solid C code with a python interface. Of these, BOBYQA (Powell, 2009) "constructs a quadratic approximation of the objective".

Optizelle
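
For example, calling BOBYQA through NLopt's python bindings looks roughly like this -- a sketch from memory, assuming the nlopt module is installed; check the NLopt reference manual for the exact API:

    import numpy as np
    import nlopt

    def f( x, grad ):                                 # NLopt objectives take (x, grad); grad unused here
        return float( (x**2).sum() + np.sin( 2*np.pi * x.mean() ))

    opt = nlopt.opt( nlopt.LN_BOBYQA, 2 )             # derivative-free, 2 variables
    opt.set_min_objective( f )
    opt.set_lower_bounds( [-10, -10] )
    opt.set_upper_bounds( [10, 10] )
    opt.set_xtol_rel( 1e-4 )
    xbest = opt.optimize( np.array([ 3.0, -3.0 ]))
    print xbest, opt.last_optimum_value()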

A basic suggestion for running any optimizer:

  • invest in test scaffolding, and in plotting: how can you see what the optimizer is doing ?
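
One cheap piece of scaffolding -- a sketch, not in the gist: wrap the objective so every call is recorded, then print or plot the trace afterwards.

    import numpy as np

    def record_calls( func ):
        """ wrap func: each (x, y) evaluation is appended to wrapped.trace """
        trace = []
        def wrapped( x ):
            y = func( x )
            trace.append( (np.copy(x), y) )
            return y
        wrapped.trace = trace
        return wrapped

    # usage: f = record_calls( myfunc ); run any optimizer on f; look at f.trace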

Comments are welcome, test cases most welcome.

cheers
-- denis

Last change: 15 March 2017

#!/usr/bin/env python2
""" lsq_fit( X, y ): a thin wrapper for np.linalg.lstsq
`verbose=1` helps follow what it's doing.
squares picture: https://en.wikipedia.org/wiki/Coefficient_of_determination
regression plot: http://seaborn.pydata.org/tutorial/regression.html
"""
# clarity, clarity
from __future__ import division
from collections import namedtuple
import numpy as np
import zutil as ut
__version__ = "2017-03-13 mar denis-bz-py t-online de"
#...............................................................................
Lsqfit = namedtuple( "Lsqfit", "coef X y weight fit res rank sing" ) # struct, record
def lsq_fit( X, y, weight=None, addones=True, verbose=1 ):
    """ `lsq_fit( X, y )`: fit `y ~ X . coef` using `np.linalg.lstsq`
        -> a namedtuple with fields: `coef X y weight fit res rank sing`
        `y ~ X . coef = coef0 X0 + coef1 X1 + ...`
        `fit = X . coef` -- line, parabola, paraboloid
        `res` (residuals) = `y - fit` .
        `verbose=1` prints inputs and outputs
            with the caller's `np.set_printoptions`
        See also: quad.py test-lsq.py
    """
        # https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html
    X = np.asarray_chkfinite( X )
    y = np.asarray_chkfinite( y )
    if addones:
        X = np.c_[ np.ones( len(X) ), X ]
    if X.ndim == 1:
        X = X.reshape( -1, 1 )
    if verbose:
        print "\n{ lsq_fit"
        print "\nX:", ut.asum(X), "\n", X
        print "\ny:", ut.asum(y), "\n", y
    if weight is not None:
        X *= weight[:,np.newaxis]  # /= sigma_i
        y *= weight
        if verbose:
            print "\nweight:", weight

    #...........................................................................
    coef, res2, rank, sing = np.linalg.lstsq( X, y, rcond=1e-10 )

    if rank < X.shape[1]:
        print "Warning: lsq_fit: rank %d < %d columns" % (rank, X.shape[1])
        print "singular values:", sing
    fit = X.dot( coef )
    res = y - fit  # res \perp X weighted
        # norms y^2 = fit^2 + res^2
        # but sums of squares can be rather non-intuitive, see ...
    if weight is not None:
        X /= weight[:,np.newaxis]
        y /= weight
    if verbose:
        print "\ncoef:", coef
        print "\nfit = X . coef:", ut.asum(fit), "\n", fit
        print "\nres = y - fit:", ut.asum(res), "\n", res
        print "} lsq_fit\n"
    return Lsqfit( coef.copy(), X, y, weight, fit, res, rank, sing )


def lsq_eval( fit, x ):
    """ In: `fit = lsq_fit( X, y ) -> fit.coef`
        Out: `x . coef`  e.g. for a line, slope * x + b
    """
    return x.dot( fit.coef )
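

if __name__ == "__main__":
    # A small usage example, not part of the original gist:
    # fit a noisy line y ~ 3 x + 1.  With addones=True (the default) a column
    # of ones is prepended, so coef comes back as [intercept, slope] ~ [1 3].
    np.random.seed( 0 )
    x = np.linspace( 0, 1, 20 )
    y = 3 * x + 1 + 0.01 * np.random.randn( 20 )
    f = lsq_fit( x, y, verbose=0 )
    print "lsq_fit coef:", f.coef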
#!/usr/bin/env python2
from __future__ import division
import numpy as np
import quad
__version__ = "2017-03-09 mar denis-bz-py t-online de"
#...............................................................................
def qmin( func, X, y, maxiter=10, yweight=5, verbose=1 ):
    """ minimize a noisy function by fitting quadratics

    Method
    ------
    Start with points `X, y`: `X_i` n-dim vectors, `y_i = func( X_i )` 1d .
    1) Fit a quadratic ` x'Ax + b x + c` to the points by least squares,
        up-weighting the best ones so far
    2) Add the new point
        `xmin = - b / 2A`
        `ymin = func( xmin )`
        to `X, y` .
    Iterate these two steps `maxiter` times.
    Example: see `test-qmin.py` .

    In
    --
    func: a (noisy, expensive) function
    X, y: numpy array-like of initial data points, `y_i = func( X_i )`

    Out
    ---
    X, y: the input arrays, with new points `x, y = func(x)` added
    qrecs: a list of `Quadmin` recs, see `quad.py` .

    Parameters, with their defaults
    ----------
    `yweight = 5`
        At each iteration, up-weight better points:
        the best (smallest `y`) so far -> this `yweight`, the worst -> 1.
    `maxiter = 10`
    `verbose = 1`

    Keywords: optimization, quadratic, parabola, python, numpy,
        Iteratively reweighted least squares, IRLS
        weighted regression, derivative-free, DFO
    """
        # inf many ways of
        #   weighting: |X - xbest|, |y - ybest|, both, residuals
        #   stepsize / learning rate / cooling schedule
        # test cases welcome

    X = np.asarray_chkfinite( X )
    y = np.asarray_chkfinite( y )
    assert X.ndim == 2, X.shape
    assert y.ndim == 1, y.shape
    nx = len(X)
    ymin = y.min()
    if verbose:
        print "\n{ qmin"
        print "In:"
        print "X.T: \n", X.T  # with user's np.set_printoptions
        print "y:", y
        print "ymin: %.3g" % ymin
        print "yweight: %.3g" % yweight
        print "\n# iter func(xmin) quad(xmin) func - quad av |res| eigenvalues xmin"

    qrecs = []
    for iter in range(maxiter):
        w = np.interp( y, [y.min(), y.max()],
                [yweight, 1] )  # linear, geom ?

        #.......................................................................
        qrec = quad.quad_min( X, y, weight=w, verbose=verbose )
        xmin = qrec.xmin  # quad min or max or saddle
        fmin = func( xmin )
        X = np.append( X, [xmin], axis=0 )
        y = np.append( y, fmin )
        qrecs.append( qrec )

        eigvals = np.linalg.eigvalsh( qrec.A )  # max eigenvec: best direction
        err = " <-- error: parabola down, not up" if eigvals[0] <= 0 \
            else ""
        if verbose or err:
            qatmin = quad.quad_eval( qrec, xmin )
            avres = np.fabs( qrec.res ).mean()
            print "qmin %2d: %10.3g %10.3g %10.2g %10.2g %s \t%s %s" % (
                iter, fmin, qatmin, fmin - qatmin, avres, eigvals, xmin, err )
        if eigvals[0] <= 0:  # xtol ytol: stopiter.py
            break

    if verbose:
        # print "qmin final weights:", np.sort( w )
        print "qmin func(y): start ymin %.3g + %s" % (ymin, y[nx:] - ymin)
        print "} qmin\n"
    return X, y, qrecs
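

if __name__ == "__main__":
    # A small usage example, not part of the original gist:
    # minimize a smooth 2d bowl (x0 - 1)^2 + (x1 + 2)^2 from 8 random start points;
    # with no noise the first fitted quadratic already lands on xmin ~ [1 -2].
    np.random.seed( 3 )
    bowl = lambda x: (x[0] - 1)**2 + (x[1] + 2)**2
    X0 = np.random.uniform( -5, 5, size=(8, 2) )
    y0 = np.array([ bowl(x) for x in X0 ])
    X, y, qrecs = qmin( bowl, X0, y0, maxiter=3, verbose=0 )
    print "qmin best: %.3g at %s" % (y.min(), X[ y.argmin() ])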
#!/usr/bin/env python2
from __future__ import division
from collections import namedtuple # aka struct, record
import numpy as np
import lsq
__version__ = "2017-03-10 mar denis-bz-py t-online de"
#...............................................................................
Quadmin = namedtuple( "Quadmin", "xmin A b c X z weight fit res quadfit" )
def quad_min( X, z, weight=None, verbose=1 ):
    """ fit a quadratic `z ~ x'Ax + b x + c` to vectors `X_i`
        -> a Quadmin rec: xmin ...
        E.g. if the rows of `X` are points `x y` in the plane, fit
            `z ~ a00 x^2 + a01 xy + a11 y^2`  -- quadratic
            `  + b0 x + b1 y + c`             -- linear
    """
    nx, dim = X.shape
    assert nx >= (dim+1) * (dim+2) // 2, X.shape  # too few points to fit a quad
    qfit = quad_fit( X, z, weight=weight, verbose=verbose )

        # x'Ax + bx + c, xmin = - b / 2A --
    coef = qfit.coef
    c, b, alin = coef[0], coef[1:dim+1], coef[dim+1:]
    A = _tri_square( dim, alin, offdiag=0.5 )
    xmin = np.linalg.solve( A, - b / 2 )  # - b / 2A like 1d
    return Quadmin( xmin, A, b, c, X, z, weight,
            fit=qfit.fit, res=qfit.res, quadfit=qfit )

#...............................................................................
def quad_fit( X, z, weight=None, verbose=1 ):
    """ add cross-product cols to X, lsq_fit
        -> Lsqfit: coef X y weight fit res rank sing, see lsq.py
        weight: a vec ~ 1 / sigma, or None
    """
    X = np.asarray_chkfinite( X )
    z = np.asarray_chkfinite( z )
    if X.ndim == 1:
        X = X.reshape( -1, 1 )
    nx, dim = X.shape
    Xprod = _row_products( X.T ).T  # column products e.g. x^2 xy y^2
    Xquad = np.c_[ np.ones(nx), X, Xprod ]  # 1 x y x^2 xy y^2
    return lsq.lsq_fit( Xquad, z, weight=weight, addones=False,
            verbose=(verbose >= 2) )

#...............................................................................
def quad_eval( qmin, X ):
    """ In: `qmin = quad_min() -> A b c`
        Out: `x'Ax + x . b + c`  a vec or ndarray
    """
    A, b, c = qmin.A, qmin.b, qmin.c
    if X.ndim == 1:
        return X.dot(A).dot(X) + X.dot(b) + c
    else:
        return np.array([ x.dot(A).dot(x) + x.dot(b) + c for x in X ])
            # vectorize this ?
            # class Quadfit ?  mttiw

#...............................................................................
# util --

def _row_products( X ):
    """ e.g. rows [x y z] -> [x^2 xy xz y^2 yz z^2] """
    return np.vstack([ X[j] * X[j:] for j in xrange( len(X) )])

def _tri_square( n, X=None, offdiag=1 ):
    """ e.g. [0 1 2 3 4 5] [xx xy xz yy yz zz]
        -> 3 x 3 symmetric
        [[0 1 2]
         [1 3 4]
         [2 4 5]]
    """
    if X is None:
        X = np.arange( n * (n + 1) // 2 )
    A = np.zeros(( n, n ), dtype=X.dtype )
        # j, k walk upper triangle --
    j = k = 0
    for x in X:
        if j != k:
            x *= offdiag
        A[j,k] = A[k,j] = x
        k += 1
        if k >= n:
            j += 1
            k = j
    return A

#...............................................................................
if __name__ == "__main__":
    for n in [2, 3, 4]:
        print "_tri_square:", _tri_square( n )
# from: test-qmin.py
# run: 15 Mar 2017 14:35 in ~bz/py/np/lsq/quad denis-imac 10.8.3
# versions: numpy 1.12.0 scipy 0.19.0 python 2.7.12 mac 10.8.3
================================================================================
test-qmin.py hill 2 ripple 1 yweight 10 dim 2 nx 10 seed 7
{ qmin
In:
X.T:
[[9.6 8.6 -2.4 -4.2 -8.5 0.022 3.6 -1.2 -5.7 -4.6]
[0.77 -9.5 -8.7 8.2 5.6 -8.6 6.1 4.5 -0.96 -0.0023]]
y: [1.8 1.3 1 0.71 0.65 -0.26 -0.38 -0.46 -0.49 -0.7]
ymin: -0.699
yweight: 10
# iter func(xmin) quad(xmin) func - quad av |res| eigenvalues xmin
qmin 0: 0.984 -1.11 2.1 1.1 [0.013 0.017] [1.1 -0.63]
qmin 1: -0.966 -0.63 -0.34 1.9 [0.0039 0.013] [-1.4 -1]
qmin 2: -0.451 -0.828 0.38 1.6 [0.0089 0.014] [-0.25 -0.6]
qmin 3: -0.00631 -0.707 0.7 1.8 [0.0058 0.013] [-1.1 -0.9]
qmin 4: -0.866 -0.605 -0.26 2 [0.0029 0.012] [-2.6 -1.8]
qmin func(y): start ymin -0.699 + [1.7 -0.27 0.25 0.69 -0.17]
} qmin
qmin: minimize a noisy function by fitting quadratics
start: -0.7 at [-4.6 -0.0023] qmin: -0.97 at [-1.4 -1] hill 2 ripple 1 yweight 10
#!/usr/bin/env python2
""" test-qmin.py """
from __future__ import division
import sys
import numpy as np
from qmin import qmin
import zutil as ut
__version__ = "2017-03-14 mar denis-bz-py t-online de"
np.set_printoptions( threshold=20, edgeitems=10, linewidth=140,
formatter = dict( float=ut.numstr ))
# formatter = dict( float = lambda x: "%.3g" % x )) # float arrays %.3g
print "\n", 80 * "="
#...............................................................................
# a simple noisy func
# see also ndtestfuncs.py
def func( x, xmax=10 ):  # hill=10, ripple=10
    """ -> parabola + ripple
        hill * ||x||^2 scaled + ripple * sin( 2 pi av x )
    """
    assert x.ndim == 1, x.shape
    xsum2 = (np.linalg.norm(x) / xmax) ** 2 / len(x)  # 1 at corners
    znoise = np.sin( 2*np.pi * x.mean() )  # 0 on diagonal lines / planes 0 .5 1 ...
    return hill * xsum2 + ripple * znoise
# default params --
hill = 2 # func: hill + ripple
ripple = 1
yweight = 10
maxiter = 5
dim = 2
nx = 10 # nx random start points in [-10, 10] square
verbose = 1 # 2: lsq too
seed = 7
plot = 0
# to change these params, run this.py a=1 b=None 'c = ...' in sh or ipython
for arg in sys.argv[1:]:
    exec( arg )
assert dim > 1, dim
np.random.seed(seed)
print "%s hill %g ripple %g yweight %g dim %d nx %d seed %d " % (
sys.argv[0], hill, ripple, yweight, dim, nx, seed )
#...............................................................................
# nx random start points in [-10, 10] square --
X0 = np.random.uniform( -10, 10, size=[nx, dim] )
z0 = ut.vmap( func, X0 )
jsort = z0.argsort() [::-1]
X0 = X0[jsort]
z0 = z0[jsort]
#...............................................................................
X, z, qrecs = qmin( func, X0, z0, maxiter=maxiter, yweight=yweight, verbose=verbose )
jmin = z.argmin()
Xmin, zmin = X[jmin], z[jmin]
title = \
"""qmin: minimize a noisy function by fitting quadratics
start: %.2g at %s qmin: %.2g at %s hill %g ripple %g yweight %g """ % (
z0[-1], X0[-1], zmin, Xmin, hill, ripple, yweight )
print title
# plot xy dots in colors ~ z: useless, try plot-idw.py
if plot and dim == 2:
    from matplotlib import pyplot as pl
    import seaborn as sns
    pl.rc( "figure.subplot", top=.90, bottom=.06 )
    fig, ax = pl.subplots( figsize=[10, 5] )
    fig.suptitle( title, multialignment="left" )
    ax.set( xlim=[-6, 2] )
    x, y = X.T
    m, M = z.min(), z.max()
    z01 = (z - m) / (M - m)
    colors = pl.cm.inferno_r( z01 )  # http://matplotlib.org/users/colormaps.html
    ax.scatter( x, y, c=colors, s=100 )
    x, y = X[nx-1:].T
    ax.plot( x, y, "b", lw=.5 )
    # pl.colorbar()  # grr
    pl.draw()
    if plot >= 2:
        png = "tmp-yweight%g.png" % yweight
        from etc import plotutil
        plotutil.savefig( png, __file__ )
#!/usr/bin/env python2
""" zutil.py: asum ... """
from __future__ import division
import numpy as np
def asum( X ):
    """ array summary: "shape dtype min av max [density]" """
    if not hasattr( X, "dtype" ) or X.dtype.kind not in "fiu":  # float int uint
        return str(X)
    if hasattr( X, "todense" ):  # issparse
        sparsetype = type(X).__name__ + " "  # csr_matrix etc.
        density = " of the %.3g %% non-0" % (
                100. * X.nnz / np.prod( X.shape ))
    else:
        sparsetype = density = ""
    dtype = str(X.dtype) .replace( "float64", "" )
    return "%s %s%s min av max %.3g %.3g %.3g %s" % (
            X.shape, sparsetype, dtype, X.min(), X.mean(), X.max(), density )

def findfile( filename, dirs=["", "data", "../data", "$SCIKIT_LEARN_DATA", "$webdata"] ):
    """ -> first dir/file found in a list of dirs
        IOError if none
    """
    from os.path import expanduser, expandvars, isfile, join
    for dir in dirs:
        dirfile = expanduser( expandvars( join( dir, filename )))
        if isfile( dirfile ):
            return dirfile
    raise IOError( "file \"%s\" not found in folders %s" % (filename, dirs) )

def ints( X ):
    return np.round(X).astype(int)  # NaN Inf -> - maxint

def numstr( x, fmt="%.2g" ):
    """ %.2g but 1201, not 1.2e+03 """
        # np.set_printoptions(... formatter = dict( float=ut.numstr ))
    if isinstance( x, basestring ) or x is None:
        return x
    assert np.isscalar(x), type(x)
    if abs(x) < 1e-10:  return "0"
    if abs(x) < 1e-6:   return "%.1g" % x
    if 10 <= abs(x) < 1e5:  return "%.0f" % x
    return fmt % x

def vmap( f, X ):
    return np.vstack( map( f, X )) .squeeze()
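

if __name__ == "__main__":
    # A small usage example, not part of the original gist:
    # asum gives a one-line summary -- shape, dtype, min / av / max;
    # numstr is a compact float formatter for np.set_printoptions.
    print asum( np.arange( 6. ).reshape( 2, 3 ))
    print numstr( 0.012345 ), numstr( 1201.0 ), numstr( 3e-8 )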