Gradient descent with 2-point line fit where gradients cross 0

The Gradient_descent method iterates

xnew = xold - rate(t) * grad(xold)

GD is a workhorse in machine learning: it's simple, uses only gradients (not function values), and scales to very high-dimensional x.

rate(t) is a step-size or "learning rate" (aka η, Greek eta).
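The update rule can be sketched in a few lines of numpy. This is a toy illustration with a constant rate, not the 2-point line fit of the title:

```python
import numpy as np

def gradient_descent( grad, x0, rate=0.1, nsteps=100 ):
    # xnew = xold - rate * grad(xold), with a constant rate
    x = np.asarray( x0, dtype=float )
    for t in range( nsteps ):
        x = x - rate * grad( x )
    return x

# example: f(x) = |x|^2 / 2 has grad(x) = x, minimum at 0
xmin = gradient_descent( lambda x: x, [3.0, -4.0] )
```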

#!/usr/bin/env python2
from __future__ import division
import numpy as np
def classify01( X, probA, probB ):
    """ classify 1 flip of a coin which is heads with either probA or probB:
        heads -> A if probA > probB, B if <
        tails -> A if probA < probB, B if >
    """
    heads = ( np.asarray( X ) == 1 )
    return np.where( heads == (probA > probB), "A", "B" )
#!/usr/bin/env python2
""" How many of the longest pieces of a randomly-broken stick add up to half its length ? """
from __future__ import division
import sys
import numpy as np
__version__ = "2014-10-26 oct denis-bz-py t-online de"
np.set_printoptions( 1, threshold=100, edgeitems=5, suppress=True )
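A minimal simulation of the question in the docstring might look like this (my sketch, not the program this header belongs to):

```python
import numpy as np

def longest_pieces_to_half( n, rng=np.random ):
    # break a unit stick at n-1 uniform random points -> n pieces
    cuts = np.sort( rng.uniform( size=n - 1 ))
    pieces = np.diff( np.r_[ 0, cuts, 1 ])
    pieces = np.sort( pieces )[::-1]  # longest piece first
    # how many of the longest pieces sum to >= 1/2
    return np.searchsorted( np.cumsum( pieces ), 0.5 ) + 1

k = longest_pieces_to_half( 10 )
```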
#!/usr/bin/env python2
""" color_hilo: color high-water or low-water numbers in a given column
Say `my.txt` is a file with the numbers 3 1 4 1 5 5 9 in column 3.
Color column 3 of high-water lines, here those with 3 4 5 5 9, red:
color_lohi -col 3 my.txt
Ouput low-water lines only, with their line numbers:

EPA air quality: missing days for ozone, TEMP, WIND.

The Environmental Protection Agency collects data on ozone, wind etc. in hundreds of US cities, and puts it in files like daily_TEMP_2015.csv on its web site.

It looks as though many days are missing from the ozone, TEMP and WIND data for many cities. The file 2015-yearavs.tsv below, generated by the program, lists the number of days of data for each city, and the average over the year:
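The per-city counts and averages amount to a simple group-by. A toy sketch with made-up rows (the real EPA csv has different column names and many more fields):

```python
from collections import defaultdict

# toy (city, date, value) rows standing in for a daily_*_2015.csv
rows = [ ("Albany", "2015-01-01", 50.0),
         ("Albany", "2015-01-02", 55.0),
         ("Boston", "2015-01-01", 60.0) ]

days = defaultdict( set )
vals = defaultdict( list )
for city, date, value in rows:
    days[city].add( date )
    vals[city].append( value )

ndays = { c: len(d) for c, d in days.items() }            # days of data per city
yearav = { c: sum(v) / len(v) for c, v in vals.items() }  # average over the year
```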

#!/usr/bin/env python2
""" min_x av |exp - x| at 0.7 -- W Least_absolute_deviations, L1
min_x rms( exp - x ) at 1 -- least squares, L2
are both very flat
which might explain why L1 minimization with IRLS doesn't work very well.
# goo "L1 minimization" irls
# different L1 min problems: sparsity, outliers
from __future__ import division
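The flatness is easy to see numerically. A sketch with Exp(1) samples (the L1 minimizer is the median ln 2 ~ 0.69, the L2 minimizer the mean 1):

```python
import numpy as np

rng = np.random.RandomState( 0 )
e = rng.exponential( size=100000 )
xs = np.linspace( 0.3, 1.5, 25 )
l1 = np.array([ np.mean( np.abs( e - x )) for x in xs ])        # av |exp - x|
l2 = np.array([ np.sqrt( np.mean( (e - x)**2 )) for x in xs ])  # rms( exp - x )
x_l1 = xs[ l1.argmin() ]  # near the median, ln 2 ~ 0.69
x_l2 = xs[ l2.argmin() ]  # near the mean, 1
```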

Adaptive soft threshold and smooth abs: scale by average |X|

The soft threshold and smooth absolute value functions

    softthreshold( x, delta ) = sign(x) * max( |x| - delta, 0 )
    smoothabs( x, eps ) = sqrt( x^2 + eps^2 )

are widely used in optimization and signal processing. (Soft thresholding squeezes small values to 0; if "noise" is small and "signal" large, this improves the signal-to-noise ratio. Smooth abs, also called pseudo-Huber, rounds off the corner of |x| at 0.)
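In code, with the adaptive scaling of the title (a sketch; the scale factor c is my choice, not from the original):

```python
import numpy as np

def softthreshold( X, delta ):
    # squeeze |X| <= delta to 0, shrink the rest toward 0
    return np.sign( X ) * np.maximum( np.abs( X ) - delta, 0 )

def smoothabs( X, eps ):
    # smooth |x| near 0: sqrt( x^2 + eps^2 )
    return np.sqrt( X*X + eps*eps )

def adaptive_softthreshold( X, c=0.1 ):
    # scale the threshold by average |X|
    X = np.asarray( X )
    return softthreshold( X, c * np.mean( np.abs( X )))
```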


Qmin: minimize a noisy function by fitting quadratics.

Purpose: short, clear code for

  • fitting quadratics to data, aka quadratic regression
  • iterating quad fits to a local minimum of a noisy function.

This code is for students of programming and optimization to read and try out, not for professionals.
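One such quad-fit step might look like this (a sketch using np.polyfit, not the Qmin code itself):

```python
import numpy as np

def quadfit_step( f, x, halfwidth=1.0, npoints=7 ):
    # sample f around x, fit a quadratic, step to the parabola's vertex
    xs = np.linspace( x - halfwidth, x + halfwidth, npoints )
    a, b, c = np.polyfit( xs, [ f(t) for t in xs ], 2 )  # f ~ a x^2 + b x + c
    return -b / (2 * a) if a > 0 else x  # vertex, if the fit curves upward

# example: exact quadratic f(x) = (x - 3)^2, vertex at 3
x1 = quadfit_step( lambda t: (t - 3)**2, 0.0 )
```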


How noisy is test error in classification ?

Algorithms for classification, in particular binary classification, have two different objectives:

  • a smooth, fast, approximate Loss function used in most optimizers
  • real loss measured on a test set: expensive to calculate, so usually done only at the end.
""" logger = Datalogger( "x y z ..." ): log vars to plot or save
logger( locals(), x= ... ) in a loop
looks up x y ... in e.g. locals()
and grows
logger.mem["x"] = [x0 x1 ...]
logger.mem["y"] = [y0 y1 ...]
... over all calls to logger(), e.g. in a loop.
logger.savez( npzfile ) saves all x y z ... with numpy.savez .
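A minimal implementation of that interface could be (my sketch from the docstring above, not the author's code):

```python
import numpy as np

class Datalogger( object ):
    """ log named vars to lists in self.mem, to plot or save later """

    def __init__( self, names ):
        self.names = names.split()
        self.mem = dict( (name, []) for name in self.names )

    def __call__( self, namespace=None, **kw ):
        # look up each name in kw first, then in namespace, e.g. locals()
        for name in self.names:
            if name in kw:
                self.mem[name].append( kw[name] )
            elif namespace is not None and name in namespace:
                self.mem[name].append( namespace[name] )

    def savez( self, npzfile ):
        np.savez( npzfile, **dict( (k, np.asarray(v)) for k, v in self.mem.items() ))

logger = Datalogger( "x y" )
for t in range( 3 ):
    x, y = t*t, 2*t
    logger( locals() )
```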