cube-dist-L12max-42.log
# from:
# run: 20 Nov 2017 15:37 in ~bz/py/random/uniform denis-imac 10.8.3
# versions: numpy 1.13.3 scipy 1.0.0 python 2.7.14 mac 10.8.3
N 1000 dims [3, 8, 16, 32] quantiles [10 50 90] seed 3
Distances between random points in the unit cube in metrics L1, L2, Lmax

NO2 in Munich 2016: high traffic => high NO2


This plot shows NO2 levels over the day in Munich in June and December 2016. München-Landshuter-Allee, on the left, has among the highest NO2 levels in all of Germany, and a lot of traffic: 120,000 to 150,000 cars and light trucks per day.
Surprise: high traffic => high NO2.


color_hilo: color high-water or low-water numbers in a given column

See below for a description of what this does.

To install on a Unix-like system,

  1. click on "Download ZIP" in the top right-hand corner; ls ~/*/*.zip to see where it is, e.g. ~/Downloads/
  2. cd to a directory in your $PYTHONPATH
  3. unzip ~/Downloads/

Gradient descent with 2-point line fit where gradients cross 0

The Gradient_descent method iterates

xnew = xold - rate(t) * grad(xold)

GD is a workhorse in machine learning because it's so simple, uses gradients only (not function values), and scales to very high-dimensional x.

rate(t) is a step-size or "learning rate" (aka η, Greek eta).
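A minimal sketch of the iteration above; the quadratic test function, the fixed rate, and the function names are illustrative, not from the original:

```python
import numpy as np

def gradient_descent(grad, x0, rate=0.1, nsteps=100):
    """ iterate xnew = xold - rate(t) * grad(xold);
        rate may be a constant or a function of the step number t """
    x = np.asarray(x0, dtype=float)
    for t in range(nsteps):
        r = rate(t) if callable(rate) else rate
        x = x - r * grad(x)
    return x

# example: minimize f(x) = |x|^2 / 2, whose gradient is just x
xmin = gradient_descent(grad=lambda x: x, x0=[4.0, -2.0], rate=0.5, nsteps=50)
print(xmin)  # near [0, 0]
```

With a constant rate on this function each step halves x, so 50 steps shrink it by 2^50; real problems need rate(t) tuned, which is most of the art.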

#!/usr/bin/env python2
""" How many of the longest pieces of a randomly-broken stick add up to half its length ? """
from __future__ import division
import sys
import numpy as np
__version__ = "2014-10-26 oct denis-bz-py t-online de"
np.set_printoptions( 1, threshold=100, edgeitems=5, suppress=True )
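The docstring above poses the question; one way the simulation could go (a sketch — function and parameter names are mine, not the original script's):

```python
import numpy as np

def npieces_to_half(nbreak, ntrial, seed=3):
    """ break a unit stick at nbreak random points, ntrial times;
        count how many of the longest pieces sum to >= 1/2 """
    rng = np.random.RandomState(seed)
    counts = np.empty(ntrial, dtype=int)
    for t in range(ntrial):
        cuts = np.sort(rng.uniform(size=nbreak))
        pieces = np.diff(np.r_[0.0, cuts, 1.0])   # nbreak+1 pieces, sum 1
        pieces[::-1].sort()                       # longest first
        counts[t] = np.searchsorted(np.cumsum(pieces), 0.5) + 1
    return counts

counts = npieces_to_half(nbreak=9, ntrial=1000)
print("mean pieces to reach half:", counts.mean())
```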
#!/usr/bin/env python2
""" min_x av |exp - x|  at 0.7 -- W Least_absolute_deviations, L1
    min_x rms( exp - x ) at 1  -- least squares, L2
are both very flat,
which might explain why L1 minimization with IRLS doesn't work very well.
"""
# goo "L1 minimization" irls
# different L1 min problems: sparsity, outliers
from __future__ import division

Adaptive soft threshold and smooth abs: scale by average |X|

The soft threshold and smooth absolute value functions are widely used in optimization and signal processing. (Soft thresholding squeezes small values to 0; if "noise" is small and "signal" is large, this improves the signal-to-noise ratio.) Smooth abs, also called
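The standard definitions, with the adaptive scaling by average |X| that the title describes — my reading of it, with parameter names of my own choosing:

```python
import numpy as np

def softthresh(X, t):
    """ soft threshold: shrink each x towards 0 by t, clipping at 0 """
    X = np.asarray(X)
    return np.sign(X) * np.maximum(np.abs(X) - t, 0)

def smoothabs(X, eps):
    """ smooth absolute value: sqrt(x^2 + eps^2), differentiable at 0 """
    X = np.asarray(X)
    return np.sqrt(X**2 + eps**2)

def adaptive_softthresh(X, frac=0.1):
    """ adaptive: scale the threshold by the average |X| """
    X = np.asarray(X)
    return softthresh(X, frac * np.mean(np.abs(X)))

X = np.array([-3.0, -0.1, 0.0, 0.2, 5.0])
print(adaptive_softthresh(X))  # threshold 0.1 * mean|X| = 0.166
```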


Qmin: minimize a noisy function by fitting quadratics.

Purpose: short, clear code for

  • fitting quadratics to data, aka quadratic regression
  • iterating quad fits to a local minimum of a noisy function.

This code is for students of programming and optimization to read and try out, not for professionals.
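A minimal sketch of one such iteration — fit a quadratic to noisy samples around the current point, then jump to the fitted parabola's vertex. Names, parameters, and the test function are mine, not Qmin's API:

```python
import numpy as np

def quadmin_step(f, x, h, npoints=7):
    """ sample f on [x-h, x+h], least-squares fit a parabola,
        return its vertex -- or x unchanged if the fit is not convex """
    xs = np.linspace(x - h, x + h, npoints)
    ys = np.array([f(t) for t in xs])
    a, b, c = np.polyfit(xs, ys, 2)       # y ~ a x^2 + b x + c
    return -b / (2 * a) if a > 0 else x

rng = np.random.RandomState(0)
noisy = lambda t: (t - 1.0)**2 + 0.01 * rng.randn()   # true minimum at 1

x = 5.0
for _ in range(5):
    x = quadmin_step(noisy, x, h=1.0)
print(x)  # near 1
```

Note the vertex can lie far outside the sampled interval, where the noisy fit extrapolates badly; real implementations clip or shrink the step.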


How noisy is test error in classification ?

Algorithms for classification, in particular binary classification, have two different objectives:

  • a smooth, fast, approximate Loss function used in most optimizers
  • real loss measured on a test set: expensive to calculate, so usually done only at the end.
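The noise in the measured test error is essentially binomial: with N test points and true error rate p, the observed rate has standard deviation sqrt( p (1 - p) / N ). A quick simulation (N, p, and the number of trials are illustrative):

```python
import numpy as np

N, p = 1000, 0.2            # test-set size, true misclassification rate
rng = np.random.RandomState(3)

# each of 10000 simulated "test sets": N independent right/wrong flips
testerr = rng.binomial(N, p, size=10000) / float(N)

print("mean %.3f  std %.4f  theory std %.4f" % (
    testerr.mean(), testerr.std(), np.sqrt(p * (1 - p) / N)))
```

Here sqrt( .2 * .8 / 1000 ) is about 0.013, so two classifiers whose test errors differ by a percent or so on 1000 points may well be indistinguishable.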