View cube-dist-L12max-42.log
# from: cube-dist-L12max-42.py
# run: 20 Nov 2017 15:37 in ~bz/py/random/uniform denis-imac 10.8.3
# versions: numpy 1.13.3 scipy 1.0.0 python 2.7.14 mac 10.8.3
================================================================================
cube-dist-L12max-42.py
N 1000 dims [3, 8, 16, 32] quantiles [10 50 90] seed 3
Distances between random points in the unit cube in metrics L1, L2, Lmax
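A minimal sketch (not the logged script itself) of the experiment the log describes: pairwise distances between N uniform random points in the unit cube, in the L1, L2 and Lmax metrics, summarized at the 10 / 50 / 90 % quantiles. The scipy metric names cityblock, euclidean and chebyshev correspond to L1, L2 and Lmax.

    import numpy as np
    from scipy.spatial.distance import pdist

    np.random.seed(3)
    N, quantiles = 1000, [10, 50, 90]
    for dim in [3, 8, 16, 32]:
        X = np.random.uniform( size=(N, dim) )                  # N random points in the unit cube
        for metric in ["cityblock", "euclidean", "chebyshev"]:  # L1, L2, Lmax
            d = pdist( X, metric=metric )                       # all N*(N-1)/2 pairwise distances
            print("dim %2d  %-10s quantiles %s" % (
                dim, metric, np.percentile( d, quantiles ).round(2)))
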
View Munich-NO2.md

NO2 in Munich 2016: high traffic => high NO2

[Plot 2016-mu5-hours-junedec: NO2 over the hours of the day, Munich stations, June and December 2016]

This plot shows NO2 levels over the course of the day in Munich in June and December 2016. München-Landshuter-Allee, on the left, has among the highest NO2 levels in all of Germany, and heavy traffic: 120,000 to 150,000 cars and light trucks per day.
Surprise: high traffic => high NO2.

View 0-EPA-air-quality.md
View 0-color_hilo.md

color_hilo: color high-water or low-water numbers in a given column

See color_hilo.py below for a description of what this does.

To install on a Unix-like system,

  1. click on "Download ZIP" in the top right-hand corner; ls ~/*/*.zip to see where it is, e.g. ~/Downloads/verylongname.zip
  2. cd to a directory in your $PYTHONPATH
  3. unzip ~/Downloads/verylongname.zip
View 0-Gradient-descent-with-0-crossing.md

Gradient descent with 2-point line fit where gradients cross 0

The Gradient_descent method iterates

xnew = xold - rate(t) * grad(xold)

GD is a workhorse in machine learning because it's so simple, uses gradients only (not function values), and scales to very high-dimensional x.

rate(t) is a step-size or "learning rate" (aka η, Greek eta).
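A minimal sketch of the iteration above (not the repo's code), with a constant learning rate and a simple quadratic bowl as the test function:

    import numpy as np

    def gradient_descent( grad, x0, rate=0.1, niter=100 ):
        """ iterate xnew = xold - rate * grad(xold) """
        x = np.asarray( x0, dtype=float )
        for t in range(niter):
            x = x - rate * grad(x)
        return x

    # example: minimize f(x) = sum(x^2), whose gradient is 2 x; the minimum is at 0
    xmin = gradient_descent( lambda x: 2 * x, x0=np.ones(5) )
    print(xmin)   # close to [0 0 0 0 0]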

View half-brokenstick.py
#!/usr/bin/env python2
""" How many of the longest pieces of a randomly-broken stick add up to half its length ? """
# http://demonstrations.wolfram.com/BrokenStickRule
from __future__ import division
import sys
import numpy as np
__version__ = "2014-10-26 oct denis-bz-py t-online de"
np.set_printoptions( 1, threshold=100, edgeitems=5, suppress=True )
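Only the file header is shown above. A minimal sketch of the simulation it describes (an assumption, not the author's implementation): break a unit stick at npieces - 1 uniform random points, sort the pieces longest first, and count how many are needed to reach half the total length.

    import numpy as np

    def pieces_to_half( npieces, ntrials=10000, seed=0 ):
        """ -> ntrials counts of how many of the longest pieces sum to >= 1/2 """
        rng = np.random.RandomState(seed)
        counts = np.empty( ntrials, dtype=int )
        for t in range(ntrials):
            cuts = np.sort( rng.uniform( size=npieces - 1 ))
            lengths = np.diff( np.r_[0., cuts, 1.] )    # npieces piece lengths, sum 1
            lengths = np.sort( lengths )[::-1]          # longest first
            counts[t] = np.searchsorted( np.cumsum( lengths ), .5 ) + 1
        return counts

    counts = pieces_to_half( npieces=10 )
    print("longest pieces of 10 needed to reach half the stick: average %.2f" % counts.mean())
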
View exp-L1-L2.py
#!/usr/bin/env python2
""" min_x av |exp - x| at 0.7 -- W Least_absolute_deviations, L1
min_x rms( exp - x ) at 1 -- least squares, L2
are both very flat
which might explain why L1 minimization with IRLS doesn't work very well.
"""
# goo "L1 minimization" irls
# different L1 min problems: sparsity, outliers
from __future__ import division
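Only the header is shown above. Here exp means Exponential(1) samples: the average absolute deviation av |exp - x| is minimized at the median ln 2 ≈ 0.69, and rms( exp - x ) at the mean 1. A minimal sketch (not the original script) that shows how flat both curves are near their minima:

    import numpy as np

    np.random.seed(0)
    e = np.random.exponential( size=100000 )         # Exponential(1) samples
    for x in np.linspace( 0.4, 1.4, 11 ):
        l1 = np.mean( np.abs( e - x ))               # av |exp - x|, min near ln 2 ~ 0.69
        l2 = np.sqrt( np.mean(( e - x ) ** 2 ))      # rms( exp - x ), min at 1
        print("x %.2f  av|exp - x| %.4f  rms(exp - x) %.4f" % (x, l1, l2))
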
View 0-Adaptive-soft-threshold-smooth-abs.md

Adaptive soft threshold and smooth abs: scale by average |X|

The soft threshold and smooth absolute value functions

[Plot adasoft: the soft threshold and smooth abs functions]

are widely used in optimization and signal processing. (Soft thresholding squeezes small values to 0; if "noise" is small and "signal" large, this improves the signal-to-noise ratio. Smooth abs, also called
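A minimal sketch of the two functions with the adaptive scaling in the title, i.e. threshold / smoothing width proportional to average |X|; the constant c and the particular smooth-abs form sqrt(x^2 + eps^2) are assumptions, not necessarily the repo's:

    import numpy as np

    def adasoft( X, c=0.5 ):
        """ soft threshold with t = c * average |X|: squeeze values smaller than t to 0 """
        t = c * np.mean( np.abs( X ))
        return np.sign(X) * np.maximum( np.abs(X) - t, 0 )

    def adasmoothabs( X, c=0.5 ):
        """ smooth approximation of |X| with width eps = c * average |X| """
        eps = c * np.mean( np.abs( X ))
        return np.sqrt( X**2 + eps**2 )   # ~ eps near 0, ~ |X| for |X| >> eps

    X = np.array([ -2, -0.1, 0, 0.1, 2. ])
    print(adasoft( X ))         # small values squeezed to 0
    print(adasmoothabs( X ))    # smooth, no kink at 0
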

View 0-Qmin.md

Qmin: minimize a noisy function by fitting quadratics.

Purpose: short, clear code for

  • fitting quadratics to data, aka quadratic regression
  • iterating quad fits to a local minimum of a noisy function.

This code is for students of programming and optimization to read and try out, not for professionals.
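A minimal sketch of one quad-fit iteration in one dimension (not the Qmin code itself): sample the noisy function around the current point, fit a quadratic with np.polyfit, and move to the vertex of the fitted parabola.

    import numpy as np

    def quadfit_step( f, x, h, npoints=7 ):
        """ sample f on [x-h, x+h], fit a x^2 + b x + c, return the vertex -b / 2a """
        xs = np.linspace( x - h, x + h, npoints )
        ys = np.array([ f(xi) for xi in xs ])
        a, b, c = np.polyfit( xs, ys, 2 )     # quadratic regression
        if a <= 0:                            # fitted parabola has no minimum:
            return xs[ np.argmin(ys) ]        # fall back to the best sample
        return -b / (2 * a)

    def noisyf( x ):
        return (x - 3) ** 2 + 0.01 * np.random.randn()

    np.random.seed(0)
    x = 0.
    for it in range(5):
        x = quadfit_step( noisyf, x, h=1. )
        print("iteration %d: x %.3f" % (it, x))   # -> near 3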

View Test-error-in-classification.md

How noisy is test error in classification ?

Algorithms for classification, in particular binary classification, have two different objectives:

  • a smooth, fast, approximate loss function used in most optimizers
  • real loss measured on a test set: expensive to calculate, so usually done only at the end.
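
A minimal sketch of how noisy the measured test error is: if the true error rate is p and the test set has N examples, the measured error is Binomial(N, p) / N, with standard deviation sqrt( p (1 - p) / N ). The values of p, N and ntrials below are made-up numbers for illustration.

    import numpy as np

    np.random.seed(0)
    p, N, ntrials = 0.2, 1000, 1000       # true error rate, test-set size, repeats
    measured = np.random.binomial( N, p, size=ntrials ) / float(N)
    print("true error %.3f  first measurements %s" % (p, measured[:10].round(3)))
    print("std of measured error %.4f  theory sqrt(p(1-p)/N) %.4f" % (
        measured.std(), np.sqrt( p * (1 - p) / N )))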