View groupby_rows.py
#!/usr/bin/env python2
""" groupby_rows( A )
in: numpy array A, 1d or 2d, no NaNs
out: a dict, index of the first unique row -> [indices of identical rows]
like pandas gr.groups, a dict {group name -> [group labels] ...}, one column
"""
import numpy as np
from itertools import imap
View noisyUSV.py
#!/usr/bin/env python2
""" A = noisyUSV( n, d, r, noise ): U S V + noise, n x d, rank r """
from __future__ import division
import numpy as np
from numpy.linalg import norm
from etc import znumpyutil as nu
__version__ = "2018-05-15 May denis-bz-py t-online de" # scale noise * S.max
View Gish.md

Gish: share gists by name

gish is a command-line program to copy files between local computers and gist.github.com, using file names or gist ids. An example:

Alice:  gish put @Alice Gistname.md a.py b.py  # upload a gist with 3 files

Bob:    gish list @Alice             # list Alice's gists
View Munich-NO2.md

NO2 in Munich 2016: high traffic => high NO2

[Figure: 2016-mu5-hours-junedec]

This plot shows NO2 levels over the day in Munich in June and December 2016. München-Landshuter-Allee on the left has about the highest NO2 levels in all of Germany, and a lot of traffic: 120,000 to 150,000 cars and light trucks per day.
Surprise: high traffic => high NO2.

View 0-EPA-air-quality.md
View 0-color_hilo.md

color_hilo: color high-water or low-water numbers in a given column

See color_hilo.py below for a description of what this does.

To install on a Unix-like system,

  1. click on "Download ZIP" in the top right-hand corner; ls ~/*/*.zip to see where it is, e.g. ~/Downloads/verylongname.zip
  2. cd to a directory in your $PYTHONPATH
  3. unzip ~/Downloads/verylongname.zip
View 0-Gradient-descent-with-0-crossing.md

Gradient descent with 2-point line fit where gradients cross 0

The gradient descent method iterates

xnew = xold - rate(t) * grad(xold)

GD is a workhorse in machine learning because it is simple, uses only gradients (not function values), and scales to very high-dimensional x.

rate(t) is a step-size or "learning rate" (aka η, Greek eta).
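The iteration above can be sketched in a few lines of Python. This is only a minimal illustration of plain gradient descent, not the 2-point line fit from the title; the function name `gradient_descent`, its parameters, and the quadratic test problem are assumptions made for the example.

```python
import numpy as np

def gradient_descent(grad, x0, rate, nsteps=100):
    """Plain gradient descent: xnew = xold - rate(t) * grad(xold).

    grad:   function x -> gradient at x (hypothetical, supplied by the caller)
    rate:   function t -> step size at iteration t
    """
    x = np.asarray(x0, dtype=float)
    for t in range(nsteps):
        x = x - rate(t) * grad(x)
    return x

# Toy problem (an assumption for illustration):
# f(x) = 0.5 * |x - target|^2, so grad f(x) = x - target.
target = np.array([1.0, -2.0, 3.0])
xmin = gradient_descent(
    grad=lambda x: x - target,
    x0=np.zeros(3),
    rate=lambda t: 0.5,   # constant step size; the error halves at each step
    nsteps=50,
)
```

With a constant rate of 0.5 on this quadratic, the distance to `target` halves at every iteration, so `xmin` agrees with `target` to floating-point precision after 50 steps. On real problems the choice of rate(t) matters much more.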

View half-brokenstick.py
#!/usr/bin/env python2
""" How many of the longest pieces of a randomly-broken stick add up to half its length ? """
# http://demonstrations.wolfram.com/BrokenStickRule
from __future__ import division
import sys
import numpy as np
__version__ = "2014-10-26 oct denis-bz-py t-online de"
np.set_printoptions( 1, threshold=100, edgeitems=5, suppress=True )
View exp-L1-L2.py
#!/usr/bin/env python2
""" min_x av |exp - x| at 0.7 -- W Least_absolute_deviations, L1
min_x rms( exp - x ) at 1 -- least squares, L2
are both very flat
which might explain why L1 minimization with IRLS doesn't work very well.
"""
# goo "L1 minimization" irls
# different L1 min problems: sparsity, outliers
from __future__ import division
View 0-Adaptive-soft-threshold-smooth-abs.md

Adaptive soft threshold and smooth abs: scale by average |X|

The soft threshold and smooth absolute value functions

[Figure: adasoft]

are widely used in optimization and signal processing. (Soft thresholding squeezes small values to 0; if "noise" is small and "signal" large, this improves the signal-to-noise ratio. Smooth abs, also called