numerically stable implementation of the log-logistic function

Binary case

This is just the middle section of Bob Carpenter's note on evaluating log-loss via the binary logistic function: https://lingpipe-blog.com/2012/02/16/howprevent-overflow-underflow-logistic-regression/

The logp function calculates the log-probability of the observed label, i.e. the negative cross-entropy:

    dotproduct( [y, 1-y],  [logP(y=1), logP(y=0)] )

where the input s is the beta'x log-odds scalar. Since P(y=1) = 1/(1+exp(-s)), we have log P(y=1) = -log(1+exp(-s)) = -logsumexp([0, -s]), and symmetrically log P(y=0) = -logsumexp([0, s]). The trick is that logsumexp keeps this numerically stable for any choice of s and y.

>>> import numpy as np
>>> import scipy.special
>>> logp = lambda y, s: -scipy.special.logsumexp([0, -s]) if y == 1 else -scipy.special.logsumexp([0, s]) if y == 0 else None

>>> logp(0,100)  ## i'm exp(100):1 confident in y=1 but oops! got y=0. suffer lots of loss.
-100.0
>>> logp(1,100)  ## i'm exp(100):1 confident in y=1 and got it. yay.
-0.0
>>> logp(1,-100)  ## i'm exp(100):1 confident in y=0 but oops! got y=1.
-100.0

>>> logp(1,10)   ## i'm exp(10):1 confident in y=1 and got it.  i'm nearly perfectly happy. nearly as good as exp(100):1 confidence.
-4.5398899216870535e-05

>>> logp(1,0)    ## i'm 50:50 undecided and got y=1, and get hit with log(.5) negative-loss
-0.6931471805599453
>>> logp(0,0)
-0.6931471805599453

Notes on sklearn: their sklearn.metrics.log_loss function calculates the same thing, but it can't handle probabilities near 0 or 1 and instead clips them to the range [1e-15, 1-1e-15]. This can actually change results sometimes -- it can't tell a P(y=1)=1e-15 prediction apart from a P(y=1)=1e-30 prediction when y=1 turns out to be true, a difference that log-loss/xent cares about quite a bit, mathematically.
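To see what that clipping throws away, here's a small numpy-only sketch of mine. It mimics the clip-then-log computation rather than calling sklearn itself, since the exact clipping behavior has changed across sklearn versions; it just illustrates the [1e-15, 1-1e-15] clip described above.

    import numpy as np
    import scipy.special

    def xent_clipped(y, p, eps=1e-15):
        # cross-entropy from a clipped probability, like the [1e-15, 1-1e-15] clip
        p = np.clip(p, eps, 1 - eps)
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))

    def xent_logspace(y, s):
        # the same quantity computed from the log-odds s, never leaving log space
        return scipy.special.logsumexp([0, -s]) if y == 1 else scipy.special.logsumexp([0, s])

    # two quite different predictions, P(y=1) ~ 1e-15 vs ~ 1e-30, when y=1 is true
    for s in [np.log(1e-15), np.log(1e-30)]:
        p = 1 / (1 + np.exp(-s))
        print(xent_clipped(1, p), xent_logspace(1, s))
    # the clipped version reports ~34.5 loss for both; the log-space version
    # correctly reports ~34.5 vs ~69.1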

Also, their sklearn.linear_model.LogisticRegression calculates log probs by going through the numerically unstable probability representation first. I think if you wanted to do this more correctly, you'd have to get the beta'x dot product yourself, perhaps via the decision_function method, and apply the stable logp to that instead...
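A rough sketch of that idea (my own, on made-up toy data): fit a binary LogisticRegression, pull the beta'x scores out of decision_function, and feed them to the stable binary logp instead of calling predict_log_proba.

    import numpy as np
    import scipy.special
    from sklearn.linear_model import LogisticRegression

    # made-up toy data: one feature, binary labels
    rng = np.random.RandomState(0)
    X = rng.randn(200, 1)
    y = (X[:, 0] + 0.5 * rng.randn(200) > 0).astype(int)

    clf = LogisticRegression().fit(X, y)
    scores = clf.decision_function(X)   # beta'x log-odds, one per example

    def logp(yi, s):
        # same stable binary logp as above
        return -scipy.special.logsumexp([0, -s]) if yi == 1 else -scipy.special.logsumexp([0, s])

    stable = np.array([logp(yi, s) for yi, s in zip(y, scores)])

    # for comparison: the route through the probability representation
    via_probs = clf.predict_log_proba(X)[np.arange(len(y)), y]

    print(stable.mean(), via_probs.mean())   # agree on tame data; they diverge for extreme scores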

Multiclass

In the multiclass case: if s is a vector of unnormalized log-probabilities, it's just this...

>>> logp = lambda y, s: (s - scipy.special.logsumexp(s))[y]
>>> logp(0,np.array([0,100]))
-100.0
>>> logp(1,np.array([0,100]))
0.0
>>> logp(1,np.array([0,-100]))
-100.0
>>> logp(1,np.array([0,10]))
-4.5398899217730104e-05
>>> logp(1,np.array([0,0]))
-0.6931471805599453
>>> logp(0,np.array([0,0]))
-0.6931471805599453


# Make sure it works for cases outside the finite tolerance of exp.  exp(1000)=inf, exp(-1000)=0 under double-prec floating point.
>>> logp(0,np.array([-1000,1000]))
-2000.0
>>> logp(1,np.array([-1000,1000]))
0.0
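As a sanity check of mine (not in the original gist): the same quantity matches scipy.special.log_softmax (available in reasonably recent scipy), while the naive exp-normalize-log route blows up on the extreme scores above.

    import numpy as np
    import scipy.special

    logp = lambda y, s: (s - scipy.special.logsumexp(s))[y]
    s = np.array([-1000.0, 1000.0])

    # agrees with scipy's own stable log_softmax
    print(scipy.special.log_softmax(s)[0])   # -2000.0
    print(logp(0, s))                        # -2000.0

    # naive route: exp(1000) overflows to inf, so the normalized probabilities
    # come out as [0, nan] and their logs as [-inf, nan]
    p = np.exp(s) / np.exp(s).sum()
    print(np.log(p))                         # [-inf  nan], plus overflow warnings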