# CamDavidsonPilon/delta_method.md

Last active Sep 13, 2019

## The Delta Method and Autograd

One of the reasons I'm really excited about autograd is that it lets me transform my abstract parameters into business logic. Let me explain with an example. Suppose I am modeling customer churn, and I have fitted a Weibull survival model using maximum likelihood estimation. I have two parameter estimates: lambda-hat and rho-hat. I also have their covariance matrix, which tells me how much uncertainty is present in the estimates (in lifelines, this is under the `variance_matrix_` property). From this, I can plot the survival curve, which is fine, but what I really want is a measure of lifetime value. Customers give us \$10 each time period for the first 3 time periods, and \$30 per time period afterwards. For a single user, the average LTV (up to `timeline`) calculation might look like:

```python
from autograd import numpy as np
from lifelines import WeibullFitter

# create a Weibull model with fake data
wf = WeibullFitter().fit(np.arange(1, 100))

def LTV(params, timeline):
    lambda_, rho_ = params
    sf = np.exp(-(timeline / lambda_) ** rho_)  # Weibull survival function
    # $10 per period for the first 3 periods, $30 per period afterwards
    clv = 10 * (timeline <= 3) * sf + 30 * (timeline > 3) * sf
    return clv.sum()

timeline = np.arange(10)
params = np.array([wf.lambda_, wf.rho_])

LTV(params, timeline)
# 214.76
```

This gives me a point estimate. But I also want the variance, since there is uncertainty in lambda-hat and rho-hat. There is a technique called the delta method that allows me to transfer variance from one domain to another. To use it, I need the gradient of LTV w.r.t. lambda and rho. For this problem, I could probably write out the gradient by hand, but it would be messy and error-prone. Instead, I can use autograd.
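Concretely, the first-order delta method approximates the variance of a smooth function f of the estimates (here f is the LTV, theta-hat is the vector (lambda-hat, rho-hat), and Sigma is their covariance matrix) as:

```latex
\operatorname{Var}\big(f(\hat\theta)\big) \approx \nabla f(\hat\theta)^{\top} \, \Sigma \, \nabla f(\hat\theta)
```

which is exactly the gradient-covariance-gradient product computed below.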

```python
from autograd import jacobian

gradient_LTV = jacobian(LTV)
gradient_LTV(params, timeline)
# array([ 0.15527043, 10.78900217])
```

The output of the above is the gradient at the `params` value. Very easy. Now, if I want the variance of the LTV, the delta method tells me to multiply the above by the covariance matrix and by the gradient's transpose (sanity check: this should return a scalar, not a vector or matrix):

```python
var = gradient_LTV(params, timeline).dot(wf.variance_matrix_).dot(gradient_LTV(params, timeline))
# 3.127
```
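As a quick sanity check on the autograd gradient (a sketch not in the original post: the `finite_diff_grad` helper is hypothetical, and the parameter values below are illustrative rather than the fitted ones), a central finite difference should give nearly the same numbers:

```python
import numpy as np

def LTV(params, timeline):
    lambda_, rho_ = params
    sf = np.exp(-(timeline / lambda_) ** rho_)
    clv = 10 * (timeline <= 3) * sf + 30 * (timeline > 3) * sf
    return clv.sum()

def finite_diff_grad(f, params, timeline, eps=1e-5):
    # central difference in each coordinate: (f(x+e) - f(x-e)) / (2e)
    grad = np.zeros_like(params, dtype=float)
    for i in range(len(params)):
        bump = np.zeros_like(params, dtype=float)
        bump[i] = eps
        grad[i] = (f(params + bump, timeline) - f(params - bump, timeline)) / (2 * eps)
    return grad

# illustrative values, not the fitted estimates from the post
params = np.array([50.0, 1.5])
timeline = np.arange(10)
print(finite_diff_grad(LTV, params, timeline))
```

If the two agree to several digits, the autograd Jacobian is wired up correctly.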

So my best estimate of the LTV (at timeline=10) is 214.76, with variance 3.127 (standard error of about 1.77). From this, we can build confidence intervals, etc.
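For example, a normal-approximation 95% interval follows directly from the point estimate and the delta-method variance (a sketch using the numbers computed above):

```python
import numpy as np

ltv_hat = 214.76   # point estimate from above
var = 3.127        # delta-method variance from above

se = np.sqrt(var)
lower, upper = ltv_hat - 1.96 * se, ltv_hat + 1.96 * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
# → 95% CI: (211.29, 218.23)
```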

### CamDavidsonPilon commented Sep 13, 2019

 See blog post here too: https://dataorigami.net/blogs/napkin-folding/the-delta-method-and-autograd