The Delta Method and Autograd
One of the reasons I'm really excited about autograd is because it enables me to be able to transform my abstract parameters into business-logic. Let me explain with an example. Suppose I am modeling customer churn, and I have fitted a Weibull survival model using maximum likelihood estimation. I have two parameter estimates: lambda-hat and rho-hat. I also have their covariance matrix, which tells me how much uncertainty is present in the estimates (in lifelines, this is under the
variance_matrix_ property. From this, I can plot the survival curve, which is fine, but what I really want is a measure of lifetime value. Customers give us $10 each time period for the first 3 time periods, and then $30 a month afterwards. For a single user, the average LTV (up to
timeline) calculation might look like:
# create a Weibull model with fake data wf = WeibullFitter().fit(np.arange(1, 100)) from autograd import numpy as np def LTV(params, timeline): lambda_, rho_ = params sf = np.exp(-(timeline/lambda_) ** rho_) clv = 10 * (timeline <= 3) * sf + 30 * (timeline > 3) * sf return clv.sum() timeline = np.arange(10) params = np.array([wf.lambda_, wf.rho_]) LTV(params, timeline) # 214.76
This gives me a point estimate. But I also want the variance, since there is uncertainty in lambda-hat and rho-hat. There is a technique called the delta-method that allows me to transform variance from one domain to another. But to use that, I need the gradient of LTV w.r.t to lambda and rho. For this problem, I could probably write out the gradient, but it would be messy and error-prone. I can use autograd instead to do this.
from autograd import jacobian gradient_LTV = grad(LTV) gradient_LTV(params, timeline) # array([ 0.15527043, 10.78900217])
The output of the above is the gradient at the
params value. Very easy. Now, if I want the variance of the LTV, the delta methods tells me to multiple the above by the covariance matrix and the gradient's transpose (check: this should return a scalar and not a vector/matrix):
var = gradient_LTV(params, timeline).dot(wf.variance_matrix_).dot(gradient_LTV(params, timeline)) # 3.127
So, my best estimate of LTV (at timeline=10) is 214.76 (3.127). From this, we can build confidence intervals, etc.