Skip to content

Instantly share code, notes, and snippets.

:shipit:
Learning bio brb

Cameron Davidson-Pilon CamDavidsonPilon

:shipit:
Learning bio brb
Block or report user

Report or block CamDavidsonPilon

Hide content and notifications from this user.

Learn more about blocking users

Contact Support about this user’s behavior.

Learn more about reporting abuse

Report abuse
View GitHub Profile
View tanks!.py
MAX = 114
MIN = 22
def falling_factorial(n, k):
"""
computes (n) * (n-1) * ... * (n-(k-1))
"""
running_product = n
for p in range(n-1, n-k, -1):
running_product *= p
View splines.py
from zepid.base import create_spline_transform
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi
rossi = load_rossi()
rossi_with_splines = rossi.copy()
spline_transform, bp = create_spline_transform(rossi_with_splines['age'], term=3, restricted=False)
rossi_with_splines[['age0', 'age1', 'age2']] = pd.DataFrame(spline_transform(rossi_with_splines['age']))
rossi_with_splines = rossi_with_splines.drop('age', axis=1)
View ci.py
from lifelines import WeibullFitter
lambda_, rho_ = 2, 0.5
N = 10_000
T_actual = lambda_ * np.random.exponential(1, size=N)**(1/rho_)
T_censor = lambda_ * np.random.exponential(1, size=N)**(1/rho_)
T = np.minimum(T_actual, T_censor)
E = T_actual < T_censor
View delta_method.md

The Delta Method and Autograd

One of the reasons I'm really excited about autograd is because it enables me to be able to transform my abstract parameters into business-logic. Let me explain with an example. Suppose I am modeling customer churn, and I have fitted a Weibull survival model using maximum likelihood estimation. I have two parameter estimates: lambda-hat and rho-hat. I also have their covariance matrix, which tells me how much uncertainty is present in the estimates (in lifelines, this is under the variance_matrix_ property. From this, I can plot the survival curve, which is fine, but what I really want is a measure of lifetime value. Customers give us $10 each time period for the first 3 time periods, and then $30 a month afterwards. For a single user, the average LTV (up to timeline) calculation might look like:

# create a Weibull model with fake data
wf = WeibullFitter().fit(np.arange(1, 100))

from autograd import numpy as np
View piecewise_exp.py
import numpy as np
def piecewise_exponential_survival_data(n, breakpoints, lambdas):
"""
Examples
--------
>>> T = piecewise_exponential_survival_data(100000, [1, 3], [0.2, 3, 1.])
>>> NelsonAalenFitter().fit(T).plot()
View fitting.ipynb
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
View gist:1f59682876c47266df8467cf7a0423d4
for _ in range(10):
v = np.random.uniform(-10, 10, size=2)
print(check_grad(_negative_log_likelihood, gradient_function, v, log(T), E), v)
"""
0.4712740782685882 [-6.56058891 5.14902935]
3237.6524450846837 [ 9.08538372 -1.41142257]
View perf_results.csv
N frac batch single ratio
432 0.01 0.17586684226989746 0.24918293952941895 0.7057740092560965
432 0.09909090909090909 0.139909029006958 0.18905401229858398 0.7400479223154045
432 0.1881818181818182 0.18714380264282227 0.19823789596557617 0.9440364655369406
432 0.2772727272727273 0.15966224670410156 0.1492321491241455 1.0698917601949116
432 0.3663636363636364 0.1783008575439453 0.2960469722747803 0.6022721873286135
432 0.4554545454545455 0.2235269546508789 0.2482771873474121 0.9003120948768426
432 0.5445454545454546 0.2755610942840576 0.27878808975219727 0.9884249163190293
432 0.6336363636363637 0.2598390579223633 0.19139719009399414 1.3575907660648399
432 0.7227272727272728 0.28580784797668457 0.19529199600219727 1.4634898194878856
View cox_performance_batch_single.py
from time import time
import pandas as pd
import numpy as np
from lifelines.datasets import load_rossi
from lifelines import CoxPHFitter
# This compares the batch algorithm (in CTV) vs the single iteration algorithm (original in CPH)
# N vs (% ties == unique(T) / N)
View lex_loader.py
import csv
from collections import defaultdict
d = defaultdict(dict)
with open('lexicon_3.csv') as dict_:
csv_reader=csv.reader(dict_, delimiter=',')
for row in csv_reader:
d[row[0]][row[1]] = int(row[2])
You can’t perform that action at this time.