David "Trent" Salazar dsal1951

| Group | Base Convert Rate | Number of Converters | Number of Leads | Convert Rate | Convert Index |
|-------|-------------------|----------------------|-----------------|--------------|---------------|
| 0 | 0.053 | 82 | 290 | 0.283 | 5.34x |
| 1 | 0.053 | 21 | 292 | 0.072 | 1.36x |
| 2 | 0.053 | 18 | 297 | 0.061 | 1.14x |
| 3 | 0.053 | 6 | 283 | 0.021 | 0.40x |
| 4 | 0.053 | 9 | 338 | 0.027 | 0.50x |
| 5 | 0.053 | 2 | 300 | 0.007 | 0.13x |
| 6 | 0.053 | 3 | 300 | 0.01 | 0.19x |
| 7 | 0.053 | 6 | 300 | 0.02 | 0.38x |
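The Convert Index column appears to be each group's convert rate divided by the base convert rate. A minimal sketch that recomputes it for groups 0 and 5 from the table, assuming that definition (the helper name `convert_index` is hypothetical and not part of the original gist):

```python
def convert_index(converters, leads, base_rate):
    """Return (convert_rate, index) for one group, where index = rate / base_rate."""
    rate = converters / leads
    return rate, rate / base_rate

# Inputs copied from the rows for group 0 and group 5 in the table
rate0, idx0 = convert_index(82, 290, 0.053)
rate5, idx5 = convert_index(2, 300, 0.053)

print(f"group 0: rate={rate0:.3f}, index={idx0:.2f}x")
print(f"group 5: rate={rate5:.3f}, index={idx5:.2f}x")
```

An index above 1x means the group converts more often than the overall population; below 1x means less often.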
#Observations where y=1
total_converts = df['CONVERT_ACTUAL'].sum()
#Total Observations
total_leads = df.index.size
baseline_convert_prob = total_converts/float(total_leads)
#Calculate Index of Positive Outcome Rate in Each Decile to the Baseline Positive Rate
lift_df['LIFT_INDEX'] = (lift_df['lift']/baseline_convert_prob)*100
#Add Baseline Positive Rate to DataFrame
group_df = df.groupby('GROUP')
#Total Number of Leads in each Group
leads_count = group_df['CONVERT_ACTUAL'].count()
#Number of Leads who Converted in Each Group
convert_count = group_df['CONVERT_ACTUAL'].sum()
lift = convert_count/leads_count
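The snippets here do not show how the `GROUP` column is assigned. As a rough sketch, one way to do it is to rank leads by predicted probability and cut the ranks into equal-sized bins. The synthetic data, the use of `pd.qcut`, and the `leads`/`converts` column names are all assumptions for illustration, not the author's code:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the model output (assumption: not the author's data)
rng = np.random.default_rng(0)
n = 1000
prob = rng.random(n)
actual = (rng.random(n) < prob * 0.1).astype(int)
df = pd.DataFrame({'CONVERT_ACTUAL': actual, 'CONVERT_PROB': prob})

# Assumption: GROUP 0 holds the highest-scored leads.
# Ranking first guarantees unique values, so qcut gets distinct bin edges.
bins = 10
ranks = df['CONVERT_PROB'].rank(method='first', ascending=False)
df['GROUP'] = pd.qcut(ranks, bins, labels=False)

# Per-group counts and conversion rate, as in the snippet above
group_df = df.groupby('GROUP')
leads_count = group_df['CONVERT_ACTUAL'].count()
convert_count = group_df['CONVERT_ACTUAL'].sum()
lift = convert_count / leads_count

# Index each group's rate to the overall (baseline) rate, scaled so 100 = baseline
baseline_convert_prob = df['CONVERT_ACTUAL'].mean()
lift_df = pd.DataFrame({'leads': leads_count,
                        'converts': convert_count,
                        'lift': lift})
lift_df['LIFT_INDEX'] = lift_df['lift'] / baseline_convert_prob * 100
print(lift_df)
```

Because every bin has the same number of leads, the lead-weighted average of `LIFT_INDEX` comes out to 100, which is a quick sanity check on the grouping.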
#Actual Outcome (y=1 converted to customer, y=0 did not convert to customer)
convert_actual = y
#Predicted Probability that lead converts into a customer
convert_prob = clf.predict_proba(x)
cols = ['CONVERT_ACTUAL','CONVERT_PROB']
data = [convert_actual,convert_prob[:,1]]
df = pd.DataFrame(dict(zip(cols,data)))
#Sort Descending based on Predicted Probability y=1
#sort_values returns a new DataFrame, so assign the result back
df = df.sort_values('CONVERT_PROB',ascending=False)
#Create dataset
x,y = datasets.make_classification(n_samples=10000, n_features=20,
                                   n_informative=15,n_redundant=5,
                                   n_classes=2, weights=[0.95,0.05],
                                   random_state=1000)
#Split into Training and Test sets
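#sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection
x_train, x_test, y_train, y_test = model_selection.train_test_split(x,y,
                                                    test_size=0.3,random_state=1000)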
#Train a Decision Tree to Predict y
clf = tree.DecisionTreeClassifier(min_samples_leaf=10)
dsal1951 / Calculate Model Lift
Created July 4, 2016 05:53
Data needed for a Lift chart (aka Gains chart) for a predictive model created using Sklearn and Matplotlib
def calc_lift(x,y,clf,bins=10):
    """
    Takes input arrays and trained SkLearn Classifier and returns a Pandas
    DataFrame with the average lift generated by the model in each bin
    Parameters
    -------------------
    x: Numpy array or Pandas Dataframe with shape = [n_samples, n_features]
    y: A 1-d Numpy array or Pandas Series with shape = [n_samples]
// In app.js or main.js or whatever:
// var myApp = angular.module('askchisne', ['ngSanitize', 'ngAnimate', 'ui.bootstrap', 'ui.bootstrap.tpls']);
// This filter makes the assumption that the input will be in decimal form (i.e. 17% is 0.17).
myApp.filter('percentage', ['$filter', function ($filter) {
  return function (input, decimals) {
    return $filter('number')(input * 100, decimals) + '%';
  };
}]);