
@markerdmann
Created August 12, 2014 19:40
{
"metadata": {
"name": "",
"signature": "sha256:b7eed14979a18e9675ef15ea95b74002b91b118a88581e3ceaa5e109b831a13a"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"The Case of the Mysterious Acme Company A/B Test"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%pylab inline\n",
"import pandas as pd\n",
"from scipy import stats\n",
"import matplotlib.pyplot as plt\n",
"\n",
"experiment = pd.read_csv('experiment.csv')\n",
"treatment = experiment[experiment.ab == 'treatment']\n",
"control = experiment[experiment.ab == 'control']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
}
],
"prompt_number": 18
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Now we've loaded our treatment and control groups... let's use Pearson's chi squared test to see if the results are statistically significant. We'll use p < 0.05 as our threshold of significance."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"control_counts = pd.value_counts(control.converted)\n",
"treatment_counts = pd.value_counts(treatment.converted)\n",
"\n",
"chi2, p, dof, expected = stats.chi2_contingency([control_counts, treatment_counts])\n",
"print 'p value: ', p\n",
"print 'Result is statistically significant: ', p < 0.05"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"p value: 0.830111319891\n",
"Result is statistically significant: False\n"
]
}
],
"prompt_number": 25
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Unfortunately it looks like the third-party software stopped because we had collected a large number of samples without showing a statistically significant improvement in the conversion rate. This implies that our null hypothesis (that the new landing page does not improve conversions) cannot be rejected by the data we've collected. Here are the conversion rates:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"control_counts = pd.value_counts(control.converted, normalize=True)\n",
"treatment_counts = pd.value_counts(treatment.converted, normalize=True)\n",
"\n",
"print 'Control conversion rate: {}%'.format(round(control_counts[1] * 100, 2))\n",
"print 'Treatment conversion rate: {}%'.format(round(treatment_counts[1] * 100, 2))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Control conversion rate: 9.96%\n",
"Treatment conversion rate: 9.99%\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"There is a slight improvement, though... could that be shown to be significant if we collect more samples?"
]
},
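{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch (using a standard two-proportion sample-size approximation, not part of the original analysis), we can estimate how many samples it would take to detect a lift from 9.96% to 9.99% at p < 0.05 with 80% power. The answer comes out to roughly fifteen million visitors per group:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"\n",
"# normal-approximation sample size for comparing two proportions\n",
"p1, p2 = 0.0996, 0.0999  # observed control and treatment rates\n",
"z_alpha, z_beta = 1.96, 0.84  # 5% two-sided significance, 80% power\n",
"p_bar = (p1 + p2) / 2\n",
"n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar)) +\n",
"      z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2\n",
"     / (p2 - p1) ** 2)\n",
"print 'Approximate samples needed per group:', int(math.ceil(n))"
],
"language": "python",
"metadata": {},
"outputs": []
},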
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Let's investigate further"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print pd.value_counts(treatment.landing_page)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"new_page 95574\n",
"old_page 4759\n",
"dtype: int64\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"http://img.photobucket.com/albums/v246/Neo-Larry/Twilight%20of%20the%20Thunder%20Mouse/Screen4-WildPikachuAppears.jpg\">"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"It looks like there's a bug in the A/B testing code. For some reason 4,759 people in the treatment group saw the old page. Does this affect the significance of our result?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# let's remove the people who saw the old page from the treatment group\n",
"treatment = experiment[(experiment.ab == 'treatment') & (experiment.landing_page == 'new_page')]\n",
"buggy_treatment = experiment[(experiment.ab == 'treatment') & (experiment.landing_page == 'old_page')]\n",
"\n",
"control_counts = pd.value_counts(control.converted)\n",
"treatment_counts = pd.value_counts(treatment.converted)\n",
"\n",
"chi2, p, dof, expected = stats.chi2_contingency([control_counts, treatment_counts])\n",
"print 'p: ', p\n",
"print 'Result is statistically significant: ', p < 0.05\n",
"\n",
"control_counts = pd.value_counts(control.converted, normalize=True)\n",
"treatment_counts = pd.value_counts(treatment.converted, normalize=True)\n",
"buggy_treatment_counts = pd.value_counts(buggy_treatment.converted, normalize=True)\n",
"\n",
"print 'Control conversion rate: {}%'.format(round(control_counts[1] * 100, 2))\n",
"print 'Treatment conversion rate: {}%'.format(round(treatment_counts[1] * 100, 2))\n",
"print 'Buggy treatment conversion rate: {}%'.format(round(buggy_treatment_counts[1] * 100, 2))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"p: 0.983300660142\n",
"Result is statistically significant: False\n",
"Control conversion rate: 9.96%\n",
"Treatment conversion rate: 9.97%\n",
"Buggy treatment conversion rate: 10.53%\n"
]
}
],
"prompt_number": 28
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"In this case it's even more clear that the old and new landing pages are converting at the same rate. For some reason, however, people in the treatment group who saw the old landing page converted at a much higher rate than the other groups. That merits further investigation!"
]
},
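{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a first check (a sketch added here, not part of the original analysis), we can run the same chi squared test on the buggy treatment group against the control group to see whether its higher conversion rate is itself statistically significant:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# compare the buggy treatment group (treatment users who saw the old page) to control\n",
"buggy_counts = pd.value_counts(buggy_treatment.converted)\n",
"control_counts = pd.value_counts(control.converted)\n",
"\n",
"chi2, p, dof, expected = stats.chi2_contingency([control_counts, buggy_counts])\n",
"print 'p value: ', p\n",
"print 'Result is statistically significant: ', p < 0.05"
],
"language": "python",
"metadata": {},
"outputs": []
},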
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Answers for the project manager:\n",
"--------------------------------\n",
"\n",
"It looks like our third-party software is correct. Given the number of samples, the slight increase in the conversion rate is almost certainly just due to chance. If a 0.01% improvement in conversions would be a big win, however, I would recommend running another test with a much larger number of samples to make certain that the new landing page isn't converting at a higher rate.\n",
"\n",
"More importantly, we should probably investigate the bug in the A/B testing code and find out why the people who encountered that bug converted at a much higher rate!"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}