{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"In this example, we'll look at a situation where some variable we haven't measured helps determine whether or not a user visits website X. The same variable also helps determine their response to the AB test: the \"treatment effect\". For example, if our KPI is whether a user re-visits the site after exposure to the test, then the \"average treatment effect\" is just the increase in the average re-visit rate, averaging over all users.\n",
"\n",
"We're interested in how much bias is introduced by self-selection, not in random error, so we'll use a really big sample size to make the error bars negligible (but you should worry about random error when you're experimenting!).\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"N = 10000000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$u$ will be the unmeasured variable that causes a person to enter the test: it's just the user's propensity to visit.\n",
"\n",
"We'll use $u$ to generate $v_i$, a binary variable for whether or not the user visits at time $t_i$, and then calculate the treatment assignment $a$. The assignment variable will have three states, even though this is an AB test! This is one of the key differences: we're assigning over the whole population of users, $U$, and not just the ones entering the test, $T$. The assignments are 2 (test group), 1 (control group), and 0 (not assigned). A user is assigned whenever they visit the site.\n",
"\n",
"Notice $a=0$ whenever $v_i=0$: if a user doesn't visit the site ($v_i=0$), then they can't be assigned to test or control (so $a=0$). Otherwise, $a$ is 1 (control) or 2 (test).\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"u = 1. / (1. + np.exp(-np.random.normal(size=N)))\n",
"vi = np.random.binomial(1, u)\n",
"a = vi*(1+np.random.binomial(1, 0.5, size=N))"
]
},
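{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (a minimal sketch over the arrays just defined): $a=0$ exactly when $v_i=0$, and visitors should split roughly 50/50 between control and test."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"assert ((a == 0) == (vi == 0)).all()  # non-visitors are exactly the unassigned users\n",
"print(np.bincount(a) / N)  # fraction in states 0 (unassigned), 1 (control), 2 (test)"
]
},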
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we define the \"potential outcomes\", $v(0)$, $v(1)$, and $v(2)$, and use them to calculate $v$ at time $t_{i+1}$, denoted `vi1`. The potential outcomes $v(a)$ are defined on each unit (you might write them as $v_i(a)$ for the $i$th unit), and indicate the value that $v$ would take on if the unit has assignment $a$.\n",
"\n",
"Below, $v(a)$ is Bernoulli with success probability $0.1(a+1)u$. The latent variable $u$ has average 0.5, so $E[v(a)] = 0.05(a+1)$, and the expected differences between consecutive potential outcomes are 0.05 and 0.05."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"v0 = np.random.binomial(1, u*0.1)\n",
"v1 = np.random.binomial(1, u*0.2)\n",
"v2 = np.random.binomial(1, u*0.3)\n",
"vi1 = (a==0)*v0 + (a==1)*v1 + (a==2)*v2\n",
"\n",
"X = pd.DataFrame({'vi': vi, 'ai': a, 'v_i+1': vi1, 'v0': v0, 'v1': v1, 'v2': v2})"
]
},
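{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of that claim (a minimal sketch using the arrays just defined), the sample means should come out near 0.5 for $u$ and near 0.05, 0.10, and 0.15 for the potential outcomes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(u.mean())  # should be ~0.5\n",
"print(X[['v0', 'v1', 'v2']].mean())  # should be ~0.05, ~0.10, ~0.15"
]
},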
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can look at this data, where each row corresponds to one user in the general internet population, I. Many of these users will never even visit the website."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ai</th>\n",
" <th>v0</th>\n",
" <th>v1</th>\n",
" <th>v2</th>\n",
" <th>v_i+1</th>\n",
" <th>vi</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ai v0 v1 v2 v_i+1 vi\n",
"0 0 0 0 1 0 0\n",
"1 1 0 0 0 0 1\n",
"2 0 0 0 0 0 0\n",
"3 0 0 0 0 0 0\n",
"4 0 0 1 0 0 0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see the `ai` column, which tells us whether the user has visited the site, and if they did, what their treatment assignment was (test or control). The `v0`, `v1`, and `v2` columns show us the experiment outcome each user would have if they were assigned to the 0, 1, or 2 treatment state (i.e. not visiting the site, visiting and being assigned to control, or visiting and being assigned to the test group). Finally, the `v_i+1` column shows the measured experiment outcome for each user. It's equal to the `v0`, `v1`, or `v2` column, depending on whether the user is in the `ai` = 0, 1, or 2 state. We could never measure `v0`, `v1`, or `v2` in a real experiment: we can only measure `v_i+1`. If we could measure the other v's, we could easily calculate the effect of the experiment in the general population, I:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.0496558"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(X['v2'] - X['v1']).mean()"
]
},
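{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before moving on, a quick consistency check (a sketch over the columns defined above): the measured outcome `v_i+1` should always equal whichever potential outcome the realized assignment `ai` picks out."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# v_i+1 is v0, v1, or v2, according to the realized assignment\n",
"realized = np.select([X['ai'] == 0, X['ai'] == 1, X['ai'] == 2],\n",
"                     [X['v0'], X['v1'], X['v2']])\n",
"assert (X['v_i+1'] == realized).all()"
]
},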
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So we see that if we take the expected difference of $V(2)$ and $V(1)$, as we just did, we get the average treatment effect from the experiment!\n",
"\n",
"Again, you can never do that in a real data set: you can only ever measure $V(a)$ for a single value of $a$ on each unit. This is the \"fundamental problem of causal inference\". Since $a$ and the $V(a)$ are all influenced by $u$, we'll generally get biased estimates of the true causal effect, i.e. $E[V(2)] - E[V(1)] \neq E[V|A=2] - E[V|A=1]$. This is because we're not implementing complete randomized control over $a$: we only randomize between the two values $a=1$ and $a=2$, while $u$ determines whether a user gets assigned at all. This dependence of treatment assignment on $u$ means we don't have a good experiment; we're effectively working with observational data for the treatment assignment, and we'd have to apply the back-door criterion if we wanted unbiased effect estimates!\n",
"\n",
"So what $\mathit{are}$ we measuring? It turns out to be a weird conditional treatment effect: $E[V_{i+1} | A_i=2, V_i=1] - E[V_{i+1} | A_i=1, V_i=1]$, since $V_{i+1} \not\perp V_i$. That is, it's the treatment effect on the people who visited the site, and so were assigned into the experiment:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.058286099226687843"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.query(\"vi == 1\").query(\"ai == 2\").mean()['v_i+1'] - X.query(\"vi == 1\").query(\"ai == 1\").mean()['v_i+1']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see that this is what we're measuring, we should compare it with the treatment effect we'd estimate from the randomized experiment, where we just take the difference in average outcomes for the test and control groups:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.058286099226687843"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.query(\"ai == 2\").mean()['v_i+1'] - X.query(\"ai == 1\").mean()['v_i+1']\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These are equal because $a_i = 1$ or $a_i = 2$ implies $v_i = 1$, so `X.query(\"vi == 1\").query(\"ai == 2\")` selects exactly the same rows as `X.query(\"ai == 2\")`!"
]
},
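{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can confirm that directly (a quick sketch): conditioning on `vi == 1` is redundant once we condition on the assignment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"assert X.query(\"vi == 1\").query(\"ai == 1\").equals(X.query(\"ai == 1\"))\n",
"assert X.query(\"vi == 1\").query(\"ai == 2\").equals(X.query(\"ai == 2\"))"
]
},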
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get a correct randomized controlled experiment, you'd need complete control over whether people visit your website in the first place. You can't really do that, but we can at least simulate it and convince ourselves that bad treatment assignment really is the problem. We can use the same data-generating process, but now randomize assignment:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"N = 10000000\n",
"\n",
"u = 1. / (1. + np.exp(-np.random.normal(size=N)))\n",
"vi = np.random.binomial(1, u)\n",
"a = np.random.choice([0, 1, 2], size=N)  # randomized, instead of vi*(1+np.random.binomial(1, 0.5, size=N))\n",
"\n",
"v0 = np.random.binomial(1, u*0.1)\n",
"v1 = np.random.binomial(1, u*0.2)\n",
"v2 = np.random.binomial(1, u*0.3)\n",
"vi1 = (a==0)*v0 + (a==1)*v1 + (a==2)*v2\n",
"X = pd.DataFrame({'vi': vi, 'ai': a, 'v_i+1': vi1, 'v0': v0, 'v1': v1, 'v2': v2})"
]
},
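{
"cell_type": "markdown",
"metadata": {},
"source": [
"With randomized assignment, $u$ should be balanced across the three arms, so each arm's average $u$ should be close to the population average of 0.5 (a quick sketch check):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# u no longer differs across the assignment groups\n",
"for arm in [0, 1, 2]:\n",
"    print(arm, u[a == arm].mean())"
]
},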
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The conditional effect is the same as before:\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.058236256701293121"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.query(\"vi == 1\").query(\"ai == 2\").mean()['v_i+1'] - X.query(\"vi == 1\").query(\"ai == 1\").mean()['v_i+1']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But now the naive estimate is an unbiased estimate for the ATE, 0.05.\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.049544686770245788"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.query(\"ai == 2\").mean()['v_i+1'] - X.query(\"ai == 1\").mean()['v_i+1']\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And just to show the ATE is still the same, we can calculate it directly,\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.049833799999999998"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(X['v2'] - X['v1']).mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"which is just as it was before. In summary, the conditional treatment effect might not be what we're really interested in, but it's what we measure in an AB test. The people entering the experiment are different from the general internet population (here, they have systematically higher $u$ than the general population)."
]
},
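{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make that selection concrete, here's a quick sketch comparing the average propensity $u$ in the whole population to the average among visitors. Since $v_i \sim \mathrm{Bernoulli}(u)$, the visitors skew toward high $u$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(\"population mean u:\", u.mean())\n",
"print(\"visitor mean u:\", u[vi == 1].mean())  # higher: visiting selects for high u"
]
},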
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}