@DeepakRavi
Created October 16, 2016 23:44
Spanish A-B Test
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Spanish Translation A/B Test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Company XYZ is a worldwide e-commerce site with localized versions of the site. It was observed that Spain-based users have\n",
"a much higher conversion rate than users in any other Spanish-speaking country. One possible reason is poor translation:\n",
"all Spanish-speaking countries were served the same translation as the Spain-based site, written by a Spaniard. It was therefore agreed to run an A/B test with two versions of the site. In the test version, the text was written by a local translator from the user's own country; the control version kept the original translation written by the Spaniard.\n",
"\n",
"After the test had run for five days, the results turned out to be negative: the local translation appeared to convert worse than the original translation.\n",
"\n",
"The following analysis investigates whether the test result was actually negative and, if so, the possible reasons for it."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from pandas import DataFrame, Series \n",
"from matplotlib import pyplot as plt \n",
"import matplotlib.ticker as ticker\n",
"import scipy as sc\n",
"from scipy import stats \n",
"import sklearn\n",
"from sklearn.tree import DecisionTreeClassifier, export_graphviz\n",
"from sklearn.metrics import classification_report\n",
"from sklearn.cross_validation import train_test_split\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.grid_search import GridSearchCV\n",
"from sklearn.preprocessing import LabelEncoder\n",
"from sklearn.preprocessing import Imputer\n",
"from StringIO import StringIO\n",
"from inspect import getmembers"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reading in the data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"test_table = pd.read_csv('test_table.csv')\n",
"user_table = pd.read_csv('user_table.csv')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>date</th>\n",
" <th>source</th>\n",
" <th>device</th>\n",
" <th>browser_language</th>\n",
" <th>ads_channel</th>\n",
" <th>browser</th>\n",
" <th>conversion</th>\n",
" <th>test</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>315281</td>\n",
" <td>2015-12-03</td>\n",
" <td>Direct</td>\n",
" <td>Web</td>\n",
" <td>ES</td>\n",
" <td>NaN</td>\n",
" <td>IE</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>497851</td>\n",
" <td>2015-12-04</td>\n",
" <td>Ads</td>\n",
" <td>Web</td>\n",
" <td>ES</td>\n",
" <td>Google</td>\n",
" <td>IE</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>848402</td>\n",
" <td>2015-12-04</td>\n",
" <td>Ads</td>\n",
" <td>Web</td>\n",
" <td>ES</td>\n",
" <td>Facebook</td>\n",
" <td>Chrome</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>290051</td>\n",
" <td>2015-12-03</td>\n",
" <td>Ads</td>\n",
" <td>Mobile</td>\n",
" <td>Other</td>\n",
" <td>Facebook</td>\n",
" <td>Android_App</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>548435</td>\n",
" <td>2015-11-30</td>\n",
" <td>Ads</td>\n",
" <td>Web</td>\n",
" <td>ES</td>\n",
" <td>Google</td>\n",
" <td>FireFox</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id date source device browser_language ads_channel \\\n",
"0 315281 2015-12-03 Direct Web ES NaN \n",
"1 497851 2015-12-04 Ads Web ES Google \n",
"2 848402 2015-12-04 Ads Web ES Facebook \n",
"3 290051 2015-12-03 Ads Mobile Other Facebook \n",
"4 548435 2015-11-30 Ads Web ES Google \n",
"\n",
" browser conversion test \n",
"0 IE 1 0 \n",
"1 IE 0 1 \n",
"2 Chrome 0 0 \n",
"3 Android_App 0 1 \n",
"4 FireFox 0 1 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test_table.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>sex</th>\n",
" <th>age</th>\n",
" <th>country</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>765821</td>\n",
" <td>M</td>\n",
" <td>20</td>\n",
" <td>Mexico</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>343561</td>\n",
" <td>F</td>\n",
" <td>27</td>\n",
" <td>Nicaragua</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>118744</td>\n",
" <td>M</td>\n",
" <td>23</td>\n",
" <td>Colombia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>987753</td>\n",
" <td>F</td>\n",
" <td>27</td>\n",
" <td>Venezuela</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>554597</td>\n",
" <td>F</td>\n",
" <td>20</td>\n",
" <td>Spain</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id sex age country\n",
"0 765821 M 20 Mexico\n",
"1 343561 F 27 Nicaragua\n",
"2 118744 M 23 Colombia\n",
"3 987753 F 27 Venezuela\n",
"4 554597 F 20 Spain"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_table.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Checking that user_id is unique in each table before merging the two datasets"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(test_table) == len(test_table['user_id'].unique())"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(user_table) == len(user_table['user_id'].unique())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Comparing the number of rows in the two tables"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(453321, 9)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test_table.shape"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(452867, 4)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_table.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The user_table has fewer rows than the test_table, so some user IDs in the test table have no matching record in the user table.\n",
"Therefore, when we perform the join, we shouldn't lose the user IDs in the test table that are not in the user table.\n",
"\n",
"Either a left join or an outer join would keep them. The outer join is used here."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>date</th>\n",
" <th>source</th>\n",
" <th>device</th>\n",
" <th>browser_language</th>\n",
" <th>ads_channel</th>\n",
" <th>browser</th>\n",
" <th>conversion</th>\n",
" <th>test</th>\n",
" <th>sex</th>\n",
" <th>age</th>\n",
" <th>country</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>315281</td>\n",
" <td>2015-12-03</td>\n",
" <td>Direct</td>\n",
" <td>Web</td>\n",
" <td>ES</td>\n",
" <td>NaN</td>\n",
" <td>IE</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>M</td>\n",
" <td>32.0</td>\n",
" <td>Spain</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>497851</td>\n",
" <td>2015-12-04</td>\n",
" <td>Ads</td>\n",
" <td>Web</td>\n",
" <td>ES</td>\n",
" <td>Google</td>\n",
" <td>IE</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>M</td>\n",
" <td>21.0</td>\n",
" <td>Mexico</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>848402</td>\n",
" <td>2015-12-04</td>\n",
" <td>Ads</td>\n",
" <td>Web</td>\n",
" <td>ES</td>\n",
" <td>Facebook</td>\n",
" <td>Chrome</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>M</td>\n",
" <td>34.0</td>\n",
" <td>Spain</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>290051</td>\n",
" <td>2015-12-03</td>\n",
" <td>Ads</td>\n",
" <td>Mobile</td>\n",
" <td>Other</td>\n",
" <td>Facebook</td>\n",
" <td>Android_App</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>F</td>\n",
" <td>22.0</td>\n",
" <td>Mexico</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>548435</td>\n",
" <td>2015-11-30</td>\n",
" <td>Ads</td>\n",
" <td>Web</td>\n",
" <td>ES</td>\n",
" <td>Google</td>\n",
" <td>FireFox</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>M</td>\n",
" <td>19.0</td>\n",
" <td>Mexico</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id date source device browser_language ads_channel \\\n",
"0 315281 2015-12-03 Direct Web ES NaN \n",
"1 497851 2015-12-04 Ads Web ES Google \n",
"2 848402 2015-12-04 Ads Web ES Facebook \n",
"3 290051 2015-12-03 Ads Mobile Other Facebook \n",
"4 548435 2015-11-30 Ads Web ES Google \n",
"\n",
" browser conversion test sex age country \n",
"0 IE 1 0 M 32.0 Spain \n",
"1 IE 0 1 M 21.0 Mexico \n",
"2 Chrome 0 0 M 34.0 Spain \n",
"3 Android_App 0 1 F 22.0 Mexico \n",
"4 FireFox 0 1 M 19.0 Mexico "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = pd.merge(test_table, user_table, on = 'user_id', how = 'outer')\n",
"data.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(453321, 12)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.shape"
]
},
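{
"cell_type": "markdown",
"metadata": {},
"source": [
"The merged table has the same number of rows as test_table, so no test user was dropped. A minimal sketch of how the unmatched rows could be counted, assuming `test_table` and `user_table` as loaded above; the name `merged_check` is illustrative and not part of the original analysis. With `indicator=True`, pandas adds a `_merge` column marking rows found only in the left table. Given the row counts above, 454 such rows would be expected."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Sketch only: left join with an audit column; `merged_check` is an illustrative name.\n",
"merged_check = pd.merge(test_table, user_table, on='user_id', how='left', indicator=True)\n",
"\n",
"# 'left_only' rows are test users with no matching row in user_table;\n",
"# their sex, age and country will be NaN after the merge.\n",
"print(merged_check['_merge'].value_counts())"
]
},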
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Summarizing the data. Getting the basic descriptive statistics."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>conversion</th>\n",
" <th>test</th>\n",
" <th>age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>453321.000000</td>\n",
" <td>453321.000000</td>\n",
" <td>453321.000000</td>\n",
" <td>452867.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>499937.514728</td>\n",
" <td>0.049579</td>\n",
" <td>0.476446</td>\n",
" <td>27.130740</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>288665.193436</td>\n",
" <td>0.217073</td>\n",
" <td>0.499445</td>\n",
" <td>6.776678</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>18.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>249816.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>22.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>500019.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>26.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>749522.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>31.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>1000000.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>70.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id conversion test age\n",
"count 453321.000000 453321.000000 453321.000000 452867.000000\n",
"mean 499937.514728 0.049579 0.476446 27.130740\n",
"std 288665.193436 0.217073 0.499445 6.776678\n",
"min 1.000000 0.000000 0.000000 18.000000\n",
"25% 249816.000000 0.000000 0.000000 22.000000\n",
"50% 500019.000000 0.000000 0.000000 26.000000\n",
"75% 749522.000000 0.000000 1.000000 31.000000\n",
"max 1000000.000000 1.000000 1.000000 70.000000"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"#### Some insights from the data so far\n",
"\n",
"1. The average conversion rate is roughly 5% (mean of `conversion` = 0.0496). This is fairly normal and is considered in line with the industry standard.\n",
"2. About 48% of users are in the test group and 52% in the control group.\n",
"3. This is a fairly young user base, with a mean age of 27 years. Also, 75% of users are 31 or younger."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Checking whether Spain actually converts better than the other Spanish-speaking countries"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>conversion</th>\n",
" </tr>\n",
" <tr>\n",
" <th>country</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Argentina</th>\n",
" <td>0.013994</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bolivia</th>\n",
" <td>0.048634</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Chile</th>\n",
" <td>0.049704</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Colombia</th>\n",
" <td>0.051332</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Costa Rica</th>\n",
" <td>0.053494</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Ecuador</th>\n",
" <td>0.049072</td>\n",
" </tr>\n",
" <tr>\n",
" <th>El Salvador</th>\n",
" <td>0.050765</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Guatemala</th>\n",
" <td>0.049653</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Honduras</th>\n",
" <td>0.049253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mexico</th>\n",
" <td>0.050341</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Nicaragua</th>\n",
" <td>0.053399</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Panama</th>\n",
" <td>0.048089</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Paraguay</th>\n",
" <td>0.048863</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Peru</th>\n",
" <td>0.050258</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Spain</th>\n",
" <td>0.079719</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Uruguay</th>\n",
" <td>0.012821</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Venezuela</th>\n",
" <td>0.049666</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" conversion\n",
"country \n",
"Argentina 0.013994\n",
"Bolivia 0.048634\n",
"Chile 0.049704\n",
"Colombia 0.051332\n",
"Costa Rica 0.053494\n",
"Ecuador 0.049072\n",
"El Salvador 0.050765\n",
"Guatemala 0.049653\n",
"Honduras 0.049253\n",
"Mexico 0.050341\n",
"Nicaragua 0.053399\n",
"Panama 0.048089\n",
"Paraguay 0.048863\n",
"Peru 0.050258\n",
"Spain 0.079719\n",
"Uruguay 0.012821\n",
"Venezuela 0.049666"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby('country')[['conversion']].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"From the above results, it is evident that Spain has a conversion rate of about 8.0%, whereas most other countries convert at roughly 5%, and Argentina and Uruguay at only about 1.3-1.4%. Spain therefore does have the best conversion rate."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a comparison of conversion in the test and control groups. The control group (5.5%) did noticeably better than the test group (4.3%)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>conversion</th>\n",
" </tr>\n",
" <tr>\n",
" <th>test</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.055179</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.043425</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" conversion\n",
"test \n",
"0 0.055179\n",
"1 0.043425"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby('test')[['conversion']].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is the comparison of the test and control groups with Spain excluded"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>date</th>\n",
" <th>source</th>\n",
" <th>device</th>\n",
" <th>browser_language</th>\n",
" <th>ads_channel</th>\n",
" <th>browser</th>\n",
" <th>conversion</th>\n",
" <th>test</th>\n",
" <th>sex</th>\n",
" <th>age</th>\n",
" <th>country</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>497851</td>\n",
" <td>2015-12-04</td>\n",
" <td>Ads</td>\n",
" <td>Web</td>\n",
" <td>ES</td>\n",
" <td>Google</td>\n",
" <td>IE</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>M</td>\n",
" <td>21.0</td>\n",
" <td>Mexico</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>290051</td>\n",
" <td>2015-12-03</td>\n",
" <td>Ads</td>\n",
" <td>Mobile</td>\n",
" <td>Other</td>\n",
" <td>Facebook</td>\n",
" <td>Android_App</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>F</td>\n",
" <td>22.0</td>\n",
" <td>Mexico</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>548435</td>\n",
" <td>2015-11-30</td>\n",
" <td>Ads</td>\n",
" <td>Web</td>\n",
" <td>ES</td>\n",
" <td>Google</td>\n",
" <td>FireFox</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>M</td>\n",
" <td>19.0</td>\n",
" <td>Mexico</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>540675</td>\n",
" <td>2015-12-03</td>\n",
" <td>Direct</td>\n",
" <td>Mobile</td>\n",
" <td>ES</td>\n",
" <td>NaN</td>\n",
" <td>Android_App</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>F</td>\n",
" <td>22.0</td>\n",
" <td>Venezuela</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>863394</td>\n",
" <td>2015-12-04</td>\n",
" <td>SEO</td>\n",
" <td>Mobile</td>\n",
" <td>Other</td>\n",
" <td>NaN</td>\n",
" <td>Android_App</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>M</td>\n",
" <td>35.0</td>\n",
" <td>Mexico</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id date source device browser_language ads_channel \\\n",
"1 497851 2015-12-04 Ads Web ES Google \n",
"3 290051 2015-12-03 Ads Mobile Other Facebook \n",
"4 548435 2015-11-30 Ads Web ES Google \n",
"5 540675 2015-12-03 Direct Mobile ES NaN \n",
"6 863394 2015-12-04 SEO Mobile Other NaN \n",
"\n",
" browser conversion test sex age country \n",
"1 IE 0 1 M 21.0 Mexico \n",
"3 Android_App 0 1 F 22.0 Mexico \n",
"4 FireFox 0 1 M 19.0 Mexico \n",
"5 Android_App 0 1 F 22.0 Venezuela \n",
"6 Android_App 0 0 M 35.0 Mexico "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop Spain and take an explicit copy, so later column assignments do not touch a view of `data`\n",
"data_new = data[data['country'] != 'Spain'].copy()\n",
"data_new.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>conversion</th>\n",
" </tr>\n",
" <tr>\n",
" <th>test</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.048330</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.043425</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" conversion\n",
"test \n",
"0 0.048330\n",
"1 0.043425"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_new.groupby('test')[['conversion']].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"#### Some quick insights\n",
"\n",
"1. With Spain excluded, the control and test groups perform much more similarly (4.8% vs. 4.3%), with the control group still slightly ahead.\n",
"\n",
"2. The test group's conversion rate is unchanged (4.34%) after dropping Spain, which suggests that the Spanish users were in the control group. As their conversion rate was higher, they pulled the control mean up; removing them narrows the gap between the two groups.\n",
"\n",
"3. For countries other than Spain, at first glance it doesn't seem to matter much whether the translation is by a local or by a Spaniard. The results are roughly the same."
]
},
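{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch for checking how users from each country were split between the two groups, assuming `data` as merged above. The names `allocation` and `test_share` are illustrative and not part of the original analysis. In a properly randomized test, the share of users assigned to the test group should be roughly the same in every country."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Users per country in the control (0) and test (1) groups.\n",
"allocation = pd.crosstab(data['country'], data['test'])\n",
"\n",
"# Share of each country's users that were assigned to the test group.\n",
"allocation['test_share'] = allocation[1] / (allocation[0] + allocation[1])\n",
"print(allocation.sort_values('test_share'))"
]
},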
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Running Welch's two-sample t-test on the two groups to check whether the difference in their mean conversion rates is statistically significant"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Ttest_indResult(statistic=7.3939374121344805, pvalue=1.4282994754055316e-13)"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"zero = data_new[data_new['test'] == 0]\n",
"one = data_new[data_new['test'] == 1]\n",
"\n",
"sc.stats.ttest_ind(zero['conversion'], one['conversion'], equal_var = False, axis = 0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"#### Insights\n",
"\n",
"As the p-value is less than alpha = 0.05, we can reject the null hypothesis.\n",
"This implies that the difference between the two group means is statistically significant.\n",
"Mean of test = 4.3% and mean of control = 4.8%.\n",
"If it were real, a drop of about 10% relative to control would be a substantial effect for a translation change.\n",
"\n",
"Likely reasons for this include:\n",
"\n",
"1. The control and test groups are not truly random\n",
"2. There is not enough data for the sample to truly represent the population"
]
},
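{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to probe the first explanation is to repeat the comparison within each country. This is a sketch only, assuming `data_new` as defined above; the variable names are illustrative and not part of the original analysis. If the pooled difference is driven by an uneven split of countries between the groups, most per-country differences would be expected to be small and not significant."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from scipy import stats\n",
"\n",
"# Welch's t-test of control vs. test within each country (Spain is already excluded).\n",
"# Rows with a missing country are skipped by groupby.\n",
"for country, grp in data_new.groupby('country'):\n",
"    control = grp.loc[grp['test'] == 0, 'conversion']\n",
"    treated = grp.loc[grp['test'] == 1, 'conversion']\n",
"    t_stat, p_value = stats.ttest_ind(control, treated, equal_var=False)\n",
"    print('%-12s control=%.4f test=%.4f p=%.3f' % (country, control.mean(), treated.mean(), p_value))"
]
},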
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Converting the data to a standard format, which includes parsing the date column as datetime"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"data_new['date'] = pd.to_datetime(data_new['date'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plotting to check for any anomalies or biases"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>conversion</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date</th>\n",
" <th>test</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2015-11-30</th>\n",
" <th>0</th>\n",
" <td>0.051378</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.043886</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2015-12-01</th>\n",
" <th>0</th>\n",
" <td>0.046287</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.041387</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2015-12-02</th>\n",
" <th>0</th>\n",
" <td>0.048550</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.044234</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2015-12-03</th>\n",
" <th>0</th>\n",
" <td>0.049284</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.043884</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2015-12-04</th>\n",
" <th>0</th>\n",
" <td>0.047043</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.043491</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" conversion\n",
"date test \n",
"2015-11-30 0 0.051378\n",
" 1 0.043886\n",
"2015-12-01 0 0.046287\n",
" 1 0.041387\n",
"2015-12-02 0 0.048550\n",
" 1 0.044234\n",
"2015-12-03 0 0.049284\n",
" 1 0.043884\n",
"2015-12-04 0 0.047043\n",
" 1 0.043491"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"time_series = data_new.groupby(['date','test'])[['conversion']].mean()\n",
"time_series"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"date\n",
"2015-11-30 0.854179\n",
"2015-12-01 0.894141\n",
"2015-12-02 0.911090\n",
"2015-12-03 0.890439\n",
"2015-12-04 0.924486\n",
"dtype: float64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"time_series = time_series.unstack()['conversion'][1]/time_series.unstack()['conversion'][0]\n",
"time_series"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZMAAAEZCAYAAABSN8jfAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XecVNX5x/HPA4qKAjZiI2IhIthNQGLLKhaM/kRjYkCj\nYkESRbEDirLYEPOzBn9GFAWxYAHiCoiAuFYMKEVEWoIiCFggURRDfX5/nLs6WZdllpk7d2b2+369\n5sXcMveevbr7zDnPKebuiIiIZKJO0gUQEZHCp2AiIiIZUzAREZGMKZiIiEjGFExERCRjCiYiIpIx\nBRORSszsSDOblcB9PzKzY3N9X5FsUDCRWmtDf7zd/U13bxHTPdeb2Qoz+9rMFprZXWZmNbzGr8xs\nYRzlE9lUCiYiueXAge7eEGgLnAV0ruE1LLqOSN5QMBGppPI3/6gGc7WZTTezf5nZ02ZWL+X4KWY2\nNTr2ppkdUN3loxfuPhd4A9i/ijLUM7N7zexTM1tkZveY2eZmVh8YDeyaUsPZOVs/u8imUjARqVrl\nb/6/A04A9gQOAjoBmNkhwEBC7WJ74CGgzMw239gNzKwlcBQwpYrDvYDWwIHR/VoDvdx9JXASsNjd\nG7h7Q3dfWuOfTiTLFExE0nOfu3/m7v8GXgQOjvZ3Bv7q7u96MARYBbSp5lpTzGwZ8AIwwN0HVXHO\nWUAfd1/m7suAPsA52fphRLJts6QLIFIgPkt5vxLYJXrfFDjXzC6Ltg3YHNi1mmsd4u4fbeR+uwKf\npGwv2Mg1RRKlYCKSmYXAbe7etwafSaf31mJCoKrootw02gdKvkseUjOX1Hb1zGyLlFfdGn7+YeCP\nZtYawMy2NrNfm9nWGZbraaCXme1oZjsCNwJDomOfATuYWcMM7yGSNQomUtuNIjRbfRf927uKczZY\nE3D39wh5k/5mthyYC5xXzf2qq1WkHrsVeBd4H5gevb8tuuccQrCZb2bL1ZtL8oHFvTiWmbUD7iUE\nroHu3q/S8W2BR4G9Cb/QF7j7h2a2BfA6UI/QHPe8u/eJtbAiIrJJYg0mZlaH8E2tLaG9dzLQwd1n\np5xzJ7DC3W8xs+bAA+5+XHSsvruvjJoe3gIud/dJsRVYREQ2SdzNXK2Bee6+wN3XAEOB9pXOaQlM\ngO+r73uYWeNoe2V0zhaE2okSjyIieSjuYLIbobdLhUXRvlTTgd8AREnM3YEm0XYdM5sKLAXGufvk\nmMsrIiKbIB8S8HcA25nZFOBSYCqwDsDd17v7IYTgclg0YlhERPJM3ONMPiXUNCo0ifZ9z91XABdU\nbJvZR8D8Sud8bWavAu2ADyvfxMzU/CUiUkPuXqMZq6sTd81kMtDMzJpGE+N1AMpSTzCzRhXzGJlZ\nZ+A1d/8m6l/fKNq/FXA8MJsNcHe9svDq3bt34mUoppeep55nvr6yLdaaibuvM7OuwFh+6Bo8y8y6\nhMM+AGgBDDaz9cBM4MLo47tE++tEn33G3UfHWV4REdk0sU+n4u5jgOaV9j2U8v6dysej/TOAQ+Mu\nn4iIZC4fEvCSR0pKSpIuQlHR88wuPc/8FfsI+FwwMy+Gn0NEJFfMDC+gBLyIiNQCCiYiIpIxBRMR\nEcmYgomIiGRMwURERDKmYCIiIhlTMBERkYwpmIiISMYUTEREapmvv87+NRVMRERqkdWr4Ywzsn9d\nBRMRkVrCHTp3hq22yv61Y581WERE8sNNN8GcOTBhAmy9dXavrWAiIlILDBgATz8NEydC/frZv75m\nDRYRKXKjRsFFF8Ebb0CzZmFftmcNVs1ERKSIvfsudOoEL774QyCJgxLwIiJFav58OPVUeOQRaNMm\n3nspmIiIFKFly+Ckk+CGG6B9+/jvF3swMbN2ZjbbzOaaWfcqjm9rZsPNbLqZvWNmLaP9TcxsgpnN\nNLMZZnZ53GUVESkG330XaiSnnQaXXpqbe8aa
gDezOsBcoC2wGJgMdHD32Snn3AmscPdbzKw58IC7\nH2dmOwM7u/s0M9sGeA9on/rZlGsoAS8iAqxbB2eeCVtsAU88AXU2UGUotGV7WwPz3H2Bu68BhgKV\nK1wtgQkA7j4H2MPMGrv7UnefFu3/BpgF7BZzeUVECtrVV8Py5fDYYxsOJHGI+1a7AQtTthfx44Aw\nHfgNgJm1BnYHmqSeYGZ7AAcDf4+pnCIiBe+ee2D8eBgxItRMcikfEvB3ANuZ2RTgUmAqsK7iYNTE\n9TzQLaqhiIhIJc89B3ffDaNHw7bb5v7+cY8z+ZRQ06jQJNr3PXdfAVxQsW1mHwHzo/ebEQLJEHd/\noboblZaWfv++pKSEkpKSzEouIlIg3ngjJNrHjoXdd6/6nPLycsrLy2MrQ9wJ+LrAHEICfgkwCejo\n7rNSzmkErHT3NWbWGTjC3TtFxx4HvnT3qzZyHyXgRaRWmjULSkpCsv3449P/XEGNgHf3dWbWFRhL\naFIb6O6zzKxLOOwDgBbAYDNbD8wELgQwsyOAs4EZZjYVcOB6dx8TZ5lFRArF0qXw61/DnXfWLJDE\nQXNziYgUoBUrQo3k9NOhV6+afz7bNRMFExGRArNmTRiU2KRJmA3YNiEkFNo4ExERySJ3+NOfwhiS\nBx/ctEASB80aLCJSQG65BaZNg/Jy2CyP/oLnUVFERKQ6gwaF19tvwzbbJF2a/6aciYhIARg7Fs45\nB157DfbdN/PrFVTXYBERydy0afCHP8Dw4dkJJHFQAl5EJI998gmcckpIth95ZNKl2TAFExGRPPWv\nf4UFrq65Bs44I+nSVE85ExGRPLRqFZxwAvz852ECx2zToMUqKJiISDFZvx7OOissdPXMM/GsS6IE\nvIhIkevRAz79FMaNy+0CV5lQMBERySP9+0NZWRhLsuWWSZcmfQomIiJ5YsQI6NsX3nwTtt8+6dLU\njIKJiEgemDgRLr4YxoyBPfdMujQ1VyCtcSIixWvu3DCV/ODBofdWIVIwERFJ0Oefh7Ekt94aFroq\nVAomIiIJ+fbbMLr97LPhoouSLk1mNM5ERCQBa9eGpq0ddoDHHsv9uiRaHEtEpMC5w2WXhVHuDz+c\nPwtcZSL2YGJm7cxstpnNNbPuVRzf1syGm9l0M3vHzFqmHBtoZp+Z2ftxl1Mk21atCsurilTWr1/o\nvfX887D55kmXJjtiDSZmVgfoD5wI7Ad0NLPKEyhfD0x194OA84D7U449Fn1WpGCsWwcDB4bunfvt\nB++8k3SJJJ88+WSYAXj0aGjYMOnSZE/cNZPWwDx3X+Dua4ChQPtK57QEJgC4+xxgDzNrHG2/Cfwr\n5jKKZM2ECaFr56BB8MILcPvtcNppcOONsHp10qWTpE2YAFddFQLJrrsmXZrsijuY7AYsTNleFO1L\nNR34DYCZtQZ2B5rEXC6RrJo7F9q3Dz1yevWC11+HVq3gt78NCxtNmwZt2sDMmUmXVJIyYwZ06BAm\nbtxvv6RLk335MAL+DuA+M5sCzACmAutqepHS0tLv35eUlFBSUpKl4ols2PLlcPPNoeniuuvg2Wdh\niy3++5yddw5zLQ0cCCUl0LMnXHFF4UzgJ5lbtAhOPhnuvz/8P5CE8vJyysvLY7t+rF2DzawNUOru\n7aLtHoC7e79qPvMRcIC7fxNtNwVedPcDq/mMugZLTq1ZA//3f3DbbaH20acPNG688c/Nnw/nnQd1\n64amsD32iLukkrSvvoKjjgrrt197bdKl+UGhdQ2eDDQzs6ZmVg/oAJSlnmBmjcxs8+h9Z+C1ikBS\ncUr0Ekmce6hl7L8/vPQSvPpqCCrpBBKAvfaC8vLwLbVVK3j00XBNKU6rV4cVEo8+OqyWWMxiH7Ro\nZu2A+wiBa6C732FmXQg1lAFR7WUwsB6YCVzo7l9Fn30KKAF2AD4Derv7Y1XcQzUTid306SF5unQp\n3HUXtGuX
2fVmzAjfVps2hQEDYKedslNOyQ/ucO65sGIFDBsWaqP5RCstVkHBROK0dGlIqr/4IvTu\nHWZ23SxL2cbVq0MT2cCBobvo6adn57qSvBtuCL23XnkF6tdPujQ/VmjNXCIF67vvQtfe/feH7baD\nOXPgkkuyF0gA6tULeZcRI0IC/7zzQhu7FLaHHgqdMcrK8jOQxEHBRKQSd3j6adh3X5gyBf7+d/jz\nn2HbbeO75y9/GboPb701HHhg+DYrhWnkSCgtDeuSpJtLKwZq5hJJMXFiyIusWQN33x0Sp7n28stw\n4YWhl1jfvrDVVrkvg2yayZND54qRI6F166RLUz01c4nEYMEC6NgRfvc7+NOfYNKkZAIJwIknwvvv\nw2efwaGHhj9Qkv/++c8wcPWRR/I/kMRBwURqtRUr4Prrwx/tffcNeZFzz01+QOH224emtt69w3oX\nffpo0sh89uWXYYGrG2+EU09NujTJUDCRWmndujD19z77wOLFoSbQu3fIWeSTDh1g6tQwWeThh8Ps\n2UmXSCr77rsQQM44I9RqayvlTKTWGT8+5EW23Rbuuacw1tx2Dz2EevUK334vuyz52pOELyW/+13I\naw0ZUlj/TTTOpAoKJpKOOXPCKOQPPwy9s04/vfAWJfrHP0Iz3FZbhdX5dt896RLVXu7QrRt88EGY\nDaHynGz5Tgl4kRpatgwuvxyOPBJ+9asQTH7zm8ILJADNmsEbb8Dxx8MvfgGPP67pWJJy991hOp3h\nwwsvkMRBwUSK1urVcO+90KJFaI748MNQMyn0X/y6daFHDxg3Dv73f0Nb/RdfJF2q2uWZZ8L/W6NH\nxzv+qJAomEjRcQ8LU+2/P4wdGyZWfOCB4htAdtBBodvwz34WBjqWlW38M5K5118POatRo+CnP026\nNPlDORMpKtOmheT655+HyRhPrCWLPr/5ZpiKpaQkdCoopuVg88mHH8Ixx8BTT0HbtkmXJjPKmYhU\nYcmSMGq8XTs488wQVGpLIIGQD5o2LTSBHXQQvPZa0iUqPosXw69/HZoWCz2QxEHBRArad9/BrbfC\nAQfAjjuGHlt//GN2J2MsFA0ahKns+/eHs84K+aH//CfpUhWHFSvCNCmdO4dlA+THFEykIK1fH5bK\nbd48rDMyaRL06weNGiVdsuSdfHJ4JgsWhDE0U6YkXaLCtmZNGEvSunWYLUGqppyJFJy334YrrwwB\n5Z57QhOP/FjF7MdXXBG6RvfoUTtrbJlwD82nn38Of/tbcT0/DVqsgoJJ7fDxx9C9ewgmt98OZ59d\nWCOOk7JoEZx/fmiqefzxMIWMpKdPnzADcHl5/k21kykl4KXW+fpr6NkzNNnst1/Ii5xzjgJJupo0\nCdPan3MOHHFE6Ca9fn3Spcp/jz4agu/IkcUXSOKgmonkrXXrwnK2vXuHXlq33Qa77pp0qQrb3Llh\nOpaGDcMfyyZNki5RfhozBjp1Cr3imjdPujTxKLiaiZm1M7PZZjbXzLpXcXxbMxtuZtPN7B0za5nu\nZ6V4jRsHhxwS+vOPGhXmoVIgydw++4QxKUcfHabdf/JJTcdS2ZQpIeAOH168gSQOsdZMzKwOMBdo\nCywGJgMd3H12yjl3Aivc/RYzaw484O7HpfPZlGuoZlIkZs2Ca68NU63/+c9w2mmFOYdWIZgyJTR9\ntWwJDz4YulbXdh9/HJoC//KXMH9bMSu0mklrYJ67L3D3NcBQoH2lc1oCEwDcfQ6wh5k1TvOzUiS+\n/DJMUXH00XDssWGkcSHO6ltIDj0U3nsvzDx80EGhBlibLV8eBiV27178gSQOcQeT3YCFKduLon2p\npgO/ATCz1sDuQJM0PysFbvXqMPtqixZhe9asMB1KvXrJlqu22HLLMO3Mk09C167QpQt8803Spcq9\n//wn1IJPOil0o5aay4de03cA95nZFGAGMBVYV9OLlJaWfv++pKSEkpKSLB
VP4uAe+u1fd11ox3/9\n9R8CiuReSUkY6HjllaGWMnhw7Rm/s359mNdsl11C02qxKi8vp7y8PLbrx50zaQOUunu7aLsH4O7e\nr5rPfAQcAOyf7meVMyksU6eG2seXX4ZvxSeckHSJJFVZWZiS5pxz4OabC3/K/o255powg8LYsaGm\nVlvkbNCimVXbaujuwzd6cbO6wBxCEn0JMAno6O6zUs5pBKx09zVm1hk4wt07pfPZlGsomBSAxYvh\nhhvCqnR9+oSRxcU0oriYfPFFaPL6xz/CcrQHHZR0ieJx//2h88Fbb8H22yddmtzKdjCp7lf5f6o5\n5sBGg4m7rzOzrsBYQn5moLvPMrMu4bAPAFoAg81sPTATuLC6z6bzQ0l+Wbky1EDuvTdMlDd3rqZI\nz3eNG8OwYSGQHH98qElee22YlbhYDB8Od94ZukrXtkASBw1alNisXx/GiVx/Pfzyl3DHHbDnnkmX\nSmrqk0/CAL5Vq0IupVmzpEuUubffhvbtw8wAhx6adGmSkfOuwWbWyMzuNrN3o9ddUdOUyAa9+Sa0\naRP66w8dGpY5VSApTLvvDuPHh3Vi2rSBv/61sAc6zpkTuv4OGVJ7A0kcNlozMbNhwAfA4GjXOcBB\n7p43PbFVM8kfH30U+um/8w707QsdO2oOrWIya1ZIzDduHKa6KbRZCT77DA4/POTuLrgg6dIkK4lB\ni3u7e293nx+9+gB7ZasAUhy++ioEkVatwnrks2drVt9i1KIFTJwIhx0Wprt55pmkS5S+b7+FU04J\nwbC2B5I4pPOr/p2Zfd/j3MyOAL6Lr0hSSNauDc0ezZuHHkDvvw+9ekH9+kmXTOKy+eZQWhpm0+3d\nO6zquHx50qWq3tq18PvfhxU5e/dOujTFKZ1g8kfgATP72Mw+BvoDXWItlRSEsWPh4IPDt9OXXgqz\n0BZas4dsulatwpihn/wk1EZffjnpElXNHS69NASUhx7SFD1xqTZnEk22+Ft3f9bMGgK4+9e5Kly6\nlDPJrVmz4OqrYd68MGK4fXv9gtZ2EyaEBbhOPjn8P5FP63/cfjs891yYZaFBg6RLkz9ymjNx9/XA\nddH7r/MxkEjufPllmL/p6KPD2IOZMzWrrwTHHhuaOL/9NtRWJ05MukTBkCEwYACMHq1AErd0mrnG\nm9k1ZvZTM9u+4hV7ySRvrFoVBh22aBES6rNnhzmcNBmjpGrUKIxD6dcvzPh8ww1hIs+kjB8fpkoZ\nPTrMuyXxSqdr8EdV7HZ3z5seXWrmioc7jBgRJmNs0SI0X+y7b9KlkkKwdClcfHEY8DhkSEh859L7\n78Nxx8Hzz4eatPxYzubmSrnhlu7+n43tS5KCSfa9916YQuNf/wpTxB93XNIlkkLjHlbI7N49fCG5\n6qrcTMeycGFY4OrPfw49uKRqSYwzeTvNfVIEPv00TJ1xyinwhz+E3joKJLIpzMJ4jkmTQjfiY44J\ng1rj9O9/hwWuunVTIMm1DQYTM9vZzH4ObGVmh5jZodGrBNAogiL06qthdthddw1TTnTuXFwT+0ky\n9twz/L/Vvj20bg2PPBLPdCyrV4dpUo45JtSCJLeqm4L+PKAT8Avg3ZRDK4BB6UxBnytq5src0qVh\nnqLBg0NPLZE4fPABnHsu7LYbPPww7Lxzdq7rHka2r1wZugHrS9DGJZEzOcPdh2XrhnFQMMnMunVh\ngaojjgiLIYnEafVquOWWEEweeADOOCPza15/PZSXwyuvwFZbZX692iCJYLIFcAawBynrn7h73vzZ\nUTDJTJ8+4Rdx/Hh9o5PceeedUEs57LAwu/S2227adR58EO65J0wrv+OO2S1jMUsiAf8C0B5YC3yb\n8pIiMGFCmGLiqacUSCS32rQJHTwaNQrTsYwfX/NrlJWFWs5LLymQJC2dmskH7r5/jsqzSVQz2TQV\neZLHH1ePLUnW2LFhGefTTw+LqKUzUe
ikSaHX4ahRYZ4wqZlEugabWY6HHEnc1q0Ls7127qxAIsk7\n4YQw0HDZsjC1/aRJ1Z//z3+G3mGPPqpAki/SqZl8CDQDPgJWAUYYAX9g/MVLj2omNVdaGia+GzdO\nzVuSX559Fi67DLp0gRtvDFPep/rii9BZ5OqrwzmyaZJIwDetar+7L0jrBmbtgHsJtaCB7t6v0vGG\nwBPA7kBd4C53HxQd6wZcFJ36sLvfv4F7KJjUwCuvhG6UU6Zkr2umSDYtWQIXXRSaYocMgZYtw/6V\nK6Ft2zCx5G23JVvGQpfzYBLd9CDgqGjzDXefntbFwxT2c4G2wGJgMtDB3WennNMTaOjuPc1sR2AO\nsBPQHHgaaEVI/r8E/NHd51dxHwWTNFXkSYYMCb+UIvnKPQxwvP768OraNaxDv802Ic+n2aozk/Oc\nSVQ7eBL4SfR6wswuS/P6rYF57r7A3dcAQwk9w1I5UDE5dANgmbuvBVoAf3f3Ve6+DngdyJt15wtR\nRZ7k4osVSCT/mYWc3jvvwLBhsNdesGJFWHtegST/bLbxU7gQOMzdvwUws37AROAvaXx2N2BhyvYi\nQoBJ1R8oM7PFwDZAxYw6HwC3mtl2hFzNrwk1G9lEFQMSb7wx2XKI1MTee8Nrr8HQoaH3lpY+yE/p\nBBMD1qVsr4v2ZcuJwFR3P9bM9gbGmdmB7j47ClzjgG+AqZXK8V9KS0u/f19SUkJJSUkWi1j4xo8P\nI46nTFHCXQpP3bpw9tlJl6KwlZeXU15eHtv100nAXwWcB4yIdp1GmJvr3o1e3KwNUOru7aLtHoSe\nYP1SzhkJ9HX3t6LtV4Du7v5upWvdBix0979WcR/lTKqxZAn8/OfKk4jID3KeM3H3u4HzgeXR6/x0\nAklkMtDMzJqaWT2gA1BW6ZwFwHEAZrYTsA8wP9puHP27O3A68FSa95VIRZ6kSxcFEhGJz0abuaLa\nxUx3nxJtNzSzw9z97xv7rLuvM7OuwFh+6Bo8y8y6hMM+ALgVGGRm70cfu87dl0fvh0VLBK8BLtEa\n9DV3881hqd1evZIuiYgUs3SauaYCh1a0I0Xdfd9190NzUL60qJmrauPHw3nnhVUTNZ5ERFIlMZ3K\nf/2ldvf1pJe4lwQtWRJmZB0yRIFEROKXTjCZb2aXm9nm0asbUU5D8tPatdCxY8iTHHts0qURkdog\nnWDyR+Bw4FPCOJHDgIvjLJRk5uabYbPNlCcRkdypbtnejsBYd1+W2yLVnHImPxg3Djp1CuNJdtop\n6dKISL7Kds6kutzH7sBzZrY58AphbqxJ+qudvxYvDgn3J59UIBGR3EqnN1cDwjiQdoSpUGYBY4CX\n3f2z2EuYBtVMQp7kuONCjuSmm5IujYjku0RmDa5UgJbAScAJ7n5itgqSCQWTMN/WxInw8suaLkVE\nNi6J9Uxecfe2G9uXpNoeTMaOhfPPV55ERNKXs5yJmW0J1Ad2jGburbhpQ8JswJIHKvIkTz2lQCIi\nyakuAd8FuALYFXiPH4LJ14Rp4yVha9eGebcuuQSOOSbp0ohIbZZOM9dl7p7O2iWJqa3NXDfeGBYO\nGjNGeRIRqZkkplNZGvXowsx6mdlwM8ubeblqq7Fj4dFH4YknFEhEJHnpBJMb3X2FmR1J6CI8EHgw\n3mJJdT79VONJRCS/pBNMKlY3PBkY4O6jAC2cmZCKPMmll4IWkxSRfJFOMPnUzB4irM0+2sy2SPNz\nEoPSUthiC+jZM+mSiIj8IJ0EfH3C6PcZ7j7PzHYBDnD3sbkoYDpqSwL+5ZfhwgvDeJKf/CTp0ohI\nIUti2d6VwOfAkdGutcC8bBVA0vPpp2ECxyeeUCARkfyTTs2kN/ALoLm772NmuwLPufsRuShgOoq9\nZrJ2bZhz64QTNK28iGRHEl2DTwdOBb4FcPfFQINsFUA2rndv2HJL5UlEJH+lE0xWR1/7K9aA37om\nNz
CzdmY228zmmln3Ko43NLMyM5tmZjPMrFPKsSvN7AMze9/MnjSzWteL7OWXYfBgjScRkfyWTjB5\nNurNta2ZdQbGAw+nc3Ezq0OYeuVEYD+go5ntW+m0S4GZ7n4wcAxwl5ltFjWnXQYc6u4HEqZ+6ZDO\nfYtFRZ7kySeVJxGR/JZOMGkMPA8MA5oDNwFN0rx+a2Ceuy9w9zXAUKB9pXOcH5rNGgDL3H1ttF0X\n2NrMNiNMOrk4zfsWvIp13Lt2hV/9KunSiIhUL51gcry7j3P3a939GncfR1jPJB27AQtTthfx4xmH\n+wMtzWwxMB3oBt/nZu4CPiGsP/9vdx+f5n0L3k03KU8iIoWjuino/wRcAuxlZu+nHGoAvJXFMpwI\nTHX3Y81sb2CcmVU0a7UHmgJfAc+b2Vnu/lRVFyktLf3+fUlJCSUFPDx8zBh4/PEwnqSOhoeKSBaU\nl5dTXl4e2/U32DXYzBoB2wF9gR4ph1a4+/K0Lm7WBih193bRdg/A3b1fyjkjgb7u/la0/QrQHdgD\nONHdO0f7zwEOc/euVdynaLoGL1oEv/gFPPssHH100qURkWKVs8Wx3P0rQo2gYwbXnww0M7OmwBJC\nAr3y9RYQJpB8y8x2AvYB5hOa4NpEi3StAtpG1ytaFXmSyy9XIBGRwlLd4lgZc/d1ZtYVGEsIDgPd\nfZaZdQmHfQBwKzAopSntuqjmM8nMngemAmuifwfEWd6k3XQT1K8PPXps/FwRkXyy0RHwhaAYmrnG\njIGLLtK8WyKSGzlr5pLcWbQojCd59lkFEhEpTOorlLC1a6FDB+VJRKSwqZkrYT17wtSpMHq0ugGL\nSO6omauIvPRSmHNL40lEpNApmCRk0SI4/3x47jlo3Djp0oiIZEbfhxNQkSfp1g2OOirp0oiIZE45\nkwT06AHTp8OoUWreEpFkKGdS4EaPDlPKK08iIsVEwSSHFi6ECy6A559XnkREiou+G+fImjUhT3LF\nFXDkkUmXRkQku5QzyRHlSUQknyhnUoBGjVKeRESKm4JJzBYuhAsvVJ5ERIqbvifHqCJPcuWVypOI\nSHFTziRG3bvDjBkwcqSat0QkvyhnUiBGjYKnn1aeRERqBwWTGKTmSXbcMenSiIjET9+Zs2zNGvj9\n75UnEZHaRTmTLLvuOpg5E158Uc1bIpK/sp0zif3PnZm1M7PZZjbXzLpXcbyhmZWZ2TQzm2FmnaL9\n+5jZVDObEv37lZldHnd5MzFyJAwdCoMHK5CISO0Sa83EzOoAc4G2wGJgMtDB3WennNMTaOjuPc1s\nR2AOsJPziBC+AAAKd0lEQVS7r610nUXAYe6+sIr7JF4z+eQTaNUKhg+HI45ItCgiIhtVaDWT1sA8\nd1/g7muAoUD7Suc40CB63wBYlhpIIscB/6wqkOSDivEkV1+tQCIitVPcwWQ3IDUALIr2peoPtDSz\nxcB0oFsV1/k98HQsJcyCG26A7baDa65JuiQiIsnIh67BJwJT3f1YM9sbGGdmB7r7NwBmtjlwKtCj\nuouUlpZ+/76kpISSkpLYCpyqIk+i8SQiks/Ky8spLy+P7fpx50zaAKXu3i7a7gG4u/dLOWck0Nfd\n34q2XwG6u/u70fapwCUV19jAfRLJmVTkSUaMgMMPz/ntRUQ2WaHlTCYDzcysqZnVAzoAZZXOWUDI\niWBmOwH7APNTjnckD5u4KsaTXHONAomISOzjTMysHXAfIXANdPc7zKwLoYYywMx2AQYBu0Qf6evu\nT0efrU8INnu5+4pq7pHzmsm118KsWVBWpuYtESk82a6ZaNDiJnjxRejaNeRJdtghZ7cVEckaTfSY\nsAUL4KKLQp5EgUREJFADTQ2sXh3yJNdeqzyJiEgqNXPVwDXXwJw58MILypOISGFTM1dCysrguec0\nnkREpCqqmaRhwQJo3Rr+9jf45S9ju42ISM4U2jiTgpeaJ1EgERGp
mmomG3H11TB3rvIkIlJclDPJ\noRdegGHDlCcREdkYBZMN+PhjuPjikCfZfvukSyMikt/0fbsKq1eH9Umuu055EhGRdChnUoWrr4Z5\n80Izl2WtRVFEJH8oZxKz1DyJAomISHpUM0nx8cdw2GEhoLRpk3m5RETylcaZxKRiPEn37gokIiI1\npZpJ5Kqr4B//UJ5ERGoH5Uxi8MILMHy48iQiIpuq1tdMlCcRkdpIOZMsUp5ERCQ7anXN5MorYf78\nMMpdzVsiUpsUXM3EzNqZ2Wwzm2tm3as43tDMysxsmpnNMLNOKccamdlzZjbLzGaa2WHZKtff/haW\n3n3sMQUSEZFMxVozMbM6wFygLbAYmAx0cPfZKef0BBq6e08z2xGYA+zk7mvNbBDwmrs/ZmabAfXd\n/esq7lOjmslHH4U8yYsvhn9FRGqbQquZtAbmufsCd18DDAXaVzrHgQbR+wbAsiiQNASOcvfHANx9\nbVWBpKYq8iQ9eyqQiIhkS9zBZDdgYcr2omhfqv5ASzNbDEwHukX79wS+NLPHzGyKmQ0ws60yLVD3\n7rDrrnDFFZleSUREKuTDOJMTganufqyZ7Q2MM7MDCWU7FLjU3d81s3uBHkDvqi5SWlr6/fuSkhJK\nSkp+dM6IESFXovEkIlLblJeXU15eHtv1486ZtAFK3b1dtN0DcHfvl3LOSKCvu78Vbb8CdCfUaCa6\n+17R/iOB7u7+P1XcZ6M5k4o8yciRYT13EZHarNByJpOBZmbW1MzqAR2AskrnLACOAzCznYB9gPnu\n/hmw0Mz2ic5rC3y4KYWoyJNcf70CiYhIHGIfZ2Jm7YD7CIFroLvfYWZdCDWUAWa2CzAI2CX6SF93\nfzr67EHAI8DmwHzgfHf/qop7VFszueKKMNJ9xAg1b4mIQPZrJkU/aHHEiDCJ45QpsN12OS6YiEie\nUjCpwoaCifIkIiJVK7ScSWJWrYIzz4QbblAgERGJW9HWTLp1g08+CVPLK08iIvLftJ5JGoYPh7Iy\njScREcmVoquZzJ8fppNXnkREZMOUM6nGqlVhPInyJCIiuVVUNZPLL4dFi2DYMDVviYhURzmTDRg2\nLDRtKU8iIpJ7RVMzadzYGTUKWrVKujQiIvlPOZMN6NVLgUREJClFUzNZv97VvCUikibVTDZAgURE\nJDlFE0xERCQ5CiYiIpIxBRMREcmYgomIiGRMwURERDKmYCIiIhmLPZiYWTszm21mc82sexXHG5pZ\nmZlNM7MZZtYp5djHZjbdzKaa2aS4yyoiIpsm1mBiZnWA/sCJwH5ARzPbt9JplwIz3f1g4BjgLjOr\nmDNsPVDi7oe4u+YBzoHy8vKki1BU9DyzS88zf8VdM2kNzHP3Be6+BhgKtK90jgMNovcNgGXuvjba\nthyUUVLolzW79DyzS88zf8X9h3o3YGHK9qJoX6r+QEszWwxMB7qlHHNgnJlNNrPOsZZUREQ2WT5M\nQX8iMNXdjzWzvQnB40B3/wY4wt2XmFnjaP8sd38z2eKKiEhlsU70aGZtgFJ3bxdt9wDc3fulnDMS\n6Ovub0XbrwDd3f3dStfqDaxw97uruE/hz1YpIpJjhbQ41mSgmZk1BZYAHYCOlc5ZABwHvGVmOwH7\nAPPNrD5Qx92/MbOtgROAPlXdJJsPREREai7WYOLu68ysKzCWkJ8Z6O6zzKxLOOwDgFuBQWb2fvSx\n69x9uZntCYyIah2bAU+6+9g4yysiIpumKNYzERGRZOVlt1sza2JmE8xsZjSQ8fJo/3ZmNtbM5pjZ\ny2bWKNq/fXT+CjO7v9K1Xo0GTU41sylmtmMSP1NS9CyzS88zu/Q8syfxZ+nuefcCdgYOjt5vA8wB\n9gX6EZrBALoDd0Tv6wOHAxcD91e61qvAIUn/THqWxfHS89TzzNdX0s8yL2sm7r7U3adF778BZgFN\nCAMeB0enDQZOi85Z6e5vA6s2
cMm8/DlzQc8yu/Q8s0vPM3uSfpZ5/+DNbA/gYOAdYCd3/wzCgwN+\nkuZlBkVVtV6xFLJA6Flml55ndul5Zk8SzzKvg4mZbQM8D3SLIm3l3gLp9B44y90PAI4CjjKzP2S5\nmAVBzzK79DyzS88ze5J6lnkbTCxM9vg8MMTdX4h2f2ZhLApmtjPw+cau4+5Lon+/BZ4izBdWq+hZ\nZpeeZ3bpeWZPks8yb4MJ8Cjwobvfl7KvDOgUvT8PeKHyhwiTQ4Y3ZnXNbIfo/ebAKcAHsZQ2v+lZ\nZpeeZ3bpeWZPYs8yL8eZmNkRwOvADEKVzIHrgUnAs8BPCSPnz3T3f0ef+Ygw63A94N+EEfOfRNfZ\nDKgLjAeu8nz8oWOiZ5ldep7ZpeeZPUk/y7wMJiIiUljyuZlLREQKhIKJiIhkTMFEREQypmAiIiIZ\nUzAREZGMKZiIiEjGFExEasjM1kVzFn0QTdF9lZlVu9qnmTU1s8qrjIoUDQUTkZr71t0Pdff9geOB\nk4DeG/nMnsBZsZdMJCEKJiIZcPcvCetBdIXvayCvm9m70atNdGpf4MioRtPNzOqY2Z1m9nczm2Zm\nnZP6GUSyQSPgRWrIzL5294aV9i0HmgMrgPXuvtrMmgFPu3srM/sVcLW7nxqd3xlo7O63m1k94C3g\nt+6+ILc/jUh2bJZ0AUSKREXOpB7Q38wOBtYBP9vA+ScAB5jZ76LthtG5CiZSkBRMRDJkZnsBa939\nCzPrDSx19wPNrC7w3YY+Blzm7uNyVlCRGClnIlJzqdN1NwYeBP4S7WoELInen0uYdRVC81eDlGu8\nDFwSrT+Bmf3MzLaKs9AicVLNRKTmtjSzKYQmrTXA4+5+T3Ts/4BhZnYuMAb4Ntr/PrDezKYCg9z9\nvmhp1SlRt+LPidbmFilESsCLiEjG1MwlIiIZUzAREZGMKZiIiEjGFExERCRjCiYiIpIxBRMREcmY\ngomIiGRMwURERDL2/+gyJAnIwwnpAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0xd94ba58>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig, ax = plt.subplots(1,1)\n",
"ax.plot(time_series)\n",
"ax.set_xlabel('Date')\n",
"ax.set_ylabel('test/control')\n",
"ax.set_title('Line Plot')\n",
"\n",
"ax.xaxis.set_major_locator(ticker.MultipleLocator())"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"#### Insights\n",
"\n",
"From the above graph it can be seen that over the course of five days, control does continuously better than test.\n",
"Also the variability in the test/control variable is low. Min = 0.87 and Max = 0.93.\n",
"This implies that data collected is sufficient, however there might be a bias in the control or test group.\n",
"\n",
"Likely cause\n",
"\n",
"1. Some segment of the data that has a higher or lower conversion rate has found it's way into either test or control thus increasing/decreasing that groups' overall conversion rate.\n",
"\n",
"2. We can use decision trees to identify this. If the split between test and control is truly random, then the tree shouldn't be able to split well."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\Deepak\\Anaconda2\\lib\\site-packages\\numpy\\lib\\arraysetops.py:200: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.\n",
" flag = np.concatenate(([True], aux[1:] != aux[:-1]))\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>source</th>\n",
" <th>device</th>\n",
" <th>browser_language</th>\n",
" <th>ads_channel</th>\n",
" <th>browser</th>\n",
" <th>sex</th>\n",
" <th>age</th>\n",
" <th>country</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>497851</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>21.0</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>290051</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>22.0</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>548435</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>19.0</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>540675</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>22.0</td>\n",
" <td>16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>863394</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>35.0</td>\n",
" <td>10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id source device browser_language ads_channel browser sex age \\\n",
"1 497851 0 1 1 3 3 2 21.0 \n",
"3 290051 0 0 2 2 0 1 22.0 \n",
"4 548435 0 1 1 3 2 2 19.0 \n",
"5 540675 1 0 1 0 0 1 22.0 \n",
"6 863394 2 0 2 0 0 2 35.0 \n",
"\n",
" country \n",
"1 10 \n",
"3 10 \n",
"4 10 \n",
"5 16 \n",
"6 10 "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = data_new.copy()\n",
"lb = LabelEncoder()\n",
"X['source'] = lb.fit_transform(X['source'])\n",
"X['country'] = lb.fit_transform(X['country'])\n",
"X['device'] = lb.fit_transform(X['device'])\n",
"X['browser_language'] = lb.fit_transform(X['browser_language'])\n",
"X['ads_channel'] = lb.fit_transform(X['ads_channel'])\n",
"X['browser'] = lb.fit_transform(X['browser'])\n",
"X['sex'] = lb.fit_transform(X['sex'])\n",
"X = X.drop(['conversion','date','test'], axis = 1)\n",
"y = data_new['test']\n",
"X.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"user_id 0\n",
"source 0\n",
"device 0\n",
"browser_language 0\n",
"ads_channel 0\n",
"browser 0\n",
"sex 0\n",
"age 454\n",
"country 0\n",
"dtype: int64"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Checking for missing values\n",
"X.isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Int64Index([ 819, 1696, 1934, 2409, 2721, 5042, 7552, 7855,\n",
" 8930, 9082,\n",
" ...\n",
" 444098, 444581, 444828, 445540, 445950, 446681, 451052, 452302,\n",
" 452342, 453270],\n",
" dtype='int64', length=454)"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Index values of all NAN's\n",
"index = X['age'].index[X['age'].apply(np.isnan)]\n",
"index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below, imputing the missing values in age column with median age of the column"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>source</th>\n",
" <th>device</th>\n",
" <th>browser_language</th>\n",
" <th>ads_channel</th>\n",
" <th>browser</th>\n",
" <th>sex</th>\n",
" <th>age</th>\n",
" <th>country</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>497851.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>3.0</td>\n",
" <td>3.0</td>\n",
" <td>2.0</td>\n",
" <td>21.0</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>290051.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>2.0</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>22.0</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>548435.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>3.0</td>\n",
" <td>2.0</td>\n",
" <td>2.0</td>\n",
" <td>19.0</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>540675.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>22.0</td>\n",
" <td>16.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>863394.0</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>2.0</td>\n",
" <td>35.0</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id source device browser_language ads_channel browser sex \\\n",
"0 497851.0 0.0 1.0 1.0 3.0 3.0 2.0 \n",
"1 290051.0 0.0 0.0 2.0 2.0 0.0 1.0 \n",
"2 548435.0 0.0 1.0 1.0 3.0 2.0 2.0 \n",
"3 540675.0 1.0 0.0 1.0 0.0 0.0 1.0 \n",
"4 863394.0 2.0 0.0 2.0 0.0 0.0 2.0 \n",
"\n",
" age country \n",
"0 21.0 10.0 \n",
"1 22.0 10.0 \n",
"2 19.0 10.0 \n",
"3 22.0 16.0 \n",
"4 35.0 10.0 "
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"impute = Imputer(missing_values = 'NaN', strategy = 'median', axis = 0, copy = True)\n",
"imputed = DataFrame(impute.fit_transform(X))\n",
"imputed.columns = X.columns.values\n",
"imputed.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating an instance of the Decision tree classifier below"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"clf = DecisionTreeClassifier(criterion = 'entropy', max_depth = 2, min_samples_leaf = 2, min_samples_split = 2)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=2,\n",
" max_features=None, max_leaf_nodes=None, min_samples_leaf=2,\n",
" min_samples_split=2, min_weight_fraction_leaf=0.0,\n",
" presort=False, random_state=None, splitter='best')"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf.fit(imputed,y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Understanding the most important features from the classification"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0., 0., 0., 0., 0., 0., 0., 0., 1.])"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf.feature_importances_"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array(['user_id', 'source', 'device', 'browser_language', 'ads_channel',\n",
" 'browser', 'sex', 'age', 'country'], dtype=object)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"imputed.columns.values"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"#### Insights\n",
"\n",
"There seems to be a fair amount of bias in the way control and test groups are separated. \n",
"This is highlighted by the feature importance variable. \n",
"Country seems to be an important feature when it comes to separating test and control groups.\n",
"Therefore, the separation is not truly random"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[('country', 1.5, 1, 4),\n",
" ('country', 0.5, 2, 3),\n",
" ('age', -2.0, -1, -1),\n",
" ('age', -2.0, -1, -1),\n",
" ('country', 14.5, 5, 6),\n",
" ('age', -2.0, -1, -1),\n",
" ('age', -2.0, -1, -1)]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"zip(imputed.columns[clf.tree_.feature], clf.tree_.threshold, clf.tree_.children_left, clf.tree_.children_right)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 2, -1, -1, 5, -1, -1], dtype=int64)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf.tree_.children_left"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 10., 16., 2., 4., 15., 7., 11., 14., 5., 3., 1.,\n",
" 6., 8., 9., 13., 12., 0.])"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"imputed['country'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array(['Mexico', 'Venezuela', 'Bolivia', 'Colombia', 'Uruguay',\n",
" 'El Salvador', 'Nicaragua', 'Peru', 'Costa Rica', 'Chile',\n",
" 'Argentina', 'Ecuador', 'Guatemala', 'Honduras', 'Paraguay',\n",
" 'Panama', nan], dtype=object)"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_new['country'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mean in control</th>\n",
" <th>mean in test</th>\n",
" <th>%samples in test group</th>\n",
" <th>p_value</th>\n",
" </tr>\n",
" <tr>\n",
" <th>country</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Argentina</th>\n",
" <td>0.015071</td>\n",
" <td>0.013725</td>\n",
" <td>0.799799</td>\n",
" <td>0.335147</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bolivia</th>\n",
" <td>0.049369</td>\n",
" <td>0.047901</td>\n",
" <td>0.501079</td>\n",
" <td>0.718885</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Chile</th>\n",
" <td>0.048107</td>\n",
" <td>0.051295</td>\n",
" <td>0.500785</td>\n",
" <td>0.302848</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Colombia</th>\n",
" <td>0.052089</td>\n",
" <td>0.050571</td>\n",
" <td>0.498927</td>\n",
" <td>0.423719</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Costa Rica</th>\n",
" <td>0.052256</td>\n",
" <td>0.054738</td>\n",
" <td>0.498964</td>\n",
" <td>0.687876</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Ecuador</th>\n",
" <td>0.049154</td>\n",
" <td>0.048988</td>\n",
" <td>0.494432</td>\n",
" <td>0.961512</td>\n",
" </tr>\n",
" <tr>\n",
" <th>El Salvador</th>\n",
" <td>0.053554</td>\n",
" <td>0.047947</td>\n",
" <td>0.497492</td>\n",
" <td>0.248127</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Guatemala</th>\n",
" <td>0.050643</td>\n",
" <td>0.048647</td>\n",
" <td>0.496066</td>\n",
" <td>0.572107</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Honduras</th>\n",
" <td>0.050906</td>\n",
" <td>0.047540</td>\n",
" <td>0.491013</td>\n",
" <td>0.471463</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mexico</th>\n",
" <td>0.049495</td>\n",
" <td>0.051186</td>\n",
" <td>0.500257</td>\n",
" <td>0.165544</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Nicaragua</th>\n",
" <td>0.052647</td>\n",
" <td>0.054177</td>\n",
" <td>0.491447</td>\n",
" <td>0.780400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Panama</th>\n",
" <td>0.046796</td>\n",
" <td>0.049370</td>\n",
" <td>0.502404</td>\n",
" <td>0.705327</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Paraguay</th>\n",
" <td>0.048493</td>\n",
" <td>0.049229</td>\n",
" <td>0.503199</td>\n",
" <td>0.883697</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Peru</th>\n",
" <td>0.049914</td>\n",
" <td>0.050604</td>\n",
" <td>0.498931</td>\n",
" <td>0.771953</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Uruguay</th>\n",
" <td>0.012048</td>\n",
" <td>0.012907</td>\n",
" <td>0.899613</td>\n",
" <td>0.879764</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Venezuela</th>\n",
" <td>0.050344</td>\n",
" <td>0.048978</td>\n",
" <td>0.496194</td>\n",
" <td>0.573702</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mean in control mean in test %samples in test group p_value\n",
"country \n",
"Argentina 0.015071 0.013725 0.799799 0.335147\n",
"Bolivia 0.049369 0.047901 0.501079 0.718885\n",
"Chile 0.048107 0.051295 0.500785 0.302848\n",
"Colombia 0.052089 0.050571 0.498927 0.423719\n",
"Costa Rica 0.052256 0.054738 0.498964 0.687876\n",
"Ecuador 0.049154 0.048988 0.494432 0.961512\n",
"El Salvador 0.053554 0.047947 0.497492 0.248127\n",
"Guatemala 0.050643 0.048647 0.496066 0.572107\n",
"Honduras 0.050906 0.047540 0.491013 0.471463\n",
"Mexico 0.049495 0.051186 0.500257 0.165544\n",
"Nicaragua 0.052647 0.054177 0.491447 0.780400\n",
"Panama 0.046796 0.049370 0.502404 0.705327\n",
"Paraguay 0.048493 0.049229 0.503199 0.883697\n",
"Peru 0.049914 0.050604 0.498931 0.771953\n",
"Uruguay 0.012048 0.012907 0.899613 0.879764\n",
"Venezuela 0.050344 0.048978 0.496194 0.573702"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = data_new.groupby(['country','test'])[['conversion']].mean().unstack()\n",
"b = data_new.groupby('country')[['test']].mean()\n",
"df = pd.concat([a,b], axis = 1)\n",
"\n",
"temp1 = data_new[data_new['test'] == 0]\n",
"temp2 = data_new[data_new['test'] == 1]\n",
"\n",
"a = []; b = []; c = []; d = []\n",
"\n",
"for i, j in temp1.groupby('country')['conversion']:\n",
" a.append(i)\n",
" b.append(j)\n",
"for i, j in temp2.groupby('country')['conversion']:\n",
" c.append(i)\n",
" d.append(j)\n",
" \n",
"p_value = []\n",
"for i in np.arange(16):\n",
" p_value.append(sc.stats.ttest_ind(b[i], d[i], equal_var = False, axis = 0)[1]) \n",
" \n",
"df = pd.concat([df, DataFrame(p_value, index = a)], axis = 1)\n",
"\n",
"df.columns = ['mean in control', 'mean in test', '%samples in test group', 'p_value']\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"#### Conclusions\n",
"\n",
"As it can be seen the p-value associated with all the countries is much greater than 0.05\n",
"\n",
"This implies that we fail to reject the null hypothesis. Thus, there is no statistically significant difference in means\n",
"between test and control group in each country. \n",
"\n",
"However, as it can be seen Argentina and Uruguay have the lowest conversion rate of 1%. Also, nearly 80% of the samples from \n",
"these two countries found it's way into the test group and only 20% in the control group.This can be verified from the \n",
"third column in the above dataframe. \n",
"\n",
"Due to their small conversion rate and massive influx of samples into the test group, there was a significant difference\n",
"between the overall test conversion and control conversion rates. It was because of this the mean of the test group was much\n",
"less than the mean of the control group.\n",
"\n",
"However, it is now clear that the A/B test was insignificant. Both the test and the control group perform similarly. \n",
"It is clear that the local translation did not affect the conversion rate.\n",
"\n",
"Side Note\n",
"\n",
"1. Argentina and Uruguay have the lowest conversion rate\n",
"2. Marketing efforts can be directed in this direction to improve conversion rate in these two countries"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
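
The notebook's conclusion attributes the lower overall test conversion to country mix rather than to the translation itself. The arithmetic behind that claim can be sketched with a toy example. Everything here is an illustrative assumption: the names `segments` and `pooled_rate`, the segment sizes, and the rates are made up and are not taken from the dataset. Each segment converts at exactly the same rate in control and test, yet the pooled test rate comes out lower because the low-converting segment sends most of its users to test.

```python
# Hypothetical figures, NOT from the notebook's data set.
# Each segment has the same conversion rate in control and in test;
# only the allocation of users between the two groups differs.
segments = {
    # name: (conversion_rate, control_users, test_users)
    "high_converting": (0.05, 50000, 50000),  # 50/50 split
    "low_converting": (0.01, 10000, 40000),   # 80% of users land in test
}


def pooled_rate(segments, group):
    """Overall conversion rate of one group, pooling all segments."""
    idx = 1 if group == "control" else 2
    conversions = sum(seg[0] * seg[idx] for seg in segments.values())
    users = sum(seg[idx] for seg in segments.values())
    return conversions / float(users)


control_rate = pooled_rate(segments, "control")
test_rate = pooled_rate(segments, "test")
ratio = test_rate / control_rate

print("pooled control rate: %.4f" % control_rate)
print("pooled test rate:    %.4f" % test_rate)
print("test/control ratio:  %.4f" % ratio)
```

The pooled ratio falls below 1 even though no segment shows any treatment effect, which is why the comparison has to be made within each country (or with country-balanced groups) before drawing a conclusion about the translation.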