Skip to content

Instantly share code, notes, and snippets.

@mindyng
Last active February 17, 2017 04:10
Show Gist options
  • Save mindyng/35b418e11ff480815841080bbb7cf71d to your computer and use it in GitHub Desktop.
Save mindyng/35b418e11ff480815841080bbb7cf71d to your computer and use it in GitHub Desktop.
Mode Analytics Case Study: Validating A B Test Results
Validating A/B Test Results for Yammer
Possible Causes to Increased Messages in Treatment Group
1. Metric may need to be redefined
2. Poor calculations
3. Users were not random, which would make test set-up faulty by being bias
4. Confounding factor that is hard to detect, but having effect(s) on test results
1. Metric needs to be redefined.
Was test run correctly?
Maybe not the best metric to use to explain rise in treatment group results.
Advice is to use different metrics, such as Average Login's and/or Average Days Engaged.
If both of these metrics show an increase in treatment group results over the
control group results then this means that increase cannot be explained alone by
the metric, number of messages sent.
Query for Average Login's and Average Days Engaged follow:
a. Average Login's
Refer to attached file #1 for SQL code.
b. Average Days Engaged
Refer to attached file #1 for SQL code.
2. Poor calculations.
The results could have been a consequence of poor calculations. There are a wide variety of ways to run an A/B test. There are different opinions on how to conduct the test, such as how to define test-sample size, whether or not a one-tailed or two-tailed test are to be used, what assumptions are to be made about distributions. However, in the end, these do not affect test results.
Also, how the users were grouped could be reconsidered. New and old users were grouped together, which could cause a faulty measurement. If messages are tallied, a new user would always have less messages than an old user. Therefore, separating new and old users would be advised. When this happens, novelty effect could be measured. For example, if a new feature is introduced to an old user, would this cause increased engagement?
3. Users treated incorrectly.
Isolating new users to treatment group caused a problem. This made engagement results extremely low. Instead, there could have been mixing of new users with old users for the treatment group. This could have caused higher measurements in treatment group.
Treatments by Month Activated:
Refer to attached file #2 for SQL code.
When new user data was taken out and existing user data was kept to calulate average messages sent, the difference between control and treatment groups was significantly less than test results gathered from mixed groups used in control and treatment groups.
4. Confounding factors.
There could be multiple problems in test results. Thus, it is advised to explore all possible reasons. Also, sometimes problems can be related. Thus, it helps in having a comprehensive perspective and full understanding of a experiment’s results to help in advising different teams when presenting results.
Other relationships to check out are users who use different devices or users who are are different user types. When separating groups based on these new labels, there could be different test results worthy to explore and make own conclusions on in order to have more informed decision.
AVERAGE LOGIN'S & AVERAGE DAYS ENGAGED
--First block of code from SELECT to FROM sets up the columns in the table. New statistics columns are included.
SELECT c.experiment,
c.experiment_group,
c.users,
c.total_treated_users,
ROUND(c.users/c.total_treated_users,4) AS treatment_percent,
c.total,
ROUND(c.average,4)::FLOAT AS average,
ROUND(c.average - c.control_average,4) AS rate_difference,
ROUND((c.average - c.control_average)/c.control_average,4) AS rate_lift,
ROUND(c.stdev,4) AS stdev,
ROUND((c.average - c.control_average) /
SQRT((c.variance/c.users) + (c.control_variance/c.control_users))
,4) AS t_stat,
(1 - COALESCE(nd.value,1))*2 AS p_value
FROM (
--Another block from SELECT to FROM creates more stat's columns.
SELECT *,
MAX(CASE WHEN b.experiment_group = 'control_group' THEN b.users ELSE NULL END) OVER () AS control_users,
MAX(CASE WHEN b.experiment_group = 'control_group' THEN b.average ELSE NULL END) OVER () AS control_average,
MAX(CASE WHEN b.experiment_group = 'control_group' THEN b.total ELSE NULL END) OVER () AS control_total,
MAX(CASE WHEN b.experiment_group = 'control_group' THEN b.variance ELSE NULL END) OVER () AS control_variance,
MAX(CASE WHEN b.experiment_group = 'control_group' THEN b.stdev ELSE NULL END) OVER () AS control_stdev,
SUM(b.users) OVER () AS total_treated_users
FROM (
--Another block from SELECT to FROM creates more stat's columns.
SELECT a.experiment,
a.experiment_group,
COUNT(a.user_id) AS users,
AVG(a.metric) AS average,
SUM(a.metric) AS total,
STDDEV(a.metric) AS stdev,
VARIANCE(a.metric) AS variance
FROM (
--Another block from SELECT to FROM with second use of CASE.
SELECT ex.experiment,
ex.experiment_group,
ex.occurred_at AS treatment_start,
u.user_id,
u.activated_at,
COUNT(CASE WHEN e.event_name = 'login' THEN e.user_id ELSE NULL END) AS metric
FROM (SELECT user_id,
experiment,
experiment_group,
occurred_at
FROM tutorial.yammer_experiments
WHERE experiment = 'publisher_update'
) ex
--Here, many tables with common columns are connected together in order to condense data.
JOIN tutorial.yammer_users u
ON u.user_id = ex.user_id
JOIN tutorial.yammer_events e
ON e.user_id = ex.user_id
AND e.occurred_at >= ex.occurred_at
AND e.occurred_at < '2014-07-01'
AND e.event_type = 'engagement'
GROUP BY 1,2,3,4,5
) a
GROUP BY 1,2
) b
) c
--Here, normal distribution is included and relevcant stat's.
LEFT JOIN benn.normal_distribution nd
ON nd.score = ABS(ROUND((c.average - c.control_average)/SQRT((c.variance/c.users) + (c.control_variance/c.control_users)),3))
TREATMENTS BY ACTIVATION MONTH
--Block of code from SELECT to FROM sets up columns that isolate control group and treatment group per activation month.
SELECT DATE_TRUNC('month',u.activated_at) AS month_activated,
COUNT(CASE WHEN e.experiment_group = 'control_group' THEN u.user_id ELSE NULL END) AS control_users,
COUNT(CASE WHEN e.experiment_group = 'test_group' THEN u.user_id ELSE NULL END) AS test_users
FROM tutorial.yammer_experiments e
--Combining tables using user information.
JOIN tutorial.yammer_users u
ON u.user_id = e.user_id
GROUP BY 1
ORDER BY 1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment