@msalganik
Created December 21, 2016 02:01
improved activity for Chapter 4: activity inspired by Lewis and Rao (2015) about power analysis

[very hard, requires coding, my favorite] Imagine that you are working as a data scientist at a tech company. Someone from the marketing department asks for your help in evaluating an experiment that they are planning in order to measure the Return on Investment (ROI) for a new online ad campaign. ROI is defined to be the net profit from the campaign divided by the cost of the campaign. For example, a campaign that had no effect on sales would have an ROI of -100%; a campaign where the profits generated were equal to the costs would have an ROI of 0%; and a campaign where the profits generated were double the costs would have an ROI of 100%.
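The arithmetic behind these examples follows directly from the definition and can be checked with a tiny helper (the function `roi` is ours, for illustration only):

```python
def roi(gross_profit, cost):
    """ROI = net profit / cost, where net profit = gross profit - cost."""
    return (gross_profit - cost) / cost

print(roi(0.0, 100.0))    # no effect on sales: net profit is -cost -> -1.0 (-100%)
print(roi(100.0, 100.0))  # profits equal costs: net profit is zero ->  0.0 (0%)
print(roi(200.0, 100.0))  # profits double costs: net profit equals cost -> 1.0 (100%)
```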

Before launching the experiment, the marketing department provides you with the following information based on their earlier research (in fact, these values are typical of the real online ad campaigns reported in Lewis and Rao [-@lewis_unfavorable_2015]):

  • sales per customer follow a log-normal distribution with a mean of $7 and a standard deviation of $75.
  • the campaign is expected to increase sales by $0.35 per customer, which corresponds to an increase in profit of $0.175 per customer.
  • the planned size of the experiment is 200,000 people, half in the treatment group and half in the control group.
  • the cost of the campaign is $0.14 per participant.
  • the expected ROI for the campaign is 25% [(0.175 - 0.14)/0.14]. In other words, the marketing department believes that for each 100 dollars spent on marketing, the company will earn an additional 25 dollars in profit.
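Before writing the memo, a back-of-the-envelope power calculation using the numbers above is instructive. The sketch below is one way to do it, under assumptions that are ours, not the activity's: a two-sided z-test at alpha = 0.05 and a normal approximation for the difference in mean sales between the two arms.

```python
import math
from statistics import NormalDist

n_per_arm = 100_000   # 200,000 people, split evenly
sd = 75.0             # standard deviation of sales per customer
effect = 0.35         # expected lift in sales per customer

# Standard error of the difference in mean sales between treatment and control.
se = sd * math.sqrt(2 / n_per_arm)
z = effect / se

# Power of a two-sided z-test at alpha = 0.05.
norm = NormalDist()
crit = norm.inv_cdf(0.975)
power = (1 - norm.cdf(crit - z)) + norm.cdf(-crit - z)
print(f"SE of difference: {se:.3f}; power: {power:.2f}")
```

Under these assumptions the planned experiment detects the expected effect only a small fraction of the time, which is exactly the kind of discouraging arithmetic that Lewis and Rao's paper turns on; treat this as a starting point for the memo, not the answer.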

Write a memo evaluating this proposed experiment. Your memo should address two major issues:

  1. Would you recommend launching this experiment as planned? If so, why? If not, why not? Be sure to be clear about the criteria that you are using to make this decision.
  2. What sample size would you recommend for this experiment? Again, please be sure to be clear about the criteria that you are using to make this decision.

A good memo will address this specific case; a better memo will generalize from this case in one way (e.g., show how the decision changes as a function of the size of the effect of the campaign); and a great memo will present a fully generalized result. Your memo should use graphs to help illustrate your results.

Here are two hints. First, all the information that the marketing department provided might not be necessary, and all the necessary information might not be provided. Second, if you are using R, be aware that the rlnorm() function does not work the way that many people expect.
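On the second hint: our reading of the pitfall is that rlnorm() is parameterized by the mean and standard deviation of the underlying normal on the log scale, not by the arithmetic mean and standard deviation of the draws themselves. A sketch of the standard conversion, written in Python for concreteness (Python's random.lognormvariate() has the same log-scale convention as rlnorm()):

```python
import math
import random

def lognormal_params(m, s):
    """Convert a desired arithmetic mean m and standard deviation s of a
    log-normal variable into the log-scale parameters (mu, sigma) that
    rlnorm() / random.lognormvariate() actually expect."""
    sigma2 = math.log(1 + (s / m) ** 2)
    mu = math.log(m) - sigma2 / 2
    return mu, math.sqrt(sigma2)

mu, sigma = lognormal_params(7, 75)
random.seed(1)
draws = [random.lognormvariate(mu, sigma) for _ in range(200_000)]
mean = sum(draws) / len(draws)
print(f"mu={mu:.3f}, sigma={sigma:.3f}, sample mean={mean:.2f}")
```

In R the same conversion gives rlnorm(n, meanlog = mu, sdlog = sigma); calling rlnorm(n, 7, 75) instead produces draws with an astronomically larger mean.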

This activity will give you practice with power analysis, creating simulations, and communicating your results with words and graphs. It should help you conduct a power analysis for any kind of experiment, not just experiments designed to estimate ROI. This activity assumes that you have some experience with statistical testing and power analysis. If you are not familiar with power analysis, I recommend that you read "A Power Primer" by Jacob Cohen, which was published in Psychological Bulletin [@cohen_power_1992].

This activity was inspired by a lovely paper by @lewis_unfavorable_2015, which vividly illustrates a fundamental statistical limitation of even massive experiments. Their paper---which originally had the provocative title "On the Near-impossibility of Measuring the Returns to Advertising"---shows how difficult it is to measure the return on investment of online ads, even with digital experiments involving millions of customers. More generally, @lewis_unfavorable_2015 illustrates a fundamental statistical fact that is particularly important for digital-age experiments: it is hard to estimate a small treatment effect amid noisy outcome data.
