Skip to content

Instantly share code, notes, and snippets.

@arsatiki
Created November 26, 2011 09:16
Show Gist options
  • Save arsatiki/1395348 to your computer and use it in GitHub Desktop.
Save arsatiki/1395348 to your computer and use it in GitHub Desktop.
A/B test result computation. Simulation and exact versions.
from __future__ import division
from random import betavariate
import math
import sys
def sample(s1, f1, s2, f2):
p1 = betavariate(s1 + 1, f1 + 1)
p2 = betavariate(s2 + 1, f2 + 1)
return 1 if p1 > p2 else 0
def odds(p):
return p / (1 - p)
def main():
if len(sys.argv) != 6:
sys.exit("Usage: %s n_iter succ1 fail2 succ2 fail2" % sys.argv[0])
total, s1, f1, s2, f2 = map(int, sys.argv[1:])
count = sum(sample(s1, f1, s2, f2) for k in range(total))
o = odds(count / total)
b = 10 * math.log(o, 10)
print odds(count / total), "to 1 or", b, "dB"
if __name__ == "__main__":
main()
@lrasinen
Copy link

math.factorial() is quite inefficient; working with logarithms makes things somewhat faster.

 from math import lgamma, exp

def h(a, b, c, d):
    lnum = lgamma(a + c) + lgamma(b + d) + lgamma(a + b) + lgamma(c + d)
    lden = lgamma(a) + lgamma(b) + lgamma(c) + lgamma(d) + lgamma(a + b + c + d)

    return exp(lnum - lden)

def g0(a, b, c):
    return exp(lgamma(a + b) + lgamma(a + c) / (lgamma(a + b + c) + lgamma(a)))

@arsatiki
Copy link
Author

See also http://www.johndcook.com/UTMDABTR-005-05.pdf for permutation tricks.

@chris-hailstorm
Copy link

Any pointers on how to adapt for >2 comparisons? I.e. A-B-C-D testing.

@vsmelov
Copy link

vsmelov commented Jun 25, 2018

Thnx for your real-code implementations with explanations )
Dont you think to move your great article https://web.archive.org/web/20130331072458/http://arsatiki.posterous.com/bayesian-ab-testing-with-theory-and-code to github too?

Also, I would like to ask:
How to calculate 2-type error? (false-negative)
How to calculate sample size for given confidence?

@vlavorini
Copy link

Thank you very much for this code!

Adding to @lrasinen argumentation: working directly with the factorial can lead to rounding (and, finally, NaNs) issues, and so better to pass to natural logarithms and exponentials.
As example: try with s1=s2=1 and f1=f2=10000, the original code will return a NaN, while with logarithm it gives the right answer.

By the way, there is a typo:

def g0(a, b, c): return exp(lgamma(a + b) + lgamma(a + c) / (lgamma(a + b + c) + lgamma(a)))

should be:

def g0(a, b, c):    
    return np.exp(lgamma(a + b) + lgamma(a + c) - (lgamma(a + b + c) + lgamma(a)))

More, in order to have a huge perfomance improvement (20x when the parameters are of the order of 10, 1300x when the parameters are of the order of 100) Numba comes to help.

Finally, this is the modified version:

from math import lgamma
from numba import jit



@jit
def h(a, b, c, d):
    num = lgamma(a + c) + lgamma(b + d) + lgamma(a + b) + lgamma(c + d)
    den = lgamma(a) + lgamma(b) + lgamma(c) + lgamma(d) + lgamma(a + b + c + d)

    return np.exp(num - den)
@jit
def g0(a, b, c):    
    return np.exp(lgamma(a + b) + lgamma(a + c) - (lgamma(a + b + c) + lgamma(a)))

@jit
def hiter(a, b, c, d):
    while d > 1:
        d -= 1
        yield h(a, b, c, d) / d

@lucidyan
Copy link

lucidyan commented Feb 4, 2020

For those of you who interested in all three parts, I restored them from the webarchive and convert to markdown. Enjoy!
https://gist.github.com/lucidyan/89eee0db8ce353d91e6bfbc51b5dcf19

Also Evan Miller wrote another great article about it:
https://www.evanmiller.org/bayesian-ab-testing.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment