from __future__ import division
from random import betavariate
import math
import sys

def sample(s1, f1, s2, f2):
    # One posterior draw per variant; Beta(s + 1, f + 1) is the
    # posterior under a uniform prior
    p1 = betavariate(s1 + 1, f1 + 1)
    p2 = betavariate(s2 + 1, f2 + 1)
    return 1 if p1 > p2 else 0

def odds(p):
    return p / (1 - p)

def main():
    if len(sys.argv) != 6:
        sys.exit("Usage: %s n_iter succ1 fail1 succ2 fail2" % sys.argv[0])
    total, s1, f1, s2, f2 = map(int, sys.argv[1:])
    count = sum(sample(s1, f1, s2, f2) for k in range(total))
    o = odds(count / total)
    b = 10 * math.log(o, 10)
    print(o, "to 1 or", b, "dB")

if __name__ == "__main__":
    main()
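As a quick sanity check of what the script computes, here is a sketch that calls the same sampler directly from Python; the counts below are made-up example data, not from the gist:

```python
from random import betavariate, seed

def sample(s1, f1, s2, f2):
    # One posterior draw per variant; Beta(s + 1, f + 1) is the
    # posterior under a uniform prior
    p1 = betavariate(s1 + 1, f1 + 1)
    p2 = betavariate(s2 + 1, f2 + 1)
    return 1 if p1 > p2 else 0

seed(42)  # fixed seed so the estimate is reproducible
total = 10000
# Hypothetical counts: variant 1 converts 60/200, variant 2 converts 45/200
count = sum(sample(60, 140, 45, 155) for _ in range(total))
prob = count / total  # Monte Carlo estimate of P(p1 > p2)
print(prob)
```

With the stronger variant converting 30% vs. 22.5%, the estimated probability comes out well above one half.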
See also http://www.johndcook.com/UTMDABTR-005-05.pdf for permutation tricks.
Any pointers on how to adapt this for more than two variants, i.e. A/B/C/D testing?
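One straightforward extension (my own sketch, not from the gist): draw one posterior sample per variant and tally how often each variant produces the largest draw; the tallies estimate P(variant i is best). All counts below are hypothetical:

```python
from random import betavariate, seed

def sample_best(successes, failures):
    # One Beta posterior draw per variant; return the index of the winner
    draws = [betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

seed(0)
total = 10000
succ = [10, 20, 30, 25]  # hypothetical A/B/C/D success counts
fail = [90, 80, 70, 75]  # hypothetical A/B/C/D failure counts
wins = [0] * len(succ)
for _ in range(total):
    wins[sample_best(succ, fail)] += 1
probs = [w / total for w in wins]  # estimated P(variant i is best)
print(probs)
```

The closed-form route gets messy beyond two variants, which is why the Monte Carlo version is the usual suggestion here.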
Thanks for your real-code implementations with explanations :)
Don't you think about moving your great article https://web.archive.org/web/20130331072458/http://arsatiki.posterous.com/bayesian-ab-testing-with-theory-and-code to GitHub too?
Also, I would like to ask:
How do you calculate the type II error (false-negative rate)?
How do you calculate the sample size for a given confidence level?
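Both quantities can be estimated by simulation rather than in closed form. A rough sketch (the true rates, decision threshold, and trial counts below are all illustrative assumptions, not from the article):

```python
from random import betavariate, random, seed

def prob_1_beats_2(s1, f1, s2, f2, draws=200):
    # Monte Carlo estimate of P(p1 > p2) under Beta posteriors
    wins = sum(betavariate(s1 + 1, f1 + 1) > betavariate(s2 + 1, f2 + 1)
               for _ in range(draws))
    return wins / draws

seed(1)
p1_true, p2_true = 0.12, 0.10  # assumed true conversion rates
n, threshold, trials = 1000, 0.95, 100
misses = 0
for _ in range(trials):
    # Simulate one experiment of n visitors per variant
    s1 = sum(random() < p1_true for _ in range(n))
    s2 = sum(random() < p2_true for _ in range(n))
    if prob_1_beats_2(s1, n - s1, s2, n - s2) < threshold:
        misses += 1  # variant 1 really is better, but we failed to call it
false_negative_rate = misses / trials
print(false_negative_rate)
```

Increasing n until the simulated false-negative rate drops below your target gives a crude sample-size estimate.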
Thank you very much for this code!
Adding to @lrasinen's argumentation: working directly with factorials can lead to rounding issues (and, eventually, NaNs), so it is better to switch to natural logarithms and exponentials.
As an example, try s1=s2=1 and f1=f2=10000: the original code returns NaN, while the logarithm version gives the right answer.
By the way, there is a typo:
def g0(a, b, c): return exp(lgamma(a + b) + lgamma(a + c) / (lgamma(a + b + c) + lgamma(a)))
should be:
def g0(a, b, c):
    return np.exp(lgamma(a + b) + lgamma(a + c) - (lgamma(a + b + c) + lgamma(a)))
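A quick way to check the corrected formula (a sketch using math.exp instead of np.exp, which is equivalent for scalar inputs): compare it against the direct gamma-function version on small inputs, where the naive form is still finite, and then feed it the large counts from the example above:

```python
from math import lgamma, exp, gamma

def g0(a, b, c):
    # Corrected: the whole log-denominator is subtracted, not divided
    return exp(lgamma(a + b) + lgamma(a + c) - (lgamma(a + b + c) + lgamma(a)))

def g0_naive(a, b, c):
    # Direct gamma version; blows up once the arguments get large
    return gamma(a + b) * gamma(a + c) / (gamma(a + b + c) * gamma(a))

# Small inputs: the two versions agree
print(abs(g0(2, 3, 4) - g0_naive(2, 3, 4)) < 1e-12)

# s = 1, f = 10000 (i.e. a = 2, b = 10001): the log-gamma version stays
# finite, while math.gamma(10003) simply overflows
print(g0(2, 10001, 2))
try:
    g0_naive(2, 10001, 2)
except OverflowError:
    print("naive version overflowed")
```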
Moreover, for a huge performance improvement (20x when the parameters are of the order of 10, 1300x when they are of the order of 100), Numba comes to the rescue.
Finally, this is the modified version:
from math import lgamma
from numba import jit
import numpy as np

@jit
def h(a, b, c, d):
    num = lgamma(a + c) + lgamma(b + d) + lgamma(a + b) + lgamma(c + d)
    den = lgamma(a) + lgamma(b) + lgamma(c) + lgamma(d) + lgamma(a + b + c + d)
    return np.exp(num - den)

@jit
def g0(a, b, c):
    return np.exp(lgamma(a + b) + lgamma(a + c) - (lgamma(a + b + c) + lgamma(a)))

@jit
def hiter(a, b, c, d):
    while d > 1:
        d -= 1
        yield h(a, b, c, d) / d
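The snippet above stops at the helper generator and doesn't show how the pieces are combined. Following John D. Cook's recurrence (linked earlier), the probability that Beta(a, b) beats Beta(c, d) for integer d should be the d = 1 base case plus the summed correction terms. A plain-Python sketch of that combination (no Numba, math.exp in place of np.exp; my reconstruction, not the original author's code):

```python
from math import lgamma, exp

def h(a, b, c, d):
    # B(a + c, b + d) / (B(a, b) * B(c, d)), computed in log space
    num = lgamma(a + c) + lgamma(b + d) + lgamma(a + b) + lgamma(c + d)
    den = lgamma(a) + lgamma(b) + lgamma(c) + lgamma(d) + lgamma(a + b + c + d)
    return exp(num - den)

def g0(a, b, c):
    # Base case: P(X > Y) for X ~ Beta(a, b), Y ~ Beta(c, 1)
    return exp(lgamma(a + b) + lgamma(a + c) - (lgamma(a + b + c) + lgamma(a)))

def hiter(a, b, c, d):
    # Correction terms h(a, b, c, d') / d' for d' = d - 1 down to 1
    while d > 1:
        d -= 1
        yield h(a, b, c, d) / d

def g(a, b, c, d):
    # P(X > Y) for X ~ Beta(a, b), Y ~ Beta(c, d), integer d >= 1
    return g0(a, b, c) + sum(hiter(a, b, c, d))

# Symmetric case: identical posteriors, so the probability is exactly 1/2
print(g(2, 2, 2, 2))  # ≈ 0.5
```

A further check: for X ~ Beta(3, 1) and Y ~ Beta(1, 3), direct integration gives P(X > Y) = 0.95, and g(3, 1, 1, 3) matches.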
For those of you who are interested in all three parts, I restored them from the Web Archive and converted them to Markdown. Enjoy!
https://gist.github.com/lucidyan/89eee0db8ce353d91e6bfbc51b5dcf19
Also Evan Miller wrote another great article about it:
https://www.evanmiller.org/bayesian-ab-testing.html
math.factorial() is quite inefficient; working with logarithms makes things somewhat faster.
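The point is easy to check: math.lgamma(n + 1) equals log(n!) directly in floating point, so the factorial's huge intermediate integer never has to be built. A small sketch:

```python
from math import lgamma, factorial, log

n = 500
# log(n!) via the big-integer route vs. via lgamma(n + 1)
via_factorial = log(factorial(n))  # builds an integer over a thousand digits long
via_lgamma = lgamma(n + 1)         # stays in float the whole time
print(abs(via_factorial - via_lgamma) < 1e-6)
```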