Skip to content

Instantly share code, notes, and snippets.

@dlebech
Last active August 15, 2022 20:41
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dlebech/5e2885e852bdca38d99c27a55fb6636c to your computer and use it in GitHub Desktop.
Save dlebech/5e2885e852bdca38d99c27a55fb6636c to your computer and use it in GitHub Desktop.
Binomial probability calculation function in SQL (BigQuery)
-- Public Domain CC0 license. https://creativecommons.org/publicdomain/zero/1.0/
-- Calculate the probability of k successes for n trials with probability of success k,
-- using the binomial distribution.
-- Calculate the binomial coefficient using the "multiplicative formula"
CREATE OR REPLACE FUNCTION functions.binomial_coef(n INT64, k INT64) AS ((
-- k!/(n!*(n-k)!)
-- We're going to have a hard time doing factorials here,
-- but based on the "multiplicative formula" in Wiki, it should be possible:
-- https://en.wikipedia.org/wiki/Binomial_coefficient#Computing_the_value_of_binomial_coefficients
-- Taken together with the fact that we can do EXP(SUM(LOG(x))) to do product over multiple rows, we have:
SELECT
-- Binomial coefficient is always an integer,
-- but BigQuery is not super happy about EXP(SUM(LN(x))) trick,
-- so the ROUND is added to be safe :-)
ROUND(EXP(SUM(LN((n + 1 - i) / i))))
FROM
-- Take advantage of symmetry (n // k) == (n // (n-k))
UNNEST(GENERATE_ARRAY(1, IF(k > n - k, n - k, k), 1)) i
));
-- Calculate the probability of k successes in n trials where the probability of a success is p.
-- This uses the probability density function for the binomial distribution.
-- See example below.
CREATE OR REPLACE FUNCTION functions.binomial_prob(n INT64, k INT64, p FLOAT64) AS (
functions.binomial_coef(n, k) * POW(p, k) * POW(1-p, n-k)
);
-- The example from Wikipedia: https://en.wikipedia.org/wiki/Binomial_distribution#Example
-- Probability of getting 4 heads in 6 tosses with a coin that is biased and has heads probability 0.3
-- Calculates to 0.05953499999999999
SELECT functions.binomial_prob(6, 4, 0.3);
@benman1
Copy link

benman1 commented Aug 13, 2022

Thanks for getting back. I liked the idea, btw. I tried another implementation, which also doesn't work.

My use case is a bit different from this actually - I needed a two sample test. I've implemented this in javascript (based on a Python implementation) and then added an approximation of the normal cdf to get the p-values that I found. I don't think I can share this though (corporate).

@dlebech
Copy link
Author

dlebech commented Aug 15, 2022

Aha, that makes sense @benman1 I'm happy to hear it worked out for you with a different solution 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment