@drdozer
Last active March 1, 2017 17:53
Working with simple multiple trial probabilities
The question we're answering here is how many times we need to try something
to have some level of certainty that it will happen at least once.
For example, if a parent has children, each with independent mutations,
how many children are needed before the chance that at least one of those children carries a particular mutation exceeds 90%?
First some notation.
p(e) :: probability of some event e happening in a single trial
p(¬e) :: probability that some event e does not happen
And as either something happens or it does not,
p(e) + p(¬e) = 1
=>
p(¬e) = 1-p(e)
Now let's say we do something lots of times:
p(n of e) :: probability of n independent events e
As the events are independent, the chances of them all happening is their product.
As we are taking the product of n things, all alike, we can rewrite this as a power.
p(n of e) = p(e)^n
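To see the product-as-power step concretely, here's a small Scala sketch (the function name is just for illustration):

```scala
// Probability that an event with single-trial probability p
// happens in all n independent trials: the product of n copies of p.
def pAllHappen(p: Double, n: Int): Double =
  List.fill(n)(p).product

// The repeated product is the same as raising p to the n-th power:
pAllHappen(0.5, 3)   // = 0.125
math.pow(0.5, 3)     // = 0.125
```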
We want to know how many events we need to look at, to expect to have a 50% chance (or some other chance) of observing at least one example of e happening.
The problem with this is that it may have happened once, or twice, or more times in those trials.
This sum gets quite complicated quite quickly.
Luckily though, this is the same as asking how many events we need to look at, to have a (100% - 50% = 50%) chance of not observing any examples of e happening.
We can calculate this easily.
It's the chance that it didn't happen the first time, and didn't happen the second, and so on.
Given p(e), and a threshold c:
c = p(at least one e from n trials)
¬c = 1 - c = p(no e from n trials)
¬c = p(n of ¬e)
from independence of events:
= p(¬e)^n
We want to solve for n.
log both sides
ln(p(¬e)) * n = ln(¬c)
divide through
n = ln(¬c) / ln(p(¬e))
removing ¬ by complementary probabilities
n = ln(1 - c) / ln(1 - p(e))
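As a quick sanity check of this derivation (a Scala sketch; the variable names are just for illustration), we can solve for n and then plug it back in to recover c:

```scala
// Round-trip check: pick p(e) and a threshold c,
// solve for n, then confirm that n trials give chance c
// of observing at least one e.
val p = 1.0 / 6
val c = 0.9
val n = math.log(1 - c) / math.log(1 - p)   // ≈ 12.63
val recovered = 1 - math.pow(1 - p, n)      // chance of at least one e in n trials
// recovered ≈ 0.9, matching c
```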
Let's try it with some simple examples.
Say we have a fair coin.
So we can ask how many tosses of that coin we need to have a 50% chance that it comes up heads.
We know directly that the answer is 1 toss: a fair coin has two equally likely outcomes, and we are selecting one of them (1/2 = 50%).
However, we can also show it with this formula:
n = ln(1 - c) / ln(1 - p(e))
c = 50%, p(e) = 50%
= ln(1-50%) / ln(1-50%)
x/x = 1
= 1
So, in a long-winded way, we've validated that a fair coin needs to be tossed once to have a 50% chance of coming up heads.
How many times must you roll a die to have a 90% chance of a 6?
n = ln(1 - c) / ln(1 - p(e))
c = 90%, p(e) = 1/6
= ln(1 - 90%) / ln(1-1/6)
arithmetic
= ln(10%) / ln(5/6)
and using a calculator
~ 12.6
Since you can't roll a die a fractional number of times, round up: with 13 rolls you have just over a 90% chance of getting at least one 6, while 12 rolls gives only about 89%.
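We can check this directly in Scala (a quick sketch; the function name is just for illustration):

```scala
// Chance of at least one 6 in n rolls of a fair die: 1 - (5/6)^n
def pAtLeastOneSix(n: Int): Double = 1 - math.pow(5.0 / 6, n)

pAtLeastOneSix(12)   // ≈ 0.888, just under 90%
pAtLeastOneSix(13)   // ≈ 0.907, just over 90%
```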
Now, let's say we have a mutation that has a 1/16 chance of appearing in any child, and we want to ask how many children a parent must have
before there is a 90% chance of at least one child carrying this mutation. We work it out in exactly the same way:
n = ln(1 - c) / ln(1 - p(e))
c = 90%, p(e) = 1/16
~35.7
So with 36 children, there's a greater than 90% chance that at least one child will carry that mutation.
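Again we can verify the rounding in Scala (a quick sketch; the function name is just for illustration):

```scala
// Chance of at least one carrier among n children: 1 - (15/16)^n
def pAtLeastOneCarrier(n: Int): Double = 1 - math.pow(15.0 / 16, n)

pAtLeastOneCarrier(35)   // ≈ 0.896, just under 90%
pAtLeastOneCarrier(36)   // ≈ 0.902, just over 90%
```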
If you have Scala installed, here's Scala code for calculating this:
// c_threshold - chance threshold
// p_e - probability of event e in a single trial
// returns the (fractional) number of trials needed for the probability of e happening
// at least once to reach c_threshold
def estimateN(c_threshold: Double, p_e: Double): Double =
  math.log(1.0 - c_threshold) / math.log(1.0 - p_e)
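For example, plugging the worked examples from above into estimateN:

```scala
estimateN(0.5, 0.5)        // = 1.0    (fair coin, 50% threshold)
estimateN(0.9, 1.0 / 6)    // ≈ 12.63  (rolling a 6, 90% threshold)
estimateN(0.9, 1.0 / 16)   // ≈ 35.68  (the mutation, 90% threshold)

// Round up to get a whole number of trials:
math.ceil(estimateN(0.9, 1.0 / 6)).toInt   // = 13
```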