@ruescasd
Created January 2, 2016 23:17
Sparked by a recent political event, a lot of debate and controversy has arisen in the Spanish blogosphere
around a seemingly simple question of probability:
What is the probability that a Yes/No election with 3030 voters results in a tie?
Before suggesting answers, let me make it clear that the main controversy has occurred
when trying to answer this question in its barest form, without any additional
information besides its simplest formulation above, _plus_ a binomial model for
voter choices.
To make it doubly clear, this is all the information that defines the problem:
1) There are 3030 voters that can vote Yes or No.
2) The number of Yes and No votes follows a binomial distribution.
We know that a binomial model has two parameters: the number of trials n, and the probability
of success p for each trial.
X ~ Bin(n, p)
In this notation our question is answered by this piecewise function
P(Tie) =
P(X = n/2) if n is even
0 otherwise
All we need to do now is plug in the parameters and we're done. We have been
given n as 3030 in our problem definition. But wait a minute, what about p? The problem definition
states that the successes follow a binomial distribution, but we know nothing about p!
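To make the dependence on p concrete, here is a minimal Python sketch of the piecewise function above. The name tie_probability is hypothetical, exact rational arithmetic is used only to sidestep floating-point underflow at n = 3030, and the trial values of p are arbitrary illustrations that are not part of the problem definition.

```python
from fractions import Fraction
from math import comb


def tie_probability(n: int, p: Fraction) -> Fraction:
    """P(Tie) for X ~ Bin(n, p): P(X = n/2) if n is even, 0 otherwise."""
    if n % 2 != 0:
        return Fraction(0)
    half = n // 2
    # Binomial pmf at k = n/2: C(n, k) * p^k * (1 - p)^(n - k), with n - k = k.
    return comb(n, half) * p**half * (1 - p) ** half


# The answer cannot be computed until some value of p is supplied,
# and it changes a great deal with that value (arbitrary examples):
for p in (Fraction(1, 2), Fraction(11, 20), Fraction(3, 5)):
    print(f"p = {float(p):.2f}  ->  P(Tie) = {float(tie_probability(3030, p)):.3e}")
```

Without a value for p, or some way to express what we know about it, the function simply cannot be evaluated.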
In order to create intuition, let me pose two related questions.
What is the probability of getting 5 heads when tossing a fair coin using a binomial model?
What is the probability of getting 5 heads when tossing a coin where the only thing we know about the coin
is that it can land heads or tails, using a binomial model?
In the first question we have been given information about the coin we are tossing, which we input into the binomial model.
In the second question, we know nothing about the coin, and therefore nothing about the second parameter p.
This is precisely the case with our election problem: we have no information on p. So how do we
proceed? In both cases the answer is the same: we must construct a version of the binomial that
allows us to represent this state of information. The beta-binomial probability distribution
comes to the rescue. From Wikipedia:
"In probability theory and statistics, the beta-binomial distribution is a family of discrete
probability distributions on a finite support of non-negative integers arising when the probability
of success in each of a fixed or known number of Bernoulli trials is either unknown or random."
I hope something rang a bell when you saw the word "unknown" above: this is exactly our situation. What
we do, therefore, is construct a non-informative prior over p that represents our lack of information
about said parameter. In the beta-binomial distribution this prior takes the form of a Beta distribution,
and the usual choice of non-informative prior is Beta(1, 1), with alpha = beta = 1. You can see
how this choice of prior makes sense in that it is uniform, favoring no value of p:
<image of B(1,1)>
Having represented our state of knowledge about p as the choice of prior Beta(1, 1), and given that
the parameter n is 3030, we now have all the ingredients to calculate things in a way that is
consistent with our problem definition. We do this by using the probability mass function of the beta-binomial:
<image of pmf>
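For reference, the standard form of the beta-binomial probability mass function, which is presumably what the image above showed, is given below. Here B denotes the beta function and k is the number of successes (Yes votes).

```latex
P(X = k \mid n, \alpha, \beta)
  = \binom{n}{k} \, \frac{B(k + \alpha,\; n - k + \beta)}{B(\alpha, \beta)},
  \qquad k = 0, 1, \dots, n
```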

ruescasd commented Jan 2, 2016

We therefore want (given that 3030 is even)

P(X = 1515) =

(3030 choose 1515) * Beta[1515 + 1, 1515 +1] / Beta[1, 1]

= 1/3031
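The value follows from the factorial form of the beta function at positive integers, B(a, b) = (a-1)! (b-1)! / (a+b-1)!. A short derivation, writing B for the beta function (Beta[·, ·] in the expression above) and keeping n and k general:

```latex
\begin{aligned}
P(X = k) &= \binom{n}{k} \, \frac{B(k + 1,\; n - k + 1)}{B(1, 1)}
          = \frac{n!}{k!\,(n-k)!} \cdot \frac{k!\,(n-k)!}{(n+1)!} \cdot \frac{1}{1} \\
         &= \frac{1}{n + 1},
\qquad\text{so with } n = 3030:\quad P(X = 1515) = \frac{1}{3031}.
\end{aligned}
```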

Does that fraction seem funny? That value is precisely one divided by the total number of possible election results. You can see this by considering that results can be numbered from Yes 0 - 3030 No all the way up to Yes 3030 - 0 No, which gives 3031 results. In fact, using a Beta(1, 1) prior, all these results are given the same probability, 1/3031. This should come as no surprise: as I've said repeatedly, the problem definition is such that we know nothing about the election, meaning that we have no way to favour one result over another.
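The claim that every result gets the same probability can be checked exactly. The following is a sketch using Python's exact rational arithmetic. The helper names beta_fn and beta_binomial_pmf are hypothetical, and the factorial form of the beta function used here only holds for positive integer arguments.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a: int, b: int) -> Fraction:
    """Beta function at positive integers: (a-1)! (b-1)! / (a+b-1)!."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def beta_binomial_pmf(k: int, n: int, alpha: int, beta: int) -> Fraction:
    """P(X = k) for the beta-binomial with integer alpha and beta."""
    return comb(n, k) * beta_fn(k + alpha, n - k + beta) / beta_fn(alpha, beta)


n = 3030
probabilities = [beta_binomial_pmf(k, n, 1, 1) for k in range(n + 1)]

# Every one of the 3031 possible results gets the same probability,
# and together they sum to exactly 1.
assert all(prob == Fraction(1, n + 1) for prob in probabilities)
assert sum(probabilities) == 1
print(probabilities[n // 2])  # the tie, k = 1515
```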

The p = 0.5 mistake

In spite of all of the above, most of the people who analyzed our problem got another result: not 1/3031, but 0.0145. How did they get there? Well, it seems that those who went this route did not know about the beta-binomial distribution and the beta prior that allows us to represent ignorance about p. Without these tools they made an unwarranted assumption: that the lack of information about p is equivalent to 100% certainty that p is 0.5. The source of that confusion is that the probability of heads for a single toss of an unknown coin "happens" to be exactly the same as that for a coin which we know with 100% certainty to be fair. Let me restate that:

P(heads for a coin toss we know nothing about) = 1 / 2
P(heads for a coin we know 100% to be fair) = 1 / 2

Because the value is shared, people conclude that a series of tosses of a coin we know nothing about can be treated the same way as a series of tosses of a coin we know for sure to be fair! Unfortunately, the above coincidence does not reappear here:

P(n successes for a coin we know nothing about) != P(n successes for a coin we know 100% to be fair)
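A worked example with just two tosses, assuming the uniform Beta(1, 1) prior for the unknown coin, shows both the coincidence for a single toss and how it breaks for a series:

```latex
\begin{aligned}
\text{one toss:}\quad & P(\text{heads} \mid \text{unknown coin}) = \int_0^1 p \, dp = \tfrac{1}{2}
  = P(\text{heads} \mid \text{fair coin}) \\
\text{two tosses:}\quad & P(\text{2 heads} \mid \text{unknown coin}) = \int_0^1 p^2 \, dp = \tfrac{1}{3}
  \;\neq\; \tfrac{1}{4} = P(\text{2 heads} \mid \text{fair coin})
\end{aligned}
```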

To convince you that setting p=0.5 corresponds to complete certainty, let's plot a few beta priors that show how our
non-informative prior Beta(1, 1) progressively approaches the p=0.5 assumption as we reduce its uncertainty.

<plot Beta(10,10)>
<P(tie) = ?>

<plot Beta(100,100)>
<P(tie) = ?>

<plot Beta(1000,1000)>
<P(tie) = ?>
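The P(tie) values left open above can be computed with a short script. This is only a sketch: the function names are hypothetical, and log-gamma is used as one way to avoid overflow in the large binomial coefficient. It prints the tie probability under each symmetric prior alongside the value for p fixed at exactly 0.5.

```python
from math import exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_tie(n: int, a: float) -> float:
    """P(X = n/2) under a symmetric Beta(a, a) prior on p (n assumed even)."""
    k = n // 2
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + a) - log_beta(a, a))


def binomial_tie(n: int, p: float) -> float:
    """P(X = n/2) for X ~ Bin(n, p) with p known exactly (n assumed even)."""
    k = n // 2
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + k * log(p) + (n - k) * log(1 - p))


n = 3030
for a in (1, 10, 100, 1000):
    print(f"Beta({a}, {a}): P(tie) = {beta_binomial_tie(n, a):.6f}")
print(f"p = 0.5 exactly: P(tie) = {binomial_tie(n, 0.5):.6f}")
```

As the prior narrows around 0.5, the printed values should climb from 1/3031 towards the p = 0.5 figure without reaching it.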

Prior knowledge
