Skip to content

Instantly share code, notes, and snippets.

@dmackinnon1
Created March 15, 2017 17:43
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dmackinnon1/03cfa8b325509754f46dd02a8fd7f921 to your computer and use it in GitHub Desktop.
Save dmackinnon1/03cfa8b325509754f46dd02a8fd7f921 to your computer and use it in GitHub Desktop.
Exploring the Birthday Problem in R
---
title: "Birthday"
author: "Dan MacKinnon"
date: "March 15, 2017"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
##Introduction
The Birthday Problem, as frequently happens in questions of probability, is easy to state and understand, yet leads to an (initially) suprising solution.
> In a group of *n* people, how likely is it to find two or more people sharing the same birthday?
Many people are surprised to find out that in a group of 23 people, there is just over a fifty percent chance of finding a birthday match, and in a group of 60 people, you are almost certain to find two or more people sharing a birthday.
## Direct Calculation
If $B_n$ is the event that we find two or more people in a group of $n$ that share a birthday, then $\neg B_n$, is the probability that there are no birthday matches in a group of $n$ people, and the probability of this event is a bit easier to calculate.
In a group of one, $n = 1$, there is no chance to find two or more people that share a birthday. In a group of two, the probabability of not sharing a birthday is $p(\neg B_2) = \frac{364}{365}$. For a group of three, we have $p(\neg B_2) = \frac{364}{365} \times \frac{363}{365}$, and in general:
$$p(\neg B_n) = \prod_{i=1}^n \frac{366-i}{365}$$
We can set up a data frame **BCalc** in R to calculate this directly.
```{r}
group_size <- 60
group <- 1:group_size
BCalc <- data.frame(group)
BCalc$noMatch[1] = 1;
for (i in 2:group_size) {
BCalc$noMatch[i] <- BCalc$noMatch[i-1]*(366-i)/365
}
BCalc$match <- 1 - BCalc$noMatch
```
Plotting this relationship between the probability of a birthday match and the size of the group involved gives a characteristic sigmoid or logistic curve.
```{r}
plot(BCalc$group, BCalc$match,main='probability of a birthday match', xlab = 'group size', ylab='probability of a match')
lines(BCalc$group, BCalc$match)
```
This gives the somewhat surprising result that, in a group of 23 people there is over a fifity percent chance of finding two people with the same birthday.
```{r}
BCalc$match[23]
```
By the time you have sixty people together, it is almost a sure thing to find a match.
```{r}
BCalc$match[60]
```
## Setting up the Simulation
A simulation can be set up by randomly generating groups of numbers between 1 and 365, and checking to see if there are matches. For each group size between 1 and 60, we'll generate 100 groups and check for collisions, the results will be cases in the **BSim** data frame.
```{r}
group_size <- 60
trial_limit <-100
trial_index <- 1:(group_size * trial_limit)
BSim <- data.frame(trial_index)
current_index <- 1;
for (i in 1:group_size) {
for (k in 1:trial_limit) {
t <- sample(1:365, i, replace=TRUE)
BSim$groupSize[current_index] <- i
BSim$foundMatch[current_index] <- as.logical(anyDuplicated(t))
current_index <- current_index +1
}
}
```
## Exploring the Simulation
For each group size, we can find the proportion of groups that contained a match, and plot the results in comparison to the expected curve from the **BCalc** data frame.
```{r}
tbl <- table(BSim$groupSize,BSim$foundMatch)
mtrx <- as.matrix(prop.table(tbl,1)[,2])
plot(1:group_size, mtrx[,1],main='proportion of found birthday matches', xlab = 'group size', ylab='proportion with a match')
lines(BCalc$group, BCalc$match)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment