This documents describes how to use a Monte Carlo simulation to compute the probability that in a random sample of 50 individuals, 4 have a common birthday and all others have distinct birthdays (i.e 46 unique birthdays + 4 common birthdays). This is a variant of the birthday problem.
Download the R script or read on for explanations.
A single simulation (trial) of the problem consists of these steps:
- Draw a random sample of 50 integers between 1 and 365 with replacement. These represent the birthdays of the 50 individuals
- Compute the frequency of each value
- Check if the most frequent value occurs 4 times and the rest occur only once
To estimate the probability, generate many such trials and compute the fraction of trials that satisfy the condition. This fraction is a Monte Carlo estimate of the probability
This function generates a random sample of 50 birthdays and returns TRUE
if the sample satisfies the condition.
bdayOneTrial = function(){
# Simulate a single trial
bdays = sample.int(365, size=50, replace=T)
counts = as.data.frame(table(bdays))
sorted.counts = sort(counts$Freq, decreasing=T)
success = sorted.counts[1]==4 && sorted.counts[2]==1
return(success)
}
This function simulates multiple trials and computes the fraction of trials that satisfy the condition
bdayMC = function(n.trials){
# Simulate multiple trials of the birthday problem and compute probability of success
outcomes = replicate(n.trials, bdayOneTrial())
return(sum(outcomes)/n.trials)
}
Estimate the probability from 10,000 MC trials
bdayMC(10000)
Estimate the probability using 1000, 10,000 and 100,000 MC trials
trial.size = c(1000, 10000, 100000)
sapply(trial.size, bdayMC)
To compare the simulation results with theory, compute the probability from first principles (using logs to avoid overflow errors)
logP = lchoose(50, 4) + lfactorial(364) - lfactorial(364-46) - 49*log(365)
exp(logP)
The answer is 0.0002141902
.