Skip to content

Instantly share code, notes, and snippets.

@dododas
Last active March 9, 2017 08:14
Show Gist options
  • Save dododas/26901a0017a9c3aeebf8 to your computer and use it in GitHub Desktop.
Save dododas/26901a0017a9c3aeebf8 to your computer and use it in GitHub Desktop.
Monte Carlo simulation for a non-standard birthday problem

Monte Carlo simulation for a non-standard birthday problem

This documents describes how to use a Monte Carlo simulation to compute the probability that in a random sample of 50 individuals, 4 have a common birthday and all others have distinct birthdays (i.e 46 unique birthdays + 4 common birthdays). This is a variant of the birthday problem.


Download the R script or read on for explanations.


Strategy

A single simulation (trial) of the problem consists of these steps:

  1. Draw a random sample of 50 integers between 1 and 365 with replacement. These represent the birthdays of the 50 individuals
  2. Compute the frequency of each value
  3. Check if the most frequent value occurs 4 times and the rest occur only once

To estimate the probability, generate many such trials and compute the fraction of trials that satisfy the condition. This fraction is a Monte Carlo estimate of the probability

R code

This function generates a random sample of 50 birthdays and returns TRUE if the sample satisfies the condition.

bdayOneTrial = function(){
    # Simulate a single trial
    bdays = sample.int(365, size=50, replace=T)
    counts = as.data.frame(table(bdays))
    sorted.counts = sort(counts$Freq, decreasing=T)
    success = sorted.counts[1]==4 && sorted.counts[2]==1
    return(success)
}

This function simulates multiple trials and computes the fraction of trials that satisfy the condition

bdayMC = function(n.trials){
    # Simulate multiple trials of the birthday problem and compute probability of success
    outcomes = replicate(n.trials, bdayOneTrial())
    return(sum(outcomes)/n.trials)
}

Usage:

Estimate the probability from 10,000 MC trials

bdayMC(10000)

Estimate the probability using 1000, 10,000 and 100,000 MC trials

trial.size = c(1000, 10000, 100000)
sapply(trial.size, bdayMC)

To compare the simulation results with theory, compute the probability from first principles (using logs to avoid overflow errors)

logP = lchoose(50, 4) + lfactorial(364) - lfactorial(364-46) - 49*log(365)
exp(logP)

The answer is 0.0002141902.

# Monte Carlo simulation to compute the probability that in a random sample of
# 50 individuals, 4 share a birthday and the other 46 have distinct birthdays
bdayOneTrial = function(){
# Simulate a single trial
bdays = sample.int(365, size=50, replace=T)
counts = as.data.frame(table(bdays))
sorted.counts = sort(counts$Freq, decreasing=T)
success = sorted.counts[1]==4 && sorted.counts[2]==1
return(success)
}
bdayMC = function(n.trials){
# Simulate multiple trials of the birthday problem and compute probability
# of success
outcomes = replicate(n.trials, bdayOneTrial())
return(sum(outcomes)/n.trials)
}
# Estimate the probability from 10,000 MC trials
bdayMC(10000)
# Estimate probability from MC simulations with increasing number of trials
trial.size = c(1000, 10000, 100000)
sapply(trial.size, bdayMC)
# Compute the theoretical probability
logP = lchoose(50, 4) + lfactorial(364) - lfactorial(364-46) - 49*log(365)
exp(logP)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment