dododas/birthdayMC.R

## birthdayMC.md

      
    Raw
  

              birthdayMC.md
            
          
    Monte Carlo simulation for a non-standard birthday problem

This documents describes how to use a Monte Carlo simulation to compute the probability that in a random sample of 50 individuals, 4 have a common birthday and all others have distinct birthdays (i.e 46 unique birthdays + 4 common birthdays). This is a variant of the birthday problem.

Download the R script or read on for explanations.

Strategy

A single simulation (trial) of the problem consists of these steps:

Draw a random sample of 50 integers between 1 and 365 with replacement. These represent the birthdays of the 50 individuals
Compute the frequency of each value
Check if the most frequent value occurs 4 times and the rest occur only once

To estimate the probability, generate many such trials and compute the fraction of trials that satisfy the condition. This fraction is a Monte Carlo estimate of the probability
R code

This function generates a random sample of 50 birthdays and returns TRUE if the sample satisfies the condition.
bdayOneTrial = function(){
    # Simulate a single trial
    bdays = sample.int(365, size=50, replace=T)
    counts = as.data.frame(table(bdays))
    sorted.counts = sort(counts$Freq, decreasing=T)
    success = sorted.counts[1]==4 && sorted.counts[2]==1
    return(success)
}
This function simulates multiple trials and computes the fraction of trials that satisfy the condition
bdayMC = function(n.trials){
    # Simulate multiple trials of the birthday problem and compute probability of success
    outcomes = replicate(n.trials, bdayOneTrial())
    return(sum(outcomes)/n.trials)
}
Usage:

Estimate the probability from 10,000 MC trials
bdayMC(10000)
Estimate the probability using 1000, 10,000 and 100,000 MC trials
trial.size = c(1000, 10000, 100000)
sapply(trial.size, bdayMC)
To compare the simulation results with theory, compute the probability from first principles (using logs to avoid overflow errors)
logP = lchoose(50, 4) + lfactorial(364) - lfactorial(364-46) - 49*log(365)
exp(logP)
The answer is 0.0002141902.

  
## birthdayMC.R
# Monte Carlo simulation to compute the probability that in a random sample of
# 50 individuals, 4 share a birthday and the other 46 have distinct birthdays

bdayOneTrial = function(){
    # Simulate a single trial
    bdays = sample.int(365, size=50, replace=T)
    counts = as.data.frame(table(bdays))
    sorted.counts = sort(counts$Freq, decreasing=T)
    success = sorted.counts[1]==4 && sorted.counts[2]==1
    return(success)
}

bdayMC = function(n.trials){
    # Simulate multiple trials of the birthday problem and compute probability
    # of success
    outcomes = replicate(n.trials, bdayOneTrial())
    return(sum(outcomes)/n.trials)
}

# Estimate the probability from 10,000 MC trials
bdayMC(10000)

# Estimate probability from MC simulations with increasing number of trials
trial.size = c(1000, 10000, 100000)
sapply(trial.size, bdayMC)

# Compute the theoretical probability
logP = lchoose(50, 4) + lfactorial(364) - lfactorial(364-46) - 49*log(365)
exp(logP)
	# Monte Carlo simulation to compute the probability that in a random sample of
	# 50 individuals, 4 share a birthday and the other 46 have distinct birthdays

	bdayOneTrial = function(){
	# Simulate a single trial
	bdays = sample.int(365, size=50, replace=T)
	counts = as.data.frame(table(bdays))
	sorted.counts = sort(counts$Freq, decreasing=T)
	success = sorted.counts[1]==4 && sorted.counts[2]==1
	return(success)
	}

	bdayMC = function(n.trials){
	# Simulate multiple trials of the birthday problem and compute probability
	# of success
	outcomes = replicate(n.trials, bdayOneTrial())
	return(sum(outcomes)/n.trials)
	}

	# Estimate the probability from 10,000 MC trials
	bdayMC(10000)

	# Estimate probability from MC simulations with increasing number of trials
	trial.size = c(1000, 10000, 100000)
	sapply(trial.size, bdayMC)

	# Compute the theoretical probability
	logP = lchoose(50, 4) + lfactorial(364) - lfactorial(364-46) - 49*log(365)
	exp(logP)