{{ message }}

Instantly share code, notes, and snippets.

# dododas/birthdayMC.R

Last active Mar 9, 2017
Monte Carlo simulation for a non-standard birthday problem

## Monte Carlo simulation for a non-standard birthday problem

This documents describes how to use a Monte Carlo simulation to compute the probability that in a random sample of 50 individuals, 4 have a common birthday and all others have distinct birthdays (i.e 46 unique birthdays + 4 common birthdays). This is a variant of the birthday problem.

### Strategy

A single simulation (trial) of the problem consists of these steps:

1. Draw a random sample of 50 integers between 1 and 365 with replacement. These represent the birthdays of the 50 individuals
2. Compute the frequency of each value
3. Check if the most frequent value occurs 4 times and the rest occur only once

To estimate the probability, generate many such trials and compute the fraction of trials that satisfy the condition. This fraction is a Monte Carlo estimate of the probability

### R code

This function generates a random sample of 50 birthdays and returns `TRUE` if the sample satisfies the condition.

```bdayOneTrial = function(){
# Simulate a single trial
bdays = sample.int(365, size=50, replace=T)
counts = as.data.frame(table(bdays))
sorted.counts = sort(counts\$Freq, decreasing=T)
success = sorted.counts==4 && sorted.counts==1
return(success)
}```

This function simulates multiple trials and computes the fraction of trials that satisfy the condition

```bdayMC = function(n.trials){
# Simulate multiple trials of the birthday problem and compute probability of success
outcomes = replicate(n.trials, bdayOneTrial())
return(sum(outcomes)/n.trials)
}```

#### Usage:

Estimate the probability from 10,000 MC trials

`bdayMC(10000)`

Estimate the probability using 1000, 10,000 and 100,000 MC trials

```trial.size = c(1000, 10000, 100000)
sapply(trial.size, bdayMC)```

To compare the simulation results with theory, compute the probability from first principles (using logs to avoid overflow errors)

```logP = lchoose(50, 4) + lfactorial(364) - lfactorial(364-46) - 49*log(365)
exp(logP)```

The answer is `0.0002141902`.

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
 # Monte Carlo simulation to compute the probability that in a random sample of # 50 individuals, 4 share a birthday and the other 46 have distinct birthdays bdayOneTrial = function(){ # Simulate a single trial bdays = sample.int(365, size=50, replace=T) counts = as.data.frame(table(bdays)) sorted.counts = sort(counts\$Freq, decreasing=T) success = sorted.counts==4 && sorted.counts==1 return(success) } bdayMC = function(n.trials){ # Simulate multiple trials of the birthday problem and compute probability # of success outcomes = replicate(n.trials, bdayOneTrial()) return(sum(outcomes)/n.trials) } # Estimate the probability from 10,000 MC trials bdayMC(10000) # Estimate probability from MC simulations with increasing number of trials trial.size = c(1000, 10000, 100000) sapply(trial.size, bdayMC) # Compute the theoretical probability logP = lchoose(50, 4) + lfactorial(364) - lfactorial(364-46) - 49*log(365) exp(logP)