Skip to content

Instantly share code, notes, and snippets.

@dmackinnon1
Last active April 27, 2017 20:46
Show Gist options
  • Save dmackinnon1/398f0cac96e6b3cfad9add6e37fd872b to your computer and use it in GitHub Desktop.
Save dmackinnon1/398f0cac96e6b3cfad9add6e37fd872b to your computer and use it in GitHub Desktop.
---
title: "Binomial and Negative Binomial Simulation"
author: "Dan MacKinnon"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Coin flipping counts
This R file simulates flipping coins, observing how many heads we see and how long we have to wait until we see a head.
First we create the list of 10 coin flips:
```{r}
test <-list(sample(0:1,10, replace=TRUE));
test
```
Next we find out how many were heads by summing the list (1=head, 0=tail)
```{r}
headCount <- sum(unlist(test))
headCount
```
Next we find out which entry in our list was the first head.
```{r}
firstHead <- which(unlist(test) == 1)[1]
firstHead
```
Let's do this same experiment with more coin flips, and repeat the experiment many times.
```{r}
trial_limit <- 5000
trial <- 1:trial_limit
n <- 100
BinSim <-data.frame(trial)
```
In this siulation, we will be repeating an experiment `r trial_limit` times. The experiment consists of flipping a coin `r n` times, and counting how many times we get a 'heads'. Internally, we are using the numbers 0 and 1 to represent tails and heads, respectively.
```{r}
for (i in trial) {
BinSim$tries[i] <- list(sample(0:1,n, replace=TRUE))
BinSim$sum[i] <- sum(unlist(BinSim$tries[i]))
BinSim$first[i] <- which(unlist(BinSim$tries[i]) == 1)[1];
}
```
How are the head counts distributed?
The theoretical probability (probability mass function) for $k$ successes in $n$ trials is:
$$ P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}$$
Since this is a binomial distribution, we should be able to calculate the expected values for the mean and variance, and compare this to what was actually observed in the experiment.
We expect the mean to be
$$\mu = p \times n $$
$$\mu = 0.5 \times 100 $$
$$\mu = 50 $$.
```{r}
mean(BinSim$sum)
```
We expect the variance to be
$$\sigma^2 = p( 1 - p) \times n $$
$$\sigma^2 = 0.5( 1 - 0.5) \times 100 $$
$$\sigma^2 = 25 $$.
```{r}
var(BinSim$sum)
```
```{r}
hist(BinSim$sum, breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="success count", main="success frequency", freq = FALSE)
lines(density(BinSim$sum), lwd = 2,col = "orange")
abline(v = mean(BinSim$sum),col = "blue",lwd = 3)
abline(v = median(BinSim$sum), col = "red", lwd = 2)
legend(x = "topright", c("Density plot", "Mean", "Median"), col = c("orange", "blue", "red"),lwd = c(2, 2, 2))
```
Because the normal distribution is a close approximation to the binomial distribution, we should be able to plot a normal curve with the theoretical mean and standard deviation and obtain something very close to the actual density curve shown above.
```{r}
hist(BinSim$sum, breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="success count", main="success frequency", freq = FALSE)
curve(dnorm(x, mean=50, sd=5), col="darkblue", lwd=2, add=TRUE, yaxt="n")
legend(x = "topright", c("Normal approximation"),col = c("dark blue"),lwd = c(2))
```
You can randomly generate a normal distribution in R using the *rnorm* function.
```{r}
generatedNormal <- rnorm(n = trial_limit, mean = 50, sd = 5)
hist(generatedNormal,breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="success count", main="generated normal data", freq = FALSE)
curve(dnorm(x, mean=50, sd=5), col="darkblue", lwd=2, add=TRUE, yaxt="n")
```
## Comparing with R built in binomial functions
Let's use the *rbinom* function to achieve the same thing we did earlier `r trial_limit` observations of
```{r}
randBinom <- rbinom(trial_limit, n, 0.5)
hist(randBinom, breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="successful event count", main="random binomial", freq = FALSE)
curve(dbinom(x, 100, 0.5), col="darkblue", lwd=2, add=TRUE, yaxt="n")
```
## Waiting for first success
How long do we have to wait for success, usually? Not long - but the distribution shows that sometimes you may end up with a run of tails before you encounter your first head.
```{r}
flipsBeforeSuccess <- table(BinSim$first)/trial_limit
barplot(flipsBeforeSuccess)
```
In general, the *negative binomial distribution* is a discrete probability distribution of the number of failures/successes in a sequence of Bernoulli trials before a specified number of successes/failures.
In this case, we want to know how many tails (failures) occur before the first head (success). The number of tails is unknown, the number of heads is specified (r = 1).
The probability of having $k$ failures before $r$ successes if the probability of success is $p$, then we have:
$$ P(X = k) = \binom{k + r - 1}{k}p^r(1-p)^k$$
In our case, this simplifies down to:
$$ P(X = k) = \left ( \frac{1}{2} \right )^{k+1}$$
Let's compare with R's built in random negative binomial generator.
```{r}
randNegBinom <- table(rnbinom(n=5000, 1, 1/2))
barplot(flipsBeforeSuccess)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment