dmackinnon1/binomial_1.rmd

## binomial_1.rmd
---
title: "Binomial and Negative Binomial Simulation"
author: "Dan MacKinnon"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


## Coin flipping counts

This R file simulates flipping coins, observing how many heads we see and how long we have to wait until we see a head.

First we create the list of 10 coin flips:
```{r}
test <-list(sample(0:1,10, replace=TRUE));
test
```

Next we find out how many were heads by summing the list (1=head, 0=tail)
```{r}
headCount <- sum(unlist(test))
headCount
```

Next we find out which entry in our list was the first head.
```{r}
firstHead <- which(unlist(test) == 1)[1]
firstHead
```

Let's do this same experiment with more coin flips, and repeat the experiment many times.

```{r}
trial_limit <- 5000
trial <- 1:trial_limit
n <- 100
BinSim <-data.frame(trial)
```
In this siulation, we will be repeating an experiment `r trial_limit` times. The experiment consists of flipping a coin `r n` times, and counting how many times we get a 'heads'. Internally, we are using the numbers 0 and 1 to represent tails and heads, respectively.

```{r}
for (i in trial) {
  BinSim$tries[i] <- list(sample(0:1,n, replace=TRUE))
  BinSim$sum[i] <- sum(unlist(BinSim$tries[i]))
  BinSim$first[i] <- which(unlist(BinSim$tries[i]) == 1)[1];
}
```

How are the head counts distributed?

The theoretical probability (probability mass function) for $k$ successes in $n$ trials is:

$$ P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}$$

Since this is a binomial distribution, we should be able to calculate the expected values for the mean and variance, and compare this to what was actually observed in the experiment.

We expect the mean to be
$$\mu = p \times n $$
$$\mu = 0.5 \times 100 $$
$$\mu = 50 $$.

```{r}
mean(BinSim$sum)
```

We expect the variance to be

$$\sigma^2 = p( 1 - p) \times n $$
$$\sigma^2 = 0.5( 1 - 0.5) \times 100 $$
$$\sigma^2 = 25 $$.

```{r}
var(BinSim$sum)
```


```{r}
hist(BinSim$sum, breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="success count", main="success frequency", freq = FALSE)

lines(density(BinSim$sum), lwd = 2,col = "orange")

abline(v = mean(BinSim$sum),col = "blue",lwd = 3)

abline(v = median(BinSim$sum), col = "red", lwd = 2)

legend(x = "topright",  c("Density plot", "Mean", "Median"), col = c("orange", "blue", "red"),lwd = c(2, 2, 2))


```

Because the normal distribution is a close approximation to the binomial distribution, we should be able to plot a normal curve with the theoretical mean and standard deviation and obtain something very close to the actual density curve shown above.

```{r}
hist(BinSim$sum, breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="success count", main="success frequency", freq = FALSE)

curve(dnorm(x, mean=50, sd=5), col="darkblue", lwd=2, add=TRUE, yaxt="n")

legend(x = "topright", c("Normal approximation"),col = c("dark blue"),lwd = c(2))
```

You can randomly generate a normal distribution in R using the *rnorm* function.
```{r}
generatedNormal <- rnorm(n = trial_limit, mean = 50, sd = 5)
hist(generatedNormal,breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="success count", main="generated normal data", freq = FALSE)

curve(dnorm(x, mean=50, sd=5), col="darkblue", lwd=2, add=TRUE, yaxt="n")
```


## Comparing with R built in binomial functions

Let's use the *rbinom* function to achieve the same thing we did earlier `r trial_limit` observations of

```{r}
randBinom <- rbinom(trial_limit, n, 0.5)
hist(randBinom, breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="successful event count", main="random binomial", freq = FALSE)

curve(dbinom(x, 100, 0.5), col="darkblue", lwd=2, add=TRUE, yaxt="n")
```


## Waiting for first success

How long do we have to wait for success, usually? Not long - but the distribution shows that sometimes you may end up with a run of tails before you encounter your first head.

```{r}
flipsBeforeSuccess <- table(BinSim$first)/trial_limit
barplot(flipsBeforeSuccess)
```

In general, the *negative binomial distribution* is a discrete probability distribution of the number of failures/successes in a sequence of Bernoulli trials before a specified number of successes/failures.

In this case, we want to know how many tails (failures) occur before the first head (success). The number of tails is unknown, the number of heads is specified (r = 1).

The probability of having $k$ failures before $r$ successes if the probability of success is $p$, then we have:

$$ P(X = k) = \binom{k + r - 1}{k}p^r(1-p)^k$$

In our case, this simplifies down to:

$$ P(X = k) = \left ( \frac{1}{2} \right )^{k+1}$$

Let's compare with R's built in random negative binomial generator.
```{r}
randNegBinom <- table(rnbinom(n=5000, 1, 1/2))
barplot(flipsBeforeSuccess)
```
	---
	title: "Binomial and Negative Binomial Simulation"
	author: "Dan MacKinnon"
	output: html_document
	---
	```{r setup, include=FALSE}
	knitr::opts_chunk$set(echo = TRUE)
	```


	## Coin flipping counts

	This R file simulates flipping coins, observing how many heads we see and how long we have to wait until we see a head.

	First we create the list of 10 coin flips:
	```{r}
	test <-list(sample(0:1,10, replace=TRUE));
	test
	```

	Next we find out how many were heads by summing the list (1=head, 0=tail)
	```{r}
	headCount <- sum(unlist(test))
	headCount
	```

	Next we find out which entry in our list was the first head.
	```{r}
	firstHead <- which(unlist(test) == 1)[1]
	firstHead
	```

	Let's do this same experiment with more coin flips, and repeat the experiment many times.

	```{r}
	trial_limit <- 5000
	trial <- 1:trial_limit
	n <- 100
	BinSim <-data.frame(trial)
	```
	In this siulation, we will be repeating an experiment `r trial_limit` times. The experiment consists of flipping a coin `r n` times, and counting how many times we get a 'heads'. Internally, we are using the numbers 0 and 1 to represent tails and heads, respectively.

	```{r}
	for (i in trial) {
	BinSim$tries[i] <- list(sample(0:1,n, replace=TRUE))
	BinSim$sum[i] <- sum(unlist(BinSim$tries[i]))
	BinSim$first[i] <- which(unlist(BinSim$tries[i]) == 1)[1];
	}
	```

	How are the head counts distributed?

	The theoretical probability (probability mass function) for $k$ successes in $n$ trials is:

	$$ P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}$$

	Since this is a binomial distribution, we should be able to calculate the expected values for the mean and variance, and compare this to what was actually observed in the experiment.

	We expect the mean to be
	$$\mu = p \times n $$
	$$\mu = 0.5 \times 100 $$
	$$\mu = 50 $$.

	```{r}
	mean(BinSim$sum)
	```

	We expect the variance to be

	$$\sigma^2 = p( 1 - p) \times n $$
	$$\sigma^2 = 0.5( 1 - 0.5) \times 100 $$
	$$\sigma^2 = 25 $$.

	```{r}
	var(BinSim$sum)
	```


	```{r}
	hist(BinSim$sum, breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="success count", main="success frequency", freq = FALSE)

	lines(density(BinSim$sum), lwd = 2,col = "orange")

	abline(v = mean(BinSim$sum),col = "blue",lwd = 3)

	abline(v = median(BinSim$sum), col = "red", lwd = 2)

	legend(x = "topright", c("Density plot", "Mean", "Median"), col = c("orange", "blue", "red"),lwd = c(2, 2, 2))


	```

	Because the normal distribution is a close approximation to the binomial distribution, we should be able to plot a normal curve with the theoretical mean and standard deviation and obtain something very close to the actual density curve shown above.

	```{r}
	hist(BinSim$sum, breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="success count", main="success frequency", freq = FALSE)

	curve(dnorm(x, mean=50, sd=5), col="darkblue", lwd=2, add=TRUE, yaxt="n")

	legend(x = "topright", c("Normal approximation"),col = c("dark blue"),lwd = c(2))
	```

	You can randomly generate a normal distribution in R using the rnorm function.
	```{r}
	generatedNormal <- rnorm(n = trial_limit, mean = 50, sd = 5)
	hist(generatedNormal,breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="success count", main="generated normal data", freq = FALSE)

	curve(dnorm(x, mean=50, sd=5), col="darkblue", lwd=2, add=TRUE, yaxt="n")
	```


	## Comparing with R built in binomial functions

	Let's use the rbinom function to achieve the same thing we did earlier `r trial_limit` observations of

	```{r}
	randBinom <- rbinom(trial_limit, n, 0.5)
	hist(randBinom, breaks=c(0:n), plot=TRUE, col="lightgrey", xlab="successful event count", main="random binomial", freq = FALSE)

	curve(dbinom(x, 100, 0.5), col="darkblue", lwd=2, add=TRUE, yaxt="n")
	```


	## Waiting for first success

	How long do we have to wait for success, usually? Not long - but the distribution shows that sometimes you may end up with a run of tails before you encounter your first head.

	```{r}
	flipsBeforeSuccess <- table(BinSim$first)/trial_limit
	barplot(flipsBeforeSuccess)
	```

	In general, the negative binomial distribution is a discrete probability distribution of the number of failures/successes in a sequence of Bernoulli trials before a specified number of successes/failures.

	In this case, we want to know how many tails (failures) occur before the first head (success). The number of tails is unknown, the number of heads is specified (r = 1).

	The probability of having $k$ failures before $r$ successes if the probability of success is $p$, then we have:

	$$ P(X = k) = \binom{k + r - 1}{k}p^r(1-p)^k$$

	In our case, this simplifies down to:

	$$ P(X = k) = \left ( \frac{1}{2} \right )^{k+1}$$

	Let's compare with R's built in random negative binomial generator.
	```{r}
	randNegBinom <- table(rnbinom(n=5000, 1, 1/2))
	barplot(flipsBeforeSuccess)
	```