---
title: "IS 609 Homework 4"
author: "Aaron Palumbo"
date: "Saturday, September 20, 2014"
output: pdf_document
---
Homework 4:
## Page 191: #3
Using Monte Carlo simulation, write an algorithm to calculate an approximation to $\pi$ by considering the number of random points selected inside the quarter circle
$$Q: x^2 + y^2 = 1, x \geq 0, y\geq 0$$
where the quarter circle is taken to be inside the square
$$S: 0 \leq x \leq 1 \text{ and } 0 \leq y \leq 1$$
Use the equation $\pi / 4 = \frac{\text{area }Q} {\text{area }S}$
```{r}
# Number of trials
n <- 10000000
# Generate uniformly random x and y on the interval [0,1]
x <- runif(n)
y <- runif(n)
# Calculate x^2 + y^2
sum.x2y2 <- x^2 + y^2
# Count the points where x^2 + y^2 <= 1
points.in <- sum(sum.x2y2 <= 1)
# With the numbers used: pi = 4 * ratio where ratio = points.in / trials
pi.estimate <- 4 * (points.in / n)
```
With `r n` trials, our estimate of $\pi$ is `r pi.estimate`, compared to the actual value of `r pi`.
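As a quick sanity check (a sketch, not part of the assignment), the estimate should tighten around $\pi$ as the number of points grows:
```{r}
# Watch the estimate converge as the sample size increases
for (k in c(1e3, 1e5, 1e7)) {
  xs <- runif(k)
  ys <- runif(k)
  cat(sprintf("n = %.0e: pi estimate = %f\n", k, 4 * mean(xs^2 + ys^2 <= 1)))
}
```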
## Page 194: #1
Use the middle-square method (square the current value and take the middle digits of the result as the next value) to generate:
```{r}
# Function to calculate the next number in the sequence
next.num <- function(x, n.digits){
  # Convert to text so we can extract the middle digits
  num.as.text <- as.character(x^2)
  length.str <- nchar(num.as.text)
  # Calculate start and stop positions.
  # (Not sure how this is supposed to work for seeds with an odd
  # number of digits.)
  # If the seed is n digits long, the square is at most 2n digits.
  # Counting positions from the right makes any missing leading
  # zeros irrelevant. To grab the middle n digits of the square:
  # drop the last n/2 digits, then take the preceding n digits.
  stop.pos <- length.str - n.digits/2
  start.pos <- stop.pos - n.digits + 1
  next.n <- substr(num.as.text, start.pos, stop.pos)
  # Convert back to a number and return
  if(next.n == ""){return(0)} # Needed when the sequence degenerates
  return(as.numeric(next.n))
}

rmids <- function(seed, vector.length){
  # Generate vector.length numbers, calling next.num for each
  # successive number in the sequence
  n.digits <- nchar(as.character(seed))
  r.vec <- numeric(length = vector.length)
  r.vec[1] <- next.num(seed, n.digits)
  for(i in 2:vector.length){
    r.vec[i] <- next.num(r.vec[i-1], n.digits)
  }
  return(r.vec)
}
```
### a. 10 random numbers using $x_0 = 1009$.
```{r}
r.a <- rmids(1009, 10)
r.a
plot(r.a, type="b")
```
This sequence degenerates to zero very quickly (10 iterations).
### b. 20 random numbers using $x_0 = 653217$.
```{r}
r.b <- rmids(653217, 400)
r.b[1:20]
plot(r.b[1:20], type="b")
```
At first this sequence appears to be degenerating to 0, but when we expand the series, we can see that it behaves reasonably until about iteration 380, where it begins to oscillate between two numbers.
```{r}
plot(r.b, type="b")
```
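To see the cycle directly, we can inspect the last few values of the sequence:
```{r}
tail(r.b, 10)
```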
### c. 15 random numbers using $x_0 = 3043$.
```{r}
r.c <- rmids(3043, 15)
r.c
plot(r.c, type="b")
```
This sequence begins oscillating at the fourth iteration.
### d. Comment about the results of each sequence. Was there cycling? Did each sequence degenerate rapidly?
As noted above: sequence (a) degenerates to zero within 10 iterations, (b) runs for roughly 380 iterations before cycling between two values, and (c) begins cycling at the fourth iteration. Every seed eventually degenerated or cycled, which is the known weakness of the middle-square method.
## Page 201: #4
*Horse Race* - Construct and perform a Monte Carlo simulation of a horse race. You can be creative and use odds from the newspaper, or simulate the Mathematical Derby with the entries and odds shown in the following table.
| Entry's name | Odds |
|-------------------|:----:|
| Euler's Folly | 7-1 |
| Leapin' Leibniz | 5-1 |
| Newton Lobell | 9-1 |
| Count Cauchy | 12-1 |
| Pumped up Poisson | 4-1 |
| Loping L'Hopital | 35-1 |
| Steamin' Stokes | 15-1 |
| Dancin' Dantzig | 4-1 |
Construct and perform a Monte Carlo simulation of 1000 horse races. Which horse won the most races? Which horse won the fewest races? Do these results surprise you? Provide the tallies of how many races each horse won with your output.
Assumptions:
* The odds listed in the table are "against" odds; i.e., 9-1 odds mean a 10% chance of winning.
* The odds listed, when converted to probabilities, do not sum to 1. There are several ways to fix this; I will normalize the implied probabilities so that each horse's chance of winning remains in proportion to its odds.
```{r}
# Number of races to run
races <- 1000
# Horses participating
horses <- c("Euler's Folly",
"Leapin' Leibniz",
"Newton Lobell",
"Count Cauchy",
"Pumped up Poisson",
"Loping L'Hopital",
"Steamin' Stokes",
"Dancin' Dantzig")
wts <- c(7, 5, 9, 12, 4, 35, 15, 4) # Chances against winning
wts <- wts + 1 # Total chances
wts <- 1/wts # Weights for winning
wts <- wts / sum(wts) # Normalize wts
# Create summary data frame of horses and probabilities of winning
df.odds <- data.frame(horses, wts)
df.odds <- df.odds[order(df.odds$wts, decreasing = TRUE), ]
# Simulate race results
results <- table(sample(horses, races, replace=TRUE, prob=wts))
results <- results[order(results)]
# Fitting Labels
par(las=2) # make label text perpendicular to axis
par(mar=c(5,9,4,2)) # increase y-axis margin.
barplot(results, horiz=TRUE)
```
```{r}
# Build a one-column frame with horse names as row names, so the
# inline references below pick out the winners correctly
wins <- results[order(results, decreasing = TRUE)]
df <- data.frame(Frequency = as.numeric(wins), row.names = names(wins))
df
```
The table above shows how many wins each horse had in the simulation.
```{r}
df.odds
```
The table above shows the probability of winning for each horse.

----------------------
The horse that won the most races is `r row.names(df)[1]`.
The horse that won the fewest races is `r row.names(df)[nrow(df)]`.
These results are not surprising: the win tallies track the given probabilities of winning, within the sampling error we would expect from 1000 races.
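As a rough check on that claim, we can line up each horse's expected number of wins (win probability times number of races) against the simulated tallies; a minimal sketch:
```{r}
# Expected wins (probability * races) next to simulated wins
expected <- round(df.odds$wts * races)
observed <- as.numeric(results[as.character(df.odds$horses)])
data.frame(horse = df.odds$horses, expected, observed)
```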
## Page 211: #3
In many situations, the time *T* between deliveries and the order quantity *Q* are not fixed. Instead, an order is placed for a specific amount of gasoline. Depending on how many orders are placed in a given time interval, the time to fill an order varies. You have no reason to believe that the performance of the delivery operation will change. Therefore, you have examined records for the past 100 deliveries and found the following lag times, or extra days, required to fill your order:
| Lag time (in days) | Number of occurrences |
|--------------------|-----------------------|
| 2 | 10 |
| 3 | 25 |
| 4 | 30 |
| 5 | 20 |
| 6 | 13 |
| 7 | 2 |
| Total: | 100 |
Construct a Monte Carlo simulation for the lag time submodel. If you have a handheld calculator or computer available, test your submodel by running 1000 trials and comparing the number of occurrences of the various lag times with the historical data.
```{r}
# Enter data:
lag.time <- 2:7
occurrences <- c(10, 25, 30, 20, 13, 2)
# Enter a zero value to be used at the beginning of the data.
# There seem to be two logical choices:
# (lag.time, occurrences):
# (1,0) - This continues with the lag time spacing given in the data
# (0,0) - This might make sense if there is some probability of an
# order being fulfilled in under one day
# For this model, I will assume orders are never fulfilled in less
# than one day and use (1,0)
lag.time <- c(1, lag.time)
occurrences <- c(0, occurrences)
# Transform occurrences to probabilities
probs <- occurrences / sum(occurrences)
# Calculate cumulative probabilities
cum.probs <- cumsum(probs)
```
As in the examples in the book, we will now fit lines between successive points. This is done by determining the constants $m$ and $b$ in the equation $y = mx + b$ for each pair of adjacent points. For example, the first two points are $(0, 1)$ and $(0.1, 2)$.
Substituting, we have:
$$
\begin{aligned}
y &= mx + b \\
y_0 &= m x_0 + b \\
y_1 &= m x_1 + b \\
1 &= m \cdot 0 + b \\
2 &= 0.1\,m + b
\end{aligned}
$$
which gives $b = 1$ and $m = 10$.
`approxfun` is an R function that fits these lines between each pair of data points; the result can then be used as a function to interpolate values between the points.
```{r}
# Function to linearly interpolate between the data points
lin.int <- approxfun(cum.probs, lag.time, method="linear")
# Create an x vector for plotting
x <- seq(from=0, to=1, by=0.01)
# Let's see how our function compares to our data
# First we will plot the data
# Then we will add a line for our approximation function
# Finally we will add the line we calculated by hand as an example
plot(cum.probs, lag.time)
points(x, lin.int(x), type="l")
abline(1, 10, col="red")
```
We can now run our Monte Carlo simulation by generating uniformly distributed values between 0 and 1 and using our linear interpolation model (`lin.int`) to simulate lag times.
```{r}
trials <- 1000
sim <- lin.int(runif(trials))
hist(sim)
```
Comparing the historical data with the data from our model:
```{r}
# Use factor levels so every lag time appears, even with a zero count
model.probs <- as.numeric(table(factor(round(sim), levels = lag.time)) / trials)
df <- data.frame("lag" = lag.time,
                 "historical" = probs,
                 "model" = round(model.probs, 2))
df
```
We can see that, as expected, there is reasonable agreement; the more trials we run, the closer we would expect the two columns to be.
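A minimal sketch of that convergence, rerunning the same submodel with 100 times as many trials:
```{r}
# Rerun the lag-time submodel with more trials; the empirical
# probabilities should move closer to the historical ones
big.trials <- 100000
big.sim <- lin.int(runif(big.trials))
round(table(round(big.sim)) / big.trials, 3)
```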
## Page 221: #2
Use a smooth polynomial to fit the data in Table 5.18 to obtain arrivals and unloading times. Compare results to those in tables 5.19 and 5.20.
![Table 5.18](images/t_5_18.png)
```{r}
# Enter the data from table 5.18
# Arrival
arrival.lb <- 0:12
arrival.lb <- arrival.lb * 10 + 15
arrival.ub <- arrival.lb + 9
arrival.ub[length(arrival.ub)] <- 145
arrival.mid <- (arrival.lb + arrival.ub) / 2
arrival.occur <- c(11, 35, 42, 61, 108, 193, 240, 207, 150, 85, 44, 21, 3)
# Unload
unload.lb <- 0:8
unload.lb <- unload.lb * 5 + 45
unload.ub <- unload.lb + 4
unload.ub[length(unload.ub)] <- 90
unload.mid <- (unload.lb + unload.ub) / 2
unload.occur <- c(20, 54, 114, 103, 156, 221, 250, 171, 109)
#-------------------------------
# Cumulative probabilities
arrival.probs <- arrival.occur / sum(arrival.occur)
arrival.cum <- cumsum(arrival.probs)
unload.probs <- unload.occur / sum(unload.occur)
unload.cum <- cumsum(unload.probs)
# Linear Models
arrival.lin <- approxfun(arrival.cum, arrival.mid, method="linear")
unload.lin <- approxfun(unload.cum, unload.mid, method="linear")
```
We now have the ability to generate the data in tables 5.19 and 5.20. Let's look at plots of this:
```{r}
plot(arrival.cum, arrival.mid, main="Arrival Data")
points(arrival.cum, arrival.lin(arrival.cum), type="l")
plot(unload.cum, unload.mid, main="Unload Data")
points(unload.cum, unload.lin(unload.cum), type="l")
```
Now we will fit a smooth polynomial of degree 3 to the data and compare the results:
```{r}
# Fit cubic polynomials to both data sets
arrival.lm <- lm(arrival.mid ~ poly(arrival.cum, 3))
unload.lm <- lm(unload.mid ~ poly(unload.cum, 3))
plot(arrival.cum, arrival.mid, main="Arrival Data")
points(arrival.cum, arrival.lin(arrival.cum), type="l")
new.data <- data.frame(arrival.cum=seq(from=0, to=1, by=0.01))
y <- predict(arrival.lm, newdata=new.data)
points(new.data$arrival.cum, y, type="l", col="red")
```
Here the cubic polynomial has visible fit issues and is not a very good representation of the arrival data.
```{r}
# Unload
plot(unload.cum, unload.mid, main="Unload Data")
points(unload.cum, unload.lin(unload.cum), type="l")
new.data <- data.frame(unload.cum=seq(from=0, to=1, by=0.01))
y <- predict(unload.lm, newdata=new.data)
points(new.data$unload.cum, y, type="l", col="red")
```
For the unloading data, both models are fairly accurate representations of the data.
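To actually generate times from the polynomial, we would use it the same way we use `lin.int`: as an inverse CDF fed with uniform random numbers. A minimal sketch (this step is my own extension, not required by the problem):
```{r}
# Draw a few simulated arrival times from both models.
# u is kept inside the interpolation range so approxfun returns no NAs.
u <- runif(5, min(arrival.cum), max(arrival.cum))
data.frame(u = round(u, 3),
           linear.model = round(arrival.lin(u)),
           poly.model = round(predict(arrival.lm,
                                      newdata = data.frame(arrival.cum = u))))
```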