bayesball/Pythagorean_wins.Rmd

## Pythagorean_wins.Rmd
---
title: "Pythagorean Wins"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      warning = FALSE)
```

Load a few packages.

```{r}
library(Lahman)
library(tidyverse)
library(brms)
library(LearnBayes)
```

#### Bayesian model

Observe $(W, L, R, RA)$ for each of the 30 teams in the 2019 baseball season.

Assume $W_j$, the number of wins for the $j$th team, is normal with mean given by Bill James formula and standard deviation $\sigma$.

$$
W_j \sim N\left(162 \frac{(R_j / RA_j)^k}{1 + (R_j / RA_j)^k}, \sigma\right)
$$

For prior we assume $k, \sigma$ independent where $k$ is normal with mean 2 and standard deviation 0.5 and $\sigma$ is exponential with rate 1.

#### Collect data from Lahman package

```{r}
Teams %>%
  filter(yearID == 2019) %>%
  select(teamID, W, L, R, RA) %>%
  mutate(WL = W / L,
         RR = R / RA) -> d
```

#### LearnBayes analysis

First we write a function defining the log posterior of the parameter vector $\theta = (k, \sigma)$.


```{r}
logpost2 <- function(theta, d){
  k <- theta[1]
  sigma <- theta[2]
  n_mean <- 162 * d$RR ^ k / (1 + d$RR ^ k)
  sum(dnorm(d$W, n_mean, sigma, log = TRUE)) +
    dnorm(k, 2, 0.5, log = TRUE) +
    dexp(sigma, 1, log = TRUE)
}
```

```{r}
mycontour(logpost2, c(1.6, 2.3, 2, 6), d,
  xlab = "k", ylab = "sigma",
   main = "Contours of Posterior of (k, sigma)")
```

Collect 5000 simulated draws from the 50 x 50 grid.

```{r}
pts <- simcontour(logpost2, c(1.6, 2.3, 2, 6),
                  d, 5000)
```

Show the simulated draws on top of the contour plot.

```{r}
mycontour(logpost2, c(1.6, 2.3, 2, 6), d,
  xlab = "k", ylab = "sigma",
   main = "Contours of Posterior of (k, sigma)")
points(pts, col = "red")
```

Construct a density estimate of the simulated draws of k.

```{r}
ggplot(data.frame(k = pts$x),
       aes(k)) +
  geom_density() +
  ggtitle("Marginal Posterior Density of k")
```

#### Compare with brms package (MCMC sampling)

Specify the prior and fit by the ```brm()``` function.

```{r}
prior1 <- c(prior(normal(2, 0.5), nlpar = "b1"),
            prior(exponential(1), class = "sigma"))
fit1 <- brm(bf(W ~ 0 +
              162 * RR ^ b1 / (1 + RR ^ b1),
              b1 ~ 1, nl = TRUE),
            data = d, prior = prior1,
            family = gaussian,
            refresh = 0)
```

Trace plots and density estimate graphs for each parameter.

```{r}
plot(fit1)
```

Posterior summaries of each parameter.

```{r}
summary(fit1)
```
#### Prediction

Suppose we want to predict the number of games won for teams with runs ratios equal to 0.8, 0.9, 1, 1.1, 1.2.

```{r}
PP <- posterior_predict(fit1,
               newdata = data.frame(
                RR = seq(0.8, 1.2, by = 0.1)))
PP <- as.data.frame(PP)
names(PP) <- paste("RR =", seq(0.8, 1.2, by = 0.1))
```
```{r}
PP %>%
  pivot_longer(
    cols = starts_with("RR"),
    values_to = "Number_of_Wins"
  ) -> P_Wins2
```

Use violin plots to show predictive distributions.

```{r}
ggplot(P_Wins2, aes(name, Number_of_Wins)) +
  geom_violin() +
  xlab("Runs Ratio") +
  ylab("Number of Wins") +
  ggtitle("Predicted Number of Wins")
```
	---
	title: "Pythagorean Wins"
	output: html_document
	---

	```{r setup, include=FALSE}
	knitr::opts_chunk$set(echo = TRUE,
	message = FALSE,
	warning = FALSE)
	```

	Load a few packages.

	```{r}
	library(Lahman)
	library(tidyverse)
	library(brms)
	library(LearnBayes)
	```

	#### Bayesian model

	Observe $(W, L, R, RA)$ for each of the 30 teams in the 2019 baseball season.

	Assume $W_j$, the number of wins for the $j$th team, is normal with mean given by Bill James formula and standard deviation $\sigma$.

	$$
	W_j \sim N\left(162 \frac{(R_j / RA_j)^k}{1 + (R_j / RA_j)^k}, \sigma\right)
	$$

	For prior we assume $k, \sigma$ independent where $k$ is normal with mean 2 and standard deviation 0.5 and $\sigma$ is exponential with rate 1.

	#### Collect data from Lahman package

	```{r}
	Teams %>%
	filter(yearID == 2019) %>%
	select(teamID, W, L, R, RA) %>%
	mutate(WL = W / L,
	RR = R / RA) -> d
	```

	#### LearnBayes analysis

	First we write a function defining the log posterior of the parameter vector $\theta = (k, \sigma)$.


	```{r}
	logpost2 <- function(theta, d){
	k <- theta[1]
	sigma <- theta[2]
	n_mean <- 162 * d$RR ^ k / (1 + d$RR ^ k)
	sum(dnorm(d$W, n_mean, sigma, log = TRUE)) +
	dnorm(k, 2, 0.5, log = TRUE) +
	dexp(sigma, 1, log = TRUE)
	}
	```

	```{r}
	mycontour(logpost2, c(1.6, 2.3, 2, 6), d,
	xlab = "k", ylab = "sigma",
	main = "Contours of Posterior of (k, sigma)")
	```

	Collect 5000 simulated draws from the 50 x 50 grid.

	```{r}
	pts <- simcontour(logpost2, c(1.6, 2.3, 2, 6),
	d, 5000)
	```

	Show the simulated draws on top of the contour plot.

	```{r}
	mycontour(logpost2, c(1.6, 2.3, 2, 6), d,
	xlab = "k", ylab = "sigma",
	main = "Contours of Posterior of (k, sigma)")
	points(pts, col = "red")
	```

	Construct a density estimate of the simulated draws of k.

	```{r}
	ggplot(data.frame(k = pts$x),
	aes(k)) +
	geom_density() +
	ggtitle("Marginal Posterior Density of k")
	```

	#### Compare with brms package (MCMC sampling)

	Specify the prior and fit by the ```brm()``` function.

	```{r}
	prior1 <- c(prior(normal(2, 0.5), nlpar = "b1"),
	prior(exponential(1), class = "sigma"))
	fit1 <- brm(bf(W ~ 0 +
	162 * RR ^ b1 / (1 + RR ^ b1),
	b1 ~ 1, nl = TRUE),
	data = d, prior = prior1,
	family = gaussian,
	refresh = 0)
	```

	Trace plots and density estimate graphs for each parameter.

	```{r}
	plot(fit1)
	```

	Posterior summaries of each parameter.

	```{r}
	summary(fit1)
	```
	#### Prediction

	Suppose we want to predict the number of games won for teams with runs ratios equal to 0.8, 0.9, 1, 1.1, 1.2.

	```{r}
	PP <- posterior_predict(fit1,
	newdata = data.frame(
	RR = seq(0.8, 1.2, by = 0.1)))
	PP <- as.data.frame(PP)
	names(PP) <- paste("RR =", seq(0.8, 1.2, by = 0.1))
	```
	```{r}
	PP %>%
	pivot_longer(
	cols = starts_with("RR"),
	values_to = "Number_of_Wins"
	) -> P_Wins2
	```

	Use violin plots to show predictive distributions.

	```{r}
	ggplot(P_Wins2, aes(name, Number_of_Wins)) +
	geom_violin() +
	xlab("Runs Ratio") +
	ylab("Number of Wins") +
	ggtitle("Predicted Number of Wins")
	```