Created
July 31, 2020 14:31
-
-
Save bayesball/94d76887ce852d3b054b715050a769ca to your computer and use it in GitHub Desktop.
Markdown document for Bayesian Pythagorean Modeling blog post
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Pythagorean Wins" | |
output: html_document | |
--- | |
```{r setup, include=FALSE} | |
knitr::opts_chunk$set(echo = TRUE, | |
message = FALSE, | |
warning = FALSE) | |
``` | |
Load a few packages. | |
```{r} | |
library(Lahman) | |
library(tidyverse) | |
library(brms) | |
library(LearnBayes) | |
``` | |
#### Bayesian model | |
Observe $(W, L, R, RA)$ for each of the 30 teams in the 2019 baseball season. | |
Assume $W_j$, the number of wins for the $j$th team, is normal with mean given by Bill James formula and standard deviation $\sigma$. | |
$$ | |
W_j \sim N\left(162 \frac{(R_j / RA_j)^k}{1 + (R_j / RA_j)^k}, \sigma\right) | |
$$ | |
For prior we assume $k, \sigma$ independent where $k$ is normal with mean 2 and standard deviation 0.5 and $\sigma$ is exponential with rate 1. | |
#### Collect data from Lahman package | |
```{r} | |
Teams %>% | |
filter(yearID == 2019) %>% | |
select(teamID, W, L, R, RA) %>% | |
mutate(WL = W / L, | |
RR = R / RA) -> d | |
``` | |
#### LearnBayes analysis | |
First we write a function defining the log posterior of the parameter vector $\theta = (k, \sigma)$. | |
```{r} | |
logpost2 <- function(theta, d){ | |
k <- theta[1] | |
sigma <- theta[2] | |
n_mean <- 162 * d$RR ^ k / (1 + d$RR ^ k) | |
sum(dnorm(d$W, n_mean, sigma, log = TRUE)) + | |
dnorm(k, 2, 0.5, log = TRUE) + | |
dexp(sigma, 1, log = TRUE) | |
} | |
``` | |
```{r} | |
mycontour(logpost2, c(1.6, 2.3, 2, 6), d, | |
xlab = "k", ylab = "sigma", | |
main = "Contours of Posterior of (k, sigma)") | |
``` | |
Collect 5000 simulated draws from the 50 x 50 grid. | |
```{r} | |
pts <- simcontour(logpost2, c(1.6, 2.3, 2, 6), | |
d, 5000) | |
``` | |
Show the simulated draws on top of the contour plot. | |
```{r} | |
mycontour(logpost2, c(1.6, 2.3, 2, 6), d, | |
xlab = "k", ylab = "sigma", | |
main = "Contours of Posterior of (k, sigma)") | |
points(pts, col = "red") | |
``` | |
Construct a density estimate of the simulated draws of k. | |
```{r} | |
ggplot(data.frame(k = pts$x), | |
aes(k)) + | |
geom_density() + | |
ggtitle("Marginal Posterior Density of k") | |
``` | |
#### Compare with brms package (MCMC sampling) | |
Specify the prior and fit by the ```brm()``` function. | |
```{r} | |
prior1 <- c(prior(normal(2, 0.5), nlpar = "b1"), | |
prior(exponential(1), class = "sigma")) | |
fit1 <- brm(bf(W ~ 0 + | |
162 * RR ^ b1 / (1 + RR ^ b1), | |
b1 ~ 1, nl = TRUE), | |
data = d, prior = prior1, | |
family = gaussian, | |
refresh = 0) | |
``` | |
Trace plots and density estimate graphs for each parameter. | |
```{r} | |
plot(fit1) | |
``` | |
Posterior summaries of each parameter. | |
```{r} | |
summary(fit1) | |
``` | |
#### Prediction | |
Suppose we want to predict the number of games won for teams with runs ratios equal to 0.8, 0.9, 1, 1.1, 1.2. | |
```{r} | |
PP <- posterior_predict(fit1, | |
newdata = data.frame( | |
RR = seq(0.8, 1.2, by = 0.1))) | |
PP <- as.data.frame(PP) | |
names(PP) <- paste("RR =", seq(0.8, 1.2, by = 0.1)) | |
``` | |
```{r} | |
PP %>% | |
pivot_longer( | |
cols = starts_with("RR"), | |
values_to = "Number_of_Wins" | |
) -> P_Wins2 | |
``` | |
Use violin plots to show predictive distributions. | |
```{r} | |
ggplot(P_Wins2, aes(name, Number_of_Wins)) + | |
geom_violin() + | |
xlab("Runs Ratio") + | |
ylab("Number of Wins") + | |
ggtitle("Predicted Number of Wins") | |
``` | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment