Markdown document for Bayesian Pythagorean Modeling blog post
---
title: "Pythagorean Wins"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      warning = FALSE)
```
Load a few packages.
```{r}
library(Lahman)
library(tidyverse)
library(brms)
library(LearnBayes)
```
#### Bayesian model
Observe $(W, L, R, RA)$ for each of the 30 teams in the 2019 baseball season.
Assume $W_j$, the number of wins for the $j$th team, is normally distributed with mean given by the Bill James Pythagorean formula and standard deviation $\sigma$.
$$
W_j \sim N\left(162 \frac{(R_j / RA_j)^k}{1 + (R_j / RA_j)^k}, \sigma\right)
$$
For the prior we assume $k$ and $\sigma$ are independent, where $k$ is normal with mean 2 and standard deviation 0.5, and $\sigma$ is exponential with rate 1.
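As a quick sanity check on these priors (an illustrative sketch added here, not part of the original analysis), we can simulate $(k, \sigma)$ from the prior and look at the implied win totals for a hypothetical team with runs ratio 1.1:

```{r}
# Prior predictive sketch: draw (k, sigma) from the priors and
# simulate season win totals for a team with runs ratio RR = 1.1
set.seed(123)
n_sims <- 1000
k_prior <- rnorm(n_sims, mean = 2, sd = 0.5)
sigma_prior <- rexp(n_sims, rate = 1)
RR <- 1.1
mu <- 162 * RR ^ k_prior / (1 + RR ^ k_prior)
W_sim <- rnorm(n_sims, mu, sigma_prior)
summary(W_sim)
```

The simulated win totals cluster a little under 90 games, which is a plausible range for a team scoring 10% more runs than it allows, so the priors seem reasonable.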
#### Collect data from Lahman package
```{r}
Teams %>%
  filter(yearID == 2019) %>%
  select(teamID, W, L, R, RA) %>%
  mutate(WL = W / L,
         RR = R / RA) -> d
```
#### LearnBayes analysis
First we write a function defining the log posterior of the parameter vector $\theta = (k, \sigma)$.
```{r}
logpost2 <- function(theta, d){
  k <- theta[1]
  sigma <- theta[2]
  # Pythagorean mean number of wins for each team
  n_mean <- 162 * d$RR ^ k / (1 + d$RR ^ k)
  # log likelihood plus log priors for k and sigma
  sum(dnorm(d$W, n_mean, sigma, log = TRUE)) +
    dnorm(k, 2, 0.5, log = TRUE) +
    dexp(sigma, 1, log = TRUE)
}
```
```{r}
mycontour(logpost2, c(1.6, 2.3, 2, 6), d,
          xlab = "k", ylab = "sigma",
          main = "Contours of Posterior of (k, sigma)")
```
Collect 5000 simulated draws of $(k, \sigma)$ from a 50 x 50 grid approximation to the posterior.
```{r}
pts <- simcontour(logpost2, c(1.6, 2.3, 2, 6),
                  d, 5000)
```
Show the simulated draws on top of the contour plot.
```{r}
mycontour(logpost2, c(1.6, 2.3, 2, 6), d,
          xlab = "k", ylab = "sigma",
          main = "Contours of Posterior of (k, sigma)")
points(pts, col = "red")
```
Construct a density estimate of the simulated draws of $k$.
```{r}
ggplot(data.frame(k = pts$x),
       aes(k)) +
  geom_density() +
  ggtitle("Marginal Posterior Density of k")
```
#### Compare with brms package (MCMC sampling)
Specify the prior and fit the model with the `brm()` function.
```{r}
prior1 <- c(prior(normal(2, 0.5), nlpar = "b1"),
            prior(exponential(1), class = "sigma"))
fit1 <- brm(bf(W ~ 0 +
                 162 * RR ^ b1 / (1 + RR ^ b1),
               b1 ~ 1, nl = TRUE),
            data = d, prior = prior1,
            family = gaussian,
            refresh = 0)
```
Trace plots and density estimate graphs for each parameter.
```{r}
plot(fit1)
```
Posterior summaries of each parameter.
```{r}
summary(fit1)
```
#### Prediction
Suppose we want to predict the number of games won for teams with runs ratios equal to 0.8, 0.9, 1, 1.1, and 1.2.
```{r}
PP <- posterior_predict(fit1,
                        newdata = data.frame(
                          RR = seq(0.8, 1.2, by = 0.1)))
PP <- as.data.frame(PP)
names(PP) <- paste("RR =", seq(0.8, 1.2, by = 0.1))
```
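For intuition, the classic $k = 2$ version of the Pythagorean formula gives point estimates of expected wins for the same runs ratios (a quick base R check added here, independent of the fitted model):

```{r}
# Pythagorean expected wins with the classic exponent k = 2
RR <- seq(0.8, 1.2, by = 0.1)
k <- 2
exp_wins <- 162 * RR ^ k / (1 + RR ^ k)
round(data.frame(RR, exp_wins), 1)
# RR = 1 gives exactly 81 wins, half of a 162-game season
```

The posterior predictive distributions below should be centered near these values, but with considerable spread reflecting both the uncertainty in $(k, \sigma)$ and the sampling variability of a single season.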
```{r}
PP %>%
  pivot_longer(
    cols = starts_with("RR"),
    values_to = "Number_of_Wins"
  ) -> P_Wins2
```
Use violin plots to show predictive distributions.
```{r}
ggplot(P_Wins2, aes(name, Number_of_Wins)) +
  geom_violin() +
  xlab("Runs Ratio") +
  ylab("Number of Wins") +
  ggtitle("Predicted Number of Wins")
```