Skip to content

Instantly share code, notes, and snippets.

@bayesball
Created August 22, 2020 18:54
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bayesball/b527f48071bb57b20153235a3f67e4f4 to your computer and use it in GitHub Desktop.
Save bayesball/b527f48071bb57b20153235a3f67e4f4 to your computer and use it in GitHub Desktop.
Multilevel modeling of player count effects
---
title: "Simple Count Effects"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
message = FALSE,
message = FALSE)
```
## Count Effects Work
### Exploratory Work
Read in some R packages:
```{r}
library(tidyverse)
library(lme4)
library(broom)
library(CalledStrike)
```
Read in the 2019 Statcast data and focus on the balls put into play. Define a count variable and classify the count as "ahead", "behind", or "neutral".
```{r}
statcast2019 <- read_csv("~/Dropbox/2016 WORK/BLOG Baseball R/OTHER/StatcastData/statcast2019.csv")
statcast2019 %>%
filter(type == "X") %>%
mutate(count = paste(balls, "-", strikes, sep=""),
c_type = ifelse(count %in%
c("2-0", "3-0", "3-1"),
"ahead", ifelse(count %in%
c("0-0", "1-0", "1-1",
"2-1", "3-2"),
"neutral", "behind"))) -> sc2019_ip
```
Restrict the analysis to players with at least 200 in-play in the 2019 season.
```{r}
sc2019_ip %>%
group_by(batter) %>%
summarize(N = n(),
.groups = "drop") -> S
inner_join(sc2019_ip, S, by="batter") %>%
filter(N >= 200) -> sc_regular
```
For each regular batter (those with at least 200 BIP) player, find the mean expected wOBA in each type of count.
```{r}
sc_regular %>%
group_by(player_name, c_type) %>%
summarize(N = n(),
M = mean(estimated_woba_using_speedangle,
na.rm = TRUE),
.groups = "drop") -> S1
inner_join(filter(S1, c_type == "ahead"),
filter(S1, c_type == "behind"),
by = "player_name") -> S2
inner_join(S2,
filter(S1, c_type == "neutral"),
by = "player_name") -> S2
head(S2)
```
Construct a Tukey mean-difference plot, comparing ahead and behind count situations.
```{r}
ggplot(S2, aes((M.x + M.y) / 2,
(M.x - M.y))) +
geom_point() +
geom_hline(yintercept = 0, color="red") +
ggtitle("Comparing Ahead and Behind Counts") +
centertitle() +
ylab("Improvement in Ahead wOBA") +
xlab("wOBA") +
increasefont()
```
Compare the neutral and behind states.
```{r}
ggplot(S2, aes((M + M.y) / 2,
(M - M.y))) +
geom_point() +
geom_hline(yintercept = 0, color="red") +
ggtitle("Comparing Neutral and Behind Counts") +
centertitle()
```
Comparing the ahead and neutral counts.
```{r}
ggplot(S2, aes((M.x + M) / 2,
(M.x - M))) +
geom_point() +
geom_hline(yintercept = 0, color="red") +
ggtitle("Comparing Ahead and Neutral Counts") +
centertitle()
```
Focus on the neutral/behind count comparison. Define the effect variable so that $\beta_0$ is the average wOBA value and $\beta_1$ represents the mean wOBA improvement in the Neutral state over the Behind state.
```{r}
sc_regular_2 <- filter(sc_regular,
c_type %in% c("neutral", "behind"))
sc_regular_2 %>%
mutate(effect = ifelse(c_type == "neutral",
0.5, -0.5)) -> sc_regular_2
```
Summaries of count effects by doing a regression on the entire dataset.
```{r}
fit <- lm(estimated_woba_using_speedangle ~ effect,
data = sc_regular_2)
fit$coef
```
Graph these effects.
```{r}
df <- data.frame(Type = c("Behind", "Neutral"),
Estimate = c(fit$coef[1] - fit$coef[2] / 2,
fit$coef[1] + fit$coef[2] / 2))
df$Group = "Pooled"
ggplot(df, aes(Type, Estimate, group = Group)) +
geom_point() + geom_line() +
increasefont() + ylim(0.2, 0.6) +
ggtitle("Pooled Regression Curve") +
centertitle()
```
### Individual Estimates
Perform individual regressions for all players with at least 200 balls-in-play.
```{r}
regressions <- sc_regular_2 %>%
group_by(player_name) %>%
do(tidy(lm(estimated_woba_using_speedangle ~
effect, data=.)))
spread(dplyr::select(regressions, player_name,
term, estimate),
term, estimate) -> Individual_est
names(Individual_est) <-
c("player_name", "beta0", "beta1")
```
Construct a scatterplot of the individual regression coefficients.
```{r}
ggplot(Individual_est, aes(beta0, beta1)) +
geom_point()
```
Show effects through a shrinkage plot.
First define a ```shrink_plot()``` function.
```{r}
shrink_plot <- function(estimates, n = 50){
estimates %>%
mutate(Behind = beta0 - beta1 / 2,
Neutral = beta0 + beta1 / 2) %>%
select(player_name, Neutral, Behind) -> d
set.seed(123)
d <- d[sample(nrow(d), n, replace = FALSE), ]
d %>%
gather(Type, WOBA, -player_name) %>%
ggplot(aes(Type, WOBA, group = player_name)) +
geom_line() +
geom_point()
}
```
Then show a shrinkage plot of the individual regression estimates.
```{r}
shrink_plot(Individual_est) +
ggtitle("50 Individual Regression Curves") +
centertitle() +
increasefont() +
ylim(.2, .6)
```
### Multilevel Fitting
Quick multilevel model fit using the ```lmer()``` function.
```{r}
newfit <- lmer(estimated_woba_using_speedangle ~
effect + (1 + effect | player_name),
data = sc_regular_2)
```
Posterior summaries:
```{r}
summary(newfit)
```
Collect the estimated random effects for all players.
```{r}
B <- coef(newfit)[[1]]
names(B) <- c("beta0", "beta1")
B$player_name <- row.names(B)
row.names(B) <- NULL
head(B)
```
Construct a shrinkage plot for the multilevel estimates.
```{r}
shrink_plot(B) +
ggtitle("50 Multilevel Regression Curves") +
centertitle() + increasefont() +
ylim(.2, .6)
```
To compare the shrinkage of the individual slopes and intercepts, construct a scatterplot of the regression estimates and overlay the multilevel model estimates.
```{r}
Individual_est$Type = "Individual"
B$Type = "Multilevel"
```
```{r}
ggplot(data = rbind(Individual_est, B),
aes(beta0, beta1, color = Type)) +
geom_point() +
increasefont() +
ggtitle("Individual and Multilevel Estimates") +
centertitle()
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment