Created
August 22, 2020 18:54
-
-
Save bayesball/b527f48071bb57b20153235a3f67e4f4 to your computer and use it in GitHub Desktop.
Multilevel modeling of player count effects
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Simple Count Effects" | |
output: html_document | |
--- | |
```{r setup, include=FALSE} | |
knitr::opts_chunk$set(echo = TRUE, | |
message = FALSE, | |
message = FALSE) | |
``` | |
## Count Effects Work | |
### Exploratory Work | |
Read in some R packages: | |
```{r} | |
library(tidyverse) | |
library(lme4) | |
library(broom) | |
library(CalledStrike) | |
``` | |
Read in the 2019 Statcast data and focus on the balls put into play. Define a count variable and classify the count as "ahead", "behind", or "neutral". | |
```{r} | |
statcast2019 <- read_csv("~/Dropbox/2016 WORK/BLOG Baseball R/OTHER/StatcastData/statcast2019.csv") | |
statcast2019 %>% | |
filter(type == "X") %>% | |
mutate(count = paste(balls, "-", strikes, sep=""), | |
c_type = ifelse(count %in% | |
c("2-0", "3-0", "3-1"), | |
"ahead", ifelse(count %in% | |
c("0-0", "1-0", "1-1", | |
"2-1", "3-2"), | |
"neutral", "behind"))) -> sc2019_ip | |
``` | |
Restrict the analysis to players with at least 200 in-play in the 2019 season. | |
```{r} | |
sc2019_ip %>% | |
group_by(batter) %>% | |
summarize(N = n(), | |
.groups = "drop") -> S | |
inner_join(sc2019_ip, S, by="batter") %>% | |
filter(N >= 200) -> sc_regular | |
``` | |
For each regular batter (those with at least 200 BIP) player, find the mean expected wOBA in each type of count. | |
```{r} | |
sc_regular %>% | |
group_by(player_name, c_type) %>% | |
summarize(N = n(), | |
M = mean(estimated_woba_using_speedangle, | |
na.rm = TRUE), | |
.groups = "drop") -> S1 | |
inner_join(filter(S1, c_type == "ahead"), | |
filter(S1, c_type == "behind"), | |
by = "player_name") -> S2 | |
inner_join(S2, | |
filter(S1, c_type == "neutral"), | |
by = "player_name") -> S2 | |
head(S2) | |
``` | |
Construct a Tukey mean-difference plot, comparing ahead and behind count situations. | |
```{r} | |
ggplot(S2, aes((M.x + M.y) / 2, | |
(M.x - M.y))) + | |
geom_point() + | |
geom_hline(yintercept = 0, color="red") + | |
ggtitle("Comparing Ahead and Behind Counts") + | |
centertitle() + | |
ylab("Improvement in Ahead wOBA") + | |
xlab("wOBA") + | |
increasefont() | |
``` | |
Compare the neutral and behind states. | |
```{r} | |
ggplot(S2, aes((M + M.y) / 2, | |
(M - M.y))) + | |
geom_point() + | |
geom_hline(yintercept = 0, color="red") + | |
ggtitle("Comparing Neutral and Behind Counts") + | |
centertitle() | |
``` | |
Comparing the ahead and neutral counts. | |
```{r} | |
ggplot(S2, aes((M.x + M) / 2, | |
(M.x - M))) + | |
geom_point() + | |
geom_hline(yintercept = 0, color="red") + | |
ggtitle("Comparing Ahead and Neutral Counts") + | |
centertitle() | |
``` | |
Focus on the neutral/behind count comparison. Define the effect variable so that $\beta_0$ is the average wOBA value and $\beta_1$ represents the mean wOBA improvement in the Neutral state over the Behind state. | |
```{r} | |
sc_regular_2 <- filter(sc_regular, | |
c_type %in% c("neutral", "behind")) | |
sc_regular_2 %>% | |
mutate(effect = ifelse(c_type == "neutral", | |
0.5, -0.5)) -> sc_regular_2 | |
``` | |
Summaries of count effects by doing a regression on the entire dataset. | |
```{r} | |
fit <- lm(estimated_woba_using_speedangle ~ effect, | |
data = sc_regular_2) | |
fit$coef | |
``` | |
Graph these effects. | |
```{r} | |
df <- data.frame(Type = c("Behind", "Neutral"), | |
Estimate = c(fit$coef[1] - fit$coef[2] / 2, | |
fit$coef[1] + fit$coef[2] / 2)) | |
df$Group = "Pooled" | |
ggplot(df, aes(Type, Estimate, group = Group)) + | |
geom_point() + geom_line() + | |
increasefont() + ylim(0.2, 0.6) + | |
ggtitle("Pooled Regression Curve") + | |
centertitle() | |
``` | |
### Individual Estimates | |
Perform individual regressions for all players with at least 200 balls-in-play. | |
```{r} | |
regressions <- sc_regular_2 %>% | |
group_by(player_name) %>% | |
do(tidy(lm(estimated_woba_using_speedangle ~ | |
effect, data=.))) | |
spread(dplyr::select(regressions, player_name, | |
term, estimate), | |
term, estimate) -> Individual_est | |
names(Individual_est) <- | |
c("player_name", "beta0", "beta1") | |
``` | |
Construct a scatterplot of the individual regression coefficients. | |
```{r} | |
ggplot(Individual_est, aes(beta0, beta1)) + | |
geom_point() | |
``` | |
Show effects through a shrinkage plot. | |
First define a ```shrink_plot()``` function. | |
```{r} | |
shrink_plot <- function(estimates, n = 50){ | |
estimates %>% | |
mutate(Behind = beta0 - beta1 / 2, | |
Neutral = beta0 + beta1 / 2) %>% | |
select(player_name, Neutral, Behind) -> d | |
set.seed(123) | |
d <- d[sample(nrow(d), n, replace = FALSE), ] | |
d %>% | |
gather(Type, WOBA, -player_name) %>% | |
ggplot(aes(Type, WOBA, group = player_name)) + | |
geom_line() + | |
geom_point() | |
} | |
``` | |
Then show a shrinkage plot of the individual regression estimates. | |
```{r} | |
shrink_plot(Individual_est) + | |
ggtitle("50 Individual Regression Curves") + | |
centertitle() + | |
increasefont() + | |
ylim(.2, .6) | |
``` | |
### Multilevel Fitting | |
Quick multilevel model fit using the ```lmer()``` function. | |
```{r} | |
newfit <- lmer(estimated_woba_using_speedangle ~ | |
effect + (1 + effect | player_name), | |
data = sc_regular_2) | |
``` | |
Posterior summaries: | |
```{r} | |
summary(newfit) | |
``` | |
Collect the estimated random effects for all players. | |
```{r} | |
B <- coef(newfit)[[1]] | |
names(B) <- c("beta0", "beta1") | |
B$player_name <- row.names(B) | |
row.names(B) <- NULL | |
head(B) | |
``` | |
Construct a shrinkage plot for the multilevel estimates. | |
```{r} | |
shrink_plot(B) + | |
ggtitle("50 Multilevel Regression Curves") + | |
centertitle() + increasefont() + | |
ylim(.2, .6) | |
``` | |
To compare the shrinkage of the individual slopes and intercepts, construct a scatterplot of the regression estimates and overlay the multilevel model estimates. | |
```{r} | |
Individual_est$Type = "Individual" | |
B$Type = "Multilevel" | |
``` | |
```{r} | |
ggplot(data = rbind(Individual_est, B), | |
aes(beta0, beta1, color = Type)) + | |
geom_point() + | |
increasefont() + | |
ggtitle("Individual and Multilevel Estimates") + | |
centertitle() | |
``` | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment