bayesball/count_effects_work.Rmd

## count_effects_work.Rmd
---
title: "Simple Count Effects"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      message = FALSE)
```

## Count Effects Work

### Exploratory Work

Read in some R packages:

```{r}
library(tidyverse)
library(lme4)
library(broom)
library(CalledStrike)
```

Read in the 2019 Statcast data and focus on the balls put into play.  Define a count variable and classify the count as "ahead", "behind", or "neutral".

```{r}
statcast2019 <- read_csv("~/Dropbox/2016 WORK/BLOG Baseball R/OTHER/StatcastData/statcast2019.csv")
statcast2019 %>%
  filter(type == "X")  %>%
  mutate(count = paste(balls, "-", strikes, sep=""),
         c_type = ifelse(count %in%
            c("2-0", "3-0", "3-1"),
            "ahead", ifelse(count %in%
            c("0-0", "1-0", "1-1",
              "2-1", "3-2"),
          "neutral", "behind"))) -> sc2019_ip
```


Restrict the analysis to players with at least 200 in-play in the 2019 season.

```{r}
sc2019_ip %>%
  group_by(batter) %>%
  summarize(N = n(),
            .groups = "drop") -> S
inner_join(sc2019_ip, S, by="batter") %>%
  filter(N >= 200) -> sc_regular
```

For each regular batter  (those with at least 200 BIP) player, find the mean expected wOBA in each type of count.

```{r}
sc_regular %>%
  group_by(player_name, c_type) %>%
  summarize(N = n(),
      M = mean(estimated_woba_using_speedangle,
               na.rm = TRUE),
      .groups = "drop") -> S1
inner_join(filter(S1, c_type == "ahead"),
           filter(S1, c_type == "behind"),
           by = "player_name") -> S2
inner_join(S2,
           filter(S1, c_type == "neutral"),
           by = "player_name") -> S2
head(S2)
```

Construct a Tukey mean-difference plot, comparing ahead and behind count situations.

```{r}
ggplot(S2, aes((M.x + M.y) / 2,
               (M.x - M.y))) +
  geom_point() +
  geom_hline(yintercept = 0, color="red") +
  ggtitle("Comparing Ahead and Behind Counts") +
  centertitle() +
  ylab("Improvement in Ahead wOBA") +
  xlab("wOBA") +
  increasefont()
```

Compare the neutral and behind states.

```{r}
ggplot(S2, aes((M + M.y) / 2,
               (M - M.y))) +
  geom_point() +
  geom_hline(yintercept = 0, color="red") +
  ggtitle("Comparing Neutral and Behind Counts") +
  centertitle()
```

Comparing the ahead and neutral counts.

```{r}
ggplot(S2, aes((M.x + M) / 2,
               (M.x - M))) +
  geom_point() +
  geom_hline(yintercept = 0, color="red") +
  ggtitle("Comparing Ahead and Neutral Counts") +
  centertitle()
```

Focus on the neutral/behind count comparison.  Define the effect variable so that $\beta_0$ is the average wOBA value and $\beta_1$ represents the mean wOBA improvement in the Neutral state over the Behind state.

```{r}
sc_regular_2 <- filter(sc_regular,
             c_type %in% c("neutral", "behind"))
sc_regular_2 %>%
  mutate(effect = ifelse(c_type == "neutral",
                         0.5, -0.5)) -> sc_regular_2
```

Summaries of count effects by doing a regression on the entire dataset.

```{r}
fit <- lm(estimated_woba_using_speedangle ~ effect,
          data = sc_regular_2)
fit$coef
```

Graph these effects.

```{r}
df <- data.frame(Type = c("Behind", "Neutral"),
        Estimate = c(fit$coef[1] - fit$coef[2] / 2,
                     fit$coef[1] + fit$coef[2] / 2))
df$Group = "Pooled"
ggplot(df, aes(Type, Estimate, group = Group)) +
    geom_point() + geom_line() +
  increasefont() + ylim(0.2, 0.6) +
  ggtitle("Pooled Regression Curve") +
  centertitle()
```


### Individual Estimates

Perform individual regressions for all players with at least 200 balls-in-play.

```{r}
regressions <- sc_regular_2 %>%
  group_by(player_name) %>%
  do(tidy(lm(estimated_woba_using_speedangle ~
        effect, data=.)))
spread(dplyr::select(regressions, player_name,
              term, estimate),
       term, estimate) -> Individual_est
names(Individual_est) <-
  c("player_name", "beta0", "beta1")
```

Construct a scatterplot of the individual regression coefficients.

```{r}
ggplot(Individual_est, aes(beta0, beta1)) +
  geom_point()
```

Show effects through a shrinkage plot.

First define a ```shrink_plot()``` function.

```{r}
shrink_plot <- function(estimates, n = 50){
  estimates  %>%
    mutate(Behind = beta0 - beta1 / 2,
           Neutral = beta0 + beta1 / 2)  %>%
    select(player_name, Neutral, Behind) -> d
    set.seed(123)
    d <- d[sample(nrow(d), n, replace = FALSE), ]
    d %>%
    gather(Type, WOBA, -player_name) %>%
    ggplot(aes(Type, WOBA, group = player_name)) +
    geom_line() +
    geom_point()
}
```

Then show a shrinkage plot of the individual regression estimates.

```{r}
shrink_plot(Individual_est) +
  ggtitle("50 Individual Regression Curves") +
  centertitle() +
  increasefont() +
  ylim(.2, .6)
```

### Multilevel Fitting

Quick multilevel model fit using the ```lmer()``` function.

```{r}
newfit <- lmer(estimated_woba_using_speedangle ~
            effect + (1 + effect | player_name),
              data = sc_regular_2)

```

Posterior summaries:

```{r}
summary(newfit)
```

Collect the estimated random effects for all players.

```{r}
B <- coef(newfit)[[1]]
names(B) <- c("beta0", "beta1")
B$player_name <- row.names(B)
row.names(B) <- NULL
head(B)
```

Construct a shrinkage plot for the multilevel estimates.

```{r}
shrink_plot(B) +
  ggtitle("50 Multilevel Regression Curves") +
  centertitle() + increasefont() +
  ylim(.2, .6)
```

To compare the shrinkage of the individual slopes and intercepts, construct a scatterplot of the regression estimates and overlay the multilevel model estimates.


```{r}
Individual_est$Type = "Individual"
B$Type = "Multilevel"
```

```{r}
ggplot(data = rbind(Individual_est, B),
       aes(beta0, beta1, color = Type)) +
  geom_point() +
  increasefont() +
  ggtitle("Individual and Multilevel Estimates") +
  centertitle()
```
	---
	title: "Simple Count Effects"
	output: html_document
	---

	```{r setup, include=FALSE}
	knitr::opts_chunk$set(echo = TRUE,
	message = FALSE,
	message = FALSE)
	```

	## Count Effects Work

	### Exploratory Work

	Read in some R packages:

	```{r}
	library(tidyverse)
	library(lme4)
	library(broom)
	library(CalledStrike)
	```

	Read in the 2019 Statcast data and focus on the balls put into play. Define a count variable and classify the count as "ahead", "behind", or "neutral".

	```{r}
	statcast2019 <- read_csv("~/Dropbox/2016 WORK/BLOG Baseball R/OTHER/StatcastData/statcast2019.csv")
	statcast2019 %>%
	filter(type == "X") %>%
	mutate(count = paste(balls, "-", strikes, sep=""),
	c_type = ifelse(count %in%
	c("2-0", "3-0", "3-1"),
	"ahead", ifelse(count %in%
	c("0-0", "1-0", "1-1",
	"2-1", "3-2"),
	"neutral", "behind"))) -> sc2019_ip
	```



	Restrict the analysis to players with at least 200 in-play in the 2019 season.

	```{r}
	sc2019_ip %>%
	group_by(batter) %>%
	summarize(N = n(),
	.groups = "drop") -> S
	inner_join(sc2019_ip, S, by="batter") %>%
	filter(N >= 200) -> sc_regular
	```

	For each regular batter (those with at least 200 BIP) player, find the mean expected wOBA in each type of count.

	```{r}
	sc_regular %>%
	group_by(player_name, c_type) %>%
	summarize(N = n(),
	M = mean(estimated_woba_using_speedangle,
	na.rm = TRUE),
	.groups = "drop") -> S1
	inner_join(filter(S1, c_type == "ahead"),
	filter(S1, c_type == "behind"),
	by = "player_name") -> S2
	inner_join(S2,
	filter(S1, c_type == "neutral"),
	by = "player_name") -> S2
	head(S2)
	```

	Construct a Tukey mean-difference plot, comparing ahead and behind count situations.

	```{r}
	ggplot(S2, aes((M.x + M.y) / 2,
	(M.x - M.y))) +
	geom_point() +
	geom_hline(yintercept = 0, color="red") +
	ggtitle("Comparing Ahead and Behind Counts") +
	centertitle() +
	ylab("Improvement in Ahead wOBA") +
	xlab("wOBA") +
	increasefont()
	```

	Compare the neutral and behind states.

	```{r}
	ggplot(S2, aes((M + M.y) / 2,
	(M - M.y))) +
	geom_point() +
	geom_hline(yintercept = 0, color="red") +
	ggtitle("Comparing Neutral and Behind Counts") +
	centertitle()
	```

	Comparing the ahead and neutral counts.

	```{r}
	ggplot(S2, aes((M.x + M) / 2,
	(M.x - M))) +
	geom_point() +
	geom_hline(yintercept = 0, color="red") +
	ggtitle("Comparing Ahead and Neutral Counts") +
	centertitle()
	```

	Focus on the neutral/behind count comparison. Define the effect variable so that $\beta_0$ is the average wOBA value and $\beta_1$ represents the mean wOBA improvement in the Neutral state over the Behind state.

	```{r}
	sc_regular_2 <- filter(sc_regular,
	c_type %in% c("neutral", "behind"))
	sc_regular_2 %>%
	mutate(effect = ifelse(c_type == "neutral",
	0.5, -0.5)) -> sc_regular_2
	```

	Summaries of count effects by doing a regression on the entire dataset.

	```{r}
	fit <- lm(estimated_woba_using_speedangle ~ effect,
	data = sc_regular_2)
	fit$coef
	```

	Graph these effects.

	```{r}
	df <- data.frame(Type = c("Behind", "Neutral"),
	Estimate = c(fit$coef[1] - fit$coef[2] / 2,
	fit$coef[1] + fit$coef[2] / 2))
	df$Group = "Pooled"
	ggplot(df, aes(Type, Estimate, group = Group)) +
	geom_point() + geom_line() +
	increasefont() + ylim(0.2, 0.6) +
	ggtitle("Pooled Regression Curve") +
	centertitle()
	```


	### Individual Estimates

	Perform individual regressions for all players with at least 200 balls-in-play.

	```{r}
	regressions <- sc_regular_2 %>%
	group_by(player_name) %>%
	do(tidy(lm(estimated_woba_using_speedangle ~
	effect, data=.)))
	spread(dplyr::select(regressions, player_name,
	term, estimate),
	term, estimate) -> Individual_est
	names(Individual_est) <-
	c("player_name", "beta0", "beta1")
	```

	Construct a scatterplot of the individual regression coefficients.

	```{r}
	ggplot(Individual_est, aes(beta0, beta1)) +
	geom_point()
	```

	Show effects through a shrinkage plot.

	First define a ```shrink_plot()``` function.

	```{r}
	shrink_plot <- function(estimates, n = 50){
	estimates %>%
	mutate(Behind = beta0 - beta1 / 2,
	Neutral = beta0 + beta1 / 2) %>%
	select(player_name, Neutral, Behind) -> d
	set.seed(123)
	d <- d[sample(nrow(d), n, replace = FALSE), ]
	d %>%
	gather(Type, WOBA, -player_name) %>%
	ggplot(aes(Type, WOBA, group = player_name)) +
	geom_line() +
	geom_point()
	}
	```

	Then show a shrinkage plot of the individual regression estimates.

	```{r}
	shrink_plot(Individual_est) +
	ggtitle("50 Individual Regression Curves") +
	centertitle() +
	increasefont() +
	ylim(.2, .6)
	```

	### Multilevel Fitting

	Quick multilevel model fit using the ```lmer()``` function.

	```{r}
	newfit <- lmer(estimated_woba_using_speedangle ~
	effect + (1 + effect \| player_name),
	data = sc_regular_2)

	```

	Posterior summaries:

	```{r}
	summary(newfit)
	```

	Collect the estimated random effects for all players.

	```{r}
	B <- coef(newfit)[[1]]
	names(B) <- c("beta0", "beta1")
	B$player_name <- row.names(B)
	row.names(B) <- NULL
	head(B)
	```

	Construct a shrinkage plot for the multilevel estimates.

	```{r}
	shrink_plot(B) +
	ggtitle("50 Multilevel Regression Curves") +
	centertitle() + increasefont() +
	ylim(.2, .6)
	```

	To compare the shrinkage of the individual slopes and intercepts, construct a scatterplot of the regression estimates and overlay the multilevel model estimates.


	```{r}
	Individual_est$Type = "Individual"
	B$Type = "Multilevel"
	```

	```{r}
	ggplot(data = rbind(Individual_est, B),
	aes(beta0, beta1, color = Type)) +
	geom_point() +
	increasefont() +
	ggtitle("Individual and Multilevel Estimates") +
	centertitle()
	```