bayesball/Count_Effect_JAGS_final.qmd

## Count_Effect_JAGS_final.qmd
---
title: "Count Effects Model"
format: html
editor: visual
---

Illustration of Bayesian fitting of a multilevel model for situational data using JAGS software.
Read in Statcast data from the first half of the 2023 season.

Reference:  Albert, Jim (2024).  Bayesian Workflow of a Situational Random Effects Model
https://bayesball.github.io/papers/bayes_workflow.html

Note:  The JAGS software needs to be initially installed on one's laptop from
https://sourceforge.net/projects/mcmc-jags/files/

Outline:

-   the multilevel Bayesian model
-   data setup
-   Tukey mean-difference plot of observed data
-   simulating from posterior via JAGS
-   posterior estimates of player abilities and bias effect
-   posterior-predictive simulations

Common situations of interest:

-   home versus away
-   day versus night
-   different counts
-   facing pitcher of same or opposite arm
-   facing pitchers of different qualities
-   clutch versus non-clutch situations

My belief: Generally, situations tend to be *biases*.

-   There is an effect due to the situation.
-   But the size of the effect is the same for all hitters. In other words, there is little evidence that some hitters have extra ability to take advantage of the situation.

Demonstrate this belief by fitting of a "bias random effects" model for 2023 hitters on balls in play, and exploring the count advantage in hitting.

Have data on balls in-play for the 2023 season. A good measure of the quality of a ball in play is the expected wOBA statistic.

Let $y_j$ denote the square root of the expected wOBA for the $j$th ball in play.

Focus on the situations where the count is "neutral" and "behind" (ignore the "ahead" counts since relatively few in-play balls happen on ahead counts).

Model is $y_j \sim N(\mu_j, \sigma)$ where the mean has the structure $$
\mu_j = \beta_0 + \beta_1 * EFF_j + \alpha_{p(j)},
$$ where $EFF_j$ is the count effect, $\alpha_{p(j)}$ is the player effect for player $p(j)$, and the collection of player effects $\alpha_1, ..., \alpha_N$ are iid $N(0, \tau)$. At the last stage, the parameters $\sigma$ and $\tau$ are assigned weakly informative distributions.

Read in packages for this work.  (These packages need to be initially installed into one's R system.)

```{r}
#| message: false
library(readr)
library(dplyr)
library(runjags)
library(coda)
library(ggplot2)
library(bayesplot)
library(purrr)
```

Read in 2023 in-play data before June 15.  Focus on batters with at least 100 balls in play in this period

```{r}
#| message: false
sc_ip <- read_csv("https://raw.githubusercontent.com/bayesball/HomeRuns2021/main/sc2023_bip.csv")
```

```{r}
sc_regular_150a <- filter(sc_ip,
                  c_type %in% c("neutral", "behind")) |>
             mutate(effect = ifelse(c_type == "neutral",
                           0.5, -0.5))
```


Histograms of the expected wOBA and the root data illustrating the usefulness of the root reexpression.

```{r}
ggplot(sc_regular_150a,
       aes(estimated_woba_using_speedangle)) +
  geom_histogram(bins = 20,
                 color = "white",
                 fill = "blue") -> p1
ggplot(sc_regular_150a,
       aes(root_woba)) +
  geom_histogram(bins = 20,
                 color = "white",
                 fill = "blue") -> p2
d1 <- p1$data |>
  mutate(Value = estimated_woba_using_speedangle,
         Type = "E(wOBA)")
d2 <- p2$data |>
  mutate(Value = root_woba,
         Type = "Root E(wOBA)")
ggplot(rbind(d1, d2)) +
  geom_histogram(aes(Value),
                 bins = 20,
                 color = "white",
                 fill = "blue") +
  facet_wrap(~ Type, ncol = 1) +
  theme(text=element_text(size=18))
```

Using the function `mean_diff_plot()`, construct a Tukey mean-difference plot of the improvement in the Neutral count compared with the Behind count.

```{r}
mean_diff_plot <- function(sc_regular){
  sc_regular %>%
    group_by(player, c_type) %>%
    summarize(N = n(),
              M = mean(root_woba,
                       na.rm = TRUE),
              .groups = "drop") -> S1
  inner_join(filter(S1, c_type == "behind"),
             filter(S1, c_type == "neutral"),
             by = "player") -> S2
  ggplot(S2, aes((M.x + M.y) / 2,
                 (M.y - M.x))) +
    geom_point() +
    geom_hline(yintercept = 0, color="red")  +
    xlab("Mean Root wOBA") +
    ylab("Improvement") +
    ggtitle("Improvement in Root wOBA in Neutral\n Compared to Behind Counts") +
    theme(plot.title = element_text(colour = "blue", size = 18,
                                    hjust = 0.5, vjust = 0.8, angle = 0),
          plot.subtitle = element_text(colour = "red", size = 16,
                                       hjust = 0.5, vjust = 0.8, angle = 0)) +
    theme(text=element_text(size=18))
}
```

```{r}
(p1 <- mean_diff_plot(sc_regular_150a))
```

Illustration of the random effects model with a bias component:

```{r}
df <- data.frame(x1 = 0.53, x2 = 0.56, y1 = 4, y2 = 4)
ggplot(data = data.frame(x = c(.35, .7)), aes(x)) +
  stat_function(fun = dnorm,
                args = list(mean = 0.53, sd = 0.03)) +
  stat_function(fun = dnorm,
                args = list(mean = 0.56, sd = 0.03),
                color = "red") +
  geom_vline(xintercept = 0.53, linetype = "dashed") +
  geom_vline(xintercept = 0.56, linetype = "dashed",
             color = "red") +
  theme(text=element_text(size=18)) +
  labs(x = "Ability", y = "") +
  annotate(geom = "text", x= .45, y = 10,
           label = "BEHIND", size = 7) +
  annotate(geom = "text", x= .65, y = 10,
           label = "NEUTRAL", size = 7,
           color = "red") +
geom_segment(data = df, aes(x = x1, y = y1, xend = x2,
                 yend = y2, colour = "segment"),
             arrow = arrow(length = unit(0.03, "npc"),
                           ends = "both")) +
  annotate(geom = "text", x= .545, y = 3,
           label = "beta", parse = TRUE, size = 7,
           color = "red") +
  guides(color="none")
```


Fit a bias model with random effects by use of JAGS:

Define the Bayesian model in a script.

```{r}
modelString <-"
model {
## sampling
for (i in 1:N){
y[i] ~ dnorm(mu[i], invsigma2)
mu[i] <- beta * eff[i] + alpha_j[player[i]]
}
## priors
for (j in 1:J){
   alpha_j[j] ~ dnorm(theta, invtau2)
}
theta ~ dnorm(0, 0.001)
beta ~ dnorm(0, 0.001)
invsigma2 <- 1 / (sigma * sigma)
sigma ~ dt(0, 1, 1) T(0, )
invtau2 <- 1 / (tau * tau)
tau ~ dt(0, 1, 1) T(0, )
}
"
```

Create a list `jags_data` including all of the data needed in the JAGS run.

```{r}
jags_data <- list(y = sc_regular_150a$root_woba,
                  eff = sc_regular_150a$effect,
                  player = sc_regular_150a$player,
                  N = dim(sc_regular_150a)[1],
                  J = max(sc_regular_150a$player))
```

Run JAGS with 1000 iterations at the adaption stage, 2000 iterations at the warmup stage, and collect 5000 iterations. Will collect all parameters $\alpha, \beta_0, \beta_1, \sigma, \tau$ in the model.

```{r}
#| message: false
posterior <- run.jags(modelString,
                      n.chains = 1,
                      data = jags_data,
                      monitor = c("theta", "beta", "alpha_j",
                                  "sigma", "tau"),
                      adapt = 1000,
                      burnin = 2000,
                      sample = 5000)
```

Put the simulated draws in a data frame.

```{r}
posterior %>% as.mcmc() %>%
  as.data.frame() -> post2
```

Plot of fitted random effects curve:

```{r}
ps <- post2[sample(5000, 20),c("theta", "tau")]
myplot <- ggplot(data.frame(x = c(.4, .7)), aes(x)) +
  theme(text=element_text(size=18)) +
  labs(x = expression(alpha), y = "Density")
for(j in 1:20){
  myplot <- myplot +
    stat_function(fun = dnorm,
                args = list(mean = ps$theta[j],
                            sd = ps$tau[j]),
                alpha = 0.35)
}
myplot
```

Several graphs comparing player estimates (observed and multilevel).

Jitter plot comparing the two sets of estimates.

```{r}
sc_regular_150a |>
  group_by(player) |>
  summarize(IP = n(),
            Estimate = mean(root_woba)) |>
  mutate(Type = "Observed")  |>
  select(player, Estimate, Type) -> S
get_post_est <- function(j){
  mean(post2[, 2 + j])
}
S1 <- data.frame(player = 1:jags_data$J,
                 Estimate = sapply(1:jags_data$J,
                                   get_post_est)) |>
  mutate(Type = "Multilevel")
ggplot(rbind(S, S1), aes(Type, Estimate)) +
  geom_jitter() +
  coord_flip()
```

Shrinkage plot showing the shrinkage of the observed root wOBA estimates towards the multilevel estimates.

```{r}
shrinkage_plot <- function(S, N = 15, seed = 12){
  # S has variables Player, Individual, Multilevel
  require(tidyr)
  set.seed(seed)
  S2 <- pivot_longer(sample_n(S, size = N),
                     names_to = "Type",
                     values_to = "AVG", -Player)
  S2a <- filter(S2, Type == "Individual")
  ggplot(S2, aes(Type, AVG, group=Player)) +
    geom_line() + geom_point() +
    ggplot2::annotate("text", x = 0.75, y = S2a$AVG,
                      label = S2a$Player) +
    ggtitle("Shrinkage Plot") +
    theme(plot.title = element_text(
      colour = "blue", size = 18, hjust = 0.5))
}
S |>
  mutate(Player = player,
         Individual = Estimate) |>
  select(Player, Individual) -> H
S1 |>
  mutate(Player = player,
         Multilevel = Estimate) |>
  select(Player, Multilevel) -> H1
H01 <- inner_join(H, H1, by = "Player")
shrinkage_plot(H01, N = 30, seed = 10)
```

Posterior of bias coefficient $\beta$:

```{r}
ggplot(post2, aes(beta)) +
  geom_histogram(bins = 20, color = "white", fill = "tan") +
  theme(text=element_text(size=18)) +
  theme(plot.title = element_text(colour = "blue", size = 18,
                     hjust = 0.5, vjust = 0.8, angle = 0)) +
  ggtitle("Posterior of Bias Coefficient")
```

Alternative display using `bayesplot` package:

```{r}
mcmc_areas(post2, "beta", prob = 0.90)
```

90% interval estimate for beta:

```{r}
quantile(post2[, "beta"], c(.05, .95))
```

The function `one_sim()` implements one simulation of replicated data from the posterior predictive distribution.

-   `alpha` is a simulated draw of random effects parameters
-   `mean0` is the simulated draw of $\mu_j$ for all players behind in the count
-   `mean1` is the simulated draw of $\mu_j$ for all players in a neutral count
-   the means `mean0` and `mean1` are merged with full dataset
-   values of `y` are simulated from normal distributions

```{r}
one_sim <- function(j){
  # simulates one replicated dataset
  # simulated draws of random effect parameters
  alpha <- unlist(post2[j, c(3:(jags_data$J + 2))])
  # simulated player means when behind
  mean0 <- data.frame(player = 1:jags_data$J,
                      mu =  post2[j, "beta"] * (- 0.5) + alpha,
                    c_type = "behind")
  # simulated player means when ahead
  mean1 <- data.frame(player = 1:jags_data$J,
                      mu = post2[j, "beta"] * (0.5) + alpha,
                    c_type = "neutral")
  mean01 <- rbind(mean0, mean1)
  # merge player means with sc data
  # simulate root woba from sampling model
  inner_join(select(sc_regular_150a,
                    player, c_type), mean01,
             by = c("player" = "player",
                    "c_type" = "c_type"))  |>
    mutate(root_woba = rnorm(jags_data$N, mu,
                             post2[j, "sigma"]))
}
```

The function `compare_plots()` shows mean-difference plots of the observed and one draw from posterior predictive distribution.

```{r}
compare_plots <- function(jj){
  # compare observed and simulated pp m/d plots
  p2 <- mean_diff_plot(one_sim(jj))
  p1$data |> mutate(Type = "Observed") -> d1
  p2$data |> mutate(Type = "Simulated") -> d2
  ggplot(rbind(d1, d2), aes((M.x + M.y) / 2,
                      (M.y - M.x))) +
         geom_point() +
         geom_hline(yintercept = 0, color="red") +
   facet_wrap(~ Type, ncol = 1) +
   theme(text=element_text(size=18)) +
    geom_hline(yintercept = 0, color="red")  +
    xlab("Root wOBA") +
    ylab("Improvement") +
    ggtitle("Improvement in Root wOBA in Neutral\n Compared to Behind Counts") +
    theme(plot.title = element_text(colour = "blue", size = 18,
                                    hjust = 0.5, vjust = 0.8, angle = 0),
          plot.subtitle = element_text(colour = "red", size = 16,
                                       hjust = 0.5, vjust = 0.8, angle = 0))
}
```

```{r}
compare_plots(sample(5000, 1))
```

The function test function implements a posterior-predictive check. This particular function checks if the simulated data resembles the observed data with respect to the variation in the {$(M_{neutral} + M_{behind}) /2$} values and the {$M_{neutral} - M_{behind}$} values.

The checking function is $(S_1, S_2)$ where

-   $S_1 = SD((M_{neutral} + M_{behind}) / 2)$
-   $S_2 = SD(M_{neutral} - M_{behind})$

```{r}
test_function <- function(jj){
  p2 <- mean_diff_plot(one_sim(jj))
  p2$data |>
    summarize(S1 = sd((M.x + M.y) / 2),
              S2 = sd(M.y - M.x))
}
```

The `test_function()` is run 500 times, collecting the values of $S_1, S_2$ from the posterior predictive distribution.

```{r}
OUT <- map(1:500, test_function) |> bind_rows()
```

The values of $S_1, S_2$ are found from the observed data.

```{r}
observed <- mean_diff_plot(sc_regular_150a)$data |>
  summarize(S1 = sd((M.x + M.y) / 2),
            S2 = sd(M.y - M.x))
```

The simulated draws are displayed as a scatterplot with the observed value shown as a big red point. Since the test function on the observed data is in the middle of the display, this indicates that the observed data (using this particular metric) is consistent with simulations from the fitted model.

```{r}
ggplot(OUT, aes(S1, S2)) +
  geom_point() +
  geom_point(data = observed, color = "red",
             size = 5) +
  xlab("SD(Mean)") +
  ylab("SD(Diff)") +
  ggtitle("Posterior Predictive Check") +
  theme(plot.title = element_text(colour = "blue", size = 18,
                                  hjust = 0.5, vjust = 0.8, angle = 0))
```
	---
	title: "Count Effects Model"
	format: html
	editor: visual
	---

	Illustration of Bayesian fitting of a multilevel model for situational data using JAGS software.
	Read in Statcast data from the first half of the 2023 season.

	Reference: Albert, Jim (2024). Bayesian Workflow of a Situational Random Effects Model
	https://bayesball.github.io/papers/bayes_workflow.html

	Note: The JAGS software needs to be initially installed on one's laptop from
	https://sourceforge.net/projects/mcmc-jags/files/

	Outline:

	- the multilevel Bayesian model
	- data setup
	- Tukey mean-difference plot of observed data
	- simulating from posterior via JAGS
	- posterior estimates of player abilities and bias effect
	- posterior-predictive simulations

	Common situations of interest:

	- home versus away
	- day versus night
	- different counts
	- facing pitcher of same or opposite arm
	- facing pitchers of different qualities
	- clutch versus non-clutch situations

	My belief: Generally, situations tend to be biases.

	- There is an effect due to the situation.
	- But the size of the effect is the same for all hitters. In other words, there is little evidence that some hitters have extra ability to take advantage of the situation.

	Demonstrate this belief by fitting of a "bias random effects" model for 2023 hitters on balls in play, and exploring the count advantage in hitting.

	Have data on balls in-play for the 2023 season. A good measure of the quality of a ball in play is the expected wOBA statistic.

	Let $y_j$ denote the square root of the expected wOBA for the $j$th ball in play.

	Focus on the situations where the count is "neutral" and "behind" (ignore the "ahead" counts since relatively few in-play balls happen on ahead counts).

	Model is $y_j \sim N(\mu_j, \sigma)$ where the mean has the structure $$
	\mu_j = \beta_0 + \beta_1 * EFF_j + \alpha_{p(j)},
	$$ where $EFF_j$ is the count effect, $\alpha_{p(j)}$ is the player effect for player $p(j)$, and the collection of player effects $\alpha_1, ..., \alpha_N$ are iid $N(0, \tau)$. At the last stage, the parameters $\sigma$ and $\tau$ are assigned weakly informative distributions.

	Read in packages for this work. (These packages need to be initially installed into one's R system.)

	```{r}
	#\| message: false
	library(readr)
	library(dplyr)
	library(runjags)
	library(coda)
	library(ggplot2)
	library(bayesplot)
	library(purrr)
	```

	Read in 2023 in-play data before June 15. Focus on batters with at least 100 balls in play in this period

	```{r}
	#\| message: false
	sc_ip <- read_csv("https://raw.githubusercontent.com/bayesball/HomeRuns2021/main/sc2023_bip.csv")
	```

	```{r}
	sc_regular_150a <- filter(sc_ip,
	c_type %in% c("neutral", "behind")) \|>
	mutate(effect = ifelse(c_type == "neutral",
	0.5, -0.5))
	```


	Histograms of the expected wOBA and the root data illustrating the usefulness of the root reexpression.

	```{r}
	ggplot(sc_regular_150a,
	aes(estimated_woba_using_speedangle)) +
	geom_histogram(bins = 20,
	color = "white",
	fill = "blue") -> p1
	ggplot(sc_regular_150a,
	aes(root_woba)) +
	geom_histogram(bins = 20,
	color = "white",
	fill = "blue") -> p2
	d1 <- p1$data \|>
	mutate(Value = estimated_woba_using_speedangle,
	Type = "E(wOBA)")
	d2 <- p2$data \|>
	mutate(Value = root_woba,
	Type = "Root E(wOBA)")
	ggplot(rbind(d1, d2)) +
	geom_histogram(aes(Value),
	bins = 20,
	color = "white",
	fill = "blue") +
	facet_wrap(~ Type, ncol = 1) +
	theme(text=element_text(size=18))
	```

	Using the function `mean_diff_plot()`, construct a Tukey mean-difference plot of the improvement in the Neutral count compared with the Behind count.

	```{r}
	mean_diff_plot <- function(sc_regular){
	sc_regular %>%
	group_by(player, c_type) %>%
	summarize(N = n(),
	M = mean(root_woba,
	na.rm = TRUE),
	.groups = "drop") -> S1
	inner_join(filter(S1, c_type == "behind"),
	filter(S1, c_type == "neutral"),
	by = "player") -> S2
	ggplot(S2, aes((M.x + M.y) / 2,
	(M.y - M.x))) +
	geom_point() +
	geom_hline(yintercept = 0, color="red") +
	xlab("Mean Root wOBA") +
	ylab("Improvement") +
	ggtitle("Improvement in Root wOBA in Neutral\n Compared to Behind Counts") +
	theme(plot.title = element_text(colour = "blue", size = 18,
	hjust = 0.5, vjust = 0.8, angle = 0),
	plot.subtitle = element_text(colour = "red", size = 16,
	hjust = 0.5, vjust = 0.8, angle = 0)) +
	theme(text=element_text(size=18))
	}
	```

	```{r}
	(p1 <- mean_diff_plot(sc_regular_150a))
	```

	Illustration of the random effects model with a bias component:

	```{r}
	df <- data.frame(x1 = 0.53, x2 = 0.56, y1 = 4, y2 = 4)
	ggplot(data = data.frame(x = c(.35, .7)), aes(x)) +
	stat_function(fun = dnorm,
	args = list(mean = 0.53, sd = 0.03)) +
	stat_function(fun = dnorm,
	args = list(mean = 0.56, sd = 0.03),
	color = "red") +
	geom_vline(xintercept = 0.53, linetype = "dashed") +
	geom_vline(xintercept = 0.56, linetype = "dashed",
	color = "red") +
	theme(text=element_text(size=18)) +
	labs(x = "Ability", y = "") +
	annotate(geom = "text", x= .45, y = 10,
	label = "BEHIND", size = 7) +
	annotate(geom = "text", x= .65, y = 10,
	label = "NEUTRAL", size = 7,
	color = "red") +
	geom_segment(data = df, aes(x = x1, y = y1, xend = x2,
	yend = y2, colour = "segment"),
	arrow = arrow(length = unit(0.03, "npc"),
	ends = "both")) +
	annotate(geom = "text", x= .545, y = 3,
	label = "beta", parse = TRUE, size = 7,
	color = "red") +
	guides(color="none")
	```


	Fit a bias model with random effects by use of JAGS:

	Define the Bayesian model in a script.

	```{r}
	modelString <-"
	model {
	## sampling
	for (i in 1:N){
	y[i] ~ dnorm(mu[i], invsigma2)
	mu[i] <- beta * eff[i] + alpha_j[player[i]]
	}
	## priors
	for (j in 1:J){
	alpha_j[j] ~ dnorm(theta, invtau2)
	}
	theta ~ dnorm(0, 0.001)
	beta ~ dnorm(0, 0.001)
	invsigma2 <- 1 / (sigma * sigma)
	sigma ~ dt(0, 1, 1) T(0, )
	invtau2 <- 1 / (tau * tau)
	tau ~ dt(0, 1, 1) T(0, )
	}
	"
	```

	Create a list `jags_data` including all of the data needed in the JAGS run.

	```{r}
	jags_data <- list(y = sc_regular_150a$root_woba,
	eff = sc_regular_150a$effect,
	player = sc_regular_150a$player,
	N = dim(sc_regular_150a)[1],
	J = max(sc_regular_150a$player))
	```

	Run JAGS with 1000 iterations at the adaption stage, 2000 iterations at the warmup stage, and collect 5000 iterations. Will collect all parameters $\alpha, \beta_0, \beta_1, \sigma, \tau$ in the model.

	```{r}
	#\| message: false
	posterior <- run.jags(modelString,
	n.chains = 1,
	data = jags_data,
	monitor = c("theta", "beta", "alpha_j",
	"sigma", "tau"),
	adapt = 1000,
	burnin = 2000,
	sample = 5000)
	```

	Put the simulated draws in a data frame.

	```{r}
	posterior %>% as.mcmc() %>%
	as.data.frame() -> post2
	```

	Plot of fitted random effects curve:

	```{r}
	ps <- post2[sample(5000, 20),c("theta", "tau")]
	myplot <- ggplot(data.frame(x = c(.4, .7)), aes(x)) +
	theme(text=element_text(size=18)) +
	labs(x = expression(alpha), y = "Density")
	for(j in 1:20){
	myplot <- myplot +
	stat_function(fun = dnorm,
	args = list(mean = ps$theta[j],
	sd = ps$tau[j]),
	alpha = 0.35)
	}
	myplot
	```

	Several graphs comparing player estimates (observed and multilevel).

	Jitter plot comparing the two sets of estimates.

	```{r}
	sc_regular_150a \|>
	group_by(player) \|>
	summarize(IP = n(),
	Estimate = mean(root_woba)) \|>
	mutate(Type = "Observed") \|>
	select(player, Estimate, Type) -> S
	get_post_est <- function(j){
	mean(post2[, 2 + j])
	}
	S1 <- data.frame(player = 1:jags_data$J,
	Estimate = sapply(1:jags_data$J,
	get_post_est)) \|>
	mutate(Type = "Multilevel")
	ggplot(rbind(S, S1), aes(Type, Estimate)) +
	geom_jitter() +
	coord_flip()
	```

	Shrinkage plot showing the shrinkage of the observed root wOBA estimates towards the multilevel estimates.

	```{r}
	shrinkage_plot <- function(S, N = 15, seed = 12){
	# S has variables Player, Individual, Multilevel
	require(tidyr)
	set.seed(seed)
	S2 <- pivot_longer(sample_n(S, size = N),
	names_to = "Type",
	values_to = "AVG", -Player)
	S2a <- filter(S2, Type == "Individual")
	ggplot(S2, aes(Type, AVG, group=Player)) +
	geom_line() + geom_point() +
	ggplot2::annotate("text", x = 0.75, y = S2a$AVG,
	label = S2a$Player) +
	ggtitle("Shrinkage Plot") +
	theme(plot.title = element_text(
	colour = "blue", size = 18, hjust = 0.5))
	}
	S \|>
	mutate(Player = player,
	Individual = Estimate) \|>
	select(Player, Individual) -> H
	S1 \|>
	mutate(Player = player,
	Multilevel = Estimate) \|>
	select(Player, Multilevel) -> H1
	H01 <- inner_join(H, H1, by = "Player")
	shrinkage_plot(H01, N = 30, seed = 10)
	```

	Posterior of bias coefficient $\beta$:

	```{r}
	ggplot(post2, aes(beta)) +
	geom_histogram(bins = 20, color = "white", fill = "tan") +
	theme(text=element_text(size=18)) +
	theme(plot.title = element_text(colour = "blue", size = 18,
	hjust = 0.5, vjust = 0.8, angle = 0)) +
	ggtitle("Posterior of Bias Coefficient")
	```

	Alternative display using `bayesplot` package:

	```{r}
	mcmc_areas(post2, "beta", prob = 0.90)
	```

	90% interval estimate for beta:

	```{r}
	quantile(post2[, "beta"], c(.05, .95))
	```

	The function `one_sim()` implements one simulation of replicated data from the posterior predictive distribution.

	- `alpha` is a simulated draw of random effects parameters
	- `mean0` is the simulated draw of $\mu_j$ for all players behind in the count
	- `mean1` is the simulated draw of $\mu_j$ for all players in a neutral count
	- the means `mean0` and `mean1` are merged with full dataset
	- values of `y` are simulated from normal distributions

	```{r}
	one_sim <- function(j){
	# simulates one replicated dataset
	# simulated draws of random effect parameters
	alpha <- unlist(post2[j, c(3:(jags_data$J + 2))])
	# simulated player means when behind
	mean0 <- data.frame(player = 1:jags_data$J,
	mu = post2[j, "beta"] * (- 0.5) + alpha,
	c_type = "behind")
	# simulated player means when ahead
	mean1 <- data.frame(player = 1:jags_data$J,
	mu = post2[j, "beta"] * (0.5) + alpha,
	c_type = "neutral")
	mean01 <- rbind(mean0, mean1)
	# merge player means with sc data
	# simulate root woba from sampling model
	inner_join(select(sc_regular_150a,
	player, c_type), mean01,
	by = c("player" = "player",
	"c_type" = "c_type")) \|>
	mutate(root_woba = rnorm(jags_data$N, mu,
	post2[j, "sigma"]))
	}
	```

	The function `compare_plots()` shows mean-difference plots of the observed and one draw from posterior predictive distribution.

	```{r}
	compare_plots <- function(jj){
	# compare observed and simulated pp m/d plots
	p2 <- mean_diff_plot(one_sim(jj))
	p1$data \|> mutate(Type = "Observed") -> d1
	p2$data \|> mutate(Type = "Simulated") -> d2
	ggplot(rbind(d1, d2), aes((M.x + M.y) / 2,
	(M.y - M.x))) +
	geom_point() +
	geom_hline(yintercept = 0, color="red") +
	facet_wrap(~ Type, ncol = 1) +
	theme(text=element_text(size=18)) +
	geom_hline(yintercept = 0, color="red") +
	xlab("Root wOBA") +
	ylab("Improvement") +
	ggtitle("Improvement in Root wOBA in Neutral\n Compared to Behind Counts") +
	theme(plot.title = element_text(colour = "blue", size = 18,
	hjust = 0.5, vjust = 0.8, angle = 0),
	plot.subtitle = element_text(colour = "red", size = 16,
	hjust = 0.5, vjust = 0.8, angle = 0))
	}
	```

	```{r}
	compare_plots(sample(5000, 1))
	```

	The function test function implements a posterior-predictive check. This particular function checks if the simulated data resembles the observed data with respect to the variation in the {$(M_{neutral} + M_{behind}) /2$} values and the {$M_{neutral} - M_{behind}$} values.

	The checking function is $(S_1, S_2)$ where

	- $S_1 = SD((M_{neutral} + M_{behind}) / 2)$
	- $S_2 = SD(M_{neutral} - M_{behind})$

	```{r}
	test_function <- function(jj){
	p2 <- mean_diff_plot(one_sim(jj))
	p2$data \|>
	summarize(S1 = sd((M.x + M.y) / 2),
	S2 = sd(M.y - M.x))
	}
	```

	The `test_function()` is run 500 times, collecting the values of $S_1, S_2$ from the posterior predictive distribution.

	```{r}
	OUT <- map(1:500, test_function) \|> bind_rows()
	```

	The values of $S_1, S_2$ are found from the observed data.

	```{r}
	observed <- mean_diff_plot(sc_regular_150a)$data \|>
	summarize(S1 = sd((M.x + M.y) / 2),
	S2 = sd(M.y - M.x))
	```

	The simulated draws are displayed as a scatterplot with the observed value shown as a big red point. Since the test function on the observed data is in the middle of the display, this indicates that the observed data (using this particular metric) is consistent with simulations from the fitted model.

	```{r}
	ggplot(OUT, aes(S1, S2)) +
	geom_point() +
	geom_point(data = observed, color = "red",
	size = 5) +
	xlab("SD(Mean)") +
	ylab("SD(Diff)") +
	ggtitle("Posterior Predictive Check") +
	theme(plot.title = element_text(colour = "blue", size = 18,
	hjust = 0.5, vjust = 0.8, angle = 0))
	```