Gist bayesball/920f1102fadf84c8fe0f8f67b1b773bb, created November 20, 2020 23:36
Illustration of bivariate density estimates on 2019 Statcast data
---
title: "Some Density Estimates"
output:
  html_document:
    df_print: paged
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      warning = FALSE)
```
#### Ideas (2D density estimation via ggplot2)

- density estimation is very good for graphing large amounts of two-dimensional data
- it is better than a scatterplot, where one has to deal with overplotting
- maybe harder to interpret?
- the basic function for computing a 2D density estimate is ```kde2d()``` from the MASS package (ggplot2's 2D density stat calls it internally)
- the density estimate is displayed as a contour plot
- I personally like filled contour bands, which display the regions of highest estimated density
- the geometric object is ```geom_density2d_filled()```
- illustrate for (1) pitch locations in the zone, (2) in-play locations of batted balls, (3) the distribution of the launch variables
- when there are several facets, distinguish between contour levels shared across all facets (the default, ```contour_var = "density"```) and levels set within each facet (```contour_var = "ndensity"```, which rescales each facet's density to a maximum of 1, so the peak intensity is the same in every facet)
- I think one is usually more interested in comparing these within-facet (conditional) density estimates, which the ```contour_var = "ndensity"``` option gives
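A minimal sketch of what ```kde2d()``` returns, using simulated data in place of Statcast pitch locations (the sample size, means, and spreads are assumptions chosen only for illustration). MASS is already installed as a dependency of ggplot2, so it is called with ```MASS::``` rather than attached, which avoids masking dplyr's ```select()```. The estimate is a grid of density values; ```geom_density2d_filled()``` turns a grid like this into filled contour bands.

```{r}
set.seed(2019)
# Simulated stand-in for pitch locations (plate_x, plate_z) in feet:
# made-up bivariate normal data, NOT real Statcast values
sim_x <- rnorm(2000, mean = 0, sd = 0.6)
sim_z <- rnorm(2000, mean = 2.3, sd = 0.7)

# Density estimate on a 100 x 100 grid. The limits are set well beyond the
# data so that nearly all of the probability mass falls on the grid.
zone_est <- MASS::kde2d(sim_x, sim_z, n = 100, lims = c(-4, 4, -2, 6.5))

# zone_est$x and zone_est$y hold the grid points; zone_est$z is the
# 100 x 100 matrix of estimated densities (rows follow x, columns follow y)
dim(zone_est$z)

# A Riemann sum over the grid should come out close to 1
cell_area <- diff(zone_est$x[1:2]) * diff(zone_est$y[1:2])
total_mass <- sum(zone_est$z) * cell_area
total_mass
```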
For this exercise I will use only the readr, ggplot2, and dplyr packages.

```{r}
library(readr)
library(ggplot2)   # geom_density2d_filled() needs ggplot2 3.3.0 or later
library(dplyr)
```
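Load in Statcast data for the complete 2019 season:

```{r}
# Local path on the author's machine; point this at your own copy of the file.
# Despite its name, statcast2020 holds the 2019 season.
statcast2020 <- read_csv("~/Dropbox/2016 WORK/BLOG Baseball R/OTHER/StatcastData/statcast2019.csv")
```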
Here are several helper functions that cut down on typing in the ggplot() calls. I find add_zone() especially helpful for adding the strike zone boundaries to a plot.

```{r}
# Larger text throughout the plot
increasefont <- function(){
  theme(text = element_text(size = 18))
}
# Blue, centered plot title
centertitle <- function(){
  theme(plot.title = element_text(colour = "blue", size = 18,
                                  hjust = 0.5, vjust = 0.8, angle = 0))
}
# Outline of an approximate strike zone in feet:
# 1.6 to 3.5 ft high and 0.85 ft either side of the middle of the plate
add_zone <- function(Color = "red"){
  topKzone <- 3.5
  botKzone <- 1.6
  inKzone <- -0.85
  outKzone <- 0.85
  # Closed rectangle: five corner points, ending where it started
  kZone <- data.frame(
    x = c(inKzone, inKzone, outKzone, outKzone, inKzone),
    y = c(botKzone, topKzone, topKzone, botKzone, botKzone)
  )
  # inherit.aes = FALSE so the layer ignores the plot's plate_x/plate_z mapping
  geom_path(aes(.data$x, .data$y), data = kZone,
            lwd = 1, colour = Color, inherit.aes = FALSE)
}
```
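To check which pitches fall inside the same rectangle that add_zone() draws, a small vectorised helper can be handy. This ```in_zone()``` function is hypothetical (not part of the original code) and reuses the same fixed boundaries, assuming plate_x and plate_z are in feet; a batter-specific zone would need different top and bottom values.

```{r}
# Hypothetical helper (not from the original gist): TRUE when a location lies
# inside the fixed rectangle drawn by add_zone(); NA locations give NA
in_zone <- function(plate_x, plate_z,
                    botKzone = 1.6, topKzone = 3.5,
                    inKzone = -0.85, outKzone = 0.85) {
  plate_x >= inKzone & plate_x <= outKzone &
    plate_z >= botKzone & plate_z <= topKzone
}

# Example locations in feet: middle of the zone, off the plate, too low
in_zone(c(0, 1.2, 0), c(2.5, 2.5, 1.0))
```

For example, ```mean(in_zone(plate_x, plate_z), na.rm = TRUE)``` inside ```summarise()``` would give the share of a filtered set of pitches that land in this fixed zone.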
#### Locations of Pitches

Locations of different pitch types for right-handed pitchers:

```{r}
# CH changeup, CS slow curve, CU curveball, FC cutter, FF four-seam,
# FS splitter, FT two-seam, KC knuckle curve, SI sinker, SL slider
ggplot(filter(statcast2020,
              p_throws == "R",
              pitch_type %in%
                c("CH", "CS", "CU", "FC", "FF",
                  "FS", "FT", "KC", "SI", "SL")),
       aes(plate_x, plate_z)) +
  geom_density2d_filled(contour_var = "ndensity") +
  add_zone("red") +
  xlim(-2.2, 2.2) +
  ylim(-0.5, 4) +
  facet_wrap(~ pitch_name) +
  ggtitle("Right-Arm Pitchers") +
  centertitle() +
  theme(text = element_text(size = 16)) +
  theme(legend.position = "none") +
  coord_equal()
```
Locations of different pitch types for left-handed pitchers:

```{r}
ggplot(filter(statcast2020,
              p_throws == "L",
              pitch_type %in%
                c("CH", "CS", "CU", "FC", "FF",
                  "FS", "FT", "KC", "SI", "SL")),
       aes(plate_x, plate_z)) +
  geom_density2d_filled(contour_var = "ndensity") +
  add_zone("red") +
  xlim(-2.2, 2.2) +
  ylim(-0.5, 4) +
  facet_wrap(~ pitch_name) +
  ggtitle("Left-Arm Pitchers") +
  centertitle() +
  theme(text = element_text(size = 16)) +
  theme(legend.position = "none") +
  coord_equal()
```
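To see why ```contour_var = "ndensity"``` matters with facets, the sketch below mimics the two settings with ```MASS::kde2d()``` on two synthetic groups (made-up data: one tightly clustered, one spread out). As I understand ggplot2's 2D density stat, the filled bands use one set of break levels computed over all panels, by default roughly ten ```pretty()``` values; with ```"ndensity"``` each panel's density is first divided by its own maximum. The break rule here is an approximation of ggplot2's behaviour, not its code.

```{r}
set.seed(1)
# Two synthetic "facets" (made-up data): one tight, one spread out
tight_x  <- rnorm(500, 0, 0.3); tight_z  <- rnorm(500, 2.5, 0.3)
spread_x <- rnorm(500, 0, 0.9); spread_z <- rnorm(500, 2.5, 0.9)

# Same grid for both groups so the densities are directly comparable
grid_lims <- c(-3, 3, -0.5, 5.5)
d_tight  <- MASS::kde2d(tight_x,  tight_z,  n = 60, lims = grid_lims)$z
d_spread <- MASS::kde2d(spread_x, spread_z, n = 60, lims = grid_lims)$z

# Like contour_var = "density": one set of breaks spans both groups
shared_breaks <- pretty(range(c(d_tight, d_spread)), 10)
top_band <- shared_breaks[length(shared_breaks) - 1]  # lower edge of top band
c(tight = max(d_tight) >= top_band, spread = max(d_spread) >= top_band)

# Like contour_var = "ndensity": each group divided by its own maximum
nd_tight  <- d_tight / max(d_tight)
nd_spread <- d_spread / max(d_spread)
c(tight = max(nd_tight), spread = max(nd_spread))
```

With shared breaks the spread-out group never reaches the top band, so its panel shows only the lower bands; after rescaling, both peaks equal 1 and each panel shows its own highest-density region.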
#### Zone locations of various hits

Zone locations of home runs:

```{r}
ggplot(filter(statcast2020,
              events == "home_run"),
       aes(plate_x, plate_z)) +
  geom_density2d_filled() +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Home Runs") +
  centertitle() +
  facet_wrap(~ stand) +
  coord_equal() +
  theme(legend.position = "none")
```
Zone locations of hits:

```{r}
ggplot(filter(statcast2020,
              events %in% c("single", "double",
                            "triple", "home_run")),
       aes(plate_x, plate_z)) +
  geom_density2d_filled() +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Hits") +
  centertitle() +
  facet_wrap(~ stand) +
  coord_equal() +
  theme(legend.position = "none")
```
Zone locations of hits by pitching arm and batter side:

```{r}
ggplot(filter(statcast2020,
              events %in% c("single", "double",
                            "triple", "home_run")),
       aes(plate_x, plate_z)) +
  geom_density2d_filled(contour_var = "ndensity") +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Hits") +
  centertitle() +
  facet_grid(stand ~ p_throws,
             labeller = label_both) +
  increasefont() +
  theme(legend.position = "none") +
  coord_equal()
```
#### Locations of strikes

Locations of called strikes:

```{r}
ggplot(filter(statcast2020,
              description == "called_strike"),
       aes(plate_x, plate_z)) +
  geom_density2d_filled() +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Called Strike") +
  centertitle() +
  facet_wrap(~ stand) +
  coord_equal() +
  theme(legend.position = "none")
```
Called strikes divided by pitching arm and batting side:

```{r}
ggplot(filter(statcast2020,
              description == "called_strike"),
       aes(plate_x, plate_z)) +
  geom_density2d_filled(contour_var = "ndensity") +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Called Strike") +
  centertitle() +
  facet_grid(stand ~ p_throws,
             labeller = label_both) +
  increasefont() +
  theme(legend.position = "none") +
  coord_equal()
```
Locations of swinging strikes (including blocked swinging strikes) by pitching arm and batting side:

```{r}
ggplot(filter(statcast2020,
              description %in%
                c("swinging_strike",
                  "swinging_strike_blocked")),
       aes(plate_x, plate_z)) +
  geom_density2d_filled(contour_var = "ndensity") +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Swinging Strike") +
  centertitle() +
  facet_grid(stand ~ p_throws,
             labeller = label_both) +
  increasefont() +
  theme(legend.position = "none") +
  coord_equal()
```
#### Balls in Play

Field locations of balls in play (BIP). Note that I transform the variables hc_x and hc_y, plotting hc_x - 125.42 against 198.27 - hc_y, so that home plate sits near the origin and the outfield is at the top of the plot:

```{r}
# type == "X" selects pitches that were put into play
ggplot(filter(statcast2020,
              type == "X"),
       aes(hc_x - 125.42, 198.27 - hc_y)) +
  geom_density2d_filled() +
  coord_equal() +
  theme(legend.position = "none") +
  xlab("") + ylab("")
```
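The same shift can be wrapped in a function when the transformed coordinates are needed outside a plot. ```field_coords()``` below is a hypothetical helper, not part of the original code; it assumes (125.42, 198.27) is home plate in the hc_x/hc_y system, keeps the hc_x/hc_y units (not feet), and adds an illustrative spray angle measured from the line through second base.

```{r}
# Hypothetical helper: same shift as the plots above, so home plate is near
# (0, 0) and positive y points toward centre field
field_coords <- function(hc_x, hc_y) {
  x <- hc_x - 125.42
  y <- 198.27 - hc_y
  # Spray angle in degrees: 0 = straight up the middle; negative values are
  # on the left of the plot, positive values on the right
  spray_angle <- atan2(x, y) * 180 / pi
  data.frame(x = x, y = y, spray_angle = spray_angle)
}

# Home plate, a ball straight up the middle, a ball 45 degrees to the left
field_coords(c(125.42, 125.42, 25.42), c(198.27, 98.27, 98.27))
```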
Locations of doubles:

```{r}
ggplot(filter(statcast2020,
              type == "X",
              events == "double"),
       aes(hc_x - 125.42, 198.27 - hc_y)) +
  geom_density2d_filled() +
  coord_equal() +
  theme(legend.position = "none") +
  xlab("") + ylab("")
```
Locations of home runs:

```{r}
ggplot(filter(statcast2020,
              type == "X",
              events == "home_run"),
       aes(hc_x - 125.42, 198.27 - hc_y)) +
  geom_density2d_filled() +
  coord_equal() +
  theme(legend.position = "none") +
  xlab("") + ylab("")
```
Look at the locations of the BIP types. First define a new variable BB_Type from the launch angle (in degrees):

```{r}
# Ground ball: <= 10; line drive: (10, 25]; fly ball: (25, 50]; pop up: > 50.
# A missing launch_angle gives a missing BB_Type.
statcast2020 %>%
  filter(type == "X") %>%
  mutate(BB_Type = case_when(
    launch_angle <= 10 ~ "Ground Ball",
    launch_angle <= 25 ~ "Line Drive",
    launch_angle <= 50 ~ "Fly Ball",
    launch_angle > 50  ~ "Pop Up"
  )) ->
  scip
```
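The same cut points can also be written with base R's ```cut()```. The ```classify_bb()``` function below is a hypothetical alternative, not the author's code; it returns a factor whose levels run from lowest to highest launch angle, so ```facet_wrap(~ BB_Type)``` would show the panels in that order instead of alphabetically.

```{r}
# Hypothetical alternative to the case_when() above. Intervals are closed on
# the right (right = TRUE), matching the "<=" comparisons; NA stays NA.
classify_bb <- function(launch_angle) {
  cut(launch_angle,
      breaks = c(-Inf, 10, 25, 50, Inf),
      labels = c("Ground Ball", "Line Drive", "Fly Ball", "Pop Up"),
      right = TRUE)
}

# Boundary values land in the lower category, as with "<="
classify_bb(c(-5, 10, 10.5, 25, 40, 50, 65, NA))
```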
For batters on each side, look at the locations of the four types of batted balls:

```{r}
ggplot(filter(scip, stand == "L",
              !is.na(BB_Type)),
       aes(hc_x - 125.42, 198.27 - hc_y)) +
  geom_density2d_filled(contour_var = "ndensity") +
  ylim(-20, 80) +
  xlim(-50, 50) +
  ggtitle("Left-Handed Batters") +
  increasefont() +
  centertitle() +
  theme(legend.position = "none") +
  coord_equal() +
  xlab("") + ylab("") +
  facet_wrap(~ BB_Type, ncol = 2)
```
```{r}
ggplot(filter(scip, stand == "R",
              !is.na(BB_Type)),
       aes(hc_x - 125.42, 198.27 - hc_y)) +
  geom_density2d_filled(contour_var = "ndensity") +
  ylim(-20, 80) +
  xlim(-50, 50) +
  ggtitle("Right-Handed Batters") +
  increasefont() +
  centertitle() +
  theme(legend.position = "none") +
  coord_equal() +
  xlab("") + ylab("") +
  facet_wrap(~ BB_Type, ncol = 2)
```
#### Distribution of the launch variables (launch angle and launch speed)

All batted balls:

```{r}
ggplot(scip,
       aes(launch_angle, launch_speed)) +
  geom_density2d_filled() +
  ggtitle("Launch Variables") +
  increasefont() +
  centertitle() +
  theme(legend.position = "none") +
  xlim(-50, 100) +
  ylim(55, 120)
```
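The two launch variables are on different scales (degrees and miles per hour). The sketch below uses synthetic data, with means and spreads that are assumptions rather than 2019 estimates, to show that ```kde2d()```'s default bandwidth is computed separately for each axis with ```MASS::bandwidth.nrd()```, and how to read the location of the peak off the estimated grid.

```{r}
set.seed(42)
# Synthetic stand-ins for launch_angle (degrees) and launch_speed (mph);
# illustrative assumptions only, not fitted to Statcast data
sim_angle <- rnorm(3000, mean = 12, sd = 25)
sim_speed <- rnorm(3000, mean = 88, sd = 14)

# kde2d()'s default bandwidth, computed separately for each axis
bw <- c(MASS::bandwidth.nrd(sim_angle), MASS::bandwidth.nrd(sim_speed))
bw

# Estimate on a grid covering the same window as the plot above
launch_est <- MASS::kde2d(sim_angle, sim_speed, h = bw, n = 100,
                          lims = c(-50, 100, 55, 120))

# Grid point with the largest estimated density
peak <- arrayInd(which.max(launch_est$z), dim(launch_est$z))
peak_loc <- c(angle = launch_est$x[peak[1]], speed = launch_est$y[peak[2]])
peak_loc
```

Unlike this sketch, which estimates from all points and only evaluates on the window, the plot's ```xlim()``` and ```ylim()``` calls remove observations outside the window before ggplot2 estimates the density; ```coord_cartesian(xlim = ..., ylim = ...)``` zooms without dropping data.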
Home runs:

```{r}
ggplot(filter(scip, events == "home_run"),
       aes(launch_angle, launch_speed)) +
  geom_density2d_filled() +
  ggtitle("Launch Variables for Home Runs") +
  increasefont() +
  centertitle() +
  theme(legend.position = "none") +
  xlim(15, 42) +
  ylim(92, 112)
```