@bayesball
Created November 20, 2020 23:36
Illustration of bivariate density estimates on 2019 Statcast data
---
title: "Some Density Estimates"
output:
  html_document:
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      warning = FALSE)
```
#### Ideas (2D density estimation via ggplot2)
- density estimation is very good for graphing large amounts of two-dimensional data
- better than scatterplots, where one has to handle the overplotting issue
- maybe harder to interpret?
- basic function for computing the density estimate is `kde2d()` (from the MASS package)
- the density estimate is displayed as a contour plot
- personally like filled contour bands, which display the regions of highest probability
- geometric object is `geom_density2d_filled()`
- illustrate for (1) pitch locations in the zone, (2) in-play locations of batted balls, (3) distribution of the launch variables
- when you have different facets, distinguish between a joint density estimate and a conditional density estimate for each facet (the conditional estimate ensures that the peak intensity is the same in each facet)
- I think one is more interested in seeing conditional density estimates (one gets the conditional estimates with the argument option `contour_var = "ndensity"`)
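To make the first few ideas concrete, here is a minimal sketch on simulated data (not Statcast) that calls `MASS::kde2d()` directly; ggplot2's 2D density stat calls the same function internally. The simulated locations, the grid limits, and the 100 x 100 grid size are assumptions chosen for illustration. The sketch checks that the estimated surface integrates to roughly 1 over a wide grid, then divides it by its maximum so the peak equals 1, which is the rescaling behind `contour_var = "ndensity"`.

```{r}
# Simulated stand-in for pitch locations in feet (not real Statcast data)
set.seed(2019)
x <- rnorm(1000, mean = 0, sd = 0.8)
y <- rnorm(1000, mean = 2.5, sd = 0.8)

# Joint density estimate on a 100 x 100 grid extending well past the data
dens <- MASS::kde2d(x, y, n = 100, lims = c(-5, 5, -2.5, 7.5))

# Riemann-sum approximation of the total probability over the grid
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
total_mass <- sum(dens$z) * dx * dy
total_mass   # should be very close to 1

# "ndensity"-style rescaling: divide by the maximum so the peak is 1
ndens <- dens$z / max(dens$z)
max(ndens)
```

Calling `MASS::kde2d()` with the namespace prefix avoids attaching MASS, whose `select()` would otherwise mask `dplyr::select()`.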
For this exercise I will use only the readr, ggplot2, and dplyr packages (`geom_density2d_filled()` requires ggplot2 3.3.0 or later).
```{r}
library(readr)
library(ggplot2)
library(dplyr)
```
Load in Statcast data for the complete 2019 season:
```{r}
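# Despite its name, statcast2020 holds the 2019 season.
# The path points to the author's local copy; change it to your own file.
statcast2020 <- read_csv("~/Dropbox/2016 WORK/BLOG Baseball R/OTHER/StatcastData/statcast2019.csv")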
```
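A full season of Statcast data is large, and the plots below use only a handful of its columns. If memory is tight, one option, assuming readr 2.0 or later (where `read_csv()` has the `col_select` argument), is to read just those columns. The sketch demonstrates the idea on a small temporary CSV rather than the real file; the column list is an assumption based on the variables used later in this document.

```{r}
library(readr)

# Columns used by the plots in this document (assumed sufficient)
needed <- c("p_throws", "stand", "pitch_type", "pitch_name", "events",
            "description", "type", "plate_x", "plate_z",
            "hc_x", "hc_y", "launch_angle", "launch_speed")

# Small stand-in file: the needed columns plus one extra column
demo <- data.frame(p_throws = "R", stand = "L", pitch_type = "FF",
                   pitch_name = "4-Seam Fastball", events = "single",
                   description = "hit_into_play", type = "X",
                   plate_x = 0.1, plate_z = 2.4, hc_x = 120, hc_y = 150,
                   launch_angle = 12, launch_speed = 95, extra_col = 1)
path <- tempfile(fileext = ".csv")
write_csv(demo, path)

# Read only the needed columns; extra_col is skipped
small <- read_csv(path, col_select = tidyselect::all_of(needed),
                  show_col_types = FALSE)
names(small)
```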
Here are several helper functions that cut down on typing in the ggplot() calls. I find add_zone() especially helpful for adding the strike zone boundaries to the plot.
```{r}
# Larger text for all plot elements
increasefont <- function(){
  theme(text = element_text(size = 18))
}
# Centered blue plot title
centertitle <- function(){
  theme(plot.title = element_text(colour = "blue", size = 18,
                                  hjust = 0.5, vjust = 0.8, angle = 0))
}
# Approximate strike zone in feet, from the catcher's perspective
add_zone <- function(Color = "red"){
  topKzone <- 3.5
  botKzone <- 1.6
  inKzone <- -0.85
  outKzone <- 0.85
  # corners of the zone, repeating the first point to close the path
  kZone <- data.frame(
    x = c(inKzone, inKzone, outKzone, outKzone, inKzone),
    y = c(botKzone, topKzone, topKzone, botKzone, botKzone)
  )
  geom_path(aes(.data$x, .data$y), data = kZone,
            lwd = 1, col = Color)
}
```
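The constants inside `add_zone()` define a rectangle in feet, in the same catcher's-perspective coordinates as Statcast's `plate_x` and `plate_z`. As a sketch, the same constants can classify whether a pitch location falls inside the drawn zone. `in_zone()` is a hypothetical helper, not part of any package, and the rectangle is the fixed approximate zone used in this document rather than each batter's own zone.

```{r}
# Hypothetical helper: TRUE when (plate_x, plate_z) lies inside the
# rectangle drawn by add_zone() (same constants, in feet)
in_zone <- function(plate_x, plate_z,
                    inKzone = -0.85, outKzone = 0.85,
                    botKzone = 1.6, topKzone = 3.5) {
  plate_x >= inKzone & plate_x <= outKzone &
    plate_z >= botKzone & plate_z <= topKzone
}

# Middle-middle, off the plate, and below the zone
in_zone(c(0, 1.2, 0), c(2.5, 2.5, 1.0))
```

With the Statcast data this could be used as, for example, `mutate(statcast2020, in_zone = in_zone(plate_x, plate_z))`.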
#### Locations of Pitches
Locations of different pitch types for right-handed pitchers:
```{r}
ggplot(filter(statcast2020,
              p_throws == "R",
              pitch_type %in%
                c("CH", "CS", "CU", "FC", "FF",
                  "FS", "FT", "KC", "SI", "SL")),
       aes(plate_x, plate_z)) +
  geom_density2d_filled(contour_var = "ndensity") +
  add_zone("red") +
  xlim(-2.2, 2.2) +
  ylim(-0.5, 4) +
  facet_wrap(~ pitch_name) +
  ggtitle("Right-Arm Pitchers") +
  centertitle() +
  theme(text = element_text(size = 16)) +
  theme(legend.position = "none") +
  coord_equal()
```
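One subtlety in these plots: `xlim()` and `ylim()` set scale limits, and ggplot2 replaces values outside scale limits with `NA` and drops them before the density is computed, so the estimate is based only on pitches inside the window. Zooming through the coordinate system instead, e.g. `coord_equal(xlim = c(-2.2, 2.2), ylim = c(-0.5, 4))`, keeps every observation in the estimate. The base-R sketch below uses a hypothetical helper, `share_outside()`, to report what fraction of points a given window would drop.

```{r}
# Hypothetical helper: fraction of (x, y) points outside a plotting window.
# With xlim()/ylim() scale limits, ggplot2 drops these points before the
# density is estimated. Rows with a missing coordinate are ignored here.
share_outside <- function(x, y, xlim, ylim) {
  ok <- !is.na(x) & !is.na(y)
  inside <- x[ok] >= xlim[1] & x[ok] <= xlim[2] &
    y[ok] >= ylim[1] & y[ok] <= ylim[2]
  mean(!inside)
}

share_outside(x = c(0, 2, -2, 0.5), y = c(2, 2, 2, 5),
              xlim = c(-1.5, 1.5), ylim = c(1, 4))
```

On real data one would call it on the filtered pitches, e.g. `share_outside(rhp$plate_x, rhp$plate_z, c(-2.2, 2.2), c(-0.5, 4))`, where `rhp` is a hypothetical data frame holding the right-handed pitches plotted above.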
Locations of different pitch types for left-handed pitchers:
```{r}
ggplot(filter(statcast2020,
              p_throws == "L",
              pitch_type %in%
                c("CH", "CS", "CU", "FC", "FF",
                  "FS", "FT", "KC", "SI", "SL")),
       aes(plate_x, plate_z)) +
  geom_density2d_filled(contour_var = "ndensity") +
  add_zone("red") +
  xlim(-2.2, 2.2) +
  ylim(-0.5, 4) +
  facet_wrap(~ pitch_name) +
  ggtitle("Left-Arm Pitchers") +
  centertitle() +
  theme(text = element_text(size = 16)) +
  theme(legend.position = "none") +
  coord_equal()
```
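The right- and left-handed chunks differ only in the value of `p_throws` and the title. Here is a minimal sketch of a hypothetical wrapper, `plot_pitch_locations()`, that builds either plot. It assumes the Statcast column names used above and returns a ggplot object, so `add_zone()` and the other helpers can still be added afterwards; the simulated check data are not Statcast data.

```{r}
library(ggplot2)
library(dplyr)

# Hypothetical wrapper around the two pitch-location chunks
plot_pitch_locations <- function(data, hand,
                                 types = c("CH", "CS", "CU", "FC", "FF",
                                           "FS", "FT", "KC", "SI", "SL")) {
  title <- if (hand == "R") "Right-Arm Pitchers" else "Left-Arm Pitchers"
  ggplot(filter(data, p_throws == hand, pitch_type %in% types),
         aes(plate_x, plate_z)) +
    geom_density2d_filled(contour_var = "ndensity") +
    xlim(-2.2, 2.2) +
    ylim(-0.5, 4) +
    facet_wrap(~ pitch_name) +
    ggtitle(title) +
    theme(text = element_text(size = 16),
          legend.position = "none") +
    coord_equal()
}

# Check on simulated data: two pitch types thrown by right-handers
set.seed(1)
sim <- data.frame(p_throws = "R",
                  pitch_type = rep(c("FF", "SL"), each = 300),
                  pitch_name = rep(c("4-Seam Fastball", "Slider"), each = 300),
                  plate_x = rnorm(600, 0, 0.7),
                  plate_z = rnorm(600, 2.5, 0.7))
p <- plot_pitch_locations(sim, "R")
```

With the Statcast data, `plot_pitch_locations(statcast2020, "L") + add_zone("red") + centertitle()` should reproduce the left-handed plot.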
#### Zone locations of various hits
Zone locations of home runs
```{r}
ggplot(filter(statcast2020,
              events == "home_run"),
       aes(plate_x, plate_z)) +
  geom_density2d_filled() +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Home Runs") +
  centertitle() +
  facet_wrap(~ stand) +
  coord_equal() +
  theme(legend.position = "none")
```
Zone locations of hits
```{r}
ggplot(filter(statcast2020,
              events %in% c("single", "double",
                            "triple", "home_run")),
       aes(plate_x, plate_z)) +
  geom_density2d_filled() +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Hits") +
  centertitle() +
  facet_wrap(~ stand) +
  coord_equal() +
  theme(legend.position = "none")
```
Hit locations by pitching arm and batter side:
```{r}
ggplot(filter(statcast2020,
              events %in% c("single", "double",
                            "triple", "home_run")),
       aes(plate_x, plate_z)) +
  geom_density2d_filled(contour_var = "ndensity") +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Hits") +
  centertitle() +
  facet_grid(stand ~ p_throws,
             labeller = label_both) +
  increasefont() +
  theme(legend.position = "none") +
  coord_equal()
```
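To see what `contour_var` changes across facets, here is a base-R sketch on simulated data that estimates a separate density for each group, as ggplot2 does for each facet, and derives the three values `contour_var` can take: `"density"` (each group's surface integrates to 1), `"ndensity"` (each surface is divided by its own maximum, so every facet peaks at 1), and `"count"` (density times the group's number of observations, so facets with more data look more intense). The two groups, their sizes, and the grid are assumptions chosen for illustration.

```{r}
set.seed(2020)
# Two simulated groups of different size and spread (stand-ins for facets)
groups <- list(
  A = data.frame(x = rnorm(2000, 0, 0.5), y = rnorm(2000, 2.5, 0.5)),
  B = data.frame(x = rnorm(500, 0.3, 0.9), y = rnorm(500, 2.2, 0.9))
)

# Per-group density, then the peak value under each contour_var choice
summaries <- lapply(groups, function(d) {
  dens <- MASS::kde2d(d$x, d$y, n = 60, lims = c(-4, 4, -2, 7))
  data.frame(
    max_density  = max(dens$z),                # contour_var = "density"
    max_ndensity = max(dens$z / max(dens$z)),  # contour_var = "ndensity"
    max_count    = max(dens$z * nrow(d))       # contour_var = "count"
  )
})
do.call(rbind, summaries)
```

The tighter group A has the higher peak on the `"density"` scale, while under `"ndensity"` both peaks are 1, which is why conditional estimates make facets look equally intense.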
#### Locations of strikes
Locations of called strikes
```{r}
ggplot(filter(statcast2020,
              description == "called_strike"),
       aes(plate_x, plate_z)) +
  geom_density2d_filled() +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Called Strike") +
  centertitle() +
  facet_wrap(~ stand) +
  coord_equal() +
  theme(legend.position = "none")
```
Called strikes divided by pitching arm and batting side:
```{r}
ggplot(filter(statcast2020,
              description == "called_strike"),
       aes(plate_x, plate_z)) +
  geom_density2d_filled(contour_var = "ndensity") +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Called Strike") +
  centertitle() +
  facet_grid(stand ~ p_throws,
             labeller = label_both) +
  increasefont() +
  theme(legend.position = "none") +
  coord_equal()
```
Locations of swinging strikes by pitching arm and batting side:
```{r}
ggplot(filter(statcast2020,
              description %in%
                c("swinging_strike",
                  "swinging_strike_blocked")),
       aes(plate_x, plate_z)) +
  geom_density2d_filled(contour_var = "ndensity") +
  add_zone("red") +
  xlim(-1.5, 1.5) +
  ylim(1, 4) +
  ggtitle("Swinging Strike") +
  centertitle() +
  facet_grid(stand ~ p_throws,
             labeller = label_both) +
  increasefont() +
  theme(legend.position = "none") +
  coord_equal()
```
#### Balls in Play
Field locations of balls in play (BIP). Note that I transform the variables hc_x and hc_y to get reasonable-looking field locations:
```{r}
ggplot(filter(statcast2020,
              type == "X"),
       aes(hc_x - 125.42, 198.27 - hc_y)) +
  geom_density2d_filled() +
  coord_equal() +
  theme(legend.position = "none") +
  xlab("") + ylab("")
```
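The transformation in the chunk above moves the origin to home plate. Statcast's `hc_x` and `hc_y` are image-style coordinates in which `hc_y` grows toward home plate, so subtracting it from 198.27 flips the vertical axis and puts the field at positive y. A minimal sketch of the same shift as a hypothetical helper, `field_coords()`, using the constants from the chunk:

```{r}
# Hypothetical helper: shift Statcast hit coordinates so home plate sits
# at the origin and the field lies at positive y (same constants as above)
field_coords <- function(hc_x, hc_y) {
  data.frame(x = hc_x - 125.42,
             y = 198.27 - hc_y)
}

# Home plate maps to (0, 0); a smaller hc_y lands farther from the plate
field_coords(hc_x = c(125.42, 100), hc_y = c(198.27, 80))
```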
Locations of doubles
```{r}
ggplot(filter(statcast2020,
              type == "X",
              events == "double"),
       aes(hc_x - 125.42, 198.27 - hc_y)) +
  geom_density2d_filled() +
  coord_equal() +
  theme(legend.position = "none") +
  xlab("") + ylab("")
```
Locations of home runs
```{r}
ggplot(filter(statcast2020,
              type == "X",
              events == "home_run"),
       aes(hc_x - 125.42, 198.27 - hc_y)) +
  geom_density2d_filled() +
  coord_equal() +
  theme(legend.position = "none") +
  xlab("") + ylab("")
```
To look at locations by type of ball in play, define a new variable BB_Type based on the launch angle:
```{r}
# Classify each ball in play by launch angle (degrees);
# missing launch angles stay NA
statcast2020 %>%
  filter(type == "X") %>%
  mutate(BB_Type = ifelse(launch_angle <= 10,
                          "Ground Ball",
                   ifelse(launch_angle <= 25,
                          "Line Drive",
                   ifelse(launch_angle <= 50,
                          "Fly Ball",
                          "Pop Up")))) ->
  scip
```
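The nested `ifelse()` calls label ground balls (launch angle up to 10 degrees), line drives (above 10 up to 25), fly balls (above 25 up to 50), and pop ups (above 50), leaving missing launch angles as `NA`. An equivalent sketch with base R's `cut()`, wrapped in a hypothetical helper `bb_type()`; right-closed intervals reproduce the `<=` comparisons.

```{r}
# Hypothetical helper: same launch-angle cut points as the ifelse() chain.
# right = TRUE makes each interval (a, b], matching the <= comparisons.
bb_type <- function(launch_angle) {
  as.character(cut(launch_angle,
                   breaks = c(-Inf, 10, 25, 50, Inf),
                   labels = c("Ground Ball", "Line Drive",
                              "Fly Ball", "Pop Up"),
                   right = TRUE))
}

bb_type(c(-5, 10, 10.5, 25, 40, 50, 65, NA))
```

Inside the `mutate()` call above, `BB_Type = bb_type(launch_angle)` should give the same labels.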
For batters on each side, look at the locations of the four types of batted balls:
```{r}
ggplot(filter(scip, stand == "L",
              !is.na(BB_Type)),
       aes(hc_x - 125.42, 198.27 - hc_y)) +
  geom_density2d_filled(contour_var = "ndensity") +
  ylim(-20, 80) +
  xlim(-50, 50) +
  ggtitle("Left-Handed Batters") +
  increasefont() +
  centertitle() +
  theme(legend.position = "none") +
  coord_equal() +
  xlab("") + ylab("") +
  facet_wrap(~ BB_Type, ncol = 2)
```
```{r}
ggplot(filter(scip, stand == "R",
              !is.na(BB_Type)),
       aes(hc_x - 125.42, 198.27 - hc_y)) +
  geom_density2d_filled(contour_var = "ndensity") +
  ylim(-20, 80) +
  xlim(-50, 50) +
  ggtitle("Right-Handed Batters") +
  increasefont() +
  centertitle() +
  theme(legend.position = "none") +
  coord_equal() +
  xlab("") + ylab("") +
  facet_wrap(~ BB_Type, ncol = 2)
```
#### Distribution of the launch variables (launch angle and launch speed)
All batted balls:
```{r}
ggplot(scip,
       aes(launch_angle, launch_speed)) +
  geom_density2d_filled() +
  ggtitle("Launch Variables") +
  increasefont() +
  centertitle() +
  theme(legend.position = "none") +
  xlim(-50, 100) +
  ylim(55, 120)
```
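Filled contour bands show where the estimated density is highest, but by default the band boundaries are set at levels of the estimated density rather than at fixed probability levels, so how much probability a band holds is not fixed in advance. A sketch on simulated data (not the launch variables) of finding the density level whose highest-density region contains a chosen probability, here 50%, from a `MASS::kde2d()` grid; the data, grid, and target probability are assumptions. The resulting level could then be used as a contour break, for example through the `breaks` argument of the contour stats in recent ggplot2 versions.

```{r}
# Simulated stand-in data: a standard bivariate normal sample
set.seed(42)
x <- rnorm(5000)
y <- rnorm(5000)
dens <- MASS::kde2d(x, y, n = 150, lims = c(-5, 5, -5, 5))
cell <- diff(dens$x[1:2]) * diff(dens$y[1:2])  # area of one grid cell

# Sort grid densities from highest to lowest and accumulate probability
z_sorted <- sort(as.vector(dens$z), decreasing = TRUE)
cum_prob <- cumsum(z_sorted) * cell

# Lowest density level whose highest-density region holds at least 50%
level_50 <- z_sorted[which(cum_prob >= 0.5)[1]]

# Probability inside that region (just over 0.5)
sum(dens$z[dens$z >= level_50]) * cell
```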
Home runs:
```{r}
ggplot(filter(scip, events == "home_run"),
       aes(launch_angle, launch_speed)) +
  geom_density2d_filled() +
  ggtitle("Launch Variables for Home Runs") +
  increasefont() +
  centertitle() +
  theme(legend.position = "none") +
  xlim(15, 42) +
  ylim(92, 112)
```