@bayesball
Created October 31, 2020 12:19
Using the tabyl() function to explore pitch selection in the 2020 World Series
---
title: "tabyls"
author: "Jim Albert"
date: "10/27/2020"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
warning = FALSE,
message = FALSE)
```
#### Introduction
- Much of baseball data consists of counts
- Creating tables (one-way, 2-way, 3-way)
- Graphing the counts or percentages
- Base ```table()``` function returns a ```table``` object rather than a data frame, so it is awkward to use in tidyverse pipelines
- Can use the ```count()``` function together with ```group_by()```
- Idea is to introduce the ```tabyl()``` function, which is part of the ```janitor``` package
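To make the comparison concrete, here is a minimal sketch on a small made-up data frame (the ```toy``` data and its values are assumptions for illustration, not the Statcast data used below). It shows the kind of object each of the three approaches returns.

```r
library(dplyr)
library(janitor)

# Hypothetical toy data: four pitches with the batter's side
toy <- data.frame(
  pitch = c("Slider", "Slider", "Changeup", "4-Seam Fastball"),
  stand = c("L", "R", "R", "R")
)

# Base R: returns an object of class "table", not a data frame
t1 <- table(toy$pitch)

# dplyr: returns a data frame with columns pitch and n
t2 <- toy %>% count(pitch)

# janitor: returns a data frame with columns pitch, n and percent
t3 <- toy %>% tabyl(pitch)

class(t1)
names(t2)
names(t3)
```

Since ```tabyl()``` returns a data frame that already has a ```percent``` column, its output can be piped straight into ```ggplot()```.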
#### Load five packages
```{r}
library(tidyverse)
library(janitor)
library(teamcolors)
library(ProbBayes)
library(gridExtra)
```
#### Overall pitch types for 2020 pitchers
Let's explore pitch type use among all pitchers
in the 2020 season.
Read in data and combine "Curveball" and "Knuckle Curve" categories into a single "CurveBall".
```{r}
sc2020final <- read_csv("~/Dropbox/2020 WORK/statcast2020/sc2020final.csv")
sc2020final %>%
mutate(Pitch_Name =
ifelse(pitch_name %in%
c("Curveball", "Knuckle Curve"),
"CurveBall", pitch_name)) ->
sc2020final
```
Graph of all pitch types.
```{r}
sc2020final %>%
tabyl(Pitch_Name) %>%
ggplot(aes(Pitch_Name, percent)) +
geom_col(fill = "red",
color = "white") +
coord_polar()
```
Alternative display not using polar coordinates.
```{r}
sc2020final %>%
tabyl(Pitch_Name) %>%
ggplot(aes(Pitch_Name, percent)) +
geom_col(fill = "red",
color = "white")
```
Table of pitch-type fractions against left- and right-handed hitters, computed within each batter side.
```{r}
sc2020final %>%
tabyl(Pitch_Name, stand) %>%
adorn_percentages("col") %>%
adorn_rounding(3)
```
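A small sketch of what the ```"col"``` option does, using made-up data (the ```toy``` data frame is an assumption for illustration). Each count is divided by its column total, so the fractions in each batter-side column sum to 1.

```r
library(janitor)

# Hypothetical toy data: six pitches, three to each batter side
toy <- data.frame(
  Pitch_Name = c("Slider", "Slider", "Changeup",
                 "Changeup", "Changeup", "Slider"),
  stand      = c("L", "L", "L", "R", "R", "R")
)

# Two-way table of counts: rows are pitch types, columns are L and R
counts <- tabyl(toy, Pitch_Name, stand)

# Divide each count by its column total
col_pct <- adorn_percentages(counts, "col")

counts
col_pct
```

Using ```"row"``` instead would give, for each pitch type, the split between left- and right-handed batters.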
Graph of pitch types faceted by side of batter.
```{r}
sc2020final %>%
tabyl(Pitch_Name, stand) %>%
adorn_percentages("col") %>%
pivot_longer(
cols = L:R,
names_to = "Stand",
values_to = "Percentage"
) %>%
ggplot(aes(Pitch_Name, Percentage)) +
geom_col(fill = "yellow",
color = "brown") +
coord_polar() +
facet_wrap(~ Stand, ncol = 2) +
ggtitle("All Pitchers") +
centertitle()
```
Now let's break down this comparison by arm of pitcher.
```{r}
sc2020final %>%
filter(p_throws == "L") %>%
tabyl(Pitch_Name, stand) %>%
adorn_percentages("col") %>%
adorn_rounding(3)
```
```{r}
sc2020final %>%
filter(p_throws == "R") %>%
tabyl(Pitch_Name, stand) %>%
adorn_percentages("col") %>%
adorn_rounding(3)
```
Graph for right-handed pitchers.
```{r}
sc2020final %>%
filter(p_throws == "R") %>%
tabyl(Pitch_Name, stand) %>%
adorn_percentages("col") %>%
pivot_longer(
cols = L:R,
names_to = "Stand",
values_to = "Percentage"
) %>%
ggplot(aes(Pitch_Name, Percentage)) +
geom_col(fill = "yellow",
color = "brown") +
coord_polar() +
facet_wrap(~ Stand, ncol = 2) +
ggtitle("Right-Handed Pitchers") +
centertitle()
```
Takeaways for RH pitchers:
- Against left-handed batters, the off-speed pitches are thrown with roughly equal frequency: changeups (16%), sliders (13%) and curveballs (12%), with changeups the most common. Four-seam fastballs are thrown about 36% of the time.
- Against right-handed batters, the off-speed mix is dominated by sliders (24%), followed by curveballs (11%); changeups (6%) are uncommon. Four-seamers are thrown about 33% of the time.
Graph for left-handed pitchers:
```{r}
sc2020final %>%
filter(p_throws == "L") %>%
tabyl(Pitch_Name, stand) %>%
adorn_percentages("col") %>%
pivot_longer(
cols = L:R,
names_to = "Stand",
values_to = "Percentage"
) %>%
ggplot(aes(Pitch_Name, Percentage)) +
geom_col(fill = "yellow",
color = "brown") +
coord_polar() +
facet_wrap(~ Stand, ncol = 2) +
ggtitle("Left-Handed Pitchers") +
centertitle()
```
Takeaways for LH pitchers:
- Against left-handed batters, sliders (22%) are the main off-speed pitch, followed by curveballs (12%) and changeups (5%). Among fastballs, four-seamers (34%) and sinkers (20%) are both common.
- Against right-handed batters, changeups (19%) are the most common off-speed pitch, followed by sliders (12%) and curveballs (11%). Four-seamers are thrown about 35% of the time.
#### World Series Data
- Read in the 2020 World Series data
- Obtained using the ```baseballr``` package
```{r}
d <- read_csv("../2020 playoffs/ws2020.csv")
```
Find team colors using the ```teamcolors``` package.
```{r}
LA <- teamcolors %>%
filter(name == "Los Angeles Dodgers") %>%
select(primary) %>% pull()
TB <- teamcolors %>%
filter(name == "Tampa Bay Rays") %>%
select(tertiary) %>% pull()
```
#### Basic Use of tabyl() Function - One Categorical Variable
```{r}
d %>%
tabyl(pitch_name)
```
#### Some Recoding
Let's recode the pitch name variable, so Curveball and Knuckle Curve are both coded as "CurveBall".
```{r}
d %>%
mutate(Pitch_Name =
ifelse(pitch_name %in%
c("Curveball", "Knuckle Curve"),
"CurveBall", pitch_name)) -> d
```
Redo earlier table:
```{r}
d %>%
tabyl(Pitch_Name)
```
#### Graph output:
```{r}
d %>%
tabyl(Pitch_Name) %>%
ggplot(aes(Pitch_Name, percent)) +
geom_col()
```
#### Two Categorical Variables
Identify team that is pitching:
```{r}
d %>%
mutate(pitch_team = ifelse(inning_topbot == "Top",
home_team, away_team)) -> d
```
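The logic here is that the home team is in the field during the top half of an inning and the away team during the bottom half. A minimal check on two made-up rows (the ```toy``` data frame is an assumption; only the column names come from the data above):

```r
library(dplyr)

# Hypothetical rows: one pitch in the top half, one in the bottom half
toy <- data.frame(
  inning_topbot = c("Top", "Bot"),
  home_team     = c("LAD", "LAD"),
  away_team     = c("TB", "TB")
)

# Top of the inning: away team bats, so the home team pitches
toy <- toy %>%
  mutate(pitch_team = ifelse(inning_topbot == "Top",
                             home_team, away_team))

toy
```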
Tabulate by pitch name and pitching team:
```{r}
(d %>%
tabyl(Pitch_Name, pitch_team) -> d1)
```
#### Graph
- Basic barplot, faceted by Team variable.
```{r}
d1 %>%
pivot_longer(
cols = LAD:TB,
names_to = "Team",
values_to = "Count"
) %>%
ggplot(aes(Pitch_Name, Count)) +
geom_col() +
facet_wrap(~ Team, ncol = 1) +
coord_flip()
```
- Better to use "dodge" positioning:
```{r}
d1 %>%
pivot_longer(
cols = LAD:TB,
names_to = "Team",
values_to = "Count"
) %>%
ggplot(aes(Pitch_Name, Count,
fill = Team)) +
geom_col(position = "dodge") +
coord_flip() +
scale_fill_manual(values = c(LA, TB))
```
- Pie graphs?
```{r}
d1 %>%
pivot_longer(
cols = LAD:TB,
names_to = "Team",
values_to = "Count"
) %>%
ggplot(aes(Pitch_Name, Count,
fill = Team)) +
geom_col(position = "dodge") +
coord_polar() +
scale_fill_manual(values = c(LA, TB))
```
#### Three Way tables
Categorize by pitching team, pitch type and stand of batter.
```{r}
(d %>%
tabyl(Pitch_Name, stand, pitch_team) -> d2)
```
#### adorn Functions
- ```adorn_totals()``` - add row and/or col totals
```{r}
d %>%
tabyl(Pitch_Name, pitch_team) %>%
adorn_totals(c("row", "col"))
```
- ```adorn_percentages()``` - change to row or col percentages
```{r}
d %>%
tabyl(Pitch_Name, pitch_team) %>%
adorn_percentages("col")
```
- ```adorn_pct_formatting()``` - add formatting to percentages
```{r}
d %>%
tabyl(Pitch_Name, pitch_team) %>%
adorn_percentages("col") %>%
adorn_pct_formatting()
```
- ```adorn_rounding()``` - round tabular output
```{r}
d %>%
tabyl(Pitch_Name, pitch_team) %>%
adorn_percentages("col") %>%
adorn_rounding(2)
```
- ```adorn_ns()``` - add counts to percentages
```{r}
d %>%
tabyl(Pitch_Name, pitch_team) %>%
adorn_percentages("col") %>%
adorn_rounding(3) %>%
adorn_ns()
```
- ```adorn_title()``` - add title
```{r}
d %>%
tabyl(Pitch_Name, pitch_team) %>%
adorn_percentages("col") %>%
adorn_rounding(3) %>%
adorn_ns() %>%
adorn_title("top", "PITCH TYPE")
```
#### adorn functions on a 3-way table
```{r}
(d2 %>%
adorn_percentages("col") %>%
adorn_rounding(3) -> d3)
```
#### Graph of a three-way table?
- facet by side of batter
- pie chart comparing percentages
```{r}
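# d3 is a list with one table per pitching team,
# ordered by the sorted team values (LAD, then TB)
d3[[1]] %>%
  mutate(Team = "LAD") -> d4a
d3[[2]] %>%
  mutate(Team = "TB") -> d4b
# Stack the two tables into one data frame for plotting
(d4 <- rbind(d4a, d4b))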
```
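A three-way ```tabyl()``` returns a list of two-way tables named by the values of the third variable. As an alternative sketch to indexing by position, the team label can be taken from the list names. This uses made-up data (the ```toy``` data frame and the names ```three_way```, ```pct```, ```pieces``` and ```combined``` are assumptions for illustration).

```r
library(janitor)

# Hypothetical toy data: three pitches by each team
toy <- data.frame(
  Pitch_Name = c("Slider", "Changeup", "Slider",
                 "Changeup", "Slider", "Slider"),
  stand      = c("L", "R", "R", "L", "L", "R"),
  pitch_team = c("LAD", "LAD", "LAD", "TB", "TB", "TB")
)

# List of two-way tables, one per pitching team
three_way <- tabyl(toy, Pitch_Name, stand, pitch_team)
names(three_way)

# adorn functions are applied to each table in the list
pct <- adorn_percentages(three_way, "col")

# Label each table with its list name, then stack the pieces
pieces <- lapply(names(pct), function(team) {
  piece <- as.data.frame(pct[[team]])
  piece$Team <- team
  piece
})
combined <- do.call(rbind, pieces)

combined
```

Taking the label from ```names()``` avoids relying on the order of the list elements.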
```{r}
d4 %>%
pivot_longer(
cols = L:R,
names_to = "Stand",
values_to = "Percentage"
) %>%
ggplot(aes(Pitch_Name, Percentage,
fill = Team)) +
geom_col(position = "dodge") +
coord_polar() +
facet_wrap(~ Stand, ncol = 2) +
scale_fill_manual(values = c(LA, TB))
```
#### Breakdown by Side of Pitching Arm
Left-handers:
```{r}
# Three-way table for left-handed pitchers: one table per pitching team
d %>%
  filter(p_throws == "L") %>%
  tabyl(Pitch_Name, stand, pitch_team) -> d2
# Convert counts to column percentages within each team's table
d2 %>%
  adorn_percentages("col") %>%
  adorn_rounding(3) -> d3
# List elements follow the sorted team values (LAD, then TB)
d3[[1]] %>%
  mutate(Team = "LAD") -> d4a
d3[[2]] %>%
  mutate(Team = "TB") -> d4b
d4 <- rbind(d4a, d4b)
# Reshape to long format and plot, faceted by side of batter
d4 %>%
  pivot_longer(
    cols = L:R,
    names_to = "Stand",
    values_to = "Percentage"
  ) %>%
  ggplot(aes(Pitch_Name, Percentage,
             fill = Team)) +
  geom_col(position = "dodge") +
  coord_polar() +
  facet_wrap(~ Stand, ncol = 2) +
  scale_fill_manual(values = c(LA, TB)) +
  ggtitle("Left-Handed Pitchers") +
  centertitle()
```
Right-handers:
```{r}
# Three-way table for right-handed pitchers: one table per pitching team
d %>%
  filter(p_throws == "R") %>%
  tabyl(Pitch_Name, stand, pitch_team) -> d2
# Convert counts to column percentages within each team's table
d2 %>%
  adorn_percentages("col") %>%
  adorn_rounding(3) -> d3
# List elements follow the sorted team values (LAD, then TB)
d3[[1]] %>%
  mutate(Team = "LAD") -> d4a
d3[[2]] %>%
  mutate(Team = "TB") -> d4b
d4 <- rbind(d4a, d4b)
# Reshape to long format and plot, faceted by side of batter
d4 %>%
  pivot_longer(
    cols = L:R,
    names_to = "Stand",
    values_to = "Percentage"
  ) %>%
  ggplot(aes(Pitch_Name, Percentage,
             fill = Team)) +
  geom_col(position = "dodge") +
  coord_polar() +
  facet_wrap(~ Stand, ncol = 2) +
  scale_fill_manual(values = c(LA, TB)) +
  ggtitle("Right-Handed Pitchers") +
  centertitle()
```
Here are the associated tables:
Pitch types of left-handed pitchers:
```{r}
d %>%
filter(p_throws == "L") %>%
tabyl(Pitch_Name, stand, pitch_team) %>%
adorn_percentages("col") %>%
adorn_rounding(3)
```
Pitch types of right-handed pitchers:
```{r}
d %>%
filter(p_throws == "R") %>%
tabyl(Pitch_Name, stand, pitch_team) %>%
adorn_percentages("col") %>%
adorn_rounding(3)
```