Created October 31, 2020 12:19
Using tabyls() function to explore pitch selection in the 2020 World Series
---
title: "tabyls"
author: "Jim Albert"
date: "10/27/2020"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      warning = FALSE,
                      message = FALSE)
```
#### Introduction

- Much of baseball data consists of counts
- Creating tables of these counts (one-way, two-way, three-way)
- Graphing the counts or percentages
- The base ```table()``` function returns a ```table``` object rather than a data frame, so it does not fit well into tidyverse pipelines
- Can instead use the ```count()``` function together with ```group_by()```
- The idea here is to introduce the ```tabyl()``` function, which is part of the ```janitor``` package
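To make the contrast in the bullets above concrete, here is a minimal sketch that builds the same one-way count three ways: base `table()`, dplyr's `count()`, and janitor's `tabyl()`. The `pitches` data frame is hypothetical, made-up data rather than Statcast output.

```{r}
library(dplyr)
library(janitor)

# Hypothetical toy data: six made-up pitches
pitches <- tibble(
  pitch_name = c("Slider", "4-Seam Fastball", "Slider",
                 "Changeup", "4-Seam Fastball", "Slider")
)

# Base R: a "table" object, not a data frame
base_tab <- table(pitches$pitch_name)

# dplyr: a data frame of counts; proportions need an extra mutate()
dplyr_tab <- pitches %>%
  count(pitch_name) %>%
  mutate(percent = n / sum(n))

# janitor: counts (n) and proportions (percent) in one data frame
janitor_tab <- pitches %>%
  tabyl(pitch_name)
janitor_tab
```

Note that the `percent` column returned by `tabyl()` holds proportions between 0 and 1, not values out of 100.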
#### Load five packages

```{r}
library(tidyverse)
library(janitor)
library(teamcolors)
library(ProbBayes)
library(gridExtra)
```
#### Overall pitch types for 2020 pitchers

Let's explore pitch type use among all pitchers in the 2020 season.

Read in the data and combine the "Curveball" and "Knuckle Curve" categories into a single "CurveBall" category.

```{r}
sc2020final <- read_csv("~/Dropbox/2020 WORK/statcast2020/sc2020final.csv")

# Treat curveballs and knuckle curves as one pitch type
sc2020final %>%
  mutate(Pitch_Name =
           ifelse(pitch_name %in%
                    c("Curveball", "Knuckle Curve"),
                  "CurveBall", pitch_name)) ->
  sc2020final
```
Graph of all pitch types.

```{r}
sc2020final %>%
  tabyl(Pitch_Name) %>%
  ggplot(aes(Pitch_Name, percent)) +
  geom_col(fill = "red",
           color = "white") +
  coord_polar()
```
Alternative display not using polar coordinates.

```{r}
sc2020final %>%
  tabyl(Pitch_Name) %>%
  ggplot(aes(Pitch_Name, percent)) +
  geom_col(fill = "red",
           color = "white")
```
Table of the fraction of each pitch type thrown to left- and right-handed hitters (column proportions within each batter side).

```{r}
sc2020final %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_percentages("col") %>%
  adorn_rounding(3)
```
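With `adorn_percentages("col")`, each count is divided by its column total, so every batter-side column sums to 1. A minimal check on hypothetical toy data (`toy` is made up for illustration):

```{r}
library(dplyr)
library(janitor)

# Hypothetical toy data: pitch type and batter side for six pitches
toy <- tibble(
  Pitch_Name = c("Slider", "Slider", "Changeup", "Slider", "Changeup", "Sinker"),
  stand      = c("L", "R", "L", "R", "R", "R")
)

col_pct <- toy %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_percentages("col")

col_pct
# Each column of proportions sums to 1
colSums(col_pct[, c("L", "R")])
```

Using `"row"` instead would make each pitch type's row sum to 1, which answers a different question: how one pitch type is split between the two batter sides.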
Graph of pitch types faceted by side of batter.

```{r}
sc2020final %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_percentages("col") %>%
  pivot_longer(
    cols = L:R,
    names_to = "Stand",
    values_to = "Percentage"
  ) %>%
  ggplot(aes(Pitch_Name, Percentage)) +
  geom_col(fill = "yellow",
           color = "brown") +
  coord_polar() +
  facet_wrap(~ Stand, ncol = 2) +
  ggtitle("All Pitchers") +
  centertitle()
```
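`ggplot()` needs one row per bar, so the wide tabyl (one column per batter side) is reshaped with `pivot_longer()`. A sketch on hypothetical toy data shows the change in shape:

```{r}
library(dplyr)
library(tidyr)
library(janitor)

# Hypothetical toy data
toy <- tibble(
  Pitch_Name = c("Slider", "Slider", "Changeup", "Slider", "Changeup", "Sinker"),
  stand      = c("L", "R", "L", "R", "R", "R")
)

wide <- toy %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_percentages("col")   # 3 rows: one per pitch type

long <- wide %>%
  pivot_longer(
    cols = L:R,
    names_to = "Stand",
    values_to = "Percentage"
  )                          # 6 rows: one per pitch type and batter side

long
```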
Now let's break down this comparison by the pitcher's throwing arm.

Left-handed pitchers:

```{r}
sc2020final %>%
  filter(p_throws == "L") %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_percentages("col") %>%
  adorn_rounding(3)
```

Right-handed pitchers:

```{r}
sc2020final %>%
  filter(p_throws == "R") %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_percentages("col") %>%
  adorn_rounding(3)
```
Graph for right-handed pitchers.

```{r}
sc2020final %>%
  filter(p_throws == "R") %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_percentages("col") %>%
  pivot_longer(
    cols = L:R,
    names_to = "Stand",
    values_to = "Percentage"
  ) %>%
  ggplot(aes(Pitch_Name, Percentage)) +
  geom_col(fill = "yellow",
           color = "brown") +
  coord_polar() +
  facet_wrap(~ Stand, ncol = 2) +
  ggtitle("Right-Handed Pitchers") +
  centertitle()
```
Takeaways for RH pitchers:

- Against left-handed batters, the off-speed pitches changeups (16%), sliders (13%) and curveballs (12%) are thrown with roughly equal frequency, with changeups the most common. Four-seam fastballs are thrown about 36% of the time.
- Against right-handed batters, sliders (24%) are the dominant off-speed pitch, followed by curveballs (11%); changeups (6%) are uncommon. Four-seamers are thrown about 33% of the time.
Graph for left-handed pitchers:

```{r}
sc2020final %>%
  filter(p_throws == "L") %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_percentages("col") %>%
  pivot_longer(
    cols = L:R,
    names_to = "Stand",
    values_to = "Percentage"
  ) %>%
  ggplot(aes(Pitch_Name, Percentage)) +
  geom_col(fill = "yellow",
           color = "brown") +
  coord_polar() +
  facet_wrap(~ Stand, ncol = 2) +
  ggtitle("Left-Handed Pitchers") +
  centertitle()
```
Takeaways for LH pitchers:

- Against left-handed batters, sliders (22%) are the most common off-speed pitch, followed by curveballs (12%) and changeups (5%). Among fastballs, four-seamers and sinkers are common, thrown 34% and 20% of the time respectively.
- Against right-handed batters, changeups (19%) are the most common off-speed pitch, followed by sliders (12%) and curveballs (11%). Four-seamers are thrown about 35% of the time.
#### World Series Data

- Read in the 2020 World Series data
- Obtained using the ```baseballr``` package

```{r}
d <- read_csv("../2020 playoffs/ws2020.csv")
```
Find team colors using the ```teamcolors``` package.

```{r}
# Dodgers primary color and Rays tertiary color, used as bar fills
LA <- teamcolors %>%
  filter(name == "Los Angeles Dodgers") %>%
  pull(primary)
TB <- teamcolors %>%
  filter(name == "Tampa Bay Rays") %>%
  pull(tertiary)
```
#### Basic Use of the tabyl() Function - One Categorical Variable

```{r}
d %>%
  tabyl(pitch_name)
```
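If the variable has missing values, `tabyl()` adds an `NA` row and a `valid_percent` column whose denominator excludes the missing values. A sketch on made-up data (`toy_na` is hypothetical):

```{r}
library(dplyr)
library(janitor)

# Hypothetical toy data with one missing pitch name
toy_na <- tibble(pitch_name = c("Slider", "Slider", "Changeup", NA))

na_tab <- toy_na %>%
  tabyl(pitch_name)

# percent divides by all 4 pitches; valid_percent by the 3 non-missing ones
na_tab
```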
#### Some Recoding

Let's recode the pitch name variable so that Curveball and Knuckle Curve are both treated as curveballs.

```{r}
d %>%
  mutate(Pitch_Name =
           ifelse(pitch_name %in%
                    c("Curveball", "Knuckle Curve"),
                  "CurveBall", pitch_name)) -> d
```
Redo the earlier table:

```{r}
d %>%
  tabyl(Pitch_Name)
```
#### Graph Output

```{r}
d %>%
  tabyl(Pitch_Name) %>%
  ggplot(aes(Pitch_Name, percent)) +
  geom_col()
```
#### Two Categorical Variables

Identify the team that is pitching:

```{r}
# In the top of an inning the away team bats, so the home team is pitching
d %>%
  mutate(pitch_team = ifelse(inning_topbot == "Top",
                             home_team, away_team)) -> d
```
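The rule depends only on whether `inning_topbot` equals "Top"; every other value (assumed here to be "Bot") is treated as the bottom half, where the away team pitches. A quick check on hypothetical rows:

```{r}
library(dplyr)

# Hypothetical rows: two innings at a Rays home game, one at a Dodgers home game
toy_innings <- tibble(
  inning_topbot = c("Top", "Bot", "Top"),
  home_team     = c("TB", "TB", "LAD"),
  away_team     = c("LAD", "LAD", "TB")
)

toy_innings <- toy_innings %>%
  mutate(pitch_team = ifelse(inning_topbot == "Top",
                             home_team, away_team))
toy_innings$pitch_team
```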
Tabulate by pitch name and pitching team:

```{r}
(d %>%
   tabyl(Pitch_Name, pitch_team) -> d1)
```
#### Graph

- Basic bar plot, faceted by team.

```{r}
d1 %>%
  pivot_longer(
    cols = LAD:TB,
    names_to = "Team",
    values_to = "Count"
  ) %>%
  ggplot(aes(Pitch_Name, Count)) +
  geom_col() +
  facet_wrap(~ Team, ncol = 1) +
  coord_flip()
```
- Better to use "dodge" positioning, which places the two teams' bars side by side:

```{r}
d1 %>%
  pivot_longer(
    cols = LAD:TB,
    names_to = "Team",
    values_to = "Count"
  ) %>%
  ggplot(aes(Pitch_Name, Count,
             fill = Team)) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_fill_manual(values = c(LA, TB))
```
- Pie graphs?

```{r}
d1 %>%
  pivot_longer(
    cols = LAD:TB,
    names_to = "Team",
    values_to = "Count"
  ) %>%
  ggplot(aes(Pitch_Name, Count,
             fill = Team)) +
  geom_col(position = "dodge") +
  coord_polar() +
  scale_fill_manual(values = c(LA, TB))
```
#### Three-Way Tables

Categorize by pitching team, pitch type and side of batter.

```{r}
(d %>%
   tabyl(Pitch_Name, stand, pitch_team) -> d2)
```
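With three variables, `tabyl()` returns a list of two-way tabyls, one per level of the third variable and named by those levels. A sketch with hypothetical data (`toy3` is made up):

```{r}
library(dplyr)
library(janitor)

# Hypothetical toy data: four pitches from two teams
toy3 <- tibble(
  Pitch_Name = c("Slider", "Changeup", "Slider", "Sinker"),
  stand      = c("L", "R", "R", "L"),
  pitch_team = c("LAD", "LAD", "TB", "TB")
)

tab3 <- toy3 %>%
  tabyl(Pitch_Name, stand, pitch_team)

names(tab3)     # one element per pitching team
tab3[["LAD"]]   # pitch type by batter side for the Dodgers
```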
#### adorn Functions

- ```adorn_totals()``` - add row and/or column totals

```{r}
d %>%
  tabyl(Pitch_Name, pitch_team) %>%
  adorn_totals(c("row", "col"))
```
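By default, `adorn_totals()` labels the added row and column "Total". On hypothetical toy counts:

```{r}
library(dplyr)
library(janitor)

# Hypothetical toy data
toy <- tibble(
  Pitch_Name = c("Slider", "Slider", "Changeup", "Slider", "Changeup", "Sinker"),
  stand      = c("L", "R", "L", "R", "R", "R")
)

totals <- toy %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_totals(c("row", "col"))

totals
```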
- ```adorn_percentages()``` - convert counts to row or column proportions

```{r}
d %>%
  tabyl(Pitch_Name, pitch_team) %>%
  adorn_percentages("col")
```
- ```adorn_pct_formatting()``` - format proportions as percentages

```{r}
d %>%
  tabyl(Pitch_Name, pitch_team) %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting()
```
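`adorn_pct_formatting()` multiplies by 100 and returns character strings such as "50.0%" (one decimal place by default), so it is best applied last, after any numeric work or plotting. A sketch on hypothetical toy data:

```{r}
library(dplyr)
library(janitor)

# Hypothetical toy data
toy <- tibble(
  Pitch_Name = c("Slider", "Slider", "Changeup", "Slider", "Changeup", "Sinker"),
  stand      = c("L", "R", "L", "R", "R", "R")
)

fmt <- toy %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting()

fmt   # cells are now character, e.g. "50.0%"
```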
- ```adorn_rounding()``` - round tabular output

```{r}
d %>%
  tabyl(Pitch_Name, pitch_team) %>%
  adorn_percentages("col") %>%
  adorn_rounding(2)
```
- ```adorn_ns()``` - add counts to percentages

```{r}
d %>%
  tabyl(Pitch_Name, pitch_team) %>%
  adorn_percentages("col") %>%
  adorn_rounding(3) %>%
  adorn_ns()
```
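`adorn_ns()` pastes the underlying counts, which `tabyl()` keeps in the table's "core" attribute, next to each proportion, so the resulting cells are character strings. A quick check on hypothetical toy data:

```{r}
library(dplyr)
library(janitor)

# Hypothetical toy data
toy <- tibble(
  Pitch_Name = c("Slider", "Slider", "Changeup", "Slider", "Changeup", "Sinker"),
  stand      = c("L", "R", "L", "R", "R", "R")
)

with_ns <- toy %>%
  tabyl(Pitch_Name, stand) %>%
  adorn_percentages("col") %>%
  adorn_rounding(3) %>%
  adorn_ns()

with_ns   # each cell shows a proportion followed by its count in parentheses
```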
- ```adorn_title()``` - add a title

```{r}
d %>%
  tabyl(Pitch_Name, pitch_team) %>%
  adorn_percentages("col") %>%
  adorn_rounding(3) %>%
  adorn_ns() %>%
  adorn_title("top", "PITCH TYPE")
```
#### adorn Functions on a Three-Way Table

```{r}
(d2 %>%
   adorn_percentages("col") %>%
   adorn_rounding(3) -> d3)
```
#### Graph of a Three-Way Table?

- facet by side of batter
- pie chart comparing percentages

```{r}
# d3 is a list of tables named by pitching team; tag each with its team and stack
d3[["LAD"]] %>%
  mutate(Team = "LAD") -> d4a
d3[["TB"]] %>%
  mutate(Team = "TB") -> d4b
(d4 <- rbind(d4a, d4b))
```
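Indexing the list one team at a time works for two teams. A more general sketch, assuming the list is named by team as `tabyl()` names it, strips the tabyl attributes with janitor's `untabyl()` and stacks every element with `dplyr::bind_rows()`, turning the list names into a `Team` column. It is shown on hypothetical toy data:

```{r}
library(dplyr)
library(janitor)

# Hypothetical toy data
toy3 <- tibble(
  Pitch_Name = c("Slider", "Changeup", "Slider", "Sinker"),
  stand      = c("L", "R", "R", "L"),
  pitch_team = c("LAD", "LAD", "TB", "TB")
)

stacked <- toy3 %>%
  tabyl(Pitch_Name, stand, pitch_team) %>%
  adorn_percentages("col") %>%
  lapply(untabyl) %>%        # plain data frames; list names are kept
  bind_rows(.id = "Team")    # list names become the Team column

stacked
```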
```{r}
d4 %>%
  pivot_longer(
    cols = L:R,
    names_to = "Stand",
    values_to = "Percentage"
  ) %>%
  ggplot(aes(Pitch_Name, Percentage,
             fill = Team)) +
  geom_col(position = "dodge") +
  coord_polar() +
  facet_wrap(~ Stand, ncol = 2) +
  scale_fill_manual(values = c(LA, TB))
```
#### Breakdown by Side of Pitching Arm

Left-handers:

```{r}
d %>%
  filter(p_throws == "L") %>%
  tabyl(Pitch_Name, stand, pitch_team) -> d2
d2 %>%
  adorn_percentages("col") %>%
  adorn_rounding(3) -> d3
# Tag each team's table and stack them
d3[["LAD"]] %>%
  mutate(Team = "LAD") -> d4a
d3[["TB"]] %>%
  mutate(Team = "TB") -> d4b
d4 <- rbind(d4a, d4b)
d4 %>%
  pivot_longer(
    cols = L:R,
    names_to = "Stand",
    values_to = "Percentage"
  ) %>%
  ggplot(aes(Pitch_Name, Percentage,
             fill = Team)) +
  geom_col(position = "dodge") +
  coord_polar() +
  facet_wrap(~ Stand, ncol = 2) +
  scale_fill_manual(values = c(LA, TB)) +
  ggtitle("Left-Handed Pitchers") +
  centertitle()
```
Right-handers:

```{r}
d %>%
  filter(p_throws == "R") %>%
  tabyl(Pitch_Name, stand, pitch_team) -> d2
d2 %>%
  adorn_percentages("col") %>%
  adorn_rounding(3) -> d3
# Tag each team's table and stack them
d3[["LAD"]] %>%
  mutate(Team = "LAD") -> d4a
d3[["TB"]] %>%
  mutate(Team = "TB") -> d4b
d4 <- rbind(d4a, d4b)
d4 %>%
  pivot_longer(
    cols = L:R,
    names_to = "Stand",
    values_to = "Percentage"
  ) %>%
  ggplot(aes(Pitch_Name, Percentage,
             fill = Team)) +
  geom_col(position = "dodge") +
  coord_polar() +
  facet_wrap(~ Stand, ncol = 2) +
  scale_fill_manual(values = c(LA, TB)) +
  ggtitle("Right-Handed Pitchers") +
  centertitle()
```
Here are the associated tables.

Pitch types of left-handed pitchers:

```{r}
d %>%
  filter(p_throws == "L") %>%
  tabyl(Pitch_Name, stand, pitch_team) %>%
  adorn_percentages("col") %>%
  adorn_rounding(3)
```
Pitch types of right-handed pitchers:

```{r}
d %>%
  filter(p_throws == "R") %>%
  tabyl(Pitch_Name, stand, pitch_team) %>%
  adorn_percentages("col") %>%
  adorn_rounding(3)
```
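The left- and right-handed breakdowns repeat the same pipeline with only the throwing arm changed. One way to avoid the duplication is a small helper function. `arm_pitch_mix()` below is hypothetical (not part of janitor or the original analysis) and is shown on made-up data. It returns the long-format data that the polar plots use, and it assumes both batter sides appear in the filtered data so that the `L` and `R` columns exist.

```{r}
library(dplyr)
library(tidyr)
library(janitor)

# Hypothetical helper: column proportions of pitch type by batter side,
# stacked across pitching teams, in long format ready for ggplot()
arm_pitch_mix <- function(data, arm) {
  data %>%
    filter(p_throws == arm) %>%
    tabyl(Pitch_Name, stand, pitch_team) %>%
    adorn_percentages("col") %>%
    lapply(untabyl) %>%
    bind_rows(.id = "Team") %>%
    pivot_longer(
      cols = c(L, R),
      names_to = "Stand",
      values_to = "Percentage"
    )
}

# Hypothetical toy data
toy_ws <- tibble(
  Pitch_Name = c("Slider", "Changeup", "Slider", "Sinker", "Curveball", "Slider"),
  stand      = c("L", "R", "R", "L", "L", "R"),
  pitch_team = c("LAD", "LAD", "TB", "TB", "LAD", "TB"),
  p_throws   = c("L", "L", "L", "L", "R", "R")
)

lefties <- arm_pitch_mix(toy_ws, "L")
lefties
```

With the World Series data, `arm_pitch_mix(d, "L")` could then be piped into the same `ggplot()` call used for the arm breakdowns.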