Prediction of 2019 MLB home run total at midseason (through games of June 27)
# 2019 statcast data is in data frame sc2019
# collect number of home runs in each game
library(tidyverse)
sc2019 %>%
  group_by(game_pk) %>%
  summarize(HR = sum(events == "home_run",
                     na.rm = TRUE)) -> S
# construct bar graph of home run distribution
library(TeachBayes)
bar_plot(S$HR) +
  increasefont() +
  ggtitle("Number of Home Runs in a Game - 2019") +
  centertitle() +
  xlab("HRs in Game")
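# played 1211 games so far
# 2430 - 1211 = 1219 remaining games to play
# simulate the rest of the season 10,000 times by resampling
# (with replacement) the observed per-game home run counts
current_hr <- 3311
HR_Predicted <- current_hr +
  replicate(10000,
            sum(sample(S$HR,
                       replace = TRUE,
                       size = 1219)))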
# graph predictions and show 95% interval estimate
(pred_limits <- quantile(HR_Predicted, c(0.025, 0.975)))
bar_plot(HR_Predicted) +
  increasefont() +
  ggtitle("Predicted 2019 Home Runs") +
  centertitle() +
  geom_vline(xintercept = pred_limits, size = 1.5)
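
The resampling step above can be sanity-checked against a normal approximation. Because the predicted total is the current count plus a sum of 1219 independent draws from the observed per-game counts, it should be roughly normal with mean `current_hr + n * mean(HR)` and standard deviation `sqrt(n) * sd(HR)`. The sketch below is self-contained and does not use `sc2019`: the vector `hr_per_game` is a hypothetical stand-in for `S$HR`, simulated from a Poisson distribution purely for illustration, and the 2000 replications are an arbitrary choice to keep it quick.

```r
# Hypothetical stand-in for S$HR: simulated per-game home run counts.
# The Poisson model and its rate are assumptions for illustration only;
# substitute the real S$HR to check the actual prediction.
set.seed(2019)
games_played <- 1211
games_left <- 2430 - games_played   # 1219 games remaining
current_hr <- 3311
hr_per_game <- rpois(games_played, lambda = current_hr / games_played)

# Bootstrap: resample per-game totals for the remaining games
boot_totals <- current_hr +
  replicate(2000,
            sum(sample(hr_per_game,
                       replace = TRUE,
                       size = games_left)))
boot_limits <- quantile(boot_totals, c(0.025, 0.975))

# Normal approximation to the same resampling distribution
approx_mean <- current_hr + games_left * mean(hr_per_game)
approx_sd <- sqrt(games_left) * sd(hr_per_game)
approx_limits <- approx_mean + qnorm(c(0.025, 0.975)) * approx_sd

# The two sets of 95% limits should agree closely
rbind(bootstrap = boot_limits, normal = approx_limits)
```

If the two rows differ noticeably with real data, that would suggest either too few replications or a per-game distribution skewed enough to matter at this sample size.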