Skip to content

Instantly share code, notes, and snippets.

@bayesball
Created November 13, 2022 22:58
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bayesball/3595f541b57eed6759326d0aa27ce8b0 to your computer and use it in GitHub Desktop.
Save bayesball/3595f541b57eed6759326d0aa27ce8b0 to your computer and use it in GitHub Desktop.
R code to fit nonnested multilevel model to compare the roles of offense and defense in baseball run scoring
fit_model <- function(season){
# required packages
require(dplyr)
require(readr)
require(lme4)
# read in retrosheet game logs for that season
# from my hard drive
# you can obtain these files from retrosheet.org
name1 <- "~/Dropbox/Google Drive/gamelogs/gamelogs/gl"
file_name <- paste(name1, season, ".txt", sep = "")
gl_season <- read_csv(file_name)
# create response and off, def team variables
gl_season %>%
mutate(RootRuns = sqrt(HomeRunsScore),
OffensiveTeam = HomeTeam,
DefensiveTeam = VisitingTeam) %>%
select(RootRuns, OffensiveTeam,
DefensiveTeam) -> d1
# again for visiting runs scored
gl_season %>%
mutate(RootRuns = sqrt(VisitorRunsScored),
OffensiveTeam = VisitingTeam,
DefensiveTeam = HomeTeam) %>%
select(RootRuns, OffensiveTeam,
DefensiveTeam) -> d2
# row merge two datasets
d12 <- rbind(d1, d2)
# random effects model fit
fit <- lmer(RootRuns ~ (1 | OffensiveTeam) +
(1 | DefensiveTeam),
data = d12)
# collect sd estimates
vcomp <- VarCorr(fit)
sd_off <- sqrt(vcomp$OffensiveTeam[1, 1])
sd_def <- sqrt(vcomp$DefensiveTeam[1, 1])
# return data frame
data.frame(Season = season,
Offensive = sd_off,
Defense = sd_def)
}
fit_model_home <- function(season){
# required packages
require(dplyr)
require(readr)
require(lme4)
# read in retrosheet game logs for that season
# from my hard drive
# you can obtain these files from retrosheet.org
name1 <- "~/Dropbox/Google Drive/gamelogs/gamelogs/gl"
file_name <- paste(name1, season, ".txt", sep = "")
gl_season <- read_csv(file_name)
# create response and off, def team variables
gl_season %>%
mutate(RootRuns = sqrt(HomeRunsScore),
OffensiveTeam = HomeTeam,
DefensiveTeam = VisitingTeam,
Home = "yes") %>%
select(RootRuns, OffensiveTeam,
DefensiveTeam, Home) -> d1
# again for visiting runs scored
gl_season %>%
mutate(RootRuns = sqrt(VisitorRunsScored),
OffensiveTeam = VisitingTeam,
DefensiveTeam = HomeTeam,
Home = "no") %>%
select(RootRuns, OffensiveTeam,
DefensiveTeam, Home) -> d2
# row merge two datasets
d12 <- rbind(d1, d2)
# random effects model fit
fit <- lmer(RootRuns ~ (1 | OffensiveTeam) +
(1 | DefensiveTeam) +
Home,
data = d12)
# collect sd estimates
vcomp <- VarCorr(fit)
sd_off <- sqrt(vcomp$OffensiveTeam[1, 1])
sd_def <- sqrt(vcomp$DefensiveTeam[1, 1])
beta <- coef(fit)
# return data frame
data.frame(Season = season,
Offensive = sd_off,
Defense = sd_def,
Home = beta$OffensiveTeam[1, 2])
}
# load in required packages
library(purrr)
library(ggplot2)
library(tidyr)
library(dplyr)
# read in modeling functions
source("fit_model.R")
source("fit_model_home.R")
# run this function for seasons 1970-2021 (omitting 2020)
# collecting sd estimates
out <- map_df(c(1970:2019, 2021), fit_model)
# one plot -- plot of offensive and defensive estimates
# first pivot longer the output data frame
out %>%
pivot_longer(
cols = Offensive:Defense,
names_to = "Type",
values_to = "SD"
) -> out1
# here is the plot
ggplot(out1, aes(Season, SD,
color = Type)) +
geom_point() +
geom_smooth(se = FALSE,
method = "loess",
formula = "y ~ x") +
theme(text=element_text(size=18)) +
theme(plot.title = element_text(colour = "blue",
size = 18,
hjust = 0.5,
vjust = 0.8, angle = 0)) +
ggtitle("Random Effect SD Estimates: 1970-2019, 2021")
# another plot - plot of ratio
# of sd estimates
ggplot(out, aes(Season,
Defense / Offensive)) +
geom_point() +
geom_smooth(se = FALSE,
method = "loess",
formula = "y ~ x") +
geom_hline(yintercept = 1, color = "red") +
theme(text=element_text(size=18)) +
theme(plot.title = element_text(colour = "blue",
size = 18,
hjust = 0.5,
vjust = 0.8, angle = 0)) +
ggtitle("Ratio of Defensive to Offensive SD Estimates: 1970-2019, 2021")
###### look at home effects
out_home <- map_df(c(1970:2019, 2021), fit_model_home)
ggplot(out_home, aes(Season, Home)) +
geom_point() +
geom_smooth(se = FALSE,
method = "loess",
formula = "y ~ x") +
theme(text=element_text(size=18)) +
ylab("Home Coefficient") +
theme(plot.title = element_text(colour = "blue",
size = 18,
hjust = 0.5,
vjust = 0.8, angle = 0)) +
ggtitle("Home Park Effects: 1970-2019, 2021")
@GeorgeMcGinn
Copy link

I ran your code and it fails with ! object 'HomeRunsScore' not found. I am assuming that in your datasets in ~/Dropbox/Google Drive/gamelogs/gamelogs/gl you've added the headers for your .csv files. I am wondering if you can post that here as a file (I plan to add a col_names = pull( ...` to my code to add the header names so the objects in the source can be found. Thanks.

@bayesball
Copy link
Author

bayesball commented Nov 17, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment