wessonmo/qb_hits.md

## qb_hits.md

      
    Raw
  

              qb_hits.md
            
          
    An Alternate Approach to Measuring the Effect of Accumulated QB Hits on Passing Performance

This analysis is a follow-up to @benbbaldwin's post The accumulation of QB hits vs passing efficiency where his conclusion was that:

[For] a given level of hits, there is no evidence that the accumulation of hits makes any difference throughout the course of a game.

As the title suggests, this analysis will take a different approach to measuring the hypothesized effect.
1. Data

Here is the code to get the passing data we'll be using from nflscrapR-data:
library(tidyverse)
library(dplyr)
library(MLmetrics)
library(lme4)


passes <- data.frame()
for (year in c(2009:2018)){
  pbp_path <- paste0(
    'https://github.com/ryurko/nflscrapR-data/raw/master/play_by_play_data/regular_season/reg_pbp_',
    year,
    '.csv'
    )
  
  season_passes <- read_csv(url(pbp_path)) %>% 
    filter(!is.na(epa), !is.na(ep)) %>%
    mutate(
      pass = ifelse(
        play_type == 'pass'
        | (play_type %in% c('no_play', 'run') & str_detect(desc, "((?<!Backwards )pass)|(sacked)|(scramble)")),
        1, 0)
      ) %>%
    filter(pass == 1, !is.na(down)) %>%
    mutate(season = year) %>%
    select(season, game_id, game_date, game_seconds_remaining, game_half, half_seconds_remaining,
           score_differential, play_id, posteam, posteam_type, defteam, play_type, passer_player_id,
           passer_player_name, qb_hit, sack, down, ydstogo, yardline_100, ep, epa, desc)
    
  passes <- rbind(passes, season_passes)
  rm(season_passes)
}
Ben does his analysis at the team level but this analysis is going to be at the player level. As such, we want as little missing data in the passer_player_id column as possible.
nrow(passes %>% filter(is.na(passer_player_id)))/nrow(passes)
# [1] 0.08276332
Out of the box, about 8% of the passing plays in the dataset do not have an associated passer_player_id, however, most of the play descriptions mention a passer. So, it's just a matter of parsing the names from the descriptions.
To do this, we will borrow from Ben's Basic nflscrapR tutorial the regex string used to extract the passer name from the play descriptions.
passer_name_re <- paste0(
  '(?<=\\s)',
  '[A-Z][a-z]*\\.\\s?[A-Z][A-z]+(\\s(I{2,3})|(IV))?', 
  '(?=\\s((pass)|(sack)|(scramble)))'
  )
passes <- passes %>%
  mutate(
    missing_passer_name = ifelse(
      is.na(passer_player_name),
      str_extract(desc, passer_name_re),
      NA_character_
      )
    )

passer_ids <- passes %>%
  filter(!is.na(passer_player_id)) %>%
  distinct(season, posteam, passer_player_name, passer_player_id) %>%
  rename(
    missing_passer_name = passer_player_name,
    missing_passer_id = passer_player_id
    )

passes <- passes %>%
  left_join(passer_ids, by = c('season', 'posteam', 'missing_passer_name')) %>%
  mutate(
    passer_player_name = ifelse(!is.na(passer_player_name), passer_player_name, missing_passer_name),
    passer_player_id = ifelse(!is.na(passer_player_id), passer_player_id, missing_passer_id)
  ) %>%
  select(-missing_passer_name, -missing_passer_id)
Once we get down to an acceptable number, we can just filter the missing plays out.
nrow(passes %>% filter(is.na(passer_player_id)))/nrow(passes)
[1] 0.0006718434

passes <- passes %>%
  filter(!is.na(passer_player_id))
Finally, we'll need to create the cumulative hit variables. I also want to look at cumulative sacks and non-sack hits, so we'll create accumulated variables for them, too.
passes <- passes %>%
  arrange(game_id, passer_player_id, play_id) %>%
  group_by(game_id, passer_player_id) %>%
  mutate(
    cum_hits = cumsum(qb_hit),
    cum_non_sack_hits = cumsum(ifelse(sack == 0 & qb_hit == 1, 1, 0)),
    cum_sacks = cumsum(sack)
  ) %>%
  ungroup() %>%
  mutate(
    cum_hits = cum_hits - ifelse(qb_hit == 1, 1, 0),
    cum_sacks = cum_sacks - ifelse(sack == 1, 1, 0),
    cum_non_sack_hits = cum_non_sack_hits - ifelse(sack == 0 & qb_hit == 1, 1, 0)
  )
2. Analysis

In Ben's analysis, he showed the relationship between mean_epa and total_hits at a game level:
#    total_hits mean_epa games
#         <int>    <dbl> <int>
#  1          0  0.256     120
#  2          1  0.214     354
#  3          2  0.166     596
#  4          3  0.112     762
#  5          4  0.0758    771
#  6          5  0.0353    754
#  7          6  0.0202    557
#  8          7 -0.00527   428
#  9          8 -0.0383    298
# 10          9 -0.0681    192
# 11         10 -0.0463    116
# 12         11 -0.0795     81
# 13         12 -0.162      38
# 14         13 -0.124      30
# 15         14 -0.0417     11
# 16         15 -0.242       6
# 17         16 -0.184       4
He goes on to say:

This is picking up, in part, a game script effect, where overmatched teams fall behind early and are forced to pass a lot, resulting in their QB being hit more often...We want to create a level playing field.

In our analysis, we will certainly want to account for the fact that more passes result in more (opportunities for) hits. Ben also mentions teams that are overmatched, so maybe we can account for team offensive and defensive strengths, as well.
Approach

In order to account for "more passes = more hits", Ben aggregates team-game performances by the total number of hits. My approach will utilize regression models where individual passes are the observations, the resulting epa values are the target, and aggregation is, therefore, unnecessary.
Linear Models

Let's build a base model using the context in the play-by-play. In order to test the validity of the model, we'll split our dataset into train and test sets.
smp_size <- floor(0.65 * nrow(passes))
set.seed(69)
train_ind <- sample(seq_len(nrow(passes)), size = smp_size)

train <- passes[train_ind, ]
test <- passes[-train_ind, ]

base_formula <- epa ~ posteam_type + down_fac*ydstogo + yardline_100 + ep
base_fit <- lm(base_formula, train)
summary(base_fit)
# Call:
# lm(formula = base_formula, data = train)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -12.5282  -0.8374  -0.2543   0.9490   9.2732 
# 
# Coefficients:
#                     Estimate Std. Error t value Pr(>|t|)    
# (Intercept)        1.4120844  0.0718716  19.647  < 2e-16 ***
# posteam_typehome   0.0429638  0.0089117   4.821 1.43e-06 ***
# down_fac2         -0.0589676  0.0388494  -1.518   0.1291    
# down_fac3         -0.2761082  0.0390585  -7.069 1.57e-12 ***
# down_fac4         -0.3774475  0.0692393  -5.451 5.01e-08 ***
# ydstogo           -0.0158859  0.0033357  -4.762 1.92e-06 ***
# yardline_100      -0.0129320  0.0006947 -18.616  < 2e-16 ***
# ep                -0.2252849  0.0104915 -21.473  < 2e-16 ***
# down_fac2:ydstogo -0.0060017  0.0037805  -1.588   0.1124    
# down_fac3:ydstogo -0.0076389  0.0037465  -2.039   0.0415 *  
# down_fac4:ydstogo -0.0518295  0.0076304  -6.793 1.11e-11 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 1.625 on 133070 degrees of freedom
# Multiple R-squared:  0.005654,	Adjusted R-squared:  0.005579 
# F-statistic: 75.66 on 10 and 133070 DF,  p-value: < 2.2e-16

R2_Score(predict(base_fit, test), test$epa)
# [1] 0.005795776
While the overall R2 obviously is not great, it is still better than nothing. Now we can test our accumulated hit variables. Starting with cum_hits:
hits_formula <- epa ~ posteam_type + down_fac*ydstogo + yardline_100 + ep + cum_hits
hits_fit <- lm(hits_formula, train)
summary(hits_fit)
# Call:
# lm(formula = hits_formula, data = train)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -12.5375  -0.8386  -0.2542   0.9479   9.2702 
# 
# Coefficients:
#                     Estimate Std. Error t value Pr(>|t|)    
# (Intercept)        1.4376849  0.0725113  19.827  < 2e-16 ***
# posteam_typehome   0.0415764  0.0089268   4.658 3.20e-06 ***
# down_fac2         -0.0606015  0.0388534  -1.560  0.11882    
# down_fac3         -0.2798543  0.0390830  -7.161 8.08e-13 ***
# down_fac4         -0.3781659  0.0692383  -5.462 4.72e-08 ***
# ydstogo           -0.0159250  0.0033356  -4.774 1.81e-06 ***
# yardline_100      -0.0130787  0.0006969 -18.768  < 2e-16 ***
# ep                -0.2272435  0.0105170 -21.607  < 2e-16 ***
# cum_hits          -0.0052878  0.0019876  -2.660  0.00781 ** 
# down_fac2:ydstogo -0.0059233  0.0037806  -1.567  0.11717    
# down_fac3:ydstogo -0.0074938  0.0037468  -2.000  0.04550 *  
# down_fac4:ydstogo -0.0511072  0.0076350  -6.694 2.18e-11 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 1.625 on 133069 degrees of freedom
# Multiple R-squared:  0.005707,	Adjusted R-squared:  0.005625 
# F-statistic: 69.43 on 11 and 133069 DF,  p-value: < 2.2e-16

R2_Score(predict(hits_fit, test), test$epa)
# [1] 0.005863101
Including cum_hits in the model does give a better fit. Next let's try cum_sacks:
sacks_formula <- epa ~ posteam_type + down_fac*ydstogo + yardline_100 + ep + cum_sacks
sacks_fit <- lm(sacks_formula, train)
summary(sacks_fit)
# Call:
# lm(formula = sacks_formula, data = train)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -12.5371  -0.8390  -0.2541   0.9480   9.2584 
# 
# Coefficients:
#                     Estimate Std. Error t value Pr(>|t|)    
# (Intercept)        1.4382681  0.0722595  19.904  < 2e-16 ***
# posteam_typehome   0.0420994  0.0089148   4.722 2.33e-06 ***
# down_fac2         -0.0615970  0.0388551  -1.585 0.112900    
# down_fac3         -0.2813389  0.0390856  -7.198 6.14e-13 ***
# down_fac4         -0.3789212  0.0692377  -5.473 4.44e-08 ***
# ydstogo           -0.0159563  0.0033356  -4.784 1.72e-06 ***
# yardline_100      -0.0130707  0.0006958 -18.785  < 2e-16 ***
# ep                -0.2272693  0.0105064 -21.631  < 2e-16 ***
# cum_sacks         -0.0119597  0.0034285  -3.488 0.000486 ***
# down_fac2:ydstogo -0.0058117  0.0037808  -1.537 0.124253    
# down_fac3:ydstogo -0.0073112  0.0037475  -1.951 0.051065 .  
# down_fac4:ydstogo -0.0507428  0.0076364  -6.645 3.05e-11 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 1.625 on 133069 degrees of freedom
# Multiple R-squared:  0.005745,	Adjusted R-squared:  0.005663 
# F-statistic:  69.9 on 11 and 133069 DF,  p-value: < 2.2e-16

R2_Score(predict(sacks_fit, test), test$epa)
# [1] 0.005801378
While cum_sacks does result in a higher R2 for the model, the R2 on the test set is slightly lower. Finally, let's try combining cum_sacks and cum_non_sack_hits:
comb_formula <- epa ~ posteam_type + down_fac*ydstogo + yardline_100 + ep + cum_sacks + cum_non_sack_hits
comb_fit <- lm(comb_formula, train)
summary(comb_fit)
# Call:
# lm(formula = comb_formula, data = train)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -12.5385  -0.8389  -0.2544   0.9477   9.2600 
# 
# Coefficients:
#                     Estimate Std. Error t value Pr(>|t|)    
# (Intercept)        1.4417973  0.0725317  19.878  < 2e-16 ***
# posteam_typehome   0.0418159  0.0089290   4.683 2.83e-06 ***
# down_fac2         -0.0616384  0.0388553  -1.586  0.11266    
# down_fac3         -0.2816011  0.0390885  -7.204 5.87e-13 ***
# down_fac4         -0.3789051  0.0692379  -5.473 4.44e-08 ***
# ydstogo           -0.0159554  0.0033356  -4.783 1.73e-06 ***
# yardline_100      -0.0130930  0.0006969 -18.787  < 2e-16 ***
# ep                -0.2275456  0.0105180 -21.634  < 2e-16 ***
# cum_sacks         -0.0113891  0.0035755  -3.185  0.00145 ** 
# cum_non_sack_hits -0.0016574  0.0029474  -0.562  0.57389    
# down_fac2:ydstogo -0.0058219  0.0037808  -1.540  0.12360    
# down_fac3:ydstogo -0.0073241  0.0037476  -1.954  0.05066 .  
# down_fac4:ydstogo -0.0507045  0.0076367  -6.640 3.16e-11 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 1.625 on 133068 degrees of freedom
# Multiple R-squared:  0.005747,	Adjusted R-squared:  0.005658 
# F-statistic:  64.1 on 12 and 133068 DF,  p-value: < 2.2e-16

R2_Score(predict(comb_fit, test), test$epa)
# [1] 0.005818752
In addition to all of the R2 values being roughly the same as before, cum_non_sack_hits has a fairly insignificant p-value. Of the three models that we've tested with accumulated hit variables, it is unclear to me which is best (or if any of them are actually better than the base model).
So, let's try adding some more context.
Linear Mixed-Effects Models

Toward the end of his analysis, Ben talks about the relationship between efficiency and number of hits:

The negative relationship between QB hits and efficiency is because the group of teams that get hit often are the only ones to make it to the high numbers of hits. Stated this way, it sounds obvious, but it's important. These teams aren't necessarily inefficient because their QBs are getting hit a lot; but rather, their QBs are getting hit a lot because they're bad teams to begin with.

Maybe we should try to account for just how good each offense and defense are. One way to do this is with linear mixed-effects models. If you are unfamiliar with these types of models, I would reccomend checking out the piece that @StatsByLopez wrote on estimating fumble frequencies during DeflateGate.
In our model(s), we treat all of the variables from our simple linear models as fixed effects and the offenses and defenses as random effects. Offenses will be split by season and QB while defenses will be split by season.
First, we'll build our offensive and defensive ids and resplit our dataset using the same seed as before:
passes <- passes %>%
  mutate(
    offense = as.factor(paste(season, passer_player_id, sep=':')),
    defense = as.factor(paste(season, defteam, sep='_'))
  )

smp_size <- floor(0.65 * nrow(passes))
set.seed(69)
train_ind <- sample(seq_len(nrow(passes)), size = smp_size)

train <- passes[train_ind, ]
test <- passes[-train_ind, ]
Now, we can build our mixed-effects models. Building a linear mixed-effects model with lme4 is very similar to building a linear model with base R. The only major difference is in the specification of the random effects terms.
Before we look at the accumulated hit variables again, we'll start with a base model:
base_formula_mm <- epa ~ posteam_type + down_fac*ydstogo + yardline_100 + ep + (1|offense) + (1|defense)
base_fit_mm <- lmer(base_formula_mm, train)
summary(base_fit_mm)
# Linear mixed model fit by REML ['lmerMod']
# Formula: epa ~ posteam_type + down_fac * ydstogo + yardline_100 + ep +      (1 | offense) + (1 | defense)
#    Data: train
# 
# REML criterion at convergence: 506598
# 
# Scaled residuals: 
#     Min      1Q  Median      3Q     Max 
# -7.5730 -0.5203 -0.1509  0.5804  5.6803 
# 
# Random effects:
#  Groups   Name        Variance Std.Dev.
#  offense  (Intercept) 0.016764 0.12948 
#  defense  (Intercept) 0.004687 0.06846 
#  Residual             2.620479 1.61879 
# Number of obs: 133081, groups:  offense, 870; defense, 321
# 
# Fixed effects:
#                    Estimate Std. Error t value
# (Intercept)        1.387130   0.072148  19.226
# posteam_typehome   0.046873   0.008936   5.245
# down_fac2         -0.064708   0.038772  -1.669
# down_fac3         -0.279516   0.038991  -7.169
# down_fac4         -0.372395   0.069123  -5.387
# ydstogo           -0.016309   0.003329  -4.898
# yardline_100      -0.012859   0.000694 -18.529
# ep                -0.229296   0.010485 -21.870
# down_fac2:ydstogo -0.005383   0.003772  -1.427
# down_fac3:ydstogo -0.006852   0.003739  -1.833
# down_fac4:ydstogo -0.048603   0.007617  -6.381
# 
# Correlation of Fixed Effects:
#             (Intr) pstm_t dwn_f2 dwn_f3 dwn_f4 ydstog yr_100 ep     dwn_2: dwn_3:
# postm_typhm -0.056                                                               
# down_fac2   -0.498 -0.006                                                        
# down_fac3   -0.626 -0.005  0.801                                                 
# down_fac4   -0.558  0.003  0.472  0.525                                          
# ydstogo     -0.531 -0.006  0.866  0.878  0.512                                   
# yardlin_100 -0.858 -0.001  0.078  0.221  0.358  0.058                            
# ep          -0.869 -0.008  0.100  0.251  0.372  0.100  0.961                     
# dwn_fc2:yds  0.367  0.006 -0.950 -0.740 -0.407 -0.863  0.052  0.033              
# dwn_fc3:yds  0.329  0.004 -0.752 -0.897 -0.393 -0.867  0.099  0.081  0.771       
# dwn_fc4:yds  0.250  0.000 -0.378 -0.386 -0.767 -0.434 -0.052 -0.062  0.374  0.375

R2_Score(predict(base_fit_mm, test, allow.new.levels = TRUE), test$epa)
# [1] 0.008938265
While the summaries for lmer objects don't provide R2 values for models, we can see the vast improvement in our test R2.
Once again, let's add in cum_hits:
hits_formula_mm <- epa ~ posteam_type + down_fac*ydstogo + yardline_100 + ep + cum_hits +
  (1|offense) + (1|defense)
hits_fit_mm <- lmer(hits_formula_mm, train)
summary(hits_fit_mm)
# Linear mixed model fit by REML ['lmerMod']
# Formula: epa ~ posteam_type + down_fac * ydstogo + yardline_100 + ep +      cum_hits + (1 | offense) + (1 | defense)
#    Data: train
# 
# REML criterion at convergence: 506607
# 
# Scaled residuals: 
#     Min      1Q  Median      3Q     Max 
# -7.5760 -0.5205 -0.1510  0.5805  5.6804 
# 
# Random effects:
#  Groups   Name        Variance Std.Dev.
#  offense  (Intercept) 0.016658 0.12907 
#  defense  (Intercept) 0.004625 0.06801 
#  Residual             2.620538 1.61881 
# Number of obs: 133081, groups:  offense, 870; defense, 321
# 
# Fixed effects:
#                     Estimate Std. Error t value
# (Intercept)        1.3995570  0.0728081  19.223
# posteam_typehome   0.0461532  0.0089537   5.155
# down_fac2         -0.0654363  0.0387764  -1.688
# down_fac3         -0.2812497  0.0390160  -7.209
# down_fac4         -0.3727959  0.0691236  -5.393
# ydstogo           -0.0163275  0.0033296  -4.904
# yardline_100      -0.0129301  0.0006963 -18.571
# ep                -0.2302172  0.0105107 -21.903
# cum_hits          -0.0025855  0.0020533  -1.259
# down_fac2:ydstogo -0.0053506  0.0037725  -1.418
# down_fac3:ydstogo -0.0067897  0.0037394  -1.816
# down_fac4:ydstogo -0.0482605  0.0076223  -6.331
# 
# Correlation of Fixed Effects:
#             (Intr) pstm_t dwn_f2 dwn_f3 dwn_f4 ydstog yr_100 ep     cm_hts dwn_2: dwn_3:
# postm_typhm -0.064                                                                      
# down_fac2   -0.495 -0.005                                                               
# down_fac3   -0.625 -0.003  0.801                                                        
# down_fac4   -0.553  0.004  0.472  0.525                                                 
# ydstogo     -0.527 -0.006  0.866  0.877  0.512                                          
# yardlin_100 -0.859  0.004  0.079  0.223  0.357  0.058                                   
# ep          -0.869 -0.004  0.101  0.253  0.371  0.100  0.962                            
# cum_hits    -0.135  0.063  0.015  0.036  0.005  0.005  0.081  0.071                     
# dwn_fc2:yds  0.364  0.005 -0.950 -0.740 -0.407 -0.863  0.051  0.032 -0.007              
# dwn_fc3:yds  0.328  0.004 -0.752 -0.897 -0.393 -0.867  0.098  0.079 -0.014  0.771       
# dwn_fc4:yds  0.253 -0.002 -0.379 -0.387 -0.767 -0.434 -0.055 -0.064 -0.037  0.374  0.375

R2_Score(predict(hits_fit_mm, test, allow.new.levels = TRUE), test$epa)
# [1] 0.008952836
Adding add cum_hits does result in a slight improvement over our base mixed-effects model's test R2. Note that a t value of -1.259 with 133000 df is roughly equivalent to a p value of 0.2.
Now for cum_sacks:
sacks_formula_mm <- epa ~ posteam_type + down_fac*ydstogo + yardline_100 + ep + cum_sacks +
  (1|offense) + (1|defense)
sacks_fit_mm <- lmer(sacks_formula_mm, train)
summary(sacks_fit_mm)
# Linear mixed model fit by REML ['lmerMod']
# Formula: epa ~ posteam_type + down_fac * ydstogo + yardline_100 + ep +      cum_sacks + (1 | offense) + (1 | defense)
#    Data: train
# 
# REML criterion at convergence: 506605.2
# 
# Scaled residuals: 
#     Min      1Q  Median      3Q     Max 
# -7.5764 -0.5204 -0.1504  0.5804  5.6773 
# 
# Random effects:
#  Groups   Name        Variance Std.Dev.
#  offense  (Intercept) 0.016586 0.12879 
#  defense  (Intercept) 0.004606 0.06787 
#  Residual             2.620561 1.61881 
# Number of obs: 133081, groups:  offense, 870; defense, 321
# 
# Fixed effects:
#                     Estimate Std. Error t value
# (Intercept)        1.3988302  0.0725516  19.280
# posteam_typehome   0.0464523  0.0089400   5.196
# down_fac2         -0.0657950  0.0387790  -1.697
# down_fac3         -0.2817349  0.0390198  -7.220
# down_fac4         -0.3730822  0.0691244  -5.397
# ydstogo           -0.0163379  0.0033296  -4.907
# yardline_100      -0.0129211  0.0006952 -18.586
# ep                -0.2301406  0.0105002 -21.918
# cum_sacks         -0.0053061  0.0035276  -1.504
# down_fac2:ydstogo -0.0053076  0.0037728  -1.407
# down_fac3:ydstogo -0.0067216  0.0037401  -1.797
# down_fac4:ydstogo -0.0481505  0.0076236  -6.316
# 
# Correlation of Fixed Effects:
#             (Intr) pstm_t dwn_f2 dwn_f3 dwn_f4 ydstog yr_100 ep     cm_sck dwn_2: dwn_3:
# postm_typhm -0.059                                                                      
# down_fac2   -0.497 -0.005                                                               
# down_fac3   -0.626 -0.004  0.801                                                        
# down_fac4   -0.555  0.004  0.472  0.525                                                 
# ydstogo     -0.529 -0.006  0.866  0.877  0.512                                          
# yardlin_100 -0.858  0.001  0.079  0.223  0.358  0.058                                   
# ep          -0.869 -0.007  0.101  0.253  0.371  0.100  0.961                            
# cum_sacks   -0.106  0.030  0.019  0.038  0.006  0.006  0.059  0.055                     
# dwn_fc2:yds  0.366  0.005 -0.950 -0.740 -0.407 -0.863  0.051  0.032 -0.014              
# dwn_fc3:yds  0.329  0.004 -0.752 -0.897 -0.393 -0.867  0.097  0.079 -0.024  0.771       
# dwn_fc4:yds  0.253 -0.001 -0.379 -0.387 -0.767 -0.434 -0.054 -0.064 -0.041  0.375  0.376
# 
R2_Score(predict(sacks_fit_mm, test, allow.new.levels = TRUE), test$epa)
# [1] 0.008908527
The test R2 drops below what we were getting in our base model while the t value for cum_sacks is roughly equivalent to a p value of 0.13.
Let's finish off with cum_sacks + cum_non_sack_hits:
comb_formula_mm <- epa ~ posteam_type + down_fac*ydstogo + yardline_100 + ep + cum_sacks +
  cum_non_sack_hits + (1|offense) + (1|defense)
comb_fit_mm <- lmer(comb_formula_mm, train)
summary(comb_fit_mm)
# Linear mixed model fit by REML ['lmerMod']
# Formula: epa ~ posteam_type + down_fac * ydstogo + yardline_100 + ep +      cum_sacks + cum_non_sack_hits + (1 | offense) + (1 | defense)
#    Data: train
# 
# REML criterion at convergence: 506614.7
# 
# Scaled residuals: 
#     Min      1Q  Median      3Q     Max 
# -7.5769 -0.5205 -0.1508  0.5805  5.6784 
# 
# Random effects:
#  Groups   Name        Variance Std.Dev.
#  offense  (Intercept) 0.016585 0.12878 
#  defense  (Intercept) 0.004601 0.06783 
#  Residual             2.620579 1.61882 
# Number of obs: 133081, groups:  offense, 870; defense, 321
# 
# Fixed effects:
#                     Estimate Std. Error t value
# (Intercept)        1.4017879  0.0728271  19.248
# posteam_typehome   0.0461963  0.0089567   5.158
# down_fac2         -0.0658132  0.0387792  -1.697
# down_fac3         -0.2819329  0.0390222  -7.225
# down_fac4         -0.3730766  0.0691246  -5.397
# ydstogo           -0.0163371  0.0033296  -4.907
# yardline_100      -0.0129399  0.0006963 -18.583
# ep                -0.2303694  0.0105117 -21.916
# cum_sacks         -0.0047892  0.0036979  -1.295
# cum_non_sack_hits -0.0014459  0.0030961  -0.467
# down_fac2:ydstogo -0.0053173  0.0037729  -1.409
# down_fac3:ydstogo -0.0067335  0.0037402  -1.800
# down_fac4:ydstogo -0.0481156  0.0076240  -6.311
# 
# Correlation matrix not shown by default, as p = 13 > 12.
# Use print(x, correlation=TRUE)  or
#     vcov(x)        if you need it

R2_Score(predict(comb_fit_mm, test, allow.new.levels = TRUE), test$epa)
# [1] 0.008923579
Again, no improvement on the test R2 from the base model and the t value for cum_non_sack_hits is the lowest we've seen.
Addendum

Ben reminded me on Twitter that I had neglected to include score_differntial in the model(s). Here is what the results are when included in the simple linear models and the mixed models.
3. Conclusions

So, our best (most predictive) model was the mixed-effects model that included cum_hits. The estimate for cum_hits of -0.0026 meant that with every hit, each subsequent pass would drop by 0.0026 EPA.
As an example, take a QB that attempts 30 passes in a single game. If that QB was hit for the 3rd time on their 15th throw, assuming they are not hit again, the EPA lost on their last 15 throws as a result of accumulated hits would be 15 * 3 * -0.0026 ~= -0.116.
Now, what if, in the worst case scenario, that same QB had been hit on every play and the EPA lost on their last 15 throws was instead 15 * 15 * -0.0026 ~= -0.62052?
Considering that:

-0.116 is slightly better than the average 4 yard gain on 1st and 10
-0.621 is slightly better than the average 1 yard gain on 2nd and 10

even in the worst-case scenario, the losses in EPA that result from an accumulation of hits is equivalent to one moderately bad play. That's it.
And that's only if you think cum_hits should have been included in the model. Either way, nothing in mine or Ben's analyses suggest that accumulated hits are worth considering.
4. Postscript: Offensive and Defensive Strengths

For fun, let's look at the random effects from our mixed-effects model(s). Here are the the distributions of the offensive and defensive units:
ran <- ranef(base_fit_mm)

plot_data <- rbind(
  ran$offense %>% rename(strength='(Intercept)') %>% mutate(side='offense'),
  ran$defense %>% rename(strength='(Intercept)') %>% mutate(side='defense')
  )

library(ggplot2)
ggplot(plot_data, aes(strength, fill = side)) + 
  geom_density(alpha = 0.35) +
  ggtitle('Offensive vs Defensive Passing EPA Strength')

The tighter distribution of the defensive strengths around zero suggest that defenses have less of an impact on passing EPA than offenses.
The top 15 offensive passing seasons since 2009 seem to generally make sense:
passers <- passes %>%
  distinct(passer_player_id, passer_player_name) %>%
  mutate(nchar=nchar(passer_player_name)) %>%
  arrange(passer_player_id, -nchar) %>%
  distinct(passer_player_id, .keep_all = TRUE) %>%
  select(-nchar)

qb_str <- ran$offense %>%
  rename(off_pass_epa_int = '(Intercept)') %>%
  tibble::rownames_to_column(var = 'offense') %>%
  mutate(off_pass_epa_var = attr(ran$offense, "postVar")[1, 1, ]) %>%
  separate(offense, c('season', 'passer_player_id'), sep=':') %>%
  mutate(season=as.numeric(season)) %>%
  arrange(-off_pass_epa_int) %>%
  left_join(passers, by='passer_player_id') %>%
  left_join(
    passes %>% group_by(season, passer_player_id) %>% summarize(passes=length(play_id)),
    by = c('season', 'passer_player_id')
    )

head(qb_str, 15)
#    season passer_player_id off_pass_epa_int off_pass_epa_var passer_player_name passes
# 1    2018       00-0033873        0.3228594      0.004484434          P.Mahomes    671
# 2    2009       00-0010346        0.2973676      0.004809476          P.Manning    608
# 3    2014       00-0023459        0.2842181      0.004781794          A.Rodgers    605
# 4    2011       00-0020531        0.2779966      0.004270427            D.Brees    737
# 5    2018       00-0020531        0.2737248      0.005265880            D.Brees    539
# 6    2011       00-0023459        0.2696903      0.004807023          A.Rodgers    592
# 7    2016       00-0026143        0.2507002      0.004712305             M.Ryan    616
# 8    2013       00-0010346        0.2383985      0.004307639          P.Manning    716
# 9    2012       00-0019596        0.2243546      0.004396295            T.Brady    703
# 10   2011       00-0019596        0.2214261      0.004588874            T.Brady    687
# 11   2009       00-0022942        0.2189321      0.005354635           P.Rivers    542
# 12   2015       00-0029263        0.2171071      0.004994096           R.Wilson    602
# 13   2015       00-0022924        0.2169744      0.005290244   B.Roethlisberger    527
# 14   2012       00-0010346        0.2147981      0.004781978          P.Manning    638
# 15   2017       00-0032950        0.2140824      0.005416154            C.Wentz    525
And the top 10 defensive passing seasons belong to:
def_str <- ran$defense %>%
  rename(def_pass_epa_int = '(Intercept)') %>%
  tibble::rownames_to_column(var = 'defense') %>%
  mutate(def_pass_epa_var = attr(ran$defense, "postVar")[1, 1, ]) %>%
  separate(defense, c('season', 'defteam'), sep='_') %>%
  mutate(season=as.numeric(season)) %>%
  arrange(def_pass_epa_int)

head(def_str, 10)
#    season defteam def_pass_epa_int def_pass_epa_var
# 1    2016     DEN      -0.12672999      0.002695580
# 2    2009     NYJ      -0.11572561      0.002947501
# 3    2012     CIN      -0.11321756      0.002833850
# 4    2013     SEA      -0.11027701      0.002831613
# 5    2012     ARI      -0.10525198      0.002844016
# 6    2017     BAL      -0.09783000      0.002721691
# 7    2011     NYJ      -0.09685708      0.002879514
# 8    2017     JAX      -0.09169130      0.002801606
# 9    2015     DEN      -0.09158392      0.002712855
# 10   2009     BUF      -0.08843870      0.002826532
And finally, the bottom 15 offensive passing seasons:
tail(qb_str, 15)
#     season passer_player_id off_pass_epa_int off_pass_epa_var passer_player_name passes
# 856   2018       00-0026898       -0.1621003      0.014478714          M.Sanchez     43
# 857   2013       00-0027948       -0.1644682      0.011977040          B.Gabbert    107
# 858   2017       00-0028595       -0.1670066      0.015061625          S.Tolzien     24
# 859   2009       00-0026898       -0.1673196      0.006354349          M.Sanchez    414
# 860   2012       00-0027754       -0.1674245      0.008710545          J.Skelton    224
# 861   2018       00-0033958       -0.1836659      0.012063079         N.Peterman    105
# 862   2013       00-0030526       -0.1937594      0.006649194           E.Manuel    378
# 863   2015       00-0029567       -0.2034197      0.006527679            N.Foles    373
# 864   2018       00-0034343       -0.2044619      0.005956765            J.Rosen    475
# 865   2012       00-0029531       -0.2106621      0.009626361          R.Lindley    193
# 866   2016       00-0033106       -0.2213431      0.008055449             J.Goff    251
# 867   2009       00-0022164       -0.2309472      0.009012687           K.Boller    209
# 868   2010       00-0027659       -0.2539440      0.006828976          J.Clausen    354
# 869   2009       00-0025388       -0.2648288      0.007824201          J.Russell    299
# 870   2011       00-0025970       -0.2885444      0.010492451   C.Hanie (3rd QB)    133