Code and text from https://andrewgelman.com/2018/10/05/ivy-league-football-saw-large-reduction-in-concussions-after-new-kickoff-rules/
rows <- "y1 n1 y2 n2
kickoff 26 2379 3 1467
scrimmage 92 34521 28 22467
punt 6 2496 2 1791
field_goal_pat 2 2090 0 1268"
concussions <- read.delim(text = rows, sep = " ")
concussions <- rbind(concussions, apply(concussions[2:4,], 2, sum))
rownames(concussions)[5] <- "other_plays"
compare <- function(data, rate_per = 1000) {
# "For each row, I computed the raw proportions and standard errors, the
# difference between the proportion, and the standard errors of that
# difference."
p1 <- data$y1 / data$n1
p2 <- data$y2 / data$n2
diff <- p2 - p1
se_diff <- sqrt(p1 * (1 - p1) / data$n1 + p2 * (1 - p2) / data$n2)
# "I also computed the difference in differences, comparing the change in the
# concussion rate for kickoffs to the change in concussion rate for
# non-kickoff plays, as this comparison was mentioned in the article’s results
# section."
diff_in_diff <- diff[1] - diff[5]
se_diff_in_diff <- sqrt(se_diff[1] ^ 2 + se_diff[5] ^ 2)
# "I multiplied all the estimated differences and standard errors by 1000 to
# get the data in rates per thousand."
list(
diffs = data.frame(
data,
diff = diff * rate_per,
diff_ci_lower = (diff - 2 * se_diff) * rate_per,
diff_ci_upper = (diff + 2 * se_diff) * rate_per),
diff_in_diff = diff_in_diff * rate_per,
se_diff_in_diff = se_diff_in_diff * rate_per)
}
print(lapply(compare(concussions), round, 2))
#> $diffs
#> y1 n1 y2 n2 diff diff_ci_lower diff_ci_upper
#> kickoff 26 2379 3 1467 -8.88 -13.76 -4.01
#> scrimmage 92 34521 28 22467 -1.42 -2.15 -0.69
#> punt 6 2496 2 1791 -1.29 -3.80 1.23
#> field_goal_pat 2 2090 0 1268 -0.96 -2.31 0.40
#> other_plays 100 39107 30 25526 -1.38 -2.05 -0.71
#>
#> $diff_in_diff
#> [1] -7.5
#>
#> $se_diff_in_diff
#> [1] 2.46
# As a student in my class pointed out, it really makes more sense to compare
# the rates of decline than the absolute declines.
# Put it this way. If the probabilities of concussion are:
# - p_K1: Probability of concussion for a kickoff play in 2013-2015
# - p_K2: Probability of concussion for a kickoff play in 2016-2017
# - p_N1: Probability of concussion for a non-kickoff play in 2013-2015
# - p_N2: Probability of concussion for a non-kickoff play in 2016-2017.
# Following the published paper, we estimated the difference in differences,
# (p_K2 - p_K1) - (p_N2 - p_N1).
# But I think it makes more sense to think multiplicatively, to work with the
# ratio of ratios, (p_K2/p_K1) / (p_N2/p_N1).
# Or, on the log scale, (log p_K2 - log p_K1) - (log p_N2 - log p_N1).
# What's the estimate and standard error of this comparison?
# The key step is that we can use relative errors. From the Poisson
# distribution, the relative sd of an estimated rate is 1/sqrt(y), so this is
# the approx sd on the log scale. So, the estimated difference in difference of
# log probabilities is
(log(3/1467) - log(26/2379)) - (log(30/25526) - log(100/39107))
#> [1] -0.8986547
# That's a big number:
exp(0.90)
#> [1] 2.459603
# that the concussion rate fell over twice as fast in kickoff than in
# non-kickoff plays.
# But the standard error of this difference in difference of logs is
sqrt(1/3 + 1/26 + 1/30 + 1/100)
#> [1] 0.6443044
# The observed d-in-d-of-logs is -0.90, which is less than two standard errors
# from zero, thus not conventionally statistically significant.
# So, from these data alone, we cannot confidently conclude that the relative
# decline in concussion rates is more for kickoffs than for other plays. That
# estimate of 3/1467 is just too noisy.
# We could also see this using a Poisson regression with four data points.
data <- data.frame(y = c(concussions$y1[c(1,5)], concussions$y2[c(1,5)]),
n = c(concussions$n1[c(1,5)], concussions$n2[c(1,5)]),
time = c(0, 0, 1, 1),
kickoff = c(1, 0, 1, 0))
library("rstanarm")
#> Loading required package: Rcpp
#> rstanarm (Version 2.17.4, packaged: 2018-04-13 01:51:52 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
#> - Plotting theme set to bayesplot::theme_default().
options(mc.cores = parallel::detectCores())
fit <- stan_glm(
y ~ time * kickoff + offset(log(n)),
family = poisson,
data = data)
print(fit)
#> stan_glm
#> family: poisson [log]
#> formula: y ~ time * kickoff + offset(log(n))
#> observations: 4
#> predictors: 4
#> ------
#> Median MAD_SD
#> (Intercept) -6.0 0.1
#> time -0.8 0.2
#> kickoff 1.4 0.2
#> time:kickoff -0.9 0.6
#>
#> Sample avg. posterior predictive distribution of y:
#> Median MAD_SD
#> mean_PPD 39.5 4.4
#>
#> ------
#> For info on the priors used see help('prior_summary.stanreg').
sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rstanarm_2.17.4 Rcpp_0.12.18
#>
#> loaded via a namespace (and not attached):
#> [1] splines_3.5.1 gtools_3.8.1 StanHeaders_2.18.0
#> [4] threejs_0.3.1 shiny_1.1.0 assertthat_0.2.0
#> [7] stats4_3.5.1 yaml_2.2.0 pillar_1.3.0
#> [10] backports_1.1.2 lattice_0.20-35 glue_1.3.0
#> [13] digest_0.6.17 promises_1.0.1 minqa_1.2.4
#> [16] colorspace_1.3-2 htmltools_0.3.6 httpuv_1.4.5
#> [19] Matrix_1.2-14 plyr_1.8.4 dygraphs_1.1.1.6
#> [22] pkgconfig_2.0.2 rstan_2.17.4 purrr_0.2.5
#> [25] xtable_1.8-3 scales_1.0.0 later_0.7.5
#> [28] lme4_1.1-18-1 tibble_1.4.2 bayesplot_1.6.0.9000
#> [31] ggplot2_3.0.0 DT_0.4 shinyjs_1.0
#> [34] lazyeval_0.2.1 survival_2.42-3 magrittr_1.5
#> [37] crayon_1.3.4 mime_0.5 evaluate_0.11
#> [40] nlme_3.1-137 MASS_7.3-50 xts_0.11-1
#> [43] colourpicker_1.0 rsconnect_0.8.8 tools_3.5.1
#> [46] loo_2.0.0 matrixStats_0.54.0 stringr_1.3.1
#> [49] munsell_0.5.0 bindrcpp_0.2.2 compiler_3.5.1
#> [52] rlang_0.2.2 grid_3.5.1 nloptr_1.2.1
#> [55] ggridges_0.5.1 htmlwidgets_1.3 crosstalk_1.0.0
#> [58] igraph_1.2.2 miniUI_0.1.1.1 base64enc_0.1-3
#> [61] rmarkdown_1.10 gtable_0.2.0 codetools_0.2-15
#> [64] inline_0.3.15 markdown_0.8 reshape2_1.4.3
#> [67] R6_2.3.0 gridExtra_2.3 rstantools_1.5.1
#> [70] zoo_1.8-4 knitr_1.20 dplyr_0.7.6
#> [73] bindr_0.1.1 shinystan_2.5.0 shinythemes_1.1.1
#> [76] rprojroot_1.3-2 stringi_1.1.7 parallel_3.5.1
#> [79] tidyselect_0.2.4
Created on 2018-10-09 by the reprex package (v0.2.1)