Instantly share code, notes, and snippets.

# tjmahr/concussions.md Last active Oct 9, 2018

concussions model
```rows <- "y1 n1 y2 n2
kickoff 26 2379 3 1467
scrimmage 92 34521 28 22467
punt 6 2496 2 1791
field_goal_pat 2 2090 0 1268"

concussions <- read.delim(text = rows, sep = " ")
concussions <- rbind(concussions, apply(concussions[2:4,], 2, sum))
rownames(concussions)[5] <- "other_plays"

compare <- function(data, rate_per = 1000) {
# "For each row, I computed the raw proportions and standard errors, the
# difference between the proportion, and the standard errors of that
# difference."
p1 <- data\$y1 / data\$n1
p2 <- data\$y2 / data\$n2
diff <- p2 - p1
se_diff <- sqrt(p1 * (1 - p1) / data\$n1 + p2 * (1 - p2) / data\$n2)

# "I also computed the difference in differences, comparing the change in the
# concussion rate for kickoffs to the change in concussion rate for
# non-kickoff plays, as this comparison was mentioned in the article’s results
# section."
diff_in_diff <- diff[1] - diff[5]
se_diff_in_diff <- sqrt(se_diff[1] ^ 2 + se_diff[5] ^ 2)

# "I multiplied all the estimated differences and standard errors by 1000 to
# get the data in rates per thousand."
list(
diffs = data.frame(
data,
diff = diff * rate_per,
diff_ci_lower = (diff - 2 * se_diff) * rate_per,
diff_ci_upper = (diff + 2 * se_diff) * rate_per),
diff_in_diff = diff_in_diff * rate_per,
se_diff_in_diff = se_diff_in_diff * rate_per)
}

print(lapply(compare(concussions), round, 2))
#> \$diffs
#>                 y1    n1 y2    n2  diff diff_ci_lower diff_ci_upper
#> kickoff         26  2379  3  1467 -8.88        -13.76         -4.01
#> scrimmage       92 34521 28 22467 -1.42         -2.15         -0.69
#> punt             6  2496  2  1791 -1.29         -3.80          1.23
#> field_goal_pat   2  2090  0  1268 -0.96         -2.31          0.40
#> other_plays    100 39107 30 25526 -1.38         -2.05         -0.71
#>
#> \$diff_in_diff
#> [1] -7.5
#>
#> \$se_diff_in_diff
#> [1] 2.46

# As a student in my class pointed out, it really makes more sense to compare
# the rates of decline than the absolute declines.

# Put it this way. If the probabilities of concussion are:
# - p_K1: Probability of concussion for a kickoff play in 2013-2015
# - p_K2: Probability of concussion for a kickoff play in 2016-2017
# - p_N1: Probability of concussion for a non-kickoff play in 2013-2015
# - p_N2: Probability of concussion for a non-kickoff play in 2016-2017.

# Following the published paper, we estimated the difference in differences,
# (p_K2 - p_K1) - (p_N2 - p_N1).

# But I think it makes more sense to think multiplicatively, to work with the
# ratio of ratios, (p_K2/p_K1) / (p_N2/p_N1).

# Or, on the log scale, (log p_K2 - log p_K1) - (log p_N2 - log p_N1).

# What's the estimate and standard error of this comparison?

# The key step is that we can use relative errors. From the Poisson
# distribution, the relative sd of an estimated rate is 1/sqrt(y), so this is
# the approx sd on the log scale. So, the estimated difference in difference of
# log probabilities is
(log(3/1467) - log(26/2379)) - (log(30/25526) - log(100/39107))
#> [1] -0.8986547

# That's a big number:
exp(0.90)
#> [1] 2.459603

# that the concussion rate fell over twice as fast in kickoff than in
# non-kickoff plays.

# But the standard error of this difference in difference of logs is
sqrt(1/3 + 1/26 + 1/30 + 1/100)
#> [1] 0.6443044

# The observed d-in-d-of-logs is -0.90, which is less than two standard errors
# from zero, thus not conventionally statistically significant.

# So, from these data alone, we cannot confidently conclude that the relative
# decline in concussion rates is more for kickoffs than for other plays. That
# estimate of 3/1467 is just too noisy.

# We could also see this using a Poisson regression with four data points.
data <- data.frame(y = c(concussions\$y1[c(1,5)], concussions\$y2[c(1,5)]),
n = c(concussions\$n1[c(1,5)], concussions\$n2[c(1,5)]),
time = c(0, 0, 1, 1),
kickoff = c(1, 0, 1, 0))

library("rstanarm")
#> rstanarm (Version 2.17.4, packaged: 2018-04-13 01:51:52 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
#> - Plotting theme set to bayesplot::theme_default().
options(mc.cores = parallel::detectCores())

fit <- stan_glm(
y ~ time * kickoff + offset(log(n)),
family = poisson,
data = data)

print(fit)
#> stan_glm
#>  family:       poisson [log]
#>  formula:      y ~ time * kickoff + offset(log(n))
#>  observations: 4
#>  predictors:   4
#> ------
#> (Intercept)  -6.0    0.1
#> time         -0.8    0.2
#> kickoff       1.4    0.2
#> time:kickoff -0.9    0.6
#>
#> Sample avg. posterior predictive distribution of y:
#> mean_PPD 39.5    4.4
#>
#> ------
#> For info on the priors used see help('prior_summary.stanreg').

sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#> [1] rstanarm_2.17.4 Rcpp_0.12.18
#>
#> loaded via a namespace (and not attached):
#>  [4] threejs_0.3.1        shiny_1.1.0          assertthat_0.2.0
#>  [7] stats4_3.5.1         yaml_2.2.0           pillar_1.3.0
#> [10] backports_1.1.2      lattice_0.20-35      glue_1.3.0
#> [13] digest_0.6.17        promises_1.0.1       minqa_1.2.4
#> [16] colorspace_1.3-2     htmltools_0.3.6      httpuv_1.4.5
#> [19] Matrix_1.2-14        plyr_1.8.4           dygraphs_1.1.1.6
#> [22] pkgconfig_2.0.2      rstan_2.17.4         purrr_0.2.5
#> [25] xtable_1.8-3         scales_1.0.0         later_0.7.5
#> [28] lme4_1.1-18-1        tibble_1.4.2         bayesplot_1.6.0.9000
#> [31] ggplot2_3.0.0        DT_0.4               shinyjs_1.0
#> [34] lazyeval_0.2.1       survival_2.42-3      magrittr_1.5
#> [37] crayon_1.3.4         mime_0.5             evaluate_0.11
#> [40] nlme_3.1-137         MASS_7.3-50          xts_0.11-1
#> [43] colourpicker_1.0     rsconnect_0.8.8      tools_3.5.1
#> [46] loo_2.0.0            matrixStats_0.54.0   stringr_1.3.1
#> [49] munsell_0.5.0        bindrcpp_0.2.2       compiler_3.5.1
#> [52] rlang_0.2.2          grid_3.5.1           nloptr_1.2.1
#> [55] ggridges_0.5.1       htmlwidgets_1.3      crosstalk_1.0.0
#> [58] igraph_1.2.2         miniUI_0.1.1.1       base64enc_0.1-3
#> [61] rmarkdown_1.10       gtable_0.2.0         codetools_0.2-15
#> [64] inline_0.3.15        markdown_0.8         reshape2_1.4.3
#> [67] R6_2.3.0             gridExtra_2.3        rstantools_1.5.1
#> [70] zoo_1.8-4            knitr_1.20           dplyr_0.7.6
#> [73] bindr_0.1.1          shinystan_2.5.0      shinythemes_1.1.1
#> [76] rprojroot_1.3-2      stringi_1.1.7        parallel_3.5.1
#> [79] tidyselect_0.2.4```

Created on 2018-10-09 by the reprex package (v0.2.1)