ols/logit marginal effects
``` r
library(marginaleffects)
library(modelsummary)
set.seed(20220722)
S <- diag(3)
S[1,2] <- S[2,1] <- 0.6
S[1,3] <- S[3,1] <- 0.6
data <- MASS::mvrnorm(1000, rep(0, 3), S) |>
    as.data.frame() |>
    # make outcome, group, baseline binary
    transform(outcome = V1 < 0,
              rx = V2 < 0,
              baseline = V3 < 0) |>
    subset(select = c(outcome, rx, baseline))
```
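A quick sanity check on the simulated data (a sketch, output not shown; it only confirms that the three dummies are roughly balanced and that `outcome` varies with `rx`):

``` r
# Sanity checks on the simulated data (output omitted).
sapply(data, mean)                                       # each dummy should sit near 0.5
with(data, prop.table(table(rx, outcome), margin = 1))   # outcome rate by treatment arm
```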
OLS and logit produce essentially identical average marginal effect estimates:
``` r
mod <- list(
    "glm 1" = glm(outcome ~ rx, data = data, family = binomial()),
    "ols 1" = lm(outcome ~ rx, data = data),
    "glm 2" = glm(outcome ~ rx + baseline, data = data, family = binomial()),
    "ols 2" = lm(outcome ~ rx + baseline, data = data))
mfx <- lapply(mod, marginaleffects)
modelsummary(mfx, "markdown")
```
| | glm 1 | ols 1 | glm 2 | ols 2 |
| :------- | :-------: | :-------: | :-------: | :-------: |
| rx | 0.404 | 0.404 | 0.387 | 0.387 |
| | (0.029) | (0.029) | (0.027) | (0.026) |
| baseline | | | 0.390 | 0.390 |
| | | | (0.027) | (0.026) |
| Num.Obs. | 1000 | 1000 | 1000 | 1000 |
| R2 | | 0.163 | | 0.315 |
| R2 Adj. | | 0.162 | | 0.314 |
| AIC | 1220.8 | 1277.9 | 1032.4 | 1079.6 |
| BIC | 1230.6 | 1292.6 | 1047.1 | 1099.3 |
| Log.Lik. | -608.404 | -635.940 | -513.181 | -535.820 |
| F | 153.571 | 194.743 | 111.275 | 229.355 |
| RMSE | 0.46 | 0.46 | 0.41 | 0.41 |
Therefore, the average slope is equal to the average effect of a 1-unit (TRUE vs. FALSE) change, just as it always is for a coefficient in a linear model:
``` r
marginaleffects(mod[[1]]) |> summary()
#> Average marginal effects
#> Term Contrast Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 rx TRUE - FALSE 0.4038 0.02891 13.97 < 2.22e-16 0.3471 0.4605
#>
#> Model type: glm
#> Prediction type: response
comparisons(mod[[1]]) |> summary()
#> Average contrasts
#> Term Contrast Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 rx TRUE - FALSE 0.4038 0.02891 13.97 < 2.22e-16 0.3471 0.4605
#>
#> Model type: glm
#> Prediction type: response
```
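Equivalently, the OLS coefficient on `rx` is itself the average marginal effect, so it reproduces the numbers above (a minimal check, reusing the `mod` list defined earlier):

``` r
# In the linear probability model the slope is the effect of a one-unit change,
# so the raw coefficient matches the AME reported above.
coef(mod[["ols 1"]])["rxTRUE"]
marginaleffects(mod[["ols 1"]]) |> summary()
```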
@ngreifer

The `ols 1` and `glm 1` AME estimates should be exactly equal because those models are saturated. `ols 2` and `glm 2` only happen to agree here because the slopes of a linear probability model are consistent for the AMEs of the corresponding logistic model; they should not be exactly equal. They would be exactly equal if the models were fully saturated (i.e., `outcome ~ rx * baseline`).

@vincentarelbundock (Author)

Right, of course! Should have been a *

[Screenshot 2022-07-23 013156: results re-run with the interaction model]
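For reference, a sketch of that re-fit (the saturated versions with the interaction, reusing `data` from above; this should produce the kind of comparison the screenshot shows):

``` r
# Saturated models: one parameter per covariate cell, so the OLS and logit
# average marginal effects should now match exactly (up to numerical precision).
mod_sat <- list(
    "glm 2*" = glm(outcome ~ rx * baseline, data = data, family = binomial()),
    "ols 2*" = lm(outcome ~ rx * baseline, data = data))
modelsummary(lapply(mod_sat, marginaleffects), "markdown")
```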

@vincentarelbundock (Author)

@mattansb see the discussion above. The model with 1 dummy is saturated because it has an intercept and one coefficient: one parameter per covariate cell. The model with 2 dummies and an interaction is saturated because it has an intercept, two main-effect coefficients, and an interaction parameter: four parameters for four cells. The model with only 2 dummies and no interaction is not saturated, but as Noah notes, it is consistent, so the estimates should be similar in large samples.
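One way to see the saturation argument in code (a sketch: compare the number of estimated parameters to the number of distinct covariate cells, reusing `data` from above):

``` r
# A model is saturated when it has one parameter per unique covariate combination.
nrow(unique(data[, c("rx", "baseline")]))                                     # 4 cells
length(coef(glm(outcome ~ rx, data = data, family = binomial())))             # 2 params, 2 cells: saturated
length(coef(glm(outcome ~ rx + baseline, data = data, family = binomial())))  # 3 params, 4 cells: not saturated
length(coef(glm(outcome ~ rx * baseline, data = data, family = binomial())))  # 4 params, 4 cells: saturated
```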
