@statwonk
Last active February 10, 2021 21:39
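A simulation showing how observations can be discarded in logistic regression (here, most of the y = 0 rows) while keeping the estimator approximately unbiased, by weighting each retained row by the inverse of its sampling probability. https://twitter.com/statwonk/status/1291712092479860737?s=20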
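library(tidyverse)

N <- 1e4   # observations per simulated data set
p <- 0.03  # true event probability; also the keep rate for y == 0 rows

# author: twitter.com/statwonk
# showing how cases can be discarded in logistic regression while preserving an unbiased estimator
seq_len(1e3) %>%
  map_dbl(function(x) {
    y <- rbinom(N, 1, p)
    tibble(
      # estimate of p from the full data (intercept-only logistic regression)
      all_data = tibble(y = y) %>% glm(y ~ 1, "binomial", .) %>% coef() %>% plogis(),
      # keep every y == 1 row and a random ~p fraction of the y == 0 rows
      sampled_data = tibble(y = y[y == 1 | runif(N) <= p]) %>%
        # weight each kept row by 1 / (its probability of being kept)
        mutate(weights = case_when(y == 1 ~ 1, TRUE ~ 1 / p)) %>%
        glm(y ~ 1, "binomial", ., weights = .$weights) %>%
        coef() %>%
        plogis()
    ) %>%
      # per-replicate difference between the two estimates of p
      mutate(diff = sampled_data - all_data) %>%
      pull(diff)
  }) %>%
  ecdf() %>%
  plot(main = "The difference in estimated p\np% sample - all data")
abline(v = 0)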
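
The weighted fit above can be sanity-checked against a closed form: for an intercept-only logistic regression, the weighted maximum-likelihood estimate of p is the weighted mean of y. The following is a minimal base-R sketch of that check for a single replicate, not part of the original gist. The seed and the names `keep`, `y_s`, `w`, `p_glm`, and `p_closed` are assumptions introduced for illustration.

```r
# Sketch: compare the weighted glm() estimate with the weighted mean of y.
set.seed(1)  # assumption: seed chosen only so the check is reproducible
N <- 1e4     # observations, as in the simulation
p <- 0.03    # true event probability; also the keep rate for y == 0 rows

y    <- rbinom(N, 1, p)
keep <- y == 1 | runif(N) <= p      # keep every event and ~p of the non-events
y_s  <- y[keep]
w    <- ifelse(y_s == 1, 1, 1 / p)  # inverse of each row's keep probability

# Weighted intercept-only logistic regression, as in the simulation
fit   <- glm(y_s ~ 1, family = binomial, weights = w)
p_glm <- unname(plogis(coef(fit)))

# Closed form: solving sum(w * (y - p)) = 0 for p gives the weighted mean
p_closed <- sum(w * y_s) / sum(w)

all.equal(p_glm, p_closed, tolerance = 1e-6)
```

If the two values agree, the difference plotted by the simulation comes entirely from which y = 0 rows happen to be retained, since every y = 1 row is kept in both fits.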
@statwonk

How does this look under effects coding?

@russellpierce