@brshallo
Created December 15, 2021 23:51
Answering an R4DS learning community question on whether stratified sampling is needed when groups are imbalanced. In general, you probably don't need to worry about imbalance biasing the parameter estimates.
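library(tidyverse)

# Simulate two groups (true means: a = 5, b = 3), fit lm(vals ~ id),
# and return the coefficient estimates with `suffix` appended to column names.
sim_params <- function(n_a = 100, n_b = 400, suffix = ""){
  tibble(id = c(rep("a", n_a), rep("b", n_b)),
         vals = c(rnorm(n_a, 5), rnorm(n_b, 3))
  ) %>%
    lm(vals ~ id, data = .) %>%
    broom::tidy() %>%
    select(term, estimate) %>%
    rename_with(~paste0(.x, suffix))
}

# Run 1000 simulations per design. The lambda keeps sim_id from being passed
# positionally as n_a, so the imbalanced design stays at the 100/400 defaults.
comparing_estimates <- tibble(sim_id = 1:1000) %>%
  mutate(imbalanced = map(sim_id, ~sim_params(suffix = "_imbalanced")),
         balanced = map(sim_id, ~sim_params(n_a = 250, n_b = 250, suffix = "_balanced"))) %>%
  unnest(c(imbalanced, balanced))

# Compare the sampling distribution of each coefficient across the two designs.
comparing_estimates %>%
  pivot_longer(cols = contains("estimate")) %>%
  ggplot(aes(x = value, fill = name))+
  geom_density(alpha = 0.3)+
  facet_wrap(~term_balanced, ncol = 1, scales = "free_x")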
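
The density plot shows the comparison visually; a numeric summary can make the "no bias" point easier to check. The following is a minimal base R sketch of the same simulation, not part of the original gist. The helper `sim_coefs()`, the seed, and the object names are all hypothetical choices made for illustration.

```r
# Hypothetical helper (not in the original gist): one simulated fit,
# returning the two lm coefficients as a named numeric vector.
sim_coefs <- function(n_a, n_b) {
  d <- data.frame(
    id   = c(rep("a", n_a), rep("b", n_b)),
    vals = c(rnorm(n_a, mean = 5), rnorm(n_b, mean = 3))
  )
  coef(lm(vals ~ id, data = d))
}

set.seed(2021)  # arbitrary seed, only for reproducibility
n_sims <- 1000

# replicate() returns a 2 x n_sims matrix; transpose to one row per simulation.
imbalanced <- t(replicate(n_sims, sim_coefs(n_a = 100, n_b = 400)))
balanced   <- t(replicate(n_sims, sim_coefs(n_a = 250, n_b = 250)))

# True values implied by the simulation: intercept = 5 (mean of group a),
# idb = 3 - 5 = -2 (difference between group b and group a).
summary_tbl <- rbind(
  imbalanced_mean = colMeans(imbalanced),
  balanced_mean   = colMeans(balanced),
  imbalanced_sd   = apply(imbalanced, 2, sd),
  balanced_sd     = apply(balanced, 2, sd)
)
print(round(summary_tbl, 3))
```

Under these assumptions, both designs should give mean estimates close to 5 and -2, which is the sense in which imbalance does not bias the estimates. What changes is the spread: with unit variance in each group, the standard error of the group difference is sqrt(1/n_a + 1/n_b), roughly 0.112 for the 100/400 split and 0.089 for the 250/250 split.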