@statwonk
Created March 4, 2021 12:58
Let's explore Dr. Wooldridge's clustering comment on Twitter. https://twitter.com/jmwooldridge/status/1366515323923488768?s=20
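library(tidyverse)
library(lmtest)
library(sandwich)

set.seed(1) # added for reproducibility; not in the original gist
5e2 -> students
20 -> schools

# One row per student; school_id cycles 1..schools, so every school gets students / schools students.
tibble(student_id = 1:students) %>%
  mutate(school_id = rep(1:schools, length.out = students)) %>%
  # Each school draws a single random effect shared by all of its students.
  left_join(tibble(school_id = 1:schools, school_effect = rnorm(schools)),
            by = "school_id") %>%
  # %% binds tighter than *, so this is 0.5 * (student_id %% 2).
  # With an even number of schools, student_id and school_id share parity,
  # so treatment is constant within each school.
  mutate(treatment_effect = 0.5 * (student_id %% 2),
         y = treatment_effect + school_effect) -> d

lm(y ~ treatment_effect, data = d) -> m

coeftest(m) # not clustered
# Cluster on school_id; the grouping matches school_effect, which takes one distinct value per school.
coeftest(m, vcovCL(m, cluster = ~ school_id)) # clustered standard errors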
# If we exhaust the population of schools, we don't need to cluster because we don't need to speak to schools not in our sample.
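One way to see what clustering changes here is to put the two sets of standard errors side by side. The sketch below re-creates the simulation above in base R under assumptions that are not in the gist: a fixed seed, and the illustrative names `d`, `treated`, `se_classical`, and `se_clustered`. It clusters on `school_id` by passing the vector directly to `vcovCL()`.

```r
library(sandwich)

set.seed(42)  # hypothetical seed, chosen only so the sketch is repeatable

n_students <- 500
n_schools  <- 20

# One row per student; school_id cycles 1..n_schools.
d <- data.frame(student_id = seq_len(n_students))
d$school_id <- rep(seq_len(n_schools), length.out = n_students)

# One random effect per school, looked up by school_id.
school_effect <- rnorm(n_schools)

# Same design as the gist: odd student_ids are treated, true effect 0.5,
# and the only other source of variation in y is the school effect.
d$treated <- d$student_id %% 2
d$y <- 0.5 * d$treated + school_effect[d$school_id]

m <- lm(y ~ treated, data = d)

# Classical (i.i.d.) standard errors vs. standard errors clustered by school.
se_classical <- sqrt(diag(vcov(m)))
se_clustered <- sqrt(diag(vcovCL(m, cluster = d$school_id)))

print(round(cbind(classical = se_classical, clustered = se_clustered), 3))
```

The point estimates are identical under both choices; only the standard errors differ, so the comparison isolates what the clustering decision does to the reported uncertainty.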
statwonk commented Mar 4, 2021

[Image: Screen Shot 2021-03-04 at 6 55 19 AM]