Skip to content

Instantly share code, notes, and snippets.

@tappek
Last active April 16, 2023 12:28
Show Gist options
  • Save tappek/4cb65ab25d64f019ec629df5d11bd2bc to your computer and use it in GitHub Desktop.
Save tappek/4cb65ab25d64f019ec629df5d11bd2bc to your computer and use it in GitHub Desktop.
fwildclusterboot and sandwich::vcovBS
library("plm")
data("Grunfeld", package = "plm")
## set some observations of the dependent variable to missing
Grunfeld$inv[Grunfeld$year<=1936] <- NA
fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
library(sandwich)
sandwich::vcovBS(fe, cluster=~firm, R = 999) # errors
# Error in vcovBS.default(fe, cluster = ~firm, R = 999) :
# number of observations in 'cluster' and 'nobs()' do not match
## Workaround
Gr <- Grunfeld[complete.cases(Grunfeld), ] # workaround
fe2 <- plm(inv ~ value + capital, data = Gr, model = "within")
# works due to workaround (needs plm dev version 2023-04-15)
sandwich::vcovBS(fe2, cluster=~firm, R = 999)
## subset cases
Grunfeld$sample <- complete.cases(Grunfeld[ , c('inv', 'value', 'capital')])
fe_complete <- plm(inv ~ value + capital, data = Grunfeld, subset = Grunfeld$sample, model = "within")
## works with plm dev version / 2023-04-15)
sandwich::vcovBS(fe_complete, cluster=~firm, R = 999)
library(fwildclusterboot)
## case with dropped NA's
## fails, just like like sandwich::vcovBS
fe_wild <- boottest(fe, param="value",clustid=c("firm"),
B=101, type="rademacher", impose_null=TRUE, p_val_type="two-tailed")
summary(fe_wild)
## works due to workaround
fe_wild2 <- boottest(fe2, param="value",clustid=c("firm"),
B=101, type="rademacher", impose_null=TRUE, p_val_type="two-tailed")
summary(fe_wild2)
nobs(fe2)
## subset case
fe_wild_complete <- boottest(fe_complete, param="value",clustid=c("firm"),
B=101, type="rademacher", impose_null=TRUE, p_val_type="two-tailed")
summary(fe_wild_complete)
@s3alfisc
Copy link

Hi, after looking at this for a bit, I think the root source of the error with clusters is the following:

  • to make sure that users can specify a clustering variable that is not part of the regression model (think of lm(Y~X) and cluster group g), both sandwich and fwildclusterboot need to grab the clustering variable from the input data set, which is of course not yet processed by the regression function.
  • In a next step, both packages then need to drop columns as the regression software does.
  • for lm and lfe::felm, this is e.g. done as
    ## handle omitted or excluded observations (works for lfe, lm)
    if ((N != NROW(cluster)) &&
        !is.null(object$na.action) &&
        (class(object$na.action) %in% c("exclude", "omit"))) {
      cluster <- cluster[-object$na.action, , drop = FALSE]
    }
  • does plm report an index of dropped columns? Then I could easily add an additional if clause, as I do for fixest. But the easiest way to make plm compatible with both fwildclusterboot and sandwich would be to output a na.action as lm.

@s3alfisc
Copy link

Ah, I just read the plm PR and realized that you are already aware of what is going on =)

@tappek
Copy link
Author

tappek commented Apr 16, 2023

Thank you! Yes, somewhat ;). Proper na.action implemenation "was long on the list"...

(Ref: ycroissant/plm#44)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment