Skip to content

Instantly share code, notes, and snippets.

@bbdaniels
Created March 23, 2023 19:02
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bbdaniels/39c5497fe62ed87f5d780395ba4fdc07 to your computer and use it in GitHub Desktop.
Save bbdaniels/39c5497fe62ed87f5d780395ba4fdc07 to your computer and use it in GitHub Desktop.
Debiasing regression with controls
// Part 1: De-biasing a parameter estimate using controls
// Develop some data generating process for data X’s and for outcome Y, with some (potentially multi-armed) treatment variable and treatment effect. Like last week, you should strongly consider "simulating" data along the lines of your group project.
// This DGP should include strata groups and continuous covariates, as well as random noise. Make sure that the strata groups affect the outcome Y and are of different sizes, and make the probability that an individual unit receives treatment vary across strata groups. You will want to create the strata groups first, then use a command like expand or merge to add them to an individual-level data set.
cap prog drop runregs
prog def runregs, rclass
syntax anything // sample size goes here (divided by ten)
clear
set obs 4
gen block = _n
expand block*`anything'
gen cov_conf = rnormal()
gen cov_nocy = rnormal()
gen cov_nocx = rnormal()
gen treat = (block/(4) + cov_conf + cov_nocx + rnormal()) > 0
gen y = block + cov_conf /// block and cov_conf are confounders
+ 4*cov_nocy + 0.5*treat /// <-- true effect = 0.5
+ 2*rnormal() // random noise
return scalar n = `c(N)'
reg y treat
return scalar b_nocov = _b[treat]
reg y treat i.block
return scalar b_block = _b[treat]
reg y treat i.block cov_conf
return scalar b_unbias = _b[treat]
reg y treat cov_nocy cov_nocx
return scalar b_bias1 = _b[treat]
reg y treat cov_nocy cov_nocx cov_conf
return scalar b_bias2 = _b[treat]
end
tempfile sims all
clear
save `all' , emptyok
qui foreach i in 10 20 40 80 160 320 640 {
simulate nocov=r(b_nocov) block=r(b_block) ///
bias1=r(b_bias1) bias2=r(b_bias2) ///
unbias=r(b_unbias) n=r(n) ///
, r(500) saving(`sims' , replace) ///
: runregs `i'
use `all' , clear
append using `sims'
save `all' , replace
}
use `all' , clear
tw ///
(lpolyci nocov n) ///
(lpolyci block n) ///
(lpolyci bias1 n) ///
(lpolyci bias2 n) ///
(lpolyci unbias n) ///
, legend(on c(1) pos(3) ///
order(2 "Naive" 3 "Strata" 4 "X Cov Only" ///
5 "X + Conf" 6 "Correct")) ///
yscale(r(0)) ylab(#6 0.5 "True") ytit("Coefficient Estimates") ///
xscale(log) xlab(100 200 400 800 1600 3200 6400) xtit("Observations")
// Make sure that at least one of the continuous covariates also affects both the outcome and the likelihood of receiving treatment (a "confounder"). Make sure that another one of the covariates affects the outcome but not the treatment. Make sure that another one affects the treatment but not the outcome. (What do these do?)
// Construct at least five different regression models with combinations of these covariates and strata fixed effects. (Type h fvvarlist for information on using fixed effects in regression.) Run these regressions at different sample sizes, using a program like last week. Collect as many regression runs as you think you need for each, and produce figures and tables comparing the biasedness and convergence of the models as N grows. Can you produce a figure showing the mean and variance of beta for different regression models, as a function of N? Can you visually compare these to the "true" parameter value?
// Fully describe your results in your README.md file, including figures and tables as appropriate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment