bbdaniels/bias-controls.co

## bias-controls.co
// Part 1: De-biasing a parameter estimate using controls

  // Develop some data generating process for data X’s and for outcome Y, with some (potentially multi-armed) treatment variable and treatment effect. Like last week, you should strongly consider "simulating" data along the lines of your group project.

  // This DGP should include strata groups and continuous covariates, as well as random noise. Make sure that the strata groups affect the outcome Y and are of different sizes, and make the probability that an individual unit receives treatment vary across strata groups. You will want to create the strata groups first, then use a command like expand or merge to add them to an individual-level data set.

cap prog drop runregs
prog def runregs, rclass

syntax anything // sample size goes here (divided by ten)

  clear
  set obs 4
    gen block = _n
    expand block*`anything'

    gen cov_conf = rnormal()

    gen cov_nocy = rnormal()
    gen cov_nocx = rnormal()

    gen treat = (block/(4) + cov_conf + cov_nocx + rnormal()) > 0
    gen y = block + cov_conf    /// block and cov_conf are confounders
          + 4*cov_nocy + 0.5*treat /// <-- true effect = 0.5
          + 2*rnormal() // random noise

    return scalar n = `c(N)'

    reg y treat
      return scalar b_nocov = _b[treat]
    reg y treat i.block
      return scalar b_block = _b[treat]
    reg y treat i.block cov_conf
      return scalar b_unbias = _b[treat]
    reg y treat cov_nocy cov_nocx
      return scalar b_bias1 = _b[treat]
    reg y treat cov_nocy cov_nocx cov_conf
      return scalar b_bias2 = _b[treat]

end

tempfile sims all
  clear
  save `all' , emptyok

qui foreach i in 10 20 40 80 160 320 640 {
  simulate nocov=r(b_nocov) block=r(b_block) ///
           bias1=r(b_bias1) bias2=r(b_bias2) ///
           unbias=r(b_unbias) n=r(n) ///
  , r(500) saving(`sims' , replace) ///
  : runregs `i'

  use `all' , clear
    append using `sims'
    save `all' , replace

}

use `all' , clear

  tw ///
    (lpolyci nocov n) ///
    (lpolyci block n) ///
    (lpolyci bias1 n) ///
    (lpolyci bias2 n) ///
    (lpolyci unbias n) ///
  , legend(on c(1) pos(3) ///
           order(2 "Naive" 3 "Strata" 4 "X Cov Only" ///
                 5 "X + Conf" 6 "Correct")) ///
    yscale(r(0)) ylab(#6 0.5 "True") ytit("Coefficient Estimates") ///
    xscale(log) xlab(100 200 400 800 1600 3200 6400) xtit("Observations")

  // Make sure that at least one of the continuous covariates also affects both the outcome and the likelihood of receiving treatment (a "confounder"). Make sure that another one of the covariates affects the outcome but not the treatment. Make sure that another one affects the treatment but not the outcome. (What do these do?)

  // Construct at least five different regression models with combinations of these covariates and strata fixed effects. (Type h fvvarlist for information on using fixed effects in regression.) Run these regressions at different sample sizes, using a program like last week. Collect as many regression runs as you think you need for each, and produce figures and tables comparing the biasedness and convergence of the models as N grows. Can you produce a figure showing the mean and variance of beta for different regression models, as a function of N? Can you visually compare these to the "true" parameter value?

  // Fully describe your results in your README.md file, including figures and tables as appropriate.
	// Part 1: De-biasing a parameter estimate using controls

	// Develop some data generating process for data X’s and for outcome Y, with some (potentially multi-armed) treatment variable and treatment effect. Like last week, you should strongly consider "simulating" data along the lines of your group project.

	// This DGP should include strata groups and continuous covariates, as well as random noise. Make sure that the strata groups affect the outcome Y and are of different sizes, and make the probability that an individual unit receives treatment vary across strata groups. You will want to create the strata groups first, then use a command like expand or merge to add them to an individual-level data set.

	cap prog drop runregs
	prog def runregs, rclass

	syntax anything // sample size goes here (divided by ten)

	clear
	set obs 4
	gen block = _n
	expand block*`anything'

	gen cov_conf = rnormal()

	gen cov_nocy = rnormal()
	gen cov_nocx = rnormal()

	gen treat = (block/(4) + cov_conf + cov_nocx + rnormal()) > 0
	gen y = block + cov_conf /// block and cov_conf are confounders
	+ 4cov_nocy + 0.5treat /// <-- true effect = 0.5
	+ 2*rnormal() // random noise

	return scalar n = `c(N)'

	reg y treat
	return scalar b_nocov = _b[treat]
	reg y treat i.block
	return scalar b_block = _b[treat]
	reg y treat i.block cov_conf
	return scalar b_unbias = _b[treat]
	reg y treat cov_nocy cov_nocx
	return scalar b_bias1 = _b[treat]
	reg y treat cov_nocy cov_nocx cov_conf
	return scalar b_bias2 = _b[treat]

	end

	tempfile sims all
	clear
	save `all' , emptyok

	qui foreach i in 10 20 40 80 160 320 640 {
	simulate nocov=r(b_nocov) block=r(b_block) ///
	bias1=r(b_bias1) bias2=r(b_bias2) ///
	unbias=r(b_unbias) n=r(n) ///
	, r(500) saving(`sims' , replace) ///
	: runregs `i'

	use `all' , clear
	append using `sims'
	save `all' , replace

	}

	use `all' , clear

	tw ///
	(lpolyci nocov n) ///
	(lpolyci block n) ///
	(lpolyci bias1 n) ///
	(lpolyci bias2 n) ///
	(lpolyci unbias n) ///
	, legend(on c(1) pos(3) ///
	order(2 "Naive" 3 "Strata" 4 "X Cov Only" ///
	5 "X + Conf" 6 "Correct")) ///
	yscale(r(0)) ylab(#6 0.5 "True") ytit("Coefficient Estimates") ///
	xscale(log) xlab(100 200 400 800 1600 3200 6400) xtit("Observations")

	// Make sure that at least one of the continuous covariates also affects both the outcome and the likelihood of receiving treatment (a "confounder"). Make sure that another one of the covariates affects the outcome but not the treatment. Make sure that another one affects the treatment but not the outcome. (What do these do?)

	// Construct at least five different regression models with combinations of these covariates and strata fixed effects. (Type h fvvarlist for information on using fixed effects in regression.) Run these regressions at different sample sizes, using a program like last week. Collect as many regression runs as you think you need for each, and produce figures and tables comparing the biasedness and convergence of the models as N grows. Can you produce a figure showing the mean and variance of beta for different regression models, as a function of N? Can you visually compare these to the "true" parameter value?

	// Fully describe your results in your README.md file, including figures and tables as appropriate.