Created
April 10, 2024 22:26
-
-
Save diamonaj/3bd7615a96d95bf75e7e638cae58d0bd to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "CS130 Causal Inference Assignment Spring 2024" | |
output: html_document | |
date: "2024-04-01" | |
--- | |
```{r setup, include=FALSE} | |
knitr::opts_chunk$set(echo = TRUE) | |
``` | |
## Part I Matching | |
### Part A | |
In the context of the Darfur dataset and the analysis of the effects of exposure to violence on attitudes towards peace, the essential elements of the Rubin Causal Model (RCM) can be outlined as follows: | |
Unit of Analysis: The individuals in the Darfur dataset represent the units of analysis. These are the people who have potentially been directly harmed during the violence in Darfur. | |
Treatment Variable: The treatment variable is directlyharmed, indicating whether an individual was physically injured or maimed during the attacks. In RCM, this variable determines the assignment of treatment (exposure to direct harm) versus control (not exposed to direct harm). | |
Outcome Variable: The outcome variable is peacefactor, an index measuring attitudes towards peace. RCM focuses on comparing the potential outcomes on this variable for each unit under both treated and untreated scenarios. | |
Potential Outcomes: For each individual, there are two potential outcomes: the attitude towards peace if directly harmed (treated) and if not directly harmed (control). The fundamental problem of causal inference is that we only observe one of these outcomes for each unit. | |
Covariates: Variables such as village, female, age, farmer_dar, herder_dar, pastvoted, and hhsize_darfur serve as covariates in the analysis. These are pre-treatment characteristics that might influence both the likelihood of being directly harmed and the outcome (attitudes towards peace). | |
SUTVA: unclear whether or not SUTVA is likely to be satisfied here. Maybe a person's | |
status with respect to having been harmed affects other people's potential outcomes; | |
if so, SUTVA would be violated. | |
### Part B | |
The research question derived from the vignette linked above is: "Does being directly injured or maimed in the violence in Darfur make individuals more likely to feel 'vengeful' and unwilling to make peace with those who perpetrated this violence, or are those who directly suffered from such violence more motivated to see it end by making peace?" | |
This is indeed a causal question because it seeks to determine the effect of a specific intervention or treatment (being directly harmed in the violence) on an outcome (attitudes towards peace). The question aims to compare the potential outcomes on the same individuals under two different scenarios: one where they are directly harmed and another where they are not. This fits the framework for causal inference by examining how the treatment (direct harm) causally affects the outcome (peace attitudes). | |
It passes Rubin's standard for causal questions, which typically involve comparing outcomes under treatment versus control conditions for the same units. Rubin's Causal Model (RCM) emphasizes the importance of potential outcomes, treatment assignment, and the comparison of observed outcomes under actual treatment versus the counterfactual scenario where treatment was not received. The question is structured to identify the causal effect of a specific treatment (direct harm) on an outcome (attitude towards peace), considering the individuals as the units of analysis and implying the existence of potential outcomes under treated and untreated scenarios. | |
The question implicitly recognizes the complexity of causal inference in observational settings, where the treatment is not randomly assigned. It implies a need to control for confounding variables to estimate the causal effect accurately, aligning with Rubin's emphasis on the careful design and analysis to approximate experimental conditions as closely as possible in observational studies. | |
Let's output some information on the dataset before we move on to the next step. | |
```{r pressure, echo=FALSE} | |
# Load necessary libraries | |
library(sensemakr) | |
library(Matching) | |
# Load and inspect the Darfur dataset | |
data("darfur") | |
names(darfur) # Display column names | |
summary(darfur) # Summary statistics of the dataset | |
``` | |
### Part C | |
Given the context of the research question and the summary statistics of the Darfur dataset, the predictors to control for when assessing the effect of being directly harmed on attitudes towards peace are: | |
1. **Female**: Gender might play a significant role in both the experience of violence and attitudes towards peace, especially considering the noted targeting of women for sexual violence. Controlling for gender allows for an assessment of the treatment effect across different genders. | |
2. **Age**: Age could influence both an individual's exposure to violence and their subsequent attitudes towards peace, potentially reflecting generational differences in responses to trauma or political engagement. | |
Justification for these predictors is grounded in their potential to influence both the treatment (being directly harmed) and the outcome (attitudes towards peace), making them relevant confounders in the causal analysis framework. Controlling for these variables helps to mitigate the bias in estimating the causal effect of direct harm on peace attitudes, adhering to the Rubin Causal Model's emphasis on the importance of adjusting for confounding factors to approximate the counterfactual scenario as closely as possible. | |
### Part D | |
```{r pressure, echo=FALSE} | |
glm_model <- glm(directlyharmed ~ female + age, family = binomial(), data = darfur) | |
# Extract the fitted propensity scores from the logistic regression model | |
propensity_scores <- glm_model$fitted.values | |
# Extract the treatment indicator variable | |
treatment_indicator <- darfur$directlyharmed | |
# Perform matching with replacement using the estimated propensity scores | |
matching_result <- Match(Y = NULL, Tr = treatment_indicator, X = propensity_scores, M = 1, estimand = "ATT", replace = TRUE) | |
# Optionally, check balance for female, age, and village after matching | |
balance_check <- MatchBalance(directlyharmed ~ female + age, data = darfur, match.out = matching_result) | |
''' | |
***** (V1) female ***** | |
Before Matching After Matching | |
mean treatment........ 0.43289 0.43289 | |
mean control.......... 0.47256 0.44423 | |
std mean diff......... -7.9978 -2.287 | |
mean raw eQQ diff..... 0.039698 0.0039621 | |
med raw eQQ diff..... 0 0 | |
max raw eQQ diff..... 1 1 | |
mean eCDF diff........ 0.019832 0.001981 | |
med eCDF diff........ 0.019832 0.001981 | |
max eCDF diff........ 0.039665 0.0039621 | |
var ratio (Tr/Co)..... 0.9855 0.99436 | |
T-test p-value........ 0.16084 0.014077 | |
***** (V2) age ***** | |
Before Matching After Matching | |
mean treatment........ 37.081 37.081 | |
mean control.......... 37.672 36.532 | |
std mean diff......... -4.0594 3.7758 | |
mean raw eQQ diff..... 0.94518 0.1831 | |
med raw eQQ diff..... 0 0 | |
max raw eQQ diff..... 10 10 | |
mean eCDF diff........ 0.013277 0.0027753 | |
med eCDF diff........ 0.0095707 0.0035376 | |
max eCDF diff........ 0.035092 0.0062261 | |
var ratio (Tr/Co)..... 0.90801 1.0928 | |
T-test p-value........ 0.48417 0.0088694 | |
KS Bootstrap p-value.. 0.684 0.96 | |
KS Naive p-value...... 0.84023 0.99917 | |
KS Statistic.......... 0.035092 0.0062261 | |
Before Matching Minimum p.value: 0.16084 | |
Variable Name(s): female Number(s): 1 | |
After Matching Minimum p.value: 0.0088694 | |
Variable Name(s): age Number(s): 2 | |
After matching, the smallest p-value obtained among the evaluated covariates is **0.0088694**, associated with the **`age`** variable. This indicates a statistically significant difference in the age distribution between the treated and control groups after matching, as indicated by the T-test. | |
### **Balance Standard Assessment** | |
A common balance standard in propensity score matching studies is the achievement of non-significant p-values (typically p \> 0.05) and standardized mean differences (SMD) less than 10% (or 0.1) for all covariates between matched treated and control groups, suggesting negligible differences between these groups in terms of the covariates. | |
- For the **`female`** variable, the post-matching standardized mean difference has notably improved to -2.287, and the p-value has increased but is still significant at **0.014077**, indicating a meaningful but not perfect balance improvement. | |
- For the **`age`** variable, despite the improvement in balance metrics (e.g., reduction in standardized mean difference to 3.7758 and mean raw eQQ diff), the significant p-value of **0.0088694** post-matching indicates that there remains a statistically significant imbalance between the treated and control groups in terms of age. THIS IS BAD NEWS FOR PROPENSITY SCORE MATCHING! | |
The matching result does not fully meet the balance standard because although the balance for covariates improved after matching, indicated by reduced standardized mean differences, the smallest balance-metric p-value associated with **`age`** is significantly less than 0.05. This significant p-value indicates that even after matching, there remains a statistically significant difference in the age distribution between the treated and control groups. | |
### Part E | |
''' | |
# Perform matching with replacement using the estimated propensity scores | |
matching_result <- Match(Y = darfur$peacefactor, Tr = treatment_indicator, X = propensity_scores, M = 1, estimand = "ATT", replace = TRUE) | |
# Estimate the Average Treatment Effect on the Treated (ATT) | |
# (ordinarily we would not do this because the matching balance is poor) | |
matching_summary <- summary(matching_result) | |
``` | |
- **Estimate (ATT)**: The estimated effect of being directly harmed on the outcome variable is 0.043143. | |
This suggests that, on average, being directly harmed leads to a 0.043143 unit increase in the outcome variable compared | |
to those who were not directly harmed, after adjusting for differences in **`female`** and **`age`** through matching. | |
- **T-statistic**: The T-statistic is 2.0909, which is used to test the null hypothesis that the treatment effect (ATT) | |
is zero. A T-statistic further away from zero indicates stronger evidence against the null hypothesis. | |
- **p-value**: The p-value is 0.036538, which assesses the significance of the treatment effect. | |
Since the p-value is less than the conventional significance level of 0.05, we reject the null hypothesis and conclude that the treatment effect is statistically significant. | |
### Part F | |
```{r pressure, echo=FALSE} | |
matched_data <- rbind(darfur[matching_result$index.treated,], | |
darfur[matching_result$index.control,]) | |
matched_model <- lm(peacefactor ~ directlyharmed + female + age, data = matched_data) | |
summary(matched_model) | |
''' | |
Call: | |
lm(formula = peacefactor ~ directlyharmed + female + age, data = matched_data) | |
Residuals: | |
Min 1Q Median 3Q Max | |
-0.4859 -0.1950 -0.1575 0.1489 0.8447 | |
Coefficients: | |
Estimate Std. Error t value Pr(>|t|) | |
(Intercept) 0.4582666 0.0113006 40.552 < 2e-16 *** | |
directlyharmed 0.0353435 0.0052376 6.748 1.56e-11 *** | |
female -0.2835539 0.0056268 -50.393 < 2e-16 *** | |
age -0.0004312 0.0002755 -1.565 0.118 | |
--- | |
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 | |
Residual standard error: 0.3113 on 14130 degrees of freedom | |
Multiple R-squared: 0.1643, Adjusted R-squared: 0.1642 | |
F-statistic: 926.3 on 3 and 14130 DF, p-value: < 2.2e-16 | |
''' | |
# Now run sensemakr | |
sensitivity_analysis <- sensemakr(model = matched_model, | |
treatment = "directlyharmed", | |
benchmark_covariates = c("female", "age")) | |
# Print and summarize the sensitivity analysis | |
print(sensitivity_analysis) | |
''' | |
Sensitivity Analysis to Unobserved Confounding | |
Model Formula: peacefactor ~ directlyharmed + female + age | |
Null hypothesis: q = 1 and reduce = TRUE | |
Unadjusted Estimates of directlyharmed: | |
Coef. estimate: 0.03534 | |
Standard Error: 0.00524 | |
t-value: 6.74801 | |
Sensitivity Statistics: | |
Partial R2 of treatment with outcome: 0.00321 | |
Robustness Value, q = 1 : 0.05518 | |
Robustness Value, q = 1 alpha = 0.05 : 0.03947 | |
For more information, check summary. | |
> summary(sensitivity_analysis) | |
Sensitivity Analysis to Unobserved Confounding | |
Model Formula: peacefactor ~ directlyharmed + female + age | |
Null hypothesis: q = 1 and reduce = TRUE | |
-- This means we are considering biases that reduce the absolute value of the current estimate. | |
-- The null hypothesis deemed problematic is H0:tau = 0 | |
Unadjusted Estimates of directlyharmed: | |
Coef. estimate: 0.0353 | |
Standard Error: 0.0052 | |
t-value (H0:tau = 0): 6.748 | |
Sensitivity Statistics: | |
Partial R2 of treatment with outcome: 0.0032 | |
Robustness Value, q = 1: 0.0552 | |
Robustness Value, q = 1, alpha = 0.05: 0.0395 | |
Verbal interpretation of sensitivity statistics: | |
-- Partial R2 of the treatment with the outcome: an extreme confounder (orthogonal to the covariates) that explains 100% of the | |
residual variance of the outcome, would need to explain at least 0.32% of the residual variance of the treatment to fully account | |
for the observed estimated effect. | |
-- Robustness Value, q = 1: unobserved confounders (orthogonal to the covariates) that explain more than 5.52% of the residual | |
variance of both the treatment and the outcome are strong enough to bring the point estimate to 0 (a bias of 100% of the original | |
estimate). Conversely, unobserved confounders that do not explain more than 5.52% of the residual variance of both the treatment | |
and the outcome are not strong enough to bring the point estimate to 0. | |
-- Robustness Value, q = 1, alpha = 0.05: unobserved confounders (orthogonal to the covariates) that explain more than 3.95% of the | |
residual variance of both the treatment and the outcome are strong enough to bring the estimate to a range where it is no longer | |
statistically different from 0 (a bias of 100% of the original estimate), at the significance level of alpha = 0.05. Conversely, | |
unobserved confounders that do not explain more than 3.95% of the residual variance of both the treatment and the outcome are not | |
strong enough to bring the estimate to a range where it is no longer statistically different from 0, at the significance level of | |
alpha = 0.05. | |
Bounds on omitted variable bias: | |
--The table below shows the maximum strength of unobserved confounders with association with the treatment and the outcome | |
bounded by a multiple of the observed explanatory power of the chosen benchmark covariate(s). | |
The sensitivity analysis conducted on the Darfur dataset investigates the effect of direct harm (**`directlyharmed`**) on | |
peace attitudes (**`peacefactor`**), adjusting for observable covariates **`female`** and **`age`**. The analysis reveals a | |
statistically significant positive coefficient for **`directlyharmed`** of 0.03534, with a standard error of 0.00524, leading | |
to a t-value of 6.74801, suggesting that experiencing direct harm is associated with an increase in peace attitudes. | |
The partial R² of the treatment with the outcome is reported as 0.00321, indicating that the treatment explains about 0.32 | |
percent of the variance in peace attitudes, conditional on the covariates included in the model. This relatively low value | |
underscores the modest role of direct harm in determining peace attitudes when **`female`** and **`age`** are considered. | |
Robustness values provide insight into the treatment effects sensitivity to unobserved confounding. The RV for q = 1 is 0.05518, | |
implying that unobserved confounders accounting for more than 5.52 percent of the variance in both treatment and outcome could entirely | |
negate the observed treatment effect. Additionally, at a significance level of alpha = 0.05, the RV decreases to 0.03947, indicating | |
that confounders explaining more than 3.95% of the variance could make the treatment effect statistically indistinguishable from zero. | |
These findings suggest that while the effect of being directly harmed on peace attitudes is statistically significant and positive, | |
it is relatively small and potentially sensitive to unobserved confounding. Even confounders with limited explanatory power could | |
challenge the statistical significance and magnitude of the observed effect. This analysis underscores the importance of cautious | |
interpretation of causal inferences from observational data, especially in contexts with significant potential for unobserved | |
confounding, as is often the case in conflict research. | |
''' | |
### Part G | |
#I will work with the same set of predictors (age and female) when doing genetic matching. | |
X <- cbind(darfur$female, darfur$age) | |
# Perform genetic matching | |
genout <- GenMatch(Tr = darfur$directlyharmed, # Treatment indicator | |
X = X, # Covariates for matching | |
estimand = "ATT", # Estimand of interest | |
M = 1, # Number of matches (1:1 matching) | |
pop.size = 200, # Population size for genetic algorithm | |
wait.generations = 25) # Number of generations without improvement before | |
''' | |
#GENERATION: 24 | |
#Lexical Fit..... 1.795485e-01 1.795485e-01 2.442236e-01 1.000000e+00 | |
#unique......... 98, #Total UniqueCount: 2626 | |
var 1: | |
best............ 7.051480e-01 | |
mean............ 2.928182e+01 | |
variance........ 1.104832e+04 | |
var 2: | |
best............ 5.235850e+02 | |
mean............ 5.210987e+02 | |
variance........ 7.689777e+03 | |
GENERATION: 25 | |
Lexical Fit..... 1.795485e-01 1.795485e-01 2.442236e-01 1.000000e+00 | |
#unique......... 91, #Total UniqueCount: 2717 | |
var 1: | |
best............ 7.051480e-01 | |
mean............ 3.339998e+01 | |
variance........ 1.868081e+04 | |
var 2: | |
best............ 5.235850e+02 | |
mean............ 5.174218e+02 | |
variance........ 6.594929e+03 | |
GENERATION: 26 | |
Lexical Fit..... 1.795485e-01 1.795485e-01 2.442236e-01 1.000000e+00 | |
#unique......... 100, #Total UniqueCount: 2817 | |
var 1: | |
best............ 7.051480e-01 | |
mean............ 4.317827e+01 | |
variance........ 2.005711e+04 | |
var 2: | |
best............ 5.235850e+02 | |
mean............ 5.244451e+02 | |
variance........ 5.176784e+03 | |
'wait.generations' limit reached. | |
No significant improvement in 25 generations. | |
Solution Lexical Fitness Value: | |
1.795485e-01 1.795485e-01 2.442236e-01 1.000000e+00 | |
Parameters at the Solution: | |
X[ 1] : 7.051480e-01 | |
X[ 2] : 5.235850e+02 | |
Solution Found Generation 1 | |
Number of Generations Run 26 | |
Wed Apr 10 15:50:47 2024 | |
Total run time : 0 hours 0 minutes and 55 seconds | |
''' | |
mout <- Match(Tr = darfur$directlyharmed, # Treatment indicator | |
X = X, # Covariates for matching | |
estimand = "ATT", # Estimand of interest | |
M = 1, Weight.matrix = genout) | |
summary(mout) | |
mb <- MatchBalance(directlyharmed ~ female + age, | |
data = darfur, match.out = mout, nboots = 500) | |
''' | |
***** (V1) female ***** | |
Before Matching After Matching | |
mean treatment........ 0.43289 0.43289 | |
mean control.......... 0.47256 0.42722 | |
std mean diff......... -7.9978 1.1435 | |
mean raw eQQ diff..... 0.039698 0.0016971 | |
med raw eQQ diff..... 0 0 | |
max raw eQQ diff..... 1 1 | |
mean eCDF diff........ 0.019832 0.00084854 | |
med eCDF diff........ 0.019832 0.00084854 | |
max eCDF diff........ 0.039665 0.0016971 | |
var ratio (Tr/Co)..... 0.9855 1.0032 | |
T-test p-value........ 0.16084 0.17955 | |
***** (V2) age ***** | |
Before Matching After Matching | |
mean treatment........ 37.081 37.081 | |
mean control.......... 37.672 37.058 | |
std mean diff......... -4.0594 0.15935 | |
mean raw eQQ diff..... 0.94518 0.014142 | |
med raw eQQ diff..... 0 0 | |
max raw eQQ diff..... 10 10 | |
mean eCDF diff........ 0.013277 0.00019011 | |
med eCDF diff........ 0.0095707 0 | |
max eCDF diff........ 0.035092 0.003677 | |
var ratio (Tr/Co)..... 0.90801 1.0114 | |
T-test p-value........ 0.48417 0.24422 | |
KS Bootstrap p-value.. 0.64 0.996 | |
KS Naive p-value...... 0.84023 1 | |
KS Statistic.......... 0.035092 0.003677 | |
Before Matching Minimum p.value: 0.16084 | |
Variable Name(s): female Number(s): 1 | |
After Matching Minimum p.value: 0.17955 | |
Variable Name(s): female Number(s): 1 | |
NOTE THAT GENETIC MATCHING HAS CREATED HIGH BALANCE | |
UNLIKE PROPENSITY SCORE MATCHING. HOWEVER---ONLY ON 2 VARIABLES. | |
2 VARIABLES IS PROBABLY INADEQUATE! | |
''' | |
mout <- Match(Y = darfur$peacefactor, Tr = darfur$directlyharmed, # Treatment indicator | |
X = X, # Covariates for matching | |
estimand = "ATT", # Estimand of interest | |
M = 1, Weight.matrix = genout) | |
summary(mout) | |
#Estimate... 0.041096 | |
#AI SE...... 0.020721 | |
#T-stat..... 1.9833 | |
#p.val...... 0.04733 | |
#Original number of observations.............. 1276 | |
#Original number of treated obs............... 529 | |
#Matched number of observations............... 529 | |
#Matched number of observations (unweighted). 7071 | |
''' | |
Subsequent linear regression analysis, weighted by the outcomes of genetic matching, revealed a statistically significant | |
positive effect of direct harm on peace attitudes, with a coefficient of 0.041096 (p-value = 0.004733). This suggests that | |
individuals who experienced direct harm had a marginally higher and statistically significant propensity towards peace. | |
Balance checks before matching highlighted the standardized mean differences for **`female`** and **`age`**, demonstrating the | |
genetic matchings effectiveness in reducing disparities between treated and control groups. This process ensured a fair comparison, | |
minimizing confounding effects. The findings underscore the nuanced influence of direct harm on peace attitudes, revealing that such | |
experiences can indeed alter individuals perspectives towards peace, albeit slightly. | |
''' | |
```{r pressure, echo=FALSE} | |
matched_data2 <- rbind(darfur[mout$index.treated,], | |
darfur[mout$index.control,]) | |
matched_model2 <- lm(peacefactor ~ directlyharmed + female + age, data = matched_data2) | |
summary(matched_model2) | |
sensitivity_analysis2 <- sensemakr(model = matched_model2, | |
treatment = "directlyharmed", | |
benchmark_covariates = c("female", "age")) | |
# Print and summarize the sensitivity analysis | |
print(sensitivity_analysis2) | |
''' | |
Sensitivity Analysis to Unobserved Confounding | |
Model Formula: peacefactor ~ directlyharmed + female + age | |
Null hypothesis: q = 1 and reduce = TRUE | |
Unadjusted Estimates of ' directlyharmed ': | |
Coef. estimate: 0.03621 | |
Standard Error: 0.00524 | |
t-value: 6.91039 | |
Sensitivity Statistics: | |
Partial R2 of treatment with outcome: 0.00337 | |
Robustness Value, q = 1 : 0.05645 | |
Robustness Value, q = 1 alpha = 0.05 : 0.04077 | |
Sensitivity results for these genetic matching results are, for all practical purposes, very similar to the results obtained | |
from propensity score matching results. In | |
other words, genetic matching did not (in this case, with these predictors) | |
make our analysis more robust to hidden bias. | |
For more information, check summary. | |
''' | |
# PART 'H' | |
# Update the covariates matrix for genetic matching | |
# NOW WE TRY TO DO MORE RIGOROUS MATCHING, WITH MORE COVARIATES | |
X_updated <- cbind(darfur$female, darfur$age, darfur$farmer_dar, darfur$herder_dar, | |
I(darfur$age^2), I(darfur$female * darfur$age)) | |
# Perform genetic matching with enhanced covariates | |
genout_updated <- GenMatch(Tr = darfur$directlyharmed, | |
X = X_updated, | |
estimand = "ATT", | |
M = 1, | |
pop.size = 200, | |
wait.generations = 25) | |
''' | |
GENERATION: 53 | |
Lexical Fit..... 2.478343e-02 2.507089e-02 2.507089e-02 5.474516e-02 8.297333e-02 8.297333e-02 2.053780e-01 3.173112e-01 | |
3.173112e-01 9.997694e-01 9.997694e-01 9.999997e-01 | |
#unique......... 126, #Total UniqueCount: 7296 | |
var 1: | |
best............ 4.888915e+02 | |
mean............ 4.845239e+02 | |
variance........ 6.981897e+03 | |
var 2: | |
best............ 8.011556e+02 | |
mean............ 7.838363e+02 | |
variance........ 6.111726e+03 | |
var 3: | |
best............ 7.206332e+01 | |
mean............ 8.604460e+01 | |
variance........ 6.482342e+03 | |
var 4: | |
best............ 6.237664e+01 | |
mean............ 7.378344e+01 | |
variance........ 5.599830e+03 | |
var 5: | |
best............ 9.875386e+02 | |
mean............ 9.598645e+02 | |
variance........ 1.395063e+04 | |
var 6: | |
best............ 8.249492e+00 | |
mean............ 2.945080e+01 | |
variance........ 1.286610e+04 | |
No significant improvement in 25 generations. | |
Solution Lexical Fitness Value: | |
2.478343e-02 2.507089e-02 2.507089e-02 5.474516e-02 8.297333e-02 8.297333e-02 2.053780e-01 3.173112e-01 3.173112e-01 | |
9.997694e-01 9.997694e-01 9.999997e-01 | |
Parameters at the Solution: | |
X[ 1] : 4.888915e+02 | |
X[ 2] : 8.011556e+02 | |
X[ 3] : 7.206332e+01 | |
X[ 4] : 6.237664e+01 | |
X[ 5] : 9.875386e+02 | |
X[ 6] : 8.249492e+00 | |
Solution Found Generation 27 | |
Number of Generations Run 53 | |
Wed Apr 10 16:10:49 2024 | |
Total run time : 0 hours 3 minutes and 24 seconds | |
''' | |
mout_updated <- Match(Y = darfur$peacefactor, | |
Tr = darfur$directlyharmed, # Treatment indicator | |
X = X_updated, # Covariates for matching | |
estimand = "ATT", # Estimand of interest | |
M = 1, Weight.matrix = genout_updated) | |
mb_updated <- MatchBalance(directlyharmed ~ female + age + farmer_dar + herder_dar, | |
data = darfur, match.out = mout_updated, nboots = 500) | |
summary(mout_updated) | |
''' | |
***** (V1) female ***** | |
Before Matching After Matching | |
mean treatment........ 0.43289 0.43289 | |
mean control.......... 0.47256 0.431 | |
std mean diff......... -7.9978 0.38116 | |
mean raw eQQ diff..... 0.039698 0.00049579 | |
med raw eQQ diff..... 0 0 | |
max raw eQQ diff..... 1 1 | |
mean eCDF diff........ 0.019832 0.00024789 | |
med eCDF diff........ 0.019832 0.00024789 | |
max eCDF diff........ 0.039665 0.00049579 | |
var ratio (Tr/Co)..... 0.9855 1.001 | |
T-test p-value........ 0.16084 0.31731 | |
***** (V2) age ***** | |
Before Matching After Matching | |
mean treatment........ 37.081 37.081 | |
mean control.......... 37.672 37.008 | |
std mean diff......... -4.0594 0.50662 | |
mean raw eQQ diff..... 0.94518 0.052801 | |
med raw eQQ diff..... 0 0 | |
max raw eQQ diff..... 10 10 | |
mean eCDF diff........ 0.013277 0.00068466 | |
med eCDF diff........ 0.0095707 0.00049579 | |
max eCDF diff........ 0.035092 0.0076847 | |
var ratio (Tr/Co)..... 0.90801 1.0164 | |
T-test p-value........ 0.48417 0.024783 | |
KS Bootstrap p-value.. 0.696 0.968 | |
KS Naive p-value...... 0.84023 0.99977 | |
KS Statistic.......... 0.035092 0.0076847 | |
***** (V3) farmer_dar ***** | |
Before Matching After Matching | |
mean treatment........ 0.81853 0.81853 | |
mean control.......... 0.82731 0.82798 | |
std mean diff......... -2.2769 -2.4501 | |
mean raw eQQ diff..... 0.0075614 0.0029747 | |
med raw eQQ diff..... 0 0 | |
max raw eQQ diff..... 1 1 | |
mean eCDF diff........ 0.0043919 0.0014874 | |
med eCDF diff........ 0.0043919 0.0014874 | |
max eCDF diff........ 0.0087837 0.0029747 | |
var ratio (Tr/Co)..... 1.0403 1.0429 | |
T-test p-value........ 0.68633 0.025071 | |
***** (V4) herder_dar ***** | |
Before Matching After Matching | |
mean treatment........ 0.18904 0.18904 | |
mean control.......... 0.12048 0.18336 | |
std mean diff......... 17.492 1.447 | |
mean raw eQQ diff..... 0.069943 0.0017353 | |
med raw eQQ diff..... 0 0 | |
max raw eQQ diff..... 1 1 | |
mean eCDF diff........ 0.034277 0.00086763 | |
med eCDF diff........ 0.034277 0.00086763 | |
max eCDF diff........ 0.068554 0.0017353 | |
var ratio (Tr/Co)..... 1.4475 1.0238 | |
T-test p-value........ 0.0010122 0.082973 | |
Before Matching Minimum p.value: 0.0010122 | |
Variable Name(s): herder_dar Number(s): 4 | |
After Matching Minimum p.value: 0.024783 | |
Variable Name(s): age Number(s): 2 | |
> summary(mout_updated) | |
Estimate... 0.054114 | |
AI SE...... 0.022042 | |
T-stat..... 2.455 | |
p.val...... 0.014087 | |
Original number of observations.............. 1276 | |
Original number of treated obs............... 529 | |
Matched number of observations............... 529 | |
Matched number of observations (unweighted). 4034 | |
The weighted regression model, which incorporated direct harm, gender, age, being a farmer or herder, and interaction between | |
gender and age, alongside a quadratic term for age, revealed several key insights: | |
- Direct harm was associated with a slight increase in peace factor scores, with an estimate of 0.054114 and a p-value of | |
0.014087, underscoring its significance. | |
''' | |
matched_data3 <- rbind(darfur[mout_updated$index.treated,], | |
darfur[mout_updated$index.control,]) | |
matched_model3 <- lm(peacefactor ~ directlyharmed + female + age + | |
farmer_dar + herder_dar + | |
I(age^2) + I(female * age), | |
data = matched_data3) | |
summary(matched_model3) | |
sensitivity_analysis3 <- sensemakr(model = matched_model3, | |
treatment = "directlyharmed", | |
benchmark_covariates = c("female", "age")) | |
# Print and summarize the sensitivity analysis | |
print(sensitivity_analysis3) | |
''' | |
Sensitivity Analysis to Unobserved Confounding | |
Model Formula: peacefactor ~ directlyharmed + female + age + farmer_dar + herder_dar + | |
I(age^2) + I(female * age) | |
Null hypothesis: q = 1 and reduce = TRUE | |
Unadjusted Estimates of ' directlyharmed ': | |
Coef. estimate: 0.01005 | |
Standard Error: 0.0067 | |
t-value: 1.50056 | |
Sensitivity Statistics: | |
Partial R2 of treatment with outcome: 0.00028 | |
Robustness Value, q = 1 : 0.01658 | |
Robustness Value, q = 1 alpha = 0.05 : 0 | |
''' | |
### Part I: SUMMARY | |
''' | |
When we try to use matching to control for two variables (age and female), | |
whether we perform propensity score matching or genetic matching, | |
ATT is similar and sensitivity results are also similar. We see ATT in the | |
range of around 0.03, and some robustness to hidden bias but not much. | |
When we use genetic matching to control for additional variables, including | |
interaction terms, the treatment effect estimate shrinks to 1 percent and | |
robustness decreases accordingly. It is worth noting, however, that we | |
do not really obtain a satisfactory degree of balance when attempting | |
genmatch with this richer set of covariates. So, ultimately, the main takeaway | |
(from the specific analysis above) is that we have not performed matching to | |
a reasonable degree of rigor and success, and therefore it is premature to | |
speculate about hidden bias. First you should get the matching result you | |
are prepared to endorse, and only then would you check for robustness if | |
in fact the treatment effect estimate Is non-zero and/or statistically | |
significant. | |
''' | |
## Part II Synthetic Controls | |
''' | |
The analysis was conducted for the Baleares (Islas). I have highlighted the line of code that should be modified to extend | |
the analysis to two other regions: Andalucía and Cantabria. | |
Below is the code for conducting the synthetic control analysis, which includes generating the covariate balance table and | |
the unit weights table. An important assumption made in this analysis is that the pre-treatment period ends in 1969 and | |
the post-treatment period begins in 1970, as the paper does not specify the exact year. This assumption is based on the | |
understanding that terrorism activity became notable around that time, given that the first known victim was in 1968. | |
When examining both the gaps plot and the path plot, it is crucial to first analyze the pre-treatment period from 1955 to 1969. | |
The gaps plot shows a significant difference between the treatment and synthetic lines just before 1970, with the discrepancys | |
amplitude nearly half of what is observed in the post-treatment period. The covariate balance table further reveals that several | |
variables were not perfectly matched. For instance, sec.energy carries the highest weight of 0.621, yet its values for the treated | |
and control groups are 2.076 and 2.704, respectively. | |
Thus, despite the 15-year pre-treatment period being adequately long for variable matching, and the 27-year post-treatment | |
period providing a sufficient timeframe to observe differences between the synthetic and treatment groups, the potential | |
confounding bias cannot be overlooked. Additionally, the suboptimal matching during the pre-treatment period could potentially | |
influence the results. | |
```{r pressure, echo=FALSE} | |
#loading library and data | |
#install.packages("Synth") | |
library(Synth) | |
data(basque) | |
#IMPORTANT: | |
#CHANGE TO 2 FOR Andalucia | |
#CHANGE TO 7 FOR Cantabria | |
treatment.unit <- 5 #Corresponds for Baleares | |
data(basque) | |
# dataprep: prepare data for synth | |
dataprep.out <- | |
dataprep( | |
foo = basque | |
,predictors= c("school.illit", | |
"school.prim", | |
"school.med", | |
"school.high", | |
"school.post.high" | |
,"invest" | |
) | |
,predictors.op = c("mean") | |
,dependent = c("gdpcap") | |
,unit.variable = c("regionno") | |
,time.variable = c("year") | |
,special.predictors = list( | |
list("gdpcap",1960:1969,c("mean")), | |
list("sec.agriculture",seq(1961,1969,2),c("mean")), | |
list("sec.energy",seq(1961,1969,2),c("mean")), | |
list("sec.industry",seq(1961,1969,2),c("mean")), | |
list("sec.construction",seq(1961,1969,2),c("mean")), | |
list("sec.services.venta",seq(1961,1969,2),c("mean")), | |
list("sec.services.nonventa",seq(1961,1969,2),c("mean")), | |
list("popdens",1969,c("mean"))) | |
,treatment.identifier = 5 # Baleares | |
,controls.identifier = c(2:4, 6:16, 18) | |
,time.predictors.prior = c(1964:1969) | |
,time.optimize.ssr = c(1960:1969) | |
,unit.names.variable = c("regionname") | |
,time.plot = c(1955:1997) | |
) | |
# 1. combine highest and second highest | |
# schooling category and eliminate highest category | |
dataprep.out$X1["school.high",] <- | |
dataprep.out$X1["school.high",] + | |
dataprep.out$X1["school.post.high",] | |
dataprep.out$X1 <- | |
as.matrix(dataprep.out$X1[ | |
-which(rownames(dataprep.out$X1)=="school.post.high"),]) | |
dataprep.out$X0["school.high",] <- | |
dataprep.out$X0["school.high",] + | |
dataprep.out$X0["school.post.high",] | |
dataprep.out$X0 <- | |
dataprep.out$X0[ | |
-which(rownames(dataprep.out$X0)=="school.post.high"),] | |
# 2. make total and compute shares for the schooling catgeories | |
lowest <- which(rownames(dataprep.out$X0)=="school.illit") | |
highest <- which(rownames(dataprep.out$X0)=="school.high") | |
dataprep.out$X1[lowest:highest,] <- | |
(100 * dataprep.out$X1[lowest:highest,]) / | |
sum(dataprep.out$X1[lowest:highest,]) | |
dataprep.out$X0[lowest:highest,] <- | |
100 * scale(dataprep.out$X0[lowest:highest,], | |
center=FALSE, | |
scale=colSums(dataprep.out$X0[lowest:highest,]) | |
) | |
# run synth | |
synth.out <- synth(data.prep.obj = dataprep.out) | |
# Get result tables | |
synth.tables <- synth.tab( | |
dataprep.res = dataprep.out, | |
synth.res = synth.out | |
) | |
# results tables: | |
print(synth.tables) | |
# plot results: | |
# path | |
path.plot(synth.res = synth.out, | |
dataprep.res = dataprep.out, | |
Ylab = c("real per-capita GDP (1986 USD, thousand)"), | |
Xlab = c("year"), | |
Ylim = c(0,13), | |
Legend = c("Baleares","synthetic Baleares"), | |
) | |
abline(v = 1970, lty= "dotted", col = "red") | |
## gaps | |
gaps.plot(synth.res = synth.out, | |
dataprep.res = dataprep.out, | |
Ylab = c("gap in real per-capita GDP (1986 USD, thousand)"), | |
Xlab = c("year"), | |
Ylim = c(-1.5,10), | |
) | |
abline(v = 1970, lty= "dotted", col = "red") | |
## Summarizing Results ## | |
(1) Pre-treatment balance on Y is not ideal. Not the worst, not a fail, | |
but very far from ideal. Borderline acceptable, in terms of credibility. | |
(2) Balance on the Xs is poor (see table results, re education variables, | |
investment variables, industry-sectoral variables, etc.). This is a fail. | |
Standards not met. The data scientist is doing nothing wrong: the data simply | |
cannot support the inference. | |
(3) If I believed points 1 & 2 above were OK, then I would claim a positive | |
impact here. But points 1 and 2 above do not satisfy (esp point 2 re the Xs). | |
Therefore, we should not pay too much attention to the post-treatment gaps | |
in the gaps plot. If we really believe the Xs we have chosen are indeed the | |
important ones, then there is not much we can do about the balance problem. | |
We probably ought to conclude that this causal question for this treated unit | |
cannot be credibly answered using synthetic control methods. | |
using this method. | |
''' |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment