
# EconometricsBySimulation/gist:fe40ec9f749213d1240011684001dda5 Last active Sep 5, 2019

2019_09_04-Assignment1.md

emailto: psteiner@umd.edu student: Francis Smart fsmart@gmail.com EDMS769G

``````
# Text Network graph
#           Student Ability _______________________
#            /           |                         \_____
#          /             |  Public/Private Transport     \
#         ↓              ↓     ↓                          ↘
# Pretest score ------→ Teacher ability ---------> Post test
#        ↑                ↑                         ↗
#         \               |                       /
#          \___________Student SES -____________/
``````

This is a simulation of the factors that might contribute to student performance on a pretest and a post-test. We would like to estimate a Value-Added measure of teacher ability so that we know whom to retain and whom to give additional training.

In principle this is a question

```julia
samplesize = 1000

studentability = randn(samplesize)
studentses     = randn(samplesize)

# Unobserved pretest factors
pretestunobserved = randn(samplesize)

# Pretest is a function of student ability, SES factors, and unobserved factors
pretest = 0.75 * studentability + 0.25 * studentses + 0.5 * pretestunobserved
```
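Because the draws above are unseeded, each run produces different data. The following is a minimal sketch, not part of the original assignment, of how the same pretest step could be made reproducible. The helper name `simulate_pretest` and the seed value are assumptions chosen for illustration.

```julia
using Random, Statistics

# Hypothetical helper (not in the original gist): draws the pretest inputs
# from an explicit random number generator so that runs can be repeated.
function simulate_pretest(rng::AbstractRNG, n::Integer)
    studentability    = randn(rng, n)
    studentses        = randn(rng, n)
    pretestunobserved = randn(rng, n)
    pretest = 0.75 .* studentability .+ 0.25 .* studentses .+ 0.5 .* pretestunobserved
    return (studentability = studentability, studentses = studentses,
            pretestunobserved = pretestunobserved, pretest = pretest)
end

# Two generators with the same (arbitrary) seed give identical data
sim_a = simulate_pretest(MersenneTwister(769), 1000)
sim_b = simulate_pretest(MersenneTwister(769), 1000)
same_draws = sim_a.pretest == sim_b.pretest

# With independent standard-normal inputs, the implied variance of pretest
# is the sum of squared weights: 0.75^2 + 0.25^2 + 0.5^2 = 0.875
implied_var = 0.75^2 + 0.25^2 + 0.5^2
sample_var  = var(sim_a.pretest)   # should land near implied_var
```

Comparing `sample_var` with `implied_var` is a quick check that the weights were entered as intended.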

Pretest scores, student ability, student SES, and access to public/private transportation, together with some random unobserved factors, determine the ability of the teacher a student is matched with.

```julia
transportation           = randn(samplesize)
# Now let's generate some teacher abilities; teacher ability is scaled from low to high
teacherabilityunobserved = randn(samplesize)

# Most factors are positively weighted: students who come from higher SES or
# have better transportation are associated with higher-performing teachers.
# However, in this simulation students with lower ability are matched with
# higher-ability teachers, reflecting an approach by the school to intervene
# in poor student performance.
# Note: teacherabilityunobserved appears twice below, so its effective weight is 2.
teacherability = teacherabilityunobserved - 0.5*studentability + 0.5*studentses -
    0.4*pretest + 0.2*transportation + teacherabilityunobserved

using Statistics

cor(studentability, teacherability)
# -0.3833970805817827

cor(pretest, teacherability)
# -0.3115985976394242

cor(studentses, teacherability)
# 0.15521043067699103
```

From the correlation statistics we can see that the simulation is working as expected, with high-performing students getting worse teachers.
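Since every input is an independent standard normal, the correlations the simulation should produce can be worked out from the weights alone. The sketch below does that arithmetic; the vector names and the helper `implied_cor` are illustrative assumptions, and the weight of 2 on the unobserved teacher factor reflects the fact that it is added twice in the expression above.

```julia
using LinearAlgebra

# Loadings on the independent standard-normal inputs, in the order
# (studentability, studentses, pretestunobserved, transportation,
#  teacherabilityunobserved)
ability_load = [1.0, 0.0, 0.0, 0.0, 0.0]
ses_load     = [0.0, 1.0, 0.0, 0.0, 0.0]
pretest_load = [0.75, 0.25, 0.5, 0.0, 0.0]

# Direct weights plus the indirect path through pretest (weight -0.4)
teacher_load = [-0.5, 0.5, 0.0, 0.2, 2.0] .- 0.4 .* pretest_load

# For linear combinations of independent unit-variance inputs, the
# correlation is the cosine of the angle between the loading vectors.
implied_cor(a, b) = dot(a, b) / (norm(a) * norm(b))

implied_cor(ability_load, teacher_load)   # about -0.36
implied_cor(pretest_load, teacher_load)   # about -0.29
implied_cor(ses_load, teacher_load)       # about  0.18
```

The sample values reported above sit close to these implied values, as one would expect with 1000 observations.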

The next step is generating performance scores for the end of year exam.

```julia
postexamerror = randn(samplesize)

posttext = 0.75*studentability + 0.25 * studentses + 0.5 * teacherability + 0.5*postexamerror
```

Finally, we would like to estimate the value added by the teachers. One method of doing this would be to take the post-test score (`posttext`) and subtract out the pretest score (`pretest`).

`Δscores = posttext - pretestunobserved`

We might want to ask some basic questions, such as: are `Δscores` correlated with `teacherability`?

```julia
cor(Δscores, teacherability)
# 0.5713281702441514
```
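The change score in the code above subtracts `pretestunobserved`, whereas the text describes subtracting the pretest. The following sketch re-creates the data-generating process with an arbitrary seed and a larger sample (both are assumptions made to reduce noise, not settings from the assignment) and computes the correlation under each definition. The names `Δ_as_coded` and `Δ_pretest` are introduced here for illustration only.

```julia
using Random, Statistics

rng = MersenneTwister(2019)   # arbitrary seed, used only for reproducibility
n   = 100_000                 # larger than the gist's 1000 to reduce sampling noise

studentability           = randn(rng, n)
studentses               = randn(rng, n)
pretestunobserved        = randn(rng, n)
transportation           = randn(rng, n)
teacherabilityunobserved = randn(rng, n)
postexamerror            = randn(rng, n)

pretest = 0.75 .* studentability .+ 0.25 .* studentses .+ 0.5 .* pretestunobserved

# Same weights as the gist; the unobserved teacher factor enters twice there
teacherability = 2 .* teacherabilityunobserved .- 0.5 .* studentability .+
                 0.5 .* studentses .- 0.4 .* pretest .+ 0.2 .* transportation

posttest = 0.75 .* studentability .+ 0.25 .* studentses .+
           0.5 .* teacherability .+ 0.5 .* postexamerror

Δ_as_coded = posttest .- pretestunobserved   # what the gist's code computes
Δ_pretest  = posttest .- pretest             # what the text describes

cor_as_coded = cor(Δ_as_coded, teacherability)
cor_pretest  = cor(Δ_pretest, teacherability)
```

Under these assumptions `cor_as_coded` should come out close to the value reported above, while `cor_pretest` should be noticeably higher, because subtracting the full pretest also removes part of the student ability and SES components that the two tests share.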

In this case they do appear to be positively correlated, which is what we would like to see.

If we were to rank teachers and rank the change in scores, how well would we do?

```julia
using StatsBase
corspearman(Δscores, teacherability)
# 0.5516197316197317
```

We retain about the same correlation. This should not surprise us as we have no ceiling or floor effects built into our data generating process.
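To make the link between the two statistics concrete: Spearman's correlation is Pearson's correlation applied to the ranks of the data. The sketch below checks this on toy data (the variables `x` and `y` and the seed are assumptions, not the assignment's variables), using `tiedrank` from StatsBase.

```julia
using Random, Statistics, StatsBase

rng = MersenneTwister(1)            # arbitrary seed for reproducibility
x = randn(rng, 500)
y = 0.6 .* x .+ randn(rng, 500)     # toy data with a positive linear relation

# Spearman computed directly, and as Pearson on the ranks
spearman_direct  = corspearman(x, y)
spearman_by_rank = cor(tiedrank(x), tiedrank(y))

spearman_direct ≈ spearman_by_rank
```

When the relation is linear and the noise is normal, as it is here, ranking the data changes little, which is why the two correlations stay close.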

We might want to ask questions of our data. What is the likelihood that a teacher classified in the bottom 10% of our data actually is in the bottom 10%?

```julia
teacherrank = denserank(teacherability)/samplesize
Δscorerank  = denserank(Δscores)/samplesize

teacherrank_bottom10 = teacherrank .<= 0.1
Δscorerank_bottom10  = Δscorerank  .<= 0.1
```
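Dividing a dense rank by the sample size gives a percentile only when there are no ties, because `denserank` assigns consecutive ranks and its maximum equals the number of distinct values. With continuous draws like these that is not a problem, but a quantile-based flag is one possible alternative that does not rely on it. The sketch below compares the two on toy data; the helper `bottom_share` and the seed are assumptions for illustration.

```julia
using Random, Statistics, StatsBase

rng = MersenneTwister(10)     # arbitrary seed for reproducibility
n = 1000
scores = randn(rng, n)        # toy continuous scores, so ties are not expected

# Rank-based flag, as in the code above
flag_by_rank = denserank(scores) ./ n .<= 0.1

# Hypothetical helper: flag values at or below the p-th sample quantile
bottom_share(x, p) = x .<= quantile(x, p)
flag_by_quantile = bottom_share(scores, 0.1)

n_rank     = sum(flag_by_rank)
n_quantile = sum(flag_by_quantile)
same_flags = flag_by_rank == flag_by_quantile
```

On tie-free data both approaches flag the same 100 observations out of 1000.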

Given that a teacher is actually ranked in the bottom 10%, what is the likelihood that the change score will correctly classify that teacher?

```julia
mean(Δscorerank_bottom10[teacherrank_bottom10 .== 1])
# 0.36
```

36% is not so good.
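The calculation above is a conditional share: among the teachers who truly belong to a group, the fraction that the change score also places there. A small sketch with made-up flags shows the mechanics; the helper `hit_rate` and the toy vectors are assumptions for illustration.

```julia
using Statistics

# Hypothetical helper: share of `predicted` flags among the rows where `actual` is true
hit_rate(predicted, actual) = mean(predicted[actual])

# Toy flags for six teachers (invented for illustration)
actual    = [true, true, false, false, false, false]
predicted = [true, false, true, false, false, false]

hit_rate(predicted, actual)   # 0.5: one of the two truly flagged teachers is caught
hit_rate(actual, predicted)   # 0.5: one of the two flagged teachers is truly in the group
```

When both flags mark the same number of teachers, as with two bottom-10% cutoffs on tie-free data, the two directions give the same number, since each equals the size of the overlap divided by the size of the group.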

```julia
mean(teacherrank_bottom10[Δscorerank_bottom10 .== 1])