@abegong
Created August 3, 2016 03:39
@thejefflarson

thejefflarson commented Aug 6, 2016

Hi Abe,

Thanks for taking a look at our analysis. Unfortunately, you’ve misinterpreted how the test works. You are treating the decile scores as independent factors when they are ordinal. What this means in practice is that a user of the score is going to choose a threshold and defendants who score higher than that threshold will have more consequences from criminal justice agencies. In fact, COMPAS’s own guide says that defendants with scores higher than low “garner more interest from supervision agencies.”

In medicine, treating a threshold as a cutoff in this way is standard practice when analyzing prognostic models like this one.

Northpointe agrees with this interpretation, as you can see in the tradeoff tables they published in their report:

[Screenshots: two tradeoff tables from Northpointe's report]

As you can see, the false positive rate at thresholds greater than or equal to 5 (Northpointe's cutoff) is higher for black defendants.

In other words, the score overpredicts black defendants’ risk of recidivism. What that means in practice is that in Broward County, 805 black defendants were labelled as higher risk but did not recidivate.
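
To make the threshold reading concrete, here is a minimal Python sketch of a false positive rate computed at a cutoff, separately for each group. The records are made-up toy values, not the Broward County data, and the names `false_positive_rate` and `toy_groups` are illustrative assumptions.

```python
def false_positive_rate(records, threshold=5):
    """Share of non-recidivists whose decile score is >= threshold.

    records: iterable of (decile_score, recidivated) pairs.
    """
    negatives = [score for score, recidivated in records if not recidivated]
    if not negatives:
        raise ValueError("no non-recidivists in records")
    flagged = sum(1 for score in negatives if score >= threshold)
    return flagged / len(negatives)


# Toy data only (hypothetical): (decile_score, recidivated)
toy_groups = {
    "group_a": [(2, False), (6, False), (7, False), (3, False), (8, True), (9, True)],
    "group_b": [(1, False), (2, False), (6, False), (3, False), (7, True), (4, True)],
}

fpr_by_group = {name: false_positive_rate(rows) for name, rows in toy_groups.items()}
print(fpr_by_group)
```

Because the score is ordinal, the same function can be called with any other `threshold` to see how the rates move as the cutoff changes.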

You also might be interested to read our technical response to Northpointe’s critique, which includes a Cox regression model that controls for differences in age, sex, and criminal history across these groups and confirms our findings.

One other note: you’ve made a mistake in the fields you chose to analyze. You’ve taken a two-year sample of general recidivism, “compas-two-years.csv”, and used the violent recidivism score, v_decile_score. You should have used the decile_score column for general recidivism. If you wanted to analyze the violent recidivism score, you should have used the “compas-two-years-violent.csv” dataset instead.
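
The pairing described above can be written down as a small guard. This is only a sketch: the file and column names are the ones mentioned in this comment, while `SCORE_COLUMN_BY_DATASET` and `check_score_column` are hypothetical helper names.

```python
# File and column names as given in the comment above.
SCORE_COLUMN_BY_DATASET = {
    "compas-two-years.csv": "decile_score",            # general recidivism
    "compas-two-years-violent.csv": "v_decile_score",  # violent recidivism
}


def check_score_column(dataset, column):
    """Return the column if it matches the dataset, otherwise raise ValueError."""
    expected = SCORE_COLUMN_BY_DATASET[dataset]
    if column != expected:
        raise ValueError(
            f"{dataset} should be analyzed with {expected!r}, not {column!r}"
        )
    return column
```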

I hope you are enjoying this great summer weekend!

--Jeff

@abegong
Author

abegong commented Aug 7, 2016

Jeff -
It's great to hear from you! Thanks for the thoughtful response—and for putting this issue on the map in such a powerful way. I’m a huge fan of the way you’re combining data science with journalism.

Thanks also for setting me straight on the v_decile_score issue. The conditional means plot tightens up quite a bit after matching to the non-violent decile score with the non-violent outcomes data:

[Image: updated conditional means plot]

The thing I’m hung up on is the difference between statistical bias (as in E[Y] ≠ Ŷ) and rates of false positives. As I read it, the strongest criticisms you’ve leveled against COMPAS are expressed in terms of sensitivity and specificity. Those are worth considering, but they’re neither necessary nor sufficient to conclude that the algorithm is biased.
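
A toy illustration of how the two notions can come apart, using made-up counts rather than COMPAS data: both groups have the same observed recidivism rate within each score band, yet their false positive rates differ because the groups have different shares of high scores. The names `counts`, `conditional_means`, and `false_positive_rate` are assumptions for this sketch.

```python
from fractions import Fraction

# Toy counts only (hypothetical):
# counts[group][score_band] = (recidivated, did_not_recidivate)
counts = {
    "group_a": {"high": (80, 20), "low": (20, 80)},
    "group_b": {"high": (40, 10), "low": (30, 120)},
}


def conditional_means(bands):
    """Observed recidivism rate within each score band."""
    return {band: Fraction(pos, pos + neg) for band, (pos, neg) in bands.items()}


def false_positive_rate(bands):
    """Share of non-recidivists who were placed in the high band."""
    high_neg = bands["high"][1]
    all_neg = sum(neg for _, neg in bands.values())
    return Fraction(high_neg, all_neg)


for group, bands in counts.items():
    print(group, conditional_means(bands), false_positive_rate(bands))
```

This only shows the arithmetic; it says nothing about what the real data show.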

For clarity: are you claiming that the algorithm is statistically biased against blacks? Or are you saying that it's unfair, even if it’s not biased in a statistical sense?

Cheers from the west coast,
-- Abe

@thejefflarson

Hi Abe,

You should take a closer look at our published articles.

Enjoy your Sunday!

--Jeff

@abegong
Author

abegong commented Aug 7, 2016

Jeff -
I've read each of your articles about COMPAS carefully, more than once. I've also worked my way through Northpointe's (very dense) response.

My read is that you're talking past each other. You're throwing around the word "bias," but your stats only address false positives. Northpointe is arguing that there's no statistical bias, with supporting evidence.

When they include tables that show how the numbers play out in terms of false positives, you're picking them up and saying "See? Bias." (Your initial response above is a perfect example.) But that's not bias under the statistical definition.

That's why I'm asking the question: in all your writing, you've never once defined exactly what you mean by "bias."

If you're claiming that COMPAS is statistically biased against blacks, then your evidence needs to support that claim. From what I've seen, the evidence isn't there. But I'm willing to be proven wrong if you have proof you haven't brought forward yet.

If you're using "bias" in a less precise, non-statistical way, that's okay too. (I don't love the conflation of statistical terminology, but I can see why you'd go there.) At this point, you should just clear it up, so that the thousands of confused data scientists following this issue can get on with their lives.

What do you say? Will you clarify your terminology, so the conversation can move forward?

I'm not trying to be a jerk here---just looking for clarity in an important public debate.

Hope you enjoy your Sunday, too!
-- Abe
