
Created August 3, 2016 03:39

abegong commented Aug 7, 2016

Jeff -
It's great to hear from you! Thanks for the thoughtful response—and for putting this issue on the map in such a powerful way. I’m a huge fan of the way you’re combining data science with journalism.

Thanks also for setting me straight on the v_decile_score issue. The conditional means plot tightens up quite a bit after matching the non-violent decile score with the non-violent outcomes data:

[conditional means plot image not preserved]
The thing I’m hung up on is the difference between statistical bias (as in E[Ŷ] ≠ E[Y]) and rates of false positives. As I read it, the strongest criticisms you’ve leveled against COMPAS are expressed in terms of sensitivity and specificity. Those are worth considering, but they’re neither necessary nor sufficient to conclude that the algorithm is biased.
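To make the distinction concrete, here is a toy numeric sketch (all counts invented for illustration, not taken from the COMPAS data) of how a score can look unbiased in the calibration sense, with equal positive predictive value in both groups, while the false positive rates still diverge because the groups have different base rates:

```python
# Hypothetical confusion-matrix counts for two groups of 1000 people each.
# Group A has a base rate of 0.50; group B has a base rate of 0.20.
# The numbers are contrived so that PPV is identical across groups.
groups = {
    "A": {"tp": 420, "fp": 280, "fn": 80, "tn": 220},
    "B": {"tp": 120, "fp": 80,  "fn": 80, "tn": 720},
}

for name, c in groups.items():
    # PPV: P(reoffends | flagged high risk) -- the calibration-style metric
    ppv = c["tp"] / (c["tp"] + c["fp"])
    # FPR: P(flagged high risk | does not reoffend) -- 1 minus specificity
    fpr = c["fp"] / (c["fp"] + c["tn"])
    print(f"group {name}: PPV={ppv:.2f}, FPR={fpr:.2f}")
# group A: PPV=0.60, FPR=0.56
# group B: PPV=0.60, FPR=0.10
```

Both groups have PPV = 0.60, yet group A's false positive rate is more than five times group B's. Equal error rates and calibration-style unbiasedness are simply different criteria, and with unequal base rates a score generally cannot satisfy both at once.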

For clarity: are you claiming that the algorithm is statistically biased against blacks? Or are you saying that it's unfair, even if it’s not biased in a statistical sense?

Cheers from the west coast,
-- Abe


Hi Abe,

You should take a closer look at our published articles.

Enjoy your Sunday!



abegong commented Aug 7, 2016

Jeff -
I've read each of your articles about COMPAS carefully, more than once. I've also worked my way through Northpointe's (very dense) response.

My read is that you're talking past each other. You're throwing around the word "bias," but your statistics only address false positive rates. Northpointe is arguing that there's no statistical bias, with supporting evidence.

When they include tables that show how the numbers play out in terms of false positives, you're picking them up and saying "See? Bias." (Your initial response above is a perfect example.) But that's not bias under the statistical definition.

That's why I'm asking the question: in all your writing, you've never once defined exactly what you mean by "bias."

If you're claiming that COMPAS is statistically biased against blacks, then your evidence needs to support that claim. From what I've seen, the evidence isn't there. But I'm willing to be proven wrong if you have evidence you haven't brought forward yet.

If you're using "bias" in a looser, non-statistical sense, that's okay too. (I don't love the overloading of a statistical term, but I can see why you'd go there.) At this point, you should just clear it up, so that the thousands of confused data scientists following this issue can get on with their lives.

What do you say? Will you clarify your terminology, so the conversation can move forward?

I'm not trying to be a jerk here; I'm just looking for clarity in an important public debate.

Hope you enjoy your Sunday, too!
-- Abe
