For context on this document, see the fairness reading group summary.
The first component of fairness metrics functionality, assessment, is
complete and has been merged into the main dev versions of the relevant
tidymodels packages:
- `compute_metrics()` in tune: tidymodels/tune#663
- Fairness metrics in yardstick: https://github.com/tidymodels/yardstick/pulls
- Backend support for new yardstick metrics in tune: tidymodels/tune#684
Getting this functionality on CRAN will require yardstick and tune releases.
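
For a sense of what the assessment functionality looks like in use, here is a minimal sketch. It assumes an installed yardstick that includes the fairness metrics (the dev version at the time of writing). The `preds` data frame and its `group` column are made up for illustration and are not taken from any of the analyses listed below.

```r
library(yardstick)

# Toy predictions from a hypothetical binary classifier. `group` stands in
# for a sensitive attribute; every value here is invented for illustration.
preds <- data.frame(
  truth = factor(
    c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
    levels = c("yes", "no")
  ),
  estimate = factor(
    c("yes", "yes", "yes", "no", "no", "no", "yes", "no"),
    levels = c("yes", "no")
  ),
  group = c("a", "a", "a", "a", "b", "b", "b", "b")
)

# `demographic_parity()` takes the grouping column and returns a metric
# function, which can then be placed in a metric set like any other metric.
fair_metrics <- metric_set(demographic_parity(group))

# Compares the rate of predicted "yes" across the levels of `group`.
res <- fair_metrics(preds, truth = truth, estimate = estimate)
res
```

The same pattern should apply to the other group-wise metrics in yardstick, such as `equal_opportunity()` and `equalized_odds()`. With tune, a metric set like this one can be supplied when resampling or tuning a model.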
This functionality should be accompanied by long-form documentation demonstrating the approaches to critique noted in the group summary:
Content | Packages | Scope | Location | Status |
---|---|---|---|---|
Group-wise metrics | yardstick | Concepts only | yardstick vignette | Merged, available on the dev tidymodels website |
Applied analysis (Are GPT detectors fair?) | yardstick | Assessment only | tidymodels.org | Pull request made, draft available on a deploy preview |
Applied analysis (Hospital readmission) | yardstick + tune | Assessment only | tidymodels.org | Pull request made, draft available on a deploy preview |
Fair machine learning with tidymodels | yardstick | Concepts only, focused on the social aspects | tidyverse blog | Not yet drafted, but can be based on the reading group summary |
Implementing mitigation would require another round of development. The approach would follow Algorithm 1 in “A Reductions Approach to Fair Classification” (2018), which is described at a higher level on page 5 of a white paper from the authors of Fairlearn. Given how that functionality is described in the white paper and in a follow-up analysis, and how much more of a maintenance burden it would entail than assessment, it may be worth starting a conversation with the Fairlearn maintainers to learn from their experience.
If we do implement this functionality, we envision that it will
likely take the form of another tidymodels package (or just a stacks release,
if there’s enough shared functionality). Taking a workflow set trained with
`save_pred = TRUE` as input, the algorithm would return a
weighted ensemble of the workflows in that set.
Once we’ve implemented mitigation, additional documentation could look like:
Content | Packages | Scope | Location | Status |
---|---|---|---|---|
Applied analysis | yardstick + tune + mitigation | Assessment and mitigation | New package vignette | Not yet drafted |