@shlomihod
Last active February 26, 2024 19:44
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "7xPuF47f-ECM"
},
"source": [
"![banner](https://learn.responsibly.ai/assets/banner.jpg)\n",
"\n",
"# Class 3 - Discrimination & Fairness: Analysis of Unfairness Metrics\n",
"\n",
"https://learn.responsibly.ai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "l9GKSh9nOnR-"
},
"source": [
"## General Instructions\n",
"\n",
"1. Start by running all cells in the notebook (Runtime > Run all).\n",
"2. Most of this notebook will be familiar from the pre-class task.\n",
"3. Do not spend time understanding the details of the code. Focus on the text, comments, and outputs of the cells."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GDLFwEBGr3qD"
},
"source": [
"## 1. Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RnBxHABh_i3Q"
},
"outputs": [],
"source": [
"!wget http://stash.responsibly.ai/3-fairness/activity/data-all.zip -O data-all.zip -q\n",
"!unzip -oq data-all.zip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "O-xNKB34U-UB"
},
"outputs": [],
"source": [
"!wget http://stash.responsibly.ai/3-fairness/activity/data-revisited.zip -O data-revisited.zip -q\n",
"!unzip -oq data-revisited.zip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "UhrEtmtWOnSA",
"inputHidden": false,
"outputHidden": false,
"outputId": "f70157b4-c91d-4830-f703-e2b9b2d27a60"
},
"outputs": [],
"source": [
"%pip install -qqq git+https://github.com/ResponsiblyAI/railib.git"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GhwLACUF53TE"
},
"outputs": [],
"source": [
"import pandas as pd\n",
"from railib.fairness.first import (sampled,\n",
" train,\n",
" accuracy_score,\n",
" predict,\n",
" create_unfairness_metrics_df,\n",
" plt_unfairness_metrics,\n",
" unfairness_metrics_df_gap,\n",
" plot_labeled_regression)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vO1b2cRhr_jF"
},
"source": [
"## 2. Dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "57nnfiuzgEyu"
},
"outputs": [],
"source": [
"train_df = pd.read_csv('./data-all/train.csv')\n",
"test_df = pd.read_csv('./data-all/test.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nr0tl8qwse6a"
},
"source": [
"The training dataset and the test dataset each consist of multiple rows, one for each person, and three columns:\n",
"\n",
"1. `bio` - The biography of each person as text (i.e., `string`). This is the input to the model.\n",
"1. `occupation` - The occupation of each person as text (i.e., `string`). This is the model's output.\n",
"1. `gender` - The gender of each person (`'M'` or `'F'`). **Note: This is a new column that you have not had in your pre-class task**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "s6wWJuLcr-Qo",
"outputId": "865f1e2f-d83d-405b-e3e6-96e9081be984"
},
"outputs": [],
"source": [
"train_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "OAVzyhBC9plG",
"outputId": "95578b44-1e3f-4221-80ba-788ff46ed1e0"
},
"outputs": [],
"source": [
"test_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1uPt7Rv6sho4"
},
"source": [
"We used a 75%-25% split between the training and the test datasets:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HaJpWBD_jQpE",
"outputId": "74ec46e3-8c47-41d4-bdfd-69e83532f5c7"
},
"outputs": [],
"source": [
"print(f'# train: {len(train_df)}')\n",
"print(f'# test: {len(test_df)}')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vlBUnwJcsvI4"
},
"source": [
"There are 28 occupations:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Vh6O8fYksvZc",
"outputId": "98c2f428-ce00-4ca8-b750-e6edc54f4ebb"
},
"outputs": [],
"source": [
"sorted(train_df['occupation'].unique())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "w4pVCAm0so-h"
},
"source": [
"Each run of the next cell samples 10 random rows and shows their occupation, gender, and biography:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "Kysov0YvgfyP",
"outputId": "e927b8a0-0081-4929-c885-8eed5b8ec844"
},
"outputs": [],
"source": [
"sampled(train_df, display_list=['occupation', 'gender'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rwAD1w0dtg7o"
},
"source": [
"## 3. Model\n",
"\n",
"We will use a Logistic Regression model with Bag of Words features."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EQ4cIjub61Hm"
},
"source": [
"### Training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NSEnEUyEOnSK",
"inputHidden": false,
"outputHidden": false
},
"outputs": [],
"source": [
"# count_vect: feature engineering (Bag of Words)\n",
"# model: Logistic Regression\n",
"# training might take a minute or two\n",
"count_vect, model = train(train_df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v-mR5ovfOnSK"
},
"source": [
"### Evaluation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "jyGebbhJOnSL",
"inputHidden": false,
"outputHidden": false,
"outputId": "c20ef879-9eb9-49aa-e4ed-d35868f04fb2"
},
"outputs": [],
"source": [
"print('Train Accuracy =', round(accuracy_score(train_df, model, count_vect),3))\n",
"\n",
"print('Test Accuracy =', round(accuracy_score(test_df, model, count_vect),3))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "T8eMod9wOnSL"
},
"source": [
"### `predict` function\n",
"\n",
"The function takes a list of bios (as text) and predicts their occupation."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gkjwOSSrOnSM"
},
"source": [
"Let's predict the occupation of two famous people - Albert Einstein (professor) and Wisława Szymborska (poet):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JWu_YYNjOnSM",
"inputHidden": false,
"outputHidden": false,
"outputId": "0f52f576-f495-4a03-aebb-9add8d4f7144"
},
"outputs": [],
"source": [
"einstein = \"\"\"He is known for developing the theory of relativity,\n",
"but he also made important contributions to the development\n",
"of the theory of quantum mechanics.\n",
"Relativity and quantum mechanics are together\n",
"the two pillars of modern physics.\n",
"His mass–energy equivalence formula E = mc^2,\n",
"which arises from relativity theory,\n",
"has been dubbed \\\"the world's most famous equation\\\"\"\"\"\n",
"\n",
"\n",
"szymborska = \"\"\"She was a Polish poet, essayist, translator and recipient\n",
"of the 1996 Nobel Prize in Literature.\n",
"Born in Prowent, which has since become part of Kórnik,\n",
"she later resided in Kraków until the end of her life.\n",
"In Poland, her books have reached sales rivaling prominent prose authors',\n",
"though she wrote in a poem, \"Some Like Poetry\" (\"Niektórzy lubią poezję\"),\n",
"that \"perhaps\" two in a thousand people like poetry.\"\"\"\n",
"\n",
"\n",
"print('Prediction:', predict([einstein, szymborska], model, count_vect))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PHrZiVSkOnSO"
},
"source": [
"Apply the model to generate predictions for the train and test datasets:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8pGrcdvwOnSO",
"inputHidden": false,
"outputHidden": false
},
"outputs": [],
"source": [
"train_df['prediction'] = predict(train_df['bio'], model, count_vect)\n",
"test_df['prediction'] = predict(test_df['bio'], model, count_vect)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uhNW561HOnSP"
},
"source": [
"### Prediction demonstration on the train dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "xVDSLd2vOnSP",
"inputHidden": false,
"outputHidden": false,
"outputId": "3ca45b1e-e38a-4a6b-e5a4-c43f3038d640"
},
"outputs": [],
"source": [
"sampled(train_df, display_list=['occupation', 'prediction', 'gender'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdHvqizWmV-K"
},
"source": [
"## 4. Evaluation of a Single Occupation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AoLgYDmRmV-M"
},
"source": [
"Let's focus on a single occupation: **poet**. We could evaluate the classifier on the test dataset with respect to this label only.\n",
"\n",
"Read this section carefully; it is extremely important."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f7ud3_BMmV-N"
},
"source": [
"### Metric I: Acceptance Rate\n",
"\n",
"![AR.png]()\n",
"\n",
"> What is the proportion of bios that were predicted as `poet`?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "fCWQO0zsmV-N",
"outputId": "9700002b-4968-4199-f970-31f5d5a736cb"
},
"outputs": [],
"source": [
"print('Poet - Acceptance Rate', (test_df['prediction'] == 'poet').mean())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mUFyUeUdmV-N"
},
"source": [
"### Metric II: False Negative Rate\n",
"\n",
"![FNR.png]()\n",
"\n",
"> What is the proportion of bios that were predicted incorrectly as NOT `poet` (i.e., something else) even though their true label is actually `poet`?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8GljHos7mV-O",
"outputId": "3042f0d0-def7-474f-82b3-afd052143af9"
},
"outputs": [],
"source": [
"actual_poets_test_df = test_df[test_df['occupation'] == 'poet']\n",
"print('Poet - False Negative Rate', (actual_poets_test_df['prediction'] != 'poet').mean())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YpB6kAr9mV-O"
},
"source": [
"### Metric III: False Positive Rate\n",
"\n",
"![FPR.png]()\n",
"\n",
"> What is the proportion of bios that were predicted incorrectly as `poet` even though their true label is NOT `poet` (i.e., something else)?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HGJbzH2YmV-O",
"outputId": "7ece5856-ad58-4247-ed90-a4033f895b7b"
},
"outputs": [],
"source": [
"actual_not_poets_test_df = test_df[test_df['occupation'] != 'poet']\n",
"print('Poet - False Positive Rate', (actual_not_poets_test_df['prediction'] == 'poet').mean())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rD6NPUqsmV-P"
},
"source": [
"### Fancy Way to Summarize\n",
"You can skip this part if you understand the definition of the three metrics.\n",
"\n",
"#### Confusion Matrix\n",
"\n",
"Actual class/Predicted class | P | N\n",
"-----------------------------|---|--------------\n",
"P | **TP** | FN\n",
"N | FP | **TN** \n",
"\n",
"\n",
"#### **Metric Definitions**\n",
"\n",
"\n",
"<u>**Acceptance Rate (Positive Rate)**</u>\n",
"\n",
"What is the proportion of bios that were predicted as occupation x?\n",
"\n",
"${\\displaystyle \\mathrm {AR} \n",
"= {\\frac{\\mathrm {TP + FP}}{\\mathrm {TP+FN+FP+TN}}}}$\n",
"\n",
"\n",
"<u>**False negative rate (FNR)**</u>\n",
"\n",
"What is the proportion of bios that were predicted incorrectly as NOT occupation x (i.e., as some y != x) even though their true label is actually x?\n",
"\n",
"${\\displaystyle \\mathrm {FNR} = {\\frac {\\mathrm {FN} }{\\mathrm {FN} +\\mathrm {TP} }}}$\n",
"\n",
"<u>**False Positive Rate (FPR)**</u>\n",
"\n",
"What is the proportion of bios that were predicted incorrectly as occupation x even though their true label is NOT x (i.e., it is some y != x)?\n",
"\n",
"${\\displaystyle \\mathrm {FPR} = {\\frac {\\mathrm {FP} }{\\mathrm {FP} +\\mathrm {TN} }}}$"
]
},
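{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three definitions above can also be checked directly from confusion-matrix counts. The next cell is a minimal sketch, not part of `railib`: the helper `rates_from_counts` and the counts passed to it are hypothetical and only illustrate the formulas."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch (not from railib): compute AR, FNR, and FPR from confusion-matrix counts.\n",
"def rates_from_counts(tp, fn, fp, tn):\n",
"    total = tp + fn + fp + tn\n",
"    return {\n",
"        'ar': (tp + fp) / total,   # share of all bios predicted as the occupation\n",
"        'fnr': fn / (fn + tp),     # share of actual positives that were missed\n",
"        'fpr': fp / (fp + tn),     # share of actual negatives predicted as positive\n",
"    }\n",
"\n",
"# Hypothetical counts, chosen only to make the arithmetic easy to follow.\n",
"rates_from_counts(tp=30, fn=10, fp=20, tn=140)"
]
},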
{
"cell_type": "markdown",
"metadata": {
"id": "dqHurjkrOnSP"
},
"source": [
"## 4. Unfairness Metric Results by Gender\n",
"\n",
"Thanks to the additional column `gender`, we can compare the performance of the model separately on the female individuals and the male individuals in the test dataset."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9oLs9dkXOnSP"
},
"source": [
"We can compute the following evaluation metrics for each gender in the test dataset:\n",
"\n",
"1. `ar` - **Acceptance Rate** - used for Demographic Parity\n",
"2. `fnr` - **False Negative Rate** - used for Equalized Odds\n",
"3. `fpr` - **False Positive Rate** - used for Equalized Odds"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xD3Rq1-_1mPq"
},
"source": [
"We can have a quick look at the first few occupations with all metrics:\n",
"\n",
"(have a <u>quick look and continue</u> to the plot below)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 390
},
"id": "_9dNk1qg1mYp",
"outputId": "8aa96794-c43b-4a3c-b5cb-a9d4f3727cce"
},
"outputs": [],
"source": [
"unfairness_metrics_df = create_unfairness_metrics_df(test_df)\n",
"unfairness_metrics_df.round(2).head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oEZRZ3z9Ph6E"
},
"source": [
"It is a bit difficult to analyze this table, so let's create a visualization. The following plot shows the value for females (blue) and males (orange) of each metric across occupations. The occupations are ordered according to the Acceptance Rate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 994
},
"id": "taXRXTYr9S_Z",
"outputId": "6586e988-c63b-4cb4-962f-66bab3d04093"
},
"outputs": [],
"source": [
"plt_unfairness_metrics(test_df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "x-LqLoXE1GqB"
},
"source": [
"**For reference only**, we also calculate the **gap** (**difference**) between females and males for each of the three metrics. A positive value means that the metric is larger for females, and a negative value means that the metric is larger for males.\n",
"\n",
"The visualization above should be sufficient for your analysis, but we include this table in case you would like to find out the exact difference for a specific occupation/metric."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 961
},
"id": "xbsHW-Ct1G4y",
"outputId": "b5859fae-ad71-4fb7-b55b-9daffbb5bab8"
},
"outputs": [],
"source": [
"gap_unfairness_metrics_df = unfairness_metrics_df_gap(unfairness_metrics_df).reset_index()\n",
"gap_unfairness_metrics_df"
]
},
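{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the gap concrete, the next cell sketches how a per-gender False Negative Rate and its gap could be computed for one occupation. This is an assumption-laden illustration, not the implementation inside `railib`: `toy_df` is hypothetical data that only borrows the column names of `test_df`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch (not railib's implementation): per-gender FNR and its gap for one occupation.\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with the same column names as test_df.\n",
"toy_df = pd.DataFrame({\n",
"    'occupation': ['poet', 'poet', 'poet', 'poet', 'nurse', 'nurse'],\n",
"    'prediction': ['poet', 'nurse', 'poet', 'poet', 'nurse', 'poet'],\n",
"    'gender': ['F', 'F', 'M', 'M', 'F', 'M'],\n",
"})\n",
"\n",
"# FNR per gender: among actual poets, the share NOT predicted as poet.\n",
"actual_poets = toy_df[toy_df['occupation'] == 'poet']\n",
"fnr_by_gender = (actual_poets['prediction'] != 'poet').groupby(actual_poets['gender']).mean()\n",
"\n",
"# Gap as female minus male, so a positive value means the metric is larger for females.\n",
"fnr_gap = fnr_by_gender['F'] - fnr_by_gender['M']\n",
"print(fnr_by_gender.to_dict(), fnr_gap)"
]
},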
{
"cell_type": "markdown",
"metadata": {
"id": "YriJqk8lWobe"
},
"source": [
"## Unfairness Metrics\n",
"\n",
"1. Independence\n",
"\n",
"**AR**\n",
"\n",
"> What is the proportion of bios that were predicted as `physician`?\n",
"\n",
"2. Separation (Errors)\n",
"\n",
"**FNR**\n",
"\n",
"> What is the proportion of bios that were predicted incorrectly as NOT `physician` (i.e., something else) even though their true label is actually `physician`?\n",
"\n",
"**FPR**\n",
"\n",
"> What is the proportion of bios that were predicted incorrectly as `nurse` even though their true label is NOT `nurse` (i.e., something else)?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xlJWj_1TWqtP"
},
"source": [
"## What is the cause for the bias?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 801
},
"id": "GpF_kJoq7zd4",
"outputId": "16ef603a-a74e-48ce-9928-733d3e1f3099"
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.rcParams[\"figure.figsize\"] = (10, 10)\n",
"\n",
"gap_unfairness_metrics_df.sort_values(by='fnr_gap').plot.barh(x='occupation', y='fnr_gap')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dWHexocK8q9-"
},
"source": [
"## 5. Unfairness Metrics vs. Training Dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 460
},
"id": "TzWNvjlu83dk",
"outputId": "dd1c0c72-851f-4952-bc2d-ecaf6bce3381"
},
"outputs": [],
"source": [
"# add the female proportion of each occupation in the training dataset\n",
"# to our dataframe, matching rows by occupation name rather than by position\n",
"female_proportion = (train_df.groupby('occupation')['gender']\n",
"                     .value_counts(normalize=True)\n",
"                     [:, 'F']\n",
"                     * 100).round(2)\n",
"gap_unfairness_metrics_df['female_proportion'] = gap_unfairness_metrics_df['occupation'].map(female_proportion)\n",
"\n",
"plot_labeled_regression('female_proportion', 'fnr_gap', 'occupation', gap_unfairness_metrics_df);\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pcsJcc4jqHP2"
},
"source": [
"**A 10% difference in the female proportion translates into a 2% difference in the FNR gap**."
]
},
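{
"cell_type": "markdown",
"metadata": {},
"source": [
"A statement of this kind is presumably a reading of the slope of the fitted regression line. The next cell is a minimal sketch of that reading using `numpy.polyfit`; the points are hypothetical and lie exactly on a line with slope 0.2, so they stand in for the real data rather than reproduce it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: read the slope of a linear fit with NumPy.\n",
"import numpy as np\n",
"\n",
"# Hypothetical points on a line with slope 0.2 (not the notebook's real data).\n",
"x = np.array([10.0, 30.0, 50.0, 70.0, 90.0])  # female proportion, in %\n",
"y = 0.2 * x - 10.0                            # FNR gap, in %\n",
"\n",
"# A degree-1 fit returns the slope first, then the intercept.\n",
"slope, intercept = np.polyfit(x, y, deg=1)\n",
"print(f'A 10-point change in x shifts y by about {10 * slope:.1f} points')"
]
},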
{
"cell_type": "markdown",
"metadata": {
"id": "0OM2Utp599SL"
},
"source": [
"## 6. Fairness through unawareness"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rS3aYIVM85Bj"
},
"outputs": [],
"source": [
"!wget http://stash.responsibly.ai/3-fairness/activity/data-revisited.zip -O data-revisited.zip -q\n",
"!unzip -oq data-revisited.zip\n",
"\n",
"unawareness_train_df = pd.read_csv('./data-revisited/train_unawarness.csv')\n",
"unawareness_test_df = pd.read_csv('./data-revisited/test_unawarness.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "mGyz8nvx-D-0",
"outputId": "1068361f-f58a-4634-b20a-4b581404f8fc"
},
"outputs": [],
"source": [
"sampled(unawareness_train_df, [\"occupation\", \"gender\"], bio='scrubbing')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "t5ttpCOQ0Tgs"
},
"outputs": [],
"source": [
"unawareness_count_vect, unawareness_model = train(unawareness_train_df, X='scrubbing')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "kMnNV49R3Oby",
"outputId": "c036e11d-43dd-466e-d1f1-170ff90fced4"
},
"outputs": [],
"source": [
"print('Train Accuracy =', accuracy_score(train_df, model, count_vect))\n",
"print('Test Accuracy =', accuracy_score(test_df, model, count_vect))\n",
"\n",
"print('Unawareness Train Accuracy =', accuracy_score(unawareness_train_df, unawareness_model, unawareness_count_vect, 'scrubbing'))\n",
"print('Unawareness Test Accuracy =', accuracy_score(unawareness_test_df, unawareness_model, unawareness_count_vect, 'scrubbing'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "CRcEngXf6Gr9",
"outputId": "35df9285-2d8e-4339-9a97-7405f7d900a6"
},
"outputs": [],
"source": [
"unawareness_train_df['prediction'] = predict(unawareness_train_df['scrubbing'], unawareness_model, unawareness_count_vect)\n",
"unawareness_test_df['prediction'] = predict(unawareness_test_df['scrubbing'], unawareness_model, unawareness_count_vect)\n",
"\n",
"sampled(unawareness_train_df, [\"occupation\", 'prediction', \"gender\"], bio='scrubbing')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 994
},
"id": "N_7iDpJP8e5Q",
"outputId": "0ab91eda-a8ea-4609-880d-f9248ff9a40b"
},
"outputs": [],
"source": [
"plt_unfairness_metrics(unawareness_test_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "l3H1eYvU9A2S"
},
"outputs": [],
"source": [
"# plt_unfairness_metrics(test_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WD4oCSPU9T40"
},
"outputs": [],
"source": [
"# unfairness_metrics_df = unfairness_metrics_df(test_df)\n",
"unfairness_unawareness_metrics_df = create_unfairness_metrics_df(unawareness_test_df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "arLColN5Bfl_"
},
"source": [
"### Predicting gender from the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cxcCW32nBiDp"
},
"outputs": [],
"source": [
"gen_count_vect, gen_model = train(train_df, X='bio', y='gender')\n",
"gen_unawareness_count_vect, gen_unawareness_model = train(unawareness_train_df, X='scrubbing', y='gender')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pHXQVke6CIZK",
"outputId": "8c1b5337-1439-4284-fa53-7ee3e93f53d9"
},
"outputs": [],
"source": [
"print('Train Accuracy =', round(accuracy_score(train_df, gen_model, gen_count_vect, X='bio', y='gender'),4))\n",
"print('Test Accuracy =', round(accuracy_score(test_df, gen_model, gen_count_vect, X='bio', y='gender'),4))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "W2QSJBdpeLWP",
"outputId": "0ac36d3d-d0d0-4bc6-f087-c543a29f8ddb"
},
"outputs": [],
"source": [
"\n",
"print('WITHOUT gender markers Train Accuracy =', round(accuracy_score(unawareness_train_df, gen_unawareness_model, gen_unawareness_count_vect, X='scrubbing', y='gender'),3))\n",
"print('WITHOUT gender markers Test Accuracy =', round(accuracy_score(unawareness_test_df, gen_unawareness_model, gen_unawareness_count_vect, X='scrubbing', y='gender'),3))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UBYVLu5Ihapo"
},
"source": [
"## 7. Counterfactual"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "wkfXeoTjDOq2"
},
"outputs": [],
"source": [
"test_df_CF = pd.read_csv('./data-revisited/test_counterfactual.csv')\n",
"CF = test_df_CF.loc[[53044,70661]]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 111
},
"id": "3jfNF9w3mxvw",
"outputId": "61faa659-abed-4767-d892-7656aeb278e1"
},
"outputs": [],
"source": [
"CF.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 526
},
"id": "9Q5Xz1wIh1QO",
"outputId": "7222bcf4-915e-48ba-9daa-306795b826c2"
},
"outputs": [],
"source": [
"sampled(CF, [\"occupation\", \"gender\"], samples=2, Counterfactual=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9vEJPybNcDl5"
},
"source": [
"## 8. Technical Fairness Interventions\n",
"\n",
"1. Pre-processing: Training data.\n",
"\n",
"2. In-training: Learning algorithm.\n",
"\n",
"3. Post-processing: Model."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-2QIv2ficf12"
},
"source": [
"## 8. Fairness is a complex concept; think about metrics as \"flags of unfairness\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5dq-DfKDcqpz"
},
"source": [
"## 9. Abstraction Error: data is not fixed, the model is not fixed - a computational system is built by many decisions; a system is also deployed in a context"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"provenance": []
},
"kernel_info": {
"name": "python3"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
},
"nteract": {
"version": "0.12.3"
},
"vscode": {
"interpreter": {
"hash": "55bbdba5d2159c30191d9b81156a2ec7ece345201aa1fcd9b85bbc484276dddb"
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}