@shlomihod
Last active February 26, 2024 19:44
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "7xPuF47f-ECM"
},
"source": [
"![banner](https://learn.responsibly.ai/assets/banner.jpg)\n",
"\n",
"# Class 3 - Discrimination & Fairness: Analysis of Unfairness Metrics\n",
"\n",
"https://learn.responsibly.ai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "l9GKSh9nOnR-"
},
"source": [
"## General Instructions\n",
"\n",
"1. Start by running all cells in the notebook (Runtime > Run all).\n",
"2. Most of this notebook will be familiar from the pre-class task.\n",
"3. Do not spend time understanding the details of the code. Focus on the text, comments, and outputs of cells."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GDLFwEBGr3qD"
},
"source": [
"## 1. Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RnBxHABh_i3Q"
},
"outputs": [],
"source": [
"!wget http://stash.responsibly.ai/3-fairness/activity/data-all.zip -O data-all.zip -q\n",
"!unzip -oq data-all.zip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "O-xNKB34U-UB"
},
"outputs": [],
"source": [
"!wget http://stash.responsibly.ai/3-fairness/activity/data-revisited.zip -O data-revisited.zip -q\n",
"!unzip -oq data-revisited.zip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "UhrEtmtWOnSA",
"inputHidden": false,
"outputHidden": false,
"outputId": "f70157b4-c91d-4830-f703-e2b9b2d27a60"
},
"outputs": [],
"source": [
"%pip install -qqq git+https://github.com/ResponsiblyAI/railib.git"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GhwLACUF53TE"
},
"outputs": [],
"source": [
"import pandas as pd\n",
"from railib.fairness.first import (sampled,\n",
" train,\n",
" accuracy_score,\n",
" predict,\n",
" create_unfairness_metrics_df,\n",
" plt_unfairness_metrics,\n",
" unfairness_metrics_df_gap,\n",
" plot_labeled_regression)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vO1b2cRhr_jF"
},
"source": [
"## 2. Dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "57nnfiuzgEyu"
},
"outputs": [],
"source": [
"train_df = pd.read_csv('./data-all/train.csv')\n",
"test_df = pd.read_csv('./data-all/test.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nr0tl8qwse6a"
},
"source": [
"The training and test datasets each consist of multiple rows, one per person, and three columns:\n",
"\n",
"1. `bio` - The biography as text (i.e., `string`). This is the input to the model.\n",
"1. `occupation` - The occupation of the person as text (i.e., `string`). This is the model's output (the label it predicts).\n",
"1. `gender` - The gender of the person (`'M'` or `'F'`). **Note: This is a new column that you did not have in the pre-class task**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "s6wWJuLcr-Qo",
"outputId": "865f1e2f-d83d-405b-e3e6-96e9081be984"
},
"outputs": [],
"source": [
"train_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "OAVzyhBC9plG",
"outputId": "95578b44-1e3f-4221-80ba-788ff46ed1e0"
},
"outputs": [],
"source": [
"test_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1uPt7Rv6sho4"
},
"source": [
"We used a 75%-25% split between the training and test datasets:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HaJpWBD_jQpE",
"outputId": "74ec46e3-8c47-41d4-bdfd-69e83532f5c7"
},
"outputs": [],
"source": [
"print(f'# train: {len(train_df)}')\n",
"print(f'# test: {len(test_df)}')"
]
},
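{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook does not say how the two CSV files were created. As a minimal sketch, assuming scikit-learn is installed, `train_test_split` with `test_size=0.25` produces the same 75%-25% proportions on a small made-up DataFrame. The names `toy_people_df`, `toy_train_df`, and `toy_test_df` are hypothetical and are not used elsewhere in this notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Made-up data for illustration only (not the course dataset)\n",
"toy_people_df = pd.DataFrame({\n",
"    'bio': [f'bio {i}' for i in range(8)],\n",
"    'occupation': ['poet', 'nurse'] * 4,\n",
"})\n",
"\n",
"# Hold out 25% of the rows for testing; random_state makes the split repeatable\n",
"toy_train_df, toy_test_df = train_test_split(toy_people_df, test_size=0.25, random_state=0)\n",
"\n",
"print(f'# toy train: {len(toy_train_df)}')\n",
"print(f'# toy test: {len(toy_test_df)}')"
]
},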
{
"cell_type": "markdown",
"metadata": {
"id": "vlBUnwJcsvI4"
},
"source": [
"There are 28 occupations:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Vh6O8fYksvZc",
"outputId": "98c2f428-ce00-4ca8-b750-e6edc54f4ebb"
},
"outputs": [],
"source": [
"sorted(train_df['occupation'].unique())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "w4pVCAm0so-h"
},
"source": [
"Each run of the next cell samples 10 random rows and shows their occupation, gender, and biography:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "Kysov0YvgfyP",
"outputId": "e927b8a0-0081-4929-c885-8eed5b8ec844"
},
"outputs": [],
"source": [
"sampled(train_df, display_list=['occupation', 'gender'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rwAD1w0dtg7o"
},
"source": [
"## 3. Model\n",
"\n",
"We will use a Logistic Regression model with Bag of Words features."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EQ4cIjub61Hm"
},
"source": [
"### Training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NSEnEUyEOnSK",
"inputHidden": false,
"outputHidden": false
},
"outputs": [],
"source": [
"# count vec is feature engineering: Bag of Words\n",
"# model: Logistic Regression\n",
"# might take a minute or two to run\n",
"count_vect, model = train(train_df)"
]
},
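{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `train` helper hides the details. As a rough sketch only, and not necessarily what `railib` does internally, a Bag of Words + Logistic Regression classifier can be built with scikit-learn's `CountVectorizer` and `LogisticRegression`. The bios, labels, and all `toy_*` names below are made up for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import CountVectorizer\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# Made-up training data (assumption: not taken from the course dataset)\n",
"toy_bios = [\n",
"    'she writes poems and publishes poetry collections',\n",
"    'he wrote a poem about the sea',\n",
"    'she treats patients at the city hospital',\n",
"    'he cares for patients in the clinic',\n",
"]\n",
"toy_occupations = ['poet', 'poet', 'nurse', 'nurse']\n",
"\n",
"# Bag of Words: each bio becomes a vector of word counts\n",
"toy_count_vect = CountVectorizer()\n",
"toy_features = toy_count_vect.fit_transform(toy_bios)\n",
"\n",
"# Logistic Regression learns one weight per word for each occupation\n",
"toy_model = LogisticRegression(max_iter=1000)\n",
"toy_model.fit(toy_features, toy_occupations)\n",
"\n",
"# New text must go through the same fitted vectorizer before prediction\n",
"toy_new_bio = ['she reads her poetry to patients']\n",
"toy_prediction = toy_model.predict(toy_count_vect.transform(toy_new_bio))\n",
"print('Toy prediction:', toy_prediction)"
]
},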
{
"cell_type": "markdown",
"metadata": {
"id": "v-mR5ovfOnSK"
},
"source": [
"### Evaluation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "jyGebbhJOnSL",
"inputHidden": false,
"outputHidden": false,
"outputId": "c20ef879-9eb9-49aa-e4ed-d35868f04fb2"
},
"outputs": [],
"source": [
"print('Train Accuracy =', round(accuracy_score(train_df, model, count_vect),3))\n",
"\n",
"print('Test Accuracy =', round(accuracy_score(test_df, model, count_vect),3))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "T8eMod9wOnSL"
},
"source": [
"### `predict` function\n",
"\n",
"The function takes a list of bios (as text) and predicts their occupation."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gkjwOSSrOnSM"
},
"source": [
"Let's predict the occupation of two famous people - Albert Einstein (professor) and Wisława Szymborska (poet):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JWu_YYNjOnSM",
"inputHidden": false,
"outputHidden": false,
"outputId": "0f52f576-f495-4a03-aebb-9add8d4f7144"
},
"outputs": [],
"source": [
"einstein = \"\"\"He is known for developing the theory of relativity,\n",
"but he also made important contributions to the development\n",
"of the theory of quantum mechanics.\n",
"Relativity and quantum mechanics are together\n",
"the two pillars of modern physics.\n",
"His mass–energy equivalence formula E = mc^2,\n",
"which arises from relativity theory,\n",
"has been dubbed \\\"the world's most famous equation\\\"\"\"\"\n",
"\n",
"\n",
"szymborska = \"\"\"She was a Polish poet, essayist, translator and recipient\n",
"of the 1996 Nobel Prize in Literature.\n",
"Born in Prowent, which has since become part of Kórnik,\n",
"she later resided in Kraków until the end of her life.\n",
"In Poland, her books have reached sales rivaling prominent prose authors',\n",
"though she wrote in a poem, \"Some Like Poetry\" (\"Niektórzy lubią poezję\"),\n",
"that \"perhaps\" two in a thousand people like poetry.\"\"\"\n",
"\n",
"\n",
"print('Prediction:', predict([einstein, szymborska], model, count_vect))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PHrZiVSkOnSO"
},
"source": [
"Apply the model to generate predictions for the train and test datasets:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8pGrcdvwOnSO",
"inputHidden": false,
"outputHidden": false
},
"outputs": [],
"source": [
"train_df['prediction'] = predict(train_df['bio'], model, count_vect)\n",
"test_df['prediction'] = predict(test_df['bio'], model, count_vect)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uhNW561HOnSP"
},
"source": [
"### Prediction demonstration on the train dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "xVDSLd2vOnSP",
"inputHidden": false,
"outputHidden": false,
"outputId": "3ca45b1e-e38a-4a6b-e5a4-c43f3038d640"
},
"outputs": [],
"source": [
"sampled(train_df, display_list=['occupation', 'prediction', 'gender'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdHvqizWmV-K"
},
"source": [
"## 4. Evaluation of a Single Occupation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AoLgYDmRmV-M"
},
"source": [
"Let's focus on a single occupation: **poet**. We can evaluate the classifier on the test dataset with respect to this label only.\n",
"\n",
"Read this section carefully; it is extremely important."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f7ud3_BMmV-N"
},
"source": [
"### Metric I: Acceptance Rate\n",
"\n",
"![AR.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAbkAAACzCAYAAAAUuGL6AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAAFiUAABYlAUlSJPAAAB9lSURBVHhe7Z15fBXluceToFRba10qFqHF1qUuKOQkJDkn+0IgAREXkGor1QqlXqyigoAJi1ZbbC+9tm69yj/Y29uKKK0LotaLeK0VUKwoghDpdWEJYM2ekJDnvs+c84Y3k/dMzp5z5v09/Xw/OZl3yZwZOl+fd953Ji0vL48AAAAANwLJAQAAcC2QHAAAANcCyQEAAHAtkBwAAADXAskBAABwLZAcAAAA1wLJAQAAcC2QHAAAANcCyQEAAHAtkBwAAADXAskBAABwLZAcAAAA1wLJAQAAcC2QHAAAANcCyQEAAHAtSSO5/Px8KigoACHCxys/8Fl3PAEAAAyg5H7wgx/Q/fffT489+ij9/veP01tvvUXvvPNODNlCW7a8HYA/++Htfvz1jtbpXa8vvesc7Ufty97maFk4dXT17GWbNm6kv7/xBm3ctIn++Mc/UVlZmfY4AwCAySRcctOn/5BeeOEFam5pIUTk0dnZSYcPdwgOU3f3Efrdo/9JeV79MQcAAFNJmOTKxo2nufPn04FDhwKXaX90B9CHc2n/EUp7WSfUegMbR44coaamph4amxqptbVZHNd6unb6dO2xBwAAU0mY5B7+wyrq6g5HElIqAy+WZArO3BobG49KTnxuaGqgw10d9NhjK7THHgAATCUhkquqrqYdu3YFLtOhiguSs0e3+I+E5ubmHrn1yK6xgdraW2nb9u1UUVGhPQcAAGAiCZHcQw8/HLhMI6KJrq6uXoLrEZ2A73F+tm8fTZs2TXsOAADAROIuucLCQtryzpbAZRoRTcihSh1NjU2W6G677TbteQAAABOJu+QmTJhAe/bsCVymEdFEe3u7VnAMZ3gtzc102623as8DAACYSNwlN2PGDCsDQUQfra2tWsExLDnO5G5FJgcAAD3EX3IzZ0JyMYoWITGd4Bgu+2zvXpo6dar2PAAAgIlAcikSPLNSN+lEwkOZb27aREXFxdrzAAAAJpIQyXVAclGHXASuExzPruQnoDz8yCPacwAAAKaSEMm1d0Jy0YZTJtfa0kr19Qet54HqzgEAAJhKQiTXBslFHSy59o4O6rDBQ8HdR7rpoYce1h5/AAAwmQRIbgZ1QHJRB8+cfO31v9H6V9fThg2v+nn1VXr99ddp5crHqbS0RHv8oyUzJ49ycvVlAACQ7CRm4gkkF3XU1dVReUUFeb1e8vl8vdAd91iQmeulEzxFdHJmAY3M1tcBAIBkJmUk9+mhT+j5f6ylF7a+4Oe9dbRu6zp6/t119N6uF4kOvETd9S/24sj+db2xlWvrCOx1dPVCqdOL+heo+8DLRJ2fB75ReFFXtyuh74y7eNQoGnJRFqVnl1JadgUNziymYVn55EFWBwBIIRIkuc7ApTry2PjRRpr7xHxasLqG5q+uFYifq2rEtkW0+qUaou3zqXv7HcnLB/PEz1qitn8GvlF4wZIrLy/XHuN4MOKcc+nkbwyl40ecTYMu9gnRlVNGVjmdkFVM3x6TTzl5Xm07AABIJlJGcpt3b6YFT9ZSzdN3+VmzVPxcIqR3F/35laVEO2uoe8edvaAPhfxsBK0j2kvsdbo/FPWUcquO2Gav16uO8jf9LBRtllJ3e/JLbozI1k7yFNJx52XRSd8YTiefPpS+dM5FlJ5ZIjK7sTRICG9IVhHlatoCAEAykTLDlRt3v0nz/rSQFqxaJKgVwquxpHf7E4vpyb8uJNpxB9EHc0VGN0/8nB+AP/emW5RzVtWnjLfHCP8+9KX7AyG7tt2BbxRe1O2qS5jkzvTkWzJLy66kjMxiOv6sC+jk04bQSWcMp2PPz6F0TyUNzS6B5AAASU/KSG73gTpatWk1rd68RvAUPbV
5teApemLjGtr4wVNE+wR7maeJ9qzxY/3em25Rr1uzvacsBuj6pr2r6ci+NdR9+GDgG4UXiZJcZo6Xjs8sFZKrFDKroLSscmuoctCFPjph+Ag65etD6KsjzqWRWTna9gAAkEykjORMj0RJ7nRPEWVYgisXlAmE8AQZ2RWULoQ3+LxMGvLNEXTa0NPJk5Wl7QMAAJKFxEgOj/WKOhIhufNy8mkwCy5rrCW4DJZbQHLpWaWUlllCw8ZdSRfm5tPw4cMpJwfZHAAguYHkUiTiLTmebHJyVomQmcjYAlmcFFwaC85TQoNzyyh70hV0+tChNGr0aG0/AACQTGC4MkUi3pI7MzufjskScuP7cTJ7U0gbXUznT7qGzjp/JJ111ne0fQAAQLIByaVIxFNyo3iyiafYf9+tj+DKrGHKkwsnUmZZNQ05bQhlZ2dr+wEAgGQj7pKbCcnFJOIludw8L30jq9Ba+8ZSy2DZCdI9JT2S40Xgnkun0RnDh1NJ9SQqKovvvUEAAIgViZEc7slFHfGS3Lk5BXRsljKL0ia5tNEl9M3KKfRdzxg6b1QmzVqyjErHVmr7AgCAZCOpJp7wiz9170vj96i5Hf6e/GLUYBEPyY0RWdxJWbxk4Kjk/HILCE5wXE4VZVdfRkOGDqUps2+jcZMv1/YFAADJSFJJrq2tjfbu3Uv79u3rBW8zga6ursCR6BvxkNy3PD46RsnielNiTTa5YOJVdObZ55Kv+lKaeuMtlOfFMysBAKlDAiQ3I2TJcSbDdU2FX4waLKxX7cRQcqPG5NFxo0XGFrgXZ4cnm5xUNJE8pWNp2LfOpAnfu5YKSxP3FgQAAIgFCZJcR+BSjYg0Yim53Nw8Om00Tzbhhd+9n2wiyfBU0KhJ0+iMESPootEebT8AAJDsQHIpErGU3AXZXjqG18NZC7/l4u+jguPJJsPGXUEXF5TQGcOGizYYogQgXuzYsQP0g+64hQoklyIRq3tyOYKTPYWUkc2P7rINVfITTzJLaLAQ32W31tA1c2uofPKV2n4AALEh2ou424HkDIlYTjz5+ug8v9B40olt4knaqCK68IrraPay39DN//EYXXHDjdo+AACxAZJzBpIzJGI1XOnJ89JXxhRT+kW5lD7KRxnWMyr9Q5ZpmWX0VW8VXbtoGd38q4doxuJlVFKBNXEAxBNIzhlXSY5nV3Z0dBhLImZXDvWWUEZRtZCcyObO91iyS8ssEnAWV0yFN8yxBPdvv3qYJl4zXdsHACbhqZqj3R4rIDlnXCW5YOvkTMFxnVwMJHeht4AGF46j9OIqysgSYrtgtCCT0s7PouM8RXTOpKtp5r2/ppuXP0RX37aQfPkF2n4AMIlRl/+asqpv15bFAkjOGVdJji/yzU3NZtLc7JzJRXlPLifPS6fml/uzuMIqSs8tE4LLEohs7rujqfKGW+imXz1CP/3lg3T9op9T6fgqbT8AmAZLbszVj8ZNdJCcM66SHCJ4RCu573gLaVDBOEoTWJLLr6S0kTmUdt5o+nbFJfSjml/QzMX30dRZN1MRFn0D0ANLruLW/42b6CA5ZyA5QyKa4crMPB+dWFhJ6UUsOD9phWMpbZSXvnxxHhVeOsUSW35RMXnx2C4AeiElFy/RQXLOQHKGRDSSG5ZXTIOsYUohOgshuqJKysgpphFj8vE8SgAcUCUXD9FBcs5AcoZEpJK70FdAxxWOpwweopRZXNF4K6v7Sn4ZZXt92nYAAD92ycVadJCcM5CcIRHJPTl+lc4p+WMpo5gFJ8TWQxUNEtvO8hVq2wEAjqKTXCxFB8k54yrJ8exCXivHP03EKSLJ5M70FtEgKTbO3gKZXEbRBDqloNySoK4dAOAowSTHxEJ0kJwzrpIcL4iur6+nAwcOGEks18llCoEd76uw1sRJucn7cV8SwhvpxRo4AELBSXJMtKKD5JxxneQOHjxoJIcOHYqp5IZxFldSLTK43pLjocthvhJtm2Qia+yPyDP5LgAGnP4kx0Q
junAv4rNmzbIenKGLjz/+mNauXUuTJ0/Wtk1FXCU5RPAI557cyFwvHVtQHhiiPEqaENzx+WPJkwKTTVhyudc+TsU3rgUDTFuH/z++lqx4R1seDtwHB/epK09WdGKzE6nowr2IL1++3DqGHCw1yRdffBHYSrR///4BF50UcU1NjbY8VCA5QyJUyfGrdE7KLbZmT/qzt4DkREaXIbadmyKTTVhyvuuf0F5MUpn6L47+f+HTA63aOjrmPPAuNTR3BloSrXltj7ZePJDx4NN12vJw4D5k6MpTnUhEF43kdGVSLiw+e3kikcH7pCsPFUjOkAh1uPI7eYX+p5lYTzY5msXxZJNTfeWWBHXtkg23Ss4eocpq2z8bAy388bf3DmnrxQMZkFxohCu6WEqOWbVqVaCUrKFNXZ1EIMMMyXW2B74uItIIRXKe3Dz6ipeHKasoTc3iBMf4KumivHxtu2TE7ZLjLE7+1NVTuebuTdTeccSqL7M5SC65CUd0sZacWm4XDA9hbtiwwRrOlMGZ386dO/uVEctTbcfDo5s2beozLOoU/HfUuqGATM6QCEVyZ+QW+9fEKffirEXgRdU0zFusbZOsuF1ynMHJ4KFIXV3Jy5vrrXqczUk5QnLJT6iiS5TkWEaqpOS9PHUSC4tM7UuydevWQI2j7WSw7NSM0V7Of1NuY8Gq/YZCCkhuJrVDclFHf/fkLhRZ2nH5PCzJj+86KjnO6r7sK7eyPF27ZMXtkuOLfd2eZuvz2x/+S1tXIrM3btOf5FiYLEP1/h1/5m1OMpXt1IyR5cplMoJJjoWt3mvktvydOAO11zVJckwooou15FgkHCwvdbsUD0tHlRLLT5WYfYhT9qdrx5kZB/ettmFk9Jch9oerJNfZ2UkNDQ1G0tjQaC2EDxZOkuP7bKf6ykQWZxNcAVNJ5+Sl3po4EyQnszkWi64u8/iL/gsTi4N/d5Lcz1Zu75EU/+S6sj4Hb+M6obSTv6v3AnWSU8vtf4/32S5W0yTH9Ce6WEqOlw/IULMmtU2w+3RydiYLT25jkclMTycrLpdh71eGIZIL7Z5ce3s77dmzx3pxqmns2x/5S1PP9hXRoKLAg5d7JFcpto2jU7zFKTPZRMUEyfHvUibBJqDIbE9mVVIidsmp9+1YPGoWxaKRmZaUpQpv4+A6ajv+G2rYJSfLuZ0qM+5D7jfvr9rGRMkxTqKLRnJyGJBRhx1VUTF874zD6Z6YzNhYdnLbihUr+myzI4dA7TKTAckpwY+24mzOVJwimOQ8eT76asFYSi/uvWQgQ0juWMFIb+pMNlExRXI8rMdhlwHD4uBgeUn5BJOcFI5OYozsi0MVqswUOexZF6OKTpWcKlW7/GS5DLVfUyXHBBNdNJJTw2kCiRyqdLonpvYrt6lDn6pQVaRc7X3LMERyuCcXbegklys4I7+MMkqqKU1kbfx2AQlPPjndmmySms+nNEVyqnzskpEC5MxMbgsmOZmpyYxPh2yr9iclppOsRIYqM/swqg65T2o7kyXHZF55P2VV/bTXv/VoJKcr18Ey4ohUcqGEXWYyIDlESKG7J8dLAgb7KnvJzcJ6sklFSjzZJBimSI6R8lEnoKiZknofLZjkZKj92tEJLVh/KjLUvmVfvI/chw65/2rfJkuOMznPhHl9/q2nguS4vVo3FGRAcoiQwp7J5Xi99HVvKWXwmjghNmuoMpDNpRdX01m+1FoyYMckyakTUOSwpNzG2ZCsx7A8OJJFcqGE2s5UyQUTHJMIycnZk6Hck+N7bHKbXFjOQ5Jq3VCQAckhQgq75M7OK6RBRROsNXA8NJlRUCng1+hU09fyU/81OiZJjpFZj7xfJof67BNSkk1yal+hYKLknATHJEJy6lNQeEakro6cXalmezxjUka4spIBySFCClVyHiGwE3xjhdwmUlqhf9kAS45/HitEd6ELXqNjmuTkVHyWmxQBi0+twwSTkpwhGco9OXVY1Gnii0SGus9q9qnW7Q/TJNe
f4JhESI7FJiXGmZoqOv4sMz3O2OwSVMt0wmKB8uxN+3Y561JXFg4pILnQn3jCU+j5QPJSAhNxenGqKrkzOIvjJ5kEZlJKBvGTTXylfc5BKmKa5Pi+mwyZxakykgSTnCpJdbtEneCikxWHHCpVkf1yqO2C9dcfJkkuFMExiZAcw28DkDMhOeyzI/mn7o0BLD0pLA6WpWwrQ7fEQJ20wu2ZSO7tuUpyfJA/++wz7Toyt7NvX//r5CrKy2hkTh4N5iyuoIoyhNgYv+Sq6Mv5FZTlkrd9myY5RspNhn22JRNMcqok1dmTDPcj+7ZnbOoEF3s7VXAc9n2W5dxe931YoHZRmyK5UAXHhHsRZxFx8PVSV+4EDz9yZiazOg7+rHsGpR1eaK6KjYN/d3p/HYtOlWiwx4Y5kRqS6wg9k2tubqaWlhbjaG1tdXziyUcffUSlZeX0tZxiv9T4/pslOc7oqkVmN57OFBme7hykIiZKTs2qgg0fBpMco7Zn8XBdWZ+DRafL1tR2POzJbVTxyT7s+8x9qWKWbdW/aV9iYILkwhEcE+1F3O2khOQ6Dh8O/LNGRBo7xYm+qLicMnzjKM3K3JSnmxRX04k+frKJO7I4xq2Sk/LQPV5Lzap0EmTkk0R0kmO4HdeR/XCwiJzu1TGynQxuw2vhuExu1+0zw32rYuPg33m7Xaoy4+T9U7e7hXAFx0ByzkByhsSWbdvolMJK6xU6nL1lFPiHKhmebHJ+Cr1GJxRYcngzOEgWdEKzE4ngGEjOmRSQ3Ew63AnJRRu33LOM0nNLrHVwgwqqBX7B8TDlcG+J9tinMiw5z+S7ABhwRl3+a63UVCIVHAPJOQPJGRCvbnybTuQnm+T7F3v3TDYpHk/H82t0XDRMCUCy0Z/kohEcA8k5A8m5PBpbWqjiuh9TWk65NaOSZ1H6n1PJw5aVdLbXPZNNAEhGnCQXreAYSM4ZV0mO14nxDENTUYNXzG37v91U87vH6ESWmncsZRRMsDK4tCKR1Yks7kRfmasmmwCQjASTXCwEx0ByzrhKcrwgmhcM1tfXG8eBAwd6rZPrEtJb+dKLdNW999H4BUvpzCnTaVC+EJyAlwwcW1BJF6Toa3QASCV0kouV4BhIzhlXSa6jo4MOHjxIhw4dMo7PP/+8Tzb38uZNNLl2KV11zzKaJmRXdMtcOrX6ckrPr6KhLnmyCQDJjl1ysRQcA8k54yrJIY5Gp8jqah5bQePm19AkIbqpd/+crhaiu+Lu+8hz3U8oKz/1n08JQCqgSi7WgmMgOWcgOZfG1ro6urR2EVXXLKEJdy6hSXcupcsW/4ym/eKXNGXufPL6UvddcQCkElJy8RAcA8k5A8m5NP79iSdp/LwamlgrJCdEN7FmqWAJXSlkd8n0G7THGgAQe1hy8RIcA8k5A8m5MD47cJC+d/cymrBQCE3I7RIhuuqaxXT5HTV0zbxFdMn3r9ceawBA7GHJxUtwDCTnDCTnwvjvl9dTlcjiJonMjSU3oWYR/egX99HV8xbStDkLqHyi89PCAQCxw1M1R7s9VvBFHDijO26hAsklWTS1ttCM5ffT+IWLaGLtYqpeUEM//eX9tPKJNTTtViG4SZdpjzMAAIC+JJXkOjs7rXcbmQrHK29toar5tTShdimNExncdfcuo+eeW0tPrl5DlZdeqT3GAAAA9CSV5HgxeLAXirqd+v37qU18/7kPP0rjFyyiqjuX0LQl99Afnv4zrV//Cj3zzLNUWuK+BzEDAEA8SSrJ8WO9OJszld0HD9FNDz0iJFdLV9beRSv++CS99OLL9N5722j7tu1UVooF4AAAEA64J5ck8XlrM20/dIj+sXcfPbL2RXrg93+gl9a9SG++uZG6uo7QR3UfUXl5ufYYAwAA0APJJUl8+q8GWvPa6/TOp5/Rrr319D9/XU+vvrqBWlr9b1yuq6uD5AAAIEwSIDl+M3iHdaFG6IOfWLltzz76yZKf02vbd9Gb775L655fR1vf3eq
vIAKSAwCA8EmQ5JDJOUXHkW76898302vvf0BvbH6Lnl37PL3xtzeosbEpUAOSAwCASIDkkiA+OXiIntv4Fr2/Ywc9+9yztOWdLdZMUzUgOQAACJ+E3JOD5Jyjk46I/3XT+++/R+9v/6DPK3c4IDkAAAgfSC7M+HT/v+j6u56mKTWr6arapyJn0VM0lVn8FE1Z9DRdf88a+uRAQ+Cv9A1IDgAAwgeSCzM+3f85XXvnf9Glt62kybc/Tpcp8O/hbLt0rp9Jt6+k74s+933eEvgrfQOSAwCA8ME9uaSK7gB9A5IDAIDwSZDksIQg2oDkAAAgfBIiucPI5KIOSA4AAMIn7pKbft0PkcnFICA5AAAIn5hIbty4cTRv3jy6Y94dAv45z/p97ty59JsHHrDWfHVpHkhsOl1dXdZPfjB1fwHJAQBA+EQtOa/XS3955jkhsS5qb20TtFB7Swu1tTRTazPTRE2NDRaNjY1AoaGhgZrFMYLkAAAgPkQtuTlz5ogLdhO1tnaIC7YQmrhoM41N4rNFoxBcIzU36C/0psPZXCgByQEAQPhEJbmKigrasmULHT7cIWQmxGa/iDeJ7I2xbwcWbW1tAYX1H5AcAACET1SSmz17NrWI7K1ZJzjgCGe5usd3BYtdu3ZBcgAAECZRSe7BBx+gTpHFNYsLtu5CDoLTGnhPXKgByQEAQPhELDmfz0d/+csz1HW408pKdBdyEJyOjvCWVezevdsaHtadCwAAAHoillxpaSlt3fo+tbd3WpNMdBdyEBxeOhBOvP3221RUVKQ9FwAAAPREJbl3t31ALVYmp7+Qg+CEK7nHVz6uPQ8AAACCE7Hk8gsK6IW/vkIdh7sguQgIR3JdR47Q7Jtu0p4HAAAAwYlq4slvfvtbOsITTxoxXBku4dyTe2X9egxVAgBABEQluZ/ceCM1NjdbT+3QXchBcEKdXfnxJ5/QlKlTtccfAACAM1FJju/Lbdy8kQ53dliLvq2nm1hPOYkcaxKLbVsyEOl+ObXr72kn/CSZ22+/XXvsAQAA9E9UkmPumH8HNbU0Ult7C7W2NVsZSjS0tPnRlQ0kke6XUzunVxB9uHMXzZj5Y+0xBwAAEBpRS47Xy82+aTbV1NxJCxcuCLAwKhYE0JUNFHKfwt2v/trV1tbSokWLeli8eLG1fcLEidrjDQAAIHQiltzTq1aBFEJ3DgEAwO1EJTnddpB84FwBAEwFkjMAnCsAgKlAcgaAcwUAMBVIzgBwrgAApgLJGQDOFQDAVCA5A8C5AgCYCiRnADhXAABTgeQMAOcKAGAqkJwB4FwBAEwFkjMAnCsAgKlAcgaAcwUAMBVIzgBwrgAApgLJJYiamhrasGEDrVixQlseT3CuAACmMuCSW7t2beANakSbNm3S1kkmeB910dbWRjt37gwqsY8//rinnq48nkByAABTGXDJffHFF9bFn2MgBBAuUla83/xZwvsugzM2ezspR65rL4s3kBwAwFQGVHI8hMexf//+HtmtSvILspScTmRbt261yjhmzZrVp3yggOQAAKYyoJKTUmCxsTQ4eMhPVzdZcJLc5MmTrTKOZJI1JAcAMJUBlZwc4mM5cOYjg3/X1ZeZnxQhi4SzQBm83Z5ByVi+fHmv7SpyP7h/XbmKk+QYGfZyee8xmMR5v1n66vAtf+Zt0WaFkBwAwFQGTHI8QYNDvehLYbEQ1LoSFhUHi0ZmgfLemAwWlioF7p+D66t9SeR+hHo/MNRMzi5Mmalye3U7w3WlaPkn17F/p1AEHAxIDgBgKgMmOSkfdVhPZjssO7WuREpOhipDFozMglShcf8ydBmilGUwCdoJJjnuW/alE1kwyXE7KThur+4jy1qKn7+b2i4cIDkAgKkMiOT4Qs5hz57kdg7dEJ0qOd09L50kVYno2sgyp+FMFSk5mUFKZD/8WRWVJJjk5PZgEuPjIEO3/6EAyQEATGVAJCezK132JDM83Zo5VXL2MqdymWHZ74fJocpwsiS
WVH/Bf8cuumCS62+IlpF/M9Rs0w4kBwAwlQGRnLyw6xZOSwHqxBOp5KTMOFT5SPmFswhdCkc3XMn7rmZ0ankwyclwyiSDtQ0VSA4AYCoJl5w6/MYXbR0y7BKMVHIMS5NDHfKTQgpnUofcP7vkJOo+qOKC5AAAIPEkXHLygh1K2IfnopGcfOKIHLKMZKiS6U9yjAy1DiQHAACJJ+GSkxmVbqhSIgXEmZY6vBiN5Dhbk8F9yqFKJ1npiLXk5PEI5Z5cOMOqKpAcAMBUEio5KSGWl65cRYY6vBiN5Bh5L1C9d6abxelEf5JTh2NVkQeTnJStOiNURe2Pv5+uTn9AcgAAU0mo5OQF3T4MqUPOslRnREYrObnEQEYwsTjhJDkWkhSpfRg0mOTUDNN+XNT+7O3CAZIDAJhKQiUnsyc1OwsG15Ehs61oJadmRRxOQ4TBkJJjifFnFRn8Pe2TWYJJjlG/K7e198eisy9JCAdIDgBgKgmTnMxY+CIeygWb69ilqPZhrx9KOaPKQ8ozHOQEFl2wjFhmuu8ns0j7Wj0JC5rL5Hfm4P4iEbEdSA4AYCoJn3gy0EhJ6TIqtwLJAQBMxTjJydmMoQyZugVIDgBgKkZJTi5N4IjmHleqAckBAEzFKMnpZmyaACQHADAVoyQnw2khuhuB5AAApmLcPTkTwbkCAJgKJGcAOFcAAFOB5AwA5woAYCqQnAHgXAEATAWSMwCcKwCAqUByBoBzBQAwFUjOAHCuAACmAskZAM4VAMBUopIcSB105xAAANxOxJIDAAAAkh1IDgAAgGuB5AAAALgWSA4AAIBrgeQAAAC4FkgOAACAa4HkAAAAuBZIDgAAgGuB5AAAALiUPPp/Qo0DGXMitSQAAAAASUVORK5CYII=)\n",
"\n",
"> What it the proportion of bios that were predicted as `poet`?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "fCWQO0zsmV-N",
"outputId": "9700002b-4968-4199-f970-31f5d5a736cb"
},
"outputs": [],
"source": [
"print('Poet - Acceptance Rate', (test_df['prediction'] == 'poet').mean())"
]
},
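{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what this metric computes, here is a tiny example that can be checked by hand. The DataFrame `toy_df` is made up for illustration and is not part of the course data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up labels and predictions for 8 people (3 actual poets, 5 actual nurses)\n",
"toy_df = pd.DataFrame({\n",
"    'occupation': ['poet', 'poet', 'poet', 'nurse', 'nurse', 'nurse', 'nurse', 'nurse'],\n",
"    'prediction': ['poet', 'nurse', 'poet', 'poet', 'nurse', 'nurse', 'nurse', 'nurse'],\n",
"})\n",
"\n",
"# Acceptance rate ignores the true label: it only counts predictions\n",
"toy_acceptance_rate = (toy_df['prediction'] == 'poet').mean()\n",
"print('Toy acceptance rate:', toy_acceptance_rate)  # 3 of 8 predictions are 'poet' -> 0.375"
]
},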
{
"cell_type": "markdown",
"metadata": {
"id": "mUFyUeUdmV-N"
},
"source": [
"### Metric II: False Negative Rate\n",
"\n",
"![FNR.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAb8AAACyCAYAAADS+sEYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAAFiUAABYlAUlSJPAAACL6SURBVHhe7Z15nBTlnf+HIbqaw1WTaBB2daMxnoGZwZnpnvtggIEAXoTVja4mssafJoqCgMOh2Rwk+ZmfWa+s8kc0m82KKDkMeMRV/BlXQGFFEQPI/jy4MTLMyQzz/dWnuh/mmeKp7urpqumjPl9e71c3Vc9TXd01Xe/+PkdVQXl5uRBCCCFhgvIjhBASOig/QgghoYPyI4QQEjooP0IIIaGD8iOEEBI6KD9CCCGhg/IjhBASOig/QgghoYPyI4QQEjoyIr+KigqprKwkHsHnVRF/bvo8CSGEpMaQyO/rX/+63HPPPfLwQw/JL3/5qLz22muyYcMGH1kv69e/HgfPY2B5jFi5/jIDyx3NwDL929G35azTvy6VMqZyznVr16yR/3rlFVmzdq38+tf/IfX19cbPmRBCiDcCld/VV/+jrFq1Stra24Ux+Ojp6ZFDh7otDklf32H5+UP/KuUR82dOCCEkOYHIr378BJk9d67s3b8/fvqORV8ccyRemzy81FdlvJbLbBw+fFgOHjx4hNaDrdLR0WZ9rnvkqquvNn72hBBCkhOI/B741TLp7UtFHko2mRdONgUyvdbW1n75Wc8PHDwgh3q75eGHlxo/e0IIIcnxXX4Tm5vlna1b46dvr0Kj/JzRZ/14aGtrOyK9IxJsPSCdXR2yafNmaWxsNB4DQgghifFdfvc/8ED89M1IJ3p7eweI74gALdCH+uGuXTJjxgzjMSCEEJIYX+VXVVUl6zesj5++GemEavI0cbD1oC3AW2+91XgcCCGEJMZX+U2aNEl27NgRP30z0omuri6j+AAywva2Nrl11izjcSCEEJIYX+V33XXX2RkLI/3o6Ogwig9Afsj8ZjHzI4SQQeGv/GbOpPx8inZLbibxAaz7cOdOmT59uvE4EEIISQzll4WBkZ6mwS4KNIm+unatVNfUGI8DIYSQxPguv27KL+1Qk9tN4sNoT1zx5YEHHzQeA0IIIcnxXX5dPZRfupEo8+to75A9e/bZ10s1HQNCCCHJ8V1+nZRf2gH5dXV3S7cDNCn3He6T++9/wPj5E0II8YbP8rtOuim/tAMjOV96+U/ywosvyOrVL8Z48UV5+eWX5ZFHHpW6ulrj558uRaXlUlpmXkcIIfmE/wNeKL+0Y9u2bdLQ2CiRSESi0egATJ+7HxSVReTTxdVyUlGlXDDWXIYQQvKFrJTfB/vflz/890pZtXFVjDeflqc3Pi1/eONpeXPrMyJ7n5W+Pc8M4PDupwfiWG8sY+EsYyrnpcwA9qySvr3PifR8FH9HqcW2bVuH9J59Xxk9Wk65sESGja2TgrGNcmxRjYwsqZBiZoGEkDwlAPn1xE/hg481766R2Y/NlXnLW2Tu8gUW1uOyFmvZQln+bIvI5rnSt/n27OXtOdbjApHO/4m/o9QC8mtoaDB+xkFw+pfOlpO+MEKOP/0sGf6VqCXABiksaZBPl9TI311UIaXlEWM9QgjJVbJSfuu2r5N5jy+QlifvirHiTutxsSXDu+Q3z98psqVF+t65YwDyZ0uKDlzLWPUVzjJ9f7bKaevtMtYyZ7kBZbTXjDHfqnOn9HVlv/wusrK7E4ur5LhzSuTEL4ySk04dIX/1pQtlWFGtlQmOk+GWCE8pqZYyQ11CCMlVsrLZc832V2XOf8yXecsWWiywRNhiy/C2xxbJ43+cL/LO7SJvz7YywDnW49w4eD6QPms9srCj1mG5T8T24Wj63rYk2Lk9/o5Si21btw2Z/M4orrAlVzC2SQqLauT4M8+Tkz5/ipx42ig55txSGVbcJCPG1lJ+hJC8Iivlt33vNlm2drksX7fC4gl5Yt1yiyfksTUrZM3bT4jsst
gJnhTZsSKG/f+B9Fnl+gzLj6zzAdO2ZedyObxrhfQd2hd/R6nFUMmvqDQixxfVWfJrsiTXKAUlDXaT5/Dzo/LpUafLyZ87RT5z+tlyQUmpsT4hhOQqWSm/sMdQye/U4moptMXXYFFvYYnQonBsowyzRHjsOUVyyt+cLp8fcaoUl5QYt0EIIbmI//Lj5c3SjqGQ3zmlFXIsxFcyzhZfIaQXl9+wkjopKKqVkeMvk/PLKmTUqFFSWsrsjxCSP1B+WRhByw+DXE4qqbUkZ2V48axPia8A4iuulWPL6mXslEvl1BEjZPSYMcbtEEJIrsJmzyyMoOV3xtgK+USJJT3096lsT6NgTI2cO+VKOfPcC+TMM79o3AYhhOQylF8WRpDyG41BLsU1sX69o8RXbzd3nlQ1WYrqm+WUz58iY8eONW6HEEJyGV/lN5Py8yWCkl9ZeUS+UFJlz92D7AohQYthxbVH5IfJ7cVTZ8hpo0ZJbfMUqa4Ptu+REEIygf/yY59f2hGU/M4urZRjSrRRnQ75FYyplb9puly+XHyRnDO6SK5fvETqxjUZt0UIIblMxga84IaspvvV4T52+Q7eJ25Y6xZByO8iK+s7sQRTG/rlF5NeXHwWx5VOlLHNF8spI0bI5TfeKuOnXWLcFiGE5DoZk19nZ6fs3LlTdu3aNQAsCwO9vb3xT+LoCEJ+f1sclU9oWd9Aau1BLudN/pqccdbZEm2eKtNvuFnKI7ymJyEkP/FZftd5lh8yH5QNK7hhrVvYtzTyUX6jLyqX48ZYGV68r88JBrmcWD1ZiuvGyci/PUMm/f1VUlU3dHeVIISQoSYA+XXHT+GMwYaf8isrK5fPj8EgF0xoH3glF0VhcaOMnjJDTjv9dLlwTLFxO4QQkk9QflkYfsrvvLER+QTm89kT2tWk9n7xYZDLyPGXylcqa+W0kaOsOmzqJCQo3nnnHRIgps/cDcovC8OvPr9Si5OKq6RwLC5h5mjyxBVeimrlWEuIF89qkStnt0jDtMuM2yGE+EOqJ2jiHcovD8LPAS+fG1MeEx0GuzgGvBSMrpbzL71GblzyM/nO/3lYLv3mDcZtEEL8gfILDsovD8KvZs/i8oh86qIaGXZhmQwbHZVC+xqesabPgqJ6+Uxkoly1cIl85yf3y3WLlkhtI+f0ERIklF9w5Iz8MNqzu7s7tAzFaM8RkVoprG625Gdlf+cW2xIsKKq2QNZXI1XfvMUW3//6yQMy+cqrjdsgJEwUT7zFuNwvKL/gyBn5uc3zCwsJ5/n5IL/zI5VybNV4GVYzUQpLLOGdN8aiSArOLZHjiqvlS1OukJnf/6l85+775Ypb50u0otK4HULCxOhLfiolzbcZ1/kB5RccOSM/nPzbDraFk7a2xJlfmn1+peUR+WxFQyzrq5oow8rqLfGVWFjZ35fHSNM3b5abfvKgfPvH98m1C38gdRMmGrdDSNiA/C664qHABEj5BUfOyI/hHunK74uRKhleOV4KLGz5VTRJwQWlUnDOGPm7xq/KN1p+KDMX/UimX/8dqeZkdkKOAPk1zvq/gQmQ8gsOyi8PIp1mz6LyqJxQ1STDqiG+GAVV46RgdEQ++ZVyqZp6uS28iuoaifDyZYQMQMkvKAFSfsFB+eVBpCO/keU1Mtxu7rQEaGMJsLpJCktr5PSLKni9TkISoMsvCAFSfsFB+eVBDFZ+50cr5biqCVKIpk6V9VVPsLPAT1XUy9hI1FiPEBLDKT+/BUj5BQfllwcxmD4/3LLo5IpxUlgD8VnCO8JEGW4tOzNaZaxHCOnHJD8/BUj5BUfOyA+jHTHXD49hJFEMJvM7I1Itw5XwkO3FM7/C6klycmWDLUdTPUJIP27yA34IkPILjpyRHyZ679mzR/bu3RtK/JznV2SJ7fhooz2nT0lP9ff9lSXCCyKcw0eIFxLJD6QrQMovOHJKfvv27Qsl+/fv91V+I5H11TZbGd9A+aEJdGS01lgnmygZ9w0pnnYXIRknmf
xAOgJM9QR9/fXX2xcEQXz88cfS0tJiLAdWrlxpl9uyZYtxfb6TM/JjuEcqfX4XlEXkmMqGeFNnPwWW+I6vGCfFOTDIBfIru+pRqblhJckwnd2xH2WLl24wrk8FbAOBbZrWZysm4TkZrABTPUHffffd9meo4r333jOWA6tXr05aJp+h/PIgvMoPtyw6sazGHs0Zy/bi8rMywEJr2dk5MsgF8ote+5jxJJPL7Pm4/7vwwd4OYxkTt9z7hhxo64nXFFnx0g5juSBQcd+T24zrUwHbUGFan+sMRoCDlR+yPhVLly41lqX8KL+cD6/Nnl8sr4pdvcW+kkt/1odBLp+NNthyNNXLNvJVfs7wKrFN/9MarxGLP72531guCFRQft5IVYDpZH4bN260HyFCU1nKL9Py6+myDwBj8OFFfsVl5fKpCJo7J0qBnvVZfCLaJBeWVxjrZSP5Lj9kferRVE7nyu+ula7uw3Z5lf1RftlNKgJMR37o/1OB/j1n2UTyU/2Gbn2GbuvRf6iWA5WB4hH7o8rhtfXsFPXcXgv1IHL1mgiUd8tovcLMLw/Ci/xOK6uJzenT+vrsye3VzTIyUmOsk63ku/yQ8alAk6aprOK5dXvscsj+lDQpv+zHqwDTkR/+v3btWvs5xDFt2rQBZRPJTwW251yXaD22hVCDafRQZZUgESi/e/du+zn2Ud8WgBCV9CBLlNeliffgrOOVDMtvpnRRfmlHsj6/862s7rgKNG/iMmb98kMW+Mlog50VmuplK/kuP0hg2442+/nrf/6LsaxCZXuok0x+ECkkqfcP4jmWJZKsqqdnmJAu1qlwkx9Ervdloi7eEzJWZ9kwyQ94EWC68oPwlDycoghSfgi8rlqPR+wLZKbW6TLGckhQ/R/o++7MXPX36baPycgZ+fX09MiBAwdCSeuBVnuCv1skkh/68T4brbeyPof4KkGTfKk89+b0hUF+KvuDcExlwaPPxE40EAr+n0h+//zI5iPywiPKqvIILEMZL/XU//W+RpP89PXO18M+O4UbNvmBZAJMV35Az8KcTY+IoORnasZU+2J6TSeJ9g+orBZNoqb1ycgC+Xnr8+vq6pIdO3bYN7QNG7t2D/5mtmdFq2V4dfyC1Ufk12QtGy8nR2pyZpCLThjkh/8rybgNfFHZocrClFyc8tP7BSEkPeuCgFRmpiSqg2UIlNHr4TX0cMpPrUc9XXLYhtpv7K9eJ4zyA4kE6If8gGoq1EURpPwgJn25Qu2fnhW6oZpDly1bZlyPPj+EF5GayBn54RJfyP7CSqJwk19xeVQ+UzlOhtUMnNpQaMnvGIsLIrkzyEUnLPJD8yDCKQkAoSAgNSUlN/kpEZnkBtS2ELpoVWaJcGZpQBegLj9dtk4pqvUq9O2GVX7ATYB+yU+JAqGkE6T8sG19uY4qg0g0cEUFJIg6TpQcEab6ycgC+bHPL90wya/M4rSKeimsbZYCK8vD3RoUGPRyqj3IJTev3xkW+elScspHiRGZnFrmJj+V2akM0YSqq29Pyc0kX4UKXXLO5lgTap/0emGWHyi67B4pnvDtAX/rfskPQBgIPOL/mZIf+vKwXvXnIZCZOiXoNUz77wXKLw/C1OeHqQvHRpsGSM/GvpJLY05cycWNsMgPKCnpA1/0zErvp3OTnwp9u05MonPbno4KfdtqW9hHbMOE2n9922GWHzK/4klzjvpb91N+arAJAuUyJT8FJIgmTX30pr49FW77kC6UXx6EM/MrjUTkc5E6KcScPkt4dpNnPPsbVtMsZ0Zza2qDkzDJTx/4opo31TJkT6ocgFQQ2SI/L6HXC6v83MQH/JQf0Ce+Z1p+OqoemkHVMiVF0xxFP6D88iCc8jurvEqGV0+y5/ChibOwsskCtytqlr+uyP3bFYVJfkBlSao/TjUZOgfCZJv89G15IYzySyQ+4Lf8nBe+Rpjkp8qYBpsoaSL8kp9JxErUpv3zA8ovD0KXX7Eltk9Hx1nSmywFVbHpDZ
AfHo+xBHh+HtyuKGzyU1MGID0lCAhRLwPcZKVGbHrp89ObVxMNuFGo0PdZz1b1sskIm/ySiQ/4LT+gywthkoueIerz8Zx1U5UfMjuMBNW3iedq8Iqe5elXqMH+6HXUemzLuQ9eybD8vF/hBUP98WsEUx7CSKIb2uryOw1ZH67cEh/ZqRiOK7lE6446BrlI2OSHfj0VKuvTJaVwk58uT325Qh9YY5IYQjW56qjtIvR6bttLRpjk50V8IAj5QSL6YBOT/PQMEY8oozJFiEhJLlX5KakiUBao14EAnYJD5qmHqqP2BeE2FSIZOSM/fEAffvihcR5cvrNrV/J5fo0N9XJBabkci6yvcqIUWsIDMflNlE9WNEpJntydPWzyA0p6KpyjP4Gb/HR56qM5Abajtu3M8PSBNc56uvgQzn1W61Hf9H4gVqfAwyI/r+IDqZ6g9SuomNYrMLJSScdNVNiWfikyyEmNyFSSc05kV+UT9dNhnT5NASLDPjjFp8BrQJq68PAcywab9YHMy6/be+bX1tYm7e3toaOjoyPhFV7effddqatvkL8urYnJDv17tvyQATZbmeAEOcPKCE3HIBcJo/z0LMytGdJNfkCvDyGhrCqPgABN2Z1eD82nqKMLUW3Duc/Yli5sVVd/TedUiDDILxXxgVRP0MQ7GZdf96FD8T93xmBji3UQL6xpkMLoeCmwMz3tai41zXJCFFdyyY+sD+Sr/JRUTJcZ07MwkxyBunKKSX4A9VBGbQcBQSXqCwSqngrUwVw+rFPLTfsMsG1deAj8H8udslUZKvZPX54vpCo+QPkFB+WXB7F+0yY5uarJvlURsr3CyliTJ8Agl3Nz6HZFXoD8eCd3ki2YROdkMOIDlF9wZFh+M+VQD+WXbtz8vSUyrKzWnsc3vLLZIiY+NHeOitQaP/tcBvIrnnYXIRln9CU/NcpOZ7DiA5RfcFB+OR4vrnldTsCVXCpik9iPDHKpmSDH43ZFedTcSUi2kUx+6YgPUH7BQfnlcLS2t0vjNf8kBaUN9ghPjOqMXccTzZ9NclYkfwa5EJKNJJJfuuIDlF9w5Iz8MM8NIx7Dih6Y8bfp/22Xlp8/LCdAdpFxUlg5yc74CqqtLNDK+k6I1ufVIBdCshE3+fkhPkD5BUfOyA8TvTE3ZM+ePaFj7969A+b59VoyfOTZZ+Rr3/+RTJh3p5xx+dUyvMISnwWmNhxT2STn5ejtigjJJUzy80t8gPILjpyRX3d3t+zbt0/2798fOj766KOjsr/n1q2VaQvulK99b4nMsCRYffNs+WzzJTKsYqKMyJMruRCS7Tjl56f4AOUXHDkjP0Z/9FhZYMvDS2X83BaZYglw+nd/IFdYArz0uz+S4mu+JSUVuX/9TkJyAV1+fosPUH7BQfnlYGzctk2mLlgozS2LZdIdi2XKHXfKxYv+WWb88Mdy+ey5Eonm7r36CMkllPyCEB+g/IKD8svB+N+PPS4T5rTI5AWW/CwBTm6502KxXGZJ8KtXf9P4WRNC/AfyC0p8gPILDsovx+LDvfvk77+7RCbNt0RnSe+rlgCbWxbJJbe3yJVzFspX/+Fa42dNCPEfyC8o8QHKLzgovxyLf3/uBZloZX1TrEwP8pvUslC+8cMfyRVz5suMW+ZJw2TzldEJIf5TPPFm43K/wAmaBIfpM3eD8stgHOxol+vuvkcmzF8okxcskuZ5LfLtH98jjzy2QmbMssQ35WLj50wIISQ9Mia/np4e+x5OYQXx/GvrZeLcBTJpwZ0y3sr4rvn+EnnqqZXy+PIV0jT1MuNnTAghJH0yJj9Mcne70Wu+s2f3bum03v/sBx6SCfMWysQ7FsuMxd+TXz35G3nhhefld7/7vdTV5t8FrAkhJFvImPxweTNkf2Fl+779ctP9D1ryWyCXLbhLlv76cXn2mefkzTc3yeZNm6W+jhPbCSEkKNjnl4H4qKNNNu/fL/+9c5c8uPIZufeXv5Jnn35GXn11jfT2HpZ3t7
0rDQ0Nxs+YEEJI+lB+GYgP/nJAVrz0smz44EPZunOP/OcfX5AXX1wt7R2xO2Rv27aN8iOEkADxWX64k3u3fQJnmANX9Ny0Y5d8a/EP5KXNW+XVN96Qp//wtGx8Y2OsgBWUHyH5w9SpU43LSWYJQH7M/BJF9+E++c1/rZOX3npbXln3mvx+5R/klT+9Iq2tB+MlKD9C8olf/OIX9mA+PM6aNctYhgw9lN8Qx/v79stTa16Tt955R37/1O9l/Yb19shXPSg/QvIHCK+1tXUAq1atkvvuu0+uuOIKYx0SPL73+VF+iaNHDlv/+uStt96Utza/fdStjRCUHyH5A77LTvnpICtcuHChsS4JDsovQXyw+y9y7V1PyuUty+VrC54YPAufkOlg0RNy+cIn5drvrZD39x6Iv8rRQfkRkh/ge4zsziQ9BbJAft+HHsovQXyw+yO56o5/k6m3PiLTbntULtbA/1NZNnV2jCm3PSL/YG1z10ft8Vc5Oig/QnIPJbolS5bY/XsbNmywszr16JQelqGsaVskeNjnl7Hoi3N0UH6EZDf4fs60fuzrotuyZYv9iEwOyyFC9T1GGV18KMvBL5klAPlxqkO6QfkRkj3guwhRYYDKSy+9dCSTw3NIzSk6E+jTU+JDPX6/M4/v8jvEzC/toPwIyQyQmC46SE4XHdYlE50J1FHNnPxuZwe+yu/qa/6RmZ8PQfkREiz4fkFIyMj0/jk0R6ppCGjW9GuCOl6PzZzZRcryGz9+vMyZM0dun3O7BR7n2P+fPXu2/Ozee+05a72GCzmHnd7eXvsRF/ROFpQfIf6B71Ki/jk1347fuXCRkvwikYj89ndPWXLrla6OTot26Wpvl872NuloAwflYOsBG71zl7TKgQMHpM36jCg/QoJDFx3EpovOa/8cCQcpye+WW26xTuQHpaOj2zqRW6KzTuag9aD13KbVEl+rtB0wCyDsIPvzEpQfIcmBxPT+OUhO759DkyZFR9zwLL/GxkZZv369HDrUbUnOEp7z5H7QyvaAczmx6ezsjKsteVB+hPSD74IuOtU/B/T+OZQx1SfEhGf53XjjjdJuZXttJvGRhCArNl3GzC22bt1K+ZFQokSXqH8OouP3g6SLZ/ndd9+90mNlfW3Widx0gifudMTv0+c1KD8SBvA3rvrnHn/8cfbPkSHFk/yi0aj89re/k95DPXYWYzrBE3e6u1Ob/rF9+3a7mdl0LAjJRTBlQO+fU02Xqn+OoiNDjSf51dXVycaNb0lXV489uMV0gifuYIpDKvH6669LdXW18VgQks1AXnr/HOTG/jmSjXiW3xub3pZ2O/Mzn+CJO6nK79FHHjUeB0KyCSU6vX8OkmP/HMkFPMmvorJSVv3xeek+1Ev5DYJU5Nd7+LDceNNNxuNASKaAwFT/nHMgCvrr2GxJcg3PA15+9i//Iocx4KWVzZ6pkkqf3/MvvMAmT5JRIDD2z5F8x7P8vnXDDdLa1mZfpcR0gifueB3t+d7778vl06cbP39CgsCtf06JDusoOpKPeJYf+v3WrFsjh3q67cns9tVc7Ku6DB578IxjWTYw2P1KVC/Z1V1w5ZzbbrvN+NkTki6QFyTmvJCzs3/Orws5E5LteJYfuH3u7XKwvVU6u9qlo7PNzmjSob0zhmldJhnsfiWql+hWT3/eslWum/lPxs+ckFSB6Nz655TomM2RsJOS/DDf78abbpSWljtk/vx5ceanxbw4pnWZQu1TqvuVrN6CBQvsX96KRYsW2csnTZ5s/LwJSYYuOohNiY79c4QkxlV+Ty5bRvIM03EmuYOzfw6S0/vneCFnQryTUH6m5SQ34fHMHSAvXXSqfw6y40RxQvyB8gsJPJ7Zid5s6dY/h/XM5gjxF8ovJPB4Zh5ddLyQMyGZhfILCTyeQ0uyCzmzf46QzEL5hQQez2CAvPT+OcgNkgPsnyMke6H8QgKPZ/oo0bF/jpDch/ILCTyeqQGBuQ1E4YWcCcl9KL+QwO
PpDgSWqH+OoiMk/6D8QgKPZwy3/jklOqyj6AjJfyi/kBC24wl5QWIYVamaLSE5Z/9cPl7I+frrr5fOzk77urFr1641liEk7FB+ISGfjydE59Y/B9GFrdly48aNtvgQy/g9JsQI5RcS8uV46qKD2JTo2D8X4+67745rT+Tjjz82liGEUH5DytKlS2X16tXS0tJiXB8kuXg8nf1zkBz75xLz3nvvxdXHrI+QRPguP72/wRn4Yq5cuVKmTZtmrJsLJHp/u3fvtvtY3N6fCpzETeuDJJvlB3npolP9c/icOFHcO8z6CPGO7/LTv4CQnQJfRhWQRCYFiMwLAYmZ1ifC7f3hPanAdk3vT5VB9udcFzTZIj+92dLUP6dEx2yuH/zgSjc48IWQgQQqP9M6lTVBGM71Q0WifUxGorqQqnp/GHTgXJ9JvBxPv4Wji44Xch48+t/cYCOT3zdCspEhlR9AP4QK/KI1lQmaoOQH8AsbkW3NTomOJ+QDGSHzMq33Ai/kHBx+ZH7Z9mOMkEwz5PLT1+O5vg5fcnxJ9SZSPMeyRKJEEyOaEp1Nq+hf1Mvpr20KZ3kTyd4f9kOFcx2yHoTb6+CHAX6h64H/+zFwwe14QkaQXmtrqy0rUxkdlNf75yA31APsn8sMuhyz7UcXIdlK1shPbzLEI076ugiwzDRKEuJTfWmqnt7/hv+rsqhvWq/AaEx92yaSvT+V+ZkGteA1EKY+P31uFvYPZXWZp/vL3XQ8ISlIC+JT6JmZEh3757Ib9aMK4eVvmBCSAfmpzAiiUssgMCU+nOT1wSL4VatkZfpVq4Riqqfk4ZRNsn1MRKK6WKfeB54717vJT/9MnPWQJapIJwPUjydkBZk5xQfQN+cUHS/knL3of4/4+zKVIYQczZDKTz+R6wJQJ3+3Jhu9WUcXgHotSEMXnwK/ghHO7Sbax2Q4TzYKPZt0+/Vtkp8ufje5qazQ7fPxgjqekJdq5jSBZkyKLndQf1MI/G2ayhBCjiZQ+elyUCd4hLMJT4kjUZ+b+pLrdVUTo3N7CohFhb7cL/m5Bd6rSYDqPejyU4JGHb2sjv6aifo+E4HjCZkhmzNJT5HOoBcytODHkgpTMzshxJ1A5acHTu74gmK9s44K0zqFyg4hELVMyQQZEZ6bUKFvW99HtcwrieqiT1Hvf3G+H7U/pqwX6/SyTlQk+owSoR9PDFZxa/bEMr0eyTx6y8dgg/P8CBnIkPf5mVCR6MSeSH7JAuLVm0UHs48KL3XVfjmFppZnWn46ah4emjuVANncmV3of3ODjWR/X4SEjbyQny4TLwxmHxVe6qp9RejLs1F+TpAVUn7ZhR+Zn1vXACFhJSvkp0Zleunz05tv1ECQVPs7skl+qt/Ga5+faWCPFwZ7PEn2off1JfvRRAgxkxXyUxLDwBfTev2XL7avluviSEUKQcvP7f2Y5Ke/N7dRomp76ZzoKL/8Qf1YRJjmvhJCkpMV8sMXWIWzeQZyUKNBTSd/dSJAGZTV10GIyCadGaU+CtRNOG4ke3/6dA7n1AWT/IAaJAOJO09m+vZ08acK5Zcf6Fmf87tCCPFOVsgP6F9qSACiULJAQG6m7A6yQHkVKId6SpgI00lC3zaeQ6JOKZnQ35/aR4X+izzRazpfB+9L31/1HvTtOUWaKpRffqD/TTh/7BFCvOO7/FQWl6gPyw2IBVmQU2aJ+gIB5IG+QF0g2Aa25SYN1FHNiQicVLw0IeGEo++fHuo13TI0leGZ3g/2B1J0vgfso5f9Sgbll/uo7xaCUxcISQ/f5UeyEx5PQgjph/ILCTyehBDSD+UXEng8CSGkH8ovJPB4EkJIP5RfSODxJISQfii/kMDjSQgh/VB+IYHHkxBC+qH8QgKPJyGE9EP5hQQeT0II6YfyCwk8noQQ0g/lFxJ4PAkhpB/KLyTweBJCSD+UX0jg8SSEkH4Syo/kF6bjTAghYcRVfoQQQki+QvkRQggJHZQfIYSQ0EH5EUIICR2UHy
GEkNBB+RFCCAkdlB8hhJDQQfkRQggJHZQfIYSQ0EH5EUIICR2UHyGEkNBB+RFCCAkdlB8hhJDQQfkRQggJGeXy/wFYgIHcNWbTuQAAAABJRU5ErkJggg==)\n",
"\n",
"> What is the proportion of bios that were predicted incorrectly as NOT `poet` (i.e., something else) even though their true label is actually `poet`?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8GljHos7mV-O",
"outputId": "3042f0d0-def7-474f-82b3-afd052143af9"
},
"outputs": [],
"source": [
"actual_poets_test_df = test_df[test_df['occupation'] == 'poet']\n",
"print('Poet - False Negative Rate', (actual_poets_test_df['prediction'] != 'poet').mean())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YpB6kAr9mV-O"
},
"source": [
"### Metric III: False Positive Rate\n",
"\n",
"![FPR.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAcIAAACsCAYAAADokazbAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAAFiUAABYlAUlSJPAAACMGSURBVHhe7Z17fBXlmcdDqFZ7sWpbLcKutlrXayEJJjm53wgkUMAbZXWr1VbWurpVFAQMF7U32q5du952lT+q3W5XROnFgpe6imutgMKKIhaQXS/csRJyJSHPvr855w1vhvdMzsk5OZk583vy+X5OMvPOm5k5c+Z7nvd9ZyanuLhYCCGEkLBCERJCCAk1FCEhhJBQQxESQggJNRQhIYSQUEMREkIICTUUISGEkFBDERJCCAk1FCEhhJBQQxESQggJNYEUYWlpqZSVlZEEwf4qjf1u25+EEBJmAiHCr3/963L33XfLQw8+KL/4xSPy6quvyvr169PIOlm37rUY+D0KpkeJljtcpm+5I+lb5nA9Zl3uZQ7PS6aMrZx73prVq+VPL78sq9eskV/96j+lpqbGup8JISSM+FqEV175DVm5cqW0tLYKY+DR1dUlBw92Kg5KT88h+dcH/02KI/Z9TgghYcOXIqwZP0FmzZkje/bti53Ko9ETwx7ec/uPRJbXZRItN7Rx6NAhOXDgQC/NB5qlra1F7dfdcsWVV1r3PSGEhA1fivD+Xy6V7p5kRKLFM/Ty8VMgA2xubj4sQvX7/gP75WB3pzz00BLrvieEkLDhOxE2NDbK21u2xE7licqNInRHj/oi0dLS0ivAXiE275f2jjbZuGmT1NXVWd8DQggJE74T4X333x87lTNSie7u7j4S7JWhAn2uH+zcKdOnT7e+B4QQEiZ8JcLy8nJZt35d7FTOSCV0s6iNA80HHBnefPPN1veBEELChK9EOHHiRNm+fXvsVM5IJTo6OqwSBMgUW1ta5OaZM63vAyGEhAlfifCaa65xMhlG6tHW1maVIIAIkRHOZEZICCE+E+GMGRRhmqJVic4mQYB5H+zYIdOmTbO+D4QQEiYowiwMjBi1DZTRoNn0lTVrpKKy0vo+EEJImPCdCDspwpRDX0hvkyBGjeJOM/c/8ID1PSCEkLDhOxF2dFGEqYZXRtjW2ia7d+917t9qew8IISRs+E6E7RRhygERdnR2SqcLNDv3HOqR++6737r/CSEkjPhMhNdIJ0WYcmBE6Isv/VGef+F5WbXqhSgvvCAvvfSSPPzwI1JdXWXd/6mSV1gshUX2eYQQ4lf8N1iGIkw5tm7dKrV1dRKJRKSkpKQPtv2eDvKKIvKp/Ao5Ia9MzhtrL0MIIX4kK0X4/r735Pf/s0JWblgZ5Y2n5KkNT8nvX39K3tjytMieZ6Rn99N9OLTrqb645lvLKNxlbOUSKdOH3SulZ8+zIl0fxrYoudi6dUtGnzn4ldGj5aTzC2TY2GrJGVsnR+dVysiCUslndkgICQA+FGFX7HQ+8Fj9zmqZ9egcmbusSeYsm69Qr0ub1LQFsuyZJpFNc6Rn063+5a3Z6nW+SPv/xrYouYAIa2trrft4MDj1y2fKCV8YIceeeoYM/0qJkmGt5BbUyqcKKuWLF5RKYXHEuhwhhPiBrBTh2m1rZe5j86XpiTuiLL9dvS5SYrxDfv3c7SKbm6Tn7dv6IH9WgnQRt4xaXuMu0/NnVc6Y75RR09zl+pQx/meUeWqZ26Wnw/8ivEBlfcfnl8sxZxXI8V8YJSecPEI+/uXzZVhelcoQx8lwJcWTCiqkyLIsIYT4gaxsGl297RWZ/Z/zZO7SBYr5SopNjhhveXShPPaHeSJv3yry1iyVGc5Wr3Ni4Pe+9Kj5yM6OmIfpaSK6DkfS85YSYvu22BYlF1u3bM2YCE/LL3WElzO2XnLzKuXY08+REz5/khx/yig56uxCGZZfLyPGVlGEhBDfkpUi3LZnqyxds0yWrV2ueFweX7tM8bg8unq5rH7rcZGdih3gCZHty6M4f/
elR5XrsUzvnZcGbHXLjmVyaOdy6Tm4N7ZFyUWmRJhXGJFj86qVCOuV8Ookp6DWaRYdfm6JfGrUqXLi506ST596ppxXUGhdnhBC/EBWijDskSkRnpxfIbmOBGsVNQolRUXu2DoZpqR49Fl5ctJfnSqfH3Gy5BcUWOsghJChxn8i5C3WUo5MiPCswlI5GhIsGOdIMBcCjIlwWEG15ORVycjxl8i5RaUyatQoKSxkVkgI8ScUYRbGYIsQA2ROKKhSwlOZXywb1BLMgQTzq+ToohoZO/liOXnECBk9Zoy1HkII8QNsGs3CGGwRnja2VD5WoASI/kGdBRrkjKmUsydfLqeffZ6cfvqXrHUQQohfoAizMAZThKMxQCa/MtoPeIQEa5wm0RPKJ0leTaOc9PmTZOzYsdZ6CCHEL/hKhDMowrTEYImwqDgiXygod64NhPhyIUTFsPyqXhHiQvr8KdPllFGjpKpxslTUDG5fJSGEpIr/RMg+wpRjsER4ZmGZHFVgjA51iTBnTJX8Vf2l8jf5F8hZo/Pk2kWLpXpcvbUuQgjxC4EdLIOHy9qet4fn8GU72E48fDdeDIYIL1DZ4PEFuFzisAijAoxJUHFMYYOMbbxQThoxQi69/mYZP/Uia12EEOInAivC9vZ22bFjh+zcubMPmBYGuru7Y3viyBgMEf51fol8zMgG+1LlDJA5Z9LX5LQzzpSSxiky7bobpTjCe4wSQvyPz0R4TcIiREaEsmEFD9+NF85jmNIowtEXFMsxY1TmF+sbdIMBMsdXTJL86nEy8q9Pk4l/e4WUV2fu6ReEEJIKPhRhZ+x0zhhopFOERUXF8vkxGCCDi+f73kFGk5tfJ6MnT5dTTj1Vzh+Tb62HEEL8CkWYhZFOEZ4zNiIfw/WCzsXz+gL6wxLEAJmR4y+Wr5RVySkjR6ll2BxKyGDx9ttvk36w7bf+oAizMNLVR1ioOCG/XHLH4jZqrmZR3Fkmr0qOVnK8cGaTXD6rSWqnXmKthxCSHgZ6og8LFCGjN9I5WOZzY4qj0sNAGddgmZzRFXLuxVfJ9Yt/Jt/554fk4m9dZ62DEJIeKEJvKEJGb6SraTS/OCKfvKBShp1fJMNGl0iuc0/RaPNoTl6NfDrSIFcsWCzf+cl9cs3CxVJVx2sGCRlMKEJvQidCjBrt7OwMLZkYNToiUiW5FY1KhCorPDvfEWJOXoUC2WCllH/rJkeC//CT+2XS5Vda6yAkTOQ33GSdni4oQm9CJ8J41xGGBc/rCNMgwnMjZXJ0+XgZVtkguQVKfueMUeRJztkFckx+hXx58mUy4/s/le/cdZ9cdvM8KSkts9ZDSJgYfdFPpaDxFuu8dEARehM6EUIELQdawklLi3dGmGIfYWFxRD5bWhvNBssbZFhRjZJggUJlhX8zRuq/daPc8JMH5B9/fK9cveAHUj2hwVoPIWEDIrzgsgcHTYYUoTehEyEjfqQqwi9FymV42XjJUTgiLK2XnPMKJeesMfLFuq/KN5t+KDMW/kimXfsdqeCF84T0AhHWzfzvQZMhRegNRcjojVSaRvOKS+S48noZVgEJRskpHyc5oyPyia8US/mUSx35lVZUSoS3UCOkD1qEgyVDitAbipDRG6mIcGRxpQx3mkSVDB2UDCvqJbewUk69oJT3DyXEA1OEgyFDitAbipDRGwMV4bklZXJM+QTJRXOozgYrJjjZ4SdLa2RspMS6HCEkiluE6ZYhRegNRcjojYH0EeIxSyeWjpPcSkhQya+XBhmupp1eUm5djhByGJsI0ylDitCb0IkQoyZxLSFew4hXDCQjPC1SIcO1/JAFxjLC3IqJcmJZrSNK23KEkMPEEyFIhwwpQm9CJ0JcVL57927Zs2dPKEnndYR5SnLHltQ51wxqAer+wY8rKZ4X4TWChCSClwhBqjKkCL0JpQj37t0bSvbt25dWEY5ENljVqDLBviJEM+nIkirrMn6iYNw3JX/qHYQMOf2JEKQiw2RP9Ndee6
1z8xFbvPvuu7JixQqZOnWqddkgEjoRMuJHMn2E5xVF5Kiy2lhz6GFylASPLR0n+QEYIAMRFl3xiFRet4IMMe2d0S9oi5ast85PBtSBQJ22+X7FJj83A5Vhsif6u+66y9mHCIhP89FHH8WmiuzatWvIZahl3dTUZJ2fKBQhozcSFSEes3R8UaUzKjSaBcZEqDLDXDXtzIAMkIEIS65+1HrCCTK7Pzr8WXh/T5u1jI2b7nld9rd0xZYUWf7idmu5wUDHvU9stc5PBtShwzY/6AxEhqmI0DZPCwhydM/PJDqwTrb5iUIRMnoj0abRLxWXR+8a49xB5nA2iAEyny2pdURpW85vZKsI3ZGo0Db+b3NsiWj88Y191nKDgQ6KMDGSlWE6RQiWLl0amytOM6qtTCbQQREqHBF2dcR2CWOgkYgI84uK5ZMRNIk2SI6ZDSo+VlIv5xeXWpfzI9kuQmSD+tVWzuTyO9dIR+chp7zOCilCf5OMDNMtQnO+W0JoLl21apXTdKoDGeTmzZv7FRYEay6Hptg1a9Yc0QTrFfg/ZtlEYEbI6I1ERHhKUWX0mkGjb9C5kL6iUUZGKq3L+JVsFyEyQR1o9rSV1Ty7drdTDlmhFihF6H8SlWGmRAhhmSLTfYvmwBvIzqxLs2HDhliJw8vpgBDNzNM9H/9TT4OEzXoTIUtEOEM6KMKUo78+wnNVtndMKZpAcSu1wyJEdviJklonW7Qt51eyXYQQwtbtLc7vr/35L9ayGp0FYpn+RAipQphmfyJ+xzQv4erlzMwTAsY8HfFECKmbfZ9YFtuETNZdNkwiBInIMN0ihGwQEJw5XcsJYjLFBUGaonM3p+r6bMshw0OgbnMZoKO/TLM/QifCrq4u2b9/fyhp3t/s3EwgXniJEP1+ny2pUdmgS4JloF6+XBy8awbDIEKdFUI+trLgkaejJy/IBX97ifC7D2/qFRleUVaXR2AayiSynP7b7Ju0idCc7/5/WGe3fMMmQtCfDNMpQlw6ocPMvsxl4vUb6lGnkKKeBtnpjNEmNMzX4a5XB0WoiIowsT7Cjo4O2b59u/Nw3rCxc9fAH8x7RkmFDK+I3Uy7V4T1atp4OTFSGZgBMiZhECH+1sKJN2hGZ406O9OicYvQ7EeEnMxsDDLSGZsWqgmmIVDGXA7/wwy3CPV8LGcKD3Xo9cb6msuEUYTAS4apiFA3OQKzidOUGUBfHsKrj05nfhCinrZkyZIjprnRza1u4emgCBXJiBC3GUNWGFa8Ip4I84tL5NNl42RYZd/LJXKVCI9SnBcJzgAZk7CIEE2ICLcwAOSCgOC0oOKJUEvJJjqg60KY0tUZJ8KdvQFThqYITfG6Bann6zDrDasIQTwZpiJCM7wGvehmUa8+OrNePc1sZjWla6IF7K5bB0WoYB9hesImwiLFKaU1klvVKDkq+8NTJTQYMHOyM0AmmPcTDYsITUG5RaQliQxPT4snQp3x6czRhl7WrE+LziZijQ5TeO4mWxt6nczlwixCkHfJ3ZLfcGOfYz0VEdrm24CwEAMVYSLhFp4OilBBEaYnbH2EuBzi6JL6PgJ0cO4gUxeIO8jEIywiBFpQ5qAZM+My+/XiiVCHWa8bm/Ti1Weiw6xb14V1RB029PqbdYdZhMgI8yfOPuJYD4IIsbxZNhF0UIQKijA94c4ICyMR+VykWnJxzaCSn9MsGssKh1U2yuklwbpcwk2YRGgOmtFNoHoasipdDkAwCL+IMJEwlwurCONJEGRChHpUaCJ9hOjz09P0xflo/jTLJoIOilBBEaYn3CI8o7hchldMdK4RRDNoblm9Ao9YapTPlAb/EUthEiHQ2ZPuv9PNiu5BNH4ToVlXIoRRhF4SBJkQoXm3GYz0tJXRo0bNrBEjQXUkKzQdFKGCIkxPmCLMV5L7VMk4JcBJklMevWQCIsTrUUqG52bBI5bCJkJ9GQIEqGUBOZplQDxx6ZGfifQRmk2wXoN1NDrMdTazWLNsf4RNhP
1JEGRChJCfFh0yPlOG+F1njMj83KI059mkBsliVKp7uh5NapuXDFkiwsTvLIPLB7CzcRlFGPF6OK8pwlOQDeKOMbERoprhuINMSfUR70EQCZsI0Q+oQ2eDprA08URoitScrjEH5diEhtDNsia6XoS5XLz6+iNMIkxEgiATIgR4CoQe4Ylwj/rEq+1JERCjlhoCQtXL6rBdXmEOtMHyYCB9jaETId6IDz74wHqdXbazc2f/1xHW1dbIeYXFcjSywbIGyVXyA1ERNsgnSuukIEueOh82EQItQB3uUaQgnghNkZqjQgHq0XW7Mz9zUI57OVOCCPc66/lY3rY9kKxb5mERYaISBMme6CErBM6XtvleoKkTGZ7ODhH43XbPUDe4WN+UHwJ/ez3/EDI0RRvvFm5eZI8IOxPPCFtaWqS1tTV0tLW1ed5Z5p133pHqmlr5TGFlVHzoD3REiMywUWWIE+Q0lSna3oMgEkYRmtlZvKbKeCIE5vKQE8rq8gjI0Jb1mcuhiRXLmHLUdbjXGXWZ8tbLmv/TfXlFGESYjATBQE/0YSFrRNh58GDs0GcMNDarg+H8ylrJLRkvOU4GaNxFprJRjivBHWSyIxsE2SpCLRjbrc7M7MwmSqDv2GITIcByKKPrQUBWXn2HQC+nA8vgWkHM09Nt6wxQtyk/BP7GdLd4deaK9TOnZwvJShBQhN5QhIzeWLdxo5xYXu88XglZYG5ZtFkUYIDM2QF6xFIiQIR8Qj3xCzbpuRmIBAFF6E2WiHCGHOyiCFONG7+3WIYVVTnXCQ4va1REJYgm0VGRKuu+DzIQYf7UOwgZckZf9FOr+EwGKkFAEXpDETKceGH1a3Ic7iBTGr1gvneATOUEORaPWMqiJlFC/EZ/IkxFgoAi9IYiZEhza6vUXfX3klNY64wUxejQ6H1F0URaL2dEsmeADCF+xEuEqUoQUITehE6EuI4OIyfDihm4onDj/22Tpn99SI6D+CLjJLdsopMJ5lSo7FBlg8eV1GTVABlC/Eg8EaZDgoAi9CZ0IsRF5bjocvfu3aFjz549fa4j7FZifPiZp+Vr3/+RTJh7u5x26ZUyvFRJUIHLJY4qq5dzAvqIJUKChE2E6ZIgoAi9CZ0IOzs7Ze/evbJv377Q8eGHHx6RFT67do1MnX+7fO17i2W6EmLFjbPks40XybDSBhmRJXeQIcTvuEWYTgkCitCb0ImQcTi6VHbY9NASGT+nSSYrGU678wdymZLhxXf+SPKv+rYUlAb/fqKEBAFThOmWIKAIvaEIQxwbtm6VKfMXSGPTIpl42yKZfNvtcuHC78r0H/5YLp01RyIlwX3WICFBQotwMCQIKEJvKMIQxz89+phMmN0kk+YrESoZTmq6XbFILlFC/OqV37Lua0JI+oEIB0uCgCL0hiIMaXywZ6/87Z2LZeI8JT0lwK8qGTY2LZSLbm2Sy2cvkK/+3dXWfU0IST8Q4WBJEFCE3lCEIY3/ePZ5aVDZ4GSVAUKEE5sWyDd/+CO5bPY8mX7TXKmd5H2XeEJI+shvuMk6PV3gRE+8se23/qAIAxwH2lrlmrvulgnzFsik+QulcW6T/OOP75aHH10u02cqCU6+0LqfCSGEHCawIuzq6nKejRVWEM+9uk4a5syXifNvl/EqE7zq+4vlySdXyGPLlkv9lEus+5gQQkhfAitCXFAf76G12c7uXbukXW3/rPsflAlzF0jDbYtk+qLvyS+f+LU8//xz8tvf/k6qq7Lv5tqEEDIYBFaEuMUassKwsm3vPrnhvgeUCOfLJfPvkCW/ekyeefpZeeONjbJp4yapqeZF9IQQkgjsIwxgfNjWIpv27ZP/2bFTHljxtNzzi1/KM089La+8slq6uw/JO1vfkdraWus+JoQQ0heKMIDx/l/2y/IXX5L1738gW3bslv/6w/PywgurpLUt+uTvrVu3UoSEEJIgPhMhnlDf6ZzMGfbAHUY3bt8p3170A3lx0xZ55fXX5anfPyUbXt
8QLaCCIiSEkMTxoQiZEXpF56Ee+fWf1sqLb74lL699VX634vfy8h9flubmA7ESFCEh2cS9994rK1eudF5nzJhhLUNSgyIMWLy3d588ufpVefPtt+V3T/5O1q1f54ygNYMiJCR7mDlzpvqi29wHiHHx4sVy2WWXWZchyeG7PkKK0Du65JD66ZE333xD3tz01hGPY0JQhIRkD/gsu0VogsvGfv7zn/MznwIU4SDG+7v+Ilff8YRc2rRMvjb/8YGz4HGZBhY+LpcueEKu/t5yeW/P/th/OTIoQkKyi82bN1slCNavXy9TpkyxLkcSgyIcxHh/14dyxW3/LlNuflim3vKIXGiAv5OZNmVWlMm3PCx/p+rc+WFr7L8cGRQhIcEFn100h6JP8MUXX+y9WYhNgswE0wP7CAMbPTGODIqQkGCAzykGwKC/D9JDdgfpmYNjUGbBggV9BIgyWMZWJ0keH4qQl0+kGhQhIf4Dn0kMboHAkMlBemjyhADxN7LAeINfMN2UIMraypGB4TsRHmRGmHJQhIQMPZCXbuLU0sMrpKdHfCb6OUU5SBDSZH9g+vGVCK+86hvMCNMQFCEhmQWfN3e/HqSnmziTkV480DzKz/XgkHERjh8/XmbPni23zr5VgdfZzt+zZs2Sn91zj3NNXLflJtNhp7u723nFzcb7C4qQkMEDny1bvx5+N/v1bMsSf5JREUYiEfnNb59UouuWjrZ2Rat0tLZKe2uLtLWAA3Kgeb+Dbg8nUfbv3y8tah9RhIRkDnyOkM3F69dDlob5tmVJcMioCG+66SZ1Uj8gbW2d6qSupKdO7KD5gPrdoVlJsFla9ttlEHaQFSYSFCEhAyNev95jjz2WdL8eCQ4ZE2FdXZ2sW7dODh7sVMJT8nOf6A+oLBC4pxOH9vb2mOb6D4qQkP7BZwTNmGa/HsSXzn49EgwyJsLrr79eWlUW2GKTIPEE2bLtVmrxYsuWLfwAE2KgpYesDqJjvx4xyZgI7733HulS2WCLOqnbTvYkPm2x5wwmGhQhCTM49tmvR5IhIyIsKSmR3/zmt9J9sMvJbmwnexKfzs7kLinZtm2b0xRtey8IyTbi9euZT2jgF0PiRUZEWF1dLRs2vCkdHV3OwBjbyZ7EB5dNJBOvvfaaVFRUWN8LQoKMbuI0+/UA+/VIKmRMhK9vfEtanYzQfrIn8UlWhI88/Ij1fSAkSGjpIavDqE1bvx7vskLSQUZEWFpWJiv/8Jx0HuymCAdAMiLsPnRIrr/hBuv7QIhf0f166L9jvx7JNBkbLPOzf/kXOYTBMs1sGk2WZPoIn3v+eTaLEt+DTM7s10Omx349MlRkTITfvu46aW5pce6OYjvZk/gkOmr03ffek0unTbPuf0KGiv769XjpAhlqMiZC9BOuXrtaDnZ1OhfOO3eRce4mM3CcgTeuaX5goOvltVx/d5XBHXtuueUW674nJFN49euhiRPz2MRJ/EbGRAhunXOrHGhtlvaOVmlrb3EynVRobY9imzeUDHS9vJbzejzVnzdvkWtm/L11nxMyWOh+PTRxmv16eDX79ZjtEb+TURHiesLrb7hemppuk3nz5saYlxJzY9jmDRV6nZJdr/6Wmz9/vnNy0SxcuNCZPnHSJOv+JiSdQGjs1yPZSFIifGLpUkKs2I4XElx0E6fu10Omx349kq0kLULbdBJueFwEGy099uuRsEIRkpThcREsvPr12MRJwghFSFKGx4V/gdB0v56+dEH362EapUcIRUjSAI8Lf2A2cbJfj5DEoQhJyvC4yDwQGrI5s1/PvCUZskD26xGSGBQhSRkeF4MP+/UIGTwoQpIyPC7SC4TGfj1CMgdFSFKGx8XAgdDMfj0IT1+6wH49QjIDRUhShsdFYkBoul/PbOLU/Xp81BAhQwNFSFKGx4Ud3a+HzM7s18PgFj/361177bXS3t7u3Md2zZo11jKEZBMUIUkZHhf2fj2IL4j9ehs2bHAkiFjK95aEAIqQpEzYjg
sILVv79e66666YAkU++ugjaxlCsg2KMOAsWbJEVq1aJU1NTdb5mSCbjwsILUz9eu+++25Mg8wGSXjIqAh37drlfMDQ/+D1IcNJXZezzc82zD4Zd2CfoZ9m6tSp1mV14ORsm58JskmE8fr10MTp5369dMBskISVjIrQDJz4453czQ+kbX62YW4vvpFr9BcHRLz9pcsgK3TPyxRBFaFu4jT79UC2Xa+HL1qpBgfNkGxmSESIb5uIeB+uMIvQPQ/Zsc4WMYjBPd8PBEGEWnrI6iA6W7/elClTrMsGHfP4Gmjgi5mtbkKygSERoTkqDd9W3eUowr7gCwPCr81VfhNhvH49vJr9etnaxOkmHRmhX7+EEZIOhkSEOPHrrNDWt+UlBpRHrFix4oh5XvPxt56OJkY9KADZFgac6HJoYtTrhkB98Qai4ASDE4TZv4fyZn2J0J8IsU463PP62x/oi9XbqgN/p3MgxFCL0N2vh0wvLP166cAUJfsGSRgZMhFCFjrwt1nOSwz6pB6vTyzefC0TZFem6BC6rJYKAvWYg3vMuoDZZIn6UN6sN9762ehPhDojxPq553ntDzPzxra41zFd3/ITOS50lmablwy6iTNev16QL10YKszjPtkvcYRkA0MmQvytT+J4NcsNpgh16AwKQsM3YnOkqjkoBdMhEf03wHwtQXcmZq673s7+8NpezNP/y1Zff9uLZd3L6ewYkY7MsL/jAn1vkBZkZZsfDy09ZHX6UUO6Xw9NnJiXDrmGGfPYc38OCQkLQypCLR+EeUI2P5x6miYdIrQ1I2o5JHIy0HXFK6szuEQzLvfJSGOOGo33TR3lEOb2mqKOJzqdLaajKczruICsIK/m5mbn1VYG6IwR/Xfs18sc+vhBuL8wERIWhlSEwHZCHkwRurM7jf6ftgzKjRZUPMnoZl+si22+G3N74wXWyyZD2/bq/49lzLIm5v+0DVhKhnjHBTI5LUGNFhmyxHj9ephG6Q0+OH512JrdCQkLQy5CnIR19qJP5uZJWpfTpCpCLznpZRE4McTLwnTofjc3ZiZnW96N1/Yiazb7cMx9B/D/EOb2JrKtQIe7zmRxHxcQGJoyTQFqtPQA+/UygzkYZqAR71InQrKBIRch0CduCBHNekMlQvxvlNNiRiBTdQsx0ehPRBqv7dXo7XLXadveRLYV6EinCJHJQXY2CQI0c7JfL7OYx9dAI9FjmZAg4gsRQkB6NCNO4kMlQg3WB81G5ghLc511pCoQjdf2avT6I8zptu1NdFt1pLod+rhAVuduCnWT7IAZkjrpyAjTNcKYED/iCxECs7/C/N1dLhMiNNH1mX0oWpC2QTcDId0i1Psv0T5CiN9WJlHcx4X5OCK3CCFKsyzJPObnK9nPAyHZiG9ECHTfmpmJucvo/jLbN1Tz5J4uEdqW0wN80nUSSUSE+n+6B/vYRGhmAPH6OdO5Df0dF+agGMgwW29lFhTMzxf6oG1lCAkTvhKhKQQd7jLmt1nzQ4wTvtm3l6wIIVgMCDCzI/yu5Wxmf6ZoIBR3RoX5qCvedrrpT4Re1/3ZRAj0FwbsE/fJzqwv0XX0ItnjggNjhg7z82P7MklIGPGVCIE+seuwlTFHZaK8/huv+hq+ZEWoMyQEygAtVtTrlp15QkHoZcxv225pxcMUoa7HVp/txIUyCPf2mhJH4Hd3fYmuX3+kelyQzGG+//jCZitDSNjIqAi1WLyaY/Dh1B9WnLhtZXCShxR0fSivRaCF5+6/01mQ2dfnBmVMeeh63RLUYDuwHubJBb9jWjKZFrbZzGbNwHSsc7z6dObn3l6A9cb6m9uE+rB+6WwSowiDAd5zHbwcgpDDZFSEJDvhcUEICTIUIUkZHheEkCBDEZKU4XFBCAkyFCFJGR4XhJAgQxGSlOFxQQgJMhQhSRkeF4SQIEMRkpThcUEICTIUIUkZHheEkCBDEZKU4XFBCAkyFCFJGR4XhJAgQxGSlOFxQQgJMhQhSRkeF4SQIE
MRkpThcUEICTJJi5AQG7bjhRBCgkBSIiSEEEKyDYqQEEJIqKEICSGEhBqKkBBCSKihCAkhhIQaipAQQkiooQgJIYSEGoqQEEJIiCmW/wdoFXGeatUj0QAAAABJRU5ErkJggg==)\n",
"\n",
"> What is the proportion of bios that were predicted incorrectly as `poet` even though their true label is NOT `poet` (i.e., something else)?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HGJbzH2YmV-O",
"outputId": "7ece5856-ad58-4247-ed90-a4033f895b7b"
},
"outputs": [],
"source": [
"actual_not_poets_test_df = test_df[test_df['occupation'] != 'poet']\n",
"print('Poet - False Positive Rate', (actual_not_poets_test_df['prediction'] == 'poet').mean())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rD6NPUqsmV-P"
},
"source": [
"### Fancy Way to Summarize\n",
"You can skip this part if you understand the definitions of the three metrics.\n",
"\n",
"#### Confusion Matrix\n",
"\n",
"Actual class (rows) / Predicted class (columns) | P | N\n",
"-----------------------------|---|--------------\n",
"P | **TP** | FN\n",
"N | FP | **TN**\n",
"\n",
"\n",
"#### **Metric Definitions**\n",
"\n",
"\n",
"<u>**Acceptance Rate (Positive Rate)**</u>\n",
"\n",
"What is the proportion of bios that were predicted as occupation x?\n",
"\n",
"${\\displaystyle \\mathrm {AR} = {\\frac{\\mathrm {TP + FP}}{\\mathrm {TP+FN+FP+TN}}}}$\n",
"\n",
"\n",
"<u>**False Negative Rate (FNR)**</u>\n",
"\n",
"What is the proportion of bios that were predicted incorrectly as NOT occupation x (i.e., as some y != x) even though their true label is actually x?\n",
"\n",
"${\\displaystyle \\mathrm {FNR} = {\\frac {\\mathrm {FN} }{\\mathrm {FN} +\\mathrm {TP} }}}$\n",
"\n",
"<u>**False Positive Rate (FPR)**</u>\n",
"\n",
"What is the proportion of bios that were predicted incorrectly as occupation x even though their true label is NOT x (i.e., it is some y != x)?\n",
"\n",
"${\\displaystyle \\mathrm {FPR} = {\\frac {\\mathrm {FP} }{\\mathrm {FP} +\\mathrm {TN} }}}$"
]
},
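{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Worked Example (Hypothetical Counts)\n",
"\n",
"As a sanity check of the definitions above, here is a small worked example. The counts are made up for illustration and are **not** taken from the dataset: suppose that for some occupation x the test set yields TP = 40, FN = 10, FP = 20, and TN = 130 (200 bios in total).\n",
"\n",
"${\\displaystyle \\mathrm {AR} = {\\frac {40+20}{200}} = 0.30}$\n",
"\n",
"${\\displaystyle \\mathrm {FNR} = {\\frac {10}{10+40}} = 0.20}$\n",
"\n",
"${\\displaystyle \\mathrm {FPR} = {\\frac {20}{20+130}} \\approx 0.13}$\n",
"\n",
"Note that AR is computed over all bios, whereas FNR is computed only over bios whose true label is x, and FPR only over bios whose true label is not x."
]
},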
{
"cell_type": "markdown",
"metadata": {
"id": "dqHurjkrOnSP"
},
"source": [
"## 4. Unfairness Metric Results by Gender\n",
"\n",
"Thanks to the additional column `gender`, we can compare the performance of the model separately on the female individuals and the male individuals in the test dataset."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9oLs9dkXOnSP"
},
"source": [
"We can compute the following evaluation metrics for each gender subset of the test dataset:\n",
"\n",
"1. `ar` - **Acceptance Rate** - used for Demographic Parity\n",
"2. `fnr` - **False Negative Rate** - used for Equalized Odds\n",
"3. `fpr` - **False Positive Rate** - used for Equalized Odds"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xD3Rq1-_1mPq"
},
"source": [
"We can take a quick look at the first few occupations with all metrics:\n",
"\n",
"(have a <u>quick look and continue</u> to the plot below)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 390
},
"id": "_9dNk1qg1mYp",
"outputId": "8aa96794-c43b-4a3c-b5cb-a9d4f3727cce"
},
"outputs": [],
"source": [
"unfairness_metrics_df = create_unfairness_metrics_df(test_df)\n",
"unfairness_metrics_df.round(2).head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oEZRZ3z9Ph6E"
},
"source": [
"It is a bit difficult to analyze this table, so let's create a visualization. The following plot shows the value of each metric for females (blue) and males (orange) across occupations. The occupations are ordered according to the Acceptance Rate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 994
},
"id": "taXRXTYr9S_Z",
"outputId": "6586e988-c63b-4cb4-962f-66bab3d04093"
},
"outputs": [],
"source": [
"plt_unfairness_metrics(test_df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "x-LqLoXE1GqB"
},
"source": [
"**For reference only**, we also calculate the **gap** (**difference**) between females and males for each of the three metrics. A positive value means that the metric is larger for females, and a negative value means that it is larger for males.\n",
"\n",
"The visualization above should be sufficient for your analysis, but we include this table in case you would like to find out the exact difference for a specific occupation/metric."
]
},
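{
"cell_type": "markdown",
"metadata": {},
"source": [
"Under the sign convention described above, and assuming the gap is a plain difference, the gap for a metric $m$ (one of AR, FNR, FPR) can be written as\n",
"\n",
"${\\displaystyle \\mathrm {gap}_{m} = m_{\\mathrm {female}} - m_{\\mathrm {male}}}$\n",
"\n",
"For example, with made-up values (not taken from the dataset), if an occupation has $\\mathrm {FNR}_{\\mathrm {female}} = 0.10$ and $\\mathrm {FNR}_{\\mathrm {male}} = 0.25$, then\n",
"\n",
"${\\displaystyle \\mathrm {gap}_{\\mathrm {FNR}} = 0.10 - 0.25 = -0.15}$\n",
"\n",
"so the FNR is larger for males by 15 percentage points."
]
},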
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 961
},
"id": "xbsHW-Ct1G4y",
"outputId": "b5859fae-ad71-4fb7-b55b-9daffbb5bab8"
},
"outputs": [],
"source": [
"gap_unfairness_metrics_df = unfairness_metrics_df_gap(unfairness_metrics_df).reset_index()\n",
"gap_unfairness_metrics_df"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YriJqk8lWobe"
},
"source": [
"## Unfairness Metrics\n",
"\n",
"1. Independence\n",
"\n",
"**AR**\n",
"\n",
"> What is the proportion of bios that were predicted as `physician`?\n",
"\n",
"2. Separation (Errors)\n",
"\n",
"**FNR**\n",
"\n",
"> What is the proportion of bios that were predicted incorrectly as NOT `physician` (i.e., something else) even though their true label is actually `physician`?\n",
"\n",
"**FPR**\n",
"\n",
"> What is the proportion of bios that were predicted incorrectly as `nurse` even though their true label is NOT `nurse` (i.e., something else)?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xlJWj_1TWqtP"
},
"source": [
"## What is the cause of the bias?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 801
},
"id": "GpF_kJoq7zd4",
"outputId": "16ef603a-a74e-48ce-9928-733d3e1f3099"
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.rcParams[\"figure.figsize\"] = (10, 10)\n",
"\n",
"gap_unfairness_metrics_df.sort_values(by='fnr_gap').plot.barh(x='occupation', y='fnr_gap')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dWHexocK8q9-"
},
"source": [
"## 5. Unfairness Metrics vs. Training Dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 460
},
"id": "TzWNvjlu83dk",
"outputId": "dd1c0c72-851f-4952-bc2d-ecaf6bce3381"
},
"outputs": [],
"source": [
"# add the female proportion of each occupation in the training dataset\n",
"# to our dataframe (assumes both frames list the occupations in the same order)\n",
"female_proportion = (train_df.groupby('occupation')['gender']\n",
" .value_counts(normalize=True)\n",
" [:, 'F']\n",
" .reset_index(drop=True)\n",
" * 100).round(2)\n",
"gap_unfairness_metrics_df['female_proportion'] = female_proportion\n",
"\n",
"plot_labeled_regression('female_proportion', 'fnr_gap', 'occupation', gap_unfairness_metrics_df);\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pcsJcc4jqHP2"
},
"source": [
"**A 10% difference in the female proportion of an occupation in the training dataset translates into a 2% difference in the FNR gap**."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0OM2Utp599SL"
},
"source": [
"## 6. Fairness through unawareness"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rS3aYIVM85Bj"
},
"outputs": [],
"source": [
"!wget http://stash.responsibly.ai/3-fairness/activity/data-revisited.zip -O data-revisited.zip -q\n",
"!unzip -oq data-revisited.zip\n",
"\n",
"unawareness_train_df = pd.read_csv('./data-revisited/train_unawarness.csv')\n",
"unawareness_test_df = pd.read_csv('./data-revisited/test_unawarness.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "mGyz8nvx-D-0",
"outputId": "1068361f-f58a-4634-b20a-4b581404f8fc"
},
"outputs": [],
"source": [
"sampled(unawareness_train_df, [\"occupation\", \"gender\"], bio='scrubbing')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "t5ttpCOQ0Tgs"
},
"outputs": [],
"source": [
"unawareness_count_vect, unawareness_model = train(unawareness_train_df, X='scrubbing')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "kMnNV49R3Oby",
"outputId": "c036e11d-43dd-466e-d1f1-170ff90fced4"
},
"outputs": [],
"source": [
"print('Train Accuracy =', accuracy_score(train_df, model, count_vect))\n",
"print('Test Accuracy =', accuracy_score(test_df, model, count_vect))\n",
"\n",
"print('Unawareness Train Accuracy =', accuracy_score(unawareness_train_df, unawareness_model, unawareness_count_vect, 'scrubbing'))\n",
"print('Unawareness Test Accuracy =', accuracy_score(unawareness_test_df, unawareness_model, unawareness_count_vect, 'scrubbing'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "CRcEngXf6Gr9",
"outputId": "35df9285-2d8e-4339-9a97-7405f7d900a6"
},
"outputs": [],
"source": [
"unawareness_train_df['prediction'] = predict(unawareness_train_df['scrubbing'], unawareness_model, unawareness_count_vect)\n",
"unawareness_test_df['prediction'] = predict(unawareness_test_df['scrubbing'], unawareness_model, unawareness_count_vect)\n",
"\n",
"sampled(unawareness_train_df, [\"occupation\", 'prediction', \"gender\"], bio='scrubbing')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 994
},
"id": "N_7iDpJP8e5Q",
"outputId": "0ab91eda-a8ea-4609-880d-f9248ff9a40b"
},
"outputs": [],
"source": [
"plt_unfairness_metrics(unawareness_test_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "l3H1eYvU9A2S"
},
"outputs": [],
"source": [
"# plt_unfairness_metrics(test_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WD4oCSPU9T40"
},
"outputs": [],
"source": [
"# unfairness_metrics_df = unfairness_metrics_df(test_df)\n",
"unfairness_unawareness_metrics_df = create_unfairness_metrics_df(unawareness_test_df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "arLColN5Bfl_"
},
"source": [
"### Predicting gender from the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cxcCW32nBiDp"
},
"outputs": [],
"source": [
"gen_count_vect, gen_model = train(train_df, X='bio', y='gender')\n",
"gen_unawareness_count_vect, gen_unawareness_model = train(unawareness_train_df, X='scrubbing', y='gender')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pHXQVke6CIZK",
"outputId": "8c1b5337-1439-4284-fa53-7ee3e93f53d9"
},
"outputs": [],
"source": [
"print('WITH gender markers Train Accuracy =', round(accuracy_score(train_df, gen_model, gen_count_vect, X='bio', y='gender'), 4))\n",
"print('WITH gender markers Test Accuracy =', round(accuracy_score(test_df, gen_model, gen_count_vect, X='bio', y='gender'), 4))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "W2QSJBdpeLWP",
"outputId": "0ac36d3d-d0d0-4bc6-f087-c543a29f8ddb"
},
"outputs": [],
"source": [
"\n",
"print('WITHOUT gender markers Train Accuracy =', round(accuracy_score(unawareness_train_df, gen_unawareness_model, gen_unawareness_count_vect, X='scrubbing', y='gender'),3))\n",
"print('WITHOUT gender markers Test Accuracy =', round(accuracy_score(unawareness_test_df, gen_unawareness_model, gen_unawareness_count_vect, X='scrubbing', y='gender'),3))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UBYVLu5Ihapo"
},
"source": [
"## 7. Counterfactual"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "wkfXeoTjDOq2"
},
"outputs": [],
"source": [
"test_df_CF = pd.read_csv('./data-revisited/test_counterfactual.csv')\n",
"CF = test_df_CF.loc[[53044,70661]]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 111
},
"id": "3jfNF9w3mxvw",
"outputId": "61faa659-abed-4767-d892-7656aeb278e1"
},
"outputs": [],
"source": [
"CF.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 526
},
"id": "9Q5Xz1wIh1QO",
"outputId": "7222bcf4-915e-48ba-9daa-306795b826c2"
},
"outputs": [],
"source": [
"sampled(CF, [\"occupation\", \"gender\"], samples=2, Counterfactual=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9vEJPybNcDl5"
},
"source": [
"## 8. Technical Fairness Intervention\n",
"\n",
"1. Pre-processing: Training data.\n",
"\n",
"2. In-training: Learning algorithm.\n",
"\n",
"3. Post-processing: Model."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-2QIv2ficf12"
},
"source": [
"## 9. Fairness is a complex concept; think about metrics as \"flags of unfairness\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5dq-DfKDcqpz"
},
"source": [
"## 10. Abstraction Error: neither the data nor the model is fixed - a computational system is built through many decisions, and it is also deployed in a context"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"provenance": []
},
"kernel_info": {
"name": "python3"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
},
"nteract": {
"version": "0.12.3"
},
"vscode": {
"interpreter": {
"hash": "55bbdba5d2159c30191d9b81156a2ec7ece345201aa1fcd9b85bbc484276dddb"
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}