@vingkan
Last active July 5, 2021 17:53
Ethical CS: Quantitative Input Influence Activity

Algorithmic Audit: QII

A big moving company gets so many applications that it has started using an automated algorithm to decide who to hire. You have been called in as an independent consultant to determine if the hiring algorithm is biased against women. The algorithm is proprietary so you cannot access its source code. Instead, you will learn how to perform an algorithmic audit to measure potential biases.

In this activity, you will edit the influence.py module.

Applicant Data

Each applicant's data is stored as a list with five elements. Each element is a string representing a different attribute:

  1. Age
  2. Gender
  3. Weight Lifting Ability
  4. Marital Status
  5. Education Level

An example applicant's data might look like this:

['adult', 'female', 'excellent', 'married', 'primary']

You can see the possible values for each attribute in a global variable called APPLICANT_VALUES, which is a 2D list: a list of lists, where each sublist contains all of the possible values for the corresponding attribute.
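As a minimal sketch of how the attribute positions map onto list indices: the constant names (AGE, GENDER, and so on) and the is_valid helper below are illustrative assumptions, not names from influence.py. The values mirror the APPLICANT_VALUES list in the helper code.

```python
# Hypothetical named indices; influence.py itself uses bare integers.
AGE, GENDER, LIFTING, MARITAL, EDUCATION = range(5)

APPLICANT_VALUES = [
    ['youth', 'adult', 'elderly'],
    ['male', 'female'],
    ['weak', 'average', 'excellent'],
    ['single', 'married'],
    ['primary', 'secondary', 'tertiary'],
]

applicant = ['adult', 'female', 'excellent', 'married', 'primary']


def is_valid(person):
    '''Check that every attribute holds one of its allowed values (hypothetical helper).'''
    return len(person) == len(APPLICANT_VALUES) and all(
        value in choices for value, choices in zip(person, APPLICANT_VALUES)
    )


# Python lists are zero-indexed, so Gender (second in the list above) is index 1.
print(applicant[GENDER])            # female
print(APPLICANT_VALUES[EDUCATION])  # ['primary', 'secondary', 'tertiary']
print(is_valid(applicant))          # True
```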

Hiring Algorithm

The proprietary hiring algorithm is a method called decide(applicant) that takes an applicant as a parameter. The method returns True if the applicant is hired and False if not.

In your analysis, you will test the decide method on many applicants. To create a random applicant, use the generate_random_applicant() method.
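Because the algorithm is a black box, the only way to study it is to feed it inputs and count the outcomes. The sketch below illustrates that idea under stated assumptions: toy_decide is a hypothetical stand-in for the proprietary decide method, and random_applicant and hire_rate are illustrative helpers, not part of the activity's code.

```python
from random import Random

APPLICANT_VALUES = [
    ['youth', 'adult', 'elderly'],
    ['male', 'female'],
    ['weak', 'average', 'excellent'],
    ['single', 'married'],
    ['primary', 'secondary', 'tertiary'],
]


def random_applicant(rng):
    '''Draw each attribute uniformly at random from its allowed values.'''
    return [rng.choice(choices) for choices in APPLICANT_VALUES]


def toy_decide(applicant):
    '''Hypothetical stand-in for decide(): hires anyone rated "excellent".'''
    return applicant[2] == 'excellent'


def hire_rate(decide, applicants):
    '''Fraction of applicants for which the black box returns True.'''
    hired = sum(1 for person in applicants if decide(person))
    return hired / len(applicants)


rng = Random(0)  # a private, seeded generator keeps runs repeatable
applicants = [random_applicant(rng) for _ in range(1000)]
rate = hire_rate(toy_decide, applicants)
print(f'Hire rate: {rate:.3f}')  # should land near 1/3 for this toy rule
```

Passing the decision function in as a parameter means the same probing code works unchanged once the real decide is swapped in.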

Measuring Bias

The measure of algorithmic bias you will use is called Quantitative Input Influence (QII), first proposed by a group of researchers at Carnegie Mellon University. The goal of QII is to quantify how much a given input attribute influences the algorithm's decision. Here is the idea behind QII:

To measure bias, we need a quantity of interest: a metric of the algorithm which might be skewed by certain factors. In this case, the quantity of interest is the fraction of women hired. We want to find out which attributes of applicants, if any, influence the decision to hire a woman.

Given a set of applicants, we run them through the hiring algorithm and use the results to calculate the quantity of interest. This quantity is Q0 (pronounced "Q naught"): the base quantity of interest.

Then, we can start asking questions like: "How much does being female influence the fraction of women hired?"

To answer this question, we perform an intervention on the applicant data:

  • Make a copy of the list of applicants.
  • Take every applicant whose gender is female and change the value of their gender to a random value.
  • Compute the quantity of interest for the new "intervened" dataset, called Qi.

The influence score is the difference Q0 - Qi. If influence is close to zero, it suggests that an intervention on gender did not change the quantity of interest much. If influence is negative (the quantity of interest rose after the intervention), it suggests that being female has a negative influence on the quantity of interest. If influence is positive (the quantity of interest fell after the intervention), it suggests that being female has a positive influence on the quantity of interest.
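The intervention steps and the sign of the influence score can be sketched on a small, hand-built dataset. This is an illustration under stated assumptions, not the activity's solution: toy_decide is a hypothetical rule that hires only excellent lifters, and the intervention is applied to lifting ability rather than gender so that the effect is easy to see.

```python
from random import Random

GENDER, LIFTING = 1, 2
LIFTING_VALUES = ['weak', 'average', 'excellent']


def toy_decide(applicant):
    '''Hypothetical stand-in for decide(): hires only excellent lifters.'''
    return applicant[LIFTING] == 'excellent'


def fraction_of_women_hired(applicants):
    '''Quantity of interest: hired women divided by all women.'''
    women = [a for a in applicants if a[GENDER] == 'female']
    return sum(toy_decide(a) for a in women) / len(women)


def intervene_on_lifting(value, applicants, rng):
    '''Copy each applicant; re-draw lifting ability wherever it equals value.'''
    result = []
    for old_person in applicants:
        person = list(old_person)
        if person[LIFTING] == value:
            person[LIFTING] = rng.choice(LIFTING_VALUES)
        result.append(person)
    return result


strong = ['adult', 'female', 'excellent', 'single', 'primary']
weak = ['adult', 'female', 'weak', 'single', 'primary']
applicants = [strong] * 300 + [weak] * 300

rng = Random(1)
q0 = fraction_of_women_hired(applicants)  # exactly 0.5 for this dataset
qi = fraction_of_women_hired(intervene_on_lifting('excellent', applicants, rng))
influence = q0 - qi
print(f'Q0 = {q0:.3f}, Qi = {qi:.3f}, influence = {influence:.3f}')
```

Only about a third of the excellent lifters keep that rating after the random re-draw, so Qi falls below Q0 and the influence comes out positive: under this toy rule, being an excellent lifter raises the fraction of women hired.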

QII can be calculated for any attribute and value of an applicant. We could also ask: "How much does being elderly influence the fraction of women hired?"

If you want to read more about QII, you can check out the original paper here. The specific measure used in this activity is called "Unary QII" because it considers the influence of a single attribute value.

Activity

Implement the QII approach to measuring algorithmic bias. There are three methods for you to fill in:

Quantity of Interest

Given a list of applicants, calculate the quantity of interest. In this case, what fraction of female applicants are hired?

def quantity_of_interest(applicants):
    '''What fraction of women are selected?'''

Intervention

Generate a new list of applicants based on the QII intervention. The parameter index is the zero-based position of an attribute in the applicant list (1 = Gender, 4 = Education Level, etc.) and value is the attribute value whose influence we are analyzing.

def intervene(index, value, applicants):
    '''Create a new list of applicants, where those matching the
    given attribute value have that attribute randomly modified'''
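The intervention should leave the original applicants untouched, because Q0 is computed from them. One pitfall worth illustrating: copying only the outer list still shares the inner applicant lists. The sketch below uses made-up applicants to show the difference.

```python
applicants = [
    ['adult', 'female', 'excellent', 'married', 'primary'],
    ['youth', 'male', 'weak', 'single', 'tertiary'],
]

# Shallow copy: a new outer list, but the inner lists are shared.
shallow = list(applicants)
shallow[0][1] = 'male'
print(applicants[0][1])  # male (the original was changed too)

# Reset the original, then copy each applicant individually.
applicants[0][1] = 'female'
per_person = [list(person) for person in applicants]
per_person[0][1] = 'male'
print(applicants[0][1])  # female (the original is untouched)
```

Copying each applicant with list(person) is enough here because every attribute is a string, and strings cannot be modified in place.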

Influence

And finally, the measure we've all been waiting for: calculate the influence of a given value of an attribute (index).

def calculate_influence(index, value, applicants):
    '''Calculate the influence that a given attribute value has on the quantity of interest'''

Instructor Notes

This section is a Work in Progress.

  • Students may be confused as to why the quantity of interest is fraction of women hired rather than fraction of applicants hired.
  • The QII paper by Datta, Sen, and Zick is quite dense. This activity is based on the intuition behind their measure, under the assumption that all values for an applicant attribute are equally likely. The formulas described in the paper consider probability distributions. That level of computation is omitted to make the activity accessible to students no matter their statistics background.
  • Need discussion questions that ask students if they think QII is a reliable measure of bias. Under what conditions? Would they make any modifications to the process?
  • Contributed by Daniel Bilar: Truncate influence output to 3, at most 4, significant digits. This teaches students not to fetishize (or get distracted by) non-informative, often fake, precision.
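One possible way to limit output to a number of significant digits is Python's "g" format type. The helper name to_sig_digits is a hypothetical illustration; note that this approach rounds rather than truncates, and that "g" switches to scientific notation for very small or very large values.

```python
def to_sig_digits(x, digits=3):
    '''Format x with the given number of significant digits (hypothetical helper).'''
    return f'{x:.{digits}g}'


print(to_sig_digits(0.221234))   # 0.221
print(to_sig_digits(-0.10891))   # -0.109
print(to_sig_digits(0.0031234))  # 0.00312

# For comparison, "%.3f" fixes the number of decimal places instead.
print('%.3f' % 0.0031234)        # 0.003
```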
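
# File: algorithm.py
'''Simple proprietary algorithm'''
from random import randrange


def decide(applicant):
    '''Return True if the applicant is hired, False otherwise.'''
    gender = 1     # index of the gender attribute
    education = 4  # index of the education attribute
    # Compare strings with ==, not `is`; identity checks on literals are unreliable
    if applicant[education] == 'primary':
        return True
        # NOTE: the rest of this branch is unreachable because of the
        # return above; it is kept as it appears in the original.
        if applicant[gender] == 'female':
            r = randrange(0, 1)  # randrange(0, 1) can only return 0
            if r > 0:
                return True
            else:
                return False
        else:
            return True
    else:
        return False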

# File: influence.py
from algorithm import decide
from random import seed, randrange

''' HELPER CODE '''

APPLICANT_VALUES = [
    ['youth', 'adult', 'elderly'],
    ['male', 'female'],
    ['weak', 'average', 'excellent'],
    ['single', 'married'],
    ['primary', 'secondary', 'tertiary']
]


def generate_random_applicant():
    '''Generate a random applicant'''
    person = []
    for choices in APPLICANT_VALUES:
        index = randrange(0, len(choices))
        choice = choices[index]
        person.append(choice)
    return person


def analyze(applicants):
    '''Analyze the influence of every attribute and value on the quantity of interest'''
    for index, values in enumerate(APPLICANT_VALUES):
        print(f'Attribute {index}:')
        for value in values:
            influence = calculate_influence(index, value, applicants)
            # When displaying, ".3f" rounds a float to three decimal places
            print(f'- Influence({value}) = {influence:.3f}')


def main():
    seed(12)  # The seed number generates pseudorandom values in a consistent way
    size = 10000  # The number of random applicants to create
    applicants = []
    for i in range(size):
        person = generate_random_applicant()
        applicants.append(person)
    analyze(applicants)

''' STUDENT CODE '''


def quantity_of_interest(applicants):
    '''What fraction of women are selected?'''
    gender = 1
    female = 'female'
    count_selected = 0
    count_matching = 0
    for person in applicants:
        matching = person[gender] == female
        selected = decide(person)
        if matching:
            count_matching = count_matching + 1
        if matching and selected:
            count_selected = count_selected + 1
    rate = float(count_selected) / float(count_matching)
    return rate


def intervene(index, value, applicants):
    '''Create a new list of applicants, where those matching the
    given attribute value have that attribute randomly modified'''
    choices = APPLICANT_VALUES[index]
    new_applicants = []
    for old_person in applicants:
        person = list(old_person)  # copy so the original applicant is not modified
        if person[index] == value:
            choice_index = randrange(0, len(choices))
            choice = choices[choice_index]
            person[index] = choice
        new_applicants.append(person)
    return new_applicants


def calculate_influence(index, value, applicants):
    '''Calculate the influence that a given attribute value has on the quantity of interest'''
    intervention = intervene(index, value, applicants)
    q0 = quantity_of_interest(applicants)
    qi = quantity_of_interest(intervention)
    influence = q0 - qi
    return influence


if __name__ == '__main__':
    main()

Program output for N = 10,000:

$ python influence.py
Attribute 0:
- Influence(youth) = 0.000
- Influence(adult) = 0.000
- Influence(elderly) = 0.000
Attribute 1:
- Influence(male) = 0.001
- Influence(female) = 0.003
Attribute 2:
- Influence(weak) = 0.000
- Influence(average) = 0.000
- Influence(excellent) = 0.000
Attribute 3:
- Influence(single) = 0.000
- Influence(married) = 0.000
Attribute 4:
- Influence(primary) = 0.221
- Influence(secondary) = -0.109
- Influence(tertiary) = -0.108
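
The education scores above can be compared against a back-of-the-envelope expectation. The sketch below assumes that decide, as listed, effectively hires an applicant exactly when the education level is 'primary' (which is how the listed code appears to behave) and that every education level is equally likely; expected_influence is a hypothetical helper, not part of the activity.

```python
from fractions import Fraction

EDUCATION_VALUES = ['primary', 'secondary', 'tertiary']
p = Fraction(1, len(EDUCATION_VALUES))  # each level assumed equally likely


def expected_influence(value):
    '''Expected Q0 - Qi for an intervention on one education value,
    assuming an applicant is hired exactly when education is primary.'''
    q0 = p  # probability that a woman has primary education
    if value == 'primary':
        # A primary applicant stays hired only if re-drawn as primary.
        qi = p * p
    else:
        # Existing primaries stay hired; re-drawn applicants become
        # primary with probability p.
        qi = p + p * p
    return q0 - qi


for value in EDUCATION_VALUES:
    e = expected_influence(value)
    print(f'{value}: {e} = {float(e):.3f}')
# primary: 2/9 = 0.222
# secondary: -1/9 = -0.111
# tertiary: -1/9 = -0.111
```

These expectations sit close to the measured 0.221, -0.109, and -0.108, with the small gaps attributable to random sampling.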