@danyashorokh
Last active November 14, 2021 15:50
[Python] Information value calculation
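import numpy as np
import pandas as pd


# Calculate the information value (IV) of a feature against a binary target (1 = bad).
def calc_iv(df, feature, target, pr=0):
    lst = []
    # For each distinct non-null value, count all rows and the rows where target == 1.
    for val in df[feature].dropna().unique():
        lst.append([feature, val,
                    df[df[feature] == val].count()[feature],
                    df[(df[feature] == val) & (df[target] == 1)].count()[feature]])
    data = pd.DataFrame(lst, columns=['Variable', 'Value', 'All', 'Bad'])
    # Drop values with no bad rows (their WoE would divide by zero); copy so the
    # column assignments below do not trigger pandas' SettingWithCopyWarning.
    data = data[data['Bad'] > 0].copy()
    data['Share'] = data['All'] / data['All'].sum()
    data['Bad Rate'] = data['Bad'] / data['All']
    data['Distribution Good'] = (data['All'] - data['Bad']) / (data['All'].sum() - data['Bad'].sum())
    data['Distribution Bad'] = data['Bad'] / data['Bad'].sum()
    # Note: a value with no good rows still gives WoE = -inf, which makes the IV infinite.
    data['WoE'] = np.log(data['Distribution Good'] / data['Distribution Bad'])
    # The IV is a single number; it is stored in every row of the 'IV' column.
    data['IV'] = (data['WoE'] * (data['Distribution Good'] - data['Distribution Bad'])).sum()
    data = data.sort_values(by=['Variable', 'Value'], ascending=True)
    # Optionally print the per-value table.
    if pr == 1:
        print(data)
    return data['IV'].values[0]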
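For reference, the quantities the function computes can be written as below. This is only a restatement of the code; the symbols are notation chosen here for illustration. For each feature value v that is kept (at least one bad row), B_v is the number of rows with target == 1, G_v is the number of remaining rows ('All' minus 'Bad'), and B and G are the sums of B_v and G_v over the kept values.

```latex
\mathrm{WoE}_v = \ln\!\left(\frac{G_v / G}{B_v / B}\right),
\qquad
\mathrm{IV} = \sum_{v} \left(\frac{G_v}{G} - \frac{B_v}{B}\right)\mathrm{WoE}_v
```

Here G_v / G corresponds to the 'Distribution Good' column and B_v / B to 'Distribution Bad'.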
@billy-odera
How do you call the function, or otherwise use the code above?

@lakshay92-cyber
> how do you call the function or implement the above?

An example dataframe:

   | Age | Performance | Work experience in years | Promotion
---|-----|-------------|--------------------------|----------
1  | 52  | 3           | 9                        | 1
2  | 32  | 9           | 6                        | 1
3  | 51  | 9           | 10                       | 0
4  | 18  | 2           | 20                       | 0
5  | 60  | 5           | 5                        | 1
6  | 59  | 4           | 17                       | 0
7  | 55  | 8           | 8                        | 1
8  | 56  | 10          | 1                        | 0
9  | 59  | 2           | 17                       | 1
10 | 59  | 5           | 11                       | 0

In the example above, the target (dependent) variable is Promotion, and the dataframe is named 'df'.

To find the information value for Age, call the function as:
calc_iv(df, 'Age', 'Promotion', pr=0)

This returns the information value for the feature Age. Pass pr=1 to also print the per-value table.
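A minimal runnable sketch of that call, built on the example dataframe. The helper `calc_iv_sketch` and the `iv_by_feature` dictionary are hypothetical names introduced here; the helper is a condensed groupby-based restatement of the gist's formulas (not the author's code) and assumes a 0/1 target. The loop at the end simply repeats the call for every feature column.

```python
import numpy as np
import pandas as pd

# Example dataframe from the table above.
df = pd.DataFrame({
    'Age': [52, 32, 51, 18, 60, 59, 55, 56, 59, 59],
    'Performance': [3, 9, 9, 2, 5, 4, 8, 10, 2, 5],
    'Work experience in years': [9, 6, 10, 20, 5, 17, 8, 1, 17, 11],
    'Promotion': [1, 1, 0, 0, 1, 0, 1, 0, 1, 0],
})


def calc_iv_sketch(df, feature, target):
    """Hypothetical condensed restatement of calc_iv (assumes a 0/1 target)."""
    # Per-value row counts ('All') and counts of target == 1 ('Bad').
    data = df.groupby(feature)[target].agg(All='count', Bad='sum')
    # Like the gist, keep only values that have at least one bad row.
    data = data[data['Bad'] > 0]
    good = data['All'] - data['Bad']
    dist_good = good / good.sum()
    dist_bad = data['Bad'] / data['Bad'].sum()
    # A value with no good rows gives log(0) = -inf, so the IV becomes inf;
    # errstate only silences NumPy's divide-by-zero warning for that case.
    with np.errstate(divide='ignore'):
        woe = np.log(dist_good / dist_bad)
    return float((woe * (dist_good - dist_bad)).sum())


# Single feature, as in the call above.
iv_age = calc_iv_sketch(df, 'Age', 'Promotion')
print('Age:', iv_age)

# Every feature column: loop over all columns except the target.
features = [col for col in df.columns if col != 'Promotion']
iv_by_feature = {f: calc_iv_sketch(df, f, 'Promotion') for f in features}
print(iv_by_feature)
```

With only ten rows and mostly unique values, several values have no rows with Promotion == 0, so the logarithm is taken of zero and the result is infinite. On real data, grouping a numeric feature into a few bins before computing the IV is the usual way to avoid this.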

@Kirili4ik
Kirili4ik commented Apr 6, 2020

You are welcome to check my revision.
I think it is a bit clearer and closer to textbook explanations.

@AzamatGudiev
If I have around 100 params, how can I run this function for all of them at once? Is that possible?