@danyashorokh
Last active November 14, 2021 15:50
[Python] Information value calculation
import numpy as np
import pandas as pd


# Calculate information value
def calc_iv(df, feature, target, pr=0):
    lst = []
    # For each distinct feature value, count all rows and "bad" rows (target == 1)
    for val in df[feature].unique():
        lst.append([feature, val,
                    df[df[feature] == val].count()[feature],
                    df[(df[feature] == val) & (df[target] == 1)].count()[feature]])

    data = pd.DataFrame(lst, columns=['Variable', 'Value', 'All', 'Bad'])
    # Drop values with no bad rows so the logarithm below does not divide by zero
    data = data[data['Bad'] > 0].copy()

    data['Share'] = data['All'] / data['All'].sum()
    data['Bad Rate'] = data['Bad'] / data['All']
    data['Distribution Good'] = (data['All'] - data['Bad']) / (data['All'].sum() - data['Bad'].sum())
    data['Distribution Bad'] = data['Bad'] / data['Bad'].sum()
    # Weight of evidence per value; the IV is the same total repeated on every row
    data['WoE'] = np.log(data['Distribution Good'] / data['Distribution Bad'])
    data['IV'] = (data['WoE'] * (data['Distribution Good'] - data['Distribution Bad'])).sum()

    data = data.sort_values(by=['Variable', 'Value'], ascending=True)

    # Print the per-value table when pr == 1
    if pr == 1:
        print(data)

    return data['IV'].values[0]
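
As a cross-check on the loop above, the same counts can be gathered in a single `groupby` pass. This is only a sketch, not the gist's method: the name `calc_iv_groupby` and the toy data are assumptions made for illustration. It keeps the gist's choice of dropping values that have no bad rows.

```python
import numpy as np
import pandas as pd


def calc_iv_groupby(df, feature, target):
    """Hypothetical groupby-based variant of calc_iv (not part of the gist)."""
    # Flag "bad" rows (target == 1), then count rows and bads per feature value
    is_bad = (df[target] == 1)
    data = is_bad.groupby(df[feature]).agg(['size', 'sum'])
    data.columns = ['All', 'Bad']
    # Same filter as the gist: drop values with no bad rows
    data = data[data['Bad'] > 0]

    good = data['All'] - data['Bad']
    dist_good = good / good.sum()
    dist_bad = data['Bad'] / data['Bad'].sum()
    woe = np.log(dist_good / dist_bad)
    return float((woe * (dist_good - dist_bad)).sum())


# Toy data (made up): value 'A' has 1 bad in 4 rows, 'B' has 3 bad in 4 rows
toy = pd.DataFrame({
    'grade': ['A'] * 4 + ['B'] * 4,
    'default': [1, 0, 0, 0, 1, 1, 1, 0],
})
iv = calc_iv_groupby(toy, 'grade', 'default')
```

For this toy frame the good/bad distributions are 3/4 vs 1/4 for 'A' and 1/4 vs 3/4 for 'B', so the IV works out to ln 3 (about 1.0986).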
@AzamatGudiev

If I have about 100 params, how can I run this function for all of them at once? Is that possible?
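
One way to approach this, as a sketch: loop over the column names and collect one IV per feature. The helper name `iv_for_features` and its `iv_func` parameter are hypothetical; `iv_func` stands for any function with the `(df, feature, target)` signature, such as `calc_iv` from the gist. The features are processed one after another rather than truly in parallel.

```python
import pandas as pd


def iv_for_features(df, target, iv_func, features=None):
    """Hypothetical helper: apply iv_func(df, feature, target) to many columns."""
    if features is None:
        # Assumption: every column except the target is a candidate feature
        features = [col for col in df.columns if col != target]
    ivs = {feature: iv_func(df, feature, target) for feature in features}
    # Highest IV first, so the strongest features are at the top
    return pd.Series(ivs, name='IV').sort_values(ascending=False)
```

With the gist's function in scope, `iv_for_features(df, 'target', calc_iv)` would return a Series indexed by feature name, where `'target'` is a placeholder column name. Since `calc_iv` treats each distinct value as its own bin, continuous columns would need to be binned first.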
