@CMCDragonkai
Last active November 1, 2023 09:50
Lorenz Curve and Gini Coefficient #python
import numpy as np
import matplotlib.pyplot as plt

# ensure your arr is sorted from lowest to highest values first!
arr = np.array([1,4,6,9,100])

def gini(arr):
    count = arr.size
    coefficient = 2 / count
    indexes = np.arange(1, count + 1)
    weighted_sum = (indexes * arr).sum()
    total = arr.sum()
    constant = (count + 1) / count
    return coefficient * weighted_sum / total - constant

def lorenz(arr):
    # this divides the prefix sum by the total sum
    # this ensures all the values are between 0 and 1.0
    scaled_prefix_sum = arr.cumsum() / arr.sum()
    # this prepends the 0 value (because 0% of all people have 0% of all wealth)
    return np.insert(scaled_prefix_sum, 0, 0)

# show the gini index!
print(gini(arr))

lorenz_curve = lorenz(arr)

# we need the X values to be between 0.0 to 1.0
plt.plot(np.linspace(0.0, 1.0, lorenz_curve.size), lorenz_curve)
# plot the straight line perfect equality curve
plt.plot([0,1], [0,1])
plt.show()
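
As a sanity check on the two functions above, the Gini coefficient should equal one minus twice the area under the Lorenz curve when that area is computed with the trapezoid rule. The sketch below assumes the same formulas as the gist but sorts the input itself, so the caller does not have to. The names `gini_sorted` and `gini_from_lorenz` are hypothetical and not part of the original code.

```python
import numpy as np

def lorenz(values):
    # Sort ascending, then scale the prefix sums into [0, 1] and prepend 0.
    values = np.sort(np.asarray(values, dtype=float))
    return np.insert(values.cumsum() / values.sum(), 0, 0.0)

def gini_sorted(values):
    # Same rank-weighted formula as the gist, with the sort done here.
    values = np.sort(np.asarray(values, dtype=float))
    n = values.size
    ranks = np.arange(1, n + 1)
    return 2.0 * (ranks * values).sum() / (n * values.sum()) - (n + 1) / n

def gini_from_lorenz(values):
    # Trapezoid rule over n equal-width intervals of width 1/n.
    curve = lorenz(values)
    n = curve.size - 1
    area = ((curve[:-1] + curve[1:]) / 2.0).sum() / n
    return 1.0 - 2.0 * area

values = [100, 1, 9, 4, 6]  # deliberately unsorted
print(gini_sorted(values), gini_from_lorenz(values))  # both approximately 0.6767
```

The two numbers agree because summing the cumulative shares is algebraically the same as the rank-weighted sum used in `gini`.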
@lgourdon

Thank you for your explanations!

@CMCDragonkai

CMCDragonkai commented Sep 10, 2019

Assuming that the total frequency distribution comes from the element-wise summation of individual ratios, can we use the Gini coefficient to derive the changes required to the individual ratios in order to make the system more fair and balanced?

Suppose arr = np.array([1,4,6,9,100]) was obtained by summing many individual ratios such as [0,1,1,2,3], [1,0,1,2,20], etc. If the allowed changes are:

  1. We can duplicate an individual ratio
  2. We can drop a ratio

How can we make the resulting Gini coefficient perfectly fair (that is, zero)? Note that the perverse case is dropping all ratios, which results in a total ratio of [0,0,0,0,0]. One could disallow this by setting a minimum number of individual ratios to preserve.
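
One way to explore this question is a greedy search over the two allowed moves. The sketch below is only a heuristic under stated assumptions, not an optimal or established method: each step tries duplicating or dropping one copy of every ratio and keeps the single move that lowers the Gini coefficient of the summed total the most. The names `greedy_balance`, `min_rows`, and `max_steps` are hypothetical, and scoring an all-zero total as 1.0 is an assumption made so the search never prefers the degenerate case.

```python
import numpy as np

def gini(counts):
    counts = np.sort(np.asarray(counts, dtype=float))
    n = counts.size
    total = counts.sum()
    if total == 0:
        # Assumption: treat the all-zero total as maximally unfair.
        return 1.0
    ranks = np.arange(1, n + 1)
    return 2.0 * (ranks * counts).sum() / (n * total) - (n + 1) / n

def greedy_balance(ratios, min_rows=1, max_steps=100):
    ratios = np.asarray(ratios)
    # weights[i] is how many copies of ratio i are currently kept.
    weights = np.ones(len(ratios), dtype=int)
    best = gini(weights @ ratios)
    for _ in range(max_steps):
        best_move = None
        for i in range(len(ratios)):
            for delta in (1, -1):  # duplicate or drop one copy
                trial = weights.copy()
                trial[i] += delta
                if trial[i] < 0 or trial.sum() < min_rows:
                    continue
                score = gini(trial @ ratios)
                if score < best - 1e-12:
                    best, best_move = score, trial
        if best_move is None:
            break  # no single move improves the score
        weights = best_move
    return weights, best

# The first two rows come from the comment above; the third is made up.
ratios = [[0, 1, 1, 2, 3], [1, 0, 1, 2, 20], [3, 2, 1, 0, 0]]
weights, score = greedy_balance(ratios, min_rows=2)
print(weights, score)
```

Because the search only accepts strictly improving moves, it can stop at a local minimum. An exact answer would need a proper integer optimisation over the copy counts.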

This problem would enable class balancing on object detection data.

See: Frame Augmentation for Imbalanced Object Detection Datasets

An alternative to balancing by oversampling or undersampling object detection images is to augment the dataset with synthetic images that are composited from existing labelled objects. This relies on having a scene in which the object could plausibly appear. Dealing with rare objects may be complicated.

@ZihiFredo

Thank you so much, you saved me!
