import pandas as pd


def diversity_percentage(df, columns):
    """
    Return the number of distinct values in each column as a percentage of the number of rows in df.
    A low value indicates there are many repeated elements.
    Example 1: the minimum value, 100/len(df), indicates all values are the same (it approaches 0 for large df).
    Example 2: a value of 100 indicates all values are different.
    """
    # Count the distinct values in each requested column (NaN counts as one value).
    diversity = dict()
    for col in columns:
        diversity[col] = len(df[col].unique())
    diversity_series = pd.Series(diversity)
    # Convert the counts to percentages of the row count, sorted ascending.
    return (100 * diversity_series / len(df)).sort_values()
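
The loop above can probably be collapsed into a single pandas call. The following is a minimal sketch, not part of the original gist: the name `diversity_percentage_vectorized` and the `example` frame are made up for illustration. It assumes `DataFrame.nunique(dropna=False)` is an acceptable stand-in for `len(df[col].unique())`, since both count NaN as one distinct value.

```python
import pandas as pd


def diversity_percentage_vectorized(df, columns):
    # Hypothetical variant of the function above: count distinct values
    # per column in one call, keeping NaN as a value to match unique().
    distinct_counts = df[columns].nunique(dropna=False)
    # Express the counts as a percentage of the row count, sorted ascending.
    return (100 * distinct_counts / len(df)).sort_values()


# Illustrative data: one constant column, one two-valued column,
# and one column where every value is different.
example = pd.DataFrame({
    "constant": ["a", "a", "a", "a"],
    "binary": [0, 1, 0, 1],
    "distinct": [1, 2, 3, 4],
})

result = diversity_percentage_vectorized(example, ["constant", "binary", "distinct"])
print(result)
```

With four rows, the constant column scores 25 rather than 0, which shows that the lower bound of the measure depends on the number of rows.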