@ricgu8086
Created October 2, 2019 10:13
import pandas as pd


def diversity_percentage(df, columns):
    """
    Return the number of distinct values in each column as a percentage of the number of rows in df.
    A low value indicates there are many repeated elements.
    Example 1: a value close to 0 (exactly 100/len(df)) indicates all values are the same.
    Example 2: a value of 100 indicates all values are different.
    """
    diversity = dict()
    for col in columns:
        # unique() treats NaN as a distinct value, so missing data counts once
        diversity[col] = len(df[col].unique())
    diversity_series = pd.Series(diversity)
    # Scale to a percentage of the row count and sort ascending (most repetitive columns first)
    return (100*diversity_series/len(df)).sort_values()
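
The same calculation can probably be written without the explicit loop. The block below is a minimal sketch, not part of the original gist: the name `diversity_percentage_vectorized` and the sample DataFrame are hypothetical, and `dropna=False` is passed on the assumption that NaN should keep counting as a distinct value, as it does with `unique()`.

```python
import pandas as pd


def diversity_percentage_vectorized(df, columns):
    """Hypothetical loop-free variant of diversity_percentage."""
    # DataFrame.nunique counts distinct values per column;
    # dropna=False keeps NaN as a value, matching len(df[col].unique())
    distinct_counts = df[columns].nunique(dropna=False)
    return (100 * distinct_counts / len(df)).sort_values()


# Illustrative data: 4 rows with 4, 2 and 1 distinct values respectively
sample = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["a", "a", "b", "b"],
    "flag": ["x", "x", "x", "x"],
})

result = diversity_percentage_vectorized(sample, ["id", "city", "flag"])
print(result)
```

With this sample, `flag` comes out lowest at 25 (all four values are the same, so 100/4), `city` at 50, and `id` at 100.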