@cereniyim
Created January 8, 2020 12:56
Extreme outlier detection

import numpy as np


def outlier_function(df, col_name):
    '''Detect extreme outliers in one column of a DataFrame.

    Computes the first and third quartiles and the interquartile range (IQR)
    of the column, then sets conservative limits 3 * IQR beyond the quartiles
    (Tukey's "outer fences") so that only extreme values are flagged.
    Returns the lower limit, the upper limit and the number of outliers.
    '''
    values = df[col_name].to_numpy(dtype=float)
    # nanpercentile ignores missing values instead of returning NaN limits
    first_quartile = np.nanpercentile(values, 25)
    third_quartile = np.nanpercentile(values, 75)
    IQR = third_quartile - first_quartile
    upper_limit = third_quartile + (3 * IQR)
    lower_limit = first_quartile - (3 * IQR)
    # count values strictly outside the limits (NaN compares False, so it is skipped)
    outlier_count = 0
    for value in values:
        if value < lower_limit or value > upper_limit:
            outlier_count += 1
    return lower_limit, upper_limit, outlier_count
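
The same 3 * IQR rule can also be expressed without an explicit Python loop. The sketch below is a hypothetical vectorized variant, not part of the original gist: the names `extreme_outlier_mask` and `k` are illustrative assumptions. It relies on `pandas.Series.quantile`, which skips NaN and uses linear interpolation by default (the same default as `np.percentile`), and returns a boolean mask instead of a count.

```python
import numpy as np
import pandas as pd


def extreme_outlier_mask(series, k=3.0):
    """Hypothetical helper: mark values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=3.0 reproduces the conservative "extreme outlier" limits;
    k=1.5 would give Tukey's more common inner fences.
    """
    # Series.quantile skips NaN and interpolates linearly by default
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # NaN fails both comparisons, so missing values are never flagged
    return (series < lower) | (series > upper)


df = pd.DataFrame({"price": [10, 11, 12, 13, 14, 15, 200]})
mask = extreme_outlier_mask(df["price"])
print(int(mask.sum()))                 # number of extreme outliers
print(df.loc[mask, "price"].tolist())  # the flagged values themselves
```

For this sample, Q1 = 11.5 and Q3 = 14.5, so IQR = 3 and the limits are 2.5 and 23.5; only 200 falls outside them. Returning a mask rather than a count makes it easy to inspect, drop (`df[~mask]`) or cap the flagged rows afterwards.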