@aniruddha27
Last active March 20, 2020 05:08
import numpy as np
import pandas as pd

# Assumes `train` (a pandas DataFrame) and `top_features` (a list of numeric
# column names in `train`) are already defined.
Q1 = []
Q3 = []
Lower_bound = []
Upper_bound = []
Outliers = []

for i in top_features:
    # 25th and 75th percentiles
    q1, q3 = np.percentile(train[i], 25), np.percentile(train[i], 75)
    # Interquartile range
    iqr = q3 - q1
    # Outlier cutoff
    cut_off = 1.5 * iqr
    # Lower and upper bounds
    lower_bound = q1 - cut_off
    upper_bound = q3 + cut_off
    # Save outlier indexes (rows outside the bounds for this feature)
    outlier = train.index[(train[i] < lower_bound) | (train[i] > upper_bound)]
    # Append values for the summary DataFrame
    Q1.append(q1)
    Q3.append(q3)
    Lower_bound.append(lower_bound)
    Upper_bound.append(upper_bound)
    Outliers.append(len(outlier))
    # Drop the outlier rows in place; later features are therefore
    # evaluated on the already-filtered data
    train.drop(outlier, inplace=True, axis=0)

df_out = pd.DataFrame({'Column': top_features, 'Q1': Q1, 'Q3': Q3,
                       'Lower bound': Lower_bound, 'Upper bound': Upper_bound,
                       'No. of outliers': Outliers})
df_out.sort_values(by='No. of outliers', ascending=False)
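The snippet above depends on an existing `train` DataFrame and `top_features` list. To try the same 1.5 × IQR rule in isolation, here is a minimal sketch on made-up data. The helper name `iqr_outlier_mask`, the `area` column, and the values in it are hypothetical and not part of the original gist.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    # 25th and 75th percentiles of the column
    q1, q3 = np.percentile(series, [25, 75])
    iqr = q3 - q1
    # Fences sit k * IQR below Q1 and above Q3
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (series < lower) | (series > upper)


# Hypothetical toy data: one value (100) sits far from the rest
toy = pd.DataFrame({"area": [10, 12, 11, 13, 12, 11, 100]})

mask = iqr_outlier_mask(toy["area"])
cleaned = toy.loc[~mask]  # keep only the rows inside the fences

print(toy.loc[mask])
print(cleaned)
```

Returning a mask instead of dropping rows in place leaves the original frame untouched, so the flagged rows can be inspected before deciding whether to remove them.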