

@ariffyasri
Created December 11, 2017 01:13
Remove outliers in pandas
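import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Sample data: 50 rows with a random integer age in [20, 100) plus two text columns.
np.random.seed(42)
age = np.random.randint(20, 100, 50)
name = ['name' + str(i) for i in range(50)]
address = ['address' + str(i) for i in range(50)]
df = pd.DataFrame(data={'age': age, 'name': name, 'address': address})


def remove_outlier(df):
    """Keep only rows whose numeric values lie strictly between the 5th and 95th percentiles."""
    low = .05
    high = .95
    # Quantiles are computed once on the unfiltered frame. numeric_only=True skips the
    # text columns; since pandas 2.0 the default is numeric_only=False, which fails on them.
    quant_df = df.quantile([low, high], numeric_only=True)
    for col in list(df.columns):
        if is_numeric_dtype(df[col]):
            # Drop rows at or beyond either bound for this column.
            df = df[(df[col] > quant_df.loc[low, col]) & (df[col] < quant_df.loc[high, col])]
    return df


print(remove_outlier(df).head())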
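Because `remove_outlier` computes every quantile once from the unfiltered frame, applying the per-column filters one after another amounts to AND-ing them into a single row mask. The sketch below does that in one vectorized step. The function name `remove_outliers_quantile` and its `low`/`high` keyword arguments are hypothetical, and it selects columns with `select_dtypes(include="number")` rather than `is_numeric_dtype`, so treat it as an illustration of the same trimming rule, not a drop-in replacement.

```python
import numpy as np
import pandas as pd


def remove_outliers_quantile(df: pd.DataFrame, low: float = 0.05, high: float = 0.95) -> pd.DataFrame:
    """Hypothetical helper: drop rows where any numeric column lies outside
    the open interval (quantile(low), quantile(high)) of that column."""
    numeric = df.select_dtypes(include="number")
    if numeric.empty:
        # No numeric columns: nothing to trim.
        return df.copy()
    # One Series per bound, indexed by column name; quantiles use the full, unfiltered frame.
    lower = numeric.quantile(low)
    upper = numeric.quantile(high)
    # DataFrame-vs-Series comparison aligns the Series index with the columns,
    # so each column is checked against its own bounds (strict, like the gist).
    mask = ((numeric > lower) & (numeric < upper)).all(axis=1)
    return df[mask]


if __name__ == "__main__":
    # Same sample data as the gist.
    np.random.seed(42)
    sample = pd.DataFrame({
        "age": np.random.randint(20, 100, 50),
        "name": [f"name{i}" for i in range(50)],
        "address": [f"address{i}" for i in range(50)],
    })
    trimmed = remove_outliers_quantile(sample)
    print(len(sample), "rows before,", len(trimmed), "rows after")
    print(trimmed.head())
```

As in the original, a row with a missing value in any numeric column is also dropped, because comparisons against `NaN` evaluate to `False`.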