@stanlee321
Forked from ariffyasri/rem_outlier.py
Created August 24, 2019 03:17
Remove outliers in pandas
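import pandas as pd
import numpy as np
from pandas.api.types import is_numeric_dtype

# Build a small sample frame: one numeric column and two string columns.
np.random.seed(42)
age = np.random.randint(20, 100, 50)
name = ['name' + str(i) for i in range(50)]
address = ['address' + str(i) for i in range(50)]
df = pd.DataFrame(data={'age': age, 'name': name, 'address': address})


def remove_outlier(df):
    """Keep rows whose numeric values lie strictly between the 5% and 95% quantiles."""
    low = .05
    high = .95
    # numeric_only=True skips the string columns; without it, pandas 2.0+
    # raises a TypeError when the frame contains non-numeric data.
    quant_df = df.quantile([low, high], numeric_only=True)
    for name in list(df.columns):
        if is_numeric_dtype(df[name]):
            # Bounds are exclusive, so rows equal to a quantile value are dropped too.
            df = df[(df[name] > quant_df.loc[low, name]) & (df[name] < quant_df.loc[high, name])]
    return df


print(remove_outlier(df).head())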
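
The same quantile filter can also be written with the cut-offs as parameters and a single boolean mask, which makes it easier to try other thresholds. This is only a sketch and not part of the original gist: the name `trim_quantiles` and its `low`/`high` arguments are hypothetical, and it assumes `select_dtypes(include="number")` is an acceptable stand-in for the `is_numeric_dtype` check (it skips boolean columns, which `is_numeric_dtype` would treat as numeric).

```python
import numpy as np
import pandas as pd


def trim_quantiles(df, low=0.05, high=0.95):
    """Hypothetical helper: keep rows strictly inside the [low, high] quantile band."""
    # Only numeric columns take part in the filtering.
    numeric = df.select_dtypes(include="number")
    bounds = numeric.quantile([low, high])

    # Start with every row kept, then AND in the condition for each numeric column.
    mask = pd.Series(True, index=df.index)
    for col in numeric.columns:
        mask &= (df[col] > bounds.loc[low, col]) & (df[col] < bounds.loc[high, col])
    return df[mask]


# Small demo frame, similar in shape to the one in the gist.
rng = np.random.RandomState(42)
demo = pd.DataFrame({
    "age": rng.randint(20, 100, 50),
    "name": ["name" + str(i) for i in range(50)],
})
trimmed = trim_quantiles(demo)
print(len(demo), len(trimmed))
```

Because the quantiles are computed once before any rows are dropped, the combined mask should select the same rows as filtering column by column, as `remove_outlier` does.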