@fluency03
Last active August 24, 2018 18:49
# Reference: http://stackoverflow.com/questions/22354094/pythonic-way-of-detecting-outliers-in-one-dimensional-observation-data/22357811#22357811
import numpy as np
def is_outlier(points, thresh=3.5):
    """
    Returns a boolean array with True if points are outliers and False
    otherwise.

    Parameters:
    -----------
        points : A numobservations by numdimensions array of observations
        thresh : The modified z-score to use as a threshold. Observations with
            a modified z-score (based on the median absolute deviation) greater
            than this value will be classified as outliers.

    Returns:
    --------
        mask : A numobservations-length boolean array.

    References:
    ----------
        Boris Iglewicz and David Hoaglin (1993), "Volume 16: How to Detect and
        Handle Outliers", The ASQC Basic References in Quality Control:
        Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.
    """
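    # Treat a 1-D input as a single column of observations.
    if len(points.shape) == 1:
        points = points[:, None]
    # Euclidean distance of every observation from the per-dimension median.
    median = np.median(points, axis=0)
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)
    med_abs_deviation = np.median(diff)

    # 0.6745 is approximately the 0.75 quantile of the standard normal; it
    # scales the MAD so the score is comparable to an ordinary z-score.
    # Note: if the MAD is 0, this division yields inf or nan values.
    modified_z_score = 0.6745 * diff / med_abs_deviation

    return modified_z_score > thresh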
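

def mad(data, axis=0):
    """Median absolute deviation of `data` along `axis`."""
    # keepdims=True lets the median broadcast against `data` for any axis.
    median = np.median(data, axis=axis, keepdims=True)
    return np.median(np.abs(data - median), axis=axis)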
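
A minimal usage sketch of the modified z-score test above. The sample values are made up for illustration, and the function is repeated in compact form only so the snippet runs on its own; it is meant to mirror the gist's logic, not replace it.

```python
import numpy as np


def is_outlier(points, thresh=3.5):
    # Compact restatement of the gist's function so this snippet is self-contained.
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]
    median = np.median(points, axis=0)
    diff = np.sqrt(np.sum((points - median) ** 2, axis=-1))
    med_abs_deviation = np.median(diff)
    return 0.6745 * diff / med_abs_deviation > thresh


# Hypothetical 1-D measurements with one obviously extreme value at the end.
sample = np.array([2.1, 2.3, 1.9, 2.0, 2.2, 2.4, 1.8, 15.0])

mask = is_outlier(sample)   # boolean array, True where a point is flagged
inliers = sample[~mask]     # keep only the points that were not flagged

print(mask)
print(inliers)
```

Because the score is built from medians rather than the mean and standard deviation, the single extreme value barely shifts the center or the spread estimate, which is why it stands out so clearly against the default threshold of 3.5.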