title: Detecting Outliers
author: Damian Mingle
date: 06/10/2018

Preliminaries

# Load libraries
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

Create Data

# Simulate data: 255 observations, 3 features, one cluster
simulated_data, _ = make_blobs(n_samples=255,
                               n_features=3,
                               centers=1,
                               random_state=1)

# Replace the first two features of the first observation with extreme values
simulated_data[0,0] = 99999
simulated_data[0,1] = 99999

Detect Outliers

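EllipticEnvelope relies on a contamination parameter (the proportion of outliers you think are in the data; it defaults to 0.1) to decide where to draw the line between inliers and outliers. Because that proportion is rarely known in advance, this is a significant limitation of the approach.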

# Build outlier detector
outlier_detector = EllipticEnvelope(contamination=.1)

# Fit outlier detector
outlier_detector.fit(simulated_data)

# Predict outliers (-1 = outlier, 1 = inlier)
outlier_detector.predict(simulated_data)
array([-1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1, -1,  1, -1,  1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1, -1,  1,  1,  1, -1,  1,  1, -1,  1,  1,  1,  1,  1, -1,  1,
        1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1,  1,
        1,  1, -1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1,
        1,  1,  1,  1,  1, -1, -1,  1,  1,  1,  1, -1,  1, -1,  1,  1,  1,
        1, -1,  1,  1,  1,  1,  1,  1, -1,  1, -1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1,  1, -1,  1])
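The output above flags roughly a tenth of the rows, even though only the first observation was made extreme. The sketch below is one way to see how much the result depends on contamination: it refits the detector at a few values and counts the rows labelled -1. The values 0.01, 0.05 and 0.1, the fixed random_state=0, and the names data and flagged_counts are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Rebuild the same simulated data, including the extreme first row
data, _ = make_blobs(n_samples=255, n_features=3, centers=1, random_state=1)
data[0, 0] = 99999
data[0, 1] = 99999

# Refit at several (assumed, illustrative) contamination levels
flagged_counts = {}
first_row_labels = {}
for contamination in (0.01, 0.05, 0.1):
    detector = EllipticEnvelope(contamination=contamination, random_state=0)
    labels = detector.fit_predict(data)  # -1 = outlier, 1 = inlier
    flagged_counts[contamination] = int(np.sum(labels == -1))
    first_row_labels[contamination] = int(labels[0])
    print(f"contamination={contamination}: "
          f"{flagged_counts[contamination]} rows flagged, "
          f"first row label = {first_row_labels[contamination]}")
```

If the number of flagged rows grows with contamination while the first row is flagged every time, that suggests the extra flags at higher settings are ordinary points pushed over the threshold by the parameter rather than genuine anomalies.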