@jrosell
Last active August 23, 2023 15:57
Detect anomalies over time using percentiles and an Isolation Forest model.
---
title: "anomalies"
format: html
jupyter: python3
editor_options:
  chunk_output_type: console
---
Install the required packages:
```{python}
!pip install numpy==1.23 pandas matplotlib scipy pycaret seaborn
```
Import packages and check versions:
```{python}
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
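# Import only the PyCaret functions used below instead of a wildcard import
from pycaret.anomaly import setup, create_model, assign_model
print(f'numpy {np.__version__}')
print(f'pandas {pd.__version__}')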
```
Simulate the elapsed-time data for anomaly detection:
```{python}
# Set random seed
np.random.seed(2)
# Simulate data
elapsed = np.random.normal(size=200) + 20
elapsed = np.maximum(elapsed, 1)
data = pd.DataFrame({
    'x': np.arange(1, 201).astype(float),
    'elapsed': elapsed
})
print(data)
```
Get anomalies with an Isolation Forest model:
```{python}
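# Initialise the PyCaret anomaly-detection experiment on the simulated data
exp_name = setup(data=data)
# Fit an Isolation Forest
iforest = create_model('iforest')
# Label each row: Anomaly is 1 for outliers, Anomaly_Score holds the model score
anomalies = assign_model(iforest, transformation=True, score=True)
# Keep only the rows flagged as anomalous
anomalies = anomalies[anomalies['Anomaly'] == 1]
print(anomalies)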
```
Calculate the 5th and 95th percentile band from the actual elapsed values:
```{python}
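# Attach anomaly scores to every row; rows that were not flagged get NaN here
quantiles = data.merge(anomalies[['x', 'Anomaly_Score']], on='x', how='left')
# Constant band: 95th and 5th percentiles of all elapsed values
quantiles['upper'] = quantiles['elapsed'].quantile(0.95)
quantiles['lower'] = quantiles['elapsed'].quantile(0.05)
quantiles['Anomaly_Score'] = quantiles['Anomaly_Score'].fillna(0)
# Confirm that no missing values remain
print(quantiles.isna().any())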
```
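The percentile band above is only drawn, not used to flag points. As a minimal sketch of the percentile side of the comparison, the following marks every observation outside the 5%/95% band. It repeats the simulation so that it runs on its own; the column name `outside_band` and the variable `percentile_anomalies` are illustrative and not part of the original gist.

```python
import numpy as np
import pandas as pd

# Same simulation as above (seed 2, 200 draws centred on 20 s)
np.random.seed(2)
elapsed = np.maximum(np.random.normal(size=200) + 20, 1)
data = pd.DataFrame({
    'x': np.arange(1, 201).astype(float),
    'elapsed': elapsed
})

# 5th and 95th percentiles of the elapsed values
lower = data['elapsed'].quantile(0.05)
upper = data['elapsed'].quantile(0.95)

# Hypothetical helper column: True when a point falls outside the band
data['outside_band'] = (data['elapsed'] < lower) | (data['elapsed'] > upper)
percentile_anomalies = data[data['outside_band']]
print(percentile_anomalies)
```

Because the band is defined by the 5th and 95th percentiles of the same data, this rule always flags roughly 10% of the points, whether or not they are unusual.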
Plot the anomalies and the percentile band:
```{python}
plt.figure(figsize=(10, 6))
plt.plot(data['x'], quantiles['elapsed'], label='Actual')
plt.fill_between(
    data['x'], quantiles['lower'], quantiles['upper'],
    color='grey', alpha=0.3, label='Quantiles 5% and 95%'
)
plt.scatter(anomalies['x'], anomalies['elapsed'], color='red', label='Anomalies')
plt.title('Isolation Forest with quantiles 5% and 95%')
plt.xlabel('# Execution')
plt.ylabel('Elapsed time (s)')
plt.legend()
plt.show()
```

jrosell commented Aug 23, 2023

python-anomalies

jrosell commented Aug 23, 2023

An R version using a GAM or an Isolation Forest is available here: https://gist.github.com/jrosell/959ca3160df1f2658531088b1e922708

jrosell commented Aug 23, 2023

elapsed = np.random.normal(size=200) + 20
elapsed = np.maximum(elapsed, 1)
data = pd.DataFrame({
    'x': np.arange(1, 201).astype(float),
    'elapsed': elapsed
})
q75 = data['elapsed'].quantile(0.75)
q25 = data['elapsed'].quantile(0.25)
IQR_1_5_lower = q25 - 1.5*(q75-q25)
IQR_1_5_upper = q75 + 1.5*(q75-q25)
IQR_3_lower = q25 - 3*(q75-q25)
IQR_3_upper = q75 + 3*(q75-q25)
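
The snippet above computes the IQR fences but does not apply them. A possible way to use them, sketched here under the assumption that points beyond the 1.5×IQR fences count as mild outliers and points beyond the 3×IQR fences as extreme ones (the usual Tukey convention), is shown below. The seed and the names `iqr`, `mild` and `extreme` are illustrative and not from the original comment.

```python
import numpy as np
import pandas as pd

# Assumption: same seed and simulation as the gist, so the block runs on its own
np.random.seed(2)
elapsed = np.maximum(np.random.normal(size=200) + 20, 1)
data = pd.DataFrame({
    'x': np.arange(1, 201).astype(float),
    'elapsed': elapsed
})

q25 = data['elapsed'].quantile(0.25)
q75 = data['elapsed'].quantile(0.75)
iqr = q75 - q25

# Inner (1.5 x IQR) and outer (3 x IQR) fences
IQR_1_5_lower = q25 - 1.5 * iqr
IQR_1_5_upper = q75 + 1.5 * iqr
IQR_3_lower = q25 - 3 * iqr
IQR_3_upper = q75 + 3 * iqr

# Points beyond the inner fences (includes the extreme ones)
mild = data[(data['elapsed'] < IQR_1_5_lower) | (data['elapsed'] > IQR_1_5_upper)]
# Points beyond the outer fences
extreme = data[(data['elapsed'] < IQR_3_lower) | (data['elapsed'] > IQR_3_upper)]

print(f'{len(mild)} points beyond 1.5*IQR, {len(extreme)} beyond 3*IQR')
```

Unlike the 5%/95% band, these fences do not force a fixed share of points to be flagged, so clean data can yield few or no outliers.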
