jpcbertoldo/gsoc23.md

## gsoc23.md

    Anomaly Segmentation Metrics for Anomalib — GSoC 2023 @ OpenVINO

Google Summer of Code 2023
Project: Anomaly Segmentation Metrics for anomalib
Organization: OpenVINO (https://github.com/openvinotoolkit/openvino)
Mentors: @samet-akcay, @djdameln
Contributor: @jpcbertoldo
Repo: Anomalib github.com/openvinotoolkit/anomalib
Proposal: summerofcode.withgoogle.com/proposals/details/YkU9PckZ
Report: gist.github.com/jpcbertoldo/12553b7eaa97cfbf3e55bfd7d1cafe88
Slides: Anomaly Segmentation Metrics for Anomalib — GSoC 2023 @ OpenVINO (slides)
Demo: gist.github.com/jpcbertoldo/91c9d5bd83ac9cf6cb42ad872ca40306

Work references

New subpackage: anomalib.utils.metrics.perimg
Tutorial notebooks:

Introduction to Per-Image Overlap metric (latest version in the fork repo).
Comparing models with statistics-based approach (fork repo).

The contribution of this project was split into several Pull Requests (PRs), to be merged sequentially, to simplify the review process. Only one PR at a time is open in the official repository, so the others are waiting in my fork repository.
List of PRs:

Per-Image Overlap (PImO) openvinotoolkit/anomalib#1247
Area Under the Curve (AUC) boxplot openvinotoolkit/anomalib#1294 (fork PR)
Max FPR restriction (fork PR)
Log PImO (fork PR)
Save & Load (fork PR)
Model comparisons (fork PR)
Heatmaps (fork PR)

Refactors to do before merging the feature branch (renamings, etc): https://github.com/jpcbertoldo/anomalib/blob/metrics/refactors/src/anomalib/utils/metrics/perimg/.refactors

Part of the contribution is under review as of the end of 2023-08 and is expected to be released soon.


Contribution summary

Anomaly detection is like finding rare gems in a pile of pebbles ("recall") while, importantly, not giving attention to the pebbles ("low FPR"). Anomaly segmentation takes it a step further: one should not only spot unusual samples, but also outline them. To improve these methods, we need better ways to measure their performance. Our project focuses on enhancing these evaluation tools for more effective anomaly detection and segmentation.
Current anomaly segmentation metrics are no longer suitable because they are saturated: many models reach such high scores that the differences between them are not tangible. This project aims to address this issue by defining a stricter and more meaningful metric. Our contribution to the Anomalib library focused on defining such metrics and delivered the following features:


Novel Metric Integration: The Per-Image Overlap (PImO) metric and its corresponding Area Under the Curve (AUC), known as AUPImO, were introduced. Inspired by the Receiver Operating Characteristic (ROC) curve, the PImO curve was similarly designed with a "shared" False Positive Rate (FPR) on the X-axis and a per-image True Positive Rate (TPR) on the Y-axis, i.e. each image has its own curve. The AUPImO score is computed to gauge model performance (higher means better).


Distribution Analysis: The ability to analyze the distribution of AUPImO scores across images using boxplots was implemented. This feature allows for insights into the overall performance spread and statistics within the test set.


Curve and Boxplot Combination: The integration of PImO curves and AUPImO boxplots for representative samples was proposed. This combination allows for the visualization of [statistically] representative curves aligned with the boxplot statistics, providing a practical way of selecting image examples to qualitatively evaluate a model.


Shared FPR Bounds: PImO incorporates shared False Positive Rate (FPR) bounds (lower and upper): a valuable extension that introduces a different way of indexing anomaly score thresholds, using information exclusively from normal images. This approach gives a clearer meaning to maximum FPR levels and to the choice of anomaly score threshold ranges. Moreover, this concept finds practical use in normalizing heatmaps, contributing to a clearer representation of anomalies within their context.


Log PImO: To address certain limitations, the Log PImO curve was introduced, visualizing PImO curves with a logarithmic scale on the shared FPR axis (X-axis). AULogPImO (Area Under the Log PImO curve) was similarly defined, resulting in a metric less tolerant to high-FPR thresholds, focusing on low-FPR threshold choices instead; as a consequence, this helps to solve the lack of resolution in the AUC scores (using the linear scale, models easily achieve >= 90% scores).


Model Comparison Utilities: Tools were added for comparing multiple models using per-image metrics. Both parametric and non-parametric comparisons were implemented, along with statistical tests and explanatory plots, enabling robust model assessment.


Heatmap Normalization: Heatmap generation was enhanced by incorporating normalization bounds. These bounds, shared with the AUPImO bounds, contribute to more informative and visually meaningful heatmaps.
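
The curve and AUC definitions in the list above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code in `anomalib.utils.metrics.perimg`: the function names `pimo_curves` and `area_under_curve` are hypothetical, the shared FPR is assumed to be the fraction of pixels flagged across all normal images pooled together, and the FPR bounds are arbitrary example values.

```python
import numpy as np


def pimo_curves(score_maps, masks, fpr_bounds=(1e-3, 1.0), num_points=100):
    """Hypothetical helper: shared FPR grid, thresholds and per-image TPR curves.

    score_maps: float array (N, H, W) of anomaly scores.
    masks: bool array (N, H, W), True on anomalous pixels.
    Images with an empty mask are treated as normal images.
    """
    score_maps = np.asarray(score_maps, dtype=float)
    masks = np.asarray(masks, dtype=bool)
    is_normal = ~masks.any(axis=(1, 2))

    # Log-spaced shared FPR levels between the lower and upper bounds.
    lower, upper = fpr_bounds
    shared_fpr = np.geomspace(lower, upper, num_points)

    # Assumption: thresholds are indexed by the pooled pixels of normal images,
    # so the threshold at FPR level f is the (1 - f) quantile of those scores.
    normal_scores = score_maps[is_normal].ravel()
    thresholds = np.quantile(normal_scores, 1.0 - shared_fpr)

    # One TPR curve per anomalous image, evaluated at the shared thresholds.
    tprs = []
    for scores, mask in zip(score_maps[~is_normal], masks[~is_normal]):
        anomalous_scores = scores[mask]
        tprs.append((anomalous_scores[None, :] > thresholds[:, None]).mean(axis=1))
    return shared_fpr, thresholds, np.stack(tprs)


def area_under_curve(shared_fpr, tpr, log_scale=False):
    """Trapezoidal area under each curve, normalized to [0, 1]."""
    x = np.log(shared_fpr) if log_scale else shared_fpr
    area = np.sum(np.diff(x) * (tpr[..., 1:] + tpr[..., :-1]) / 2.0, axis=-1)
    return area / (x[-1] - x[0])


# Synthetic data: three normal images and two anomalous ones.
rng = np.random.default_rng(0)
score_maps = rng.random((5, 16, 16))
masks = np.zeros((5, 16, 16), dtype=bool)
masks[3, 4:8, 4:8] = True
masks[4, 2:6, 2:6] = True
score_maps[3][masks[3]] += 2.0  # image 3: anomaly clearly separated from normal scores

shared_fpr, thresholds, tpr = pimo_curves(score_maps, masks)
aupimo = area_under_curve(shared_fpr, tpr)                     # linear FPR axis
aulogpimo = area_under_curve(shared_fpr, tpr, log_scale=True)  # log FPR axis
```

Integrating over the logarithm of the FPR gives each decade of FPR the same weight, so low-FPR thresholds count for more than on the linear axis; in this sketch the poorly detected image (index 1 in the outputs) scores lower on the log scale than on the linear one.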


All in all, this contribution offers a suite of advanced and novel metrics, visualization tools, and model comparison utilities for anomaly segmentation tasks in Anomalib.
Per-Image Overlap (PImO) curves


PImO curves combined with boxplot statistics (image selection)
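
As a rough sketch of how images could be selected from boxplot statistics: compute the statistics of the per-image scores, then pick the image whose score is closest to each one. The helper name `boxplot_representatives` is hypothetical, and the whisker rule (1.5 times the interquartile range) is an assumption borrowed from the usual boxplot convention rather than something stated in this report.

```python
import numpy as np


def boxplot_representatives(scores):
    """Map each boxplot statistic to the index of the closest per-image score."""
    scores = np.asarray(scores, dtype=float)
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    iqr = q3 - q1

    # Whiskers: most extreme scores still within 1.5 * IQR of the box (assumed rule).
    inside = scores[(scores >= q1 - 1.5 * iqr) & (scores <= q3 + 1.5 * iqr)]
    statistics = {
        "whisker_low": inside.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside.max(),
    }
    return {
        name: int(np.argmin(np.abs(scores - value)))
        for name, value in statistics.items()
    }


# Example per-image scores; the first image is an outlier below the lower whisker.
example_scores = [0.10, 0.80, 0.85, 0.90, 0.95]
representatives = boxplot_representatives(example_scores)
```

The returned indices point at the images whose curves and heatmaps are worth inspecting for each statistic.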


Heatmap normalization
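
One way to read "normalization bounds shared with the AUPImO bounds" is sketched below. It assumes that the score threshold at the upper FPR bound is mapped to 0 and the threshold at the lower FPR bound to 1, with clipping outside that range. The mapping, the helper name `normalize_heatmap` and the bound values are all assumptions made for illustration; the actual implementation may differ.

```python
import numpy as np


def normalize_heatmap(score_map, normal_scores, fpr_bounds=(1e-3, 1e-1)):
    """Rescale anomaly scores to [0, 1] using thresholds taken from normal images."""
    lower_fpr, upper_fpr = fpr_bounds

    # Permissive threshold: flags a fraction `upper_fpr` of normal pixels.
    t_min = np.quantile(normal_scores, 1.0 - upper_fpr)
    # Strict threshold: flags only a fraction `lower_fpr` of normal pixels.
    t_max = np.quantile(normal_scores, 1.0 - lower_fpr)

    # Note: this sketch does not handle the degenerate case t_max == t_min.
    scaled = (np.asarray(score_map, dtype=float) - t_min) / (t_max - t_min)
    return np.clip(scaled, 0.0, 1.0)


# Synthetic example: normal scores spread evenly over [0, 1].
normal_scores = np.linspace(0.0, 1.0, 1001)
score_map = np.array([[0.5, 0.95], [0.999, 2.0]])
heatmap = normalize_heatmap(score_map, normal_scores)
```

With such a mapping, colors carry the same meaning across models: a saturated pixel is one that a very strict, low-FPR threshold would still flag.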


Statistical test-based model comparison

Example of two PaDiM models using different backbones: while their (pixel) AUROC scores differ by less than 0.1%, a paired t-test on their AULogPImO values shows with very high confidence that one of them does perform better.
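
A comparison of this kind can be sketched with SciPy, assuming both models were scored on the same test images so that the per-image values are paired. The helper `compare_models` and the synthetic scores are hypothetical; `scipy.stats.ttest_rel` is the parametric paired t-test and `scipy.stats.wilcoxon` a non-parametric alternative.

```python
import numpy as np
from scipy import stats


def compare_models(scores_a, scores_b):
    """Paired comparison of per-image scores from two models on the same images."""
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    t_result = stats.ttest_rel(scores_a, scores_b)  # parametric paired t-test
    w_result = stats.wilcoxon(scores_a, scores_b)   # non-parametric signed-rank test
    return {
        "mean_diff": float(np.mean(scores_a - scores_b)),
        "ttest_pvalue": float(t_result.pvalue),
        "wilcoxon_pvalue": float(w_result.pvalue),
    }


# Synthetic per-image scores: model A is slightly but consistently better than model B.
rng = np.random.default_rng(0)
scores_b = rng.uniform(0.3, 0.9, size=50)
scores_a = np.clip(scores_b + rng.normal(0.03, 0.02, size=50), 0.0, 1.0)
comparison = compare_models(scores_a, scores_b)
```

Because the test works on per-image differences, a small but consistent gain can be significant even when a single set-level score barely changes.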