
"Why Should I Trust You?": Explaining the Predictions of Any Classifier

Introduction

  • The paper introduces a novel technique to explain the predictions of any classifier in an interpretable and faithful manner.
  • It also proposes a method to explain models by obtaining representative individual predictions and their explanations.
  • Link to the paper
  • Demo

Desired Characteristics for Explanations

  • Interpretable

    • Take into account user limitations.
    • Since the features used by a machine learning model need not be interpretable themselves, the input to the explanation may have to be different from the input to the model.
  • Local Fidelity

    • The explanation should be locally faithful, i.e., it should correspond to how the model behaves in the vicinity of the instance being predicted.
  • Model Agnostic

    • Treat the original, given model as a black box.
  • Global Perspective

    • Select a few predictions such that they represent the entire model.

LIME

  • Local Interpretable Model-agnostic Explanations

  • Interpretable Data Representations

    • For text classification, an interpretable representation could be a binary vector indicating the presence or absence of a word (or bag of words).
    • For image classification, an interpretable representation may be a binary vector indicating the "presence" of a super-pixel.
    • x ∈ R^d is the original representation of an instance being explained, while x' ∈ {0, 1}^d' denotes a binary vector for its interpretable representation.
  • Fidelity-Interpretability Trade-off

    • Define an explanation as a model g ∈ G, where G is a class of potentially interpretable models and g acts over the absence/presence of the interpretable components.
    • Define Ω(g) as a measure of complexity (as opposed to interpretability) of the explanation g ∈ G.
    • Define f to be the model being explained.
    • Define πx(z) as a proximity measure between an instance z to x (to define locality around x).
    • Define L(f, g, πx) as a measure of how unfaithful g is in approximating f in the locality defined by πx.
    • To ensure both interpretability and local fidelity, minimise L(f, g, πx) while keeping Ω(g) low enough for g to remain interpretable (the resulting objective is written out after this list).
  • Sampling for Local Exploration

    • Since f is treated as a black box, the local behaviour of L(f, g, πx) is approximated by drawing samples weighted by πx.
    • Given an instance x', generate a dataset of perturbed samples Z around it and optimise the LIME loss L(f, g, πx) over this dataset.
    • The paper proposes to use sparse linear explanations with the locally weighted square loss as L; such linear explanations may approximate highly non-linear models poorly, even locally (a minimal sketch of this procedure follows the list).
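
In the paper's notation, the pieces above combine into a single optimisation: choose the explanation g that is most locally faithful while staying simple enough to interpret. The equations below restate the objective, the locally weighted square loss over perturbed samples z (with interpretable versions z'), and an exponential proximity kernel; D is a distance function (e.g. cosine distance for text) and σ a width parameter.

```latex
% Explanation chosen by LIME: most locally faithful model that is still simple
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)

% Locally weighted square loss over perturbed samples z (interpretable version z')
\mathcal{L}(f, g, \pi_x) = \sum_{z, z' \in \mathcal{Z}} \pi_x(z)\, \big( f(z) - g(z') \big)^2

% One choice of proximity measure: an exponential kernel on a distance D
\pi_x(z) = \exp\!\big( -D(x, z)^2 / \sigma^2 \big)
```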
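The sketch below is a minimal illustration of this sampling-and-fitting procedure for a text classifier: perturb the binary word-presence vector, weight the perturbations with an exponential kernel, and fit a weighted linear model whose largest coefficients serve as the explanation. `predict_fn`, the kernel width, and the use of ridge regression with top-K truncation (instead of the paper's K-LASSO) are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal LIME-style sketch for a text classifier with a binary bag-of-words
# interpretable representation. `predict_fn` is a hypothetical wrapper around the
# black-box model f; ridge regression + top-K truncation stands in for K-LASSO.
import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(x_prime, predict_fn, num_samples=5000, num_features=6, kernel_width=0.75):
    """x_prime: binary vector in {0, 1}^d' marking which words of the instance are present.
    predict_fn: maps a batch of such binary vectors (converted back to raw text upstream)
    to the black-box probability f(z) of the class being explained."""
    x_prime = np.asarray(x_prime)
    d = len(x_prime)
    rng = np.random.default_rng(0)

    # 1. Sample perturbations z' around x' by randomly switching words off.
    Z = rng.integers(0, 2, size=(num_samples, d)) * x_prime
    Z[0] = x_prime                          # keep the unperturbed instance in the sample

    # 2. Query the black box on the perturbed samples.
    y = predict_fn(Z)

    # 3. Weight each sample by proximity pi_x(z) = exp(-D(x, z)^2 / sigma^2).
    distances = np.linalg.norm(Z - x_prime, axis=1) / np.sqrt(d)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # 4. Fit a locally weighted linear model g and keep its K largest coefficients
    #    as the sparse explanation.
    g = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    top_k = np.argsort(np.abs(g.coef_))[::-1][:num_features]
    return [(int(j), float(g.coef_[j])) for j in top_k]
```

Each returned pair is (feature index, local weight); mapping the indices back to words gives the human-readable explanation.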

Submodular Pick for Explaining Models

  • Global understanding of the model by explaining a set of individual instances.
  • Define B to be the number of explanations to be generated.
  • Pick Step - the task of selecting B instances for the user to inspect.
  • Aim to obtain non-redundant explanations that represent how the model behaves globally.
  • Given a matrix of n explanations over d features (the explanation matrix), score each feature globally such that features which explain many instances receive higher scores.
  • When selecting instances, avoid those with similar explanations and greedily maximise coverage of the important features (see the sketch after this list).
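
A minimal sketch of this greedy pick step, assuming an explanation matrix W of shape n × d (LIME feature weights, one row per explained instance) is already available. The square-root global importance and the coverage-gain greedy loop follow the paper's description; the helper name and NumPy usage are illustrative, not the authors' implementation.

```python
# Minimal sketch of the submodular pick (SP-LIME) step over an explanation matrix W.
import numpy as np

def submodular_pick(W, budget):
    """Greedily pick `budget` instance indices that maximise coverage of important features."""
    W = np.abs(np.asarray(W, dtype=float))
    importance = np.sqrt(W.sum(axis=0))           # global importance of each feature
    covered = np.zeros(W.shape[1], dtype=bool)    # features covered by instances picked so far
    picked = []
    for _ in range(budget):
        # Marginal coverage gain of each not-yet-picked instance.
        gains = np.array([
            importance[(~covered) & (W[i] > 0)].sum() if i not in picked else -1.0
            for i in range(W.shape[0])
        ])
        best = int(np.argmax(gains))
        picked.append(best)
        covered |= W[best] > 0                    # its non-zero features are now covered
    return picked
```

For example, `submodular_pick(W, budget=5)` returns the indices of five instances whose explanations together touch the highest-importance features with little redundancy.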

Conclusion

  • The paper evaluates its approach on a series of simulated and human-in-the-loop tasks to check:
    • Are explanations faithful to the model?
    • Can the predictions be trusted?
    • Can the model be trusted?
    • Can users select the best classifier given the explanations?
    • Can users (non-experts) improve the classifier by means of feature selection?
    • Can explanations lead to insights about the model itself?

Future Work

  • Need to define a way of finding (and ranking) compatible features across images for SP-LIME.
  • It could be difficult to define the relevant features for model explanation in certain cases - for example, single words may not be good features in sentiment analysis models.