Introduction

It is important to understand how and why a machine learning model behaves the way it does when making predictions. As NLP models grow bigger and more complex, it becomes imperative to attribute output predictions to precise, distinct signals in the input data, especially in production environments. Model interpretability helps answer questions like:

  1. What kind of examples does my model perform poorly on?

  2. Why did my model make this prediction? Can this prediction be attributed to adversarial behavior, or to undesirable priors in the training set?

  3. Does my model behave consistently if I change things like textual style, verb tense, or pronoun gender?
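One common way to start answering question 2 is input attribution: scoring how much each input token contributed to a prediction. The sketch below (not from the original gist) shows gradient-x-input saliency over token embeddings with plain PyTorch; the toy model, vocabulary size, and token ids are hypothetical placeholders standing in for a real NLP classifier.

```python
# Minimal sketch of gradient-x-input attribution, assuming a toy classifier.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "NLP" classifier: embed tokens, mean-pool, apply a linear head.
vocab_size, embed_dim, num_classes = 100, 16, 2
embedding = nn.Embedding(vocab_size, embed_dim)
head = nn.Linear(embed_dim, num_classes)

token_ids = torch.tensor([[5, 17, 42, 8]])    # hypothetical token ids
embeds = embedding(token_ids)                 # (1, seq_len, embed_dim)
embeds.retain_grad()                          # keep gradients w.r.t. embeddings

logits = head(embeds.mean(dim=1))             # (1, num_classes)
pred_class = logits.argmax(dim=-1).item()

# Backpropagate the predicted-class logit to the input embeddings.
logits[0, pred_class].backward()

# Gradient-x-input saliency: one attribution score per token.
saliency = (embeds.grad * embeds).sum(dim=-1)  # (1, seq_len)
print(saliency)
```

Tokens with larger-magnitude scores contributed more to the predicted logit; more robust variants of this idea (e.g. integrated gradients) average gradients along a path from a baseline input rather than using a single gradient.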