Making ML transparent and explainable

The world is rapidly adopting machine learning solutions in ever more sectors, including some heavily regulated ones. At the same time, fear of using algorithms in settings where they could have harmful effects is growing, and regulators have started taking a closer look at how companies and governments apply machine learning solutions and how these could harm their users.

A particular concern is often that these algorithms are 'black box' solutions that make it impossible for anyone to understand how they really work and how their predictions are generated. In recent years, algorithmic advances such as SHAP (Lundberg et al., 2017) have made this largely untrue: for most algorithms, the inner workings of a model and its individual predictions can now readily be explained.

However, the technical know-how and manual work needed to generate these explanations still form a barrier to making them accessible not just to data scientists themselves, but also to other stakeholders: management, staff, external regulators and, finally, customers.

Management and other stakeholders ultimately need to decide whether to go forward with a proposed machine learning solution, and so need to feel confident that they understand what goes into the model and how predictions are generated. External regulators need to be able to come in and assess whether the model violates any local or European regulations, such as the GDPR. And under the GDPR, consumers have a right to an explanation whenever an automated solution affects them in some material way. Finally, many real-world machine learning deployments involve human-in-the-loop solutions, where a human decision-maker can choose to overrule the algorithm. In such cases it is important that this person understands how the model works in order to judge when it should be overruled.

With the explainerdashboard library, developed by Oege Dijk, one of our Fourkind (and ThoughtWorks) data scientists, it is easy to build rich interactive dashboards that allow even non-technical stakeholders to explore the workings and predictions of machine learning models.

Introducing SHAP

With the introduction of SHapley Additive exPlanations (SHAP) (Lundberg et al., 2017; 2020) it became possible to answer, for each feature input to each prediction of almost any machine learning algorithm, the question: "What was the contribution of this feature to the final prediction of the model?" This allows us to calculate the contribution of each feature (column) to the final prediction for each row of data (e.g. a specific customer). The contributions of the features are guaranteed to add up to the final prediction.
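In formula form (following Lundberg et al., 2017), the prediction for a single row x decomposes into a base value plus one SHAP value per feature:

```latex
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i
```

where \phi_0 is the expected model output over the background data, \phi_i is the SHAP value of feature i for this particular row, and M is the number of features.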

Even though the implementation details differ between algorithms (e.g. scikit-learn RandomForests, xgboost, lightgbm, PyTorch, TensorFlow, etc.), with the shap library generating SHAP values is now really straightforward for a data scientist:

https://gist.github.com/6a1c157c019f7eb8afecf1382d7a8cad
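As a rough illustration of what such code looks like (a minimal sketch, assuming a preprocessed Titanic dataset and a scikit-learn RandomForestClassifier; the csv file name is hypothetical):

```python
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumes a preprocessed Titanic csv with numeric features and a 'Survived' column
df = pd.read_csv("titanic_clean.csv")
X, y = df.drop(columns=["Survived"]), df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# TreeExplainer computes fast, exact SHAP values for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
```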

In this example we use the canonical Kaggle Titanic dataset to predict the likelihood of particular passengers surviving the sinking of the Titanic. We can now see what drives the prediction for the first passenger in our test set. In this case, the fact that the passenger was a third-class male and that the ticket fare was low seems to have contributed to a low expected chance of survival:

https://gist.github.com/0511785a56fe1ce3001af29d006e7e9b
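A sketch of the kind of call that produces such a plot (how you index the expected value and SHAP values for the positive class depends on your shap version):

```python
# Force plot for the first passenger in the test set, explaining the
# predicted probability of survival (class 1)
shap.initjs()
shap.force_plot(explainer.expected_value[1],
                shap_values[1][0, :],
                X_test.iloc[0, :])
```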

shap force plot

We can also look at the general effect of a feature such as 'Age' on predictions in our data set. Here we see that older people generally were predicted to have a lower chance of surviving:

https://gist.github.com/cb2e4541d03f9201a77a56c9d2f33ea1
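Again as a sketch, a SHAP dependence plot for 'Age' is essentially a one-liner with the shap library:

```python
# Plot the SHAP value of 'Age' against age itself for every row in the test set;
# the shap_values[1] indexing (positive class) is version dependent
shap.dependence_plot("Age", shap_values[1], X_test)
```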

shap dependence plot

Although the shap library is awesome, and the above implementation is extremely straightforward for any experienced data scientist, it is easy to forget that for most stakeholders inside or outside a company the above will seem like complete black magic. Where do you even type those magic incantations?

Introducing explainerdashboard

Your manager may not know how to cast Python spells, but probably does know how to use a browser. So instead you can use the explainerdashboard library to start a model explainer dashboard:

https://gist.github.com/12ab5783ab6fff75770b56e6ba59bd04
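In its simplest form this amounts to a few lines (a sketch re-using the model and test set from above):

```python
from explainerdashboard import ClassifierExplainer, ExplainerDashboard

# Wrap the fitted model plus test data in an explainer object,
# then launch the default dashboard (by default on port 8050)
explainer = ClassifierExplainer(model, X_test, y_test)
ExplainerDashboard(explainer).run()
```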

This will start a web server hosting an interactive dashboard that lets you perform all the investigations of the shap library, and more, simply by clicking around. All you need to do is point your manager to the right URL (e.g. http://123.456.789.1:8050, or wherever you are hosting your dashboard).

The dashboard includes a number of tabs showing different aspects of the model under investigation, such as feature importances, model performance metrics, individual predictions, what-if analysis, feature dependence and interactions, and individual decision trees:

explainerdashboard demo

Feature Importances

The first question you would want to answer about any algorithm is: what are the input features and how important are they to the model? With SHAP values we can calculate the mean absolute SHAP value for each feature, i.e. the average impact (positive or negative) that knowing that feature has on the predictions. Another way of measuring importance is permutation importance: how much worse does the model metric get when you intentionally shuffle the values of a particular column? The more important the feature, the bigger the impact shuffling it should have on model performance.

feature importances
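If you want to reproduce these numbers outside the dashboard, both measures take only a few lines (a sketch re-using the model and SHAP values from above; permutation importance here uses scikit-learn's implementation):

```python
import numpy as np
from sklearn.inspection import permutation_importance

# Mean absolute SHAP value per feature: the average impact on the prediction
mean_abs_shap = np.abs(shap_values[1]).mean(axis=0)
print(sorted(zip(X_test.columns, mean_abs_shap), key=lambda t: -t[1]))

# Permutation importance: the drop in model score when a column is shuffled
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
print(sorted(zip(X_test.columns, perm.importances_mean), key=lambda t: -t[1]))
```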

Model performance

The next question a stakeholder will of course want answered: so how good is the model anyway? As data scientists we know that there are many ways of measuring the performance of an algorithm, and many ways of displaying that performance. So in the dashboard we are a bit promiscuous and simply display almost all of them. (You can optionally hide any component, as we will see later.)

For example, for the above classification model you can see model metrics, a confusion matrix, a precision plot, a classification plot, ROC AUC and PR AUC curves, a lift curve and a cumulative precision curve. And you can adjust the cutoff value for each plot:

classifier stats

And for regression models you can see the goodness-of-fit, inspect residuals, and plot predictions and residuals against particular features:

regression stats

Individual predictions + 'What if' analysis

Under the GDPR, customers have a 'right to an explanation' whenever an automated algorithm makes a decision that affects them in a material way. It is therefore very important to be able to explain the individual predictions your model has produced. By selecting an index you get all the feature contributions, both as a waterfall plot and in table form.

contributions plots

You can also edit the feature input on the 'What if...' tab in order to investigate how small changes in the input would change the output:

what if

Feature Dependence

Besides being able to explain individual predictions, you also want to be able to explain in broad contours how feature values impact the predictions of your model. For example: does the predicted likelihood of survival go up or down with age? Is it higher for men than for women? This is called the feature dependence of the model, and it can give a lot of insight into how the model uses specific features to generate its predictions.

dependence plots

However, you can go one step further and investigate feature interactions as well. You can decompose the contribution of a feature into a direct effect and indirect effects. The direct effect is simply "how does knowing the value of this feature directly change the prediction?". Indirect effects occur when knowing a feature changes how you interpret the value of another feature. So, for example, knowing that a passenger was male may have a direct effect (a lower chance of survival), but also an indirect effect (a relatively higher chance if travelling third class). The total contribution is the sum of the direct and indirect effects. Looking at the indirect effects (interactions) can often give you valuable insights into the underlying patterns in your data that the model managed to discover and make use of.

interaction plots
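Under the hood these decompositions come from SHAP interaction values, which the shap library can compute directly for tree-based models (a sketch; the exact return shape depends on the shap version and model type):

```python
# For each row: a features-by-features matrix where the diagonal holds the
# direct (main) effects and the off-diagonal entries the pairwise interactions.
# Summing over one axis recovers the ordinary SHAP values.
shap_interaction_values = explainer.shap_interaction_values(X_test)
```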

Individual Decision Trees

Many of the most-used algorithms in applied machine learning are based on simple decision trees. Visualizing these decision trees and showing how they are aggregated into a final prediction can reduce the fear of the unknown that sometimes comes with such models. Although these algorithms have often been called 'black boxes', they are really just collections of simple decision trees!

For example in the case of Random Forests, the algorithm simply takes the average prediction of a number of decision trees:

individual trees
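For a scikit-learn RandomForestClassifier you can verify this averaging yourself (a quick sketch re-using the fitted model from above):

```python
import numpy as np

# The forest's predicted probability is the mean of its trees' predicted probabilities
tree_preds = np.stack([tree.predict_proba(X_test.values)[:, 1]
                       for tree in model.estimators_])
assert np.allclose(tree_preds.mean(axis=0), model.predict_proba(X_test)[:, 1])
```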

And you can visualize the individual decision trees, using the awesome dtreeviz package, right inside the dashboard:

individual trees

Low-code customizable dashboards

As you can see, the default dashboard is quite fully featured. Maybe even a bit too fully featured for some purposes. In order not to overwhelm your stakeholders, it probably makes sense to switch off some or most of the bells and whistles. Luckily, due to the modular nature of the library, this is quite straightforward: you can switch off and hide any tab, component or toggle by simply passing the appropriate flags to the dashboard.

So for example we can switch off the 'What if...' tab, hide the confusion matrix on the model performance tab, and hide the cutoff sliders:

https://gist.github.com/c6758d827489c70cf02b754e1331fc23
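This boils down to passing booleans to the dashboard constructor, roughly along these lines (the keyword names here are illustrative; check the explainerdashboard documentation for your version):

```python
ExplainerDashboard(explainer,
                   whatif=False,               # drop the 'What if...' tab entirely
                   hide_confusionmatrix=True,  # hide a single component on the performance tab
                   hide_cutoff=True            # hide the cutoff sliders
                   ).run()
```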

You can go even further and fully design your own dashboard layout. By re-using the various components of the dashboard, you only have to design a layout using dash bootstrap components (dbc), and ExplainerDashboard will figure out the rest:

https://gist.github.com/6ff115856b2a4ea69cf4b50b7b9b6989
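A rough sketch of that pattern: subclass ExplainerComponent, compose some of the built-in components into a dash bootstrap layout, and hand the class to ExplainerDashboard (component names and registration details may differ slightly between library versions):

```python
from dash import html
import dash_bootstrap_components as dbc
from explainerdashboard import ExplainerDashboard
from explainerdashboard.custom import (ExplainerComponent,
                                       ConfusionMatrixComponent,
                                       ShapContributionsGraphComponent)

class CustomDashboard(ExplainerComponent):
    def __init__(self, explainer, name=None, **kwargs):
        super().__init__(explainer, title="Custom Dashboard")
        # re-use two built-in components inside our own layout
        self.confusion = ConfusionMatrixComponent(explainer)
        self.contributions = ShapContributionsGraphComponent(explainer)

    def layout(self):
        return dbc.Container([
            dbc.Row([
                dbc.Col([html.H3("How well does the model perform?"),
                         self.confusion.layout()]),
                dbc.Col([html.H3("Why this particular prediction?"),
                         self.contributions.layout()]),
            ])
        ])

ExplainerDashboard(explainer, CustomDashboard).run()
```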

This gives you the flexibility to show exactly those components that you find most useful, and to surround them with further explanatory context:

custom dashboard

Hosting multiple dashboards in ExplainerHub

If you have a number of models in production for which you want to have explainer dashboards up and running, it is convenient to host them all in a single place. You can do that by passing multiple dashboards to an ExplainerHub:

https://gist.github.com/059060b33c07e7eb5d2a8e6f19175042
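A sketch of what that looks like (the second explainer is hypothetical, standing in for another model you have in production):

```python
from explainerdashboard import ExplainerDashboard, ExplainerHub

db1 = ExplainerDashboard(explainer, title="Titanic survival model")
db2 = ExplainerDashboard(other_explainer, title="Some other production model")  # hypothetical

hub = ExplainerHub([db1, db2], title="Model explainers")
hub.run()
```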

ExplainerHub

You can also manage user access, store dashboards to configuration .yaml files, and start dashboards from the command line.
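For example, persisting the hub to configuration files is a single call (a sketch; see the library documentation for the full set of config and CLI options):

```python
# Write a hub.yaml config file (the library can also dump the underlying
# explainers), so the hub can be restarted later without rebuilding everything
hub.to_yaml("hub.yaml")
```

After that, a command along the lines of `explainerhub run hub.yaml` starts it back up from the command line.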

Conclusion

Algorithmic breakthroughs in making models explainable are not enough by themselves. What is also needed are implementations that make the usage and workings of models transparent and understandable to stakeholders at all levels.

Being able to quickly build and deploy interactive dashboards that show the workings of machine learning models is an important step towards this goal.

Citations

Lundberg, S. M., & Lee, S. I. (2017, December). A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 4768-4777).

Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., ... & Lee, S. I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56-67.
