What does an interpretable RF visualization look like? Out-of-the-box 📦 RF implementations in R and Python compute variable importance over all trees, but how do we get there?
In other words, what would cumulative variable importance for an RF look like?
The randomForest R package (to the best of my knowledge) doesn't record the individual variable importance for each CART in the forest. Instead, it supplies the overall (summarized) importance via importance(rf_model).
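As a rough sketch of what that summarized output looks like for a regression forest on mtcars (the seed and ntree = 100 are arbitrary choices for illustration):

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
rf_model <- randomForest(mpg ~ ., data = mtcars, ntree = 100, importance = TRUE)

# One row per predictor; the columns are %IncMSE and IncNodePurity,
# each summarized over all 100 trees rather than reported per tree.
imp <- importance(rf_model)
print(dim(imp))
print(colnames(imp))
```

The matrix has one value per predictor and measure, so there is no tree dimension to animate over.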
Thus, instead of fitting an RF of n trees, I fit n RFs of 1 tree each and compute the cumulative %IncMSE. Then, I plot the forest tree-by-tree alongside the cumulative variable importance as the nth tree is added.
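The tree-by-tree idea can be sketched roughly as follows. This is not the full script: it assumes "cumulative" means a running mean of each single-tree forest's unscaled %IncMSE (scale = FALSE, since a one-tree forest has no across-tree spread to scale by), and n_trees = 25 and the seed are arbitrary.

```r
library(randomForest)

set.seed(42)   # arbitrary seed for this sketch
n_trees <- 25  # hypothetical number of trees to animate over

# Fit n_trees forests of one tree each and keep every tree's unscaled %IncMSE.
# Result: a matrix with one row per predictor and one column per tree.
per_tree <- sapply(seq_len(n_trees), function(i) {
  rf <- randomForest(mpg ~ ., data = mtcars, ntree = 1, importance = TRUE)
  importance(rf, type = 1, scale = FALSE)[, 1]
})

# Running mean across trees (assumed definition of "cumulative"):
# column k holds the average importance over trees 1..k.
cum_sum <- t(apply(per_tree, 1, cumsum))
cum_imp <- sweep(cum_sum, 2, seq_len(n_trees), "/")

# Importance after the final tree, sorted from most to least important.
print(sort(cum_imp[, n_trees], decreasing = TRUE))
```

Each column of cum_imp would correspond to one frame of the animation: the importance ranking as it stands once the kth tree has been added.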
The code below is a minimal example with mtcars. Start by cloning https://github.com/richpauloo/reprtree and adding the file path in line 19.
This script takes ~ 1 min to run on my personal computer. Unless your display is very large, you may need to expand the final animation.
If you think this is cool and useful enough to be an R package 📦, if you'd use it to make your RF models more interpretable, or want to work on it with me, please let me know on Twitter (@RichPauloo), or at my email: richpauloo at gmail dot com.
I'm looking into the randomForest source code to see if I can add an option to save the importance data of each CART model, so that I can build functions that take randomForest objects directly, rather than needing to re-run the models tree-by-tree.
Thanks for your interest!
Hi - great code, thanks for sharing.
One question: I'm having issues with plot.getTree. Is this part of the reprtree package? It could be my mistake, but I can't seem to run the function.
Thanks