Cumulative Variable Importance for Random Forest (RF) 🌲 🌳 Models
What does an interpretable RF visualization look like? In other words, what would cumulative variable importance for a RF look like?
Out of the box, the randomForest R package (to the best of my knowledge) doesn't record the individual variable importance for each CART in the forest. Instead, it supplies the overall (summarized) importance via importance().
Thus, instead of fitting a RF of n trees, I fit n RFs of 1 tree each and compute the cumulative %IncMSE. Then I plot the forest tree by tree alongside the cumulative variable importance as the nth tree is added.
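A minimal sketch of the tree-by-tree idea, not the full script: it assumes "cumulative" means the running mean of each tree's unscaled %IncMSE, and the forest size (n_trees), the seed, and the object names (per_tree, cum_imp) are placeholders I picked for illustration.

```r
library(randomForest)

set.seed(42)
n_trees <- 25  # hypothetical forest size, chosen only for illustration

# Fit n_trees forests of one tree each and keep each tree's %IncMSE.
# scale = FALSE returns the raw increase in MSE; with a single tree there is
# no across-tree standard deviation to scale by.
per_tree <- sapply(seq_len(n_trees), function(i) {
  rf <- randomForest(mpg ~ ., data = mtcars, ntree = 1, importance = TRUE)
  importance(rf, type = 1, scale = FALSE)[, 1]
})
# per_tree has one row per predictor and one column per tree.

# Running mean across trees (an assumed definition of "cumulative"):
# column k holds the average importance over trees 1..k.
cum_sum <- t(apply(per_tree, 1, cumsum))
cum_imp <- sweep(cum_sum, 2, seq_len(n_trees), "/")

# Importance after the last tree, largest first.
sort(cum_imp[, n_trees], decreasing = TRUE)
```

Each column of cum_imp can then be drawn as one frame of the animation, next to the tree that was just added.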
The code below is a minimal example with mtcars. Start by cloning https://github.com/richpauloo/reprtree and adding the file path in line 19.
This script takes ~ 1 min to run on my personal computer. Unless your display is very large, you may need to expand the final animation.
If you think this is cool and useful enough to be an
I'm looking into the randomForest source code to see if I can add an option to save the importance data of each CART model, so that I can build functions that take randomForest objects directly, rather than needing to re-run the models tree-by-tree.
Thanks for your interest!