Cumulative Variable Importance for Random Forest (RF) 🌲 🌳 Models
Motivation
What does an interpretable RF visualization look like out of the box?
In other words, what would a cumulative variable importance plot for an RF look like?
Approach
The randomForest R package (to the best of my knowledge) doesn't record the individual variable importance of each CART in the forest. Instead, it supplies only the overall (summarized) importance via importance(rf_model).
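For reference, here is a minimal sketch of what the summarized output looks like. It assumes the CRAN randomForest package and the built-in mtcars data with mpg as the response; the seed and ntree = 500 are arbitrary choices, not the post's settings. importance() returns one row per predictor, already aggregated over all trees, with no per-tree breakdown.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, for reproducibility only
# importance = TRUE turns on permutation importance (%IncMSE) for regression
rf_model <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# One row per predictor, summarized over all 500 trees
imp <- importance(rf_model)
imp[order(imp[, "%IncMSE"], decreasing = TRUE), ]
```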
Thus, instead of fitting one RF of n trees, I fit n RFs of one tree each and compute the cumulative %IncMSE. Then I plot the forest tree by tree, alongside the cumulative variable importance as the nth tree is added.
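The tree-by-tree loop can be sketched as follows. This is a minimal illustration, not the post's script. It rests on several assumptions:

- It uses the CRAN randomForest package and mtcars with mpg as the response.
- n_trees = 50 is an arbitrary value.
- "Cumulative %IncMSE" is taken to mean the running mean of the unscaled per-tree values.

The sketch plots only the importance curves, not the trees themselves.

```r
library(randomForest)

set.seed(1)
n_trees <- 50  # illustrative forest size (assumption, not the post's setting)

# Fit n_trees forests of one tree each and keep each tree's unscaled %IncMSE.
# scale = FALSE because a one-tree forest has no meaningful standard error.
per_tree_imp <- sapply(seq_len(n_trees), function(i) {
  rf_i <- randomForest(mpg ~ ., data = mtcars, ntree = 1, importance = TRUE)
  importance(rf_i, type = 1, scale = FALSE)[, 1]
})
# per_tree_imp: one row per predictor, one column per tree

# Cumulative importance after tree k = running mean over the first k trees
cum_imp <- t(apply(per_tree_imp, 1, cumsum))
cum_imp <- sweep(cum_imp, 2, seq_len(n_trees), "/")

# One line per predictor, traced as trees are added
cols <- seq_len(nrow(cum_imp))
matplot(t(cum_imp), type = "l", lty = 1, col = cols,
        xlab = "Trees added", ylab = "Cumulative mean %IncMSE (unscaled)")
legend("topright", legend = rownames(cum_imp), col = cols, lty = 1, cex = 0.7)
```

At every step, the running mean divides all predictors by the same k. It therefore ranks them exactly as a plain cumulative sum would, while keeping the y-axis on the scale of a single forest's unscaled importance.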
The code below is a minimal example using mtcars. Start by cloning https://github.com/richpauloo/reprtree and adding the file path on line 19 of the script.
This script takes about 1 minute to run on my personal computer. Unless your display is very large, you may need to enlarge the final animation.
Next Steps
If you think this is cool and useful enough to be an R package: I'm looking into the randomForest source code to see whether I can add an option to save the importance data of each CART model. That would let me build functions that take randomForest objects directly, rather than needing to re-run the models tree by tree.
Thanks for your interest!
Hi, great code, thanks for sharing.
One question: I'm having issues with plot.getTree. Is this part of the reprtree package? It could be my mistake, but I can't seem to run the function.
Thanks