This post documents, for quick reference, a very simple strategy for mitigating a possible problem with GridSearchCV in sklearn. What happens if your process crashes in the middle of your grid search? You can lose everything, even hours of tuning. The solution described here can be set up in under a minute.
The idea, suggested in this StackOverflow post, is to:
- Use a verbose option, so that sklearn prints the performance of each model after a fold finishes.
- Direct that output to a persistent file on disk.
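As a rough illustration of these two steps, here is a minimal sketch. The dataset, the estimator, the parameter grid, and the log file name `grid_search.log` are all assumptions chosen for the example, not part of the original suggestion. It assumes a recent sklearn version, where `verbose=3` includes the fold score in each progress line, and it keeps `n_jobs=1` so that the progress lines are printed by the main process and can be captured by `redirect_stdout`.

```python
from contextlib import redirect_stdout
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

LOG_PATH = Path("grid_search.log")  # hypothetical file name

# Small illustrative problem; swap in your own data, estimator, and grid.
X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# verbose=3 asks sklearn to print a line per finished fold.
search = GridSearchCV(SVC(), param_grid, cv=3, verbose=3, n_jobs=1)

# buffering=1 requests line buffering, so each progress line is flushed
# to disk as soon as it is written instead of waiting for the file to close.
# Mode "a" appends, so a rerun does not wipe the earlier log.
with open(LOG_PATH, "a", buffering=1) as log_file, redirect_stdout(log_file):
    search.fit(X, y)

print(search.best_params_)
```

If the search runs with parallel workers, output printed by the worker processes may not pass through `redirect_stdout`; redirecting the whole script's output from the shell is another way to get the same log.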
This solution does not save all the information that is returned after fit() completes. Moreover, it does not provide a way to jump right back into the grid search where the program crashed; it relies on setting up a new search to complete the grid. However, I think we can agree it is much better than losing hours of compute, and it has tremendous value for the minimal amount of work.
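
One possible way to set up that follow-up search is sketched below. The grid and the `already_done` list are hypothetical; in practice the finished combinations would be read off the log by hand, counting only those for which every fold was reported. The sketch uses `ParameterGrid` to enumerate the full grid and relies on the fact that GridSearchCV accepts a list of grids.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical full grid from the interrupted search.
full_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Hypothetical: combinations whose folds all appear in the log.
already_done = [
    {"C": 0.1, "kernel": "linear"},
    {"C": 0.1, "kernel": "rbf"},
]

# Keep only the combinations that still need to be evaluated.
remaining = [p for p in ParameterGrid(full_grid) if p not in already_done]

# GridSearchCV expects each parameter value to be a list, so wrap
# every remaining combination as its own single-point grid.
remaining_grid = [{name: [value] for name, value in p.items()} for p in remaining]
```

Passing `remaining_grid` as the `param_grid` of a new GridSearchCV then evaluates only the combinations that were not finished; the results of the two runs still have to be compared manually.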