@simonpcouch and I have been brainstorming about how and where to specify and create the dataset used to estimate a post-processor, currently dubbed the "potato set". Two things would need to be specified:
- The proportion of the data used for estimation (preprocessor, model, post-processor) that should be held back specifically for estimating the post-processor.
- The method for splitting that data. This may need to be a time-based or grouped split rather than a random one. When resampling a workflow, it should most likely be the same method that was used to make the resamples.
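
To make the two choices concrete, here is a rough sketch of how they could be expressed with functions that already exist in rsample (a recent version is needed for `group_initial_split()`). This is only an illustration, not a proposal for the final interface: the data frame `dat`, the `id` column, and the `potato_prop` value are all made up for the example.

```r
library(rsample)

set.seed(1)

# Hypothetical data standing in for whatever is available for estimation
# (the training set, or the analysis set within a resample).
dat <- data.frame(
  id = rep(1:20, each = 5),
  x  = rnorm(100),
  y  = rnorm(100)
)

# Assumed proportion held back for the post-processor (the "potato set").
potato_prop <- 0.2

# Random split: the larger part estimates the preprocessor and model.
split_random <- initial_split(dat, prop = 1 - potato_prop)

# Time-based split: rows are assumed to be ordered in time, so the
# last rows are the ones held back.
split_time <- initial_time_split(dat, prop = 1 - potato_prop)

# Grouped split: all rows sharing an `id` land on the same side.
split_group <- group_initial_split(dat, group = id, prop = 1 - potato_prop)

fit_data    <- training(split_random)  # preprocessor + model
potato_data <- testing(split_random)   # post-processor
```

In a resampling context, the same idea would apply to each analysis set, with the split function chosen to match the one that produced the resamples.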