- What is the problem? Provide formal and informal definitions.
- Why does the problem need to be solved? Motivation, benefits, how it will be used.
- How would I solve the problem? Describe how the problem would be solved manually to flush out domain knowledge.
- Data Selection. Availability, what is missing, what can be removed.
- Data Preprocessing. Organize selected data by formatting, cleaning and sampling.
- Data Transformation. Feature engineering using scaling, attribute decomposition and attribute aggregation.
- Data visualizations, such as histograms.
- Test harness with default values.
- Run a family of algorithms across all the transformed and scaled versions of the dataset.
- View comparisons with box plots.
- Algorithm Tuning: discovering the best models in model parameter space. This may include hyperparameter optimization with additional helper services.
- Ensemble Methods: where the predictions made by multiple models are combined.
- Feature Engineering: where the attribute decomposition and aggregation seen in data preparation is tested further.
- Context (Why): how the problem definition arose in the first place.
- Problem (Question): describe the problem as a question.
- Solution (Answer): describe the answer to the question in the previous step.
- Findings: bulleted lists of discoveries you made along the way that interest the audience. These may include discoveries in the data, methods that did or did not work, or the model performance benefits you observed.
- Limitations: describe where the model does not work.
- Conclusions (Why+Question+Answer)
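
The test-harness and spot-check items in the list above (default values, a family of algorithms, raw and scaled versions of the dataset) can be sketched roughly as follows. This is a minimal illustration in plain Python, not a prescribed implementation: the toy dataset, the two stand-in algorithms (`majority_class`, `one_nearest_neighbor`), the `with_min_max_scaling` wrapper and the `k_fold_accuracy` helper are all hypothetical names introduced here. In practice a library such as scikit-learn would likely take their place.

```python
import random
from collections import Counter
from statistics import mean


def majority_class(train_X, train_y):
    """Baseline: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def one_nearest_neighbor(train_X, train_y):
    """Predict the label of the closest training row (squared Euclidean distance)."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_X]
        return train_y[dists.index(min(dists))]
    return predict


def with_min_max_scaling(fit):
    """Wrap a fit function so features are rescaled using training-fold ranges only."""
    def fit_scaled(train_X, train_y):
        cols = list(zip(*train_X))
        lows = [min(c) for c in cols]
        spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero

        def scale(x):
            return [(v - lo) / sp for v, lo, sp in zip(x, lows, spans)]

        predict = fit([scale(row) for row in train_X], train_y)
        return lambda x: predict(scale(x))
    return fit_scaled


def k_fold_accuracy(fit, X, y, k=5, seed=0):
    """Test harness: return one accuracy score per fold of k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return scores


def make_toy_data(n=100, seed=1):
    """Hypothetical data: feature 0 carries the signal, feature 1 is large-scale noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + rng.gauss(0, 0.3), rng.uniform(0, 1000)])
        y.append(label)
    return X, y


X, y = make_toy_data()

# Every algorithm is run with default settings on each version of the data.
candidates = {
    "majority (raw)": majority_class,
    "1-NN (raw)": one_nearest_neighbor,
    "1-NN (min-max scaled)": with_min_max_scaling(one_nearest_neighbor),
}
results = {name: k_fold_accuracy(fit, X, y) for name, fit in candidates.items()}

for name, scores in results.items():
    print(f"{name:24s} mean={mean(scores):.2f} "
          f"min={min(scores):.2f} max={max(scores):.2f}")
```

Each entry in `results` is a list of per-fold scores, which is the shape a box plot comparison would take as input. The scaling is fitted inside each training fold rather than on the full dataset, so the held-out fold does not influence the transformation.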