- Reproducibility:
  - code files: under version control, code review
  - data: data pipeline or snapshots
  - environment: venv/conda/docker
  - models: training pipeline or pickled models, saved hyper-parameters and metrics
  - experiment: tracking, report
- Maintainability:
  - code: modularity, code review, documentation, logging
  - data: data quality checks, format docs, metadata, data versioning
  - environment: venv/conda, requirements.txt
  - models: hyper-parameters as configuration, model versioning
  - experiment: code/data/env/models comparison using its artifacts, changelog
- Security and Privacy:
  - No data outside DMZ.
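The "saved hyper-parameters and metrics" item can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: the helper names `save_run_record` and `load_run_record`, the file name `run.json`, and the record fields are hypothetical. Only the idea of storing parameters and metrics next to the other experiment artifacts comes from the list above.

```python
import json
from pathlib import Path

RECORD_NAME = "run.json"  # hypothetical file name, not part of the template


def save_run_record(experiment_dir, hyper_params, metrics):
    """Store hyper-parameters and metrics together so runs can be compared later."""
    experiment_dir = Path(experiment_dir)
    experiment_dir.mkdir(parents=True, exist_ok=True)
    record = {"hyper_params": hyper_params, "metrics": metrics}
    path = experiment_dir / RECORD_NAME
    # sort_keys keeps the file stable, so two runs diff cleanly
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path


def load_run_record(experiment_dir):
    """Read back the record written by save_run_record."""
    return json.loads((Path(experiment_dir) / RECORD_NAME).read_text())
```

Because the record is plain JSON, it can be compared across experiment versions with ordinary diff tools.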
Directories:
|-- src/
| |-- core/ <- Core functions and utils
| | |-- abstracts.(py|R)
| | |-- configuration.(py|R)
| | |-- experiment.(py|R)
| | |-- logging.(py|R)
| | |-- ...
| | |-- utils.(py|R)
| |-- training/
| | |-- model.(py|R) <- Model definition
| | |-- preprocessing.(py|R) <- Preprocessing functions
| | |-- ...
| | |-- utils.(py|R)
| |-- __init__.py
| |-- 1_load_data.(py|R) <- Data loading pipeline
| |-- 2_preprocessing.(py|R) <- Data preprocessing pipeline
| |-- 2.1_hypothesis_1.ipynb <- Hypothesis testing and data exploration notebook
| |-- 2.2_hypothesis_2.ipynb
| |-- 3_feature_engineering.(py|R) <- Feature engineering pipeline
| |-- 4_model_training.(py|R) <- Model training pipeline, e.g. hyper-params optimization
| |-- 5_model_evaluation.(py|R) <- Model evaluation pipeline
| |-- ...
| |-- config.yml
| |-- config-(dev|release).yml
| |-- secrets.yml
| |-- secrets-(dev|release).yml
|-- data/ <- Data directory (not under version control; stored in S3)
| |-- {data_version}/ <- Raw data
|-- experiments/ <- Experiments artifacts, outputs and temp files
| |-- {experiment_version}/
| | |-- cache/ <- Cache for different experiment stages
| | |-- output/ <- Validation dataset, test dataset, hyper-opt artifacts, plots
| | |-- models/ or model.pkl <- Final model (or model ensemble)
| | |-- report.md <- Manual report
| | |-- changelog <- Automated report
|-- logs/
| |-- {experiment_name}_{stage_name}_{timestamp}.log
|-- tests/
| |-- unit/
| |-- integration/
| |-- e2e/
|-- docs/
|-- labs/ <- Jupyter notebooks and other experiments
|-- requirements.txt
|-- requirements-dev.txt
|-- Dockerfile
|-- Dockerfile.release
|-- .dockerignore
|-- .gitignore
|-- .github/workflows/
| |-- build.yml
| |-- release.yml
|-- run.(sh|ps1)
|-- README.md
|-- LICENSE
|-- CHANGELOG
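
The log naming pattern `{experiment_name}_{stage_name}_{timestamp}.log` under `logs/` could be produced by a small helper like the one below. This is a sketch only: the function names `log_file_path` and `stage_logger`, the timestamp format, and the log line format are assumptions, since the tree fixes only the file name pattern and the directory.

```python
import logging
from datetime import datetime
from pathlib import Path


def log_file_path(experiment_name, stage_name, now=None, logs_dir="logs"):
    """Build logs/{experiment_name}_{stage_name}_{timestamp}.log."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%dT%H%M%S")  # assumed timestamp format
    return Path(logs_dir) / f"{experiment_name}_{stage_name}_{timestamp}.log"


def stage_logger(experiment_name, stage_name, logs_dir="logs"):
    """Return a logger that writes to the stage's own log file.

    Call once per stage: each call attaches a new file handler.
    """
    path = log_file_path(experiment_name, stage_name, logs_dir=logs_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"{experiment_name}.{stage_name}")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

A stage script such as `4_model_training` might call `stage_logger("exp1", "4_model_training")` at start-up. The experiment name `exp1` here is illustrative.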