Title: Luigi Pipelines for Academia (Part 1) Date: 2017-01-29 Category: Articles
I'm excited to share with you a tool I have been using to streamline the aspects of research that require processing and performing computations on data. I'm sure there are a lot of researchers who, like me, have less than ideal data pipelines.
My previous setup looked something like this. I would keep the raw data untouched in a folder called data. Each level of data processing (e.g. reformatting, cleaning, fitting a model, cross-validating, plotting) had its own script or collection of scripts and would write its output to a folder prefixed by a number. The directory structure might look like this:
Experiment 1