Using Massachusetts school enrollment data as an example:
- Loop through a list of years, downloading a spreadsheet for each one.
- Each spreadsheet is in a weird Excel-as-HTML format, so I need to process each one with a Python script to convert it to CSV.
- I only want Boston schools, not the whole state, so I need to filter each file using csvgrep.
- Run the resulting files through another Python script to load them into a database using dataset.
- Export aggregates and query results on the combined data using datafreeze.
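The steps above can be sketched as a small, stdlib-only Python pipeline. This is not the author's code, and several tools are swapped out:

- sqlite3 stands in for dataset.
- A plain filter function stands in for csvgrep.
- An SQL aggregate stands in for a datafreeze export.

The column names School, District and Total, and the url_template argument to download(), are hypothetical. The post does not give the real headers or download URLs.

```python
import csv
import io
import sqlite3
import urllib.request
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table row in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td>/<th> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def download(year, url_template):
    # Hypothetical: the real download URL pattern is not given in the post.
    with urllib.request.urlopen(url_template.format(year=year)) as resp:
        return resp.read().decode("utf-8", errors="replace")


def html_to_csv(html_text):
    # Convert the Excel-as-HTML table into CSV text.
    parser = TableParser()
    parser.feed(html_text)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


def filter_rows(csv_text, column, value):
    # Stand-in for the csvgrep step: keep rows where `column` exactly equals `value`.
    return [row for row in csv.DictReader(io.StringIO(csv_text)) if row[column] == value]


def load(conn, year, rows):
    # Stand-in for dataset: append one year's rows to a SQLite table.
    # The School/District/Total column names are hypothetical.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enrollment "
        "(year INTEGER, school TEXT, district TEXT, total INTEGER)"
    )
    conn.executemany(
        "INSERT INTO enrollment VALUES (?, ?, ?, ?)",
        [(year, r["School"], r["District"], int(r["Total"])) for r in rows],
    )


def totals_by_year(conn):
    # Stand-in for a datafreeze export: one aggregate over all loaded years.
    return conn.execute(
        "SELECT year, SUM(total) FROM enrollment GROUP BY year ORDER BY year"
    ).fetchall()
```

To run it end to end, follow these steps for each year:

1. Call download(year, url_template).
2. Pass the result through html_to_csv.
3. Pass that through filter_rows(..., "District", "Boston").
4. Call load() with the filtered rows, using one shared sqlite3 connection.

After the loop, call totals_by_year() on that connection to get the aggregate.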
Jeff Larson's answer: https://gist.github.com/thejefflarson/1221ad1984eba794fd9b. Essentially, use make to decide which Python scripts to run, based on which files exist.
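The decision make makes can be sketched in Python. A target is rebuilt only when it is missing or older than one of its prerequisites, and its prerequisites are brought up to date first.

This is a simplified illustration of that rule, not the contents of the linked gist. The rules dictionary and the file names used with it are hypothetical.

```python
import os


def needs_rebuild(target, prerequisites):
    # make's core rule: rebuild if the target is missing
    # or if any prerequisite was modified more recently than the target.
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def build(target, rules, log):
    """Recursively bring `target` up to date.

    `rules` (hypothetical structure) maps a target path to a pair
    (prerequisites, action), where `action` is a callable that creates
    the target. Paths without a rule must already exist.
    No cycle detection is done in this sketch.
    """
    if target not in rules:
        if not os.path.exists(target):
            raise FileNotFoundError(f"no rule to make {target}")
        return
    prerequisites, action = rules[target]
    for prereq in prerequisites:
        build(prereq, rules, log)  # update inputs before judging the target
    if needs_rebuild(target, prerequisites):
        action()
        log.append(target)  # record which steps actually ran
```

In a real Makefile, each entry of rules corresponds to one rule: the target, its prerequisites after the colon, and a recipe (for example, a csvgrep command) in place of the Python action. Running the build a second time does nothing, because every file is already newer than its inputs.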