Using Massachusetts school enrollment data as an example:
- Loop through a list of years, downloading a spreadsheet for each one.
- Each spreadsheet is in a weird Excel-as-HTML format, so I need to process each one with a Python script to convert it to CSV.
- I only want Boston schools, not the whole state, so I need to filter each file using csvgrep.
- Run the resulting files through another Python script to load them into a database using the dataset library.
- Export aggregates and query results on the combined data using datafreeze.
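The first two steps (download, then convert Excel-as-HTML to CSV) might look roughly like this. It is a minimal sketch, not the original script. The download URL pattern (`URL_TEMPLATE`), the `enrollment_<year>.csv` filenames, and the helper names are hypothetical. It also assumes each export is a plain HTML `<table>` with closed `<tr>`/`<td>` tags, and that the page decodes with the given encoding. Only the standard library's `html.parser`, `csv`, and `urllib` modules are used.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser

# Hypothetical URL pattern: the real Massachusetts download URL is not given here.
URL_TEMPLATE = "https://example.org/enrollment/{year}.xls"


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list of cells per <tr>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; etc. arrive decoded
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace (including non-breaking spaces) and trim.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html_text):
    """Convert the table in an Excel-as-HTML export to CSV text."""
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


def download_and_convert(years, url_template=URL_TEMPLATE, encoding="utf-8"):
    """Fetch one spreadsheet per year and save it as enrollment_<year>.csv."""
    for year in years:
        url = url_template.format(year=year)
        with urllib.request.urlopen(url) as resp:
            # Assumption: the page is in `encoding`; change it if the export says otherwise.
            html_text = resp.read().decode(encoding, errors="replace")
        with open(f"enrollment_{year}.csv", "w", newline="", encoding="utf-8") as f:
            f.write(html_table_to_csv(html_text))
```

With a real URL pattern filled in, `download_and_convert(range(2010, 2015))` would leave one CSV per year on disk. The year range here is only illustrative. Because these files are HTML underneath, an HTML parser is enough and no spreadsheet library is needed.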
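The filter, load, and export steps can be approximated with the standard library alone. This sketch deliberately swaps in `sqlite3` for the dataset library. In place of datafreeze, it writes a `GROUP BY` query out with `csv`. It therefore shows the shape of the pipeline rather than either library's API. `filter_rows` mimics a `csvgrep -c DISTRICT -m Boston` substring match. The column names `SCHOOL`, `DISTRICT`, and `TOTAL` are hypothetical placeholders for whatever headers the real enrollment files use.

```python
import csv
import io
import sqlite3

# Hypothetical column names; replace them with the headers in the real files.
SCHOOL, DISTRICT, TOTAL = "SCHOOL", "DISTRICT", "TOTAL"


def filter_rows(csv_text, column, pattern):
    """Keep rows whose `column` contains `pattern` (substring match, like csvgrep -m)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if pattern in row[column]]


def load_year(conn, year, rows):
    """Append one year's filtered rows to a single combined table (stand-in for dataset)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enrollment "
        "(year INTEGER, school TEXT, district TEXT, total INTEGER)"
    )
    conn.executemany(
        "INSERT INTO enrollment (year, school, district, total) VALUES (?, ?, ?, ?)",
        # Assumes TOTAL holds plain integers with no thousands separators.
        [(year, r[SCHOOL], r[DISTRICT], int(r[TOTAL])) for r in rows],
    )
    conn.commit()


def export_totals_by_year(conn, out):
    """Write total enrollment per year as CSV (stand-in for a datafreeze export)."""
    writer = csv.writer(out)
    writer.writerow(["year", "total_enrollment"])
    writer.writerows(
        conn.execute(
            "SELECT year, SUM(total) FROM enrollment GROUP BY year ORDER BY year"
        )
    )
```

A typical run opens `sqlite3.connect("enrollment.db")` and reads each year's CSV. It calls `load_year(conn, year, filter_rows(text, DISTRICT, "Boston"))` for each year and finishes with `export_totals_by_year(conn, sys.stdout)`. Since `-m` is a substring match, any row whose district merely contains "Boston" also passes the filter. Exact equality would be the stricter choice.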
The Twitter conversation that started all this: https://twitter.com/eyeseast/status/502946780146044928