This write-up details how I restructure and organise StatisticsNZ CSVs. It does not cover how I handle particular data values and codes (e.g. datetimes, missing and confidential values). My practice changes a little between projects, but typically I end up with a structure along the following lines:
- I remove all footnotes and metadata.
- I separate data measured at different scales into different CSVs (e.g. meshblock data goes in a different file from area units).
- I remove all `total` rows.
- I rename all columns with concise but meaningful shortnames so they are easier to refer to in code.
- I generate a JSON file containing key metadata and mappings between my shortnames and the original long fieldnames.
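
The steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than my actual scripts: the field names, the shortnames, the sample rows, the `restructure` helper and the layout of the metadata dictionary are all hypothetical, and it assumes footnotes show up as rows with fewer fields than the header and that total rows contain a value starting with `Total`.

```python
import csv
import io
import json

# Hypothetical raw export: two data rows, a total row and a trailing footnote.
RAW = """Area_Code,Area_Description,Population
0000100,MB 0000100,120
0000200,MB 0000200,87
Total,Total New Zealand,207
Source: Statistics New Zealand
"""

# Hypothetical mapping from original long field names to shortnames.
SHORTNAMES = {
    "Area_Code": "code",
    "Area_Description": "name",
    "Population": "pop",
}


def restructure(raw_csv_text, shortnames, total_marker="Total"):
    """Return (rows, metadata) with footnotes and total rows removed."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    rows = []
    for record in reader:
        # Assumption: footnote lines have fewer fields than the header,
        # so DictReader fills the missing columns with None.
        if any(record.get(long_name) is None for long_name in shortnames):
            continue
        # Assumption: total rows carry a value beginning with the marker.
        if any(record[long_name].startswith(total_marker) for long_name in shortnames):
            continue
        # Keep only the mapped columns, under their shortnames.
        rows.append({short: record[long_name] for long_name, short in shortnames.items()})
    # Record the shortname -> original field name mapping for the JSON file.
    metadata = {"fields": {short: long_name for long_name, short in shortnames.items()}}
    return rows, metadata


rows, metadata = restructure(RAW, SHORTNAMES)
print(json.dumps(metadata, indent=2))
```

Writing `rows` out with `csv.DictWriter` and `metadata` out with `json.dump` would then give a working CSV and its companion JSON file. Splitting data measured at different scales into separate files is not shown here.
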
See `my_working_data.csv` and `my_working_data.json` for how I structure the data for the first nine columns of `mb2013-mb-dataset-Total-New-Zealand-Individual-Part-1.csv`.
The original data can be viewed in `original.csv`.