@fogonwater
Last active February 12, 2016 04:53

Notes on how I organise StatisticsNZ CSVs

This write-up describes how I restructure and organise StatisticsNZ CSVs. It does not cover how I handle particular data values and codes (e.g. datetimes, missing and confidential values). My practice varies a little between projects, but I typically end up with a structure along the following lines:

  • I remove all footnotes and metadata.
  • I separate data measured at different scales into different CSVs (e.g. meshblock data goes in a different file from area units).
  • I remove all total rows (e.g. the Total New Zealand row).
  • I rename all columns with concise but meaningful shortnames so they are easier to refer to in code.
  • I generate a JSON file containing key metadata and mappings between my shortnames and the original long fieldnames.
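The cleaning steps above can be sketched with Python's standard `csv` module. This is a minimal illustration rather than the code behind these notes. `restructure`, `SAMPLE` and `FIELDS` are hypothetical: the sample is a made-up two-column excerpt shaped like original.csv, using the 2013 usually resident counts shown in the data. The sketch also assumes that the footer begins at a row whose first cell is exactly "Footnotes", that total rows start with "Total", and that meshblock rows are the ones whose area code starts with "MB ". Everything else, such as the 500100 Awanui row, goes into a second bucket.

```python
import csv
import io

# Hypothetical two-column excerpt in the shape of original.csv.
SAMPLE = """Area_Code_and_Description,2013_Census_census_usually_resident_population_count(1)
MB 0000100,3
MB 0000200,84
Total New Zealand,4242048
500100 Awanui,339
Footnotes
Footnotes for all tables in this CSV file are collected below.
"""

# Shortname -> original long fieldname, the same direction as the JSON "fields" mapping.
FIELDS = {
    "area_code_description": "Area_Code_and_Description",
    "usual_res_2013": "2013_Census_census_usually_resident_population_count(1)",
}


def restructure(text, fields):
    """Return {"meshblock": [...], "other": [...]} lists of dicts keyed by shortname."""
    rows = list(csv.reader(io.StringIO(text)))
    long_to_short = {long: short for short, long in fields.items()}
    header = [long_to_short[name] for name in rows[0]]  # rename columns

    by_scale = {"meshblock": [], "other": []}
    for row in rows[1:]:
        if row and row[0] == "Footnotes":
            break  # footnotes and metadata start here: stop reading data
        if not row or row[0].startswith("Total"):
            continue  # skip blank lines and total rows such as "Total New Zealand"
        # Assumption: meshblock rows are those whose area code starts with "MB ".
        scale = "meshblock" if row[0].startswith("MB ") else "other"
        by_scale[scale].append(dict(zip(header, row)))
    return by_scale


if __name__ == "__main__":
    for scale, records in restructure(SAMPLE, FIELDS).items():
        print(scale, records)
```

Each list can then be written to its own file with `csv.DictWriter`, giving one CSV per scale.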

See my_working_data.csv and my_working_data.json for how I structure the data for the first nine columns of mb2013-mb-dataset-Total-New-Zealand-Individual-Part-1.csv.

Original data can be viewed in original.csv.

my_working_data.csv

```
area_code_description code description usual_res_2001 usual_res_2006 usual_res_2013 night_pop_2001 night_pop_2006 night_pop_2013
MB 0000100 0000100 9 3 3 15 9 9
MB 0000200 0000200 90 87 84 99 96 84
MB 0000300 0000300 90 72 57 93 72 54
MB 0000400 0000400 30 21 30 45 39 30
MB 0000501 0000501 0 0 0 0 0 0
[jump to end]
MB 3210003 3210003 69 75 75 72 72 75
```
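The shortnames make downstream code easy to read. The sketch below is a hypothetical example, not part of the original workflow. `usual_res_change` and `SAMPLE` are illustrative: the sample is a trimmed excerpt keeping only three columns of my_working_data.csv. Because `csv.DictReader` returns every cell as a string, meshblock codes keep their leading zeros. The counts are assumed to be plain integers; confidential values such as `..C` are out of scope here, as in the notes above.

```python
import csv
import io

# Hypothetical trimmed excerpt of my_working_data.csv (three of its columns).
SAMPLE = """code,usual_res_2006,usual_res_2013
0000100,3,3
0000200,87,84
0000300,72,57
"""


def usual_res_change(text):
    """Map each code to its 2006 -> 2013 change in usually resident population."""
    change = {}
    for rec in csv.DictReader(io.StringIO(text)):
        # csv returns strings, so "0000100" keeps its leading zeros.
        # Assumption: counts are plain integers (no confidential "..C" values).
        change[rec["code"]] = int(rec["usual_res_2013"]) - int(rec["usual_res_2006"])
    return change


print(usual_res_change(SAMPLE))  # {'0000100': 0, '0000200': -3, '0000300': -15}
```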
my_working_data.json

```json
{
"title":"2013 Census meshblock dataset – CSV files",
"fields":{
"area_code_description":"Area_Code_and_Description",
"code":"Code",
"description":"Description",
"usual_res_2001":"2001_Census_census_usually_resident_population_count(1)",
"usual_res_2006":"2006_Census_census_usually_resident_population_count(1)",
"usual_res_2013":"2013_Census_census_usually_resident_population_count(1)",
"night_pop_2001":"2001_Census_census_night_population_count(2)",
"night_pop_2006":"2006_Census_census_night_population_count(2)",
"night_pop_2013":"2013_Census_census_night_population_count(2)"
},
"src_agency":["Statistics New Zealand"],
"src_www":"http://www3.stats.govt.nz/meshblock/2013/csv/2013_mb_dataset_Total_New_Zealand_CSV.zip",
"src_local":"data/statsnz/mb2013-mb-dataset-Total-New-Zealand-Individual-Part-1.csv",
"docs":"http://www.stats.govt.nz/Census/2013-census/data-tables/meshblock-dataset.aspx",
"accessed":"2015-04-12",
"notes":"Footnotes for all tables in this CSV file are collected below. Footnotes for all tables,This data has been randomly rounded to protect confidentiality. Individual figures may not sum to totals and values for the same data may vary in different tables., Footnotes for all time series tables,On 6 March 2006 Banks Peninsula District Council combined with the Christchurch City Council. For consistency of time series Banks Peninsula data for 2001 2006 and 2013 have been incorporated under Christchurch City., Footnotes for all time series tables,On 1 November 2010 Auckland Council became a unitary authority when Auckland regional council area and seven territorial authority areas – Rodney district North Shore city Waitakere city Auckland city Manukau city Papakura district and Franklin district – amalgamated. For the purposes of time series 2001 and 2006 data for these seven territorial authority areas have been incorporated under Auckland. In addition data is also provided for the 21 local boards within Auckland Council., Footnotes for all time series tables,This time series is irregular. Because the 2011 Census was cancelled after the Canterbury earthquake on 22 February 2011 the gap between this census and the last one is seven years. The change in the data between 2006 and 2013 may be greater than in the usual five-year gap between censuses. Be careful when comparing trends., Footnotes for specific variables or categories,1,See definition of census usually resident population count Footnotes for specific variables or categories,2,See definition of census night population count Footnotes for specific variables or categories,3,Calculated using single-year-of-age data that has been independently randomly rounded. For categories with small populations the data may not look as expected because of the effect of random rounding. Footnotes for specific variables or categories,4,Consists of response unidentifiable and not stated. Footnotes for specific variables or categories,5,Consists of inadequately described and not stated. Footnotes for specific variables or categories,6,Consists of response unidentifiable response outside scope and not stated. Footnotes for specific variables or categories,7,Includes all people who stated each ethnic group whether as their only ethnic group or as one of several. Where a person reported more than one ethnic group they were counted in each applicable group. Footnotes for specific variables or categories,8,In 2001 up to six ethnicity responses per person were output (prioritised at input) and in 2006 and 2013 up to six responses per person were output (randomised after input). Footnotes for specific variables or categories,9,MELAA = Middle Eastern Latin American and African. This was a new category introduced for the 2006 Census. Previously MELAA responses were allocated to the 'other ethnicity' category. Footnotes for specific variables or categories,10,Consists of responses for a number of small ethnic groups and for New Zealander. New Zealander was included as a new category for the 2006 Census. In 2001 New Zealander was counted in the European category. Footnotes for specific variables or categories,11,Includes all people who stated each language spoken whether as their only language or as one of several. Where a person reported more than one language spoken they were counted in each applicable group. Footnotes for specific variables or categories,12,Consists of don't know refused to answer response unidentifiable response outside scope and not stated."
}
```
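A metadata file with this layout can be produced with Python's standard `json` module. This is a sketch under assumptions. `build_metadata` is a hypothetical helper whose keys mirror my_working_data.json, and the field mapping and notes are shortened for brevity. `ensure_ascii=False` keeps the en dash in the title as a literal character rather than escaping it.

```python
import json
from datetime import date


def build_metadata(title, fields, src_www, src_local, docs, notes,
                   accessed=None, agencies=("Statistics New Zealand",)):
    """Assemble a dict with the same keys, in the same order, as my_working_data.json."""
    return {
        "title": title,
        "fields": dict(fields),  # shortname -> original long fieldname
        "src_agency": list(agencies),
        "src_www": src_www,
        "src_local": src_local,
        "docs": docs,
        # Default to today's date when no access date is given.
        "accessed": (accessed or date.today()).isoformat(),
        "notes": notes,
    }


meta = build_metadata(
    title="2013 Census meshblock dataset – CSV files",
    fields={
        "code": "Code",
        "usual_res_2013": "2013_Census_census_usually_resident_population_count(1)",
    },
    src_www="http://www3.stats.govt.nz/meshblock/2013/csv/2013_mb_dataset_Total_New_Zealand_CSV.zip",
    src_local="data/statsnz/mb2013-mb-dataset-Total-New-Zealand-Individual-Part-1.csv",
    docs="http://www.stats.govt.nz/Census/2013-census/data-tables/meshblock-dataset.aspx",
    notes="Footnotes for all tables in this CSV file are collected below.",  # shortened
    accessed=date(2015, 4, 12),
)

# ensure_ascii=False writes the en dash in the title as-is instead of an escape.
text = json.dumps(meta, ensure_ascii=False, indent=1)
print(text)
```

To save it, write `text` to a file opened with `open(path, "w", encoding="utf-8")` so the non-ASCII title survives. Looking up `meta["fields"][shortname]` later recovers the original long fieldname.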
original.csv

```
Area_Code_and_Description Code Description 2001_Census_census_usually_resident_population_count(1) 2006_Census_census_usually_resident_population_count(1) 2013_Census_census_usually_resident_population_count(1) 2001_Census_census_night_population_count(2) 2006_Census_census_night_population_count(2) 2013_Census_census_night_population_count(2)
MB 0000100 0000100 9 3 3 15 9 9
MB 0000200 0000200 90 87 84 99 96 84
MB 0000300 0000300 90 72 57 93 72 54
MB 0000400 0000400 30 21 30 45 39 30
MB 0000501 0000501 0 0 0 0 0 0
[jump to end of mbs]
MB 3210003 3210003 69 75 75 72 72 75
Total New Zealand Total New Zealand Total New Zealand 3737280 4027947 4242048 3820749 4143282 4353201
500100 Awanui 500100 Awanui 369 348 339 405 360 342
[jump to footnotes]
Footnotes
Footnotes for all tables in this CSV file are collected below.
Footnotes for all tables
[jump to end]
Symbols ..C confidential.
Symbols * not able to be calculated.
Source Statistics New Zealand.
```
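The footer of original.csv does not have to be thrown away. Its rows can be collected into a single `notes` string like the one in the JSON metadata. The sketch below is hypothetical: `split_footnotes` and `SAMPLE` are illustrative, and it assumes the footer starts at the row whose first cell is exactly "Footnotes". It also re-joins cells with commas and rows with single spaces, which may differ from how the notes string above was actually assembled.

```python
import csv
import io

# Hypothetical excerpt of original.csv: one data row followed by the footer.
SAMPLE = """Area_Code_and_Description,2013_Census_census_usually_resident_population_count(1)
500100 Awanui,339
Footnotes
Footnotes for all tables in this CSV file are collected below.
Footnotes for all tables,This data has been randomly rounded to protect confidentiality.
"""


def split_footnotes(text):
    """Return (data_rows, notes): rows before the "Footnotes" row, and the footer as one string."""
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        if row and row[0] == "Footnotes":
            data, footer = rows[:i], rows[i + 1:]
            break
    else:  # no footer found: everything is data
        data, footer = rows, []
    # Assumption: cells are re-joined with commas and rows with single spaces.
    notes = " ".join(",".join(row) for row in footer if row)
    return data, notes


data, notes = split_footnotes(SAMPLE)
print(len(data))  # 2 (header + one data row)
print(notes)
```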