fogonwater/README.md

## README.md

Notes on how I organise StatisticsNZ CSVs

This write-up describes how I restructure and organise StatisticsNZ CSVs. It does not cover how I handle particular data values and codes (e.g. datetimes, missing and confidential values). My practice varies a little between projects, but I typically end up with a structure along these lines:

- I remove all footnotes and metadata.
- I separate data measured at different geographic scales into different CSVs (e.g. meshblock data goes in a different file from area-unit data).
- I remove all total rows.
- I rename every column with a concise but meaningful short name so it is easier to refer to in code.
- I generate a JSON file containing key metadata and a mapping between my short names and the original long field names.
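
The first three steps can be sketched with Python's standard `csv` module. This is an illustration, not the exact code behind these files, and it rests on assumptions drawn from `original.csv`: meshblock rows are the ones whose `Area_Code_and_Description` starts with `"MB "`, every row from the `Footnotes` marker onward is footnotes or metadata, and total rows start with `Total`. The helpers `classify` and `split_by_scale` are hypothetical; each resulting group could then be written to its own file with `csv.writer`.

```python
import csv
import io


def classify(area: str) -> str:
    """Return a scale label for an Area_Code_and_Description value.

    Assumption: only meshblock rows start with "MB "; every other
    geography is lumped together as "other" in this sketch.
    """
    return "meshblock" if area.startswith("MB ") else "other"


def split_by_scale(raw_text: str) -> tuple[list[str], dict[str, list[list[str]]]]:
    """Drop footnote, metadata and total rows; group data rows by scale."""
    reader = csv.reader(io.StringIO(raw_text))
    header = next(reader)
    groups: dict[str, list[list[str]]] = {}
    for row in reader:
        # Assumption: everything from the "Footnotes" marker row onward is
        # footnotes or metadata, so stop reading there.
        if row and row[0] == "Footnotes":
            break
        # Skip blank lines, short non-data rows and total rows.
        if len(row) != len(header) or row[0].startswith("Total"):
            continue
        groups.setdefault(classify(row[0]), []).append(row)
    return header, groups
```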

See `my_working_data.csv` and `my_working_data.json` for how I structure the first nine columns of `mb2013-mb-dataset-Total-New-Zealand-Individual-Part-1.csv`.
The original data is shown in `original.csv`.
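
The renaming and metadata steps might look like the following sketch. The helpers `rename_header` and `build_metadata` are hypothetical, the short-name mapping is assumed to be written by hand, and the extra keyword arguments stand in for keys such as `src_agency` and `accessed`.

```python
import json


def rename_header(header: list[str], short_names: dict[str, str]) -> list[str]:
    """Replace each original field name with its short name."""
    return [short_names[col] for col in header]


def build_metadata(title: str, header: list[str], short_names: dict[str, str], **extra: object) -> dict:
    """Build metadata whose "fields" maps each short name to the original name."""
    fields = {short_names[col]: col for col in header}
    return {"title": title, "fields": fields, **extra}


# Hand-written short names for three of the columns (illustrative subset).
SHORT_NAMES = {
    "Area_Code_and_Description": "area_code_description",
    "Code": "code",
    "2013_Census_census_usually_resident_population_count(1)": "usual_res_2013",
}
header = list(SHORT_NAMES)
print(rename_header(header, SHORT_NAMES))
meta = build_metadata(
    "2013 Census meshblock dataset – CSV files",
    header,
    SHORT_NAMES,
    src_agency=["Statistics New Zealand"],
    accessed="2015-04-12",
)
print(json.dumps(meta, indent=4, ensure_ascii=False))
```

Writing the result with `json.dump(meta, f, indent=4, ensure_ascii=False)` keeps non-ASCII characters, such as the en dash in the title, readable in the output file.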

  
## my_working_data.csv

          
            area_code_description
            code
            description
            usual_res_2001
            usual_res_2006
            usual_res_2013
            night_pop_2001
            night_pop_2006
            night_pop_2013

            
              MB 0000100
              0000100
              
              9
              3
              3
              15
              9
              9

            
              MB 0000200
              0000200
              
              90
              87
              84
              99
              96
              84

            
              MB 0000300
              0000300
              
              90
              72
              57
              93
              72
              54

            
              MB 0000400
              0000400
              
              30
              21
              30
              45
              39
              30

            
              MB 0000501
              0000501
              
              0
              0
              0
              0
              0
              0

            
              [jump to end]

            
              MB 3210003
              3210003
              
              69
              75
              75
              72
              72
              75
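
Short names keep downstream code readable. As an example, this sketch computes the change in usually resident population between 2006 and 2013 for each meshblock. It assumes the working file is comma-delimited and that every count is a plain integer (no confidential-value symbols); `usual_res_change` is a hypothetical helper. Because `csv.DictReader` returns strings, `code` keeps its leading zeros.

```python
import csv
import io


def usual_res_change(csv_text: str) -> dict[str, int]:
    """Map each meshblock code to its 2006-to-2013 change in usual residents.

    Assumes every count is a plain integer; confidential or missing values
    would need handling first.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Short names make the column references read naturally; "code" stays a
    # string so leading zeros such as "0000100" are preserved.
    return {
        row["code"]: int(row["usual_res_2013"]) - int(row["usual_res_2006"])
        for row in reader
    }


sample = (
    "area_code_description,code,description,usual_res_2001,usual_res_2006,usual_res_2013\n"
    "MB 0000100,0000100,,9,3,3\n"
    "MB 0000300,0000300,,90,72,57\n"
)
print(usual_res_change(sample))  # {'0000100': 0, '0000300': -15}
```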

## my_working_data.json
```json
{
    "title":"2013 Census meshblock dataset – CSV files",
    "fields":{
        "area_code_description":"Area_Code_and_Description",
        "code":"Code",
        "description":"Description",
        "usual_res_2001":"2001_Census_census_usually_resident_population_count(1)",
        "usual_res_2006":"2006_Census_census_usually_resident_population_count(1)",
        "usual_res_2013":"2013_Census_census_usually_resident_population_count(1)",
        "night_pop_2001":"2001_Census_census_night_population_count(2)",
        "night_pop_2006":"2006_Census_census_night_population_count(2)",
        "night_pop_2013":"2013_Census_census_night_population_count(2)"
    },
    "src_agency":["Statistics New Zealand"],
    "src_www":"http://www3.stats.govt.nz/meshblock/2013/csv/2013_mb_dataset_Total_New_Zealand_CSV.zip",
    "src_local":"data/statsnz/mb2013-mb-dataset-Total-New-Zealand-Individual-Part-1.csv",
    "docs":"http://www.stats.govt.nz/Census/2013-census/data-tables/meshblock-dataset.aspx",
    "accessed":"2015-04-12",
    "notes":"Footnotes for all tables in this CSV file are collected below. Footnotes for all tables,This data has been randomly rounded to protect confidentiality. Individual figures may not sum to totals and values for the same data may vary in different tables., Footnotes for all time series tables,On 6 March 2006 Banks Peninsula District Council combined with the Christchurch City Council.  For consistency of time series Banks Peninsula data for 2001 2006 and 2013 have been incorporated under Christchurch City., Footnotes for all time series tables,On 1 November 2010 Auckland Council became a unitary authority when Auckland regional council area and seven territorial authority areas – Rodney district North Shore city Waitakere city Auckland city Manukau city Papakura district and Franklin district – amalgamated. For the purposes of time series 2001 and 2006 data for these seven territorial authority areas have been incorporated under Auckland. In addition data is also provided for the 21 local boards within Auckland Council., Footnotes for all time series tables,This time series is irregular. Because the 2011 Census was cancelled after the Canterbury earthquake on 22 February 2011 the gap between this census and the last one is seven years. The change in the data between 2006 and 2013 may be greater than in the usual five-year gap between censuses. Be careful when comparing trends., Footnotes for specific variables or categories,1,See definition of census usually resident population count Footnotes for specific variables or categories,2,See definition of census night population count Footnotes for specific variables or categories,3,Calculated using single-year-of-age data that has been independently randomly rounded. For categories with small populations the data may not look as expected because of the effect of random rounding. Footnotes for specific variables or categories,4,Consists of response unidentifiable and not stated. Footnotes for specific variables or categories,5,Consists of inadequately described and not stated. Footnotes for specific variables or categories,6,Consists of response unidentifiable response outside scope and not stated. Footnotes for specific variables or categories,7,Includes all people who stated each ethnic group whether as their only ethnic group or as one of several. Where a person reported more than one ethnic group they were counted in each applicable group. Footnotes for specific variables or categories,8,In 2001 up to six ethnicity responses per person were output (prioritised at input) and in 2006 and 2013 up to six responses per person were output (randomised after input).  Footnotes for specific variables or categories,9,MELAA = Middle Eastern Latin American and African.  This was a new category introduced for the 2006 Census.  Previously MELAA responses were allocated to the 'other ethnicity' category. Footnotes for specific variables or categories,10,Consists of responses for a number of small ethnic groups and for New Zealander. New Zealander was included as a new category for the 2006 Census.  In 2001 New Zealander was counted in the European category. Footnotes for specific variables or categories,11,Includes all people who stated each language spoken whether as their only language or as one of several. Where a person reported more than one language spoken they were counted in each applicable group. Footnotes for specific variables or categories,12,Consists of don't know refused to answer response unidentifiable response outside scope and not stated."
}
```
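
Because `fields` maps each short name to its original field name, the mapping can be read back and inverted, for example to apply the same short names to a fresh download of the table. A minimal sketch; `load_fields` and `invert` are hypothetical helpers, and the JSON string below is a trimmed excerpt of the file.

```python
import json


def load_fields(metadata_text: str) -> dict[str, str]:
    """Return the short-name -> original-name mapping from the metadata JSON."""
    return json.loads(metadata_text)["fields"]


def invert(fields: dict[str, str]) -> dict[str, str]:
    """Return an original-name -> short-name mapping."""
    return {original: short for short, original in fields.items()}


metadata_text = """{
    "fields": {
        "code": "Code",
        "night_pop_2013": "2013_Census_census_night_population_count(2)"
    }
}"""
fields = load_fields(metadata_text)
print(fields["night_pop_2013"])  # 2013_Census_census_night_population_count(2)
print(invert(fields)["Code"])  # code
```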

## original.csv

          
            Area_Code_and_Description
            Code
            Description
            2001_Census_census_usually_resident_population_count(1)
            2006_Census_census_usually_resident_population_count(1)
            2013_Census_census_usually_resident_population_count(1)
            2001_Census_census_night_population_count(2)
            2006_Census_census_night_population_count(2)
            2013_Census_census_night_population_count(2)

            
              MB 0000100
              0000100
              
              9
              3
              3
              15
              9
              9

            
              MB 0000200
              0000200
              
              90
              87
              84
              99
              96
              84

            
              MB 0000300
              0000300
              
              90
              72
              57
              93
              72
              54

            
              MB 0000400
              0000400
              
              30
              21
              30
              45
              39
              30

            
              MB 0000501
              0000501
              
              0
              0
              0
              0
              0
              0

            
              [jump to end of mbs]

            
              MB 3210003
              3210003
              
              69
              75
              75
              72
              72
              75

            
              Total New Zealand
              Total New Zealand
              Total New Zealand
              3737280
              4027947
              4242048
              3820749
              4143282
              4353201

            
              500100 Awanui
              500100
              Awanui
              369
              348
              339
              405
              360
              342

            
              [jump to footnotes]

            
              Footnotes

            
              Footnotes for all tables in this CSV file are collected below.

            
              Footnotes for all tables

            
              [jump to end]

            
              Symbols
              ..C 
              confidential.

            
              Symbols
              *
              not able to be calculated.

            
              Source
              Statistics New Zealand.
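
Rather than discarding the footnotes, their text can be kept in the JSON `notes` value. The sketch below joins the non-empty cells of each row between the `Footnotes` marker and the first `Symbols` or `Source` row. The helper `collect_notes` and its stopping rule are assumptions; the output approximates, but is not guaranteed to reproduce, the punctuation of the `notes` string in `my_working_data.json`.

```python
import csv
import io


def collect_notes(raw_text: str, stop_labels: tuple[str, ...] = ("Symbols", "Source")) -> str:
    """Join footnote rows into a single string for a JSON "notes" value.

    Assumption: footnotes are the rows after the "Footnotes" marker and before
    the first row labelled "Symbols" or "Source".
    """
    notes: list[str] = []
    in_footnotes = False
    for row in csv.reader(io.StringIO(raw_text)):
        if not row:
            continue  # skip blank lines
        if row[0] == "Footnotes":
            in_footnotes = True
            continue
        if in_footnotes:
            if row[0] in stop_labels:
                break
            # Keep the cells of one footnote row together, comma-separated.
            notes.append(",".join(cell for cell in row if cell))
    return " ".join(notes)


raw = (
    "Area_Code_and_Description,Code,Description\n"
    "MB 0000100,0000100,\n"
    "Footnotes\n"
    "Footnotes for all tables in this CSV file are collected below.\n"
    "Footnotes for all tables,This data has been randomly rounded to protect confidentiality.\n"
    "Symbols,..C,confidential.\n"
    "Source,Statistics New Zealand.\n"
)
print(collect_notes(raw))
```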