Skip to content

Instantly share code, notes, and snippets.

@nilshg
Last active August 29, 2015 14:07
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nilshg/1cbe62bdcd5bbf0a5d95 to your computer and use it in GitHub Desktop.
Save nilshg/1cbe62bdcd5bbf0a5d95 to your computer and use it in GitHub Desktop.
BHPS 2005
{
"metadata": {
"name": "",
"signature": "sha256:a98765f20861b94ac778fde457603bc725c863b3acd5b1ed3eab221aeefbefdf"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"British Household Panel Survey 2005"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: The BHPS survey year starts on the 1st of September, i.e. the year 2005 is mostly covered by BHPS Wave 15, the questions of which relate to the year starting September 1st, 2004. In the BHPS dataset, these are the files starting with `o`.\n",
"The entire dataset ranges from the files starting in `a`, which contain answers to questions related to the year starting on September 1st, 1990, to files starting in `r`, which relate to the year starting on September 1st, 2007.\n",
"\n",
"Documentation material for the BHPS is <a href = 'https://www.iser.essex.ac.uk/bhps/documentation'>available from the ISER</a>\n",
"\n",
"The BHPS was first conducted between September 1991 and January 1992, when 8167 households were selected for wave one (around 5,500 of which responded). Later, an European Community Household Panel (ECHP) subsample was added, consisting of the original UKECHP sample and an additional low-income sample from Great Britain (from Wave 7 onwards). From Wave 9, a Scotland and Wales Extension sample and from Wave 11 a Northern Ireland sample was added. \n",
"\n",
"Survey questions fall into three categories: <b>Core</b> questions that are asked in all waves, <b>Rotating Core</b> questions asked on a cyclical basis and <b>Variable Components</b> asked only once during the lifetime of the survey.\n",
"\n",
"Some of the variables of interest are:\n",
"\n",
"- Household Questionnaire:\n",
" - Household Finances\n",
" * Rent and Mortgage, Loan and HP Details, Consumer Durables, Cars\n",
"- Individual Questionnaire:\n",
" - Neighbourhood and individual\n",
" - Demographics\n",
" - Educational background and attainments\n",
" - Changes in marital status\n",
" - Current Employment\n",
" - Employment Status\n",
" - Nature of Business/Duties\n",
" - Hours worked/Overtime\n",
" - Superannuation/Pension schemes\n",
" - Wages/Salary/Deductions\n",
" - Bonuses\n",
" - Performance related pay\n",
" - Finances\n",
" - Incomes from Benefits/Allowances/Pensions/Rents/Savings/Interest/Dividends\n",
" - Pension plans\n",
" - Savings and Investments\n",
" - Internal Transfers\n",
" - External Transfers\n",
" - Personal Spending\n",
" - Bills/Everyday spending\n",
" - Value of Car\n",
" - Windfalls\n",
" - Investments and Savings (Waves 5, 10, 15)\n",
" - Commitments (Variable)\n",
" - Health and Caring\n",
" - Personal health condition\n",
" - Visits to doctor\n",
" - Hospital/Clinic use\n",
" - Use of Health and Welfare Services\n",
" - Social Services\n",
" - Checkups, Tests and Screening\n",
" - Smoking\n",
" - Caring for Relatives, Others in/outside household\n",
" - Private medical insurance\n",
" - Activities in daily living\n",
" - Attitude towards costs/payments for health care (Rotating Core)\n",
" - Childrens Health, other health scales (Waves 9 and 14)\n",
" - Employment History \n",
" - Past year\n",
" - Labour force status spells\n",
" - Size/Sector/Nature of Business/Duties\n",
" - Wages/Salary/Deductions\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Import Data</h3>\n",
"\n",
"The datasets we are importing are:\n",
"\n",
"4. <b>Fieldwork control, interview outcome and weighting information</b>\n",
"<a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/ohhsamp.html\">(ohhsamp)</a>\n",
" \n",
" DataFrame (11915, 26): These are all 11,915 households in the sample, including those that did not respond to the survey. The variables in this dataset mostly relate to meta-information such as interviewer number, coversheet status etc. The variables of interest are:\n",
" \n",
" - OHID \tHousehold identification number\n",
" - OIVFHO \tFinal household interview outcome\n",
" - OXHWGHT \tHousehold Weight\n",
" \n",
"3. <b>All information from household questionnaire</b> \n",
"<a href=\"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/ohhresp.html\">(ohhresp)</a>\n",
"\n",
" DataFrame (8709, 225): This dataset contains the information on those households that responded to the survey. The variables cover those issues that can only reasonably asked at a household level, e.g. housing related questions (accomodation, tenure, mortgages, rental costs, heating costs etc.), durable goods consumption, number of children, household income etc. The specific variables of interest are:\n",
" \n",
" - OHID \tHousehold identification number\n",
" - OHSOWND \tHouse owned or rented\n",
" - OHSVAL \tValue of property: home owners\n",
" - OHS2VALO \tValue of other property\n",
" - OMGHAVE \tHome mortgaged or owned outright\n",
" - OXHWGHT \tHousehold Weight\n",
" - OFIHHYR \tAnnual HH income (1.9.2004-1.9.2005)\n",
" - OMGTOT \tTotal mortgage on all property\n",
"\n",
"1. <b>Individual level variables derived from Household Cover Sheet</b> <a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/oindsamp.html\">(oindsamp)</a>\n",
" \n",
" DataFrame (28011, 33), variables of interest:\n",
" \n",
" - OHID \tHousehold identification number\n",
" - OPNO \tPerson number\n",
" - OIVFIO \tIndividual interview outcome\n",
" - OIVFHO \tFinal household interview outcome\n",
" \n",
"7. <b>All information from the individual questionnaires</b>\n",
"<a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/oindresp.html\">(oindresp)</a>\n",
"\n",
" DataFrame (15617, 1245): This is the individual-level data for all individuals that responded to the survey. Note that these are not necessarily all individuals in a household that responded to the survey, i.e. a household in `oindresp` could have two or more members, only one of which responded (and is hence included in `oindresp`). \n",
" \n",
" - OHID \tHousehold identification number\n",
" - OPNO \tPerson number\n",
" - ODOBY \tYear of birth\n",
" - OSEX \tSex\n",
" - OPLBORNC \tCountry of birth\n",
" - OYR2UK4 \tYear came to Britain\n",
" - ONATIDH \tNational identity - Other\n",
" - ORACEL \tEthnic group membership (long version)\n",
" - OJBSEMP\tEmployee or self-employed: current job\n",
" - OSAVED \tAmount saved each month\n",
" - OSVACK \tAmount in savings account, tessa or ISA\n",
" - OSVACSJ\tSavings in sole name or jointly held\n",
" - OSVACSK\tAmnt. savings held in sole name\n",
" - ONVESTK \tTotal value of investments\n",
" - ONVESTSJ \tInvestments in sole name or jointly held\n",
" - ONVESTSK \tAmnt. investments held in sole name\n",
" - ODEBTY \tTotal amount owed\n",
" - ODEBTSK \tSole amount owed\n",
" - OMRJSOC \tOccupation (SOC) of most recent job\n",
" - OMRJSIC \tIndustry (SIC) of most recent job\n",
" - OMRJSIC9 \tIndustry (SIC92) of most recent job\n",
"\n",
"2. <b>Kin and other relationship between pairs of individuals</b> \n",
"<a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/oegoalt.html\">(oegoalt)</a>\n",
"\n",
" DataFrame (), variables of interest:\n",
" \n",
" - OHID \tHousehold Identification number\n",
" - OPNO \tEgo's Person number\n",
" - OSEX Sex\n",
" \n",
"6. <b>Demographic enumeration grid information</b>\n",
"<a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/oindall.html\">(oindall)</a>\n",
"\n",
" DataFrame (21730, 56), variables of interest:\n",
" \n",
" - OHID \tHousehold identification number\n",
" - OPNO \tPerson number\n",
" - OAGE \tAge at Date of Interview\n",
" - OAGE12 \tAge at 1.12.2005\n",
" - OBUTYPE \tBenefit Unit Type\n",
" - OBUNO \tBenefit Unit Number\n",
"\n",
"8. <b>Spell employment status, dates of start and finish, job characteristics</b>\n",
"<a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/ojobhist.html\">(ojobhist)</a>\n",
"\n",
" DataFrame (), variables of interest:\n",
" \n",
" \n",
"Further, we import the variables `oneta` and `onethh`; these are derived variables, found in <a href = \"http://discover.ukdataservice.ac.uk/catalogue?sn=3909\">SN 3909</a>, corresponding documentation is available from the <a href = \"http://doc.ukdataservice.ac.uk/doc/3909/mrdoc/pdf/3909userguide.pdf\">UK Data Service</a>\n",
"\n",
"These variables give post-tax, post-transfer (i.e. net) income for households which conducted the entire interview. These are the households for which `ohhsamp.oivfho == 10`, of which there are 6980 in the data of wave 15.\n",
"\n",
"1. onethh.dta, variables of interest:\n",
" - OHID: Household Identification Number\n",
" - OHHNETI: Total Household Net Income, \u00a3/week\n",
" - OHHNETDE: Total income, deflated and equivalised using McClements 'before housing costs' scale\n",
" - OHHNETDE2: deflated using modified OECD scale\n",
" \n",
"2. oneta.dta, variables of interest:\n",
" - OHID: Household Identification Number\n",
" - OHHYNETI: Total Household Annual Net Income, \u00a3/year\n",
" - OHHNYRDE: Deflated, McClements equivalised\n",
" - OHHNYRDE2: Deflated, OECD equivalised"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import tables as tab\n",
"%matplotlib inline"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Household Level Data:\n",
"ohhresp = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/ohhresp.dta')\n",
"ohhsamp = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/ohhsamp.dta')\n",
"# Individual Level Data:\n",
"oindresp = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/oindresp.dta')\n",
"oindsamp = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/oindsamp.dta')\n",
"oindall = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/oindall.dta')\n",
"oincome = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/oincome.dta')\n",
"# Derived Variables:\n",
"oneta = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/o_neta.dta')\n",
"onethh = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/o_nethh.dta')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to reduce computational overhead, we continue working only with those variables that are relevant:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"indsamp = oindsamp[['ohid', 'opno', 'oivfio', 'oivfho', 'ofinloc']]\n",
"hhresp = ohhresp[['ohid', 'ohsownd', 'ohsval', 'ohs2valo', 'omghave', 'oxhwght', 'ofihhyr', 'omgtot']] \n",
"indall = oindall[['ohid', 'opno', 'obutype', 'obuno', 'ohgr2r']]\n",
"neta = oneta[['ohid', 'ohhyneti', 'ohhnyrde', 'ohhnyrde2']]\n",
"nethh = onethh[['ohid', 'ohhneti', 'ohhnetde', 'ohhnetde2']]\n",
"indresp = oindresp[['ohid', 'opno', 'oage12', 'odoby', 'osex', 'oplbornc', 'oyr2uk4', 'onatidh', 'oracel', 'ojbsemp', \n",
" 'osaved', 'osvack', 'osvacsj', 'osvacsk', 'onvestk', 'onvestsj', 'onvestsk', 'odebty',\n",
" 'odebtsk', 'oqfedhi', 'ojbstat', 'omrjsoc', 'omrjsic', 'omrjsic9']]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Coding conventions</h3>\n",
"\n",
"In the BHPS, interview outcomes other than an answer to the question at hand are coded as negative numbers. Some of these can be interpreted as a value of zero (when the answer is either 'inapplicable' or 'none') while others should be considered as missing data (when the answer was refused, or a proxy). To capture this, we construct two variables for missing and zero data:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"zero = [-3, -8]\n",
"missing = [-1, -2, -7, -9]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Merge them into a single DataFrame and reorder some columns:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = pd.merge(indsamp, hhresp, on=['ohid'], how='left')\n",
"df = pd.merge(df, indall, on=['ohid', 'opno'], how='left')\n",
"df = pd.merge(df, indresp, on=['ohid', 'opno'], how='left')\n",
"df = pd.merge(df, neta, on=['ohid'], how='left')\n",
"df = pd.merge(df, nethh, on=['ohid'], how='left')\n",
"\n",
"first = ['ohid', 'opno', 'osvack', 'onvestk', 'ohsval', 'odebty', 'omgtot', 'ohhneti', 'ohhnetde', 'oage12']\n",
"rest = [x for x in df.columns if x not in first]\n",
"df = df[first + rest]\n",
"\n",
"# Drop duplicates\n",
"d1 = sum(df.ofinloc=='duplicate')\n",
"df = df[df.ofinloc == 'final record']\n",
"\n",
"# Drop households with reference person not aged 20-65\n",
"d2 = list(df[np.logical_or(np.logical_and(df.ohgr2r=='reference person', df.oage12<20),\\\n",
" np.logical_and(df.ohgr2r=='reference person', df.oage12>65))].ohid)\n",
"df = df[~df['ohid'].isin(d2)]\n",
"\n",
"# Drop households without complete interviews\n",
"d3 = sum(df.oivfho!='all eligible hh intv')\n",
"df = df[df.oivfho=='all eligible hh intv']\n",
"\n",
"# Drop households with missing mortgage information\n",
"d4 = df.omgtot.isin(missing)\n",
"df = df[~df['omgtot'].isin(missing)]\n",
"\n",
"# Drop outliers:\n",
"d5 = sum(df.omgtot > 2000000)\n",
"df = df[df.omgtot < 2000000]\n",
"\n",
"print \"Dropped %d duplicate observations, %d households with reference person not aged 20-65,\" \\\n",
" \"%d observations without complete interviews, %d observations with missing mortgage data,\" \\\n",
" \"and %d outliers.\" % (d1,len(d2),d3,len(d4),d5)\n",
"print \"The final dataset has %d observations and %d variables\" % (len(df), len(df.columns))\n",
"print \"There are %d households and %d benefit units\" % (len(set(df.ohid)), len(set(zip(df.ohid, df.obuno))))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Dropped 806 duplicate observations, 2086 households with reference person not aged 20-65,10133 observations without complete interviews, 13719 observations with missing mortgage data,and 1 outliers.\n",
"The final dataset has 12794 observations and 43 variables\n",
"There are 4809 households and 6057 benefit units\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Benefit Units</h3>\n",
"\n",
"The BHPS has individuals and households as the principal unit of analysis, with households being defined as <i>\"one person living alone or a group of people who either share living accommodation, or share one meal a day and who have the address as their only or main residence\"</i>. In some cases, this does not necessarily define an economic unit - as an example, three unrelated adults working full time and sharing accomodation would be counted as a household, even though they don't share income or assets. For this purpose, the BHPS also defines benefit units, which are used as the unit of account in many studies (cp. for example <a href=\"http://www.ifs.org.uk/wps/wp0221.pdf\">Banks et al. (2002)</a>. The benefit unit is defined as a single person or a couple living together with or without children. It can safely be assumed that benefit units share finances (under some mild assumptions, such as no wealth sharing with individuals outside the benefit unit) and hence they might be the appropriate unit of account for questions related to the income and wealth distribution.\n",
"\n",
"Benefit units in the BHPS are enumerated by the person number (`opno`) of the first person in the household belonging to the benefit unit. That means that in a household of a couple with two dependent children, every household member (`opno` 1 to 4) would share the same `ohid` value as well as the same `obuno` value. In contrast, in a household of a couple living with an adult (non-dependent) child, only the couple would share the same `ohid` value and `obuno` value (1), while the non-dependent child would have the same `ohid` value, but an `obuno` value equal to its `opno` (in this case 3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Debt holdings</h3>\n",
"\n",
"Debt figures are collected in the following variables:\n",
"\n",
"- ODEBTY: Total amount owed \n",
"- ODEBTSK: Sole amount owed\n",
"- OMGTOT: All mortgage on all property\n",
"\n",
"Note that the `odebty` and `odebtsk` variables include loans for hire purchase agreements, which might be classified as secured borrowing, although there is no way to seperate between the different sub-categories of debt.\n",
"\n",
"A further complication is the difference between `odebty` and `odebtsk` for some observations. While this is certainly a small number of observations (less than 200), the differences between the two variables is significant for some individuals (e.g. `ohid` 15185397, `opno` 1, a single person household where `odebtsk` equals 38,000 while `odebty` equals 240,000) and in three cases `odebtsk` is even larger than `odebty`, which seems impossible:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#df[np.logical_and(df.odebtsk>0, df.odebtsk>df.odebty)][['ohid','opno','odebtsk','odebty']]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One complication in that arises when calculating summary statistics is the treatment of missing answers, which are coded as negative numbers in the BHPS. Thus, to compute summary statistics, it is necessary to replace them with missing values (`np.Nan`), as they would otherwise be counted as observations with negative debt holdings. When adding up the different debt categories (i.e. unsecured and secured debt), we want to replace negative values in `omgtot`, `odebty` and `odebtsk` with zeros, as they represent missing, inapplicable, proxy or don't know answers. However, we must be careful to replace zeros with NaNs afterwards in order not to bias downwards our estimates by counting missing observations as households with zero debt:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"col_inapp = ['osvack', 'osvacsk', 'onvestk', 'onvestsk', 'ohsval', 'ohs2valo', 'odebty', 'odebtsk', 'omgtot']\n",
"\n",
"for col in col_inapp:\n",
" df.loc[df[col].isin(zero), col] = 0\n",
" df.loc[df[col].isin(missing), col] = np.NaN"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to consolidate debt at the household level, we call the `sum()` function on a `groupby` object. For secured debt, we want to use the mean of our grouped households, as the merging of our dataset ascribed the amount of the mortgage to each household member, so summing up would multiply the actual value of the mortgage by the amount of members in a household:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"unsecured_hh = df[['ohid', 'odebty']].groupby('ohid').sum()\n",
"df = df.join(unsecured_hh, on='ohid', rsuffix='_hh')\n",
"secured_hh = df[['ohid', 'omgtot']].groupby('ohid').mean()\n",
"df = df.join(secured_hh, on='ohid', rsuffix='_hh')\n",
"\n",
"df['totaldebt_hh'] = df.odebty_hh + df.omgtot_hh"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Asset Holdings</h3>\n",
"\n",
"Assets in the BHPS are recorded in the following variables:\n",
"\n",
"- OHSVAL: Value of property if sold\n",
"- OSVACK\tAmnt. in savings account, tessa or ISA\n",
"- OSVACSK:\tAmnt. savings held in sole name\n",
"- ONVESTK:\tTotal value of investments\n",
"- ONVESTSK:\tAmnt. investments held in sole name\n",
"\n",
"Where broadly speaking `osvack` and `osvacsk` represent liquid financial wealth, `onvestk` and `onvestsk` represent illiquid financial wealth and `ohsval` represents housing wealth. Again, values of -8 indicate missing values and should be replaced by zero when adding up columns or by NaN when computing summary statistics."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"savings_hh = df[['ohid', 'osvack']].groupby('ohid').sum()\n",
"df = df.join(savings_hh, on='ohid', rsuffix='_hh')\n",
"investments_hh = df[['ohid', 'onvestk']].groupby('ohid').sum()\n",
"df = df.join(investments_hh, on='ohid', rsuffix='_hh')\n",
"\n",
"df['fin_assets_hh'] = df['osvack_hh'] + df['onvestk_hh']\n",
"\n",
"hswealth1_hh = df[['ohid', 'ohsval']].groupby('ohid').mean()\n",
"df = df.join(hswealth1_hh, on='ohid', rsuffix='_hh')\n",
"hswealth2_hh = df[['ohid', 'ohs2valo']].groupby('ohid').mean()\n",
"df = df.join(hswealth2_hh, on='ohid', rsuffix='_hh')\n",
"df['hswealth_hh'] = df['ohsval_hh'] + df['ohs2valo_hh'] \n",
"\n",
"df['totalwealth_hh'] = df['fin_assets_hh'] + df['hswealth_hh']\n",
"\n",
"df['netwealth_hh'] = df['totalwealth_hh'] - df['totaldebt_hh']"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Household Income</h3>\n",
"\n",
"Household Income can easily be calculated from the variables in the `oneta` and `onethh` datasets:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"inc_w_hh = df[['ohid', 'ohhneti']].groupby('ohid').mean()\n",
"df = df.join(inc_w_hh, on='ohid', rsuffix='_hh')\n",
"inc_y_hh = df[['ohid', 'ohhyneti']].groupby('ohid').mean()\n",
"df = df.join(inc_y_hh, on='ohid', rsuffix='_hh')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3> Finally: Keep only household heads </h3>"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_hh = df[df.ohgr2r=='reference person']"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_hh = df_hh.rename(columns={'odebty_hh': 'unsecured_hh', 'omgtot_hh': 'secured_hh', 'osvack_hh': 'savings_hh', \n",
" 'onvestk_hh': 'investments_hh'})"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Summary Statistics</h3>"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Entire Sample\n",
"df_hh[['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh', 'investments_hh', 'fin_assets_hh', 'hswealth_hh', 'totalwealth_hh', 'netwealth_hh']].describe().T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>unsecured_hh</th>\n",
" <td> 4730</td>\n",
" <td> 4631.194503</td>\n",
" <td> 13376.600163</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 200</td>\n",
" <td> 5000.00</td>\n",
" <td> 442000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>secured_hh</th>\n",
" <td> 4809</td>\n",
" <td> 37479.233728</td>\n",
" <td> 62240.223987</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 9000</td>\n",
" <td> 58000.00</td>\n",
" <td> 900000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totaldebt_hh</th>\n",
" <td> 4730</td>\n",
" <td> 42186.423890</td>\n",
" <td> 65727.151605</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 18000</td>\n",
" <td> 63000.00</td>\n",
" <td> 910000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>savings_hh</th>\n",
" <td> 4374</td>\n",
" <td> 7729.578647</td>\n",
" <td> 22814.051374</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 400</td>\n",
" <td> 5300.00</td>\n",
" <td> 350000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>investments_hh</th>\n",
" <td> 4698</td>\n",
" <td> 4449.391230</td>\n",
" <td> 21592.159749</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 100.00</td>\n",
" <td> 400000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fin_assets_hh</th>\n",
" <td> 4296</td>\n",
" <td> 12144.425512</td>\n",
" <td> 35940.671480</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 650</td>\n",
" <td> 8800.00</td>\n",
" <td> 650000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hswealth_hh</th>\n",
" <td> 4786</td>\n",
" <td> 149270.457167</td>\n",
" <td> 193060.866224</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 120000</td>\n",
" <td> 200000.00</td>\n",
" <td> 4000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totalwealth_hh</th>\n",
" <td> 4280</td>\n",
" <td> 159233.720327</td>\n",
" <td> 207908.345626</td>\n",
" <td> 0</td>\n",
" <td> 2000</td>\n",
" <td> 125000</td>\n",
" <td> 209312.50</td>\n",
" <td> 4200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>netwealth_hh</th>\n",
" <td> 4232</td>\n",
" <td> 117022.136578</td>\n",
" <td> 191532.270870</td>\n",
" <td>-344200</td>\n",
" <td> 200</td>\n",
" <td> 71650</td>\n",
" <td> 157757.75</td>\n",
" <td> 4200000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
" count mean std min 25% 50% \\\n",
"unsecured_hh 4730 4631.194503 13376.600163 0 0 200 \n",
"secured_hh 4809 37479.233728 62240.223987 0 0 9000 \n",
"totaldebt_hh 4730 42186.423890 65727.151605 0 0 18000 \n",
"savings_hh 4374 7729.578647 22814.051374 0 0 400 \n",
"investments_hh 4698 4449.391230 21592.159749 0 0 0 \n",
"fin_assets_hh 4296 12144.425512 35940.671480 0 0 650 \n",
"hswealth_hh 4786 149270.457167 193060.866224 0 0 120000 \n",
"totalwealth_hh 4280 159233.720327 207908.345626 0 2000 125000 \n",
"netwealth_hh 4232 117022.136578 191532.270870 -344200 200 71650 \n",
"\n",
" 75% max \n",
"unsecured_hh 5000.00 442000 \n",
"secured_hh 58000.00 900000 \n",
"totaldebt_hh 63000.00 910000 \n",
"savings_hh 5300.00 350000 \n",
"investments_hh 100.00 400000 \n",
"fin_assets_hh 8800.00 650000 \n",
"hswealth_hh 200000.00 4000000 \n",
"totalwealth_hh 209312.50 4200000 \n",
"netwealth_hh 157757.75 4200000 "
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Statistics by Age Group</h3>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Age 20 - 29"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_hh[df_hh.oage12<=29][['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh', 'investments_hh', 'fin_assets_hh', \\\n",
" 'hswealth_hh', 'totalwealth_hh', 'netwealth_hh']].describe().T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>unsecured_hh</th>\n",
" <td> 646</td>\n",
" <td> 7197.832817</td>\n",
" <td> 15052.903570</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 1900.0</td>\n",
" <td> 9000.0</td>\n",
" <td> 250500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>secured_hh</th>\n",
" <td> 659</td>\n",
" <td> 34552.427921</td>\n",
" <td> 54863.808461</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 62000.0</td>\n",
" <td> 435000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totaldebt_hh</th>\n",
" <td> 646</td>\n",
" <td> 41809.365325</td>\n",
" <td> 57935.797935</td>\n",
" <td> 0</td>\n",
" <td> 400</td>\n",
" <td> 14000.0</td>\n",
" <td> 69750.0</td>\n",
" <td> 436600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>savings_hh</th>\n",
" <td> 626</td>\n",
" <td> 2299.753994</td>\n",
" <td> 6944.733105</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 23.5</td>\n",
" <td> 1511.5</td>\n",
" <td> 105000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>investments_hh</th>\n",
" <td> 648</td>\n",
" <td> 1039.427469</td>\n",
" <td> 9375.465618</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 0.0</td>\n",
" <td> 208000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fin_assets_hh</th>\n",
" <td> 617</td>\n",
" <td> 3402.260940</td>\n",
" <td> 14004.448186</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 50.0</td>\n",
" <td> 2000.0</td>\n",
" <td> 272500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hswealth_hh</th>\n",
" <td> 659</td>\n",
" <td> 61821.699545</td>\n",
" <td> 89877.967755</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 115000.0</td>\n",
" <td> 600000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totalwealth_hh</th>\n",
" <td> 617</td>\n",
" <td> 65344.724473</td>\n",
" <td> 97315.368430</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 2300.0</td>\n",
" <td> 115800.0</td>\n",
" <td> 872500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>netwealth_hh</th>\n",
" <td> 606</td>\n",
" <td> 23237.608911</td>\n",
" <td> 59172.909355</td>\n",
" <td>-250100</td>\n",
" <td>-1800</td>\n",
" <td> 0.0</td>\n",
" <td> 34300.0</td>\n",
" <td> 522500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
" count mean std min 25% 50% \\\n",
"unsecured_hh 646 7197.832817 15052.903570 0 0 1900.0 \n",
"secured_hh 659 34552.427921 54863.808461 0 0 0.0 \n",
"totaldebt_hh 646 41809.365325 57935.797935 0 400 14000.0 \n",
"savings_hh 626 2299.753994 6944.733105 0 0 23.5 \n",
"investments_hh 648 1039.427469 9375.465618 0 0 0.0 \n",
"fin_assets_hh 617 3402.260940 14004.448186 0 0 50.0 \n",
"hswealth_hh 659 61821.699545 89877.967755 0 0 0.0 \n",
"totalwealth_hh 617 65344.724473 97315.368430 0 0 2300.0 \n",
"netwealth_hh 606 23237.608911 59172.909355 -250100 -1800 0.0 \n",
"\n",
" 75% max \n",
"unsecured_hh 9000.0 250500 \n",
"secured_hh 62000.0 435000 \n",
"totaldebt_hh 69750.0 436600 \n",
"savings_hh 1511.5 105000 \n",
"investments_hh 0.0 208000 \n",
"fin_assets_hh 2000.0 272500 \n",
"hswealth_hh 115000.0 600000 \n",
"totalwealth_hh 115800.0 872500 \n",
"netwealth_hh 34300.0 522500 "
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Age 30 - 39"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_hh[np.logical_and(df_hh.oage12>=30, df_hh.oage12<40)][['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh',\\\n",
" 'investments_hh', 'fin_assets_hh', 'hswealth_hh', 'totalwealth_hh',\\\n",
" 'netwealth_hh']].describe().T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>unsecured_hh</th>\n",
" <td> 1147</td>\n",
" <td> 4700.722755</td>\n",
" <td> 8754.904203</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 950</td>\n",
" <td> 6000.00</td>\n",
" <td> 110000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>secured_hh</th>\n",
" <td> 1172</td>\n",
" <td> 55522.093003</td>\n",
" <td> 71944.884836</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 41800</td>\n",
" <td> 80000.00</td>\n",
" <td> 900000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totaldebt_hh</th>\n",
" <td> 1147</td>\n",
" <td> 60538.031386</td>\n",
" <td> 74788.807519</td>\n",
" <td> 0</td>\n",
" <td> 3115</td>\n",
" <td> 45000</td>\n",
" <td> 87630.00</td>\n",
" <td> 903500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>savings_hh</th>\n",
" <td> 1075</td>\n",
" <td> 4498.274419</td>\n",
" <td> 14871.325366</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 200</td>\n",
" <td> 3000.00</td>\n",
" <td> 220000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>investments_hh</th>\n",
" <td> 1151</td>\n",
" <td> 1989.174631</td>\n",
" <td> 11970.851520</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 2.00</td>\n",
" <td> 251500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fin_assets_hh</th>\n",
" <td> 1061</td>\n",
" <td> 6609.175306</td>\n",
" <td> 20807.306925</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 286</td>\n",
" <td> 5000.00</td>\n",
" <td> 270200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hswealth_hh</th>\n",
" <td> 1167</td>\n",
" <td> 130829.353899</td>\n",
" <td> 159410.870218</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 115000</td>\n",
" <td> 180000.00</td>\n",
" <td> 3000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totalwealth_hh</th>\n",
" <td> 1057</td>\n",
" <td> 134862.700095</td>\n",
" <td> 166518.611507</td>\n",
" <td> 0</td>\n",
" <td> 2000</td>\n",
" <td> 115000</td>\n",
" <td> 183500.00</td>\n",
" <td> 3000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>netwealth_hh</th>\n",
" <td> 1044</td>\n",
" <td> 76547.723180</td>\n",
" <td> 135581.524546</td>\n",
" <td>-155995</td>\n",
" <td> 100</td>\n",
" <td> 52000</td>\n",
" <td> 104076.25</td>\n",
" <td> 3000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
" count mean std min 25% 50% \\\n",
"unsecured_hh 1147 4700.722755 8754.904203 0 0 950 \n",
"secured_hh 1172 55522.093003 71944.884836 0 0 41800 \n",
"totaldebt_hh 1147 60538.031386 74788.807519 0 3115 45000 \n",
"savings_hh 1075 4498.274419 14871.325366 0 0 200 \n",
"investments_hh 1151 1989.174631 11970.851520 0 0 0 \n",
"fin_assets_hh 1061 6609.175306 20807.306925 0 0 286 \n",
"hswealth_hh 1167 130829.353899 159410.870218 0 0 115000 \n",
"totalwealth_hh 1057 134862.700095 166518.611507 0 2000 115000 \n",
"netwealth_hh 1044 76547.723180 135581.524546 -155995 100 52000 \n",
"\n",
" 75% max \n",
"unsecured_hh 6000.00 110000 \n",
"secured_hh 80000.00 900000 \n",
"totaldebt_hh 87630.00 903500 \n",
"savings_hh 3000.00 220000 \n",
"investments_hh 2.00 251500 \n",
"fin_assets_hh 5000.00 270200 \n",
"hswealth_hh 180000.00 3000000 \n",
"totalwealth_hh 183500.00 3000000 \n",
"netwealth_hh 104076.25 3000000 "
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Age 40 - 49"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_hh[np.logical_and(df_hh.oage12>=40, df_hh.oage12<50)][['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh',\\\n",
" 'investments_hh', 'fin_assets_hh', 'hswealth_hh', 'totalwealth_hh',\\\n",
" 'netwealth_hh']].describe().T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>unsecured_hh</th>\n",
" <td> 1280</td>\n",
" <td> 5125.762500</td>\n",
" <td> 16230.359535</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 300.0</td>\n",
" <td> 5043.75</td>\n",
" <td> 442000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>secured_hh</th>\n",
" <td> 1301</td>\n",
" <td> 47076.021522</td>\n",
" <td> 65307.224401</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 31000.0</td>\n",
" <td> 69000.00</td>\n",
" <td> 740000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totaldebt_hh</th>\n",
" <td> 1280</td>\n",
" <td> 52550.687500</td>\n",
" <td> 70125.166491</td>\n",
" <td> 0</td>\n",
" <td> 2042.5</td>\n",
" <td> 36000.0</td>\n",
" <td> 73550.00</td>\n",
" <td> 740000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>savings_hh</th>\n",
" <td> 1215</td>\n",
" <td> 7568.526749</td>\n",
" <td> 22254.643935</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 550.0</td>\n",
" <td> 5125.00</td>\n",
" <td> 255000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>investments_hh</th>\n",
" <td> 1276</td>\n",
" <td> 4627.007053</td>\n",
" <td> 20630.122505</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 0.0</td>\n",
" <td> 400.00</td>\n",
" <td> 350030</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fin_assets_hh</th>\n",
" <td> 1195</td>\n",
" <td> 12493.911297</td>\n",
" <td> 36649.354938</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 1000.0</td>\n",
" <td> 10000.00</td>\n",
" <td> 395003</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hswealth_hh</th>\n",
" <td> 1296</td>\n",
" <td> 175938.175154</td>\n",
" <td> 206886.324143</td>\n",
" <td> 0</td>\n",
" <td> 70000.0</td>\n",
" <td> 140000.0</td>\n",
" <td> 220000.00</td>\n",
" <td> 2700000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totalwealth_hh</th>\n",
" <td> 1191</td>\n",
" <td> 189499.142737</td>\n",
" <td> 224490.951612</td>\n",
" <td> 0</td>\n",
" <td> 70002.5</td>\n",
" <td> 150100.0</td>\n",
" <td> 240000.00</td>\n",
" <td> 2700000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>netwealth_hh</th>\n",
" <td> 1178</td>\n",
" <td> 137959.044992</td>\n",
" <td> 200714.754531</td>\n",
" <td>-334000</td>\n",
" <td> 26475.0</td>\n",
" <td> 93900.5</td>\n",
" <td> 174862.50</td>\n",
" <td> 2589000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 16,
"text": [
" count mean std min 25% \\\n",
"unsecured_hh 1280 5125.762500 16230.359535 0 0.0 \n",
"secured_hh 1301 47076.021522 65307.224401 0 0.0 \n",
"totaldebt_hh 1280 52550.687500 70125.166491 0 2042.5 \n",
"savings_hh 1215 7568.526749 22254.643935 0 0.0 \n",
"investments_hh 1276 4627.007053 20630.122505 0 0.0 \n",
"fin_assets_hh 1195 12493.911297 36649.354938 0 0.0 \n",
"hswealth_hh 1296 175938.175154 206886.324143 0 70000.0 \n",
"totalwealth_hh 1191 189499.142737 224490.951612 0 70002.5 \n",
"netwealth_hh 1178 137959.044992 200714.754531 -334000 26475.0 \n",
"\n",
" 50% 75% max \n",
"unsecured_hh 300.0 5043.75 442000 \n",
"secured_hh 31000.0 69000.00 740000 \n",
"totaldebt_hh 36000.0 73550.00 740000 \n",
"savings_hh 550.0 5125.00 255000 \n",
"investments_hh 0.0 400.00 350030 \n",
"fin_assets_hh 1000.0 10000.00 395003 \n",
"hswealth_hh 140000.0 220000.00 2700000 \n",
"totalwealth_hh 150100.0 240000.00 2700000 \n",
"netwealth_hh 93900.5 174862.50 2589000 "
]
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Age 50 - 59"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_hh[np.logical_and(df_hh.oage12>=50, df_hh.oage12<60)][['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh',\\\n",
" 'investments_hh', 'fin_assets_hh', 'hswealth_hh', 'totalwealth_hh',\\\n",
" 'netwealth_hh']].describe().T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>unsecured_hh</th>\n",
" <td> 1051</td>\n",
" <td> 4183.559467</td>\n",
" <td> 15074.666718</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 4000</td>\n",
" <td> 415000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>secured_hh</th>\n",
" <td> 1063</td>\n",
" <td> 25080.349012</td>\n",
" <td> 55706.337611</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 30000</td>\n",
" <td> 900000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totaldebt_hh</th>\n",
" <td> 1051</td>\n",
" <td> 28958.546147</td>\n",
" <td> 58573.422202</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 7000</td>\n",
" <td> 36250</td>\n",
" <td> 910000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>savings_hh</th>\n",
" <td> 946</td>\n",
" <td> 11084.853066</td>\n",
" <td> 26468.923953</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 1300</td>\n",
" <td> 10000</td>\n",
" <td> 350000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>investments_hh</th>\n",
" <td> 1036</td>\n",
" <td> 6969.016409</td>\n",
" <td> 29665.079048</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 1000</td>\n",
" <td> 400000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fin_assets_hh</th>\n",
" <td> 929</td>\n",
" <td> 17750.470398</td>\n",
" <td> 43945.638431</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 2000</td>\n",
" <td> 16000</td>\n",
" <td> 650000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hswealth_hh</th>\n",
" <td> 1056</td>\n",
" <td> 181656.937500</td>\n",
" <td> 235514.231403</td>\n",
" <td> 0</td>\n",
" <td> 68750</td>\n",
" <td> 150000</td>\n",
" <td> 231250</td>\n",
" <td> 4000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totalwealth_hh</th>\n",
" <td> 925</td>\n",
" <td> 198791.149189</td>\n",
" <td> 258606.103338</td>\n",
" <td> 0</td>\n",
" <td> 66600</td>\n",
" <td> 157000</td>\n",
" <td> 257225</td>\n",
" <td> 4200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>netwealth_hh</th>\n",
" <td> 921</td>\n",
" <td> 168438.609121</td>\n",
" <td> 249779.957229</td>\n",
" <td>-344200</td>\n",
" <td> 37200</td>\n",
" <td> 125920</td>\n",
" <td> 220700</td>\n",
" <td> 4200000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 17,
"text": [
" count mean std min 25% 50% \\\n",
"unsecured_hh 1051 4183.559467 15074.666718 0 0 0 \n",
"secured_hh 1063 25080.349012 55706.337611 0 0 0 \n",
"totaldebt_hh 1051 28958.546147 58573.422202 0 0 7000 \n",
"savings_hh 946 11084.853066 26468.923953 0 0 1300 \n",
"investments_hh 1036 6969.016409 29665.079048 0 0 0 \n",
"fin_assets_hh 929 17750.470398 43945.638431 0 0 2000 \n",
"hswealth_hh 1056 181656.937500 235514.231403 0 68750 150000 \n",
"totalwealth_hh 925 198791.149189 258606.103338 0 66600 157000 \n",
"netwealth_hh 921 168438.609121 249779.957229 -344200 37200 125920 \n",
"\n",
" 75% max \n",
"unsecured_hh 4000 415000 \n",
"secured_hh 30000 900000 \n",
"totaldebt_hh 36250 910000 \n",
"savings_hh 10000 350000 \n",
"investments_hh 1000 400000 \n",
"fin_assets_hh 16000 650000 \n",
"hswealth_hh 231250 4000000 \n",
"totalwealth_hh 257225 4200000 \n",
"netwealth_hh 220700 4200000 "
]
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Age 60+"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_hh[df_hh.oage12>=60][['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh', 'investments_hh', 'fin_assets_hh',\\\n",
" 'hswealth_hh', 'totalwealth_hh', 'netwealth_hh']].describe().T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>unsecured_hh</th>\n",
" <td> 606</td>\n",
" <td> 1495.254125</td>\n",
" <td> 6544.761100</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 0.0</td>\n",
" <td> 115.00</td>\n",
" <td> 132000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>secured_hh</th>\n",
" <td> 614</td>\n",
" <td> 7311.688925</td>\n",
" <td> 28965.924001</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 0.0</td>\n",
" <td> 0.00</td>\n",
" <td> 350000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totaldebt_hh</th>\n",
" <td> 606</td>\n",
" <td> 8903.466997</td>\n",
" <td> 31058.047402</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 0.0</td>\n",
" <td> 2000.00</td>\n",
" <td> 350000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>savings_hh</th>\n",
" <td> 512</td>\n",
" <td> 15335.654297</td>\n",
" <td> 36347.092324</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 2000.0</td>\n",
" <td> 14000.00</td>\n",
" <td> 350000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>investments_hh</th>\n",
" <td> 587</td>\n",
" <td> 8204.751278</td>\n",
" <td> 28778.229896</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 0.0</td>\n",
" <td> 1000.00</td>\n",
" <td> 317000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fin_assets_hh</th>\n",
" <td> 494</td>\n",
" <td> 23563.787449</td>\n",
" <td> 53268.600621</td>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 3010.5</td>\n",
" <td> 21750.00</td>\n",
" <td> 440000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hswealth_hh</th>\n",
" <td> 608</td>\n",
" <td> 166356.004934</td>\n",
" <td> 189571.567699</td>\n",
" <td> 0</td>\n",
" <td> 55750.0</td>\n",
" <td> 130000.0</td>\n",
" <td> 220000.00</td>\n",
" <td> 2000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totalwealth_hh</th>\n",
" <td> 490</td>\n",
" <td> 181790.738776</td>\n",
" <td> 198296.631824</td>\n",
" <td> 0</td>\n",
" <td> 35250.0</td>\n",
" <td> 135700.0</td>\n",
" <td> 246890.75</td>\n",
" <td> 1230000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>netwealth_hh</th>\n",
" <td> 483</td>\n",
" <td> 173068.641822</td>\n",
" <td> 191070.108064</td>\n",
" <td>-19700</td>\n",
" <td> 25612.5</td>\n",
" <td> 126550.0</td>\n",
" <td> 234750.00</td>\n",
" <td> 1230000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 18,
"text": [
" count mean std min 25% 50% \\\n",
"unsecured_hh 606 1495.254125 6544.761100 0 0.0 0.0 \n",
"secured_hh 614 7311.688925 28965.924001 0 0.0 0.0 \n",
"totaldebt_hh 606 8903.466997 31058.047402 0 0.0 0.0 \n",
"savings_hh 512 15335.654297 36347.092324 0 0.0 2000.0 \n",
"investments_hh 587 8204.751278 28778.229896 0 0.0 0.0 \n",
"fin_assets_hh 494 23563.787449 53268.600621 0 0.0 3010.5 \n",
"hswealth_hh 608 166356.004934 189571.567699 0 55750.0 130000.0 \n",
"totalwealth_hh 490 181790.738776 198296.631824 0 35250.0 135700.0 \n",
"netwealth_hh 483 173068.641822 191070.108064 -19700 25612.5 126550.0 \n",
"\n",
" 75% max \n",
"unsecured_hh 115.00 132000 \n",
"secured_hh 0.00 350000 \n",
"totaldebt_hh 2000.00 350000 \n",
"savings_hh 14000.00 350000 \n",
"investments_hh 1000.00 317000 \n",
"fin_assets_hh 21750.00 440000 \n",
"hswealth_hh 220000.00 2000000 \n",
"totalwealth_hh 246890.75 1230000 \n",
"netwealth_hh 234750.00 1230000 "
]
}
],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Migrants</h3>\n",
"\n",
"The BHPS only contains 26 individuals who consider themselves as having a national identity other than English, Scottish, Welsh or Irish:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"oindresp.onatidh.value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 19,
"text": [
"-8 14024\n",
"-7 982\n",
"not mentioned 572\n",
"yes 26\n",
"-9 13\n",
"dtype: int64"
]
}
],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After restricting the sample to household heads, that number drops to zero:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_hh.onatidh.value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"-8 4744\n",
"not mentioned 62\n",
"-9 3\n",
"dtype: int64"
]
}
],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Furthermore, there are only 39 individuals that came to the UK at some point in their lives:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"oindresp[oindresp.oyr2uk4>0].oyr2uk4.value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 21,
"text": [
"2004 7\n",
"2005 4\n",
"1992 3\n",
"2003 3\n",
"2002 3\n",
"1989 2\n",
"1990 2\n",
"1995 2\n",
"1993 2\n",
"1983 1\n",
"1986 1\n",
"1987 1\n",
"1991 1\n",
"1997 1\n",
"1982 1\n",
"2000 1\n",
"1946 1\n",
"1973 1\n",
"1981 1\n",
"1985 1\n",
"dtype: int64"
]
}
],
"prompt_number": 21
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, restricting the sample to household heads, the number drops to zero:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_hh[df_hh.oyr2uk4>0].oyr2uk4.value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 22,
"text": [
"Series([], dtype: int64)"
]
}
],
"prompt_number": 22
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment