nilshg/bhps2005

## bhps2005
{
 "metadata": {
  "name": "",
  "signature": "sha256:a98765f20861b94ac778fde457603bc725c863b3acd5b1ed3eab221aeefbefdf"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "British Household Panel Survey 2005"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note: The BHPS survey year starts on the 1st of September, i.e. the year 2005 is mostly covered by BHPS Wave 15, the questions of which relate to the year starting September 1st, 2004. In the BHPS dataset, these are the files starting with `o`.\n",
      "The entire dataset ranges from the files starting in `a`, which contain answers to questions related to the year starting on September 1st, 1990, to files starting in `r`, which relate to the year starting on September 1st, 2007.\n",
      "\n",
      "Documentation material for the BHPS is <a href = 'https://www.iser.essex.ac.uk/bhps/documentation'>available from the ISER</a>\n",
      "\n",
      "The BHPS was first conducted between September 1991 and January 1992, when 8167 households were selected for wave one (around 5,500 of which responded). Later, an European Community Household Panel (ECHP) subsample was added, consisting of the original UKECHP sample and an additional low-income sample from Great Britain (from Wave 7 onwards). From Wave 9, a Scotland and Wales Extension sample and from Wave 11 a Northern Ireland sample was added. \n",
      "\n",
      "Survey questions fall into three categories: <b>Core</b> questions that are asked in all waves, <b>Rotating Core</b> questions asked on a cyclical basis and <b>Variable Components</b> asked only once during the lifetime of the survey.\n",
      "\n",
      "Some of the variables of interest are:\n",
      "\n",
      "- Household Questionnaire:\n",
      "     - Household Finances\n",
      "         * Rent and Mortgage, Loan and HP Details, Consumer Durables, Cars\n",
      "- Individual Questionnaire:\n",
      "    - Neighbourhood and individual\n",
      "        - Demographics\n",
      "        - Educational background and attainments\n",
      "        - Changes in marital status\n",
      "    - Current Employment\n",
      "        - Employment Status\n",
      "        - Nature of Business/Duties\n",
      "        - Hours worked/Overtime\n",
      "        - Superannuation/Pension schemes\n",
      "        - Wages/Salary/Deductions\n",
      "        - Bonuses\n",
      "        - Performance related pay\n",
      "    - Finances\n",
      "        - Incomes from Benefits/Allowances/Pensions/Rents/Savings/Interest/Dividends\n",
      "        - Pension plans\n",
      "        - Savings and Investments\n",
      "        - Internal Transfers\n",
      "        - External Transfers\n",
      "        - Personal Spending\n",
      "        - Bills/Everyday spending\n",
      "        - Value of Car\n",
      "        - Windfalls\n",
      "        - Investments and Savings (Waves 5, 10, 15)\n",
      "        - Commitments (Variable)\n",
      "    - Health and Caring\n",
      "        - Personal health condition\n",
      "        - Visits to doctor\n",
      "        - Hospital/Clinic use\n",
      "        - Use of Health and Welfare Services\n",
      "        - Social Services\n",
      "        - Checkups, Tests and Screening\n",
      "        - Smoking\n",
      "        - Caring for Relatives, Others in/outside household\n",
      "        - Private medical insurance\n",
      "        - Activities in daily living\n",
      "        - Attitude towards costs/payments for health care (Rotating Core)\n",
      "        - Childrens Health, other health scales (Waves 9 and 14)\n",
      "    - Employment History \n",
      "        - Past year\n",
      "        - Labour force status spells\n",
      "        - Size/Sector/Nature of Business/Duties\n",
      "        - Wages/Salary/Deductions\n",
      "        \n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>Import Data</h3>\n",
      "\n",
      "The datasets we are importing are:\n",
      "\n",
      "4. <b>Fieldwork control, interview outcome and weighting information</b>\n",
      "<a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/ohhsamp.html\">(ohhsamp)</a>\n",
      "    \n",
      "    DataFrame (11915, 26): These are all 11,915 households in the sample, including those that did not respond to the survey. The variables in this dataset mostly relate to meta-information such as interviewer number, coversheet status etc. The variables of interest are:\n",
      "    \n",
      "    - OHID \tHousehold identification number\n",
      "    - OIVFHO \tFinal household interview outcome\n",
      "    - OXHWGHT \tHousehold Weight\n",
      "    \n",
      "3. <b>All information from household questionnaire</b> \n",
      "<a href=\"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/ohhresp.html\">(ohhresp)</a>\n",
      "\n",
      "    DataFrame (8709, 225): This dataset contains the information on those households that responded to the survey. The variables cover those issues that can only reasonably asked at a household level, e.g. housing related questions (accomodation, tenure, mortgages, rental costs, heating costs etc.), durable goods consumption, number of children, household income etc. The specific variables of interest are:\n",
      "    \n",
      "    - OHID \tHousehold identification number\n",
      "    - OHSOWND \tHouse owned or rented\n",
      "    - OHSVAL \tValue of property: home owners\n",
      "    - OHS2VALO \tValue of other property\n",
      "    - OMGHAVE \tHome mortgaged or owned outright\n",
      "    - OXHWGHT \tHousehold Weight\n",
      "    - OFIHHYR \tAnnual HH income (1.9.2004-1.9.2005)\n",
      "    - OMGTOT \tTotal mortgage on all property\n",
      "\n",
      "1. <b>Individual level variables derived from Household Cover Sheet</b> <a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/oindsamp.html\">(oindsamp)</a>\n",
      "    \n",
      "    DataFrame (28011, 33), variables of interest:\n",
      "    \n",
      "    - OHID \tHousehold identification number\n",
      "    - OPNO \tPerson number\n",
      "    - OIVFIO \tIndividual interview outcome\n",
      "    - OIVFHO \tFinal household interview outcome\n",
      "    \n",
      "7. <b>All information from the individual questionnaires</b>\n",
      "<a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/oindresp.html\">(oindresp)</a>\n",
      "\n",
      "    DataFrame (15617, 1245): This is the individual-level data for all individuals that responded to the survey. Note that these are not necessarily all individuals in a household that responded to the survey, i.e. a household in `oindresp` could have two or more members, only one of which responded (and is hence included in `oindresp`). \n",
      "    \n",
      "    - OHID \tHousehold identification number\n",
      "    - OPNO \tPerson number\n",
      "    - ODOBY \tYear of birth\n",
      "    - OSEX \tSex\n",
      "    - OPLBORNC \tCountry of birth\n",
      "    - OYR2UK4 \tYear came to Britain\n",
      "    - ONATIDH \tNational identity - Other\n",
      "    - ORACEL \tEthnic group membership (long version)\n",
      "    - OJBSEMP\tEmployee or self-employed: current job\n",
      "    - OSAVED \tAmount saved each month\n",
      "    - OSVACK \tAmount in savings account, tessa or ISA\n",
      "    - OSVACSJ\tSavings in sole name or jointly held\n",
      "    - OSVACSK\tAmnt. savings held in sole name\n",
      "    - ONVESTK \tTotal value of investments\n",
      "    - ONVESTSJ \tInvestments in sole name or jointly held\n",
      "    - ONVESTSK \tAmnt. investments held in sole name\n",
      "    - ODEBTY \tTotal amount owed\n",
      "    - ODEBTSK \tSole amount owed\n",
      "    - OMRJSOC \tOccupation (SOC) of most recent job\n",
      "    - OMRJSIC \tIndustry (SIC) of most recent job\n",
      "    - OMRJSIC9 \tIndustry (SIC92) of most recent job\n",
      "\n",
      "2. <b>Kin and other relationship between pairs of individuals</b> \n",
      "<a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/oegoalt.html\">(oegoalt)</a>\n",
      "\n",
      "    DataFrame (), variables of interest:\n",
      "    \n",
      "    - OHID \tHousehold Identification number\n",
      "    - OPNO \tEgo's Person number\n",
      "    - OSEX  Sex\n",
      "    \n",
      "6. <b>Demographic enumeration grid information</b>\n",
      "<a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/oindall.html\">(oindall)</a>\n",
      "\n",
      "  DataFrame (21730, 56), variables of interest:\n",
      "  \n",
      "  - OHID \tHousehold identification number\n",
      "  - OPNO \tPerson number\n",
      "  - OAGE \tAge at Date of Interview\n",
      "  - OAGE12 \tAge at 1.12.2005\n",
      "  - OBUTYPE \tBenefit Unit Type\n",
      "  - OBUNO \tBenefit Unit Number\n",
      "\n",
      "8. <b>Spell employment status, dates of start and finish, job characteristics</b>\n",
      "<a href = \"https://iser.sx.ac.uk/bhps/documentation/volb/wave15/ojobhist.html\">(ojobhist)</a>\n",
      "\n",
      "    DataFrame (), variables of interest:\n",
      "    \n",
      " \n",
      "Further, we import the variables `oneta` and `onethh`; these are derived variables, found in <a href = \"http://discover.ukdataservice.ac.uk/catalogue?sn=3909\">SN 3909</a>, corresponding documentation is available from the <a href = \"http://doc.ukdataservice.ac.uk/doc/3909/mrdoc/pdf/3909userguide.pdf\">UK Data Service</a>\n",
      "\n",
      "These variables give post-tax, post-transfer (i.e. net) income for households which conducted the entire interview. These are the households for which `ohhsamp.oivfho == 10`, of which there are 6980 in the data of wave 15.\n",
      "\n",
      "1. onethh.dta, variables of interest:\n",
      "    - OHID: Household Identification Number\n",
      "    - OHHNETI: Total Household Net Income, \u00a3/week\n",
      "    - OHHNETDE: Total income, deflated and equivalised using McClements 'before housing costs' scale\n",
      "    - OHHNETDE2: deflated using modified OECD scale\n",
      "    \n",
      "2. oneta.dta, variables of interest:\n",
      "    - OHID: Household Identification Number\n",
      "    - OHHYNETI: Total Household Annual Net Income, \u00a3/year\n",
      "    - OHHNYRDE: Deflated, McClements equivalised\n",
      "    - OHHNYRDE2: Deflated, OECD equivalised"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "import matplotlib.pyplot as plt\n",
      "import numpy as np\n",
      "import tables as tab\n",
      "%matplotlib inline"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Household Level Data:\n",
      "ohhresp = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/ohhresp.dta')\n",
      "ohhsamp = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/ohhsamp.dta')\n",
      "# Individual Level Data:\n",
      "oindresp = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/oindresp.dta')\n",
      "oindsamp = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/oindsamp.dta')\n",
      "oindall = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/oindall.dta')\n",
      "oincome = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/oincome.dta')\n",
      "# Derived Variables:\n",
      "oneta = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/o_neta.dta')\n",
      "onethh = pd.read_stata('/Users/tew207/Dropbox/QMUL/PhD/Data/BHPS/2005/o_nethh.dta')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In order to reduce computational overhead, we continue working only with those variables that are relevant:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "indsamp = oindsamp[['ohid', 'opno', 'oivfio', 'oivfho', 'ofinloc']]\n",
      "hhresp = ohhresp[['ohid', 'ohsownd', 'ohsval', 'ohs2valo', 'omghave', 'oxhwght', 'ofihhyr', 'omgtot']]    \n",
      "indall = oindall[['ohid', 'opno', 'obutype', 'obuno', 'ohgr2r']]\n",
      "neta = oneta[['ohid', 'ohhyneti', 'ohhnyrde', 'ohhnyrde2']]\n",
      "nethh = onethh[['ohid', 'ohhneti', 'ohhnetde', 'ohhnetde2']]\n",
      "indresp = oindresp[['ohid', 'opno', 'oage12', 'odoby', 'osex', 'oplbornc', 'oyr2uk4', 'onatidh', 'oracel', 'ojbsemp', \n",
      "                    'osaved', 'osvack', 'osvacsj', 'osvacsk', 'onvestk', 'onvestsj', 'onvestsk', 'odebty',\n",
      "                    'odebtsk', 'oqfedhi', 'ojbstat', 'omrjsoc', 'omrjsic', 'omrjsic9']]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>Coding conventions</h3>\n",
      "\n",
      "In the BHPS, interview outcomes other than an answer to the question at hand are coded as negative numbers. Some of these can be interpreted as a value of zero (when the answer is either 'inapplicable' or 'none') while others should be considered as missing data (when the answer was refused, or a proxy). To capture this, we construct two variables for missing and zero data:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "zero = [-3, -8]\n",
      "missing = [-1, -2, -7, -9]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Merge them into a single DataFrame and reorder some columns:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df = pd.merge(indsamp, hhresp, on=['ohid'], how='left')\n",
      "df = pd.merge(df, indall, on=['ohid', 'opno'], how='left')\n",
      "df = pd.merge(df, indresp, on=['ohid', 'opno'], how='left')\n",
      "df = pd.merge(df, neta, on=['ohid'], how='left')\n",
      "df = pd.merge(df, nethh, on=['ohid'], how='left')\n",
      "\n",
      "first = ['ohid', 'opno', 'osvack', 'onvestk', 'ohsval', 'odebty', 'omgtot', 'ohhneti', 'ohhnetde', 'oage12']\n",
      "rest = [x for x in df.columns if x not in first]\n",
      "df = df[first + rest]\n",
      "\n",
      "# Drop duplicates\n",
      "d1 = sum(df.ofinloc=='duplicate')\n",
      "df = df[df.ofinloc == 'final record']\n",
      "\n",
      "# Drop households with reference person not aged 20-65\n",
      "d2 = list(df[np.logical_or(np.logical_and(df.ohgr2r=='reference person', df.oage12<20),\\\n",
      "                           np.logical_and(df.ohgr2r=='reference person', df.oage12>65))].ohid)\n",
      "df = df[~df['ohid'].isin(d2)]\n",
      "\n",
      "# Drop households without complete interviews\n",
      "d3 = sum(df.oivfho!='all eligible hh intv')\n",
      "df = df[df.oivfho=='all eligible hh intv']\n",
      "\n",
      "# Drop households with missing mortgage information\n",
      "d4 = df.omgtot.isin(missing)\n",
      "df = df[~df['omgtot'].isin(missing)]\n",
      "\n",
      "# Drop outliers:\n",
      "d5 = sum(df.omgtot > 2000000)\n",
      "df = df[df.omgtot < 2000000]\n",
      "\n",
      "print \"Dropped %d duplicate observations, %d households with reference person not aged 20-65,\" \\\n",
      "      \"%d observations without complete interviews, %d observations with missing mortgage data,\" \\\n",
      "      \"and %d outliers.\" % (d1,len(d2),d3,len(d4),d5)\n",
      "print \"The final dataset has %d observations and %d variables\" % (len(df), len(df.columns))\n",
      "print \"There are %d households and %d benefit units\" % (len(set(df.ohid)), len(set(zip(df.ohid, df.obuno))))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Dropped 806 duplicate observations, 2086 households with reference person not aged 20-65,10133 observations without complete interviews, 13719 observations with missing mortgage data,and 1 outliers.\n",
        "The final dataset has 12794 observations and 43 variables\n",
        "There are 4809 households and 6057 benefit units\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>Benefit Units</h3>\n",
      "\n",
      "The BHPS has individuals and households as the principal unit of analysis, with households being defined as <i>\"one person living alone or a group of people who either share living accommodation, or share one meal a day and who have the address as their only or main residence\"</i>. In some cases, this does not necessarily define an economic unit - as an example, three unrelated adults working full time and sharing accomodation would be counted as a household, even though they don't share income or assets. For this purpose, the BHPS also defines benefit units, which are used as the unit of account in many studies (cp. for example <a href=\"http://www.ifs.org.uk/wps/wp0221.pdf\">Banks et al. (2002)</a>. The benefit unit is defined as a single person or a couple living together with or without children. It can safely be assumed that benefit units share finances (under some mild assumptions, such as no wealth sharing with individuals outside the benefit unit) and hence they might be the appropriate unit of account for questions related to the income and wealth distribution.\n",
      "\n",
      "Benefit units in the BHPS are enumerated by the person number (`opno`) of the first person in the household belonging to the benefit unit. That means that in a household of a couple with two dependent children, every household member (`opno` 1 to 4) would share the same `ohid` value as well as the same `obuno` value. In contrast, in a household of a couple living with an adult (non-dependent) child, only the couple would share the same `ohid` value and `obuno` value (1), while the non-dependent child would have the same `ohid` value, but an `obuno` value equal to its `opno` (in this case 3)."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>Debt holdings</h3>\n",
      "\n",
      "Debt figures are collected in the following variables:\n",
      "\n",
      "- ODEBTY: Total amount owed \n",
      "- ODEBTSK: Sole amount owed\n",
      "- OMGTOT: All mortgage on all property\n",
      "\n",
      "Note that the `odebty` and `odebtsk` variables include loans for hire purchase agreements, which might be classified as secured borrowing, although there is no way to seperate between the different sub-categories of debt.\n",
      "\n",
      "A further complication is the difference between `odebty` and `odebtsk` for some observations. While this is certainly a small number of observations (less than 200), the differences between the two variables is significant for some individuals (e.g. `ohid` 15185397, `opno` 1, a single person household where `odebtsk` equals 38,000 while `odebty` equals 240,000) and in three cases `odebtsk` is even larger than `odebty`, which seems impossible:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#df[np.logical_and(df.odebtsk>0, df.odebtsk>df.odebty)][['ohid','opno','odebtsk','odebty']]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "One complication in that arises when calculating summary statistics is the treatment of missing answers, which are coded as negative numbers in the BHPS. Thus, to compute summary statistics, it is necessary to replace them with missing values (`np.Nan`), as they would otherwise be counted as observations with negative debt holdings. When adding up the different debt categories (i.e. unsecured and secured debt), we want to replace negative values in `omgtot`, `odebty` and `odebtsk` with zeros, as they represent missing, inapplicable, proxy or don't know answers. However, we must be careful to replace zeros with NaNs afterwards in order not to bias downwards our estimates by counting missing observations as households with zero debt:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "col_inapp = ['osvack', 'osvacsk', 'onvestk', 'onvestsk', 'ohsval', 'ohs2valo', 'odebty', 'odebtsk', 'omgtot']\n",
      "\n",
      "for col in col_inapp:\n",
      "    df.loc[df[col].isin(zero), col] = 0\n",
      "    df.loc[df[col].isin(missing), col] = np.NaN"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In order to consolidate debt at the household level, we call the `sum()` function on a `groupby` object. For secured debt, we want to use the mean of our grouped households, as the merging of our dataset ascribed the amount of the mortgage to each household member, so summing up would multiply the actual value of the mortgage by the amount of members in a household:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "unsecured_hh = df[['ohid', 'odebty']].groupby('ohid').sum()\n",
      "df = df.join(unsecured_hh, on='ohid', rsuffix='_hh')\n",
      "secured_hh = df[['ohid', 'omgtot']].groupby('ohid').mean()\n",
      "df = df.join(secured_hh, on='ohid', rsuffix='_hh')\n",
      "\n",
      "df['totaldebt_hh'] = df.odebty_hh + df.omgtot_hh"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>Asset Holdings</h3>\n",
      "\n",
      "Assets in the BHPS are recorded in the following variables:\n",
      "\n",
      "- OHSVAL: Value of property if sold\n",
      "- OSVACK\tAmnt. in savings account, tessa or ISA\n",
      "- OSVACSK:\tAmnt. savings held in sole name\n",
      "- ONVESTK:\tTotal value of investments\n",
      "- ONVESTSK:\tAmnt. investments held in sole name\n",
      "\n",
      "Where broadly speaking `osvack` and `osvacsk` represent liquid financial wealth, `onvestk` and `onvestsk` represent illiquid financial wealth and `ohsval` represents housing wealth. Again, values of -8 indicate missing values and should be replaced by zero when adding up columns or by NaN when computing summary statistics."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "savings_hh = df[['ohid', 'osvack']].groupby('ohid').sum()\n",
      "df = df.join(savings_hh, on='ohid', rsuffix='_hh')\n",
      "investments_hh = df[['ohid', 'onvestk']].groupby('ohid').sum()\n",
      "df = df.join(investments_hh, on='ohid', rsuffix='_hh')\n",
      "\n",
      "df['fin_assets_hh'] = df['osvack_hh'] + df['onvestk_hh']\n",
      "\n",
      "hswealth1_hh = df[['ohid', 'ohsval']].groupby('ohid').mean()\n",
      "df = df.join(hswealth1_hh, on='ohid', rsuffix='_hh')\n",
      "hswealth2_hh = df[['ohid', 'ohs2valo']].groupby('ohid').mean()\n",
      "df = df.join(hswealth2_hh, on='ohid', rsuffix='_hh')\n",
      "df['hswealth_hh'] = df['ohsval_hh'] + df['ohs2valo_hh']  \n",
      "\n",
      "df['totalwealth_hh'] = df['fin_assets_hh'] + df['hswealth_hh']\n",
      "\n",
      "df['netwealth_hh'] = df['totalwealth_hh'] - df['totaldebt_hh']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>Household Income</h3>\n",
      "\n",
      "Household Income can easily be calculated from the variables in the `oneta` and `onethh` datasets:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "inc_w_hh = df[['ohid', 'ohhneti']].groupby('ohid').mean()\n",
      "df = df.join(inc_w_hh, on='ohid', rsuffix='_hh')\n",
      "inc_y_hh = df[['ohid', 'ohhyneti']].groupby('ohid').mean()\n",
      "df = df.join(inc_y_hh, on='ohid', rsuffix='_hh')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3> Finally: Keep only household heads </h3>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_hh = df[df.ohgr2r=='reference person']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_hh = df_hh.rename(columns={'odebty_hh': 'unsecured_hh', 'omgtot_hh': 'secured_hh', 'osvack_hh': 'savings_hh', \n",
      "                              'onvestk_hh': 'investments_hh'})"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 12
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>Summary Statistics</h3>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Entire Sample\n",
      "df_hh[['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh', 'investments_hh', 'fin_assets_hh', 'hswealth_hh', 'totalwealth_hh', 'netwealth_hh']].describe().T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>count</th>\n",
        "      <th>mean</th>\n",
        "      <th>std</th>\n",
        "      <th>min</th>\n",
        "      <th>25%</th>\n",
        "      <th>50%</th>\n",
        "      <th>75%</th>\n",
        "      <th>max</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>unsecured_hh</th>\n",
        "      <td> 4730</td>\n",
        "      <td>   4631.194503</td>\n",
        "      <td>  13376.600163</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>    200</td>\n",
        "      <td>   5000.00</td>\n",
        "      <td>  442000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>secured_hh</th>\n",
        "      <td> 4809</td>\n",
        "      <td>  37479.233728</td>\n",
        "      <td>  62240.223987</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>   9000</td>\n",
        "      <td>  58000.00</td>\n",
        "      <td>  900000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totaldebt_hh</th>\n",
        "      <td> 4730</td>\n",
        "      <td>  42186.423890</td>\n",
        "      <td>  65727.151605</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>  18000</td>\n",
        "      <td>  63000.00</td>\n",
        "      <td>  910000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>savings_hh</th>\n",
        "      <td> 4374</td>\n",
        "      <td>   7729.578647</td>\n",
        "      <td>  22814.051374</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>    400</td>\n",
        "      <td>   5300.00</td>\n",
        "      <td>  350000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>investments_hh</th>\n",
        "      <td> 4698</td>\n",
        "      <td>   4449.391230</td>\n",
        "      <td>  21592.159749</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>      0</td>\n",
        "      <td>    100.00</td>\n",
        "      <td>  400000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>fin_assets_hh</th>\n",
        "      <td> 4296</td>\n",
        "      <td>  12144.425512</td>\n",
        "      <td>  35940.671480</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>    650</td>\n",
        "      <td>   8800.00</td>\n",
        "      <td>  650000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>hswealth_hh</th>\n",
        "      <td> 4786</td>\n",
        "      <td> 149270.457167</td>\n",
        "      <td> 193060.866224</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td> 120000</td>\n",
        "      <td> 200000.00</td>\n",
        "      <td> 4000000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totalwealth_hh</th>\n",
        "      <td> 4280</td>\n",
        "      <td> 159233.720327</td>\n",
        "      <td> 207908.345626</td>\n",
        "      <td>      0</td>\n",
        "      <td> 2000</td>\n",
        "      <td> 125000</td>\n",
        "      <td> 209312.50</td>\n",
        "      <td> 4200000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>netwealth_hh</th>\n",
        "      <td> 4232</td>\n",
        "      <td> 117022.136578</td>\n",
        "      <td> 191532.270870</td>\n",
        "      <td>-344200</td>\n",
        "      <td>  200</td>\n",
        "      <td>  71650</td>\n",
        "      <td> 157757.75</td>\n",
        "      <td> 4200000</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 13,
       "text": [
        "                count           mean            std     min   25%     50%  \\\n",
        "unsecured_hh     4730    4631.194503   13376.600163       0     0     200   \n",
        "secured_hh       4809   37479.233728   62240.223987       0     0    9000   \n",
        "totaldebt_hh     4730   42186.423890   65727.151605       0     0   18000   \n",
        "savings_hh       4374    7729.578647   22814.051374       0     0     400   \n",
        "investments_hh   4698    4449.391230   21592.159749       0     0       0   \n",
        "fin_assets_hh    4296   12144.425512   35940.671480       0     0     650   \n",
        "hswealth_hh      4786  149270.457167  193060.866224       0     0  120000   \n",
        "totalwealth_hh   4280  159233.720327  207908.345626       0  2000  125000   \n",
        "netwealth_hh     4232  117022.136578  191532.270870 -344200   200   71650   \n",
        "\n",
        "                      75%      max  \n",
        "unsecured_hh      5000.00   442000  \n",
        "secured_hh       58000.00   900000  \n",
        "totaldebt_hh     63000.00   910000  \n",
        "savings_hh        5300.00   350000  \n",
        "investments_hh     100.00   400000  \n",
        "fin_assets_hh     8800.00   650000  \n",
        "hswealth_hh     200000.00  4000000  \n",
        "totalwealth_hh  209312.50  4200000  \n",
        "netwealth_hh    157757.75  4200000  "
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>Statistics by Age Group</h3>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Age 20 - 29"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_hh[df_hh.oage12<=29][['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh', 'investments_hh', 'fin_assets_hh', \\\n",
      "                         'hswealth_hh', 'totalwealth_hh', 'netwealth_hh']].describe().T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>count</th>\n",
        "      <th>mean</th>\n",
        "      <th>std</th>\n",
        "      <th>min</th>\n",
        "      <th>25%</th>\n",
        "      <th>50%</th>\n",
        "      <th>75%</th>\n",
        "      <th>max</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>unsecured_hh</th>\n",
        "      <td> 646</td>\n",
        "      <td>  7197.832817</td>\n",
        "      <td> 15052.903570</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>  1900.0</td>\n",
        "      <td>   9000.0</td>\n",
        "      <td> 250500</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>secured_hh</th>\n",
        "      <td> 659</td>\n",
        "      <td> 34552.427921</td>\n",
        "      <td> 54863.808461</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>  62000.0</td>\n",
        "      <td> 435000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totaldebt_hh</th>\n",
        "      <td> 646</td>\n",
        "      <td> 41809.365325</td>\n",
        "      <td> 57935.797935</td>\n",
        "      <td>      0</td>\n",
        "      <td>  400</td>\n",
        "      <td> 14000.0</td>\n",
        "      <td>  69750.0</td>\n",
        "      <td> 436600</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>savings_hh</th>\n",
        "      <td> 626</td>\n",
        "      <td>  2299.753994</td>\n",
        "      <td>  6944.733105</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>    23.5</td>\n",
        "      <td>   1511.5</td>\n",
        "      <td> 105000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>investments_hh</th>\n",
        "      <td> 648</td>\n",
        "      <td>  1039.427469</td>\n",
        "      <td>  9375.465618</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>      0.0</td>\n",
        "      <td> 208000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>fin_assets_hh</th>\n",
        "      <td> 617</td>\n",
        "      <td>  3402.260940</td>\n",
        "      <td> 14004.448186</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>    50.0</td>\n",
        "      <td>   2000.0</td>\n",
        "      <td> 272500</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>hswealth_hh</th>\n",
        "      <td> 659</td>\n",
        "      <td> 61821.699545</td>\n",
        "      <td> 89877.967755</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td> 115000.0</td>\n",
        "      <td> 600000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totalwealth_hh</th>\n",
        "      <td> 617</td>\n",
        "      <td> 65344.724473</td>\n",
        "      <td> 97315.368430</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>  2300.0</td>\n",
        "      <td> 115800.0</td>\n",
        "      <td> 872500</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>netwealth_hh</th>\n",
        "      <td> 606</td>\n",
        "      <td> 23237.608911</td>\n",
        "      <td> 59172.909355</td>\n",
        "      <td>-250100</td>\n",
        "      <td>-1800</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>  34300.0</td>\n",
        "      <td> 522500</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 14,
       "text": [
        "                count          mean           std     min   25%      50%  \\\n",
        "unsecured_hh      646   7197.832817  15052.903570       0     0   1900.0   \n",
        "secured_hh        659  34552.427921  54863.808461       0     0      0.0   \n",
        "totaldebt_hh      646  41809.365325  57935.797935       0   400  14000.0   \n",
        "savings_hh        626   2299.753994   6944.733105       0     0     23.5   \n",
        "investments_hh    648   1039.427469   9375.465618       0     0      0.0   \n",
        "fin_assets_hh     617   3402.260940  14004.448186       0     0     50.0   \n",
        "hswealth_hh       659  61821.699545  89877.967755       0     0      0.0   \n",
        "totalwealth_hh    617  65344.724473  97315.368430       0     0   2300.0   \n",
        "netwealth_hh      606  23237.608911  59172.909355 -250100 -1800      0.0   \n",
        "\n",
        "                     75%     max  \n",
        "unsecured_hh      9000.0  250500  \n",
        "secured_hh       62000.0  435000  \n",
        "totaldebt_hh     69750.0  436600  \n",
        "savings_hh        1511.5  105000  \n",
        "investments_hh       0.0  208000  \n",
        "fin_assets_hh     2000.0  272500  \n",
        "hswealth_hh     115000.0  600000  \n",
        "totalwealth_hh  115800.0  872500  \n",
        "netwealth_hh     34300.0  522500  "
       ]
      }
     ],
     "prompt_number": 14
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Age 30 - 39"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_hh[np.logical_and(df_hh.oage12>=30, df_hh.oage12<40)][['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh',\\\n",
      "                                                          'investments_hh', 'fin_assets_hh', 'hswealth_hh', 'totalwealth_hh',\\\n",
      "                                                          'netwealth_hh']].describe().T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>count</th>\n",
        "      <th>mean</th>\n",
        "      <th>std</th>\n",
        "      <th>min</th>\n",
        "      <th>25%</th>\n",
        "      <th>50%</th>\n",
        "      <th>75%</th>\n",
        "      <th>max</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>unsecured_hh</th>\n",
        "      <td> 1147</td>\n",
        "      <td>   4700.722755</td>\n",
        "      <td>   8754.904203</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>    950</td>\n",
        "      <td>   6000.00</td>\n",
        "      <td>  110000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>secured_hh</th>\n",
        "      <td> 1172</td>\n",
        "      <td>  55522.093003</td>\n",
        "      <td>  71944.884836</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>  41800</td>\n",
        "      <td>  80000.00</td>\n",
        "      <td>  900000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totaldebt_hh</th>\n",
        "      <td> 1147</td>\n",
        "      <td>  60538.031386</td>\n",
        "      <td>  74788.807519</td>\n",
        "      <td>      0</td>\n",
        "      <td> 3115</td>\n",
        "      <td>  45000</td>\n",
        "      <td>  87630.00</td>\n",
        "      <td>  903500</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>savings_hh</th>\n",
        "      <td> 1075</td>\n",
        "      <td>   4498.274419</td>\n",
        "      <td>  14871.325366</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>    200</td>\n",
        "      <td>   3000.00</td>\n",
        "      <td>  220000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>investments_hh</th>\n",
        "      <td> 1151</td>\n",
        "      <td>   1989.174631</td>\n",
        "      <td>  11970.851520</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>      0</td>\n",
        "      <td>      2.00</td>\n",
        "      <td>  251500</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>fin_assets_hh</th>\n",
        "      <td> 1061</td>\n",
        "      <td>   6609.175306</td>\n",
        "      <td>  20807.306925</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td>    286</td>\n",
        "      <td>   5000.00</td>\n",
        "      <td>  270200</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>hswealth_hh</th>\n",
        "      <td> 1167</td>\n",
        "      <td> 130829.353899</td>\n",
        "      <td> 159410.870218</td>\n",
        "      <td>      0</td>\n",
        "      <td>    0</td>\n",
        "      <td> 115000</td>\n",
        "      <td> 180000.00</td>\n",
        "      <td> 3000000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totalwealth_hh</th>\n",
        "      <td> 1057</td>\n",
        "      <td> 134862.700095</td>\n",
        "      <td> 166518.611507</td>\n",
        "      <td>      0</td>\n",
        "      <td> 2000</td>\n",
        "      <td> 115000</td>\n",
        "      <td> 183500.00</td>\n",
        "      <td> 3000000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>netwealth_hh</th>\n",
        "      <td> 1044</td>\n",
        "      <td>  76547.723180</td>\n",
        "      <td> 135581.524546</td>\n",
        "      <td>-155995</td>\n",
        "      <td>  100</td>\n",
        "      <td>  52000</td>\n",
        "      <td> 104076.25</td>\n",
        "      <td> 3000000</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "                count           mean            std     min   25%     50%  \\\n",
        "unsecured_hh     1147    4700.722755    8754.904203       0     0     950   \n",
        "secured_hh       1172   55522.093003   71944.884836       0     0   41800   \n",
        "totaldebt_hh     1147   60538.031386   74788.807519       0  3115   45000   \n",
        "savings_hh       1075    4498.274419   14871.325366       0     0     200   \n",
        "investments_hh   1151    1989.174631   11970.851520       0     0       0   \n",
        "fin_assets_hh    1061    6609.175306   20807.306925       0     0     286   \n",
        "hswealth_hh      1167  130829.353899  159410.870218       0     0  115000   \n",
        "totalwealth_hh   1057  134862.700095  166518.611507       0  2000  115000   \n",
        "netwealth_hh     1044   76547.723180  135581.524546 -155995   100   52000   \n",
        "\n",
        "                      75%      max  \n",
        "unsecured_hh      6000.00   110000  \n",
        "secured_hh       80000.00   900000  \n",
        "totaldebt_hh     87630.00   903500  \n",
        "savings_hh        3000.00   220000  \n",
        "investments_hh       2.00   251500  \n",
        "fin_assets_hh     5000.00   270200  \n",
        "hswealth_hh     180000.00  3000000  \n",
        "totalwealth_hh  183500.00  3000000  \n",
        "netwealth_hh    104076.25  3000000  "
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Age 40 - 49"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_hh[np.logical_and(df_hh.oage12>=40, df_hh.oage12<50)][['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh',\\\n",
      "                                                          'investments_hh', 'fin_assets_hh', 'hswealth_hh', 'totalwealth_hh',\\\n",
      "                                                          'netwealth_hh']].describe().T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>count</th>\n",
        "      <th>mean</th>\n",
        "      <th>std</th>\n",
        "      <th>min</th>\n",
        "      <th>25%</th>\n",
        "      <th>50%</th>\n",
        "      <th>75%</th>\n",
        "      <th>max</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>unsecured_hh</th>\n",
        "      <td> 1280</td>\n",
        "      <td>   5125.762500</td>\n",
        "      <td>  16230.359535</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>    300.0</td>\n",
        "      <td>   5043.75</td>\n",
        "      <td>  442000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>secured_hh</th>\n",
        "      <td> 1301</td>\n",
        "      <td>  47076.021522</td>\n",
        "      <td>  65307.224401</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>  31000.0</td>\n",
        "      <td>  69000.00</td>\n",
        "      <td>  740000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totaldebt_hh</th>\n",
        "      <td> 1280</td>\n",
        "      <td>  52550.687500</td>\n",
        "      <td>  70125.166491</td>\n",
        "      <td>      0</td>\n",
        "      <td>  2042.5</td>\n",
        "      <td>  36000.0</td>\n",
        "      <td>  73550.00</td>\n",
        "      <td>  740000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>savings_hh</th>\n",
        "      <td> 1215</td>\n",
        "      <td>   7568.526749</td>\n",
        "      <td>  22254.643935</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>    550.0</td>\n",
        "      <td>   5125.00</td>\n",
        "      <td>  255000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>investments_hh</th>\n",
        "      <td> 1276</td>\n",
        "      <td>   4627.007053</td>\n",
        "      <td>  20630.122505</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>      0.0</td>\n",
        "      <td>    400.00</td>\n",
        "      <td>  350030</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>fin_assets_hh</th>\n",
        "      <td> 1195</td>\n",
        "      <td>  12493.911297</td>\n",
        "      <td>  36649.354938</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>   1000.0</td>\n",
        "      <td>  10000.00</td>\n",
        "      <td>  395003</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>hswealth_hh</th>\n",
        "      <td> 1296</td>\n",
        "      <td> 175938.175154</td>\n",
        "      <td> 206886.324143</td>\n",
        "      <td>      0</td>\n",
        "      <td> 70000.0</td>\n",
        "      <td> 140000.0</td>\n",
        "      <td> 220000.00</td>\n",
        "      <td> 2700000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totalwealth_hh</th>\n",
        "      <td> 1191</td>\n",
        "      <td> 189499.142737</td>\n",
        "      <td> 224490.951612</td>\n",
        "      <td>      0</td>\n",
        "      <td> 70002.5</td>\n",
        "      <td> 150100.0</td>\n",
        "      <td> 240000.00</td>\n",
        "      <td> 2700000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>netwealth_hh</th>\n",
        "      <td> 1178</td>\n",
        "      <td> 137959.044992</td>\n",
        "      <td> 200714.754531</td>\n",
        "      <td>-334000</td>\n",
        "      <td> 26475.0</td>\n",
        "      <td>  93900.5</td>\n",
        "      <td> 174862.50</td>\n",
        "      <td> 2589000</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 16,
       "text": [
        "                count           mean            std     min      25%  \\\n",
        "unsecured_hh     1280    5125.762500   16230.359535       0      0.0   \n",
        "secured_hh       1301   47076.021522   65307.224401       0      0.0   \n",
        "totaldebt_hh     1280   52550.687500   70125.166491       0   2042.5   \n",
        "savings_hh       1215    7568.526749   22254.643935       0      0.0   \n",
        "investments_hh   1276    4627.007053   20630.122505       0      0.0   \n",
        "fin_assets_hh    1195   12493.911297   36649.354938       0      0.0   \n",
        "hswealth_hh      1296  175938.175154  206886.324143       0  70000.0   \n",
        "totalwealth_hh   1191  189499.142737  224490.951612       0  70002.5   \n",
        "netwealth_hh     1178  137959.044992  200714.754531 -334000  26475.0   \n",
        "\n",
        "                     50%        75%      max  \n",
        "unsecured_hh       300.0    5043.75   442000  \n",
        "secured_hh       31000.0   69000.00   740000  \n",
        "totaldebt_hh     36000.0   73550.00   740000  \n",
        "savings_hh         550.0    5125.00   255000  \n",
        "investments_hh       0.0     400.00   350030  \n",
        "fin_assets_hh     1000.0   10000.00   395003  \n",
        "hswealth_hh     140000.0  220000.00  2700000  \n",
        "totalwealth_hh  150100.0  240000.00  2700000  \n",
        "netwealth_hh     93900.5  174862.50  2589000  "
       ]
      }
     ],
     "prompt_number": 16
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Age 50 - 59"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_hh[np.logical_and(df_hh.oage12>=50, df_hh.oage12<60)][['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh',\\\n",
      "                                                          'investments_hh', 'fin_assets_hh', 'hswealth_hh', 'totalwealth_hh',\\\n",
      "                                                          'netwealth_hh']].describe().T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>count</th>\n",
        "      <th>mean</th>\n",
        "      <th>std</th>\n",
        "      <th>min</th>\n",
        "      <th>25%</th>\n",
        "      <th>50%</th>\n",
        "      <th>75%</th>\n",
        "      <th>max</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>unsecured_hh</th>\n",
        "      <td> 1051</td>\n",
        "      <td>   4183.559467</td>\n",
        "      <td>  15074.666718</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0</td>\n",
        "      <td>      0</td>\n",
        "      <td>   4000</td>\n",
        "      <td>  415000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>secured_hh</th>\n",
        "      <td> 1063</td>\n",
        "      <td>  25080.349012</td>\n",
        "      <td>  55706.337611</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0</td>\n",
        "      <td>      0</td>\n",
        "      <td>  30000</td>\n",
        "      <td>  900000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totaldebt_hh</th>\n",
        "      <td> 1051</td>\n",
        "      <td>  28958.546147</td>\n",
        "      <td>  58573.422202</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0</td>\n",
        "      <td>   7000</td>\n",
        "      <td>  36250</td>\n",
        "      <td>  910000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>savings_hh</th>\n",
        "      <td>  946</td>\n",
        "      <td>  11084.853066</td>\n",
        "      <td>  26468.923953</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0</td>\n",
        "      <td>   1300</td>\n",
        "      <td>  10000</td>\n",
        "      <td>  350000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>investments_hh</th>\n",
        "      <td> 1036</td>\n",
        "      <td>   6969.016409</td>\n",
        "      <td>  29665.079048</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0</td>\n",
        "      <td>      0</td>\n",
        "      <td>   1000</td>\n",
        "      <td>  400000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>fin_assets_hh</th>\n",
        "      <td>  929</td>\n",
        "      <td>  17750.470398</td>\n",
        "      <td>  43945.638431</td>\n",
        "      <td>      0</td>\n",
        "      <td>     0</td>\n",
        "      <td>   2000</td>\n",
        "      <td>  16000</td>\n",
        "      <td>  650000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>hswealth_hh</th>\n",
        "      <td> 1056</td>\n",
        "      <td> 181656.937500</td>\n",
        "      <td> 235514.231403</td>\n",
        "      <td>      0</td>\n",
        "      <td> 68750</td>\n",
        "      <td> 150000</td>\n",
        "      <td> 231250</td>\n",
        "      <td> 4000000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totalwealth_hh</th>\n",
        "      <td>  925</td>\n",
        "      <td> 198791.149189</td>\n",
        "      <td> 258606.103338</td>\n",
        "      <td>      0</td>\n",
        "      <td> 66600</td>\n",
        "      <td> 157000</td>\n",
        "      <td> 257225</td>\n",
        "      <td> 4200000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>netwealth_hh</th>\n",
        "      <td>  921</td>\n",
        "      <td> 168438.609121</td>\n",
        "      <td> 249779.957229</td>\n",
        "      <td>-344200</td>\n",
        "      <td> 37200</td>\n",
        "      <td> 125920</td>\n",
        "      <td> 220700</td>\n",
        "      <td> 4200000</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "                count           mean            std     min    25%     50%  \\\n",
        "unsecured_hh     1051    4183.559467   15074.666718       0      0       0   \n",
        "secured_hh       1063   25080.349012   55706.337611       0      0       0   \n",
        "totaldebt_hh     1051   28958.546147   58573.422202       0      0    7000   \n",
        "savings_hh        946   11084.853066   26468.923953       0      0    1300   \n",
        "investments_hh   1036    6969.016409   29665.079048       0      0       0   \n",
        "fin_assets_hh     929   17750.470398   43945.638431       0      0    2000   \n",
        "hswealth_hh      1056  181656.937500  235514.231403       0  68750  150000   \n",
        "totalwealth_hh    925  198791.149189  258606.103338       0  66600  157000   \n",
        "netwealth_hh      921  168438.609121  249779.957229 -344200  37200  125920   \n",
        "\n",
        "                   75%      max  \n",
        "unsecured_hh      4000   415000  \n",
        "secured_hh       30000   900000  \n",
        "totaldebt_hh     36250   910000  \n",
        "savings_hh       10000   350000  \n",
        "investments_hh    1000   400000  \n",
        "fin_assets_hh    16000   650000  \n",
        "hswealth_hh     231250  4000000  \n",
        "totalwealth_hh  257225  4200000  \n",
        "netwealth_hh    220700  4200000  "
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Age 60+"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_hh[df_hh.oage12>=60][['unsecured_hh', 'secured_hh', 'totaldebt_hh', 'savings_hh', 'investments_hh', 'fin_assets_hh',\\\n",
      "                         'hswealth_hh', 'totalwealth_hh', 'netwealth_hh']].describe().T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>count</th>\n",
        "      <th>mean</th>\n",
        "      <th>std</th>\n",
        "      <th>min</th>\n",
        "      <th>25%</th>\n",
        "      <th>50%</th>\n",
        "      <th>75%</th>\n",
        "      <th>max</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>unsecured_hh</th>\n",
        "      <td> 606</td>\n",
        "      <td>   1495.254125</td>\n",
        "      <td>   6544.761100</td>\n",
        "      <td>     0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>      0.0</td>\n",
        "      <td>    115.00</td>\n",
        "      <td>  132000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>secured_hh</th>\n",
        "      <td> 614</td>\n",
        "      <td>   7311.688925</td>\n",
        "      <td>  28965.924001</td>\n",
        "      <td>     0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>      0.0</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>  350000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totaldebt_hh</th>\n",
        "      <td> 606</td>\n",
        "      <td>   8903.466997</td>\n",
        "      <td>  31058.047402</td>\n",
        "      <td>     0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>      0.0</td>\n",
        "      <td>   2000.00</td>\n",
        "      <td>  350000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>savings_hh</th>\n",
        "      <td> 512</td>\n",
        "      <td>  15335.654297</td>\n",
        "      <td>  36347.092324</td>\n",
        "      <td>     0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>   2000.0</td>\n",
        "      <td>  14000.00</td>\n",
        "      <td>  350000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>investments_hh</th>\n",
        "      <td> 587</td>\n",
        "      <td>   8204.751278</td>\n",
        "      <td>  28778.229896</td>\n",
        "      <td>     0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>      0.0</td>\n",
        "      <td>   1000.00</td>\n",
        "      <td>  317000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>fin_assets_hh</th>\n",
        "      <td> 494</td>\n",
        "      <td>  23563.787449</td>\n",
        "      <td>  53268.600621</td>\n",
        "      <td>     0</td>\n",
        "      <td>     0.0</td>\n",
        "      <td>   3010.5</td>\n",
        "      <td>  21750.00</td>\n",
        "      <td>  440000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>hswealth_hh</th>\n",
        "      <td> 608</td>\n",
        "      <td> 166356.004934</td>\n",
        "      <td> 189571.567699</td>\n",
        "      <td>     0</td>\n",
        "      <td> 55750.0</td>\n",
        "      <td> 130000.0</td>\n",
        "      <td> 220000.00</td>\n",
        "      <td> 2000000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>totalwealth_hh</th>\n",
        "      <td> 490</td>\n",
        "      <td> 181790.738776</td>\n",
        "      <td> 198296.631824</td>\n",
        "      <td>     0</td>\n",
        "      <td> 35250.0</td>\n",
        "      <td> 135700.0</td>\n",
        "      <td> 246890.75</td>\n",
        "      <td> 1230000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>netwealth_hh</th>\n",
        "      <td> 483</td>\n",
        "      <td> 173068.641822</td>\n",
        "      <td> 191070.108064</td>\n",
        "      <td>-19700</td>\n",
        "      <td> 25612.5</td>\n",
        "      <td> 126550.0</td>\n",
        "      <td> 234750.00</td>\n",
        "      <td> 1230000</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 18,
       "text": [
        "                count           mean            std    min      25%       50%  \\\n",
        "unsecured_hh      606    1495.254125    6544.761100      0      0.0       0.0   \n",
        "secured_hh        614    7311.688925   28965.924001      0      0.0       0.0   \n",
        "totaldebt_hh      606    8903.466997   31058.047402      0      0.0       0.0   \n",
        "savings_hh        512   15335.654297   36347.092324      0      0.0    2000.0   \n",
        "investments_hh    587    8204.751278   28778.229896      0      0.0       0.0   \n",
        "fin_assets_hh     494   23563.787449   53268.600621      0      0.0    3010.5   \n",
        "hswealth_hh       608  166356.004934  189571.567699      0  55750.0  130000.0   \n",
        "totalwealth_hh    490  181790.738776  198296.631824      0  35250.0  135700.0   \n",
        "netwealth_hh      483  173068.641822  191070.108064 -19700  25612.5  126550.0   \n",
        "\n",
        "                      75%      max  \n",
        "unsecured_hh       115.00   132000  \n",
        "secured_hh           0.00   350000  \n",
        "totaldebt_hh      2000.00   350000  \n",
        "savings_hh       14000.00   350000  \n",
        "investments_hh    1000.00   317000  \n",
        "fin_assets_hh    21750.00   440000  \n",
        "hswealth_hh     220000.00  2000000  \n",
        "totalwealth_hh  246890.75  1230000  \n",
        "netwealth_hh    234750.00  1230000  "
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>Migrants</h3>\n",
      "\n",
      "The BHPS only contains 26 individuals who consider themselves as having a national identity other than English, Scottish, Welsh or Irish:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "oindresp.onatidh.value_counts()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 19,
       "text": [
        "-8               14024\n",
        "-7                 982\n",
        "not mentioned      572\n",
        "yes                 26\n",
        "-9                  13\n",
        "dtype: int64"
       ]
      }
     ],
     "prompt_number": 19
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "After restricting the sample to household heads, that number drops to zero:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_hh.onatidh.value_counts()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 20,
       "text": [
        "-8               4744\n",
        "not mentioned      62\n",
        "-9                  3\n",
        "dtype: int64"
       ]
      }
     ],
     "prompt_number": 20
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Furthermore, there are only 39 individuals that came to the UK at some point in their lives:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "oindresp[oindresp.oyr2uk4>0].oyr2uk4.value_counts()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 21,
       "text": [
        "2004    7\n",
        "2005    4\n",
        "1992    3\n",
        "2003    3\n",
        "2002    3\n",
        "1989    2\n",
        "1990    2\n",
        "1995    2\n",
        "1993    2\n",
        "1983    1\n",
        "1986    1\n",
        "1987    1\n",
        "1991    1\n",
        "1997    1\n",
        "1982    1\n",
        "2000    1\n",
        "1946    1\n",
        "1973    1\n",
        "1981    1\n",
        "1985    1\n",
        "dtype: int64"
       ]
      }
     ],
     "prompt_number": 21
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Again, restricting the sample to household heads, the number drops to zero:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_hh[df_hh.oyr2uk4>0].oyr2uk4.value_counts()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 22,
       "text": [
        "Series([], dtype: int64)"
       ]
      }
     ],
     "prompt_number": 22
    }
   ],
   "metadata": {}
  }
 ]
}