mattwigway/non-household-carpools.ipynb

## non-household-carpools.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Non-household carpools\n",
    "\n",
    "The ACS reports commute mode share and separates people who drive alone from those who carpool. However, it doesn't say how many people carpool with members of their own household vs. members of other households, a distinction that matters in the wake of the COVID-19 pandemic. This notebook estimates the prevalence of non-household carpools using the NHTS.\n",
    "\n",
    "Results are reported in person-trips, not vehicle-trips, because that is the unit of analysis of the NHTS (and of the ACS, for that matter: they ask how you get to work, not how your car gets to work). As a result, the carpool percentages will be somewhat higher than if you were counting vehicle-trips. Each carpool vehicle-trip contributes at least two person-trips, while a solo trip contributes one. The average vehicle in the US carries about 1.6 people, but a multi-household carpool has to carry at least 2!\n",
    "\n",
    "Vehicle-trip shares could instead be computed by filtering the data to trips where `DRIVER == 1`. The answer would then be the percent of vehicle trips _where a household member was the driver_ that included members of other households.\n",
    "\n",
    "I use the NHTS weights so the estimates are representative, and replicate-weight standard error estimation to quantify their uncertainty."
   ]
  },
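  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the `DRIVER == 1` variant described above. To keep it self-contained, it runs on a small made-up DataFrame (`demo_trips` is hypothetical, not NHTS data). Like the cells below, it assumes that `NONHHCNT` counts non-household members on the trip and `WTTRDFIN` is the trip weight. With the real data, the same filter would be applied to `car_trips`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# hypothetical toy data, NOT NHTS records; DRIVER == 1 means the respondent drove\n",
    "demo_trips = pd.DataFrame({\n",
    "    'DRIVER':   [1, 1, 1, 2, 2],\n",
    "    'NONHHCNT': [0, 1, 0, 1, 0],\n",
    "    'WTTRDFIN': [100.0, 50.0, 150.0, 80.0, 20.0],\n",
    "})\n",
    "\n",
    "# keep only trips the respondent drove, so each such vehicle-trip is counted once\n",
    "demo_driver_trips = demo_trips[demo_trips.DRIVER == 1]\n",
    "\n",
    "# weighted share of those vehicle-trips that carried at least one non-household member\n",
    "demo_veh_share = (\n",
    "    demo_driver_trips.loc[demo_driver_trips.NONHHCNT >= 1, 'WTTRDFIN'].sum()\n",
    "    / demo_driver_trips['WTTRDFIN'].sum()\n",
    ")\n",
    "print(f'{demo_veh_share*100:.2f}% of (toy) vehicle-trips include a non-household member')  # 16.67%"
   ]
  },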
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use replicate weights to estimate standard errors\n",
    "def estReplicateSE(estimator, ptEst, year=None, n=None, scaleFactor=None):\n",
    "    '''\n",
    "    Estimate a replicate-weight standard error for a single estimate.\n",
    "\n",
    "    estimator: a function (probably a lambda) that takes an integer i and returns the\n",
    "        estimate computed with the ith replicate weight\n",
    "    ptEst: the point estimate computed with the full-sample weight\n",
    "    year: the NHTS year (2001, 2009, or 2017), which sets n and scaleFactor\n",
    "    n, scaleFactor: the number of replicate weights and the variance scale factor, set\n",
    "        individually when year is not given (useful with other datasets)\n",
    "    '''\n",
    "    if year == 2017:\n",
    "        n = 98\n",
    "        scaleFactor = 6 / 7\n",
    "    elif year == 2009:\n",
    "        n = 100\n",
    "        scaleFactor = 99 / 100\n",
    "    elif year == 2001:\n",
    "        n = 99\n",
    "        scaleFactor = 98 / 99\n",
    "\n",
    "    if n is None or scaleFactor is None:\n",
    "        raise ValueError('pass a supported NHTS year, or both n and scaleFactor')\n",
    "\n",
    "    # sum the squared deviations of each replicate estimate from the point estimate\n",
    "    sqDevs = sum((estimator(i) - ptEst) ** 2 for i in range(1, n + 1))\n",
    "\n",
    "    return np.sqrt(scaleFactor * sqDevs)"
   ]
  },
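  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`estReplicateSE` computes the replicate-weight standard error\n",
    "\n",
    "$$SE = \\sqrt{c \\sum_{r=1}^{n} \\left(\\hat{\\theta}_r - \\hat{\\theta}\\right)^2}$$\n",
    "\n",
    "where $\\hat{\\theta}$ is the point estimate and $\\hat{\\theta}_r$ is the estimate computed with replicate weight $r$. The values $n$ and $c$ are the number of replicates and the scale factor (for 2017, $n = 98$ and $c = 6/7$, as set in the function). Below is a minimal vectorized sketch of the same formula. It assumes the replicate estimates are already collected in an array; `demo_replicate_se` and the numbers are hypothetical, for illustration only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def demo_replicate_se(replicate_ests, pt_est, scale):\n",
    "    '''Replicate-weight SE: sqrt(scale * sum of squared deviations from pt_est).'''\n",
    "    replicate_ests = np.asarray(replicate_ests, dtype=float)\n",
    "    return np.sqrt(scale * np.sum((replicate_ests - pt_est) ** 2))\n",
    "\n",
    "# made-up replicate estimates around a point estimate of 0.20, with the 2017 scale factor\n",
    "demo_array_se = demo_replicate_se([0.20, 0.22, 0.18], 0.20, 6 / 7)\n",
    "print(f'SE = {demo_array_se:.4f}')  # SE = 0.0262"
   ]
  },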
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read the 2017 NHTS trip file and the person-level replicate weights\n",
    "# both are available from https://nhts.ornl.gov/ (download the raw data and the replicate weights)\n",
    "trips = pd.read_csv('~/nhts/2017/trippub.csv')\n",
    "weights = pd.read_csv('~/nhts/2017/perwgt.csv')"
   ]
  },
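  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# attach the replicate weights (WTTRDFIN1 ... WTTRDFIN98) to each trip; drop the full-sample weights, since the trip file already has WTTRDFIN\n",
    "# validate='m:1' checks that each person has at most one row of weights\n",
    "trips = trips.merge(weights.drop(columns=['WTPERFIN', 'WTTRDFIN']), on=['HOUSEID', 'PERSONID'], how='left', validate='m:1')"
   ]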
  },
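  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `validate='m:1'` argument makes pandas check that the weights table has at most one row per (`HOUSEID`, `PERSONID`) pair. Without this check, a duplicated person would silently duplicate that person's trips. Below is a small self-contained illustration on made-up tables (`demo_trips_tbl` and `demo_wts` are hypothetical, not NHTS data)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# made-up trip and person-weight tables; column names mirror the NHTS files\n",
    "demo_trips_tbl = pd.DataFrame({'HOUSEID': [1, 1, 2], 'PERSONID': [1, 1, 1], 'TRPTRANS': [3, 4, 3]})\n",
    "demo_wts = pd.DataFrame({'HOUSEID': [1, 2], 'PERSONID': [1, 1], 'WTTRDFIN1': [10.0, 20.0]})\n",
    "\n",
    "# many trips per person, one weight row per person: the m:1 check passes\n",
    "demo_merged = demo_trips_tbl.merge(demo_wts, on=['HOUSEID', 'PERSONID'], how='left', validate='m:1')\n",
    "print(len(demo_merged))  # 3, one row per trip\n",
    "\n",
    "# duplicating a person in the weights table violates m:1, so pandas raises MergeError\n",
    "demo_dup_wts = pd.concat([demo_wts, demo_wts.iloc[[0]]], ignore_index=True)\n",
    "try:\n",
    "    demo_trips_tbl.merge(demo_dup_wts, on=['HOUSEID', 'PERSONID'], how='left', validate='m:1')\n",
    "    demo_raised = False\n",
    "except pd.errors.MergeError:\n",
    "    demo_raised = True\n",
    "print('MergeError raised:', demo_raised)  # True"
   ]
  },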
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# keep only trips by car, SUV, van, pickup, or rental car\n",
    "car_trips = trips[trips.TRPTRANS.isin([\n",
    "    3, # car\n",
    "    4, # SUV\n",
    "    5, # Van\n",
    "    6, # Pickup\n",
    "    # leaving out Golf cart/Segway, Motorcycle/Moped, RV\n",
    "    18 # Rental car\n",
    "]) & (trips.NONHHCNT >= 0)].copy() # negative NONHHCNT means missing; dropping those removes just 7 trips"
   ]
  },
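  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is a toy illustration of the filter above, on made-up rows (`demo_raw` and `DEMO_CAR_MODES` are hypothetical names). It keeps only the listed mode codes and drops rows with a negative `NONHHCNT`, which the cell treats as missing. The non-car code 1 and the missing code -9 are placeholders chosen for the example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# mode codes kept in the cell above: car, SUV, van, pickup, rental car\n",
    "DEMO_CAR_MODES = [3, 4, 5, 6, 18]\n",
    "\n",
    "# made-up rows: a non-car mode, a car trip with a missing (negative) NONHHCNT, and two valid car trips\n",
    "demo_raw = pd.DataFrame({\n",
    "    'TRPTRANS': [1, 3, 3, 18],\n",
    "    'NONHHCNT': [0, -9, 1, 0],\n",
    "})\n",
    "\n",
    "demo_car = demo_raw[demo_raw.TRPTRANS.isin(DEMO_CAR_MODES) & (demo_raw.NONHHCNT >= 0)]\n",
    "print(demo_car.index.tolist())  # [2, 3]"
   ]
  },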
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19.49% ± 0.49 of car person-trips include a non-household member\n"
     ]
    }
   ],
   "source": [
    "# sum up the weights for trips that had non-household members, and divide by the weights of all car trips\n",
    "pt_est = car_trips.loc[car_trips.NONHHCNT >= 1, 'WTTRDFIN'].sum() / car_trips['WTTRDFIN'].sum()\n",
    "\n",
    "# use the replicate weights to get a standard error, then a 95% margin of error (1.96 * SE)\n",
    "se = estReplicateSE(\n",
    "    lambda i: car_trips.loc[car_trips.NONHHCNT >= 1, f'WTTRDFIN{i}'].sum() / car_trips[f'WTTRDFIN{i}'].sum(),\n",
    "    pt_est,\n",
    "    2017\n",
    ")\n",
    "\n",
    "moe = 1.96 * se\n",
    "\n",
    "print(f'{pt_est*100:.2f}% ± {moe*100:.2f} of car person-trips include a non-household member')"
   ]
  },
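  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above computes the estimate as a ratio of weighted sums, once with `WTTRDFIN` and once per replicate weight, and reports a 95% margin of error in percentage points. Below is a minimal sketch of the same calculation done in one vectorized step over all replicate columns. It runs on made-up data with only two replicate weights and a scale factor of 1 so the arithmetic is easy to check. The names `demo_share_with_se` and `demo_df` are hypothetical. With the 2017 NHTS, the defaults (98 replicates, scale factor 6/7, as in `estReplicateSE`) would apply."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "def demo_share_with_se(df, mask, weight='WTTRDFIN', n_reps=98, scale=6 / 7):\n",
    "    '''Weighted share of rows where mask is True, plus its replicate-weight SE.'''\n",
    "    pt_est = df.loc[mask, weight].sum() / df[weight].sum()\n",
    "    rep_cols = [f'{weight}{i}' for i in range(1, n_reps + 1)]\n",
    "    # one ratio per replicate column: weighted numerator / weighted denominator\n",
    "    rep_ests = df.loc[mask, rep_cols].sum() / df[rep_cols].sum()\n",
    "    se = np.sqrt(scale * ((rep_ests.to_numpy() - pt_est) ** 2).sum())\n",
    "    return pt_est, se\n",
    "\n",
    "# made-up trips with a full-sample weight and two replicate weights\n",
    "demo_df = pd.DataFrame({\n",
    "    'NONHHCNT':  [0, 1, 0, 2],\n",
    "    'WTTRDFIN':  [10.0, 10.0, 10.0, 10.0],\n",
    "    'WTTRDFIN1': [12.0, 8.0, 10.0, 10.0],\n",
    "    'WTTRDFIN2': [8.0, 12.0, 10.0, 10.0],\n",
    "})\n",
    "demo_pt, demo_share_se = demo_share_with_se(demo_df, demo_df.NONHHCNT >= 1, n_reps=2, scale=1.0)\n",
    "print(f'{demo_pt*100:.2f}% ± {1.96*demo_share_se*100:.2f} (toy data)')  # 50.00% ± 13.86"
   ]
  },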
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# make sure that every trip with a non-household member is in fact a carpool (two or more people aboard)\n",
    "assert car_trips.loc[car_trips.NONHHCNT >= 1, 'NUMONTRP'].min() >= 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compare to all carpools\n",
    "\n",
    "How does that compare to the prevalence of carpools overall?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "54.15% ± 0.39 of car person-trips are in carpools\n"
     ]
    }
   ],
   "source": [
    "# weighted share of car person-trips with two or more people aboard (any carpool)\n",
    "pt_est = car_trips.loc[car_trips.NUMONTRP >= 2, 'WTTRDFIN'].sum() / car_trips['WTTRDFIN'].sum()\n",
    "se = estReplicateSE(\n",
    "    lambda i: car_trips.loc[car_trips.NUMONTRP >= 2, f'WTTRDFIN{i}'].sum() / car_trips[f'WTTRDFIN{i}'].sum(),\n",
    "    pt_est,\n",
    "    2017\n",
    ")\n",
    "\n",
    "moe = 1.96 * se\n",
    "\n",
    "print(f'{pt_est*100:.2f}% ± {moe*100:.2f} of car person-trips are in carpools')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}