{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tue Mar 26 12:09:06 PDT 2019\r\n"
]
}
],
"source": [
"import numpy as np, matplotlib.pyplot as plt, pandas as pd\n",
"pd.set_option('display.max_rows', 8)\n",
"!date\n",
"\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# I have found that the low birth weight and short gestation (LBWSG) risk factor in GBD has data discrepancies\n",
"\n",
"Not all of the categories we expected appear in the data. Let's explore which ones are missing and whether that varies between countries. If it does, we need to figure out why (for example, do rich countries and poor countries use different categories?)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# gbd_mapping is a python module that captures what we _expect_ to find in the data\n",
"# https://github.com/ihmeuw/gbd_mapping \n",
"# https://gbd-mapping.readthedocs.io/en/latest/\n",
"import gbd_mapping as gbd"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RiskFactor(\n",
"name='low_birth_weight_and_short_gestation',\n",
"kind='risk_factor',\n",
"gbd_id=reiid(339),\n",
"level=3,\n",
"most_detailed=False,\n",
"distribution='ordered_polytomous',\n",
"population_attributable_fraction_calculation_type='categorical',\n",
"restrictions=Restrictions(\n",
"male_only=False,\n",
"female_only=False,\n",
"yll_only=False,\n",
"yld_only=False,\n",
"yll_age_group_id_start=2,\n",
"yll_age_group_id_end=5,\n",
"yld_age_group_id_start=2,\n",
"yld_age_group_id_end=235,\n",
"violated=[exposure_age_restriction_violated, relative_risk_age_restriction_violated, population_attributable_fraction_yll_age_restriction_violated]),\n",
"exposure_exists=True,\n",
"exposure_year_type='binned',\n",
"relative_risk_exists=True,\n",
"relative_risk_in_range=True,\n",
"population_attributable_fraction_yll_exists=True,\n",
"population_attributable_fraction_yll_in_range=True,\n",
"population_attributable_fraction_yld_exists=True,\n",
"population_attributable_fraction_yld_in_range=True,\n",
"affected_causes=[all_causes, communicable_maternal_neonatal_and_nutritional_diseases, diarrheal_diseases, lower_respiratory_infections, upper_respiratory_infections, otitis_media, meningitis, pneumococcal_meningitis, h_influenzae_type_b_meningitis, meningococcal_meningitis, other_meningitis, encephalitis, neonatal_disorders, neonatal_preterm_birth, neonatal_encephalopathy_due_to_birth_asphyxia_and_trauma, neonatal_sepsis_and_other_neonatal_infections, hemolytic_disease_and_other_neonatal_jaundice, other_neonatal_disorders, non_communicable_diseases, other_non_communicable_diseases, sudden_infant_death_syndrome, respiratory_infections_and_tuberculosis, enteric_infections, other_infectious_diseases, maternal_and_neonatal_disorders],\n",
"population_attributable_fraction_of_one_causes=[neonatal_preterm_birth],\n",
"parent=child_and_maternal_malnutrition,\n",
"sub_risk_factors=[short_gestation_for_birth_weight, low_birth_weight_for_gestation],\n",
"categories=Categories(\n",
"cat1='Birth prevalence - [20, 22) wks, [0, 500) g',\n",
"cat2='Birth prevalence - [0, 24) wks, [0, 500) g, interpolated annual results',\n",
"cat3='Birth prevalence - [0, 20) wks, [0, 500) g',\n",
"cat4='Birth prevalence - [24, 26) wks, [0, 500) g, interpolated annual results',\n",
"cat5='Birth prevalence - [26, 28) wks, [0, 500) g, interpolated annual results',\n",
"cat6='Birth prevalence - [20, 22) wks, [500, 1000) g',\n",
"cat7='Birth prevalence - [28, 30) wks, [0, 500) g, interpolated annual results',\n",
"cat8='Birth prevalence - [0, 24) wks, [500, 1000) g, interpolated annual results',\n",
"cat9='Birth prevalence - [0, 20) wks, [500, 1000) g',\n",
"cat10='Birth prevalence - [24, 26) wks, [500, 1000) g, interpolated annual results',\n",
"cat11='Birth prevalence - [26, 28) wks, [500, 1000) g, interpolated annual results',\n",
"cat12='Birth prevalence - [32, 34) wks, [500, 1000) g, interpolated annual results',\n",
"cat13='Birth prevalence - [0, 24) wks, [1000, 1500) g, interpolated annual results',\n",
"cat14='Birth prevalence - [30, 32) wks, [500, 1000) g, interpolated annual results',\n",
"cat15='Birth prevalence - [28, 30) wks, [500, 1000) g, interpolated annual results',\n",
"cat16='Birth prevalence - [24, 26) wks, [1000, 1500) g, interpolated annual results',\n",
"cat17='Birth prevalence - [26, 28) wks, [1000, 1500) g, interpolated annual results',\n",
"cat18='Birth prevalence - [26, 28) wks, [1500, 2000) g, interpolated annual results',\n",
"cat19='Birth prevalence - [34, 36) wks, [1000, 1500) g, interpolated annual results',\n",
"cat20='Birth prevalence - [28, 30) wks, [1500, 2000) g, interpolated annual results',\n",
"cat21='Birth prevalence - [28, 30) wks, [1000, 1500) g, interpolated annual results',\n",
"cat22='Birth prevalence - [32, 34) wks, [1000, 1500) g, interpolated annual results',\n",
"cat23='Birth prevalence - [30, 32) wks, [1000, 1500) g, interpolated annual results',\n",
"cat24='Birth prevalence - [37, 38) wks, [1500, 2000) g, interpolated annual results',\n",
"cat25='Birth prevalence - [36, 37) wks, [1500, 2000) g, interpolated annual results',\n",
"cat26='Birth prevalence - [30, 32) wks, [2000, 2500) g, interpolated annual results',\n",
"cat27='Birth prevalence - [30, 32) wks, [1500, 2000) g, interpolated annual results',\n",
"cat28='Birth prevalence - [34, 36) wks, [1500, 2000) g, interpolated annual results',\n",
"cat29='Birth prevalence - [32, 34) wks, [1500, 2000) g, interpolated annual results',\n",
"cat30='Birth prevalence - [32, 34) wks, [2000, 2500) g, interpolated annual results',\n",
"cat31='Birth prevalence - [40, 42) wks, [2000, 2500) g, interpolated annual results',\n",
"cat32='Birth prevalence - [38, 40) wks, [2000, 2500) g, interpolated annual results',\n",
"cat33='Birth prevalence - [32, 34) wks, [2500, 3000) g, interpolated annual results',\n",
"cat34='Birth prevalence - [34, 36) wks, [2000, 2500) g, interpolated annual results',\n",
"cat35='Birth prevalence - [37, 38) wks, [2000, 2500) g, interpolated annual results',\n",
"cat36='Birth prevalence - [36, 37) wks, [2000, 2500) g, interpolated annual results',\n",
"cat37='Birth prevalence - [34, 36) wks, [2500, 3000) g, interpolated annual results',\n",
"cat38='Birth prevalence - [34, 36) wks, [4000, 4500) g, interpolated annual results',\n",
"cat39='Birth prevalence - [34, 36) wks, [3000, 3500) g, interpolated annual results',\n",
"cat40='Birth prevalence - [36, 37) wks, [2500, 3000) g, interpolated annual results',\n",
"cat41='Birth prevalence - [34, 36) wks, [3500, 4000) g, interpolated annual results',\n",
"cat42='Birth prevalence - [37, 38) wks, [2500, 3000) g, interpolated annual results',\n",
"cat43='Birth prevalence - [40, 42) wks, [2500, 3000) g, interpolated annual results',\n",
"cat44='Birth prevalence - [38, 40) wks, [2500, 3000) g, interpolated annual results',\n",
"cat45='Birth prevalence - [36, 37) wks, [3000, 3500) g, interpolated annual results',\n",
"cat46='Birth prevalence - [36, 37) wks, [4000, 4500) g, interpolated annual results',\n",
"cat47='Birth prevalence - [36, 37) wks, [3500, 4000) g, interpolated annual results',\n",
"cat48='Birth prevalence - [37, 38) wks, [3000, 3500) g, interpolated annual results',\n",
"cat49='Birth prevalence - [37, 38) wks, [4000, 4500) g, interpolated annual results',\n",
"cat50='Birth prevalence - [37, 38) wks, [3500, 4000) g, interpolated annual results',\n",
"cat51='Birth prevalence - [40, 42) wks, [3000, 3500) g, interpolated annual results',\n",
"cat52='Birth prevalence - [38, 40) wks, [3000, 3500) g, interpolated annual results',\n",
"cat53='Birth prevalence - [38, 40) wks, [4000, 4500) g, interpolated annual results',\n",
"cat54='Birth prevalence - [38, 40) wks, [3500, 4000) g, interpolated annual results',\n",
"cat55='Birth prevalence - [40, 42) wks, [3500, 4000) g, interpolated annual results',\n",
"cat56='Birth prevalence - [40, 42) wks, [4000, 4500) g, interpolated annual results',\n",
"cat80='Birth prevalence - [28, 30) wks, [2000, 2500) g',\n",
"cat81='Birth prevalence - [28, 30) wks, [2500, 3000) g',\n",
"cat82='Birth prevalence - [28, 30) wks, [3000, 3500) g',\n",
"cat88='Birth prevalence - [30, 32) wks, [2500, 3000) g',\n",
"cat89='Birth prevalence - [30, 32) wks, [3000, 3500) g',\n",
"cat90='Birth prevalence - [30, 32) wks, [3500, 4000) g',\n",
"cat95='Birth prevalence - [32, 34) wks, [3000, 3500) g',\n",
"cat96='Birth prevalence - [32, 34) wks, [3500, 4000) g',\n",
"cat106='Birth prevalence - [36, 37) wks, [1000, 1500) g',\n",
"cat116='Birth prevalence - [38, 40) wks, [1000, 1500) g',\n",
"cat117='Birth prevalence - [38, 40) wks, [1500, 2000) g',\n",
"cat123='Birth prevalence - [40, 42) wks, [1500, 2000) g'))"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the gbd_mapping module has a risk_factors collection, which includes metadata about\n",
"# the LBWSG risk. It also supports tab-completion, which I often use to search\n",
"# for risk factors without knowing their exact names\n",
"\n",
"gbd.risk_factors.low_birth_weight_and_short_gestation"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Categories(\n",
"cat1='Birth prevalence - [20, 22) wks, [0, 500) g',\n",
"cat2='Birth prevalence - [0, 24) wks, [0, 500) g, interpolated annual results',\n",
"cat3='Birth prevalence - [0, 20) wks, [0, 500) g',\n",
"cat4='Birth prevalence - [24, 26) wks, [0, 500) g, interpolated annual results',\n",
"cat5='Birth prevalence - [26, 28) wks, [0, 500) g, interpolated annual results',\n",
"cat6='Birth prevalence - [20, 22) wks, [500, 1000) g',\n",
"cat7='Birth prevalence - [28, 30) wks, [0, 500) g, interpolated annual results',\n",
"cat8='Birth prevalence - [0, 24) wks, [500, 1000) g, interpolated annual results',\n",
"cat9='Birth prevalence - [0, 20) wks, [500, 1000) g',\n",
"cat10='Birth prevalence - [24, 26) wks, [500, 1000) g, interpolated annual results',\n",
"cat11='Birth prevalence - [26, 28) wks, [500, 1000) g, interpolated annual results',\n",
"cat12='Birth prevalence - [32, 34) wks, [500, 1000) g, interpolated annual results',\n",
"cat13='Birth prevalence - [0, 24) wks, [1000, 1500) g, interpolated annual results',\n",
"cat14='Birth prevalence - [30, 32) wks, [500, 1000) g, interpolated annual results',\n",
"cat15='Birth prevalence - [28, 30) wks, [500, 1000) g, interpolated annual results',\n",
"cat16='Birth prevalence - [24, 26) wks, [1000, 1500) g, interpolated annual results',\n",
"cat17='Birth prevalence - [26, 28) wks, [1000, 1500) g, interpolated annual results',\n",
"cat18='Birth prevalence - [26, 28) wks, [1500, 2000) g, interpolated annual results',\n",
"cat19='Birth prevalence - [34, 36) wks, [1000, 1500) g, interpolated annual results',\n",
"cat20='Birth prevalence - [28, 30) wks, [1500, 2000) g, interpolated annual results',\n",
"cat21='Birth prevalence - [28, 30) wks, [1000, 1500) g, interpolated annual results',\n",
"cat22='Birth prevalence - [32, 34) wks, [1000, 1500) g, interpolated annual results',\n",
"cat23='Birth prevalence - [30, 32) wks, [1000, 1500) g, interpolated annual results',\n",
"cat24='Birth prevalence - [37, 38) wks, [1500, 2000) g, interpolated annual results',\n",
"cat25='Birth prevalence - [36, 37) wks, [1500, 2000) g, interpolated annual results',\n",
"cat26='Birth prevalence - [30, 32) wks, [2000, 2500) g, interpolated annual results',\n",
"cat27='Birth prevalence - [30, 32) wks, [1500, 2000) g, interpolated annual results',\n",
"cat28='Birth prevalence - [34, 36) wks, [1500, 2000) g, interpolated annual results',\n",
"cat29='Birth prevalence - [32, 34) wks, [1500, 2000) g, interpolated annual results',\n",
"cat30='Birth prevalence - [32, 34) wks, [2000, 2500) g, interpolated annual results',\n",
"cat31='Birth prevalence - [40, 42) wks, [2000, 2500) g, interpolated annual results',\n",
"cat32='Birth prevalence - [38, 40) wks, [2000, 2500) g, interpolated annual results',\n",
"cat33='Birth prevalence - [32, 34) wks, [2500, 3000) g, interpolated annual results',\n",
"cat34='Birth prevalence - [34, 36) wks, [2000, 2500) g, interpolated annual results',\n",
"cat35='Birth prevalence - [37, 38) wks, [2000, 2500) g, interpolated annual results',\n",
"cat36='Birth prevalence - [36, 37) wks, [2000, 2500) g, interpolated annual results',\n",
"cat37='Birth prevalence - [34, 36) wks, [2500, 3000) g, interpolated annual results',\n",
"cat38='Birth prevalence - [34, 36) wks, [4000, 4500) g, interpolated annual results',\n",
"cat39='Birth prevalence - [34, 36) wks, [3000, 3500) g, interpolated annual results',\n",
"cat40='Birth prevalence - [36, 37) wks, [2500, 3000) g, interpolated annual results',\n",
"cat41='Birth prevalence - [34, 36) wks, [3500, 4000) g, interpolated annual results',\n",
"cat42='Birth prevalence - [37, 38) wks, [2500, 3000) g, interpolated annual results',\n",
"cat43='Birth prevalence - [40, 42) wks, [2500, 3000) g, interpolated annual results',\n",
"cat44='Birth prevalence - [38, 40) wks, [2500, 3000) g, interpolated annual results',\n",
"cat45='Birth prevalence - [36, 37) wks, [3000, 3500) g, interpolated annual results',\n",
"cat46='Birth prevalence - [36, 37) wks, [4000, 4500) g, interpolated annual results',\n",
"cat47='Birth prevalence - [36, 37) wks, [3500, 4000) g, interpolated annual results',\n",
"cat48='Birth prevalence - [37, 38) wks, [3000, 3500) g, interpolated annual results',\n",
"cat49='Birth prevalence - [37, 38) wks, [4000, 4500) g, interpolated annual results',\n",
"cat50='Birth prevalence - [37, 38) wks, [3500, 4000) g, interpolated annual results',\n",
"cat51='Birth prevalence - [40, 42) wks, [3000, 3500) g, interpolated annual results',\n",
"cat52='Birth prevalence - [38, 40) wks, [3000, 3500) g, interpolated annual results',\n",
"cat53='Birth prevalence - [38, 40) wks, [4000, 4500) g, interpolated annual results',\n",
"cat54='Birth prevalence - [38, 40) wks, [3500, 4000) g, interpolated annual results',\n",
"cat55='Birth prevalence - [40, 42) wks, [3500, 4000) g, interpolated annual results',\n",
"cat56='Birth prevalence - [40, 42) wks, [4000, 4500) g, interpolated annual results',\n",
"cat80='Birth prevalence - [28, 30) wks, [2000, 2500) g',\n",
"cat81='Birth prevalence - [28, 30) wks, [2500, 3000) g',\n",
"cat82='Birth prevalence - [28, 30) wks, [3000, 3500) g',\n",
"cat88='Birth prevalence - [30, 32) wks, [2500, 3000) g',\n",
"cat89='Birth prevalence - [30, 32) wks, [3000, 3500) g',\n",
"cat90='Birth prevalence - [30, 32) wks, [3500, 4000) g',\n",
"cat95='Birth prevalence - [32, 34) wks, [3000, 3500) g',\n",
"cat96='Birth prevalence - [32, 34) wks, [3500, 4000) g',\n",
"cat106='Birth prevalence - [36, 37) wks, [1000, 1500) g',\n",
"cat116='Birth prevalence - [38, 40) wks, [1000, 1500) g',\n",
"cat117='Birth prevalence - [38, 40) wks, [1500, 2000) g',\n",
"cat123='Birth prevalence - [40, 42) wks, [1500, 2000) g')"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the LBWSG risk may have the most categories of any GBD risk, and we are going to\n",
"# see which of these categories appear in the data\n",
"\n",
"categories = gbd.risk_factors.low_birth_weight_and_short_gestation.categories\n",
"categories"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# convert to a plain dict mapping category names (e.g. 'cat1') to their descriptions\n",
"categories = categories.to_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Up to this point, everything should work on the open internet. The following cells, however, access LBWSG data and must be run behind the IHME VPN. The module I use to get the data is open source, but it is useless without access to databases that are currently on the IHME private network.\n",
"\n",
"There is a tutorial on how to pull data with `vivarium_inputs` here:\n",
"https://vivarium-inputs.readthedocs.io/en/latest/tutorials/pulling_data.html"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# vivarium_inputs is a python module that loads and validates data for use in Vivarium\n",
"# https://github.com/ihmeuw/vivarium_inputs\n",
"# https://vivarium-inputs.readthedocs.io\n",
"from vivarium_inputs import get_measure, get_raw_data"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_location_ids...\n",
"get_location_ids()\n",
"_________________________________________________get_location_ids - 1.7s, 0.0min\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_exposure...\n",
"get_exposure(reiid(339), 182)\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_age_group_id...\n",
"get_age_group_id()\n",
"_________________________________________________get_age_group_id - 0.0s, 0.0min\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/share/code/abie/miniconda3/lib/python3.6/site-packages/vivarium_inputs/validation/raw.py:1401: UserWarning: Risk_factor low_birth_weight_and_short_gestation exposure data may violate the following restrictions: age restriction.\n",
" warnings.warn(f'{entity.kind.capitalize()} {entity.name} {measure} data may violate the '\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"_________________________________________________get_exposure - 1178.8s, 19.6min\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_estimation_years...\n",
"get_estimation_years()\n",
"_____________________________________________get_estimation_years - 0.2s, 0.0min\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_location_path_to_global...\n",
"get_location_path_to_global()\n",
"______________________________________get_location_path_to_global - 0.0s, 0.0min\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/share/code/abie/miniconda3/lib/python3.6/site-packages/vivarium_inputs/validation/raw.py:1954: UserWarning: Data was expected to contain all age groups between ids 2 and 235 but was missing the following: {4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 31, 32, 235}.\n",
" warnings.warn(message)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_age_bins...\n",
"get_age_bins()\n",
"_____________________________________________________get_age_bins - 0.1s, 0.0min\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>draw</th>\n",
" <th>location</th>\n",
" <th>sex</th>\n",
" <th>age_group_start</th>\n",
" <th>age_group_end</th>\n",
" <th>year_start</th>\n",
" <th>year_end</th>\n",
" <th>parameter</th>\n",
" <th>value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>Malawi</td>\n",
" <td>Female</td>\n",
" <td>0.0</td>\n",
" <td>0.019178</td>\n",
" <td>1990</td>\n",
" <td>1991</td>\n",
" <td>cat10</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" <td>Malawi</td>\n",
" <td>Female</td>\n",
" <td>0.0</td>\n",
" <td>0.019178</td>\n",
" <td>1990</td>\n",
" <td>1991</td>\n",
" <td>cat106</td>\n",
" <td>0.001315</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>Malawi</td>\n",
" <td>Female</td>\n",
" <td>0.0</td>\n",
" <td>0.019178</td>\n",
" <td>1990</td>\n",
" <td>1991</td>\n",
" <td>cat11</td>\n",
" <td>0.001245</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>Malawi</td>\n",
" <td>Female</td>\n",
" <td>0.0</td>\n",
" <td>0.019178</td>\n",
" <td>1990</td>\n",
" <td>1991</td>\n",
" <td>cat116</td>\n",
" <td>0.002377</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74703996</th>\n",
" <td>999</td>\n",
" <td>Malawi</td>\n",
" <td>Male</td>\n",
" <td>95.0</td>\n",
" <td>125.000000</td>\n",
" <td>2017</td>\n",
" <td>2018</td>\n",
" <td>cat89</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74703997</th>\n",
" <td>999</td>\n",
" <td>Malawi</td>\n",
" <td>Male</td>\n",
" <td>95.0</td>\n",
" <td>125.000000</td>\n",
" <td>2017</td>\n",
" <td>2018</td>\n",
" <td>cat90</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74703998</th>\n",
" <td>999</td>\n",
" <td>Malawi</td>\n",
" <td>Male</td>\n",
" <td>95.0</td>\n",
" <td>125.000000</td>\n",
" <td>2017</td>\n",
" <td>2018</td>\n",
" <td>cat95</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74703999</th>\n",
" <td>999</td>\n",
" <td>Malawi</td>\n",
" <td>Male</td>\n",
" <td>95.0</td>\n",
" <td>125.000000</td>\n",
" <td>2017</td>\n",
" <td>2018</td>\n",
" <td>cat96</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>74704000 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" draw location sex age_group_start age_group_end year_start \\\n",
"0 0 Malawi Female 0.0 0.019178 1990 \n",
"1 0 Malawi Female 0.0 0.019178 1990 \n",
"2 0 Malawi Female 0.0 0.019178 1990 \n",
"3 0 Malawi Female 0.0 0.019178 1990 \n",
"... ... ... ... ... ... ... \n",
"74703996 999 Malawi Male 95.0 125.000000 2017 \n",
"74703997 999 Malawi Male 95.0 125.000000 2017 \n",
"74703998 999 Malawi Male 95.0 125.000000 2017 \n",
"74703999 999 Malawi Male 95.0 125.000000 2017 \n",
"\n",
" year_end parameter value \n",
"0 1991 cat10 0.000000 \n",
"1 1991 cat106 0.001315 \n",
"2 1991 cat11 0.001245 \n",
"3 1991 cat116 0.002377 \n",
"... ... ... ... \n",
"74703996 2018 cat89 0.000000 \n",
"74703997 2018 cat90 0.000000 \n",
"74703998 2018 cat95 0.000000 \n",
"74703999 2018 cat96 0.000000 \n",
"\n",
"[74704000 rows x 9 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# I expect this to fail: get_measure runs a validation step after fetching the raw data,\n",
"# and I expect validation to fail because the categories in the data do not match the expected categories\n",
"get_measure(gbd.risk_factors.low_birth_weight_and_short_gestation, 'exposure', 'Malawi')"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# hmm, it worked anyway. I wonder if that was due to some hack I made"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/share/code/abie/miniconda3/lib/python3.6/site-packages/vivarium_inputs/validation/raw.py:1401: UserWarning: Risk_factor low_birth_weight_and_short_gestation exposure data may violate the following restrictions: age restriction.\n",
" warnings.warn(f'{entity.kind.capitalize()} {entity.name} {measure} data may violate the '\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>rei_id</th>\n",
" <th>modelable_entity_id</th>\n",
" <th>location_id</th>\n",
" <th>year_id</th>\n",
" <th>age_group_id</th>\n",
" <th>sex_id</th>\n",
" <th>parameter</th>\n",
" <th>measure_id</th>\n",
" <th>metric_id</th>\n",
" <th>draw_0</th>\n",
" <th>...</th>\n",
" <th>draw_990</th>\n",
" <th>draw_991</th>\n",
" <th>draw_992</th>\n",
" <th>draw_993</th>\n",
" <th>draw_994</th>\n",
" <th>draw_995</th>\n",
" <th>draw_996</th>\n",
" <th>draw_997</th>\n",
" <th>draw_998</th>\n",
" <th>draw_999</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>339</td>\n",
" <td>10797.0</td>\n",
" <td>182</td>\n",
" <td>1990</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>cat44</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>1.447758e-01</td>\n",
" <td>...</td>\n",
" <td>1.188209e-01</td>\n",
" <td>1.593703e-01</td>\n",
" <td>1.634323e-01</td>\n",
" <td>1.640281e-01</td>\n",
" <td>1.339196e-01</td>\n",
" <td>1.252877e-01</td>\n",
" <td>1.652826e-01</td>\n",
" <td>1.562171e-01</td>\n",
" <td>1.283715e-01</td>\n",
" <td>1.609087e-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>339</td>\n",
" <td>10797.0</td>\n",
" <td>182</td>\n",
" <td>1990</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>cat44</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>1.456578e-01</td>\n",
" <td>...</td>\n",
" <td>1.193614e-01</td>\n",
" <td>1.609060e-01</td>\n",
" <td>1.647490e-01</td>\n",
" <td>1.650582e-01</td>\n",
" <td>1.347376e-01</td>\n",
" <td>1.262138e-01</td>\n",
" <td>1.666289e-01</td>\n",
" <td>1.576168e-01</td>\n",
" <td>1.294122e-01</td>\n",
" <td>1.623555e-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>339</td>\n",
" <td>10797.0</td>\n",
" <td>182</td>\n",
" <td>1990</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>cat44</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>1.523357e-01</td>\n",
" <td>...</td>\n",
" <td>1.533115e-01</td>\n",
" <td>1.582424e-01</td>\n",
" <td>1.578901e-01</td>\n",
" <td>1.571100e-01</td>\n",
" <td>1.499935e-01</td>\n",
" <td>1.178913e-01</td>\n",
" <td>1.580012e-01</td>\n",
" <td>1.492179e-01</td>\n",
" <td>1.618553e-01</td>\n",
" <td>1.462230e-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>339</td>\n",
" <td>10797.0</td>\n",
" <td>182</td>\n",
" <td>1990</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>cat44</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>1.535263e-01</td>\n",
" <td>...</td>\n",
" <td>1.547306e-01</td>\n",
" <td>1.592596e-01</td>\n",
" <td>1.593356e-01</td>\n",
" <td>1.585752e-01</td>\n",
" <td>1.511650e-01</td>\n",
" <td>1.186676e-01</td>\n",
" <td>1.594340e-01</td>\n",
" <td>1.503803e-01</td>\n",
" <td>1.629384e-01</td>\n",
" <td>1.474090e-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>339</td>\n",
" <td>NaN</td>\n",
" <td>182</td>\n",
" <td>2000</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>cat124</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>0.000000e+00</td>\n",
" <td>...</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>5.551115e-16</td>\n",
" <td>0.000000e+00</td>\n",
" <td>1.332268e-15</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>339</td>\n",
" <td>NaN</td>\n",
" <td>182</td>\n",
" <td>2005</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>cat124</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>0.000000e+00</td>\n",
" <td>...</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>3.330669e-16</td>\n",
" <td>9.992007e-16</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>339</td>\n",
" <td>NaN</td>\n",
" <td>182</td>\n",
" <td>2010</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>cat124</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>3.330669e-16</td>\n",
" <td>...</td>\n",
" <td>6.661338e-16</td>\n",
" <td>0.000000e+00</td>\n",
" <td>5.551115e-16</td>\n",
" <td>0.000000e+00</td>\n",
" <td>1.110223e-16</td>\n",
" <td>2.220446e-16</td>\n",
" <td>8.881784e-16</td>\n",
" <td>2.220446e-16</td>\n",
" <td>1.110223e-15</td>\n",
" <td>1.110223e-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>339</td>\n",
" <td>NaN</td>\n",
" <td>182</td>\n",
" <td>2017</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>cat124</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>0.000000e+00</td>\n",
" <td>...</td>\n",
" <td>2.220446e-16</td>\n",
" <td>4.440892e-16</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>8.881784e-16</td>\n",
" <td>0.000000e+00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1392 rows × 1009 columns</p>\n",
"</div>"
],
"text/plain": [
" rei_id modelable_entity_id location_id year_id age_group_id sex_id \\\n",
"0 339 10797.0 182 1990 2 1 \n",
"1 339 10797.0 182 1990 3 1 \n",
"2 339 10797.0 182 1990 2 2 \n",
"3 339 10797.0 182 1990 3 2 \n",
".. ... ... ... ... ... ... \n",
"20 339 NaN 182 2000 3 2 \n",
"21 339 NaN 182 2005 3 2 \n",
"22 339 NaN 182 2010 3 2 \n",
"23 339 NaN 182 2017 3 2 \n",
"\n",
" parameter measure_id metric_id draw_0 ... draw_990 \\\n",
"0 cat44 5 3 1.447758e-01 ... 1.188209e-01 \n",
"1 cat44 5 3 1.456578e-01 ... 1.193614e-01 \n",
"2 cat44 5 3 1.523357e-01 ... 1.533115e-01 \n",
"3 cat44 5 3 1.535263e-01 ... 1.547306e-01 \n",
".. ... ... ... ... ... ... \n",
"20 cat124 5 3 0.000000e+00 ... 0.000000e+00 \n",
"21 cat124 5 3 0.000000e+00 ... 0.000000e+00 \n",
"22 cat124 5 3 3.330669e-16 ... 6.661338e-16 \n",
"23 cat124 5 3 0.000000e+00 ... 2.220446e-16 \n",
"\n",
" draw_991 draw_992 draw_993 draw_994 draw_995 \\\n",
"0 1.593703e-01 1.634323e-01 1.640281e-01 1.339196e-01 1.252877e-01 \n",
"1 1.609060e-01 1.647490e-01 1.650582e-01 1.347376e-01 1.262138e-01 \n",
"2 1.582424e-01 1.578901e-01 1.571100e-01 1.499935e-01 1.178913e-01 \n",
"3 1.592596e-01 1.593356e-01 1.585752e-01 1.511650e-01 1.186676e-01 \n",
".. ... ... ... ... ... \n",
"20 0.000000e+00 0.000000e+00 0.000000e+00 5.551115e-16 0.000000e+00 \n",
"21 0.000000e+00 0.000000e+00 3.330669e-16 9.992007e-16 0.000000e+00 \n",
"22 0.000000e+00 5.551115e-16 0.000000e+00 1.110223e-16 2.220446e-16 \n",
"23 4.440892e-16 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 \n",
"\n",
" draw_996 draw_997 draw_998 draw_999 \n",
"0 1.652826e-01 1.562171e-01 1.283715e-01 1.609087e-01 \n",
"1 1.666289e-01 1.576168e-01 1.294122e-01 1.623555e-01 \n",
"2 1.580012e-01 1.492179e-01 1.618553e-01 1.462230e-01 \n",
"3 1.594340e-01 1.503803e-01 1.629384e-01 1.474090e-01 \n",
".. ... ... ... ... \n",
"20 1.332268e-15 0.000000e+00 0.000000e+00 0.000000e+00 \n",
"21 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 \n",
"22 8.881784e-16 2.220446e-16 1.110223e-15 1.110223e-16 \n",
"23 0.000000e+00 0.000000e+00 8.881784e-16 0.000000e+00 \n",
"\n",
"[1392 rows x 1009 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# I expect this to work, though, and to give us enough information to examine the discrepancy between categories\n",
"exposure = get_raw_data(gbd.risk_factors.low_birth_weight_and_short_gestation, 'exposure', 'Malawi')\n",
"exposure"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['cat44', 'cat11', 'cat38', 'cat45', 'cat21', 'cat96', 'cat30',\n",
" 'cat54', 'cat95', 'cat23', 'cat49', 'cat116', 'cat47', 'cat31',\n",
" 'cat8', 'cat20', 'cat10', 'cat123', 'cat55', 'cat35', 'cat81',\n",
" 'cat25', 'cat14', 'cat56', 'cat51', 'cat50', 'cat33', 'cat19',\n",
" 'cat53', 'cat52', 'cat46', 'cat34', 'cat28', 'cat15', 'cat82',\n",
" 'cat89', 'cat43', 'cat27', 'cat2', 'cat39', 'cat40', 'cat41',\n",
" 'cat106', 'cat32', 'cat22', 'cat37', 'cat26', 'cat29', 'cat48',\n",
" 'cat90', 'cat117', 'cat24', 'cat80', 'cat36', 'cat42', 'cat88',\n",
" 'cat17', 'cat124'], dtype=object)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"exposure.parameter.unique()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'cat124'}"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"observed_cats = set(exposure.parameter)\n",
"expected_cats = set(categories.keys())\n",
"\n",
"# did we observe any categories that were *not* expected?\n",
"observed_cats - expected_cats"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'cat1',\n",
" 'cat12',\n",
" 'cat13',\n",
" 'cat16',\n",
" 'cat18',\n",
" 'cat3',\n",
" 'cat4',\n",
" 'cat5',\n",
" 'cat6',\n",
" 'cat7',\n",
" 'cat9'}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# and what expected categories were unobserved\n",
"expected_cats - observed_cats"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Birth prevalence - [26, 28) wks, [1500, 2000) g, interpolated annual results',\n",
" 'Birth prevalence - [20, 22) wks, [0, 500) g',\n",
" 'Birth prevalence - [0, 20) wks, [0, 500) g',\n",
" 'Birth prevalence - [20, 22) wks, [500, 1000) g',\n",
" 'Birth prevalence - [0, 24) wks, [1000, 1500) g, interpolated annual results',\n",
" 'Birth prevalence - [24, 26) wks, [0, 500) g, interpolated annual results',\n",
" 'Birth prevalence - [28, 30) wks, [0, 500) g, interpolated annual results',\n",
" 'Birth prevalence - [0, 20) wks, [500, 1000) g',\n",
" 'Birth prevalence - [32, 34) wks, [500, 1000) g, interpolated annual results',\n",
" 'Birth prevalence - [24, 26) wks, [1000, 1500) g, interpolated annual results',\n",
" 'Birth prevalence - [26, 28) wks, [0, 500) g, interpolated annual results'],\n",
" dtype=object)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# which are those?\n",
"pd.Series(categories).loc[list(expected_cats - observed_cats)].values"
]
},
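{
"cell_type": "markdown",
"metadata": {},
"source": [
"Judging from the descriptions above, every missing category has a gestational age below 34 weeks and a birth weight below 2000 g. The cell below is a minimal sketch that makes that pattern explicit by parsing each description into numeric bins. It assumes every description contains a `[a, b) wks, [c, d) g` fragment like the ones printed above. `BIN_PATTERN` and `parse_bins` are hypothetical names introduced here, not part of `gbd_mapping`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# hypothetical helper; the pattern is assumed from descriptions such as\n",
"# 'Birth prevalence - [26, 28) wks, [1500, 2000) g, interpolated annual results'\n",
"BIN_PATTERN = re.compile(r'\\[(\\d+), (\\d+)\\) wks, \\[(\\d+), (\\d+)\\) g')\n",
"\n",
"def parse_bins(description):\n",
"    # return (ga_start, ga_end, bw_start, bw_end) as integers (weeks, grams)\n",
"    match = BIN_PATTERN.search(description)\n",
"    if match is None:\n",
"        raise ValueError(f'unrecognized category description: {description}')\n",
"    return tuple(int(x) for x in match.groups())\n",
"\n",
"# expected_cats and observed_cats still hold the Malawi sets computed above\n",
"missing = sorted(expected_cats - observed_cats)\n",
"pd.DataFrame([parse_bins(categories[cat]) for cat in missing], index=missing,\n",
"             columns=['ga_start_wks', 'ga_end_wks', 'bw_start_g', 'bw_end_g'])"
]
},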
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the unexpected exposure is very close to zero\n",
"np.allclose(exposure[exposure.parameter == 'cat124'].filter(like='draw'), 0)"
]
},
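{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three Malawi checks above (unexpected categories, missing categories, and whether the unexpected draws are near zero) can be bundled into one function so they are easy to repeat for other locations. This is a minimal sketch: `compare_categories` is a hypothetical name, and it assumes, as in the Malawi data, that the draw columns are the only columns whose names contain `draw`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def compare_categories(df, expected):\n",
"    # compare the categories present in one exposure table with the expected set\n",
"    observed = set(df.parameter)\n",
"    unexpected = observed - expected\n",
"    missing = expected - observed\n",
"    # draws for the unexpected categories; np.allclose returns True for an empty selection\n",
"    draws = df[df.parameter.isin(unexpected)].filter(like='draw')\n",
"    return pd.Series({'unexpected': unexpected,\n",
"                      'missing': missing,\n",
"                      'unexpected_near_zero': np.allclose(draws, 0)})\n",
"\n",
"compare_categories(exposure, set(categories.keys()))"
]
},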
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_exposure...\n",
"get_exposure(reiid(339), 102)\n",
"___________________________________________________get_exposure - 326.7s, 5.4min\n",
"CPU times: user 2min 45s, sys: 9.5 s, total: 2min 54s\n",
"Wall time: 5min 26s\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_exposure...\n",
"get_exposure(reiid(339), 180)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/share/code/abie/miniconda3/lib/python3.6/site-packages/vivarium_inputs/validation/raw.py:1401: UserWarning: Risk_factor low_birth_weight_and_short_gestation exposure data may violate the following restrictions: age restriction.\n",
" warnings.warn(f'{entity.kind.capitalize()} {entity.name} {measure} data may violate the '\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"___________________________________________________get_exposure - 271.5s, 4.5min\n",
"CPU times: user 2min 45s, sys: 8.8 s, total: 2min 54s\n",
"Wall time: 4min 31s\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_exposure...\n",
"get_exposure(reiid(339), 179)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/share/code/abie/miniconda3/lib/python3.6/site-packages/vivarium_inputs/validation/raw.py:1401: UserWarning: Risk_factor low_birth_weight_and_short_gestation exposure data may violate the following restrictions: age restriction.\n",
" warnings.warn(f'{entity.kind.capitalize()} {entity.name} {measure} data may violate the '\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"___________________________________________________get_exposure - 268.8s, 4.5min\n",
"CPU times: user 2min 45s, sys: 7.55 s, total: 2min 52s\n",
"Wall time: 4min 28s\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_exposure...\n",
"get_exposure(reiid(339), 163)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/share/code/abie/miniconda3/lib/python3.6/site-packages/vivarium_inputs/validation/raw.py:1401: UserWarning: Risk_factor low_birth_weight_and_short_gestation exposure data may violate the following restrictions: age restriction.\n",
" warnings.warn(f'{entity.kind.capitalize()} {entity.name} {measure} data may violate the '\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"___________________________________________________get_exposure - 275.2s, 4.6min\n",
"CPU times: user 2min 45s, sys: 8.89 s, total: 2min 54s\n",
"Wall time: 4min 35s\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_exposure...\n",
"get_exposure(reiid(339), 161)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/share/code/abie/miniconda3/lib/python3.6/site-packages/vivarium_inputs/validation/raw.py:1401: UserWarning: Risk_factor low_birth_weight_and_short_gestation exposure data may violate the following restrictions: age restriction.\n",
" warnings.warn(f'{entity.kind.capitalize()} {entity.name} {measure} data may violate the '\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"___________________________________________________get_exposure - 277.7s, 4.6min\n",
"CPU times: user 2min 45s, sys: 8.43 s, total: 2min 53s\n",
"Wall time: 4min 37s\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_exposure...\n",
"get_exposure(reiid(339), 67)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/share/code/abie/miniconda3/lib/python3.6/site-packages/vivarium_inputs/validation/raw.py:1401: UserWarning: Risk_factor low_birth_weight_and_short_gestation exposure data may violate the following restrictions: age restriction.\n",
" warnings.warn(f'{entity.kind.capitalize()} {entity.name} {measure} data may violate the '\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"___________________________________________________get_exposure - 281.9s, 4.7min\n",
"CPU times: user 2min 44s, sys: 8.45 s, total: 2min 53s\n",
"Wall time: 4min 42s\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_exposure...\n",
"get_exposure(reiid(339), 101)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/share/code/abie/miniconda3/lib/python3.6/site-packages/vivarium_inputs/validation/raw.py:1401: UserWarning: Risk_factor low_birth_weight_and_short_gestation exposure data may violate the following restrictions: age restriction.\n",
" warnings.warn(f'{entity.kind.capitalize()} {entity.name} {measure} data may violate the '\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"___________________________________________________get_exposure - 276.5s, 4.6min\n",
"CPU times: user 2min 45s, sys: 8.4 s, total: 2min 53s\n",
"Wall time: 4min 36s\n",
"________________________________________________________________________________\n",
"[Memory] Calling vivarium_gbd_access.gbd.get_exposure...\n",
"get_exposure(reiid(339), 130)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/share/code/abie/miniconda3/lib/python3.6/site-packages/vivarium_inputs/validation/raw.py:1401: UserWarning: Risk_factor low_birth_weight_and_short_gestation exposure data may violate the following restrictions: age restriction.\n",
" warnings.warn(f'{entity.kind.capitalize()} {entity.name} {measure} data may violate the '\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"___________________________________________________get_exposure - 278.9s, 4.6min\n",
"CPU times: user 2min 44s, sys: 8.43 s, total: 2min 53s\n",
"Wall time: 4min 39s\n"
]
}
],
"source": [
"data = {}\n",
"\n",
"for loc in ['United States', 'Kenya', 'Ethiopia', 'India', 'Bangladesh', 'Japan', 'Canada', 'Mexico']:\n",
" %time data[loc] = get_raw_data(gbd.risk_factors.low_birth_weight_and_short_gestation, 'exposure', loc)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"United States\n",
"did we observe any categories that were *not* expected?\n",
"{'cat124'}\n",
"Kenya\n",
"did we observe any categories that were *not* expected?\n",
"{'cat124'}\n",
"Ethiopia\n",
"did we observe any categories that were *not* expected?\n",
"{'cat124'}\n",
"India\n",
"did we observe any categories that were *not* expected?\n",
"{'cat124'}\n",
"Bangladesh\n",
"did we observe any categories that were *not* expected?\n",
"{'cat124'}\n",
"Japan\n",
"did we observe any categories that were *not* expected?\n",
"{'cat124'}\n",
"Canada\n",
"did we observe any categories that were *not* expected?\n",
"{'cat124'}\n",
"Mexico\n",
"did we observe any categories that were *not* expected?\n",
"{'cat124'}\n"
]
}
],
"source": [
"for loc in data.keys():\n",
"    print(loc)\n",
"\n",
"    # use this location's exposure table, not the Malawi table stored in `exposure`\n",
"    observed_cats = set(data[loc].parameter)\n",
"    expected_cats = set(categories.keys())\n",
"\n",
"    print('did we observe any categories that were *not* expected?')\n",
"    print(observed_cats - expected_cats)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"United States {cat18, cat1, cat3, cat6, cat13, cat4, cat7, c...\n",
"Kenya {cat18, cat1, cat3, cat6, cat13, cat4, cat7, c...\n",
"Ethiopia {cat18, cat1, cat3, cat6, cat13, cat4, cat7, c...\n",
"India {cat18, cat1, cat3, cat6, cat13, cat4, cat7, c...\n",
"Bangladesh {cat18, cat1, cat3, cat6, cat13, cat4, cat7, c...\n",
"Japan {cat18, cat1, cat3, cat6, cat13, cat4, cat7, c...\n",
"Canada {cat18, cat1, cat3, cat6, cat13, cat4, cat7, c...\n",
"Mexico {cat18, cat1, cat3, cat6, cat13, cat4, cat7, c...\n",
"dtype: object"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# and what expected categories were unobserved\n",
"unobserved = {}\n",
"\n",
"for loc in data.keys():\n",
"    # use this location's exposure table, not the Malawi table stored in `exposure`\n",
"    observed_cats = set(data[loc].parameter)\n",
"    expected_cats = set(categories.keys())\n",
"\n",
"    unobserved[loc] = expected_cats - observed_cats\n",
"pd.Series(unobserved)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"United States True\n",
"Kenya True\n",
"Ethiopia True\n",
"India True\n",
"Bangladesh True\n",
"Japan True\n",
"Canada True\n",
"Mexico True\n",
"dtype: bool"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
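"# does every location have the same unobserved categories as Malawi?\n",
"malawi_unobserved = set(categories.keys()) - set(exposure.parameter)\n",
"pd.Series(unobserved).apply(lambda cats: cats == malawi_unobserved)"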
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"for loc in data.keys():\n",
"    # check this location's cat124 draws, not the Malawi table stored in `exposure`\n",
"    df = data[loc]\n",
"    assert np.allclose(df[df.parameter == 'cat124'].filter(like='draw'), 0), \\\n",
"        f'{loc}: the unexpected exposure (cat124) should be very close to zero'"
]
},
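{
"cell_type": "markdown",
"metadata": {},
"source": [
"One more way to support the conclusion below: if the category prevalences for each location, year, age group, and sex already sum to one without the missing categories, then those categories can only have zero exposure. This is a sketch that assumes LBWSG category prevalences are meant to sum to one within each demographic group. `category_totals` and `DEMOGRAPHIC_COLS` are hypothetical names, and the grouping columns are taken from the columns shown in the Malawi output. `modelable_entity_id` is left out of the grouping because it is `NaN` for the `cat124` rows, and `groupby` drops `NaN` keys by default."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hypothetical helper: sum every draw over all categories within a demographic group\n",
"DEMOGRAPHIC_COLS = ['location_id', 'year_id', 'age_group_id', 'sex_id']\n",
"\n",
"def category_totals(df):\n",
"    draw_cols = [c for c in df.columns if str(c).startswith('draw_')]\n",
"    return df.groupby(DEMOGRAPHIC_COLS)[draw_cols].sum()\n",
"\n",
"# Malawi is stored separately in `exposure`; the other locations are in `data`\n",
"for loc, df in [('Malawi', exposure)] + list(data.items()):\n",
"    totals = category_totals(df)\n",
"    print(loc, 'category prevalences sum to one:', np.allclose(totals, 1))"
]
},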
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# In summary\n",
"\n",
"There are 11 expected categories that are not observed in my purposeful sample of locations (Malawi, United States, Kenya, Ethiopia, India, Bangladesh, Japan, Canada, and Mexico): 'cat1', 'cat3', 'cat4', 'cat5', 'cat6', 'cat7', 'cat9', 'cat12', 'cat13', 'cat16', and 'cat18'. There is also one category in the data that is not expected, 'cat124', and its draws are all numerically indistinguishable from zero.\n",
"\n",
"It seems safe to proceed assuming that all 12 of these categories have an exposure level of zero."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}