@knu2xs
Created May 28, 2021 15:12
Enrichment introspection examples
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Enrich Introspection\n",
"\n",
"Although not sure what to call it, I am showing it being called `Enrich` here. I am building this by updating and adding a _lot_ of functionality to the existing `_GeoEnrichment` object."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from arcgis.gis import GIS\n",
"from arcgis.geoenrichment._ge import _GeoEnrichment as Enrich"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Online Introspection (REST)\n",
"\n",
"First, getting the available countries can be discovered using the `get_countries` method."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>iso2</th>\n",
" <th>iso3</th>\n",
" <th>country_name</th>\n",
" <th>country_id</th>\n",
" <th>alt_name</th>\n",
" <th>continent</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>AL</td>\n",
" <td>ALB</td>\n",
" <td>Albania</td>\n",
" <td>ALB_MBR_2019</td>\n",
" <td>ALBANIA</td>\n",
" <td>Europe</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>DZ</td>\n",
" <td>DZA</td>\n",
" <td>Algeria</td>\n",
" <td>DZA_MBR_2019</td>\n",
" <td>ALGERIA</td>\n",
" <td>Africa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>AD</td>\n",
" <td>AND</td>\n",
" <td>Andorra</td>\n",
" <td>AND_MBR_2019</td>\n",
" <td>ANDORRA</td>\n",
" <td>Europe</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>AO</td>\n",
" <td>AGO</td>\n",
" <td>Angola</td>\n",
" <td>AGO_MBR_2019</td>\n",
" <td>ANGOLA</td>\n",
" <td>Africa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>AR</td>\n",
" <td>ARG</td>\n",
" <td>Argentina</td>\n",
" <td>ARG_MBR_2020</td>\n",
" <td>ARGENTINA</td>\n",
" <td>South America</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>131</th>\n",
" <td>UY</td>\n",
" <td>URY</td>\n",
" <td>Uruguay</td>\n",
" <td>URY_MBR_2020</td>\n",
" <td>URUGUAY</td>\n",
" <td>South America</td>\n",
" </tr>\n",
" <tr>\n",
" <th>132</th>\n",
" <td>UZ</td>\n",
" <td>UZB</td>\n",
" <td>Uzbekistan</td>\n",
" <td>UZB_MBR_2020</td>\n",
" <td>UZBEKISTAN</td>\n",
" <td>Asia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133</th>\n",
" <td>VE</td>\n",
" <td>VEN</td>\n",
" <td>Venezuela</td>\n",
" <td>VEN_MBR_2020</td>\n",
" <td>VENEZUELA, BOLIVARIAN REPUBLIC OF</td>\n",
" <td>South America</td>\n",
" </tr>\n",
" <tr>\n",
" <th>134</th>\n",
" <td>VN</td>\n",
" <td>VNM</td>\n",
" <td>Vietnam</td>\n",
" <td>VNM_MBR_2020</td>\n",
" <td>VIET NAM</td>\n",
" <td>Asia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>135</th>\n",
" <td>ZM</td>\n",
" <td>ZMB</td>\n",
" <td>Zambia</td>\n",
" <td>ZMB_MBR_2019</td>\n",
" <td>ZAMBIA</td>\n",
" <td>Africa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>136 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" iso2 iso3 country_name country_id alt_name \\\n",
"0 AL ALB Albania ALB_MBR_2019 ALBANIA \n",
"1 DZ DZA Algeria DZA_MBR_2019 ALGERIA \n",
"2 AD AND Andorra AND_MBR_2019 ANDORRA \n",
"3 AO AGO Angola AGO_MBR_2019 ANGOLA \n",
"4 AR ARG Argentina ARG_MBR_2020 ARGENTINA \n",
".. ... ... ... ... ... \n",
"131 UY URY Uruguay URY_MBR_2020 URUGUAY \n",
"132 UZ UZB Uzbekistan UZB_MBR_2020 UZBEKISTAN \n",
"133 VE VEN Venezuela VEN_MBR_2020 VENEZUELA, BOLIVARIAN REPUBLIC OF \n",
"134 VN VNM Vietnam VNM_MBR_2020 VIET NAM \n",
"135 ZM ZMB Zambia ZMB_MBR_2019 ZAMBIA \n",
"\n",
" continent \n",
"0 Europe \n",
"1 Africa \n",
"2 Europe \n",
"3 Africa \n",
"4 South America \n",
".. ... \n",
"131 South America \n",
"132 Asia \n",
"133 South America \n",
"134 Asia \n",
"135 Africa \n",
"\n",
"[136 rows x 6 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gis = GIS()\n",
"enrich_gis = Enrich(gis)\n",
"\n",
"enrich_gis.get_countries()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Variable Introspection\n",
"\n",
"Next, `get_variables` can be used with the three letter ISO3 code to retrieve available variables. Incidentally, the `_standardize_country` method can handle ISO2, ISO3, and names while compensating for case as well, but ISO3 is the \"expected\" input."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 18399 entries, 0 to 18398\n",
"Data columns (total 8 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 name 18399 non-null object\n",
" 1 alias 18399 non-null object\n",
" 2 data_collection 18399 non-null object\n",
" 3 enrich_name 18399 non-null object\n",
" 4 enrich_field_name 18399 non-null object\n",
" 5 description 18306 non-null object\n",
" 6 vintage 18297 non-null object\n",
" 7 units 18399 non-null object\n",
"dtypes: object(8)\n",
"memory usage: 1.1+ MB\n",
"None\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>alias</th>\n",
" <th>data_collection</th>\n",
" <th>enrich_name</th>\n",
" <th>enrich_field_name</th>\n",
" <th>description</th>\n",
" <th>vintage</th>\n",
" <th>units</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3140</th>\n",
" <td>X14075_A</td>\n",
" <td>2020 Avg: Value Owed: Non-student Loans-Yr Ago</td>\n",
" <td>Financial_Expenditures_rep</td>\n",
" <td>Financial_Expenditures_rep.X14075_A</td>\n",
" <td>Financial_Expenditures_rep_X14075_A</td>\n",
" <td>2020 Value Owed on Non-student Loans - Yr Ago:...</td>\n",
" <td>2020</td>\n",
" <td>currency</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4193</th>\n",
" <td>MP25054a_I</td>\n",
" <td>2020 Index: Spent $150+ at beauty salons in la...</td>\n",
" <td>HealthPersonalCare</td>\n",
" <td>HealthPersonalCare.MP25054a_I</td>\n",
" <td>HealthPersonalCare_MP25054a_I</td>\n",
" <td>2020 Spent $150+ at beauty salons in last 6 mo...</td>\n",
" <td>2020</td>\n",
" <td>count</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14200</th>\n",
" <td>NFHHR15C10</td>\n",
" <td>2010 Nonfamily HHs: HHr 15-24</td>\n",
" <td>householdsbyageofhouseholder</td>\n",
" <td>householdsbyageofhouseholder.NFHHR15C10</td>\n",
" <td>householdsbyageofhouseholder_NFHHR15C10</td>\n",
" <td>2010 Nonfamily Households w/Householder 15-24 ...</td>\n",
" <td>2010</td>\n",
" <td>count</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name alias \\\n",
"3140 X14075_A 2020 Avg: Value Owed: Non-student Loans-Yr Ago \n",
"4193 MP25054a_I 2020 Index: Spent $150+ at beauty salons in la... \n",
"14200 NFHHR15C10 2010 Nonfamily HHs: HHr 15-24 \n",
"\n",
" data_collection enrich_name \\\n",
"3140 Financial_Expenditures_rep Financial_Expenditures_rep.X14075_A \n",
"4193 HealthPersonalCare HealthPersonalCare.MP25054a_I \n",
"14200 householdsbyageofhouseholder householdsbyageofhouseholder.NFHHR15C10 \n",
"\n",
" enrich_field_name \\\n",
"3140 Financial_Expenditures_rep_X14075_A \n",
"4193 HealthPersonalCare_MP25054a_I \n",
"14200 householdsbyageofhouseholder_NFHHR15C10 \n",
"\n",
" description vintage units \n",
"3140 2020 Value Owed on Non-student Loans - Yr Ago:... 2020 currency \n",
"4193 2020 Spent $150+ at beauty salons in last 6 mo... 2020 count \n",
"14200 2010 Nonfamily Households w/Householder 15-24 ... 2010 count "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vars_gis = enrich_gis.get_variables('USA')\n",
"\n",
"print(vars_gis.info())\n",
"vars_gis.sample(3)"
]
},
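{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the country standardization mentioned above, the following is a minimal sketch of resolving ISO2, ISO3, or name input to ISO3 regardless of case. It is not the actual `_standardize_country` implementation: `standardize_country` is a hypothetical helper, and the three-row `countries` frame is a hand-built stand-in for the `get_countries()` output, assuming its `iso2`, `iso3`, and `country_name` columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hand-built stand-in for get_countries() output (assumed columns: iso2, iso3, country_name)\n",
"countries = pd.DataFrame({\n",
"    'iso2': ['AL', 'US', 'VN'],\n",
"    'iso3': ['ALB', 'USA', 'VNM'],\n",
"    'country_name': ['Albania', 'United States', 'Vietnam'],\n",
"})\n",
"\n",
"\n",
"def standardize_country(value, countries_df):\n",
"    # hypothetical helper: match ISO3, then ISO2, then country name, ignoring case\n",
"    key = value.strip().lower()\n",
"    for col in ('iso3', 'iso2', 'country_name'):\n",
"        match = countries_df[countries_df[col].str.lower() == key]\n",
"        if not match.empty:\n",
"            return match['iso3'].iloc[0]\n",
"    raise ValueError(f'unable to resolve country: {value!r}')\n",
"\n",
"\n",
"standardize_country('us', countries)"
]
},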
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Enrich Input\n",
"\n",
"From here, the enrich method has not yet been updated to be able to ingest this dataframe, but in the interim, for online enrichment, a string of variables can be created using the `name` values separated by a semicolon."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'WHTM45_FY;X5033FY_A;X3027_I;X10002_I;X1145_I'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"enrich_str = ';'.join(vars_gis.sample(5).name)\n",
"\n",
"enrich_str"
]
},
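{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `sample` picks variables at random, the resulting string changes on every run. The following is a minimal sketch of two reproducible alternatives, selecting by `data_collection` or fixing pandas' `random_state`. It uses a small hand-built frame that assumes only the `name` and `data_collection` columns shown above; the collections chosen are purely illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hand-built frame with the name and data_collection columns from get_variables()\n",
"vars_df = pd.DataFrame({\n",
"    'name': ['X14075_A', 'MP25054a_I', 'NFHHR15C10'],\n",
"    'data_collection': ['Financial_Expenditures_rep', 'HealthPersonalCare',\n",
"                        'householdsbyageofhouseholder'],\n",
"})\n",
"\n",
"# option 1: choose variables deliberately, here by data collection\n",
"picked = vars_df[vars_df['data_collection'].isin(['HealthPersonalCare', 'householdsbyageofhouseholder'])]\n",
"enrich_str = ';'.join(picked['name'])\n",
"\n",
"# option 2: keep random sampling, but fix the seed so reruns return the same variables\n",
"sample_str = ';'.join(vars_df.sample(2, random_state=42)['name'])\n",
"\n",
"enrich_str"
]
},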
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Local\n",
"\n",
"Prior to this, there has been no way for local variable introspection, to be able to discover available enrichment variables in Python in a Python environment with ArcGIS Pro, Business Analyst, and local data installed. This, incidentally, is the _most common_ implementation of Business Analyst."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Country Introspection\n",
"\n",
"Now, using identical syntax, the available countries on the machine can be discovered. Since many customers have models built against previous years' datasets, the vintage, or year of the dataset is also reported."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>iso2</th>\n",
" <th>iso3</th>\n",
" <th>country_name</th>\n",
" <th>vintage</th>\n",
" <th>country_id</th>\n",
" <th>data_source_id</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CA</td>\n",
" <td>CAN</td>\n",
" <td>Canada</td>\n",
" <td>2020</td>\n",
" <td>CAN_ESRI_2019</td>\n",
" <td>LOCAL;;CAN_ESRI_2019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>US</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>2019</td>\n",
" <td>USA_ESRI_2019</td>\n",
" <td>LOCAL;;USA_ESRI_2019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>2020</td>\n",
" <td>USA_ESRI_2020</td>\n",
" <td>LOCAL;;USA_ESRI_2020</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" iso2 iso3 country_name vintage country_id data_source_id\n",
"0 CA CAN Canada 2020 CAN_ESRI_2019 LOCAL;;CAN_ESRI_2019\n",
"1 US USA United States 2019 USA_ESRI_2019 LOCAL;;USA_ESRI_2019\n",
"2 US USA United States 2020 USA_ESRI_2020 LOCAL;;USA_ESRI_2020"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"enrich_local = Enrich('local')\n",
"\n",
"enrich_local.get_countries()"
]
},
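{
"cell_type": "markdown",
"metadata": {},
"source": [
"When several vintages of the same country are installed, as with the two United States datasets above, a common need is to pick the latest one. The following is a minimal pandas sketch that rebuilds part of the output above by hand, assuming `vintage` is numeric (it may actually be returned as a string)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# partial rebuild of the local get_countries() output above (vintage assumed numeric)\n",
"local_countries = pd.DataFrame({\n",
"    'iso3': ['CAN', 'USA', 'USA'],\n",
"    'vintage': [2020, 2019, 2020],\n",
"    'data_source_id': ['LOCAL;;CAN_ESRI_2019', 'LOCAL;;USA_ESRI_2019', 'LOCAL;;USA_ESRI_2020'],\n",
"})\n",
"\n",
"# sort by vintage so the newest row per country comes last, then keep that row\n",
"latest = (local_countries.sort_values('vintage')\n",
"          .drop_duplicates('iso3', keep='last')\n",
"          .reset_index(drop=True))\n",
"\n",
"latest"
]
},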
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Enrichment Variable Introspection\n",
"\n",
"Just as before, using the same syntax, available variables can be discovered."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 16874 entries, 0 to 16873\n",
"Data columns (total 5 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 name 16874 non-null object\n",
" 1 alias 16874 non-null object\n",
" 2 data_collection 16874 non-null object\n",
" 3 enrich_name 16874 non-null object\n",
" 4 enrich_field_name 16874 non-null object\n",
"dtypes: object(5)\n",
"memory usage: 659.3+ KB\n",
"None\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>alias</th>\n",
" <th>data_collection</th>\n",
" <th>enrich_name</th>\n",
" <th>enrich_field_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>9887</th>\n",
" <td>X8033FY_X</td>\n",
" <td>2025 Medical Supplies</td>\n",
" <td>HealthPersonalCareCEX</td>\n",
" <td>HealthPersonalCareCEX.X8033FY_X</td>\n",
" <td>HealthPersonalCareCEX_X8033FY_X</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6445</th>\n",
" <td>MP18035a_I</td>\n",
" <td>2020 Index: Acquired home/pers property insur ...</td>\n",
" <td>FinancialInsurance</td>\n",
" <td>FinancialInsurance.MP18035a_I</td>\n",
" <td>FinancialInsurance_MP18035a_I</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3080</th>\n",
" <td>MEDMAGE_FY</td>\n",
" <td>2025 Median Male Age</td>\n",
" <td>5yearincrements</td>\n",
" <td>5yearincrements.MEDMAGE_FY</td>\n",
" <td>F5yearincrements_MEDMAGE_FY</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name alias \\\n",
"9887 X8033FY_X 2025 Medical Supplies \n",
"6445 MP18035a_I 2020 Index: Acquired home/pers property insur ... \n",
"3080 MEDMAGE_FY 2025 Median Male Age \n",
"\n",
" data_collection enrich_name \\\n",
"9887 HealthPersonalCareCEX HealthPersonalCareCEX.X8033FY_X \n",
"6445 FinancialInsurance FinancialInsurance.MP18035a_I \n",
"3080 5yearincrements 5yearincrements.MEDMAGE_FY \n",
"\n",
" enrich_field_name \n",
"9887 HealthPersonalCareCEX_X8033FY_X \n",
"6445 FinancialInsurance_MP18035a_I \n",
"3080 F5yearincrements_MEDMAGE_FY "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"edf = enrich_local.get_variables('USA')\n",
"\n",
"print(edf.info())\n",
"edf.sample(3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Pandas Filtering to Get Variables\n",
"\n",
"This time, taking advantage of the naming convention, we can select only the _current year_ variables, drop any variables inadvertently duplicated due to being included in multiple data collections, and save this to a new data frame."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>alias</th>\n",
" <th>data_collection</th>\n",
" <th>enrich_name</th>\n",
" <th>enrich_field_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CHILD_CY</td>\n",
" <td>2020 Child Population</td>\n",
" <td>AgeDependency</td>\n",
" <td>AgeDependency.CHILD_CY</td>\n",
" <td>AgeDependency_CHILD_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>WORKAGE_CY</td>\n",
" <td>2020 Working-Age Population</td>\n",
" <td>AgeDependency</td>\n",
" <td>AgeDependency.WORKAGE_CY</td>\n",
" <td>AgeDependency_WORKAGE_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>SENIOR_CY</td>\n",
" <td>2020 Senior Population</td>\n",
" <td>AgeDependency</td>\n",
" <td>AgeDependency.SENIOR_CY</td>\n",
" <td>AgeDependency_SENIOR_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>CHLDDEP_CY</td>\n",
" <td>2020 Child Dependency Ratio</td>\n",
" <td>AgeDependency</td>\n",
" <td>AgeDependency.CHLDDEP_CY</td>\n",
" <td>AgeDependency_CHLDDEP_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>AGEDEP_CY</td>\n",
" <td>2020 Age Dependency Ratio</td>\n",
" <td>AgeDependency</td>\n",
" <td>AgeDependency.AGEDEP_CY</td>\n",
" <td>AgeDependency_AGEDEP_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1318</th>\n",
" <td>NHSPASN_CY</td>\n",
" <td>2020 Non-Hispanic Asian Pop</td>\n",
" <td>raceandhispanicorigin</td>\n",
" <td>raceandhispanicorigin.NHSPASN_CY</td>\n",
" <td>raceandhispanicorigin_NHSPASN_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1319</th>\n",
" <td>NHSPPI_CY</td>\n",
" <td>2020 Non-Hispanic Pacific Islander Pop</td>\n",
" <td>raceandhispanicorigin</td>\n",
" <td>raceandhispanicorigin.NHSPPI_CY</td>\n",
" <td>raceandhispanicorigin_NHSPPI_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1320</th>\n",
" <td>NHSPOTH_CY</td>\n",
" <td>2020 Non-Hispanic Other Race Pop</td>\n",
" <td>raceandhispanicorigin</td>\n",
" <td>raceandhispanicorigin.NHSPOTH_CY</td>\n",
" <td>raceandhispanicorigin_NHSPOTH_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1321</th>\n",
" <td>NHSPMLT_CY</td>\n",
" <td>2020 Non-Hispanic Multiple Race Pop</td>\n",
" <td>raceandhispanicorigin</td>\n",
" <td>raceandhispanicorigin.NHSPMLT_CY</td>\n",
" <td>raceandhispanicorigin_NHSPMLT_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1322</th>\n",
" <td>WLTHINDXCY</td>\n",
" <td>2020 Wealth Index</td>\n",
" <td>Wealth</td>\n",
" <td>Wealth.WLTHINDXCY</td>\n",
" <td>Wealth_WLTHINDXCY</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1323 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" name alias \\\n",
"0 CHILD_CY 2020 Child Population \n",
"1 WORKAGE_CY 2020 Working-Age Population \n",
"2 SENIOR_CY 2020 Senior Population \n",
"3 CHLDDEP_CY 2020 Child Dependency Ratio \n",
"4 AGEDEP_CY 2020 Age Dependency Ratio \n",
"... ... ... \n",
"1318 NHSPASN_CY 2020 Non-Hispanic Asian Pop \n",
"1319 NHSPPI_CY 2020 Non-Hispanic Pacific Islander Pop \n",
"1320 NHSPOTH_CY 2020 Non-Hispanic Other Race Pop \n",
"1321 NHSPMLT_CY 2020 Non-Hispanic Multiple Race Pop \n",
"1322 WLTHINDXCY 2020 Wealth Index \n",
"\n",
" data_collection enrich_name \\\n",
"0 AgeDependency AgeDependency.CHILD_CY \n",
"1 AgeDependency AgeDependency.WORKAGE_CY \n",
"2 AgeDependency AgeDependency.SENIOR_CY \n",
"3 AgeDependency AgeDependency.CHLDDEP_CY \n",
"4 AgeDependency AgeDependency.AGEDEP_CY \n",
"... ... ... \n",
"1318 raceandhispanicorigin raceandhispanicorigin.NHSPASN_CY \n",
"1319 raceandhispanicorigin raceandhispanicorigin.NHSPPI_CY \n",
"1320 raceandhispanicorigin raceandhispanicorigin.NHSPOTH_CY \n",
"1321 raceandhispanicorigin raceandhispanicorigin.NHSPMLT_CY \n",
"1322 Wealth Wealth.WLTHINDXCY \n",
"\n",
" enrich_field_name \n",
"0 AgeDependency_CHILD_CY \n",
"1 AgeDependency_WORKAGE_CY \n",
"2 AgeDependency_SENIOR_CY \n",
"3 AgeDependency_CHLDDEP_CY \n",
"4 AgeDependency_AGEDEP_CY \n",
"... ... \n",
"1318 raceandhispanicorigin_NHSPASN_CY \n",
"1319 raceandhispanicorigin_NHSPPI_CY \n",
"1320 raceandhispanicorigin_NHSPOTH_CY \n",
"1321 raceandhispanicorigin_NHSPMLT_CY \n",
"1322 Wealth_WLTHINDXCY \n",
"\n",
"[1323 rows x 5 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cy_df = edf[edf.name.str.endswith('CY')].drop_duplicates('name').reset_index(drop=True)\n",
"\n",
"cy_df"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Enrich Input\n",
"\n",
"Just as before, the input for the Enrich Layer tool requires a semicolon separated string of values, in this case the values from the `enrich_name` column. This can easily be created using a string `join`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AgeDependency.CHILD_CY;AgeDependency.WORKAGE_CY;AgeDependency.SENIOR_CY;AgeDependency.CHLDDEP_CY;AgeDependency.AGEDEP_CY...\n"
]
}
],
"source": [
"enrich_str = ';'.join(cy_df.enrich_name)\n",
"\n",
"print(f'{enrich_str[:120]}...')"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Previous Years\n",
"\n",
"Also, as mentioned before, frequently customers have models developed on pervious years' datasets. Consequently, we can access the same country for previous years' data if installed on the machine by specifying the year."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>alias</th>\n",
" <th>data_collection</th>\n",
" <th>enrich_name</th>\n",
" <th>enrich_field_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CHILD_CY</td>\n",
" <td>2019 Child Population</td>\n",
" <td>AgeDependency</td>\n",
" <td>AgeDependency.CHILD_CY</td>\n",
" <td>AgeDependency_CHILD_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>WORKAGE_CY</td>\n",
" <td>2019 Working-age Population</td>\n",
" <td>AgeDependency</td>\n",
" <td>AgeDependency.WORKAGE_CY</td>\n",
" <td>AgeDependency_WORKAGE_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>SENIOR_CY</td>\n",
" <td>2019 Senior Population</td>\n",
" <td>AgeDependency</td>\n",
" <td>AgeDependency.SENIOR_CY</td>\n",
" <td>AgeDependency_SENIOR_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>CHLDDEP_CY</td>\n",
" <td>2019 Child Dependency Ratio</td>\n",
" <td>AgeDependency</td>\n",
" <td>AgeDependency.CHLDDEP_CY</td>\n",
" <td>AgeDependency_CHLDDEP_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>AGEDEP_CY</td>\n",
" <td>2019 Age Dependency Ratio</td>\n",
" <td>AgeDependency</td>\n",
" <td>AgeDependency.AGEDEP_CY</td>\n",
" <td>AgeDependency_AGEDEP_CY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16906</th>\n",
" <td>MOEMEDYRMV</td>\n",
" <td>MOE Median Year Householder Moved In</td>\n",
" <td>yearmovedin</td>\n",
" <td>yearmovedin.MOEMEDYRMV</td>\n",
" <td>yearmovedin_MOEMEDYRMV</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16907</th>\n",
" <td>RELMEDYRMV</td>\n",
" <td>REL Median Year Householder Moved In</td>\n",
" <td>yearmovedin</td>\n",
" <td>yearmovedin.RELMEDYRMV</td>\n",
" <td>yearmovedin_RELMEDYRMV</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16908</th>\n",
" <td>ACSOWNER</td>\n",
" <td>ACS Owner Households</td>\n",
" <td>yearmovedin</td>\n",
" <td>yearmovedin.ACSOWNER</td>\n",
" <td>yearmovedin_ACSOWNER</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16909</th>\n",
" <td>MOEOWNER</td>\n",
" <td>MOE Owner Households</td>\n",
" <td>yearmovedin</td>\n",
" <td>yearmovedin.MOEOWNER</td>\n",
" <td>yearmovedin_MOEOWNER</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16910</th>\n",
" <td>RELOWNER</td>\n",
" <td>REL Owner Households</td>\n",
" <td>yearmovedin</td>\n",
" <td>yearmovedin.RELOWNER</td>\n",
" <td>yearmovedin_RELOWNER</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>16911 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" name alias data_collection \\\n",
"0 CHILD_CY 2019 Child Population AgeDependency \n",
"1 WORKAGE_CY 2019 Working-age Population AgeDependency \n",
"2 SENIOR_CY 2019 Senior Population AgeDependency \n",
"3 CHLDDEP_CY 2019 Child Dependency Ratio AgeDependency \n",
"4 AGEDEP_CY 2019 Age Dependency Ratio AgeDependency \n",
"... ... ... ... \n",
"16906 MOEMEDYRMV MOE Median Year Householder Moved In yearmovedin \n",
"16907 RELMEDYRMV REL Median Year Householder Moved In yearmovedin \n",
"16908 ACSOWNER ACS Owner Households yearmovedin \n",
"16909 MOEOWNER MOE Owner Households yearmovedin \n",
"16910 RELOWNER REL Owner Households yearmovedin \n",
"\n",
" enrich_name enrich_field_name \n",
"0 AgeDependency.CHILD_CY AgeDependency_CHILD_CY \n",
"1 AgeDependency.WORKAGE_CY AgeDependency_WORKAGE_CY \n",
"2 AgeDependency.SENIOR_CY AgeDependency_SENIOR_CY \n",
"3 AgeDependency.CHLDDEP_CY AgeDependency_CHLDDEP_CY \n",
"4 AgeDependency.AGEDEP_CY AgeDependency_AGEDEP_CY \n",
"... ... ... \n",
"16906 yearmovedin.MOEMEDYRMV yearmovedin_MOEMEDYRMV \n",
"16907 yearmovedin.RELMEDYRMV yearmovedin_RELMEDYRMV \n",
"16908 yearmovedin.ACSOWNER yearmovedin_ACSOWNER \n",
"16909 yearmovedin.MOEOWNER yearmovedin_MOEOWNER \n",
"16910 yearmovedin.RELOWNER yearmovedin_RELOWNER \n",
"\n",
"[16911 rows x 5 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"enrich_local.get_variables('USA', year=2019)"
]
},
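{
"cell_type": "markdown",
"metadata": {},
"source": [
"With two vintages available, a natural follow-up is checking which variables were added or dropped between them. The following is a minimal sketch on small hypothetical excerpts of the 2019 and 2020 outputs. The comparison is on `name`, because `alias` embeds the vintage year (for example, `2019 Child Population` versus `2020 Child Population`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical excerpts of get_variables('USA', year=2019) and get_variables('USA')\n",
"vars_2019 = pd.DataFrame({\n",
"    'name': ['CHILD_CY', 'WORKAGE_CY', 'ACSOWNER'],\n",
"    'alias': ['2019 Child Population', '2019 Working-age Population', 'ACS Owner Households'],\n",
"})\n",
"vars_2020 = pd.DataFrame({\n",
"    'name': ['CHILD_CY', 'WORKAGE_CY', 'WLTHINDXCY'],\n",
"    'alias': ['2020 Child Population', '2020 Working-Age Population', '2020 Wealth Index'],\n",
"})\n",
"\n",
"# compare on name: the alias embeds the vintage year, so it differs between years\n",
"names_2019, names_2020 = set(vars_2019['name']), set(vars_2020['name'])\n",
"only_2019 = sorted(names_2019 - names_2020)\n",
"only_2020 = sorted(names_2020 - names_2019)\n",
"shared = sorted(names_2019 & names_2020)\n",
"\n",
"only_2019, only_2020, shared"
]
},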
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Next Step - Enrich Method\n",
"\n",
"The next step is updating the `_GeoEnrichment.enrich` method to support both local and remote resources, and handle these variable introspection dataframes as input."
]
},
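{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of what accepting these DataFrames could look like, `to_enrich_string` is a hypothetical helper (not part of the `arcgis` package) that normalizes a semicolon-separated string, a list, a Series, or an introspection DataFrame into the semicolon-separated format used above, reading the `enrich_name` column by default."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def to_enrich_string(variables, column='enrich_name'):\n",
"    # hypothetical helper: pass strings through, pull one column from a DataFrame,\n",
"    # and join any other iterable (list, Series) with semicolons\n",
"    if isinstance(variables, str):\n",
"        return variables\n",
"    if isinstance(variables, pd.DataFrame):\n",
"        variables = variables[column]\n",
"    return ';'.join(str(v) for v in variables)\n",
"\n",
"\n",
"demo = pd.DataFrame({\n",
"    'name': ['CHILD_CY', 'WLTHINDXCY'],\n",
"    'enrich_name': ['AgeDependency.CHILD_CY', 'Wealth.WLTHINDXCY'],\n",
"})\n",
"to_enrich_string(demo)"
]
},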
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}