Skip to content

Instantly share code, notes, and snippets.

@vingkan
Created March 5, 2019 20:38
Show Gist options
  • Save vingkan/25c74b0e1ea87110a740a9c29a901200 to your computer and use it in GitHub Desktop.
Save vingkan/25c74b0e1ea87110a740a9c29a901200 to your computer and use it in GitHub Desktop.
Facility Types Issue
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Facility Types Issue\n",
"\n",
"Cross-referencing the model data with food inspection records from the Chicago data portal suggests that the model train/test data includes many different facility types. Among others, the data appears to include hospitals and schools, which [the paper](https://github.com/Chicago/food-inspections-evaluation/blob/master/REPORTS/forecasting-restaurants-with-critical-violations-in-Chicago.pdf) specifically states should be excluded.\n",
"\n",
"This notebook cross-references `DATA/30_glmnet_data.Rds`, the data file produced by running `CODE/30_glmnet_model.R`, with the public dataset of [food inspection records](https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5) on the Chicago data portal.\n",
"\n",
"- Is there a discrepancy between how food establishments are classified in the data portal and how they are classified in their business license?\n",
"\n",
"There are at least two locations in the code that appear to filter out other types of facilities:\n",
"\n",
"- [`CODE/23_generate_model_dat.R` line 51](https://github.com/Chicago/food-inspections-evaluation/blob/master/CODE/23_generate_model_dat.R#L51)\n",
"- [`CODE/30_glmnet_model.R` line 23](https://github.com/Chicago/food-inspections-evaluation/blob/master/CODE/30_glmnet_model.R#L23)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pandas in /anaconda/lib/python3.6/site-packages\n",
"Requirement already satisfied: pytz>=2011k in /anaconda/lib/python3.6/site-packages (from pandas)\n",
"Requirement already satisfied: numpy>=1.12.0 in /anaconda/lib/python3.6/site-packages (from pandas)\n",
"Requirement already satisfied: python-dateutil>=2.5.0 in /anaconda/lib/python3.6/site-packages (from pandas)\n",
"Requirement already satisfied: six>=1.5 in /anaconda/lib/python3.6/site-packages (from python-dateutil>=2.5.0->pandas)\n",
"\u001b[33mYou are using pip version 9.0.3, however version 19.0.3 is available.\n",
"You should consider upgrading via the 'pip install --upgrade pip' command.\u001b[0m\n",
"Requirement already satisfied: numpy in /anaconda/lib/python3.6/site-packages\n",
"\u001b[33mYou are using pip version 9.0.3, however version 19.0.3 is available.\n",
"You should consider upgrading via the 'pip install --upgrade pip' command.\u001b[0m\n",
"Requirement already satisfied: requests in /anaconda/lib/python3.6/site-packages\n",
"Requirement already satisfied: idna<2.7,>=2.5 in /anaconda/lib/python3.6/site-packages (from requests)\n",
"Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /anaconda/lib/python3.6/site-packages (from requests)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /anaconda/lib/python3.6/site-packages (from requests)\n",
"Requirement already satisfied: urllib3<1.23,>=1.21.1 in /anaconda/lib/python3.6/site-packages (from requests)\n",
"\u001b[33mYou are using pip version 9.0.3, however version 19.0.3 is available.\n",
"You should consider upgrading via the 'pip install --upgrade pip' command.\u001b[0m\n",
"Requirement already satisfied: pyreadr in /anaconda/lib/python3.6/site-packages\n",
"\u001b[33mYou are using pip version 9.0.3, however version 19.0.3 is available.\n",
"You should consider upgrading via the 'pip install --upgrade pip' command.\u001b[0m\n"
]
}
],
"source": [
"# Notebook dependencies\n",
"!pip install pandas\n",
"!pip install numpy\n",
"!pip install requests\n",
"!pip install pyreadr\n",
"import pandas as pd\n",
"import numpy as np\n",
"import requests\n",
"import pyreadr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Download my copy of `30_glmnet_data.Rds`. Substitute with your own version of the file, or follow these instructions from the [original GitHub repository](https://github.com/Chicago/food-inspections-evaluation#file-layout) to reproduce.\n",
"\n",
"```\n",
"# In the directory: food_inspections_evaluation\n",
"$ RScript CODE/00_Startup.R\n",
"$ RScript CODE/30_glmnet_model.R\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"GLMNET_DATA_URL = \"https://raw.githubusercontent.com/vingkan/foodinspectionforecasting/master/DATA/30_glmnet_data.Rds\"\n",
"GLMNET_DOWNLOADED_PATH = \"downloaded_30_glmnet_data.Rds\"\n",
"res = requests.get(GLMNET_DATA_URL)\n",
"with open(GLMNET_DOWNLOADED_PATH, \"wb\") as file:\n",
" file.write(res.content)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GLMNet table contains 18712 records.\n",
"GLMNet table contains 18712 unique inspection IDs.\n",
"GLMNet table contains 9710 unique license IDs.\n"
]
}
],
"source": [
"raw_dat = pyreadr.read_r(GLMNET_DOWNLOADED_PATH)[None]\n",
"print(\"GLMNet table contains {} records.\".format(len(raw_dat)))\n",
"raw_dat[\"inspection_id_str\"] = raw_dat[\"Inspection_ID\"].astype(int).astype(str)\n",
"print(\"GLMNet table contains {} unique inspection IDs.\".format(len(raw_dat[\"inspection_id_str\"].unique())))\n",
"print(\"GLMNet table contains {} unique license IDs.\".format(len(raw_dat[\"License\"].unique())))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Inspection_Date</th>\n",
" <th>License</th>\n",
" <th>Inspection_ID</th>\n",
" <th>Business_ID</th>\n",
" <th>criticalCount</th>\n",
" <th>seriousCount</th>\n",
" <th>minorCount</th>\n",
" <th>Facility_Type</th>\n",
" <th>pass_flag</th>\n",
" <th>fail_flag</th>\n",
" <th>...</th>\n",
" <th>Inspector_Assigned</th>\n",
" <th>precipIntensity</th>\n",
" <th>temperatureMax</th>\n",
" <th>windSpeed</th>\n",
" <th>humidity</th>\n",
" <th>criticalFound</th>\n",
" <th>score</th>\n",
" <th>Test</th>\n",
" <th>Train</th>\n",
" <th>inspection_id_str</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15736</td>\n",
" <td>30790</td>\n",
" <td>269961.0</td>\n",
" <td>30790-20110416</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>2.0</td>\n",
" <td>Grocery_Store</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>green</td>\n",
" <td>0.014587</td>\n",
" <td>53.496667</td>\n",
" <td>13.340000</td>\n",
" <td>0.900000</td>\n",
" <td>0.0</td>\n",
" <td>0.098925</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>269961</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15265</td>\n",
" <td>1475890</td>\n",
" <td>507211.0</td>\n",
" <td>1475890-20110416</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>3.0</td>\n",
" <td>Restaurant</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>blue</td>\n",
" <td>0.001907</td>\n",
" <td>59.046667</td>\n",
" <td>13.016667</td>\n",
" <td>0.550000</td>\n",
" <td>0.0</td>\n",
" <td>0.225081</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>507211</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15265</td>\n",
" <td>1740130</td>\n",
" <td>507212.0</td>\n",
" <td>1740130-20110216</td>\n",
" <td>0.0</td>\n",
" <td>2.0</td>\n",
" <td>6.0</td>\n",
" <td>Restaurant</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>...</td>\n",
" <td>blue</td>\n",
" <td>0.001907</td>\n",
" <td>59.046667</td>\n",
" <td>13.016667</td>\n",
" <td>0.550000</td>\n",
" <td>0.0</td>\n",
" <td>0.225782</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>507212</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15266</td>\n",
" <td>1447363</td>\n",
" <td>507216.0</td>\n",
" <td>1447363-20110216</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>6.0</td>\n",
" <td>Restaurant</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>blue</td>\n",
" <td>0.002737</td>\n",
" <td>56.153333</td>\n",
" <td>10.863333</td>\n",
" <td>0.616667</td>\n",
" <td>0.0</td>\n",
" <td>0.229768</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>507216</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15267</td>\n",
" <td>1679459</td>\n",
" <td>507219.0</td>\n",
" <td>1679459-20100216</td>\n",
" <td>0.0</td>\n",
" <td>2.0</td>\n",
" <td>6.0</td>\n",
" <td>Restaurant</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>...</td>\n",
" <td>blue</td>\n",
" <td>0.009987</td>\n",
" <td>52.730000</td>\n",
" <td>16.266667</td>\n",
" <td>0.690000</td>\n",
" <td>0.0</td>\n",
" <td>0.220462</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>507219</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 42 columns</p>\n",
"</div>"
],
"text/plain": [
" Inspection_Date License Inspection_ID Business_ID criticalCount \\\n",
"0 15736 30790 269961.0 30790-20110416 0.0 \n",
"1 15265 1475890 507211.0 1475890-20110416 0.0 \n",
"2 15265 1740130 507212.0 1740130-20110216 0.0 \n",
"3 15266 1447363 507216.0 1447363-20110216 0.0 \n",
"4 15267 1679459 507219.0 1679459-20100216 0.0 \n",
"\n",
" seriousCount minorCount Facility_Type pass_flag fail_flag ... \\\n",
"0 0.0 2.0 Grocery_Store 1.0 0.0 ... \n",
"1 0.0 3.0 Restaurant 1.0 0.0 ... \n",
"2 2.0 6.0 Restaurant 0.0 1.0 ... \n",
"3 0.0 6.0 Restaurant 1.0 0.0 ... \n",
"4 2.0 6.0 Restaurant 0.0 1.0 ... \n",
"\n",
" Inspector_Assigned precipIntensity temperatureMax windSpeed humidity \\\n",
"0 green 0.014587 53.496667 13.340000 0.900000 \n",
"1 blue 0.001907 59.046667 13.016667 0.550000 \n",
"2 blue 0.001907 59.046667 13.016667 0.550000 \n",
"3 blue 0.002737 56.153333 10.863333 0.616667 \n",
"4 blue 0.009987 52.730000 16.266667 0.690000 \n",
"\n",
" criticalFound score Test Train inspection_id_str \n",
"0 0.0 0.098925 False True 269961 \n",
"1 0.0 0.225081 False True 507211 \n",
"2 0.0 0.225782 False True 507212 \n",
"3 0.0 0.229768 False True 507216 \n",
"4 0.0 0.220462 False True 507219 \n",
"\n",
"[5 rows x 42 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_dat.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"According to the `Facility_Type` column in this data table, there are `1003` facilities with types other than restaurant or grocery store."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Counts of facility type over 18712 inspections:\n",
"\n",
"Restaurant 15115\n",
"Grocery_Store 2594\n",
"Other 1003\n",
"Name: Facility_Type, dtype: int64\n"
]
}
],
"source": [
"facility_type_counts = raw_dat[\"Facility_Type\"].value_counts()\n",
"print(\"Counts of facility type over {} inspections:\".format(facility_type_counts.sum()))\n",
"print()\n",
"print(facility_type_counts)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Retrieve all canvass inspections from the data portal during the time range specified in the paper. This should return `41366` rows."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fetched 41366 rows from data portal inspections. Expected 41366. Match? True\n"
]
}
],
"source": [
"INSPECTIONS_DATA_URL = \"https://data.cityofchicago.org/resource/cwig-ma7x.json\"\n",
"query = \"\"\"\n",
" SELECT\n",
" address AS address,\n",
" aka_name AS aka_name,\n",
" city AS city,\n",
" dba_name AS dba_name,\n",
" facility_type AS facility_type,\n",
" inspection_date AS inspection_date,\n",
" inspection_id AS inspection_id,\n",
" inspection_type AS inspection_type,\n",
" latitude AS latitude,\n",
" license_ AS license_id,\n",
" longitude AS longitude,\n",
" results AS results,\n",
" risk AS risk,\n",
" state AS state,\n",
" violations AS violations,\n",
" zip AS zip\n",
" WHERE\n",
" inspection_type = \"Canvass\"\n",
" AND inspection_date >= \"2011-01-01\"\n",
" AND inspection_date <= \"2014-11-01\"\n",
" LIMIT 100000\n",
"\"\"\"\n",
"\n",
"\n",
"r = requests.get(INSPECTIONS_DATA_URL, params={\"$query\": query})\n",
"rows = r.json()\n",
"expected = 41366 # Expect 41366 rows\n",
"print(\"Fetched {} rows from data portal inspections. Expected {}. Match? {}\".format(len(rows), expected, len(rows) == expected))\n",
"inspecs = pd.DataFrame(rows)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Join the data table with the data portal records using Pandas join."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Merged table contains 18712 records.\n"
]
}
],
"source": [
"merged = raw_dat.set_index(\"inspection_id_str\").join(inspecs.set_index(\"inspection_id\"))\n",
"print(\"Merged table contains {} records.\".format(len(merged)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"According to the `facility_type` column in the data portal records, the train/test data includes facility types such as schools and hospitals, among others.\n",
"\n",
"There also appear to be `11` inspection IDs from the model data that were not returned in the data portal query."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Merged table contains 141 unique facility types:\n",
"\n",
"Restaurant 15080\n",
"Grocery Store 2570\n",
"Bakery 379\n",
"School 148\n",
"Catering 142\n",
"Hospital 47\n",
"BANQUET HALL 16\n",
"Long Term Care 15\n",
"Liquor 14\n",
"GAS STATION 11\n",
"STADIUM 10\n",
"Shelter 10\n",
"CAFETERIA 8\n",
"Wholesale 7\n",
"GROCERY/RESTAURANT 7\n",
"LIVE POULTRY 6\n",
"Special Event 5\n",
"Grocery & Restaurant 4\n",
"BANQUET 4\n",
"Tavern 4\n",
"convenience store 4\n",
"coffee shop 4\n",
"HEALTH/ JUICE BAR 3\n",
"GROCERY STORE/TAQUERIA 3\n",
"GROCERY/BAKERY 3\n",
"cafeteria 3\n",
"fish market 3\n",
"BEFORE AND AFTER SCHOOL PROGRAM 3\n",
"butcher shop 3\n",
"bar 3\n",
" ... \n",
"Commissary 1\n",
"grocery 1\n",
"Mobile Food Dispenser 1\n",
"BAR/GRILL 1\n",
"TEACHING SCHOOL 1\n",
"grocery & restaurant 1\n",
"COFFEE CART 1\n",
"donut shop 1\n",
"PALETERIA 1\n",
"RESTUARANT AND BAR 1\n",
"Deli 1\n",
"Daycare (Under 2 Years) 1\n",
"convenience/drug store 1\n",
"night club 1\n",
"GROCERY STORE/BAKERY 1\n",
"AFTER SCHOOL PROGRAM 1\n",
"theater 1\n",
"day spa 1\n",
"gas station 1\n",
"UNIVERSITY CAFETERIA 1\n",
"ROOM SERVICE 1\n",
"TAVERN/RESTAURANT 1\n",
"COMMISSARY FOR SOFT SERVE ICE CREAM TRUCKS 1\n",
"COFFEE KIOSK 1\n",
"UNUSED STORAGE 1\n",
"JUICE BAR 1\n",
"CHARITY AID KITCHEN 1\n",
"GELATO SHOP 1\n",
"pharmacy/grocery 1\n",
"blockbuster video 1\n",
"Name: facility_type, Length: 140, dtype: int64\n"
]
}
],
"source": [
"facility_types_data_portal = merged[\"facility_type\"]\n",
"print(\"Merged table contains {} unique facility types:\".format(len(facility_types_data_portal.unique())))\n",
"print()\n",
"print(facility_types_data_portal.value_counts())"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Value is not a string: nan\n",
"Value is not a string: nan\n",
"Value is not a string: nan\n",
"Value is not a string: nan\n",
"Value is not a string: nan\n",
"Value is not a string: nan\n",
"Value is not a string: nan\n",
"Value is not a string: nan\n",
"Value is not a string: nan\n",
"Value is not a string: nan\n",
"Value is not a string: nan\n",
"\n",
"Approximately 994 records appear to be facility types other than restaurant/grocery store.\n"
]
}
],
"source": [
"def is_other_facility(val):\n",
" try:\n",
" text = val.lower()\n",
" is_restaurant = \"restaurant\" in text\n",
" is_grocery = \"grocery\" in text\n",
" return not (is_restaurant or is_grocery)\n",
" except:\n",
" print(\"Value is not a string: {}\".format(val))\n",
" return False\n",
" \n",
"other_facilities = list(filter(lambda val: is_other_facility(val), facility_types_data_portal))\n",
"print()\n",
"msg = \"Approximately {} records appear to be facility types other than restaurant/grocery store.\"\n",
"print(msg.format(len(other_facilities)))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check, to be sure there was nothing wrong with the join, find the facility_type for each inspection ID."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NOT FOUND IN QUERY RESULTS: 569761\n",
"NOT FOUND IN QUERY RESULTS: 606447\n",
"NOT FOUND IN QUERY RESULTS: 608301\n",
"NOT FOUND IN QUERY RESULTS: 610285\n",
"NOT FOUND IN QUERY RESULTS: 635049\n",
"NOT FOUND IN QUERY RESULTS: 1092477\n",
"NOT FOUND IN QUERY RESULTS: 1138542\n",
"NOT FOUND IN QUERY RESULTS: 1230083\n",
"NOT FOUND IN QUERY RESULTS: 1236048\n",
"NOT FOUND IN QUERY RESULTS: 1322477\n",
"NOT FOUND IN QUERY RESULTS: 1439578\n",
"\n",
"Query results contain 140 unique facility types.\n",
"\n",
"Restaurant 15080\n",
"Grocery Store 2570\n",
"Bakery 379\n",
"School 148\n",
"Catering 142\n",
"Hospital 47\n",
"BANQUET HALL 16\n",
"Long Term Care 15\n",
"Liquor 14\n",
"GAS STATION 11\n",
"STADIUM 10\n",
"Shelter 10\n",
"CAFETERIA 8\n",
"Wholesale 7\n",
"GROCERY/RESTAURANT 7\n",
"LIVE POULTRY 6\n",
"Special Event 5\n",
"Grocery & Restaurant 4\n",
"BANQUET 4\n",
"Tavern 4\n",
"convenience store 4\n",
"coffee shop 4\n",
"HEALTH/ JUICE BAR 3\n",
"GROCERY STORE/TAQUERIA 3\n",
"GROCERY/BAKERY 3\n",
"cafeteria 3\n",
"fish market 3\n",
"BEFORE AND AFTER SCHOOL PROGRAM 3\n",
"butcher shop 3\n",
"bar 3\n",
" ... \n",
"Commissary 1\n",
"grocery 1\n",
"Mobile Food Dispenser 1\n",
"BAR/GRILL 1\n",
"TEACHING SCHOOL 1\n",
"grocery & restaurant 1\n",
"COFFEE CART 1\n",
"donut shop 1\n",
"PALETERIA 1\n",
"RESTUARANT AND BAR 1\n",
"Deli 1\n",
"Daycare (Under 2 Years) 1\n",
"convenience/drug store 1\n",
"night club 1\n",
"GROCERY STORE/BAKERY 1\n",
"AFTER SCHOOL PROGRAM 1\n",
"theater 1\n",
"day spa 1\n",
"gas station 1\n",
"UNIVERSITY CAFETERIA 1\n",
"ROOM SERVICE 1\n",
"TAVERN/RESTAURANT 1\n",
"COMMISSARY FOR SOFT SERVE ICE CREAM TRUCKS 1\n",
"COFFEE KIOSK 1\n",
"UNUSED STORAGE 1\n",
"JUICE BAR 1\n",
"CHARITY AID KITCHEN 1\n",
"GELATO SHOP 1\n",
"pharmacy/grocery 1\n",
"blockbuster video 1\n",
"Length: 140, dtype: int64\n"
]
}
],
"source": [
"inspection_map = {}\n",
"for row in rows:\n",
" if \"facility_type\" in row:\n",
" inspection_id = row[\"inspection_id\"]\n",
" facility_type = row[\"facility_type\"]\n",
" if inspection_id in inspection_map:\n",
" print(\"DUPLICATE INSPECTION_ID: {}\".format(inspection_id))\n",
" inspection_map[inspection_id] = facility_type\n",
"\n",
"facility_types = []\n",
"for inspection_id in raw_dat[\"inspection_id_str\"].values:\n",
" if inspection_id not in inspection_map:\n",
" print(\"NOT FOUND IN QUERY RESULTS: {}\".format(inspection_id))\n",
" continue\n",
" facility_types.append(inspection_map[inspection_id])\n",
"print()\n",
"print(\"Query results contain {} unique facility types.\".format(len(np.unique(facility_types))))\n",
"print()\n",
"print(pd.Series(facility_types).value_counts())"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Approximately 994 records appear to be facility types other than restaurant/grocery store.\n"
]
}
],
"source": [
"other_facilities_check = list(filter(lambda val: is_other_facility(val), facility_types))\n",
"msg = \"Approximately {} records appear to be facility types other than restaurant/grocery store.\"\n",
"print(msg.format(len(other_facilities_check)))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment