@drdebian
Created August 20, 2019 09:06
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Explore, segment, and cluster the neighborhoods in the city of Toronto"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this assignment, you will be required to explore, segment, and cluster the neighborhoods in the city of Toronto. However, unlike New York, the neighborhood data is not readily available on the internet. Each data science project is challenging in its own way, so you need to stay agile and practise picking up new libraries and tools quickly as the project demands.\n",
"\n",
"For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto. You will be required to scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas dataframe so that it is in a structured format like the New York dataset.\n",
"\n",
"Once the data is in a structured format, you can replicate the analysis that we did to the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.\n",
"\n",
"Your submission will be a link to your Jupyter Notebook on your GitHub repository."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Toronto neighborhood data\n",
"\n",
"For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.\n",
"\n",
"1. Start by creating a new Notebook for this assignment.\n",
"2. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:\n",
"![](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/7JXaz3NNEeiMwApe4i-fLg_40e690ae0e927abda2d4bde7d94ed133_Screen-Shot-2018-06-18-at-7.17.57-PM.png?expiry=1566000000000&hmac=AN3CJ7qeqs6bod-Dt7oM7fnL3e5Hx_ERYwMV3M1TSyM)\n",
"\n",
"3. To create the above dataframe:\n",
"\n",
"- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood\n",
"- Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.\n",
"- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.\n",
"- If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.\n",
"- Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.\n",
"- In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.\n",
"4. Submit a link to your Notebook on your GitHub repository."
]
},
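{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cleaning rules listed above can be tried out on a tiny, made-up table before touching the real data. The rows and the name `sample` below are hypothetical stand-ins for the Wikipedia table (the scraped table may use different column names), and this is only one possible way to express the rules in pandas. Filling unassigned neighborhoods before combining rows is a choice made for this sketch, so that \"Not assigned\" can never end up inside a joined string."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical sample rows, not the scraped data\n",
"sample = pd.DataFrame({\n",
"    'PostalCode': ['M1A', 'M5A', 'M5A', 'M7A'],\n",
"    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', \"Queen's Park\"],\n",
"    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],\n",
"})\n",
"\n",
"# keep only rows with an assigned borough\n",
"sample = sample[sample['Borough'] != 'Not assigned'].copy()\n",
"\n",
"# an unassigned neighborhood inherits the borough name\n",
"unassigned = sample['Neighborhood'] == 'Not assigned'\n",
"sample.loc[unassigned, 'Neighborhood'] = sample.loc[unassigned, 'Borough']\n",
"\n",
"# one row per postal code, neighborhoods joined with a comma\n",
"sample = sample.groupby(['PostalCode', 'Borough'], as_index=False)['Neighborhood'].agg(', '.join)\n",
"sample"
]
},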
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pull the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's load the necessary libraries first..."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Solving environment: done\n",
"\n",
"\n",
"==> WARNING: A newer version of conda exists. <==\n",
" current version: 4.5.11\n",
" latest version: 4.7.11\n",
"\n",
"Please update conda by running\n",
"\n",
" $ conda update -n base -c defaults conda\n",
"\n",
"\n",
"\n",
"# All requested packages already installed.\n",
"\n"
]
}
],
"source": [
"### libraries\n",
"import requests\n",
"import pandas as pd\n",
"pd.set_option('display.max_columns', 100)\n",
"pd.set_option('display.max_rows', 100)\n",
"\n",
"!conda install -c conda-forge beautifulsoup4 lxml html5lib --yes\n",
"from bs4 import BeautifulSoup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's pull down the data from that Wiki page..."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1A</td>\n",
" <td>Not assigned</td>\n",
" <td>Not assigned</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M2A</td>\n",
" <td>Not assigned</td>\n",
" <td>Not assigned</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M3A</td>\n",
" <td>North York</td>\n",
" <td>Parkwoods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M4A</td>\n",
" <td>North York</td>\n",
" <td>Victoria Village</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M5A</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Harbourfront</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood\n",
"0 M1A Not assigned Not assigned\n",
"1 M2A Not assigned Not assigned\n",
"2 M3A North York Parkwoods\n",
"3 M4A North York Victoria Village\n",
"4 M5A Downtown Toronto Harbourfront"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### data\n",
"wiki_url = \"https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M\"\n",
"\n",
"# request\n",
"myres = requests.get(wiki_url)\n",
"mysoup = BeautifulSoup(myres.content, 'html5lib')\n",
"\n",
"# read table into df\n",
"mytable = mysoup.find_all('table')[0] \n",
"mydf = pd.read_html(str(mytable))[0]\n",
"\n",
"# check it\n",
"mydf.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Process the data\n",
"\n",
"Time for some filtering and postprocessing..."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M3A</td>\n",
" <td>North York</td>\n",
" <td>Parkwoods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M4A</td>\n",
" <td>North York</td>\n",
" <td>Victoria Village</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M5A</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Harbourfront</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>M5A</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Regent Park</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>M6A</td>\n",
" <td>North York</td>\n",
" <td>Lawrence Heights</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood\n",
"2 M3A North York Parkwoods\n",
"3 M4A North York Victoria Village\n",
"4 M5A Downtown Toronto Harbourfront\n",
"5 M5A Downtown Toronto Regent Park\n",
"6 M6A North York Lawrence Heights"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# get rid of the \"Not assigned\" boroughs\n",
"mydf_bor = mydf[~mydf.Borough.isin(['Not assigned'])]\n",
"mydf_bor.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Postcodes appearing multiple times should be combined into one line with the neighbourhoods concatenated..."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood\n",
"0 M1B Scarborough Rouge, Malvern\n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union\n",
"2 M1E Scarborough Guildwood, Morningside, West Hill\n",
"3 M1G Scarborough Woburn\n",
"4 M1H Scarborough Cedarbrae"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# make those postcodes unique\n",
"mydf_combined = mydf_bor.groupby(['Postcode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()\n",
"mydf_combined.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, let's fix those rows where the Neighbourhood is \"Not assigned\" by filling it with the content of the Borough column..."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood\n",
"0 M1B Scarborough Rouge, Malvern\n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union\n",
"2 M1E Scarborough Guildwood, Morningside, West Hill\n",
"3 M1G Scarborough Woburn\n",
"4 M1H Scarborough Cedarbrae"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# where the neighbourhood is \"Not assigned\", fall back to the borough name\n",
"unassigned = mydf_combined['Neighbourhood'] == \"Not assigned\"\n",
"mydf_combined.loc[unassigned, 'Neighbourhood'] = mydf_combined.loc[unassigned, 'Borough']\n",
"mydf_combined.head()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# save the Toronto data to CSV for later use\n",
"mydf_combined.to_csv('toronto_postcodes.csv', index=False)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(103, 3)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mydf_combined.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Enrich the Data\n",
"\n",
"### Getting geospatial coordinates\n",
"\n",
"The geocoding routine described in the task outline proved unreliable, so I proceeded with the provided CSV file of coordinates instead.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postal Code</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>43.784535</td>\n",
" <td>-79.160497</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>43.770992</td>\n",
" <td>-79.216917</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>43.773136</td>\n",
" <td>-79.239476</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postal Code Latitude Longitude\n",
"0 M1B 43.806686 -79.194353\n",
"1 M1C 43.784535 -79.160497\n",
"2 M1E 43.763573 -79.188711\n",
"3 M1G 43.770992 -79.216917\n",
"4 M1H 43.773136 -79.239476"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# read geo coordinates from CSV\n",
"mydf_geo = pd.read_csv('Geospatial_Coordinates.csv')\n",
"mydf_geo.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's add the Latitude and Longitude columns to the existing data frame, matching rows on the postal code."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" <td>43.784535</td>\n",
" <td>-79.160497</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" <td>43.770992</td>\n",
" <td>-79.216917</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" <td>43.773136</td>\n",
" <td>-79.239476</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood Latitude \\\n",
"0 M1B Scarborough Rouge, Malvern 43.806686 \n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union 43.784535 \n",
"2 M1E Scarborough Guildwood, Morningside, West Hill 43.763573 \n",
"3 M1G Scarborough Woburn 43.770992 \n",
"4 M1H Scarborough Cedarbrae 43.773136 \n",
"\n",
" Longitude \n",
"0 -79.194353 \n",
"1 -79.160497 \n",
"2 -79.188711 \n",
"3 -79.216917 \n",
"4 -79.239476 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# add geo columns to the existing data frame, remove redundant Postal Code column\n",
"mydf_post = pd.merge(mydf_combined, mydf_geo, how='left', left_on='Postcode', right_on='Postal Code')\n",
"mydf_post.drop('Postal Code', axis=1, inplace=True)\n",
"mydf_post.head()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(103, 5)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mydf_post.shape"
]
},
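{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the merge above is a left join, a postcode with no match in the coordinates file would silently receive NaN coordinates. A minimal sketch of a sanity check follows. It uses made-up frames (`left_demo` and `right_demo` are hypothetical names, and `M9Z` is an invented postcode) rather than the real data, and relies on the `indicator` option of `pd.merge`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical frames that mimic the column names used above\n",
"left_demo = pd.DataFrame({'Postcode': ['M1B', 'M1C', 'M9Z']})\n",
"right_demo = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],\n",
"                           'Latitude': [43.806686, 43.784535],\n",
"                           'Longitude': [-79.194353, -79.160497]})\n",
"\n",
"# indicator=True adds a _merge column recording where each row came from\n",
"merged_demo = pd.merge(left_demo, right_demo, how='left',\n",
"                       left_on='Postcode', right_on='Postal Code', indicator=True)\n",
"\n",
"# rows tagged 'left_only' found no coordinates in the lookup table\n",
"merged_demo.loc[merged_demo['_merge'] == 'left_only', 'Postcode'].tolist()"
]
},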
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Explore and cluster the neighborhoods in Toronto\n",
"\n",
"In the following, we separate Toronto's neighbourhoods into clusters as a guide for people thinking about moving to or within Toronto. To do so, we take the previously prepared data and enrich it with the venues present in Toronto, using Foursquare as an external data source. We then determine the top 10 venues found in each neighbourhood and use this data to find the clusters. Along the way, we make an informed decision about how many clusters make sense, visualize them using Folium, inspect the clusters, and draw a final conclusion."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's download all the dependencies that we will need."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Solving environment: done\n",
"\n",
"\n",
"==> WARNING: A newer version of conda exists. <==\n",
" current version: 4.5.11\n",
" latest version: 4.7.11\n",
"\n",
"Please update conda by running\n",
"\n",
" $ conda update -n base -c defaults conda\n",
"\n",
"\n",
"\n",
"# All requested packages already installed.\n",
"\n",
"Solving environment: done\n",
"\n",
"\n",
"==> WARNING: A newer version of conda exists. <==\n",
" current version: 4.5.11\n",
" latest version: 4.7.11\n",
"\n",
"Please update conda by running\n",
"\n",
" $ conda update -n base -c defaults conda\n",
"\n",
"\n",
"\n",
"# All requested packages already installed.\n",
"\n",
"Libraries imported.\n"
]
}
],
"source": [
"### libraries\n",
"import numpy as np # library to handle data in a vectorized manner\n",
"\n",
"import json # library to handle JSON files\n",
"\n",
"!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed\n",
"from geopy.geocoders import Nominatim # convert an address into latitude and longitude values\n",
"\n",
"import requests # library to handle requests\n",
"from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe\n",
"\n",
"# Matplotlib and associated plotting modules\n",
"import matplotlib.cm as cm\n",
"import matplotlib.colors as colors\n",
"\n",
"# import k-means from clustering stage\n",
"from sklearn.cluster import KMeans\n",
"\n",
"!conda install -c conda-forge folium --yes \n",
"import folium # map rendering library\n",
"\n",
"print('Libraries imported.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a map of Toronto with neighborhoods superimposed on top.\n",
"\n",
"Since our analysis will focus on Toronto, let's determine the central coordinates using Nominatim."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The geographical coordinates of Toronto City are 43.653963, -79.387207.\n"
]
}
],
"source": [
"address = 'Toronto, ON'\n",
"\n",
"geolocator = Nominatim(user_agent=\"toronto_explorer\")\n",
"location = geolocator.geocode(address)\n",
"latitude = location.latitude\n",
"longitude = location.longitude\n",
"print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To verify that we're on the right track, let's quickly plot a map centered on those coordinates."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"width:100%;\"><div style=\"position:relative;width:100%;height:0;padding-bottom:60%;\"><iframe src=\"data:text/html;charset=utf-8;base64,\" style=\"position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;\" allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe></div></div>"
],
"text/plain": [
"<folium.folium.Map at 0x7f89b7274eb8>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create map of Toronto using latitude and longitude values\n",
"map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)\n",
"\n",
"# add markers to map\n",
"for lat, lng, borough, neighborhood in zip(mydf_post['Latitude'], mydf_post['Longitude'], mydf_post['Borough'], mydf_post['Neighbourhood']):\n",
"    label = '{}, {}'.format(neighborhood, borough)\n",
"    label = folium.Popup(label, parse_html=True)\n",
"    folium.CircleMarker(\n",
"        [lat, lng],\n",
"        radius=9,\n",
"        popup=label,\n",
"        color='blue',\n",
"        fill=True,\n",
"        fill_color='#3287cd',\n",
"        fill_opacity=0.5,\n",
"        parse_html=False).add_to(map_toronto)\n",
"\n",
"map_toronto"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Foursquare data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First off, we need to enter our credentials in order to access the Foursquare API."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"jupyter": {
"source_hidden": true
}
},
"outputs": [],
"source": [
"CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder; never publish real credentials)\n",
"CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)\n",
"VERSION = '20180605' # Foursquare API version\n",
"\n",
"#print('Your credentials:')\n",
"#print('CLIENT_ID: ' + CLIENT_ID)\n",
"#print('CLIENT_SECRET:' + CLIENT_SECRET)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's set some sensible defaults to limit the amount of data we are pulling from Foursquare."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"LIMIT = 100 # limit of number of venues returned by Foursquare API\n",
"radius = 500 # define radius"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make gathering the nearby venues more efficient, let's create a function that pulls the relevant data for every neighbourhood and returns it in a neat data frame."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"def getNearbyVenues(names, latitudes, longitudes, radius=500):\n",
"\n",
"    venues_list = []\n",
"    for name, lat, lng in zip(names, latitudes, longitudes):\n",
"        print(name)\n",
"\n",
"        # create the API request URL\n",
"        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(\n",
"            CLIENT_ID,\n",
"            CLIENT_SECRET,\n",
"            VERSION,\n",
"            lat,\n",
"            lng,\n",
"            radius,\n",
"            LIMIT)\n",
"\n",
"        # make the GET request\n",
"        results = requests.get(url).json()[\"response\"]['groups'][0]['items']\n",
"\n",
"        # return only relevant information for each nearby venue\n",
"        venues_list.append([(\n",
"            name,\n",
"            lat,\n",
"            lng,\n",
"            v['venue']['name'],\n",
"            v['venue']['location']['lat'],\n",
"            v['venue']['location']['lng'],\n",
"            v['venue']['categories'][0]['name']) for v in results])\n",
"\n",
"    # flatten the per-neighbourhood lists into one data frame\n",
"    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])\n",
"    nearby_venues.columns = ['Neighbourhood',\n",
"                             'Neighbourhood Latitude',\n",
"                             'Neighbourhood Longitude',\n",
"                             'Venue',\n",
"                             'Venue Latitude',\n",
"                             'Venue Longitude',\n",
"                             'Venue Category']\n",
"\n",
"    return nearby_venues"
]
},
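{
"cell_type": "markdown",
"metadata": {},
"source": [
"The list comprehension in `getNearbyVenues` assumes that every response contains `groups` and that every venue has at least one category. A hedged sketch of a more forgiving parser follows. `parse_venues` is a hypothetical helper, and `fake_payload` is hand-written to mimic only the keys the function above reads; it is not a real Foursquare response."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def parse_venues(payload):\n",
"    # return (name, lat, lng, category) tuples from an explore-style payload\n",
"    groups = payload.get('response', {}).get('groups', [])\n",
"    items = groups[0].get('items', []) if groups else []\n",
"    rows = []\n",
"    for item in items:\n",
"        venue = item['venue']\n",
"        categories = venue.get('categories') or []\n",
"        # fall back to None when a venue carries no category\n",
"        category = categories[0]['name'] if categories else None\n",
"        rows.append((venue['name'],\n",
"                     venue['location']['lat'],\n",
"                     venue['location']['lng'],\n",
"                     category))\n",
"    return rows\n",
"\n",
"# hand-written stand-in for a response body (invented venues)\n",
"fake_payload = {'response': {'groups': [{'items': [\n",
"    {'venue': {'name': 'Demo Cafe',\n",
"               'location': {'lat': 43.65, 'lng': -79.38},\n",
"               'categories': [{'name': 'Coffee Shop'}]}},\n",
"    {'venue': {'name': 'Uncategorised Spot',\n",
"               'location': {'lat': 43.66, 'lng': -79.39},\n",
"               'categories': []}},\n",
"]}]}}\n",
"\n",
"parse_venues(fake_payload)"
]
},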
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's call the function with our data as parameters."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Rouge, Malvern\n",
"Highland Creek, Rouge Hill, Port Union\n",
"Guildwood, Morningside, West Hill\n",
"Woburn\n",
"Cedarbrae\n",
"Scarborough Village\n",
"East Birchmount Park, Ionview, Kennedy Park\n",
"Clairlea, Golden Mile, Oakridge\n",
"Cliffcrest, Cliffside, Scarborough Village West\n",
"Birch Cliff, Cliffside West\n",
"Dorset Park, Scarborough Town Centre, Wexford Heights\n",
"Maryvale, Wexford\n",
"Agincourt\n",
"Clarks Corners, Sullivan, Tam O'Shanter\n",
"Agincourt North, L'Amoreaux East, Milliken, Steeles East\n",
"L'Amoreaux West\n",
"Upper Rouge\n",
"Hillcrest Village\n",
"Fairview, Henry Farm, Oriole\n",
"Bayview Village\n",
"Silver Hills, York Mills\n",
"Newtonbrook, Willowdale\n",
"Willowdale South\n",
"York Mills West\n",
"Willowdale West\n",
"Parkwoods\n",
"Don Mills North\n",
"Flemingdon Park, Don Mills South\n",
"Bathurst Manor, Downsview North, Wilson Heights\n",
"Northwood Park, York University\n",
"CFB Toronto, Downsview East\n",
"Downsview West\n",
"Downsview Central\n",
"Downsview Northwest\n",
"Victoria Village\n",
"Woodbine Gardens, Parkview Hill\n",
"Woodbine Heights\n",
"The Beaches\n",
"Leaside\n",
"Thorncliffe Park\n",
"East Toronto\n",
"The Danforth West, Riverdale\n",
"The Beaches West, India Bazaar\n",
"Studio District\n",
"Lawrence Park\n",
"Davisville North\n",
"North Toronto West\n",
"Davisville\n",
"Moore Park, Summerhill East\n",
"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West\n",
"Rosedale\n",
"Cabbagetown, St. James Town\n",
"Church and Wellesley\n",
"Harbourfront, Regent Park\n",
"Ryerson, Garden District\n",
"St. James Town\n",
"Berczy Park\n",
"Central Bay Street\n",
"Adelaide, King, Richmond\n",
"Harbourfront East, Toronto Islands, Union Station\n",
"Design Exchange, Toronto Dominion Centre\n",
"Commerce Court, Victoria Hotel\n",
"Bedford Park, Lawrence Manor East\n",
"Roselawn\n",
"Forest Hill North, Forest Hill West\n",
"The Annex, North Midtown, Yorkville\n",
"Harbord, University of Toronto\n",
"Chinatown, Grange Park, Kensington Market\n",
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara\n",
"Stn A PO Boxes 25 The Esplanade\n",
"First Canadian Place, Underground city\n",
"Lawrence Heights, Lawrence Manor\n",
"Glencairn\n",
"Humewood-Cedarvale\n",
"Caledonia-Fairbanks\n",
"Christie\n",
"Dovercourt Village, Dufferin\n",
"Little Portugal, Trinity\n",
"Brockton, Exhibition Place, Parkdale Village\n",
"Downsview, North Park, Upwood Park\n",
"Del Ray, Keelesdale, Mount Dennis, Silverthorn\n",
"The Junction North, Runnymede\n",
"High Park, The Junction South\n",
"Parkdale, Roncesvalles\n",
"Runnymede, Swansea\n",
"Queen's Park\n",
"Canada Post Gateway Processing Centre\n",
"Business Reply Mail Processing Centre 969 Eastern\n",
"Humber Bay Shores, Mimico South, New Toronto\n",
"Alderwood, Long Branch\n",
"The Kingsway, Montgomery Road, Old Mill North\n",
"Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea\n",
"Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor\n",
"Islington Avenue\n",
"Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park\n",
"Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe\n",
"Humber Summit\n",
"Emery, Humberlea\n",
"Weston\n",
"Westmount\n",
"Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips\n",
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown\n",
"Northwest\n"
]
}
],
"source": [
"toronto_venues = getNearbyVenues(names=mydf_post['Neighbourhood'],\n",
" latitudes=mydf_post['Latitude'],\n",
" longitudes=mydf_post['Longitude']\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's inspect how much data we got back."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(2244, 7)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighbourhood</th>\n",
" <th>Neighbourhood Latitude</th>\n",
" <th>Neighbourhood Longitude</th>\n",
" <th>Venue</th>\n",
" <th>Venue Latitude</th>\n",
" <th>Venue Longitude</th>\n",
" <th>Venue Category</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Rouge, Malvern</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" <td>Wendy's</td>\n",
" <td>43.807448</td>\n",
" <td>-79.199056</td>\n",
" <td>Fast Food Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" <td>43.784535</td>\n",
" <td>-79.160497</td>\n",
" <td>Royal Canadian Legion</td>\n",
" <td>43.782533</td>\n",
" <td>-79.163085</td>\n",
" <td>Bar</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" <td>Swiss Chalet Rotisserie &amp; Grill</td>\n",
" <td>43.767697</td>\n",
" <td>-79.189914</td>\n",
" <td>Pizza Place</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" <td>G &amp; G Electronics</td>\n",
" <td>43.765309</td>\n",
" <td>-79.191537</td>\n",
" <td>Electronics Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" <td>Big Bite Burrito</td>\n",
" <td>43.766299</td>\n",
" <td>-79.190720</td>\n",
" <td>Mexican Restaurant</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Neighbourhood Neighbourhood Latitude \\\n",
"0 Rouge, Malvern 43.806686 \n",
"1 Highland Creek, Rouge Hill, Port Union 43.784535 \n",
"2 Guildwood, Morningside, West Hill 43.763573 \n",
"3 Guildwood, Morningside, West Hill 43.763573 \n",
"4 Guildwood, Morningside, West Hill 43.763573 \n",
"\n",
" Neighbourhood Longitude Venue Venue Latitude \\\n",
"0 -79.194353 Wendy's 43.807448 \n",
"1 -79.160497 Royal Canadian Legion 43.782533 \n",
"2 -79.188711 Swiss Chalet Rotisserie & Grill 43.767697 \n",
"3 -79.188711 G & G Electronics 43.765309 \n",
"4 -79.188711 Big Bite Burrito 43.766299 \n",
"\n",
" Venue Longitude Venue Category \n",
"0 -79.199056 Fast Food Restaurant \n",
"1 -79.163085 Bar \n",
"2 -79.189914 Pizza Place \n",
"3 -79.191537 Electronics Store \n",
"4 -79.190720 Mexican Restaurant "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(toronto_venues.shape)\n",
"toronto_venues.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Analyze Neighbourhoods"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the venue data from Foursquare, we are now ready to analyze the neighbourhoods. Since we are interested in how often each venue category occurs in each neighbourhood, we'll first one-hot encode the venue categories and then calculate their relative frequencies per neighbourhood."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# list the distinct neighbourhood names (unique is a method, so it must be called)\n",
"toronto_venues['Neighbourhood'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(2244, 280)\n",
"(2244, 281)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighbourhood</th>\n",
" <th>Accessories Store</th>\n",
" <th>Afghan Restaurant</th>\n",
" <th>Airport</th>\n",
" <th>Airport Food Court</th>\n",
" <th>Airport Gate</th>\n",
" <th>Airport Lounge</th>\n",
" <th>Airport Service</th>\n",
" <th>Airport Terminal</th>\n",
" <th>American Restaurant</th>\n",
" <th>Antique Shop</th>\n",
" <th>Aquarium</th>\n",
" <th>Art Gallery</th>\n",
" <th>Art Museum</th>\n",
" <th>Arts &amp; Crafts Store</th>\n",
" <th>Asian Restaurant</th>\n",
" <th>Athletics &amp; Sports</th>\n",
" <th>Auto Garage</th>\n",
" <th>Auto Workshop</th>\n",
" <th>BBQ Joint</th>\n",
" <th>Baby Store</th>\n",
" <th>Bagel Shop</th>\n",
" <th>Bakery</th>\n",
" <th>Bank</th>\n",
" <th>Bar</th>\n",
" <th>Baseball Field</th>\n",
" <th>Baseball Stadium</th>\n",
" <th>Basketball Court</th>\n",
" <th>Basketball Stadium</th>\n",
" <th>Beach</th>\n",
" <th>Beer Bar</th>\n",
" <th>Beer Store</th>\n",
" <th>Bike Shop</th>\n",
" <th>Bistro</th>\n",
" <th>Boat or Ferry</th>\n",
" <th>Bookstore</th>\n",
" <th>Boutique</th>\n",
" <th>Brazilian Restaurant</th>\n",
" <th>Breakfast Spot</th>\n",
" <th>Brewery</th>\n",
" <th>Bridal Shop</th>\n",
" <th>Bubble Tea Shop</th>\n",
" <th>Building</th>\n",
" <th>Burger Joint</th>\n",
" <th>Burrito Place</th>\n",
" <th>Bus Line</th>\n",
" <th>Bus Station</th>\n",
" <th>Bus Stop</th>\n",
" <th>Butcher</th>\n",
" <th>Cafeteria</th>\n",
" <th>...</th>\n",
" <th>Salon / Barbershop</th>\n",
" <th>Sandwich Place</th>\n",
" <th>Scenic Lookout</th>\n",
" <th>Sculpture Garden</th>\n",
" <th>Seafood Restaurant</th>\n",
" <th>Shoe Store</th>\n",
" <th>Shopping Mall</th>\n",
" <th>Shopping Plaza</th>\n",
" <th>Skate Park</th>\n",
" <th>Skating Rink</th>\n",
" <th>Smoke Shop</th>\n",
" <th>Smoothie Shop</th>\n",
" <th>Snack Place</th>\n",
" <th>Soccer Field</th>\n",
" <th>Soup Place</th>\n",
" <th>Southern / Soul Food Restaurant</th>\n",
" <th>Spa</th>\n",
" <th>Speakeasy</th>\n",
" <th>Sporting Goods Shop</th>\n",
" <th>Sports Bar</th>\n",
" <th>Stadium</th>\n",
" <th>Stationery Store</th>\n",
" <th>Steakhouse</th>\n",
" <th>Strip Club</th>\n",
" <th>Supermarket</th>\n",
" <th>Supplement Shop</th>\n",
" <th>Sushi Restaurant</th>\n",
" <th>Swim School</th>\n",
" <th>Taco Place</th>\n",
" <th>Tailor Shop</th>\n",
" <th>Taiwanese Restaurant</th>\n",
" <th>Tanning Salon</th>\n",
" <th>Tapas Restaurant</th>\n",
" <th>Tea Room</th>\n",
" <th>Thai Restaurant</th>\n",
" <th>Theater</th>\n",
" <th>Theme Restaurant</th>\n",
" <th>Thrift / Vintage Store</th>\n",
" <th>Toy / Game Store</th>\n",
" <th>Trail</th>\n",
" <th>Train Station</th>\n",
" <th>Vegetarian / Vegan Restaurant</th>\n",
" <th>Video Game Store</th>\n",
" <th>Video Store</th>\n",
" <th>Vietnamese Restaurant</th>\n",
" <th>Warehouse Store</th>\n",
" <th>Wine Bar</th>\n",
" <th>Wings Joint</th>\n",
" <th>Women's Store</th>\n",
" <th>Yoga Studio</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Rouge, Malvern</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 281 columns</p>\n",
"</div>"
],
"text/plain": [
" Neighbourhood Accessories Store \\\n",
"0 Rouge, Malvern 0 \n",
"1 Highland Creek, Rouge Hill, Port Union 0 \n",
"2 Guildwood, Morningside, West Hill 0 \n",
"3 Guildwood, Morningside, West Hill 0 \n",
"4 Guildwood, Morningside, West Hill 0 \n",
"\n",
" Afghan Restaurant Airport Airport Food Court Airport Gate \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Airport Lounge Airport Service Airport Terminal American Restaurant \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Antique Shop Aquarium Art Gallery Art Museum Arts & Crafts Store \\\n",
"0 0 0 0 0 0 \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"\n",
" Asian Restaurant Athletics & Sports Auto Garage Auto Workshop \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" BBQ Joint Baby Store Bagel Shop Bakery Bank Bar Baseball Field \\\n",
"0 0 0 0 0 0 0 0 \n",
"1 0 0 0 0 0 1 0 \n",
"2 0 0 0 0 0 0 0 \n",
"3 0 0 0 0 0 0 0 \n",
"4 0 0 0 0 0 0 0 \n",
"\n",
" Baseball Stadium Basketball Court Basketball Stadium Beach Beer Bar \\\n",
"0 0 0 0 0 0 \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"\n",
" Beer Store Bike Shop Bistro Boat or Ferry Bookstore Boutique \\\n",
"0 0 0 0 0 0 0 \n",
"1 0 0 0 0 0 0 \n",
"2 0 0 0 0 0 0 \n",
"3 0 0 0 0 0 0 \n",
"4 0 0 0 0 0 0 \n",
"\n",
" Brazilian Restaurant Breakfast Spot Brewery Bridal Shop \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Bubble Tea Shop Building Burger Joint Burrito Place Bus Line \\\n",
"0 0 0 0 0 0 \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"\n",
" Bus Station Bus Stop Butcher Cafeteria ... Salon / Barbershop \\\n",
"0 0 0 0 0 ... 0 \n",
"1 0 0 0 0 ... 0 \n",
"2 0 0 0 0 ... 0 \n",
"3 0 0 0 0 ... 0 \n",
"4 0 0 0 0 ... 0 \n",
"\n",
" Sandwich Place Scenic Lookout Sculpture Garden Seafood Restaurant \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Shoe Store Shopping Mall Shopping Plaza Skate Park Skating Rink \\\n",
"0 0 0 0 0 0 \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"\n",
" Smoke Shop Smoothie Shop Snack Place Soccer Field Soup Place \\\n",
"0 0 0 0 0 0 \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"\n",
" Southern / Soul Food Restaurant Spa Speakeasy Sporting Goods Shop \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Sports Bar Stadium Stationery Store Steakhouse Strip Club Supermarket \\\n",
"0 0 0 0 0 0 0 \n",
"1 0 0 0 0 0 0 \n",
"2 0 0 0 0 0 0 \n",
"3 0 0 0 0 0 0 \n",
"4 0 0 0 0 0 0 \n",
"\n",
" Supplement Shop Sushi Restaurant Swim School Taco Place Tailor Shop \\\n",
"0 0 0 0 0 0 \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"\n",
" Taiwanese Restaurant Tanning Salon Tapas Restaurant Tea Room \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Thai Restaurant Theater Theme Restaurant Thrift / Vintage Store \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Toy / Game Store Trail Train Station Vegetarian / Vegan Restaurant \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Video Game Store Video Store Vietnamese Restaurant Warehouse Store \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Wine Bar Wings Joint Women's Store Yoga Studio \n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
"[5 rows x 281 columns]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# one-hot encode the venue categories (one indicator column per category)\n",
"toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix=\"\", prefix_sep=\"\")\n",
"print(toronto_onehot.shape)\n",
"\n",
"# add the Neighbourhood column back to the dataframe\n",
"toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood']\n",
"print(toronto_onehot.shape)\n",
"\n",
"# move the Neighbourhood column (currently last) to the first position\n",
"fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])\n",
"toronto_onehot = toronto_onehot[fixed_columns]\n",
"\n",
"toronto_onehot.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's group the rows by neighbourhood to determine the relative frequency of each venue category, which leaves one row per neighbourhood."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(101, 281)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighbourhood</th>\n",
" <th>Accessories Store</th>\n",
" <th>Afghan Restaurant</th>\n",
" <th>Airport</th>\n",
" <th>Airport Food Court</th>\n",
" <th>Airport Gate</th>\n",
" <th>Airport Lounge</th>\n",
" <th>Airport Service</th>\n",
" <th>Airport Terminal</th>\n",
" <th>American Restaurant</th>\n",
" <th>Antique Shop</th>\n",
" <th>Aquarium</th>\n",
" <th>Art Gallery</th>\n",
" <th>Art Museum</th>\n",
" <th>Arts &amp; Crafts Store</th>\n",
" <th>Asian Restaurant</th>\n",
" <th>Athletics &amp; Sports</th>\n",
" <th>Auto Garage</th>\n",
" <th>Auto Workshop</th>\n",
" <th>BBQ Joint</th>\n",
" <th>Baby Store</th>\n",
" <th>Bagel Shop</th>\n",
" <th>Bakery</th>\n",
" <th>Bank</th>\n",
" <th>Bar</th>\n",
" <th>Baseball Field</th>\n",
" <th>Baseball Stadium</th>\n",
" <th>Basketball Court</th>\n",
" <th>Basketball Stadium</th>\n",
" <th>Beach</th>\n",
" <th>Beer Bar</th>\n",
" <th>Beer Store</th>\n",
" <th>Bike Shop</th>\n",
" <th>Bistro</th>\n",
" <th>Boat or Ferry</th>\n",
" <th>Bookstore</th>\n",
" <th>Boutique</th>\n",
" <th>Brazilian Restaurant</th>\n",
" <th>Breakfast Spot</th>\n",
" <th>Brewery</th>\n",
" <th>Bridal Shop</th>\n",
" <th>Bubble Tea Shop</th>\n",
" <th>Building</th>\n",
" <th>Burger Joint</th>\n",
" <th>Burrito Place</th>\n",
" <th>Bus Line</th>\n",
" <th>Bus Station</th>\n",
" <th>Bus Stop</th>\n",
" <th>Butcher</th>\n",
" <th>Cafeteria</th>\n",
" <th>...</th>\n",
" <th>Salon / Barbershop</th>\n",
" <th>Sandwich Place</th>\n",
" <th>Scenic Lookout</th>\n",
" <th>Sculpture Garden</th>\n",
" <th>Seafood Restaurant</th>\n",
" <th>Shoe Store</th>\n",
" <th>Shopping Mall</th>\n",
" <th>Shopping Plaza</th>\n",
" <th>Skate Park</th>\n",
" <th>Skating Rink</th>\n",
" <th>Smoke Shop</th>\n",
" <th>Smoothie Shop</th>\n",
" <th>Snack Place</th>\n",
" <th>Soccer Field</th>\n",
" <th>Soup Place</th>\n",
" <th>Southern / Soul Food Restaurant</th>\n",
" <th>Spa</th>\n",
" <th>Speakeasy</th>\n",
" <th>Sporting Goods Shop</th>\n",
" <th>Sports Bar</th>\n",
" <th>Stadium</th>\n",
" <th>Stationery Store</th>\n",
" <th>Steakhouse</th>\n",
" <th>Strip Club</th>\n",
" <th>Supermarket</th>\n",
" <th>Supplement Shop</th>\n",
" <th>Sushi Restaurant</th>\n",
" <th>Swim School</th>\n",
" <th>Taco Place</th>\n",
" <th>Tailor Shop</th>\n",
" <th>Taiwanese Restaurant</th>\n",
" <th>Tanning Salon</th>\n",
" <th>Tapas Restaurant</th>\n",
" <th>Tea Room</th>\n",
" <th>Thai Restaurant</th>\n",
" <th>Theater</th>\n",
" <th>Theme Restaurant</th>\n",
" <th>Thrift / Vintage Store</th>\n",
" <th>Toy / Game Store</th>\n",
" <th>Trail</th>\n",
" <th>Train Station</th>\n",
" <th>Vegetarian / Vegan Restaurant</th>\n",
" <th>Video Game Store</th>\n",
" <th>Video Store</th>\n",
" <th>Vietnamese Restaurant</th>\n",
" <th>Warehouse Store</th>\n",
" <th>Wine Bar</th>\n",
" <th>Wings Joint</th>\n",
" <th>Women's Store</th>\n",
" <th>Yoga Studio</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Adelaide, King, Richmond</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.03</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" <td>0.03</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.02</td>\n",
" <td>0.0</td>\n",
" <td>0.04</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" <td>0.03</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" <td>0.02</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.01</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.04</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.02</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.04</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Agincourt</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.20</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.00</td>\n",
" <td>0.200000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.200000</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Agincourt North, L'Amoreaux East, Milliken, St...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Albion Gardens, Beaumond Heights, Humbergate, ...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.111111</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.00</td>\n",
" <td>0.111111</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Alderwood, Long Branch</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.00</td>\n",
" <td>0.111111</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.111111</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 281 columns</p>\n",
"</div>"
],
"text/plain": [
" Neighbourhood Accessories Store \\\n",
"0 Adelaide, King, Richmond 0.0 \n",
"1 Agincourt 0.0 \n",
"2 Agincourt North, L'Amoreaux East, Milliken, St... 0.0 \n",
"3 Albion Gardens, Beaumond Heights, Humbergate, ... 0.0 \n",
"4 Alderwood, Long Branch 0.0 \n",
"\n",
" Afghan Restaurant Airport Airport Food Court Airport Gate \\\n",
"0 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 0.0 \n",
"\n",
" Airport Lounge Airport Service Airport Terminal American Restaurant \\\n",
"0 0.0 0.0 0.0 0.03 \n",
"1 0.0 0.0 0.0 0.00 \n",
"2 0.0 0.0 0.0 0.00 \n",
"3 0.0 0.0 0.0 0.00 \n",
"4 0.0 0.0 0.0 0.00 \n",
"\n",
" Antique Shop Aquarium Art Gallery Art Museum Arts & Crafts Store \\\n",
"0 0.0 0.0 0.01 0.01 0.0 \n",
"1 0.0 0.0 0.00 0.00 0.0 \n",
"2 0.0 0.0 0.00 0.00 0.0 \n",
"3 0.0 0.0 0.00 0.00 0.0 \n",
"4 0.0 0.0 0.00 0.00 0.0 \n",
"\n",
" Asian Restaurant Athletics & Sports Auto Garage Auto Workshop \\\n",
"0 0.03 0.0 0.0 0.0 \n",
"1 0.00 0.0 0.0 0.0 \n",
"2 0.00 0.0 0.0 0.0 \n",
"3 0.00 0.0 0.0 0.0 \n",
"4 0.00 0.0 0.0 0.0 \n",
"\n",
" BBQ Joint Baby Store Bagel Shop Bakery Bank Bar Baseball Field \\\n",
"0 0.0 0.0 0.0 0.02 0.0 0.04 0.0 \n",
"1 0.0 0.0 0.0 0.00 0.0 0.00 0.0 \n",
"2 0.0 0.0 0.0 0.00 0.0 0.00 0.0 \n",
"3 0.0 0.0 0.0 0.00 0.0 0.00 0.0 \n",
"4 0.0 0.0 0.0 0.00 0.0 0.00 0.0 \n",
"\n",
" Baseball Stadium Basketball Court Basketball Stadium Beach Beer Bar \\\n",
"0 0.0 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 0.0 0.0 \n",
"\n",
" Beer Store Bike Shop Bistro Boat or Ferry Bookstore Boutique \\\n",
"0 0.000000 0.0 0.0 0.0 0.01 0.0 \n",
"1 0.000000 0.0 0.0 0.0 0.00 0.0 \n",
"2 0.000000 0.0 0.0 0.0 0.00 0.0 \n",
"3 0.111111 0.0 0.0 0.0 0.00 0.0 \n",
"4 0.000000 0.0 0.0 0.0 0.00 0.0 \n",
"\n",
" Brazilian Restaurant Breakfast Spot Brewery Bridal Shop \\\n",
"0 0.01 0.03 0.0 0.0 \n",
"1 0.00 0.20 0.0 0.0 \n",
"2 0.00 0.00 0.0 0.0 \n",
"3 0.00 0.00 0.0 0.0 \n",
"4 0.00 0.00 0.0 0.0 \n",
"\n",
" Bubble Tea Shop Building Burger Joint Burrito Place Bus Line \\\n",
"0 0.0 0.01 0.02 0.01 0.0 \n",
"1 0.0 0.00 0.00 0.00 0.0 \n",
"2 0.0 0.00 0.00 0.00 0.0 \n",
"3 0.0 0.00 0.00 0.00 0.0 \n",
"4 0.0 0.00 0.00 0.00 0.0 \n",
"\n",
" Bus Station Bus Stop Butcher Cafeteria ... Salon / Barbershop \\\n",
"0 0.0 0.0 0.0 0.0 ... 0.01 \n",
"1 0.0 0.0 0.0 0.0 ... 0.00 \n",
"2 0.0 0.0 0.0 0.0 ... 0.00 \n",
"3 0.0 0.0 0.0 0.0 ... 0.00 \n",
"4 0.0 0.0 0.0 0.0 ... 0.00 \n",
"\n",
" Sandwich Place Scenic Lookout Sculpture Garden Seafood Restaurant \\\n",
"0 0.000000 0.0 0.0 0.01 \n",
"1 0.200000 0.0 0.0 0.00 \n",
"2 0.000000 0.0 0.0 0.00 \n",
"3 0.111111 0.0 0.0 0.00 \n",
"4 0.111111 0.0 0.0 0.00 \n",
"\n",
" Shoe Store Shopping Mall Shopping Plaza Skate Park Skating Rink \\\n",
"0 0.0 0.0 0.0 0.0 0.000000 \n",
"1 0.0 0.0 0.0 0.0 0.200000 \n",
"2 0.0 0.0 0.0 0.0 0.000000 \n",
"3 0.0 0.0 0.0 0.0 0.000000 \n",
"4 0.0 0.0 0.0 0.0 0.111111 \n",
"\n",
" Smoke Shop Smoothie Shop Snack Place Soccer Field Soup Place \\\n",
"0 0.01 0.0 0.0 0.0 0.0 \n",
"1 0.00 0.0 0.0 0.0 0.0 \n",
"2 0.00 0.0 0.0 0.0 0.0 \n",
"3 0.00 0.0 0.0 0.0 0.0 \n",
"4 0.00 0.0 0.0 0.0 0.0 \n",
"\n",
" Southern / Soul Food Restaurant Spa Speakeasy Sporting Goods Shop \\\n",
"0 0.0 0.0 0.01 0.0 \n",
"1 0.0 0.0 0.00 0.0 \n",
"2 0.0 0.0 0.00 0.0 \n",
"3 0.0 0.0 0.00 0.0 \n",
"4 0.0 0.0 0.00 0.0 \n",
"\n",
" Sports Bar Stadium Stationery Store Steakhouse Strip Club Supermarket \\\n",
"0 0.0 0.0 0.0 0.04 0.0 0.0 \n",
"1 0.0 0.0 0.0 0.00 0.0 0.0 \n",
"2 0.0 0.0 0.0 0.00 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.00 0.0 0.0 \n",
"4 0.0 0.0 0.0 0.00 0.0 0.0 \n",
"\n",
" Supplement Shop Sushi Restaurant Swim School Taco Place Tailor Shop \\\n",
"0 0.0 0.02 0.0 0.0 0.0 \n",
"1 0.0 0.00 0.0 0.0 0.0 \n",
"2 0.0 0.00 0.0 0.0 0.0 \n",
"3 0.0 0.00 0.0 0.0 0.0 \n",
"4 0.0 0.00 0.0 0.0 0.0 \n",
"\n",
" Taiwanese Restaurant Tanning Salon Tapas Restaurant Tea Room \\\n",
"0 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 0.0 \n",
"\n",
" Thai Restaurant Theater Theme Restaurant Thrift / Vintage Store \\\n",
"0 0.04 0.01 0.0 0.0 \n",
"1 0.00 0.00 0.0 0.0 \n",
"2 0.00 0.00 0.0 0.0 \n",
"3 0.00 0.00 0.0 0.0 \n",
"4 0.00 0.00 0.0 0.0 \n",
"\n",
" Toy / Game Store Trail Train Station Vegetarian / Vegan Restaurant \\\n",
"0 0.0 0.0 0.0 0.01 \n",
"1 0.0 0.0 0.0 0.00 \n",
"2 0.0 0.0 0.0 0.00 \n",
"3 0.0 0.0 0.0 0.00 \n",
"4 0.0 0.0 0.0 0.00 \n",
"\n",
" Video Game Store Video Store Vietnamese Restaurant Warehouse Store \\\n",
"0 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 0.0 \n",
"\n",
" Wine Bar Wings Joint Women's Store Yoga Studio \n",
"0 0.01 0.0 0.01 0.0 \n",
"1 0.00 0.0 0.00 0.0 \n",
"2 0.00 0.0 0.00 0.0 \n",
"3 0.00 0.0 0.00 0.0 \n",
"4 0.00 0.0 0.00 0.0 \n",
"\n",
"[5 rows x 281 columns]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()\n",
"print(toronto_grouped.shape)\n",
"toronto_grouped.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Most of these frequencies are zero, so we're dealing with a very sparse data set. To make it easier to read, let's determine the top 10 most frequent venue types per neighbourhood. First, let's create a function that sorts a row's venue frequencies in descending order and returns the top categories."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"def return_most_common_venues(row, num_top_venues):\n",
" row_categories = row.iloc[1:]\n",
" row_categories_sorted = row_categories.sort_values(ascending=False)\n",
" \n",
" return row_categories_sorted.index.values[0:num_top_venues]"
]
},
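  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the helper returns, here is a minimal sketch on a made-up row. The neighbourhood name `Toyville` and its frequencies are invented for illustration, and the function body is repeated so the snippet runs on its own.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "def return_most_common_venues(row, num_top_venues):\n",
    "    # skip the first entry, which holds the neighbourhood name\n",
    "    row_categories = row.iloc[1:]\n",
    "    row_categories_sorted = row_categories.sort_values(ascending=False)\n",
    "    return row_categories_sorted.index.values[0:num_top_venues]\n",
    "\n",
    "# hypothetical row: one name followed by venue frequencies\n",
    "toy_row = pd.Series({'Neighbourhood': 'Toyville', 'Bar': 0.1, 'Coffee Shop': 0.6, 'Park': 0.3})\n",
    "\n",
    "top2 = return_most_common_venues(toy_row, 2)\n",
    "print(list(top2))\n",
    "```\n",
    "\n",
    "The function returns the category names (the index labels), not the frequencies themselves."
   ]
  },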
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Now let's create a new dataframe containing the top 10 venue types for each neighbourhood."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighbourhood</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Adelaide, King, Richmond</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Café</td>\n",
" <td>Thai Restaurant</td>\n",
" <td>Bar</td>\n",
" <td>Steakhouse</td>\n",
" <td>Gym</td>\n",
" <td>Restaurant</td>\n",
" <td>American Restaurant</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Hotel</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Agincourt</td>\n",
" <td>Lounge</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Skating Rink</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Agincourt North, L'Amoreaux East, Milliken, St...</td>\n",
" <td>Park</td>\n",
" <td>Playground</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dive Bar</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Albion Gardens, Beaumond Heights, Humbergate, ...</td>\n",
" <td>Grocery Store</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Pizza Place</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Beer Store</td>\n",
" <td>Pharmacy</td>\n",
" <td>Fried Chicken Joint</td>\n",
" <td>Empanada Restaurant</td>\n",
" <td>Electronics Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Alderwood, Long Branch</td>\n",
" <td>Pizza Place</td>\n",
" <td>Gym</td>\n",
" <td>Pool</td>\n",
" <td>Skating Rink</td>\n",
" <td>Pharmacy</td>\n",
" <td>Pub</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Neighbourhood 1st Most Common Venue \\\n",
"0 Adelaide, King, Richmond Coffee Shop \n",
"1 Agincourt Lounge \n",
"2 Agincourt North, L'Amoreaux East, Milliken, St... Park \n",
"3 Albion Gardens, Beaumond Heights, Humbergate, ... Grocery Store \n",
"4 Alderwood, Long Branch Pizza Place \n",
"\n",
" 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue \\\n",
"0 Café Thai Restaurant Bar \n",
"1 Breakfast Spot Skating Rink Chinese Restaurant \n",
"2 Playground Yoga Studio Eastern European Restaurant \n",
"3 Fast Food Restaurant Pizza Place Sandwich Place \n",
"4 Gym Pool Skating Rink \n",
"\n",
" 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue \\\n",
"0 Steakhouse Gym Restaurant \n",
"1 Sandwich Place Eastern European Restaurant Doner Restaurant \n",
"2 Dive Bar Dog Run Doner Restaurant \n",
"3 Coffee Shop Beer Store Pharmacy \n",
"4 Pharmacy Pub Coffee Shop \n",
"\n",
" 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue \n",
"0 American Restaurant Breakfast Spot Hotel \n",
"1 Donut Shop Drugstore Dumpling Restaurant \n",
"2 Donut Shop Drugstore Dumpling Restaurant \n",
"3 Fried Chicken Joint Empanada Restaurant Electronics Store \n",
"4 Sandwich Place Diner Discount Store "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"num_top_venues = 10\n",
"\n",
"indicators = ['st', 'nd', 'rd']\n",
"\n",
"# create columns according to number of top venues\n",
"columns = ['Neighbourhood']\n",
"for ind in np.arange(num_top_venues):\n",
" try:\n",
" columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))\n",
    "    except IndexError:\n",
" columns.append('{}th Most Common Venue'.format(ind+1))\n",
"\n",
"# create a new dataframe\n",
"neighborhoods_venues_sorted = pd.DataFrame(columns=columns)\n",
"neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']\n",
"\n",
"for ind in np.arange(toronto_grouped.shape[0]):\n",
" neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)\n",
"\n",
"neighborhoods_venues_sorted.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Neighbourhood clustering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're now ready to cluster the neighbourhoods based on the prevalence of venue types. \n",
"\n",
"### Finding k\n",
"\n",
    "One question that always comes up is how to choose k for the clustering. We address this by running k-means for several values of k, looking at the inertia (the within-cluster sum of squared distances) of each result, and applying the elbow rule."
]
},
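  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the elbow rule, the sketch below computes the inertia for several values of k on synthetic data. The three blob centres, the noise level and `n_init=10` are assumptions made for this example and are not part of the Toronto analysis.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.cluster import KMeans\n",
    "\n",
    "# synthetic data: three well-separated blobs (hypothetical, for illustration only)\n",
    "rng = np.random.RandomState(0)\n",
    "centers = np.array([[0, 0], [10, 10], [0, 10]])\n",
    "X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])\n",
    "\n",
    "# inertia = within-cluster sum of squared distances, one value per k\n",
    "ks = list(range(1, 8))\n",
    "inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]\n",
    "\n",
    "# how much the inertia drops when going from k-1 to k clusters\n",
    "drops = -np.diff(inertias)\n",
    "for k, drop in zip(ks[1:], drops):\n",
    "    print(k, round(drop, 1))\n",
    "```\n",
    "\n",
    "On data like this the drop should be large up to k = 3 and small afterwards, which is the elbow. Real venue data is rarely this clear-cut, so the choice of k usually remains a judgment call."
   ]
  },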
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# remove Neighbourhood column\n",
    "toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', axis=1)\n",
"\n",
"\n",
    "# test k = 1 .. maxk-1 clusters\n",
"maxk = 8\n",
"cost = np.zeros((maxk-1))\n",
"\n",
"for n in range(1, maxk):\n",
" # run k-means clustering\n",
" kmeans = KMeans(n_clusters=n, random_state=0).fit(toronto_grouped_clustering)\n",
" cost[n-1] = kmeans.inertia_\n",
" #print(cost[n-1])\n",
"\n",
"# plot inertia to find best value for K\n",
"plt.plot(range(2, maxk), cost[1:maxk], 'g')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Performing the clustering\n",
"\n",
    "Judging from the plot above, k = 7 looks like the best balance, so let's run the final clustering with that value."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 6, 5, 5, 0, 3, 0, 0, 0], dtype=int32)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kclusters = 7\n",
"kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)\n",
"# check cluster labels generated for each row in the dataframe\n",
"kmeans.labels_[0:10] "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Let's create a new dataframe that includes the cluster label as well as the top 10 venues for each neighbourhood."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" <td>5</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Electronics Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Empanada Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" <td>43.784535</td>\n",
" <td>-79.160497</td>\n",
" <td>0</td>\n",
" <td>Bar</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Electronics Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Empanada Restaurant</td>\n",
" <td>Dive Bar</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" <td>0</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Rental Car Location</td>\n",
" <td>Intersection</td>\n",
" <td>Pizza Place</td>\n",
" <td>Electronics Store</td>\n",
" <td>Medical Center</td>\n",
" <td>Mexican Restaurant</td>\n",
" <td>Drugstore</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" <td>43.770992</td>\n",
" <td>-79.216917</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Korean Restaurant</td>\n",
" <td>Electronics Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Yoga Studio</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" <td>43.773136</td>\n",
" <td>-79.239476</td>\n",
" <td>0</td>\n",
" <td>Hakka Restaurant</td>\n",
" <td>Thai Restaurant</td>\n",
" <td>Fried Chicken Joint</td>\n",
" <td>Bank</td>\n",
" <td>Bakery</td>\n",
" <td>Athletics &amp; Sports</td>\n",
" <td>Caribbean Restaurant</td>\n",
" <td>Cuban Restaurant</td>\n",
" <td>Costume Shop</td>\n",
" <td>Farmers Market</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood Latitude \\\n",
"0 M1B Scarborough Rouge, Malvern 43.806686 \n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union 43.784535 \n",
"2 M1E Scarborough Guildwood, Morningside, West Hill 43.763573 \n",
"3 M1G Scarborough Woburn 43.770992 \n",
"4 M1H Scarborough Cedarbrae 43.773136 \n",
"\n",
" Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"0 -79.194353 5 Fast Food Restaurant Yoga Studio \n",
"1 -79.160497 0 Bar Yoga Studio \n",
"2 -79.188711 0 Breakfast Spot Rental Car Location \n",
"3 -79.216917 0 Coffee Shop Korean Restaurant \n",
"4 -79.239476 0 Hakka Restaurant Thai Restaurant \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"0 Electronics Store Dog Run Doner Restaurant \n",
"1 Electronics Store Doner Restaurant Donut Shop \n",
"2 Intersection Pizza Place Electronics Store \n",
"3 Electronics Store Dog Run Doner Restaurant \n",
"4 Fried Chicken Joint Bank Bakery \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"0 Donut Shop Drugstore Dumpling Restaurant \n",
"1 Drugstore Dumpling Restaurant Eastern European Restaurant \n",
"2 Medical Center Mexican Restaurant Drugstore \n",
"3 Donut Shop Drugstore Dumpling Restaurant \n",
"4 Athletics & Sports Caribbean Restaurant Cuban Restaurant \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"0 Eastern European Restaurant Empanada Restaurant \n",
"1 Empanada Restaurant Dive Bar \n",
"2 Dog Run Doner Restaurant \n",
"3 Eastern European Restaurant Yoga Studio \n",
"4 Costume Shop Farmers Market "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# add clustering labels\n",
"#neighborhoods_venues_sorted.drop('Cluster Labels', axis=1, inplace=True)\n",
"neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)\n",
"\n",
"toronto_merged = mydf_post\n",
"\n",
    "# inner join toronto_merged with neighborhoods_venues_sorted to add latitude/longitude for each neighbourhood\n",
"toronto_labeled = pd.merge(toronto_merged, neighborhoods_venues_sorted, how='inner', on='Neighbourhood')\n",
"\n",
"toronto_labeled.head() # check the last columns!"
]
},
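  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that an inner join keeps only the neighbourhoods that appear in both dataframes. A minimal sketch with made-up frames (the names `locations` and `labels` and all values are hypothetical):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# hypothetical coordinates for three neighbourhoods\n",
    "locations = pd.DataFrame({'Neighbourhood': ['A', 'B', 'C'],\n",
    "                          'Latitude': [43.1, 43.2, 43.3]})\n",
    "\n",
    "# hypothetical cluster labels; 'B' is missing, e.g. because no venues were returned for it\n",
    "labels = pd.DataFrame({'Neighbourhood': ['A', 'C'],\n",
    "                       'Cluster Labels': [0, 2]})\n",
    "\n",
    "merged = pd.merge(locations, labels, how='inner', on='Neighbourhood')\n",
    "print(merged)\n",
    "```\n",
    "\n",
    "Neighbourhood `B` has no row in `labels`, so it disappears from the result. With `how='left'` it would be kept, with a missing value in the `Cluster Labels` column."
   ]
  },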
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Clustering results\n",
"\n",
"Finally, let's visualize the resulting clusters of similar neighbourhoods. Let's start by counting the number of neighbourhoods in each cluster."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Cluster Labels\n",
"0 72\n",
"1 1\n",
"2 1\n",
"3 2\n",
"4 1\n",
"5 10\n",
"6 14\n",
"Name: Neighbourhood, dtype: int64"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_labeled.groupby('Cluster Labels').count()['Neighbourhood']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "This is very interesting: there are 3 main clusters (0, 5 and 6, with 72, 10 and 14 neighbourhoods), accompanied by 4 outlier clusters that contain only one or two neighbourhoods each. \n",
"\n",
"## Mapping the clusters\n",
"\n",
"Let's see how this looks on the map using Folium."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"width:100%;\"><div style=\"position:relative;width:100%;height:0;padding-bottom:60%;\"><iframe src=\"data:text/html;charset=utf-8;base64,\" style=\"position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;\" allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe></div></div>"
],
"text/plain": [
"<folium.folium.Map at 0x7f89b40225f8>"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create map\n",
"map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)\n",
"\n",
"# set color scheme for the clusters\n",
"x = np.arange(kclusters)\n",
"ys = [i + x + (i*x)**2 for i in range(kclusters)]\n",
"#colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))\n",
"colors_array = cm.gist_rainbow(np.linspace(0, 1, len(ys)))\n",
"rainbow = [colors.rgb2hex(i) for i in colors_array]\n",
"\n",
"# add markers to the map\n",
"markers_colors = []\n",
"for lat, lon, poi, cluster in zip(toronto_labeled['Latitude'], toronto_labeled['Longitude'], toronto_labeled['Neighbourhood'], toronto_labeled['Cluster Labels']):\n",
" label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)\n",
" folium.CircleMarker(\n",
" [lat, lon],\n",
" radius=11,\n",
" popup=label,\n",
" color=rainbow[cluster-1],\n",
" fill=True,\n",
" fill_color=rainbow[cluster-1],\n",
" fill_opacity=0.5).add_to(map_clusters)\n",
" \n",
"map_clusters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analysis of the clusters\n",
"\n",
    "Now we are ready to examine each cluster and determine the venue categories that distinguish it. Based on the defining categories, we can assign a name to each cluster."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cluster 1: Coffee to go"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>North York</td>\n",
" <td>0</td>\n",
" <td>Clothing Store</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Shoe Store</td>\n",
" <td>Cosmetics Shop</td>\n",
" <td>Japanese Restaurant</td>\n",
" <td>Bakery</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Pharmacy</td>\n",
" <td>Sporting Goods Shop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Scarborough</td>\n",
" <td>0</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Rental Car Location</td>\n",
" <td>Intersection</td>\n",
" <td>Pizza Place</td>\n",
" <td>Electronics Store</td>\n",
" <td>Medical Center</td>\n",
" <td>Mexican Restaurant</td>\n",
" <td>Drugstore</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>77</th>\n",
" <td>West Toronto</td>\n",
" <td>0</td>\n",
" <td>Café</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Italian Restaurant</td>\n",
" <td>Bar</td>\n",
" <td>Intersection</td>\n",
" <td>Bakery</td>\n",
" <td>Stadium</td>\n",
" <td>Restaurant</td>\n",
" <td>Climbing Gym</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Café</td>\n",
" <td>Thai Restaurant</td>\n",
" <td>Bar</td>\n",
" <td>Steakhouse</td>\n",
" <td>Gym</td>\n",
" <td>Restaurant</td>\n",
" <td>American Restaurant</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Hotel</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>North York</td>\n",
" <td>0</td>\n",
" <td>Gym / Fitness Center</td>\n",
" <td>Grocery Store</td>\n",
" <td>Liquor Store</td>\n",
" <td>Athletics &amp; Sports</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Cluster Labels 1st Most Common Venue \\\n",
"17 North York 0 Clothing Store \n",
"2 Scarborough 0 Breakfast Spot \n",
"77 West Toronto 0 Café \n",
"57 Downtown Toronto 0 Coffee Shop \n",
"32 North York 0 Gym / Fitness Center \n",
"\n",
" 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue \\\n",
"17 Fast Food Restaurant Coffee Shop Shoe Store \n",
"2 Rental Car Location Intersection Pizza Place \n",
"77 Breakfast Spot Coffee Shop Italian Restaurant \n",
"57 Café Thai Restaurant Bar \n",
"32 Grocery Store Liquor Store Athletics & Sports \n",
"\n",
" 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue \\\n",
"17 Cosmetics Shop Japanese Restaurant Bakery \n",
"2 Electronics Store Medical Center Mexican Restaurant \n",
"77 Bar Intersection Bakery \n",
"57 Steakhouse Gym Restaurant \n",
"32 Yoga Studio Eastern European Restaurant Dog Run \n",
"\n",
" 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue \n",
"17 Chinese Restaurant Pharmacy Sporting Goods Shop \n",
"2 Drugstore Dog Run Doner Restaurant \n",
"77 Stadium Restaurant Climbing Gym \n",
"57 American Restaurant Breakfast Spot Hotel \n",
"32 Doner Restaurant Donut Shop Drugstore "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_labeled.loc[toronto_labeled['Cluster Labels'] == 0, toronto_labeled.columns[[1] + list(range(5, toronto_labeled.shape[1]))]].sample(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cluster 2: Mixed bag"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>North York</td>\n",
" <td>1</td>\n",
" <td>Park</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dive Bar</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Electronics Store</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Cluster Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"20 North York 1 Park Yoga Studio \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"20 Eastern European Restaurant Dive Bar Dog Run \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"20 Doner Restaurant Donut Shop Drugstore \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"20 Dumpling Restaurant Electronics Store "
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_labeled.loc[toronto_labeled['Cluster Labels'] == 1, toronto_labeled.columns[[1] + list(range(5, toronto_labeled.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cluster 3: Sports"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>North York</td>\n",
" <td>2</td>\n",
" <td>Baseball Field</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Electronics Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Empanada Restaurant</td>\n",
" <td>Dive Bar</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Cluster Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"95 North York 2 Baseball Field Yoga Studio \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"95 Electronics Store Doner Restaurant Donut Shop \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"95 Drugstore Dumpling Restaurant Eastern European Restaurant \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"95 Empanada Restaurant Dive Bar "
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_labeled.loc[toronto_labeled['Cluster Labels'] == 2, toronto_labeled.columns[[1] + list(range(5, toronto_labeled.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cluster 4: Banking"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>North York</td>\n",
" <td>3</td>\n",
" <td>Café</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Japanese Restaurant</td>\n",
" <td>Bank</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Dog Run</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Eastern European Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>Etobicoke</td>\n",
" <td>3</td>\n",
" <td>Bank</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Electronics Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Empanada Restaurant</td>\n",
" <td>Dive Bar</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Cluster Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"18 North York 3 Café Chinese Restaurant \n",
"92 Etobicoke 3 Bank Yoga Studio \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"18 Japanese Restaurant Bank Yoga Studio \n",
"92 Electronics Store Doner Restaurant Donut Shop \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"18 Dog Run Donut Shop Drugstore \n",
"92 Drugstore Dumpling Restaurant Eastern European Restaurant \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"18 Dumpling Restaurant Eastern European Restaurant \n",
"92 Empanada Restaurant Dive Bar "
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_labeled.loc[toronto_labeled['Cluster Labels'] == 3, toronto_labeled.columns[[1] + list(range(5, toronto_labeled.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cluster 5: Oddball"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>North York</td>\n",
" <td>4</td>\n",
" <td>Cafeteria</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Electronics Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Empanada Restaurant</td>\n",
" <td>Dive Bar</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Cluster Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"19 North York 4 Cafeteria Yoga Studio \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"19 Electronics Store Doner Restaurant Donut Shop \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"19 Drugstore Dumpling Restaurant Eastern European Restaurant \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"19 Empanada Restaurant Dive Bar "
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_labeled.loc[toronto_labeled['Cluster Labels'] == 4, toronto_labeled.columns[[1] + list(range(5, toronto_labeled.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cluster 6: Food & Shopping"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>East York</td>\n",
" <td>5</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Pizza Place</td>\n",
" <td>Gastropub</td>\n",
" <td>Bank</td>\n",
" <td>Intersection</td>\n",
" <td>Athletics &amp; Sports</td>\n",
" <td>Gym / Fitness Center</td>\n",
" <td>Pharmacy</td>\n",
" <td>Pet Store</td>\n",
" <td>Cosmetics Shop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>Scarborough</td>\n",
" <td>5</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Nail Salon</td>\n",
" <td>Grocery Store</td>\n",
" <td>Pharmacy</td>\n",
" <td>Pizza Place</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Thrift / Vintage Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>Etobicoke</td>\n",
" <td>5</td>\n",
" <td>Pizza Place</td>\n",
" <td>Gym</td>\n",
" <td>Pool</td>\n",
" <td>Skating Rink</td>\n",
" <td>Pharmacy</td>\n",
" <td>Pub</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>Etobicoke</td>\n",
" <td>5</td>\n",
" <td>Pizza Place</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Discount Store</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Middle Eastern Restaurant</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Intersection</td>\n",
" <td>Electronics Store</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dumpling Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Scarborough</td>\n",
" <td>5</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Electronics Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Empanada Restaurant</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Cluster Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"34 East York 5 Fast Food Restaurant Pizza Place \n",
"15 Scarborough 5 Fast Food Restaurant Chinese Restaurant \n",
"88 Etobicoke 5 Pizza Place Gym \n",
"97 Etobicoke 5 Pizza Place Coffee Shop \n",
"0 Scarborough 5 Fast Food Restaurant Yoga Studio \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"34 Gastropub Bank Intersection \n",
"15 Coffee Shop Nail Salon Grocery Store \n",
"88 Pool Skating Rink Pharmacy \n",
"97 Discount Store Chinese Restaurant Middle Eastern Restaurant \n",
"0 Electronics Store Dog Run Doner Restaurant \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"34 Athletics & Sports Gym / Fitness Center Pharmacy \n",
"15 Pharmacy Pizza Place Breakfast Spot \n",
"88 Pub Coffee Shop Sandwich Place \n",
"97 Sandwich Place Intersection Electronics Store \n",
"0 Donut Shop Drugstore Dumpling Restaurant \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"34 Pet Store Cosmetics Shop \n",
"15 Sandwich Place Thrift / Vintage Store \n",
"88 Diner Discount Store \n",
"97 Eastern European Restaurant Dumpling Restaurant \n",
"0 Eastern European Restaurant Empanada Restaurant "
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_labeled.loc[toronto_labeled['Cluster Labels'] == 5, toronto_labeled.columns[[1] + list(range(5, toronto_labeled.shape[1]))]].sample(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cluster 7: Rest & Relaxation"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>47</th>\n",
" <td>Central Toronto</td>\n",
" <td>6</td>\n",
" <td>Park</td>\n",
" <td>Playground</td>\n",
" <td>Restaurant</td>\n",
" <td>Gym</td>\n",
" <td>Comic Shop</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Discount Store</td>\n",
" <td>Dive Bar</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>Etobicoke</td>\n",
" <td>6</td>\n",
" <td>Park</td>\n",
" <td>Bus Line</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Electronics Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>North York</td>\n",
" <td>6</td>\n",
" <td>Park</td>\n",
" <td>Convenience Store</td>\n",
" <td>Bank</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Electronics Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Eastern European Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>Scarborough</td>\n",
" <td>6</td>\n",
" <td>Park</td>\n",
" <td>Playground</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dive Bar</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Drugstore</td>\n",
" <td>Dumpling Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>East York</td>\n",
" <td>6</td>\n",
" <td>Park</td>\n",
" <td>Pizza Place</td>\n",
" <td>Convenience Store</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Falafel Restaurant</td>\n",
" <td>Farm</td>\n",
" <td>Event Space</td>\n",
" <td>Ethiopian Restaurant</td>\n",
" <td>Empanada Restaurant</td>\n",
" <td>Diner</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Cluster Labels 1st Most Common Venue \\\n",
"47 Central Toronto 6 Park \n",
"98 Etobicoke 6 Park \n",
"22 North York 6 Park \n",
"14 Scarborough 6 Park \n",
"39 East York 6 Park \n",
"\n",
" 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue \\\n",
"47 Playground Restaurant Gym \n",
"98 Bus Line Yoga Studio Eastern European Restaurant \n",
"22 Convenience Store Bank Yoga Studio \n",
"14 Playground Yoga Studio Eastern European Restaurant \n",
"39 Pizza Place Convenience Store Coffee Shop \n",
"\n",
" 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue \\\n",
"47 Comic Shop Dumpling Restaurant Discount Store \n",
"98 Dog Run Doner Restaurant Donut Shop \n",
"22 Electronics Store Doner Restaurant Donut Shop \n",
"14 Dive Bar Dog Run Doner Restaurant \n",
"39 Falafel Restaurant Farm Event Space \n",
"\n",
" 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue \n",
"47 Dive Bar Dog Run Doner Restaurant \n",
"98 Drugstore Dumpling Restaurant Electronics Store \n",
"22 Drugstore Dumpling Restaurant Eastern European Restaurant \n",
"14 Donut Shop Drugstore Dumpling Restaurant \n",
"39 Ethiopian Restaurant Empanada Restaurant Diner "
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_labeled.loc[toronto_labeled['Cluster Labels'] == 6, toronto_labeled.columns[[1] + list(range(5, toronto_labeled.shape[1]))]].sample(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conclusion\n",
"\n",
"As we already suspected, the map is dominated by the majority cluster 0, while the other two big clusters (5 & 6) only start appearing once you leave the center of Toronto city. As for the remaining 4 outlier clusters, they appear on the outskirts of the Toronto city area and only have one or two members in them. \n",
"\n",
"Looking at the clusters more closely, there seems to be an abundance of fast food & beverage venues found in the biggest cluster. The 2nd and 3rd biggest clusters seem to address the needs of people in search of places to have lunch and/or dinner, or places to get outdoors and/or relax respectively. The rest of the clusters seem to be in neighbourhoods that cater to more specific needs, like electronics shopping or banking, which explains the low number of member neighbourhoods in them.\n",
"\n",
"Overall, the visualization on the map accompanied by the inspection of the data reveals that there are mainly to types of areas in Toronto. The first is the downtown city life while the other appears to be more work related on the outskirts of the city. This is important information for people considering moving to or within Toronto."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "conda-env-python-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}