@pskifast
Created February 19, 2020 01:29
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Segmenting and Clustering Neighborhoods in Toronto"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Solving environment: done\n",
"\n",
"\n",
"==> WARNING: A newer version of conda exists. <==\n",
" current version: 4.5.11\n",
" latest version: 4.8.2\n",
"\n",
"Please update conda by running\n",
"\n",
" $ conda update -n base -c defaults conda\n",
"\n",
"\n",
"\n",
"# All requested packages already installed.\n",
"\n",
"Requirement already satisfied: lxml in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (4.5.0)\n",
"All imported!\n"
]
}
],
"source": [
"# importing required libraries\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"%matplotlib inline\n",
"\n",
"!conda install -c conda-forge geopy --yes\n",
"from geopy.geocoders import Nominatim\n",
"\n",
"import folium\n",
"\n",
"import requests\n",
"\n",
"from pandas.io.json import json_normalize\n",
"\n",
"import matplotlib.cm as cm\n",
"import matplotlib.colors as colors\n",
"\n",
"from sklearn.cluster import KMeans\n",
"\n",
"!pip install lxml\n",
"\n",
"print('All imported!')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 1: Web scraping Wikipedia HTML tables\n",
"\n",
"Source page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PostalCode</th>\n",
" <th>Borough</th>\n",
" <th>Neighborhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1A</td>\n",
" <td>Not assigned</td>\n",
" <td>Not assigned</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M2A</td>\n",
" <td>Not assigned</td>\n",
" <td>Not assigned</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M3A</td>\n",
" <td>North York</td>\n",
" <td>Parkwoods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M4A</td>\n",
" <td>North York</td>\n",
" <td>Victoria Village</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M5A</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Harbourfront</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>282</th>\n",
" <td>M8Z</td>\n",
" <td>Etobicoke</td>\n",
" <td>Mimico NW</td>\n",
" </tr>\n",
" <tr>\n",
" <th>283</th>\n",
" <td>M8Z</td>\n",
" <td>Etobicoke</td>\n",
" <td>The Queensway West</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284</th>\n",
" <td>M8Z</td>\n",
" <td>Etobicoke</td>\n",
" <td>Royal York South West</td>\n",
" </tr>\n",
" <tr>\n",
" <th>285</th>\n",
" <td>M8Z</td>\n",
" <td>Etobicoke</td>\n",
" <td>South of Bloor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>286</th>\n",
" <td>M9Z</td>\n",
" <td>Not assigned</td>\n",
" <td>Not assigned</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>287 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" PostalCode Borough Neighborhood\n",
"0 M1A Not assigned Not assigned\n",
"1 M2A Not assigned Not assigned\n",
"2 M3A North York Parkwoods\n",
"3 M4A North York Victoria Village\n",
"4 M5A Downtown Toronto Harbourfront\n",
".. ... ... ...\n",
"282 M8Z Etobicoke Mimico NW\n",
"283 M8Z Etobicoke The Queensway West\n",
"284 M8Z Etobicoke Royal York South West\n",
"285 M8Z Etobicoke South of Bloor\n",
"286 M9Z Not assigned Not assigned\n",
"\n",
"[287 rows x 3 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Parsing the tables of the target webpage; the first table holds the postal codes\n",
"data = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header=0)\n",
"df = data[0]\n",
"\n",
"# Changing column names\n",
"df.columns = ['PostalCode', 'Borough', 'Neighborhood']\n",
"df"
]
},
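{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, assuming the postal-code table might not always be the first table on the page: `pd.read_html` accepts a `match` argument that keeps only the tables containing the given text. The inline HTML and the name `toy` are made-up stand-ins so the example runs offline; they are not part of the original analysis.\n",
"\n",
"```python\n",
"from io import StringIO\n",
"import pandas as pd\n",
"\n",
"# hypothetical two-table page; only the second table mentions 'Postal Code'\n",
"html = '''\n",
"<table><tr><th>Note</th></tr><tr><td>unrelated</td></tr></table>\n",
"<table>\n",
"<tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>\n",
"<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>\n",
"</table>\n",
"'''\n",
"\n",
"# keep only the tables whose text matches 'Postal Code'\n",
"tables = pd.read_html(StringIO(html), match='Postal Code', header=0)\n",
"toy = tables[0]\n",
"toy.columns = ['PostalCode', 'Borough', 'Neighborhood']\n",
"print(toy)\n",
"```\n",
"\n",
"Selecting by content rather than by position may make the scrape less sensitive to layout changes on the page."
]
},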
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 2: Performing required operations"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PostalCode</th>\n",
" <th>Borough</th>\n",
" <th>Neighborhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>M9N</td>\n",
" <td>York</td>\n",
" <td>Weston</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>M9P</td>\n",
" <td>Etobicoke</td>\n",
" <td>Westmount</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>M9R</td>\n",
" <td>Etobicoke</td>\n",
" <td>Kingsview Village, Martin Grove Gardens, Richv...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>M9V</td>\n",
" <td>Etobicoke</td>\n",
" <td>Albion Gardens, Beaumond Heights, Humbergate, ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>M9W</td>\n",
" <td>Etobicoke</td>\n",
" <td>Northwest</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>103 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" PostalCode Borough Neighborhood\n",
"0 M1B Scarborough Rouge, Malvern\n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union\n",
"2 M1E Scarborough Guildwood, Morningside, West Hill\n",
"3 M1G Scarborough Woburn\n",
"4 M1H Scarborough Cedarbrae\n",
".. ... ... ...\n",
"98 M9N York Weston\n",
"99 M9P Etobicoke Westmount\n",
"100 M9R Etobicoke Kingsview Village, Martin Grove Gardens, Richv...\n",
"101 M9V Etobicoke Albion Gardens, Beaumond Heights, Humbergate, ...\n",
"102 M9W Etobicoke Northwest\n",
"\n",
"[103 rows x 3 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
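],
"source": [
"# Data cleaning per the instructions\n",
"# 1. drop rows whose borough is not assigned\n",
"df.drop(df[df.Borough == 'Not assigned'].index, inplace=True)\n",
"# 2. where only the neighborhood is missing, reuse the borough name\n",
"df.loc[df.Neighborhood == 'Not assigned', 'Neighborhood'] = df['Borough']\n",
"df.reset_index(drop=True, inplace=True)\n",
"# 3. combine neighborhoods that share a postal code into one comma-separated row\n",
"df = df.groupby('PostalCode').agg({'Borough':'first','Neighborhood': ', '.join}).reset_index()\n",
"\n",
"df"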
]
},
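{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three cleaning steps can be traced on a small made-up table. This is only a sketch: the rows and the names `raw`, `clean` and `missing` are invented for illustration and are not taken from the scraped data.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# made-up rows: one unassigned code, one missing neighborhood, one repeated code\n",
"raw = pd.DataFrame({\n",
"    'PostalCode': ['M1A', 'M9A', 'M5A', 'M5A'],\n",
"    'Borough': ['Not assigned', 'Etobicoke', 'Downtown Toronto', 'Downtown Toronto'],\n",
"    'Neighborhood': ['Not assigned', 'Not assigned', 'Harbourfront', 'Regent Park'],\n",
"})\n",
"\n",
"# step 1: drop rows without a borough\n",
"clean = raw[raw['Borough'] != 'Not assigned'].copy()\n",
"\n",
"# step 2: fall back to the borough name where the neighborhood is missing\n",
"missing = clean['Neighborhood'] == 'Not assigned'\n",
"clean.loc[missing, 'Neighborhood'] = clean.loc[missing, 'Borough']\n",
"\n",
"# step 3: one row per postal code, neighborhoods joined with commas\n",
"clean = clean.groupby('PostalCode', as_index=False).agg(\n",
"    {'Borough': 'first', 'Neighborhood': ', '.join})\n",
"print(clean)\n",
"```\n",
"\n",
"Working on a filtered copy instead of using `inplace=True` is a design choice of this sketch, not of the cell above; both approaches should give the same table."
]
},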
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 3: Verifying results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Checking that the grouping works by comparing one row with the table given in the instructions**"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PostalCode</th>\n",
" <th>Borough</th>\n",
" <th>Neighborhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>M9V</td>\n",
" <td>Etobicoke</td>\n",
" <td>Albion Gardens, Beaumond Heights, Humbergate, ...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PostalCode Borough Neighborhood\n",
"101 M9V Etobicoke Albion Gardens, Beaumond Heights, Humbergate, ..."
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df['PostalCode'] == 'M9V']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Printing number of rows of the resulting table**"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Final table has 103 rows.\n"
]
}
],
"source": [
"rows = df.shape[0]\n",
"print('Final table has', rows, 'rows.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 4: Adding coordinates"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postal Code</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>43.784535</td>\n",
" <td>-79.160497</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>43.770992</td>\n",
" <td>-79.216917</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>43.773136</td>\n",
" <td>-79.239476</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postal Code Latitude Longitude\n",
"0 M1B 43.806686 -79.194353\n",
"1 M1C 43.784535 -79.160497\n",
"2 M1E 43.763573 -79.188711\n",
"3 M1G 43.770992 -79.216917\n",
"4 M1H 43.773136 -79.239476"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# geocoder module is not working; using the csv file instead\n",
"# importing coordinates\n",
"coordinates = pd.read_csv('Geospatial_Coordinates.csv', sep=',')\n",
"\n",
"coordinates.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PostalCode</th>\n",
" <th>Borough</th>\n",
" <th>Neighborhood</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" <td>43.784535</td>\n",
" <td>-79.160497</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" <td>43.770992</td>\n",
" <td>-79.216917</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" <td>43.773136</td>\n",
" <td>-79.239476</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PostalCode Borough Neighborhood Latitude \\\n",
"0 M1B Scarborough Rouge, Malvern 43.806686 \n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union 43.784535 \n",
"2 M1E Scarborough Guildwood, Morningside, West Hill 43.763573 \n",
"3 M1G Scarborough Woburn 43.770992 \n",
"4 M1H Scarborough Cedarbrae 43.773136 \n",
"\n",
" Longitude \n",
"0 -79.194353 \n",
"1 -79.160497 \n",
"2 -79.188711 \n",
"3 -79.216917 \n",
"4 -79.239476 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# populating original table with coordinates based on Postal Code\n",
"# renaming columns to enable seamless merging\n",
"coordinates.columns = ['PostalCode','Latitude','Longitude']\n",
"# merging the two dataframes\n",
"df1 = pd.merge(df, coordinates, on=['PostalCode'])\n",
"df1.head()"
]
},
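{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pd.merge` performs an inner join by default, so a postal code without coordinates would be dropped silently. A minimal sketch of how that could be checked, using made-up frames (`codes`, `coords`) and an invented code `M1X`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# 'M1X' is a made-up postal code with no matching coordinates\n",
"codes = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M1X']})\n",
"coords = pd.DataFrame({\n",
"    'PostalCode': ['M1B', 'M1C'],\n",
"    'Latitude': [43.806686, 43.784535],\n",
"    'Longitude': [-79.194353, -79.160497],\n",
"})\n",
"\n",
"# a left join keeps every postal code; indicator=True adds a '_merge' column\n",
"# and validate raises an error if a postal code appears more than once on either side\n",
"checked = pd.merge(codes, coords, on='PostalCode', how='left',\n",
"                   indicator=True, validate='one_to_one')\n",
"unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()\n",
"print(unmatched)\n",
"```\n",
"\n",
"In this notebook the merged table keeps all 103 rows, so no postal code appears to be lost."
]
},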
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(103, 5)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 5: Visualising Toronto and its neighborhoods"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**First, I define an instance of the geocoder and use it to find the coordinates of Toronto**"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The geographical coordinates of Toronto are 43.653963, -79.387207.\n"
]
}
],
"source": [
"address = 'Toronto, CA'\n",
"\n",
"geolocator = Nominatim(user_agent=\"toronto_explorer\")\n",
"location = geolocator.geocode(address)\n",
"latitude = location.latitude\n",
"longitude = location.longitude\n",
"print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"width:100%;\"><div style=\"position:relative;width:100%;height:0;padding-bottom:60%;\"><iframe src=\"data:text/html;charset=utf-8;base64,\" style=\"position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;\" allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe></div></div>"
],
"text/plain": [
"<folium.folium.Map at 0x7f8f8fa82cc0>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create map of Toronto using latitude and longitude values\n",
"map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)\n",
"\n",
"# add markers to map\n",
"for lat, lng, borough, neighborhood in zip(df1['Latitude'], df1['Longitude'], df1['Borough'], df1['Neighborhood']):\n",
"    label = '{}, {}'.format(neighborhood, borough)\n",
"    label = folium.Popup(label, parse_html=True)\n",
"    folium.CircleMarker(\n",
"        [lat, lng],\n",
"        radius=5,\n",
"        popup=label,\n",
"        color='blue',\n",
"        fill=True,\n",
"        fill_color='#3186cc',\n",
"        fill_opacity=0.7,\n",
"        parse_html=False).add_to(map_toronto)\n",
"\n",
"map_toronto"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 6: Collecting top venues for each neighborhood"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Connecting to Foursquare in order to retrieve data**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# credentials are read from the environment instead of being hard-coded; the variable names are placeholders\n",
"CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID\n",
"CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret\n",
"VERSION = '20140101' # Foursquare API version\n",
"limit = 100 # maximum number of venues requested per neighborhood"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def getNearbyVenues(names, latitudes, longitudes, radius=500):\n",
"    venues_list = []\n",
"    for name, lat, lng in zip(names, latitudes, longitudes):\n",
"        print(name)\n",
"        # create the API request URL\n",
"        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(\n",
"            CLIENT_ID,\n",
"            CLIENT_SECRET,\n",
"            VERSION,\n",
"            lat,\n",
"            lng,\n",
"            radius,\n",
"            limit)\n",
"\n",
"        # make the GET request\n",
"        results = requests.get(url).json()[\"response\"]['groups'][0]['items']\n",
"\n",
"        # return only relevant information for each nearby venue\n",
"        venues_list.append([(\n",
"            name,\n",
"            lat,\n",
"            lng,\n",
"            v['venue']['name'],\n",
"            v['venue']['location']['lat'],\n",
"            v['venue']['location']['lng'],\n",
"            v['venue']['categories'][0]['name']) for v in results])\n",
"\n",
"    # flatten the per-neighborhood lists into a single table of venues\n",
"    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])\n",
"    nearby_venues.columns = ['Neighborhood',\n",
"                             'Neighborhood Latitude',\n",
"                             'Neighborhood Longitude',\n",
"                             'Venue',\n",
"                             'Venue Latitude',\n",
"                             'Venue Longitude',\n",
"                             'Venue Category']\n",
"\n",
"    return nearby_venues"
]
},
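{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an alternative to formatting the query string by hand, the query parameters can be passed to `requests` as a dictionary, which also takes care of URL encoding. The sketch below only prepares the request so that the resulting URL can be inspected without calling the API; the credential strings are placeholders and the coordinates are those of the first postal code.\n",
"\n",
"```python\n",
"import requests\n",
"\n",
"# placeholder values; real credentials would come from your own Foursquare account\n",
"params = {\n",
"    'client_id': 'YOUR_CLIENT_ID',\n",
"    'client_secret': 'YOUR_CLIENT_SECRET',\n",
"    'v': '20140101',\n",
"    'll': '43.806686,-79.194353',\n",
"    'radius': 500,\n",
"    'limit': 100,\n",
"}\n",
"\n",
"# build the request without sending it, to see the URL that would be used\n",
"prepared = requests.Request(\n",
"    'GET', 'https://api.foursquare.com/v2/venues/explore', params=params).prepare()\n",
"print(prepared.url)\n",
"```\n",
"\n",
"The same dictionary could be passed as `requests.get(url, params=params, timeout=10)`; the timeout value is an arbitrary choice for this sketch."
]
},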
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Rouge, Malvern\n",
"Highland Creek, Rouge Hill, Port Union\n",
"Guildwood, Morningside, West Hill\n",
"Woburn\n",
"Cedarbrae\n",
"Scarborough Village\n",
"East Birchmount Park, Ionview, Kennedy Park\n",
"Clairlea, Golden Mile, Oakridge\n",
"Cliffcrest, Cliffside, Scarborough Village West\n",
"Birch Cliff, Cliffside West\n",
"Dorset Park, Scarborough Town Centre, Wexford Heights\n",
"Maryvale, Wexford\n",
"Agincourt\n",
"Clarks Corners, Sullivan, Tam O'Shanter\n",
"Agincourt North, L'Amoreaux East, Milliken, Steeles East\n",
"L'Amoreaux West\n",
"Upper Rouge\n",
"Hillcrest Village\n",
"Fairview, Henry Farm, Oriole\n",
"Bayview Village\n",
"Silver Hills, York Mills\n",
"Newtonbrook, Willowdale\n",
"Willowdale South\n",
"York Mills West\n",
"Willowdale West\n",
"Parkwoods\n",
"Don Mills North\n",
"Flemingdon Park, Don Mills South\n",
"Bathurst Manor, Downsview North, Wilson Heights\n",
"Northwood Park, York University\n",
"CFB Toronto, Downsview East\n",
"Downsview West\n",
"Downsview Central\n",
"Downsview Northwest\n",
"Victoria Village\n",
"Woodbine Gardens, Parkview Hill\n",
"Woodbine Heights\n",
"The Beaches\n",
"Leaside\n",
"Thorncliffe Park\n",
"East Toronto\n",
"The Danforth West, Riverdale\n",
"The Beaches West, India Bazaar\n",
"Studio District\n",
"Lawrence Park\n",
"Davisville North\n",
"North Toronto West\n",
"Davisville\n",
"Moore Park, Summerhill East\n",
"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West\n",
"Rosedale\n",
"Cabbagetown, St. James Town\n",
"Church and Wellesley\n",
"Harbourfront\n",
"Ryerson, Garden District\n",
"St. James Town\n",
"Berczy Park\n",
"Central Bay Street\n",
"Adelaide, King, Richmond\n",
"Harbourfront East, Toronto Islands, Union Station\n",
"Design Exchange, Toronto Dominion Centre\n",
"Commerce Court, Victoria Hotel\n",
"Bedford Park, Lawrence Manor East\n",
"Roselawn\n",
"Forest Hill North, Forest Hill West\n",
"The Annex, North Midtown, Yorkville\n",
"Harbord, University of Toronto\n",
"Chinatown, Grange Park, Kensington Market\n",
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara\n",
"Stn A PO Boxes 25 The Esplanade\n",
"First Canadian Place, Underground city\n",
"Lawrence Heights, Lawrence Manor\n",
"Glencairn\n",
"Humewood-Cedarvale\n",
"Caledonia-Fairbanks\n",
"Christie\n",
"Dovercourt Village, Dufferin\n",
"Little Portugal, Trinity\n",
"Brockton, Exhibition Place, Parkdale Village\n",
"Downsview, North Park, Upwood Park\n",
"Del Ray, Keelesdale, Mount Dennis, Silverthorn\n",
"The Junction North, Runnymede\n",
"High Park, The Junction South\n",
"Parkdale, Roncesvalles\n",
"Runnymede, Swansea\n",
"Queen's Park\n",
"Canada Post Gateway Processing Centre\n",
"Business Reply Mail Processing Centre 969 Eastern\n",
"Humber Bay Shores, Mimico South, New Toronto\n",
"Alderwood, Long Branch\n",
"The Kingsway, Montgomery Road, Old Mill North\n",
"Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea\n",
"Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor\n",
"Queen's Park\n",
"Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park\n",
"Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe\n",
"Humber Summit\n",
"Emery, Humberlea\n",
"Weston\n",
"Westmount\n",
"Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips\n",
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown\n",
"Northwest\n"
]
}
],
"source": [
"toronto_venues = getNearbyVenues(names=df1['Neighborhood'],\n",
" latitudes=df1['Latitude'],\n",
" longitudes=df1['Longitude']\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood Latitude</th>\n",
" <th>Neighborhood Longitude</th>\n",
" <th>Venue</th>\n",
" <th>Venue Latitude</th>\n",
" <th>Venue Longitude</th>\n",
" <th>Venue Category</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Neighborhood</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Adelaide, King, Richmond</th>\n",
" <td>100</td>\n",
" <td>100</td>\n",
" <td>100</td>\n",
" <td>100</td>\n",
" <td>100</td>\n",
" <td>100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Agincourt</th>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Agincourt North, L'Amoreaux East, Milliken, Steeles East</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown</th>\n",
" <td>13</td>\n",
" <td>13</td>\n",
" <td>13</td>\n",
" <td>13</td>\n",
" <td>13</td>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Alderwood, Long Branch</th>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Willowdale West</th>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Woburn</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Woodbine Gardens, Parkview Hill</th>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Woodbine Heights</th>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>York Mills West</th>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>100 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" Neighborhood Latitude \\\n",
"Neighborhood \n",
"Adelaide, King, Richmond 100 \n",
"Agincourt 5 \n",
"Agincourt North, L'Amoreaux East, Milliken, Ste... 2 \n",
"Albion Gardens, Beaumond Heights, Humbergate, J... 13 \n",
"Alderwood, Long Branch 10 \n",
"... ... \n",
"Willowdale West 7 \n",
"Woburn 4 \n",
"Woodbine Gardens, Parkview Hill 12 \n",
"Woodbine Heights 9 \n",
"York Mills West 5 \n",
"\n",
" Neighborhood Longitude \\\n",
"Neighborhood \n",
"Adelaide, King, Richmond 100 \n",
"Agincourt 5 \n",
"Agincourt North, L'Amoreaux East, Milliken, Ste... 2 \n",
"Albion Gardens, Beaumond Heights, Humbergate, J... 13 \n",
"Alderwood, Long Branch 10 \n",
"... ... \n",
"Willowdale West 7 \n",
"Woburn 4 \n",
"Woodbine Gardens, Parkview Hill 12 \n",
"Woodbine Heights 9 \n",
"York Mills West 5 \n",
"\n",
" Venue Venue Latitude \\\n",
"Neighborhood \n",
"Adelaide, King, Richmond 100 100 \n",
"Agincourt 5 5 \n",
"Agincourt North, L'Amoreaux East, Milliken, Ste... 2 2 \n",
"Albion Gardens, Beaumond Heights, Humbergate, J... 13 13 \n",
"Alderwood, Long Branch 10 10 \n",
"... ... ... \n",
"Willowdale West 7 7 \n",
"Woburn 4 4 \n",
"Woodbine Gardens, Parkview Hill 12 12 \n",
"Woodbine Heights 9 9 \n",
"York Mills West 5 5 \n",
"\n",
" Venue Longitude \\\n",
"Neighborhood \n",
"Adelaide, King, Richmond 100 \n",
"Agincourt 5 \n",
"Agincourt North, L'Amoreaux East, Milliken, Ste... 2 \n",
"Albion Gardens, Beaumond Heights, Humbergate, J... 13 \n",
"Alderwood, Long Branch 10 \n",
"... ... \n",
"Willowdale West 7 \n",
"Woburn 4 \n",
"Woodbine Gardens, Parkview Hill 12 \n",
"Woodbine Heights 9 \n",
"York Mills West 5 \n",
"\n",
" Venue Category \n",
"Neighborhood \n",
"Adelaide, King, Richmond 100 \n",
"Agincourt 5 \n",
"Agincourt North, L'Amoreaux East, Milliken, Ste... 2 \n",
"Albion Gardens, Beaumond Heights, Humbergate, J... 13 \n",
"Alderwood, Long Branch 10 \n",
"... ... \n",
"Willowdale West 7 \n",
"Woburn 4 \n",
"Woodbine Gardens, Parkview Hill 12 \n",
"Woodbine Heights 9 \n",
"York Mills West 5 \n",
"\n",
"[100 rows x 6 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_venues.groupby('Neighborhood').count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 7: Analysing neighborhoods based on top venues"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Yoga Studio</th>\n",
" <th>Accessories Store</th>\n",
" <th>Afghan Restaurant</th>\n",
" <th>Airport</th>\n",
" <th>Airport Food Court</th>\n",
" <th>Airport Gate</th>\n",
" <th>Airport Lounge</th>\n",
" <th>Airport Service</th>\n",
" <th>Airport Terminal</th>\n",
" <th>American Restaurant</th>\n",
" <th>...</th>\n",
" <th>Trail</th>\n",
" <th>Train Station</th>\n",
" <th>Vegetarian / Vegan Restaurant</th>\n",
" <th>Video Game Store</th>\n",
" <th>Video Store</th>\n",
" <th>Vietnamese Restaurant</th>\n",
" <th>Warehouse Store</th>\n",
" <th>Wine Bar</th>\n",
" <th>Wings Joint</th>\n",
" <th>Women's Store</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 267 columns</p>\n",
"</div>"
],
"text/plain": [
" Yoga Studio Accessories Store Afghan Restaurant Airport \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Airport Food Court Airport Gate Airport Lounge Airport Service \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Airport Terminal American Restaurant ... Trail Train Station \\\n",
"0 0 0 ... 0 0 \n",
"1 0 0 ... 0 0 \n",
"2 0 0 ... 0 0 \n",
"3 0 0 ... 0 0 \n",
"4 0 0 ... 0 0 \n",
"\n",
" Vegetarian / Vegan Restaurant Video Game Store Video Store \\\n",
"0 0 0 0 \n",
"1 0 0 0 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"\n",
" Vietnamese Restaurant Warehouse Store Wine Bar Wings Joint \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Women's Store \n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"\n",
"[5 rows x 267 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
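],
"source": [
"# one-hot encode the venue categories\n",
"toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix=\"\", prefix_sep=\"\")\n",
"\n",
"# add neighborhood column back to dataframe\n",
"# (the output starts with 'Yoga Studio', which suggests a venue category is itself named 'Neighborhood',\n",
"# so this assignment overwrites that dummy column instead of appending a new one)\n",
"toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']\n",
"\n",
"# move the last column to the first position\n",
"fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])\n",
"toronto_onehot = toronto_onehot[fixed_columns]\n",
"\n",
"toronto_onehot.head()"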
]
},
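{
"cell_type": "markdown",
"metadata": {},
"source": [
"The recorded output above starts with `Yoga Studio` rather than `Neighborhood`, which suggests that one venue category is itself called `Neighborhood` and that its dummy column was overwritten. A minimal sketch of one way to avoid the clash, on made-up venues; the column name `Neighborhood Name` is a hypothetical choice, not something used elsewhere in this notebook:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# made-up venues; one venue category is itself called 'Neighborhood'\n",
"venues = pd.DataFrame({\n",
"    'Neighborhood': ['Woburn', 'Woburn', 'Agincourt'],\n",
"    'Venue Category': ['Coffee Shop', 'Neighborhood', 'Yoga Studio'],\n",
"})\n",
"\n",
"onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')\n",
"\n",
"# insert the label under a distinct name, directly in first position,\n",
"# so the 'Neighborhood' dummy column is kept\n",
"onehot.insert(0, 'Neighborhood Name', venues['Neighborhood'])\n",
"\n",
"# mean frequency of each category per neighborhood, as in the next step\n",
"grouped = onehot.groupby('Neighborhood Name').mean().reset_index()\n",
"print(grouped)\n",
"```\n",
"\n",
"`DataFrame.insert` places the column at the requested position, so the separate reordering step is not needed in this variant."
]
},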
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category**"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood</th>\n",
" <th>Yoga Studio</th>\n",
" <th>Accessories Store</th>\n",
" <th>Afghan Restaurant</th>\n",
" <th>Airport</th>\n",
" <th>Airport Food Court</th>\n",
" <th>Airport Gate</th>\n",
" <th>Airport Lounge</th>\n",
" <th>Airport Service</th>\n",
" <th>Airport Terminal</th>\n",
" <th>...</th>\n",
" <th>Trail</th>\n",
" <th>Train Station</th>\n",
" <th>Vegetarian / Vegan Restaurant</th>\n",
" <th>Video Game Store</th>\n",
" <th>Video Store</th>\n",
" <th>Vietnamese Restaurant</th>\n",
" <th>Warehouse Store</th>\n",
" <th>Wine Bar</th>\n",
" <th>Wings Joint</th>\n",
" <th>Women's Store</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Adelaide, King, Richmond</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.02</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" <td>0.0</td>\n",
" <td>0.01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Agincourt</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Agincourt North, L'Amoreaux East, Milliken, St...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Albion Gardens, Beaumond Heights, Humbergate, ...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Alderwood, Long Branch</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>Willowdale West</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>Woburn</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>Woodbine Gardens, Parkview Hill</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>Woodbine Heights</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.111111</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>York Mills West</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>100 rows × 267 columns</p>\n",
"</div>"
],
"text/plain": [
" Neighborhood Yoga Studio \\\n",
"0 Adelaide, King, Richmond 0.0 \n",
"1 Agincourt 0.0 \n",
"2 Agincourt North, L'Amoreaux East, Milliken, St... 0.0 \n",
"3 Albion Gardens, Beaumond Heights, Humbergate, ... 0.0 \n",
"4 Alderwood, Long Branch 0.0 \n",
".. ... ... \n",
"95 Willowdale West 0.0 \n",
"96 Woburn 0.0 \n",
"97 Woodbine Gardens, Parkview Hill 0.0 \n",
"98 Woodbine Heights 0.0 \n",
"99 York Mills West 0.0 \n",
"\n",
" Accessories Store Afghan Restaurant Airport Airport Food Court \\\n",
"0 0.0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 0.0 \n",
".. ... ... ... ... \n",
"95 0.0 0.0 0.0 0.0 \n",
"96 0.0 0.0 0.0 0.0 \n",
"97 0.0 0.0 0.0 0.0 \n",
"98 0.0 0.0 0.0 0.0 \n",
"99 0.0 0.0 0.0 0.0 \n",
"\n",
" Airport Gate Airport Lounge Airport Service Airport Terminal ... \\\n",
"0 0.0 0.0 0.0 0.0 ... \n",
"1 0.0 0.0 0.0 0.0 ... \n",
"2 0.0 0.0 0.0 0.0 ... \n",
"3 0.0 0.0 0.0 0.0 ... \n",
"4 0.0 0.0 0.0 0.0 ... \n",
".. ... ... ... ... ... \n",
"95 0.0 0.0 0.0 0.0 ... \n",
"96 0.0 0.0 0.0 0.0 ... \n",
"97 0.0 0.0 0.0 0.0 ... \n",
"98 0.0 0.0 0.0 0.0 ... \n",
"99 0.0 0.0 0.0 0.0 ... \n",
"\n",
" Trail Train Station Vegetarian / Vegan Restaurant Video Game Store \\\n",
"0 0.0 0.0 0.02 0.0 \n",
"1 0.0 0.0 0.00 0.0 \n",
"2 0.0 0.0 0.00 0.0 \n",
"3 0.0 0.0 0.00 0.0 \n",
"4 0.0 0.0 0.00 0.0 \n",
".. ... ... ... ... \n",
"95 0.0 0.0 0.00 0.0 \n",
"96 0.0 0.0 0.00 0.0 \n",
"97 0.0 0.0 0.00 0.0 \n",
"98 0.0 0.0 0.00 0.0 \n",
"99 0.0 0.0 0.00 0.0 \n",
"\n",
" Video Store Vietnamese Restaurant Warehouse Store Wine Bar \\\n",
"0 0.000000 0.0 0.0 0.01 \n",
"1 0.000000 0.0 0.0 0.00 \n",
"2 0.000000 0.0 0.0 0.00 \n",
"3 0.000000 0.0 0.0 0.00 \n",
"4 0.000000 0.0 0.0 0.00 \n",
".. ... ... ... ... \n",
"95 0.000000 0.0 0.0 0.00 \n",
"96 0.000000 0.0 0.0 0.00 \n",
"97 0.000000 0.0 0.0 0.00 \n",
"98 0.111111 0.0 0.0 0.00 \n",
"99 0.000000 0.0 0.0 0.00 \n",
"\n",
" Wings Joint Women's Store \n",
"0 0.0 0.01 \n",
"1 0.0 0.00 \n",
"2 0.0 0.00 \n",
"3 0.0 0.00 \n",
"4 0.0 0.00 \n",
".. ... ... \n",
"95 0.0 0.00 \n",
"96 0.0 0.00 \n",
"97 0.0 0.00 \n",
"98 0.0 0.00 \n",
"99 0.0 0.00 \n",
"\n",
"[100 rows x 267 columns]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()\n",
"toronto_grouped"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Sorting the top 10 venues and putting them into a dataframe**"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"def return_most_common_venues(row, num_top_venues):\n",
" row_categories = row.iloc[1:]\n",
" row_categories_sorted = row_categories.sort_values(ascending=False)\n",
" \n",
" return row_categories_sorted.index.values[0:num_top_venues]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Adelaide, King, Richmond</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Café</td>\n",
" <td>Bar</td>\n",
" <td>Thai Restaurant</td>\n",
" <td>Steakhouse</td>\n",
" <td>Sushi Restaurant</td>\n",
" <td>Bakery</td>\n",
" <td>Cosmetics Shop</td>\n",
" <td>Restaurant</td>\n",
" <td>Burger Joint</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Agincourt</td>\n",
" <td>Latin American Restaurant</td>\n",
" <td>Clothing Store</td>\n",
" <td>Lounge</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Skating Rink</td>\n",
" <td>Electronics Store</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Department Store</td>\n",
" <td>Drugstore</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Agincourt North, L'Amoreaux East, Milliken, St...</td>\n",
" <td>Park</td>\n",
" <td>Playground</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Donut Shop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Albion Gardens, Beaumond Heights, Humbergate, ...</td>\n",
" <td>Pizza Place</td>\n",
" <td>Grocery Store</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Beer Store</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Japanese Restaurant</td>\n",
" <td>Discount Store</td>\n",
" <td>Fried Chicken Joint</td>\n",
" <td>Liquor Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Alderwood, Long Branch</td>\n",
" <td>Pizza Place</td>\n",
" <td>Gym</td>\n",
" <td>Skating Rink</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Dance Studio</td>\n",
" <td>Pub</td>\n",
" <td>Pool</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Pharmacy</td>\n",
" <td>Comfort Food Restaurant</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Neighborhood \\\n",
"0 Adelaide, King, Richmond \n",
"1 Agincourt \n",
"2 Agincourt North, L'Amoreaux East, Milliken, St... \n",
"3 Albion Gardens, Beaumond Heights, Humbergate, ... \n",
"4 Alderwood, Long Branch \n",
"\n",
" 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue \\\n",
"0 Coffee Shop Café Bar \n",
"1 Latin American Restaurant Clothing Store Lounge \n",
"2 Park Playground Doner Restaurant \n",
"3 Pizza Place Grocery Store Coffee Shop \n",
"4 Pizza Place Gym Skating Rink \n",
"\n",
" 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue \\\n",
"0 Thai Restaurant Steakhouse Sushi Restaurant \n",
"1 Breakfast Spot Skating Rink Electronics Store \n",
"2 Department Store Dessert Shop Dim Sum Restaurant \n",
"3 Beer Store Fast Food Restaurant Sandwich Place \n",
"4 Sandwich Place Dance Studio Pub \n",
"\n",
" 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue \\\n",
"0 Bakery Cosmetics Shop Restaurant \n",
"1 Eastern European Restaurant Dumpling Restaurant Department Store \n",
"2 Diner Discount Store Dog Run \n",
"3 Japanese Restaurant Discount Store Fried Chicken Joint \n",
"4 Pool Coffee Shop Pharmacy \n",
"\n",
" 10th Most Common Venue \n",
"0 Burger Joint \n",
"1 Drugstore \n",
"2 Donut Shop \n",
"3 Liquor Store \n",
"4 Comfort Food Restaurant "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"num_top_venues = 10\n",
"\n",
"indicators = ['st', 'nd', 'rd']\n",
"\n",
"# create columns according to number of top venues\n",
"columns = ['Neighborhood']\n",
"for ind in np.arange(num_top_venues):\n",
" try:\n",
" columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))\n",
" except:\n",
" columns.append('{}th Most Common Venue'.format(ind+1))\n",
"\n",
"# create a new dataframe\n",
"neighborhoods_venues_sorted = pd.DataFrame(columns=columns)\n",
"neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']\n",
"\n",
"for ind in np.arange(toronto_grouped.shape[0]):\n",
" neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)\n",
"\n",
"neighborhoods_venues_sorted.head()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(100, 11)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"neighborhoods_venues_sorted.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 8: Neighborhood Clustering"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# set number of clusters\n",
"kclusters = 6\n",
"\n",
"toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)\n",
"\n",
"# run k-means clustering\n",
"kmeans = KMeans(init=\"k-means++\", n_clusters=kclusters, n_init=20).fit(toronto_grouped_clustering)\n",
"#kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PostalCode</th>\n",
" <th>Borough</th>\n",
" <th>Neighborhood</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" <td>0</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Print Shop</td>\n",
" <td>Women's Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Doner Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" <td>43.784535</td>\n",
" <td>-79.160497</td>\n",
" <td>2</td>\n",
" <td>History Museum</td>\n",
" <td>Bar</td>\n",
" <td>Women's Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Deli / Bodega</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" <td>0</td>\n",
" <td>Electronics Store</td>\n",
" <td>Intersection</td>\n",
" <td>Mexican Restaurant</td>\n",
" <td>Spa</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Pizza Place</td>\n",
" <td>Rental Car Location</td>\n",
" <td>Medical Center</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" <td>43.770992</td>\n",
" <td>-79.216917</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Pharmacy</td>\n",
" <td>Korean Restaurant</td>\n",
" <td>Women's Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" <td>43.773136</td>\n",
" <td>-79.239476</td>\n",
" <td>0</td>\n",
" <td>Fried Chicken Joint</td>\n",
" <td>Caribbean Restaurant</td>\n",
" <td>Bank</td>\n",
" <td>Athletics &amp; Sports</td>\n",
" <td>Thai Restaurant</td>\n",
" <td>Gas Station</td>\n",
" <td>Hakka Restaurant</td>\n",
" <td>Lounge</td>\n",
" <td>Bakery</td>\n",
" <td>Dumpling Restaurant</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PostalCode Borough Neighborhood Latitude \\\n",
"0 M1B Scarborough Rouge, Malvern 43.806686 \n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union 43.784535 \n",
"2 M1E Scarborough Guildwood, Morningside, West Hill 43.763573 \n",
"3 M1G Scarborough Woburn 43.770992 \n",
"4 M1H Scarborough Cedarbrae 43.773136 \n",
"\n",
" Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"0 -79.194353 0 Fast Food Restaurant Print Shop \n",
"1 -79.160497 2 History Museum Bar \n",
"2 -79.188711 0 Electronics Store Intersection \n",
"3 -79.216917 0 Coffee Shop Pharmacy \n",
"4 -79.239476 0 Fried Chicken Joint Caribbean Restaurant \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"0 Women's Store Dog Run Department Store \n",
"1 Women's Store Dessert Shop Dim Sum Restaurant \n",
"2 Mexican Restaurant Spa Breakfast Spot \n",
"3 Korean Restaurant Women's Store Dog Run \n",
"4 Bank Athletics & Sports Thai Restaurant \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"0 Dessert Shop Dim Sum Restaurant Diner \n",
"1 Diner Discount Store Dog Run \n",
"2 Pizza Place Rental Car Location Medical Center \n",
"3 Department Store Dessert Shop Dim Sum Restaurant \n",
"4 Gas Station Hakka Restaurant Lounge \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"0 Discount Store Doner Restaurant \n",
"1 Doner Restaurant Deli / Bodega \n",
"2 Department Store Dessert Shop \n",
"3 Diner Discount Store \n",
"4 Bakery Dumpling Restaurant "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# add clustering labels\n",
"neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)\n",
"\n",
"# merge toronto_grouped with original table to add latitude/longitude for each neighborhood\n",
"toronto_merged = pd.merge(df1, neighborhoods_venues_sorted, on=['Neighborhood'], how='inner') #attention! merge and not join as aout of 103 Postal Codes only 100 return Foursquare data\n",
"\n",
"toronto_merged.head()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(100, 12)"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"neighborhoods_venues_sorted.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Visualising the clusters**"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"width:100%;\"><div style=\"position:relative;width:100%;height:0;padding-bottom:60%;\"><iframe src=\"data:text/html;charset=utf-8;base64,\" style=\"position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;\" allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe></div></div>"
],
"text/plain": [
"<folium.folium.Map at 0x7f8f8f7bcd68>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create map\n",
"map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)\n",
"\n",
"# set color scheme for the clusters\n",
"x = np.arange(kclusters)\n",
"ys = [i + x + (i*x)**2 for i in range(kclusters)]\n",
"colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))\n",
"rainbow = [colors.rgb2hex(i) for i in colors_array]\n",
"\n",
"# add markers to the map\n",
"markers_colors = []\n",
"for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):\n",
" label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)\n",
" folium.CircleMarker(\n",
" [lat, lon],\n",
" radius=5,\n",
" popup=label,\n",
" color=rainbow[cluster-1],\n",
" fill=True,\n",
" fill_color=rainbow[cluster-1],\n",
" fill_opacity=0.7).add_to(map_clusters)\n",
" \n",
"map_clusters"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PostalCode</th>\n",
" <th>Borough</th>\n",
" <th>Neighborhood</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" <td>0</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Print Shop</td>\n",
" <td>Women's Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Doner Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" <td>43.784535</td>\n",
" <td>-79.160497</td>\n",
" <td>2</td>\n",
" <td>History Museum</td>\n",
" <td>Bar</td>\n",
" <td>Women's Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Deli / Bodega</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" <td>0</td>\n",
" <td>Electronics Store</td>\n",
" <td>Intersection</td>\n",
" <td>Mexican Restaurant</td>\n",
" <td>Spa</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Pizza Place</td>\n",
" <td>Rental Car Location</td>\n",
" <td>Medical Center</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" <td>43.770992</td>\n",
" <td>-79.216917</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Pharmacy</td>\n",
" <td>Korean Restaurant</td>\n",
" <td>Women's Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" <td>43.773136</td>\n",
" <td>-79.239476</td>\n",
" <td>0</td>\n",
" <td>Fried Chicken Joint</td>\n",
" <td>Caribbean Restaurant</td>\n",
" <td>Bank</td>\n",
" <td>Athletics &amp; Sports</td>\n",
" <td>Thai Restaurant</td>\n",
" <td>Gas Station</td>\n",
" <td>Hakka Restaurant</td>\n",
" <td>Lounge</td>\n",
" <td>Bakery</td>\n",
" <td>Dumpling Restaurant</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PostalCode Borough Neighborhood Latitude \\\n",
"0 M1B Scarborough Rouge, Malvern 43.806686 \n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union 43.784535 \n",
"2 M1E Scarborough Guildwood, Morningside, West Hill 43.763573 \n",
"3 M1G Scarborough Woburn 43.770992 \n",
"4 M1H Scarborough Cedarbrae 43.773136 \n",
"\n",
" Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"0 -79.194353 0 Fast Food Restaurant Print Shop \n",
"1 -79.160497 2 History Museum Bar \n",
"2 -79.188711 0 Electronics Store Intersection \n",
"3 -79.216917 0 Coffee Shop Pharmacy \n",
"4 -79.239476 0 Fried Chicken Joint Caribbean Restaurant \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"0 Women's Store Dog Run Department Store \n",
"1 Women's Store Dessert Shop Dim Sum Restaurant \n",
"2 Mexican Restaurant Spa Breakfast Spot \n",
"3 Korean Restaurant Women's Store Dog Run \n",
"4 Bank Athletics & Sports Thai Restaurant \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"0 Dessert Shop Dim Sum Restaurant Diner \n",
"1 Diner Discount Store Dog Run \n",
"2 Pizza Place Rental Car Location Medical Center \n",
"3 Department Store Dessert Shop Dim Sum Restaurant \n",
"4 Gas Station Hakka Restaurant Lounge \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"0 Discount Store Doner Restaurant \n",
"1 Doner Restaurant Deli / Bodega \n",
"2 Department Store Dessert Shop \n",
"3 Diner Discount Store \n",
"4 Bakery Dumpling Restaurant "
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_merged.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 9: Exploring the clusters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Cluster 1**"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Rouge, Malvern</td>\n",
" <td>0</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Print Shop</td>\n",
" <td>Women's Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Doner Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>0</td>\n",
" <td>Electronics Store</td>\n",
" <td>Intersection</td>\n",
" <td>Mexican Restaurant</td>\n",
" <td>Spa</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Pizza Place</td>\n",
" <td>Rental Car Location</td>\n",
" <td>Medical Center</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Woburn</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Pharmacy</td>\n",
" <td>Korean Restaurant</td>\n",
" <td>Women's Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Cedarbrae</td>\n",
" <td>0</td>\n",
" <td>Fried Chicken Joint</td>\n",
" <td>Caribbean Restaurant</td>\n",
" <td>Bank</td>\n",
" <td>Athletics &amp; Sports</td>\n",
" <td>Thai Restaurant</td>\n",
" <td>Gas Station</td>\n",
" <td>Hakka Restaurant</td>\n",
" <td>Lounge</td>\n",
" <td>Bakery</td>\n",
" <td>Dumpling Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>East Birchmount Park, Ionview, Kennedy Park</td>\n",
" <td>0</td>\n",
" <td>Department Store</td>\n",
" <td>Convenience Store</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Discount Store</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94</th>\n",
" <td>Humber Summit</td>\n",
" <td>0</td>\n",
" <td>Empanada Restaurant</td>\n",
" <td>Pizza Place</td>\n",
" <td>Dog Run</td>\n",
" <td>Deli / Bodega</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Doner Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>Westmount</td>\n",
" <td>0</td>\n",
" <td>Pizza Place</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Intersection</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Discount Store</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Middle Eastern Restaurant</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Dog Run</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>Kingsview Village, Martin Grove Gardens, Richv...</td>\n",
" <td>0</td>\n",
" <td>Mobile Phone Shop</td>\n",
" <td>Bus Line</td>\n",
" <td>Pizza Place</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Discount Store</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Dog Run</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>Albion Gardens, Beaumond Heights, Humbergate, ...</td>\n",
" <td>0</td>\n",
" <td>Pizza Place</td>\n",
" <td>Grocery Store</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Beer Store</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Japanese Restaurant</td>\n",
" <td>Discount Store</td>\n",
" <td>Fried Chicken Joint</td>\n",
" <td>Liquor Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>Northwest</td>\n",
" <td>0</td>\n",
" <td>Rental Car Location</td>\n",
" <td>Drugstore</td>\n",
" <td>Bar</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Women's Store</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>84 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" Neighborhood Cluster Labels \\\n",
"0 Rouge, Malvern 0 \n",
"2 Guildwood, Morningside, West Hill 0 \n",
"3 Woburn 0 \n",
"4 Cedarbrae 0 \n",
"6 East Birchmount Park, Ionview, Kennedy Park 0 \n",
".. ... ... \n",
"94 Humber Summit 0 \n",
"97 Westmount 0 \n",
"98 Kingsview Village, Martin Grove Gardens, Richv... 0 \n",
"99 Albion Gardens, Beaumond Heights, Humbergate, ... 0 \n",
"100 Northwest 0 \n",
"\n",
" 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue \\\n",
"0 Fast Food Restaurant Print Shop Women's Store \n",
"2 Electronics Store Intersection Mexican Restaurant \n",
"3 Coffee Shop Pharmacy Korean Restaurant \n",
"4 Fried Chicken Joint Caribbean Restaurant Bank \n",
"6 Department Store Convenience Store Coffee Shop \n",
".. ... ... ... \n",
"94 Empanada Restaurant Pizza Place Dog Run \n",
"97 Pizza Place Coffee Shop Intersection \n",
"98 Mobile Phone Shop Bus Line Pizza Place \n",
"99 Pizza Place Grocery Store Coffee Shop \n",
"100 Rental Car Location Drugstore Bar \n",
"\n",
" 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue \\\n",
"0 Dog Run Department Store Dessert Shop \n",
"2 Spa Breakfast Spot Pizza Place \n",
"3 Women's Store Dog Run Department Store \n",
"4 Athletics & Sports Thai Restaurant Gas Station \n",
"6 Discount Store Chinese Restaurant Dessert Shop \n",
".. ... ... ... \n",
"94 Deli / Bodega Department Store Dessert Shop \n",
"97 Sandwich Place Discount Store Chinese Restaurant \n",
"98 Sandwich Place Discount Store Department Store \n",
"99 Beer Store Fast Food Restaurant Sandwich Place \n",
"100 Dog Run Department Store Dessert Shop \n",
"\n",
" 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue \\\n",
"0 Dim Sum Restaurant Diner Discount Store \n",
"2 Rental Car Location Medical Center Department Store \n",
"3 Dessert Shop Dim Sum Restaurant Diner \n",
"4 Hakka Restaurant Lounge Bakery \n",
"6 Dim Sum Restaurant Diner Dog Run \n",
".. ... ... ... \n",
"94 Dim Sum Restaurant Diner Discount Store \n",
"97 Middle Eastern Restaurant Doner Restaurant Donut Shop \n",
"98 Dessert Shop Dim Sum Restaurant Diner \n",
"99 Japanese Restaurant Discount Store Fried Chicken Joint \n",
"100 Dim Sum Restaurant Diner Discount Store \n",
"\n",
" 10th Most Common Venue \n",
"0 Doner Restaurant \n",
"2 Dessert Shop \n",
"3 Discount Store \n",
"4 Dumpling Restaurant \n",
"6 Doner Restaurant \n",
".. ... \n",
"94 Doner Restaurant \n",
"97 Dog Run \n",
"98 Dog Run \n",
"99 Liquor Store \n",
"100 Women's Store \n",
"\n",
"[84 rows x 12 columns]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Cluster 2**"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>91</th>\n",
" <td>Humber Bay, King's Mill Park, Kingsway Park So...</td>\n",
" <td>1</td>\n",
" <td>Baseball Field</td>\n",
" <td>Women's Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Falafel Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>Emery, Humberlea</td>\n",
" <td>1</td>\n",
" <td>Baseball Field</td>\n",
" <td>Women's Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Falafel Restaurant</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Neighborhood Cluster Labels \\\n",
"91 Humber Bay, King's Mill Park, Kingsway Park So... 1 \n",
"95 Emery, Humberlea 1 \n",
"\n",
" 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue \\\n",
"91 Baseball Field Women's Store Dessert Shop \n",
"95 Baseball Field Women's Store Dessert Shop \n",
"\n",
" 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue \\\n",
"91 Dim Sum Restaurant Diner Discount Store \n",
"95 Dim Sum Restaurant Diner Discount Store \n",
"\n",
" 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue \\\n",
"91 Dog Run Doner Restaurant Donut Shop \n",
"95 Dog Run Doner Restaurant Donut Shop \n",
"\n",
" 10th Most Common Venue \n",
"91 Falafel Restaurant \n",
"95 Falafel Restaurant "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Cluster 3**"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" <td>2</td>\n",
" <td>History Museum</td>\n",
" <td>Bar</td>\n",
" <td>Women's Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Deli / Bodega</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Neighborhood Cluster Labels \\\n",
"1 Highland Creek, Rouge Hill, Port Union 2 \n",
"\n",
" 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue \\\n",
"1 History Museum Bar Women's Store \n",
"\n",
" 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue \\\n",
"1 Dessert Shop Dim Sum Restaurant Diner \n",
"\n",
" 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue \\\n",
"1 Discount Store Dog Run Doner Restaurant \n",
"\n",
" 10th Most Common Venue \n",
"1 Deli / Bodega "
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Cluster 4**"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>62</th>\n",
" <td>Roselawn</td>\n",
" <td>3</td>\n",
" <td>Garden</td>\n",
" <td>Women's Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Dance Studio</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Neighborhood Cluster Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"62 Roselawn 3 Garden Women's Store \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"62 Dog Run Department Store Dessert Shop \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"62 Dim Sum Restaurant Diner Discount Store \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"62 Doner Restaurant Dance Studio "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Cluster 5**"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Silver Hills, York Mills</td>\n",
" <td>4</td>\n",
" <td>Cafeteria</td>\n",
" <td>Women's Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Dance Studio</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Neighborhood Cluster Labels 1st Most Common Venue \\\n",
"19 Silver Hills, York Mills 4 Cafeteria \n",
"\n",
" 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue \\\n",
"19 Women's Store Dog Run Department Store \n",
"\n",
" 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue \\\n",
"19 Dessert Shop Dim Sum Restaurant Diner \n",
"\n",
" 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue \n",
"19 Discount Store Doner Restaurant Dance Studio "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Cluster 6**"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood</th>\n",
" <th>Cluster Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Scarborough Village</td>\n",
" <td>5</td>\n",
" <td>Playground</td>\n",
" <td>Jewelry Store</td>\n",
" <td>Women's Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Drugstore</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>Agincourt North, L'Amoreaux East, Milliken, St...</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>Playground</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Donut Shop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>York Mills West</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>Convenience Store</td>\n",
" <td>Bank</td>\n",
" <td>Bar</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Women's Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>Parkwoods</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>Food &amp; Drink Shop</td>\n",
" <td>Bus Stop</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Drugstore</td>\n",
" <td>Donut Shop</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Ethiopian Restaurant</td>\n",
" <td>Discount Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>East Toronto</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Convenience Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Doner Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>Lawrence Park</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>Swim School</td>\n",
" <td>Bus Line</td>\n",
" <td>Dog Run</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Deli / Bodega</td>\n",
" </tr>\n",
" <tr>\n",
" <th>47</th>\n",
" <td>Moore Park, Summerhill East</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>Playground</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Donut Shop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>Rosedale</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>Playground</td>\n",
" <td>Trail</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Drugstore</td>\n",
" <td>Donut Shop</td>\n",
" <td>Electronics Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Curling Ice</td>\n",
" </tr>\n",
" <tr>\n",
" <th>63</th>\n",
" <td>Forest Hill North, Forest Hill West</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>Jewelry Store</td>\n",
" <td>Sushi Restaurant</td>\n",
" <td>Trail</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Drugstore</td>\n",
" <td>Donut Shop</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Dance Studio</td>\n",
" </tr>\n",
" <tr>\n",
" <th>73</th>\n",
" <td>Caledonia-Fairbanks</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>Pool</td>\n",
" <td>Women's Store</td>\n",
" <td>College Rec Center</td>\n",
" <td>Deli / Bodega</td>\n",
" <td>Empanada Restaurant</td>\n",
" <td>Electronics Store</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Drugstore</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90</th>\n",
" <td>The Kingsway, Montgomery Road, Old Mill North</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>River</td>\n",
" <td>Empanada Restaurant</td>\n",
" <td>Electronics Store</td>\n",
" <td>Eastern European Restaurant</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Drugstore</td>\n",
" <td>Donut Shop</td>\n",
" <td>Ethiopian Restaurant</td>\n",
" <td>Dog Run</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>Weston</td>\n",
" <td>5</td>\n",
" <td>Park</td>\n",
" <td>Convenience Store</td>\n",
" <td>Dog Run</td>\n",
" <td>Department Store</td>\n",
" <td>Dessert Shop</td>\n",
" <td>Dim Sum Restaurant</td>\n",
" <td>Diner</td>\n",
" <td>Discount Store</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Dance Studio</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Neighborhood Cluster Labels \\\n",
"5 Scarborough Village 5 \n",
"14 Agincourt North, L'Amoreaux East, Milliken, St... 5 \n",
"22 York Mills West 5 \n",
"24 Parkwoods 5 \n",
"39 East Toronto 5 \n",
"43 Lawrence Park 5 \n",
"47 Moore Park, Summerhill East 5 \n",
"49 Rosedale 5 \n",
"63 Forest Hill North, Forest Hill West 5 \n",
"73 Caledonia-Fairbanks 5 \n",
"90 The Kingsway, Montgomery Road, Old Mill North 5 \n",
"96 Weston 5 \n",
"\n",
" 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue \\\n",
"5 Playground Jewelry Store Women's Store \n",
"14 Park Playground Doner Restaurant \n",
"22 Park Convenience Store Bank \n",
"24 Park Food & Drink Shop Bus Stop \n",
"39 Park Coffee Shop Convenience Store \n",
"43 Park Swim School Bus Line \n",
"47 Park Playground Doner Restaurant \n",
"49 Park Playground Trail \n",
"63 Park Jewelry Store Sushi Restaurant \n",
"73 Park Pool Women's Store \n",
"90 Park River Empanada Restaurant \n",
"96 Park Convenience Store Dog Run \n",
"\n",
" 4th Most Common Venue 5th Most Common Venue \\\n",
"5 Doner Restaurant Dessert Shop \n",
"14 Department Store Dessert Shop \n",
"22 Bar Doner Restaurant \n",
"24 Eastern European Restaurant Dumpling Restaurant \n",
"39 Dog Run Department Store \n",
"43 Dog Run Dessert Shop \n",
"47 Department Store Dessert Shop \n",
"49 Eastern European Restaurant Dumpling Restaurant \n",
"63 Trail Dumpling Restaurant \n",
"73 College Rec Center Deli / Bodega \n",
"90 Electronics Store Eastern European Restaurant \n",
"96 Department Store Dessert Shop \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue \\\n",
"5 Dim Sum Restaurant Diner \n",
"14 Dim Sum Restaurant Diner \n",
"22 Dim Sum Restaurant Diner \n",
"24 Drugstore Donut Shop \n",
"39 Dessert Shop Dim Sum Restaurant \n",
"43 Dim Sum Restaurant Diner \n",
"47 Dim Sum Restaurant Diner \n",
"49 Drugstore Donut Shop \n",
"63 Eastern European Restaurant Drugstore \n",
"73 Empanada Restaurant Electronics Store \n",
"90 Dumpling Restaurant Drugstore \n",
"96 Dim Sum Restaurant Diner \n",
"\n",
" 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue \n",
"5 Discount Store Dog Run Drugstore \n",
"14 Discount Store Dog Run Donut Shop \n",
"22 Discount Store Dog Run Women's Store \n",
"24 Doner Restaurant Ethiopian Restaurant Discount Store \n",
"39 Diner Discount Store Doner Restaurant \n",
"43 Discount Store Doner Restaurant Deli / Bodega \n",
"47 Discount Store Dog Run Donut Shop \n",
"49 Electronics Store Doner Restaurant Curling Ice \n",
"63 Donut Shop Doner Restaurant Dance Studio \n",
"73 Eastern European Restaurant Dumpling Restaurant Drugstore \n",
"90 Donut Shop Ethiopian Restaurant Dog Run \n",
"96 Discount Store Doner Restaurant Dance Studio "
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 10: Conclusions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Started with 10 clusters and observed the qualitative characteristics of the clustering results.\n",
"\n",
"With 10 clusters, the clustering did not seem very successful:\n",
"* no clear similarity pattern in many of the clusters\n",
"* some over-fitting was evident; a neighborhood (Postal Code) with a park was not assigned to the cluster of parks and similar \"green\" areas\n",
"\n",
"Started reducing the number of clusters and monitored the results.\n",
"\n",
"Ended up with the following:\n",
"* manual setup of the main KMeans parameters\n",
"* 6 clusters:\n",
"  1. Places close to a restaurant and various other venues of general interest\n",
"  2. Places close to a baseball field and a women's store\n",
"  3. Place closest to a history museum\n",
"  4. Place closest to a garden\n",
"  5. Place closest to a cafeteria\n",
"  6. Places closest to a park"
]
}
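,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The number of clusters above was chosen by inspecting the results by hand. A minimal sketch of how that choice could be cross-checked with the silhouette score is given in the next cell.\n",
"\n",
"Assumptions: `X` is a random stand-in for the per-neighborhood venue-frequency matrix passed to KMeans (the real variable is not shown in this section), and the range of 2 to 10 clusters is only illustrative. Higher silhouette values suggest better separated clusters, and the cluster sizes help spot single-neighborhood clusters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: compare candidate cluster counts with the silhouette score\n",
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"# Assumption: random stand-in data; replace X with the venue-frequency matrix used for clustering\n",
"rng = np.random.RandomState(0)\n",
"X = rng.rand(100, 20)\n",
"\n",
"for k in range(2, 11):\n",
"    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)\n",
"    score = silhouette_score(X, labels)\n",
"    # print the cluster count, its silhouette score and the size of each cluster\n",
"    print(k, round(score, 3), np.bincount(labels))"
]
}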
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "conda-env-python-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}