@ravy101
Created March 25, 2019 01:15
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Science Capstone Project - Week 3\n",
"First import required libraries."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting package metadata: done\n",
"Solving environment: done\n",
"\n",
"## Package Plan ##\n",
"\n",
" environment location: /home/jupyterlab/conda\n",
"\n",
" added / updated specs:\n",
" - geocoder\n",
"\n",
"\n",
"The following packages will be downloaded:\n",
"\n",
" package | build\n",
" ---------------------------|-----------------\n",
" geocoder-1.38.1 | py_0 52 KB conda-forge\n",
" orderedset-2.0 | py36_0 231 KB conda-forge\n",
" ratelim-0.1.6 | py36_0 5 KB conda-forge\n",
" ------------------------------------------------------------\n",
" Total: 288 KB\n",
"\n",
"The following NEW packages will be INSTALLED:\n",
"\n",
" geocoder conda-forge/noarch::geocoder-1.38.1-py_0\n",
" orderedset conda-forge/linux-64::orderedset-2.0-py36_0\n",
" ratelim conda-forge/linux-64::ratelim-0.1.6-py36_0\n",
"\n",
"\n",
"\n",
"Downloading and Extracting Packages\n",
"orderedset-2.0 | 231 KB | ##################################### | 100% \n",
"geocoder-1.38.1 | 52 KB | ##################################### | 100% \n",
"ratelim-0.1.6 | 5 KB | ##################################### | 100% \n",
"Preparing transaction: done\n",
"Verifying transaction: done\n",
"Executing transaction: done\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from geopy.geocoders import Nominatim # convert an address into latitude and longitude values\n",
"import requests # library to handle requests\n",
"from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe\n",
"!conda install -c conda-forge geocoder --yes\n",
"import geocoder\n",
"from sklearn.cluster import KMeans\n",
"\n",
"import folium \n",
"import matplotlib.cm as cm\n",
"import matplotlib.colors as colors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 1 - Parse and Clean the Postcode Data\n",
"Get the Wikipedia page to be scraped."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Data downloaded!\n"
]
}
],
"source": [
"!wget -q -O 'toronto_data.html' https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M\n",
"print('Data downloaded!')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Open the HTML file and read the postcodes table using pandas. Pandas has a read_html function to parse tables from HTML content into a list of DataFrames."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"with open('toronto_data.html', 'r') as myfile:\n",
" html_data = myfile.read()\n",
"dfs = pd.read_html(html_data)\n",
"df = dfs[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will remove rows whose borough is 'Not assigned' and replace 'Not assigned' neighbourhoods with the borough name. We will also join the neighbourhoods that share a postcode into a single comma-separated entry."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jupyterlab/conda/lib/python3.6/site-packages/pandas/core/indexing.py:190: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame\n",
"\n",
"See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
" self._setitem_with_indexer(indexer, value)\n",
"/home/jupyterlab/conda/lib/python3.6/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame\n",
"\n",
"See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
" \n"
]
}
],
"source": [
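"# Work on an explicit copy so the assignment below does not raise SettingWithCopyWarning\n",
"df = df[df['Borough'] != 'Not assigned'].copy()\n",
"# Fall back to the borough name where no neighbourhood is assigned\n",
"df.loc[df['Neighbourhood'] == 'Not assigned', 'Neighbourhood'] = df['Borough']\n",
"# Join the neighbourhoods that share a postcode into one comma-separated string\n",
"df_grp = df.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(', '.join).reset_index()"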
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood\n",
"0 M1B Scarborough Rouge, Malvern\n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union\n",
"2 M1E Scarborough Guildwood, Morningside, West Hill\n",
"3 M1G Scarborough Woburn\n",
"4 M1H Scarborough Cedarbrae"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_grp.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 2 - Add Longitude and Latitude"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we will define a function that takes a postcode and retries the geocoder until coordinates are returned."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def get_toronto_coords(postal_code):\n",
" lat_lng_coords = None\n",
" while(lat_lng_coords is None):\n",
" g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))\n",
" lat_lng_coords = g.latlng\n",
"\n",
" return lat_lng_coords"
]
},
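{
"cell_type": "markdown",
"metadata": {},
"source": [
"A function like the one above loops forever if the service never answers. As a hedged sketch (not part of the original lab code), the retry can be bounded. The names `lookup_with_retry`, `max_attempts` and `delay` are hypothetical, and `lookup` stands for any callable that returns `[lat, lng]` or `None`.\n",
"\n",
"```python\n",
"import time\n",
"\n",
"def lookup_with_retry(lookup, query, max_attempts=3, delay=0.0):\n",
"    # Call lookup(query) up to max_attempts times; return the first non-None result.\n",
"    for attempt in range(max_attempts):\n",
"        coords = lookup(query)\n",
"        if coords is not None:\n",
"            return coords\n",
"        if attempt < max_attempts - 1:\n",
"            time.sleep(delay)  # pause before the next attempt\n",
"    return None  # give up so the caller can fall back to another source\n",
"```\n",
"\n",
"With the geocoder package this might be called as `lookup_with_retry(lambda q: geocoder.google(q).latlng, 'M1B, Toronto, Ontario')`; a `None` result signals that a fallback such as a CSV of coordinates is needed."
]
},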
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"#get_toronto_coords('M1B')\n",
"#get_toronto_coords('M1H')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The geocoder is not responding at all, so we will use the CSV of coordinates instead, starting by downloading the file."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Coords downloaded!\n"
]
}
],
"source": [
"!wget -q -O 'toronto_coords.csv' https://cocl.us/Geospatial_data\n",
"print('Coords downloaded!')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will join our grouped dataframe with the corresponding coordinates."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" <td>43.784535</td>\n",
" <td>-79.160497</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" <td>43.770992</td>\n",
" <td>-79.216917</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" <td>43.773136</td>\n",
" <td>-79.239476</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood Latitude \\\n",
"0 M1B Scarborough Rouge, Malvern 43.806686 \n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union 43.784535 \n",
"2 M1E Scarborough Guildwood, Morningside, West Hill 43.763573 \n",
"3 M1G Scarborough Woburn 43.770992 \n",
"4 M1H Scarborough Cedarbrae 43.773136 \n",
"\n",
" Longitude \n",
"0 -79.194353 \n",
"1 -79.160497 \n",
"2 -79.188711 \n",
"3 -79.216917 \n",
"4 -79.239476 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coord_df = pd.read_csv('toronto_coords.csv')\n",
"df_full = df_grp.join(coord_df.set_index('Postal Code'), on='Postcode', how='left')\n",
"df_full.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 3 - Analysis\n",
"Now we will make Foursquare API calls to gather venue information for our neighbourhoods and apply cluster analysis. We will modify and use some code from the previous labs."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"CLIENT_ID = 'CDT0MXT105OIT4ZYRPA4L1S4LLKISDCMVOTUO0WKZEYPLTNW' # your Foursquare ID\n",
"CLIENT_SECRET = 'WMP22XO4QNE4R5BFJVRBBTOJXHJ0QB4JZSHGE5DX4CQUGXZE' # your Foursquare Secret\n",
"VERSION = '20180605' # Foursquare API version\n",
"LIMIT = 500\n",
"radius = 500\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We reuse, with minor changes, the lab functions that process Foursquare API responses. This time the coordinates are for each postcode rather than each neighbourhood."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def get_category_type(row):\n",
" try:\n",
" categories_list = row['categories']\n",
" except:\n",
" categories_list = row['venue.categories']\n",
" \n",
" if len(categories_list) == 0:\n",
" return None\n",
" else:\n",
" return categories_list[0]['name']\n",
"\n",
"def getNearbyVenues(names, latitudes, longitudes, LIMIT = 500, radius=750):\n",
" \n",
" venues_list=[]\n",
" for name, lat, lng in zip(names, latitudes, longitudes):\n",
" \n",
" # create the API request URL\n",
" url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(\n",
" CLIENT_ID, \n",
" CLIENT_SECRET, \n",
" VERSION, \n",
" lat, \n",
" lng, \n",
" radius, \n",
" LIMIT)\n",
" \n",
" # make the GET request\n",
" results = requests.get(url).json()[\"response\"]['groups'][0]['items']\n",
" \n",
" # return only relevant information for each nearby venue\n",
" venues_list.append([(\n",
" name, \n",
" lat, \n",
" lng, \n",
" v['venue']['name'], \n",
" v['venue']['location']['lat'], \n",
" v['venue']['location']['lng'], \n",
" v['venue']['categories'][0]['name']) for v in results])\n",
"\n",
" nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])\n",
" nearby_venues.columns = ['Postcode', \n",
" 'Postcode Latitude', \n",
" 'Postcode Longitude', \n",
" 'Venue', \n",
" 'Venue Latitude', \n",
" 'Venue Longitude', \n",
" 'Venue Category']\n",
" \n",
" return(nearby_venues)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Invoke the Foursquare API with our Toronto postcodes and coordinates."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Postcode Latitude</th>\n",
" <th>Postcode Longitude</th>\n",
" <th>Venue</th>\n",
" <th>Venue Latitude</th>\n",
" <th>Venue Longitude</th>\n",
" <th>Venue Category</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" <td>Images Salon &amp; Spa</td>\n",
" <td>43.802283</td>\n",
" <td>-79.198565</td>\n",
" <td>Spa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1B</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" <td>Wendy's</td>\n",
" <td>43.807448</td>\n",
" <td>-79.199056</td>\n",
" <td>Fast Food Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1B</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" <td>Wendy's</td>\n",
" <td>43.802008</td>\n",
" <td>-79.198080</td>\n",
" <td>Fast Food Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1B</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" <td>Staples Morningside</td>\n",
" <td>43.800285</td>\n",
" <td>-79.196607</td>\n",
" <td>Paper / Office Supplies Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1B</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" <td>Tim Hortons</td>\n",
" <td>43.802000</td>\n",
" <td>-79.198169</td>\n",
" <td>Coffee Shop</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Postcode Latitude Postcode Longitude Venue \\\n",
"0 M1B 43.806686 -79.194353 Images Salon & Spa \n",
"1 M1B 43.806686 -79.194353 Wendy's \n",
"2 M1B 43.806686 -79.194353 Wendy's \n",
"3 M1B 43.806686 -79.194353 Staples Morningside \n",
"4 M1B 43.806686 -79.194353 Tim Hortons \n",
"\n",
" Venue Latitude Venue Longitude Venue Category \n",
"0 43.802283 -79.198565 Spa \n",
"1 43.807448 -79.199056 Fast Food Restaurant \n",
"2 43.802008 -79.198080 Fast Food Restaurant \n",
"3 43.800285 -79.196607 Paper / Office Supplies Store \n",
"4 43.802000 -79.198169 Coffee Shop "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_venues = getNearbyVenues(names=df_full['Postcode'], latitudes=df_full['Latitude'], longitudes=df_full['Longitude'])\n",
"\n",
"toronto_venues.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will one-hot encode our venue data and re-add the postcode column for grouping."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Accessories Store</th>\n",
" <th>Adult Boutique</th>\n",
" <th>Afghan Restaurant</th>\n",
" <th>African Restaurant</th>\n",
" <th>Airport</th>\n",
" <th>Airport Food Court</th>\n",
" <th>Airport Gate</th>\n",
" <th>Airport Lounge</th>\n",
" <th>Airport Service</th>\n",
" <th>Airport Terminal</th>\n",
" <th>...</th>\n",
" <th>Video Game Store</th>\n",
" <th>Video Store</th>\n",
" <th>Vietnamese Restaurant</th>\n",
" <th>Warehouse Store</th>\n",
" <th>Wine Bar</th>\n",
" <th>Wine Shop</th>\n",
" <th>Wings Joint</th>\n",
" <th>Women's Store</th>\n",
" <th>Yoga Studio</th>\n",
" <th>Postcode</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>M1B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>M1B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>M1B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>M1B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>M1B</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 322 columns</p>\n",
"</div>"
],
"text/plain": [
" Accessories Store Adult Boutique Afghan Restaurant African Restaurant \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Airport Airport Food Court Airport Gate Airport Lounge Airport Service \\\n",
"0 0 0 0 0 0 \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"\n",
" Airport Terminal ... Video Game Store Video Store \\\n",
"0 0 ... 0 0 \n",
"1 0 ... 0 0 \n",
"2 0 ... 0 0 \n",
"3 0 ... 0 0 \n",
"4 0 ... 0 0 \n",
"\n",
" Vietnamese Restaurant Warehouse Store Wine Bar Wine Shop Wings Joint \\\n",
"0 0 0 0 0 0 \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"\n",
" Women's Store Yoga Studio Postcode \n",
"0 0 0 M1B \n",
"1 0 0 M1B \n",
"2 0 0 M1B \n",
"3 0 0 M1B \n",
"4 0 0 M1B \n",
"\n",
"[5 rows x 322 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix=\"\", prefix_sep=\"\")\n",
"toronto_onehot['Postcode'] = toronto_venues['Postcode'] \n",
"\n",
"toronto_onehot.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will group our venue information in two ways for cluster analysis:\n",
"* **sum** - Each venue-type column records the total number of venues of that type near the postcode. This shows what is available nearby and takes the number of venues of each type into account.\n",
"* **mean** - Each venue-type column records the proportion of venues near the postcode that are of that type. This de-emphasizes the total number of venues, so that the clustering does not simply reflect the overall development or density of the area.\n"
]
},
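{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of the difference between the two groupings, here is a sketch on made-up data (the toy postcodes `A` and `B` and the names `counts` and `shares` are hypothetical). Postcode `A` has three venues and `B` has one, so their counts differ in scale while their shares both sum to 1.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy venue table: postcode A has three venues, postcode B has one.\n",
"venues = pd.DataFrame({\n",
"    'Postcode': ['A', 'A', 'A', 'B'],\n",
"    'Venue Category': ['Cafe', 'Cafe', 'Park', 'Cafe'],\n",
"})\n",
"\n",
"# One-hot encode the category and re-attach the postcode for grouping.\n",
"onehot = pd.get_dummies(venues['Venue Category']).astype(int)\n",
"onehot['Postcode'] = venues['Postcode']\n",
"\n",
"counts = onehot.groupby('Postcode').sum()   # number of venues of each type\n",
"shares = onehot.groupby('Postcode').mean()  # fraction of venues of each type\n",
"```\n",
"\n",
"Here `counts` separates `A` from `B` mainly by how many venues each has, whereas `shares` compares only the mix of venue types."
]
},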
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Accessories Store</th>\n",
" <th>Adult Boutique</th>\n",
" <th>Afghan Restaurant</th>\n",
" <th>African Restaurant</th>\n",
" <th>Airport</th>\n",
" <th>Airport Food Court</th>\n",
" <th>Airport Gate</th>\n",
" <th>Airport Lounge</th>\n",
" <th>Airport Service</th>\n",
" <th>...</th>\n",
" <th>Vegetarian / Vegan Restaurant</th>\n",
" <th>Video Game Store</th>\n",
" <th>Video Store</th>\n",
" <th>Vietnamese Restaurant</th>\n",
" <th>Warehouse Store</th>\n",
" <th>Wine Bar</th>\n",
" <th>Wine Shop</th>\n",
" <th>Wings Joint</th>\n",
" <th>Women's Store</th>\n",
" <th>Yoga Studio</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 322 columns</p>\n",
"</div>"
],
"text/plain": [
" Postcode Accessories Store Adult Boutique Afghan Restaurant \\\n",
"0 M1B 0 0 0 \n",
"1 M1C 0 0 0 \n",
"2 M1E 0 0 0 \n",
"3 M1G 0 0 0 \n",
"4 M1H 0 0 0 \n",
"\n",
" African Restaurant Airport Airport Food Court Airport Gate \\\n",
"0 1 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Airport Lounge Airport Service ... Vegetarian / Vegan Restaurant \\\n",
"0 0 0 ... 0 \n",
"1 0 0 ... 0 \n",
"2 0 0 ... 0 \n",
"3 0 0 ... 0 \n",
"4 0 0 ... 0 \n",
"\n",
" Video Game Store Video Store Vietnamese Restaurant Warehouse Store \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Wine Bar Wine Shop Wings Joint Women's Store Yoga Studio \n",
"0 0 0 0 0 0 \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 1 \n",
"\n",
"[5 rows x 322 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto_grouped_sum = toronto_onehot.groupby('Postcode').sum().reset_index()\n",
"toronto_grouped_mean = toronto_onehot.groupby('Postcode').mean().reset_index()\n",
"toronto_grouped_sum.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will apply our clustering to both sets of grouped data."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1,\n",
" 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4, 0, 4,\n",
" 4, 4, 1, 4, 1, 4, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 4, 4, 4, 1, 1,\n",
" 0, 4, 2, 2, 4, 4, 4, 4, 4, 4, 0, 1, 4, 4, 4, 0, 0, 1, 1, 4, 4, 4,\n",
" 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4], dtype=int32)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kclusters = 5\n",
"post_temp = toronto_grouped_sum['Postcode']\n",
"toronto_grouped_clustering_sum = toronto_grouped_sum.drop('Postcode', axis=1)\n",
"toronto_grouped_clustering_mean = toronto_grouped_mean.drop('Postcode', axis=1)\n",
"# run k-means clustering\n",
"kmeans_sum = KMeans(n_clusters=kclusters, random_state=10).fit(toronto_grouped_clustering_sum)\n",
"kmeans_mean = KMeans(n_clusters=kclusters, random_state=10).fit(toronto_grouped_clustering_mean)\n",
"# check cluster labels generated for each row in the dataframe\n",
"kmeans_sum.labels_[0:100] "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will record the clustering labels and store them in a new data frame with the postcode data."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Cluster Sum</th>\n",
" <th>Cluster Mean</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Postcode</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>M1B</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>M1C</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>M1E</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>M1G</th>\n",
" <td>4</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>M1H</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Cluster Sum Cluster Mean\n",
"Postcode \n",
"M1B 4 1\n",
"M1C 4 1\n",
"M1E 4 0\n",
"M1G 4 3\n",
"M1H 4 1"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_clust = pd.DataFrame({'Cluster Sum': kmeans_sum.labels_, 'Cluster Mean': kmeans_mean.labels_})\n",
"df_clust.index = post_temp\n",
"df_clust.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the cluster results are joined back to the consolidated full data frame."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" <th>Cluster Sum</th>\n",
" <th>Cluster Mean</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge, Malvern</td>\n",
" <td>43.806686</td>\n",
" <td>-79.194353</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek, Rouge Hill, Port Union</td>\n",
" <td>43.784535</td>\n",
" <td>-79.160497</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood, Morningside, West Hill</td>\n",
" <td>43.763573</td>\n",
" <td>-79.188711</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" <td>43.770992</td>\n",
" <td>-79.216917</td>\n",
" <td>4</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" <td>43.773136</td>\n",
" <td>-79.239476</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood Latitude \\\n",
"0 M1B Scarborough Rouge, Malvern 43.806686 \n",
"1 M1C Scarborough Highland Creek, Rouge Hill, Port Union 43.784535 \n",
"2 M1E Scarborough Guildwood, Morningside, West Hill 43.763573 \n",
"3 M1G Scarborough Woburn 43.770992 \n",
"4 M1H Scarborough Cedarbrae 43.773136 \n",
"\n",
" Longitude Cluster Sum Cluster Mean \n",
"0 -79.194353 4 1 \n",
"1 -79.160497 4 1 \n",
"2 -79.188711 4 0 \n",
"3 -79.216917 4 3 \n",
"4 -79.239476 4 1 "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_full_clust = df_full.join(df_clust, on='Postcode', how='inner')\n",
"df_full_clust.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will first use the sum-based clustering to colour our postcodes on a folium map."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"width:100%;\"><div style=\"position:relative;width:100%;height:0;padding-bottom:60%;\"><iframe src=\"data:text/html;charset=utf-8;base64,\" style=\"position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;\" allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe></div></div>"
],
"text/plain": [
"<folium.folium.Map at 0x7f83fe7e8390>"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# set color scheme for the clusters\n",
"x = np.arange(kclusters)\n",
"#ys = [i + x + (i*x)**2 for i in range(kclusters)]\n",
"#colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))\n",
"#rainbow = [colors.rgb2hex(i) for i in colors_array]\n",
"\n",
"rainbow = ['red','blue','yellow','green','orange']\n",
"# create map\n",
"map_clusters = folium.Map(location=[43.7,-79.4], zoom_start=10.5)\n",
"\n",
"# add markers to the map\n",
"markers_colors = []\n",
"for lat, lon, poi, cluster in zip(df_full_clust['Latitude'], df_full_clust['Longitude'], df_full_clust['Neighbourhood'], df_full_clust['Cluster Sum']):\n",
" label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)\n",
" folium.CircleMarker(\n",
" [lat, lon],\n",
" radius=5,\n",
" popup=label,\n",
" color=rainbow[cluster],\n",
" fill=True,\n",
" fill_color=rainbow[cluster],\n",
" fill_opacity=0.7).add_to(map_clusters)\n",
" \n",
"map_clusters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sum-based clustering seems to be significantly influenced by proximity to the city center but does provide some information. Let's examine the areas of different clusters to see what we might find in each cluster. We start by grouping our postcodes by cluster membership and taking the average number of each venue type."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---- Average Area in red Cluster Contains ----\n",
"Bar: 6.0\n",
"Café: 4.8\n",
"Coffee Shop: 3.8\n",
"Bakery: 3.6\n",
"Italian Restaurant: 2.6\n",
"---- Average Area in blue Cluster Contains ----\n",
"Coffee Shop: 7.769230769230769\n",
"Café: 4.230769230769231\n",
"Restaurant: 2.769230769230769\n",
"Italian Restaurant: 2.6923076923076925\n",
"Pizza Place: 2.6923076923076925\n",
"---- Average Area in yellow Cluster Contains ----\n",
"Coffee Shop: 9.125\n",
"Hotel: 6.25\n",
"Café: 6.0\n",
"Restaurant: 3.625\n",
"Gastropub: 2.375\n",
"---- Average Area in green Cluster Contains ----\n",
"Greek Restaurant: 12.0\n",
"Coffee Shop: 7.0\n",
"Pub: 5.0\n",
"Café: 4.0\n",
"Fast Food Restaurant: 3.0\n",
"---- Average Area in orange Cluster Contains ----\n",
"Coffee Shop: 1.1756756756756757\n",
"Pizza Place: 0.7567567567567568\n",
"Park: 0.6216216216216216\n",
"Fast Food Restaurant: 0.6081081081081081\n",
"Sandwich Place: 0.5135135135135135\n"
]
}
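],
"source": [
"# attach each postcode's cluster label, then average the venue counts within each cluster\n",
"toronto_grouped_clustering_sum['Cluster'] = kmeans_sum.labels_\n",
"sum_cluster_means = toronto_grouped_clustering_sum.groupby('Cluster').mean()\n",
"# name the clusters by colour: label 0 -> 'red', 1 -> 'blue', and so on\n",
"sum_cluster_means.index = rainbow\n",
"num_top_venues = 5\n",
"\n",
"# print the venue types with the highest average count in each cluster\n",
"for c in sum_cluster_means.index:\n",
"    print(\"---- Average Area in \"+c+\" Cluster Contains ----\")\n",
"    temp = sum_cluster_means[sum_cluster_means.index==c]\n",
"    for _ in range(num_top_venues):\n",
"        ven_type = temp.idxmax(axis=1).iloc[0]  # column with the largest remaining mean\n",
"        print('{}: {}'.format(ven_type, temp.loc[c, ven_type]))\n",
"        temp = temp.drop(columns=ven_type)  # drop it so the next-largest is found\n"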
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's visualise the mean-based clustering."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"width:100%;\"><div style=\"position:relative;width:100%;height:0;padding-bottom:60%;\"><iframe src=\"data:text/html;charset=utf-8;base64,\" style=\"position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;\" allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe></div></div>"
],
"text/plain": [
"<folium.folium.Map at 0x7f842c1797f0>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create a second map centred on Toronto for the mean-based clustering\n",
"map_clusters_mean = folium.Map(location=[43.7,-79.4], zoom_start=10.5)\n",
"\n",
"# add one marker per postcode, coloured by its mean-based cluster\n",
"# (rainbow is the colour list defined for the sum-based map)\n",
"for lat, lon, poi, cluster in zip(df_full_clust['Latitude'], df_full_clust['Longitude'], df_full_clust['Neighbourhood'], df_full_clust['Cluster Mean']):\n",
"    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)\n",
"    folium.CircleMarker(\n",
"        [lat, lon],\n",
"        radius=5,\n",
"        popup=label,\n",
"        color=rainbow[cluster-1],\n",
"        fill=True,\n",
"        fill_color=rainbow[cluster-1],\n",
"        fill_opacity=0.7).add_to(map_clusters_mean)\n",
"\n",
"map_clusters_mean"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mean-based clustering is, as expected, less sensitive to the highly developed city center and provides some interesting distinctions between areas. Let's examine the cluster centers to see which venues underlie this clustering."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---- Average Area in red Cluster Contains ----\n",
"Pizza Place: 0.16406371406371406\n",
"Grocery Store: 0.12463092463092464\n",
"Coffee Shop: 0.0704998704998705\n",
"Pharmacy: 0.06225071225071225\n",
"Fast Food Restaurant: 0.049119399119399124\n",
"---- Average Area in blue Cluster Contains ----\n",
"Coffee Shop: 0.07749833920188944\n",
"Pizza Place: 0.037358320729786176\n",
"Café: 0.033658561226974595\n",
"Sandwich Place: 0.03005853238822756\n",
"Fast Food Restaurant: 0.028875273299247304\n",
"---- Average Area in yellow Cluster Contains ----\n",
"Bakery: 0.5\n",
"Empanada Restaurant: 0.25\n",
"Pizza Place: 0.25\n",
"Accessories Store: 0.0\n",
"Adult Boutique: 0.0\n",
"---- Average Area in green Cluster Contains ----\n",
"Park: 0.20305555555555554\n",
"Playground: 0.10527777777777778\n",
"Bank: 0.061388888888888896\n",
"Trail: 0.05\n",
"Business Service: 0.041666666666666664\n",
"---- Average Area in orange Cluster Contains ----\n",
"Home Service: 0.35\n",
"Bakery: 0.125\n",
"Construction & Landscaping: 0.125\n",
"Baseball Field: 0.1\n",
"Market: 0.1\n"
]
}
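],
"source": [
"# attach each postcode's cluster label, then average the venue values within each cluster\n",
"# (the printed values look like venue-type frequencies rather than raw counts)\n",
"toronto_grouped_clustering_mean['Cluster'] = kmeans_mean.labels_\n",
"mean_cluster_means = toronto_grouped_clustering_mean.groupby('Cluster').mean()\n",
"# name the clusters by colour: label 0 -> 'red', 1 -> 'blue', and so on\n",
"mean_cluster_means.index = rainbow\n",
"num_top_venues = 5\n",
"\n",
"# print the venue types with the highest average value in each cluster\n",
"for c in mean_cluster_means.index:\n",
"    print(\"---- Average Area in \"+c+\" Cluster Contains ----\")\n",
"    temp = mean_cluster_means[mean_cluster_means.index==c]\n",
"    for _ in range(num_top_venues):\n",
"        ven_type = temp.idxmax(axis=1).iloc[0]  # column with the largest remaining mean\n",
"        print('{}: {}'.format(ven_type, temp.loc[c, ven_type]))\n",
"        temp = temp.drop(columns=ven_type)  # drop it so the next-largest is found\n"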
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}