@digitalWestie
Last active October 30, 2019 12:00
Basic clustering for UrbanTide analytics
import pandas as pd
import io
import requests
from sklearn.cluster import KMeans
# Download the dataset (the CSV is encoded as windows-1252)
url = "https://gist.githubusercontent.com/digitalWestie/b68b86cae1d893d4d3d3b01aca59be8d/raw/28908e0d394802181762dc7429f67c0f79fb9fad/Make%2520Model%2520Data%25202016-edited.csv"
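response = requests.get(url)
response.raise_for_status()  # fail early if the download did not succeed
dataset = pd.read_csv(io.StringIO(response.content.decode('windows-1252')))
# Preview the first three rows
dataset.head(3)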
# Only include specified columns:
subset = dataset.loc[:, ['Label', 'Engine, Noise and Exhaust %', 'Chassis and Body %']]
subset
# Discard the label column (we only want to feed numeric values to the algorithm)
subset_data = subset.iloc[:, 1:]
# Run clustering (4 clusters); random_state is fixed so repeated runs give the same groups
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(subset_data)
y_kmeans = kmeans.predict(subset_data)
y_kmeans
# Plot the clusters, with the cluster centers overlaid in black
from matplotlib import pyplot as plt
plt.scatter(subset_data.iloc[:,0], subset_data.iloc[:,1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);
# Cross-tabulate labels against cluster assignments (rows: labels, columns: cluster ids)
result = pd.crosstab(subset.iloc[:,0], y_kmeans)
result
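The final cross-tabulation step may be easier to follow on a tiny example. The sketch below is not part of the original analysis: the `toy` frame, its `x`/`y` columns and its values are made up purely for illustration, and two clusters are used instead of four. It shows how `pd.crosstab` counts, for each label, how many rows fell into each cluster.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of points.
# Labels A and C each sit entirely in one group; label B is split across both.
toy = pd.DataFrame({
    "Label": ["A", "A", "B", "B", "C", "C"],
    "x": [1.0, 1.2, 1.1, 9.0, 9.2, 9.1],
    "y": [2.0, 2.1, 1.9, 8.0, 8.2, 8.1],
})

# Cluster on the numeric columns only, as in the code above
toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy[["x", "y"]])

# Rows are labels, columns are cluster ids, cells are row counts
toy_result = pd.crosstab(toy["Label"], toy_kmeans.labels_)
print(toy_result)
```

Cluster ids are arbitrary, so which column corresponds to which group can differ between runs or library versions; only the counts per label are meaningful.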