@97k
Created July 12, 2018 11:32
Sample code that runs the KMeans algorithm and plots the result using matplotlib. The dataset is generated with make_blobs (sklearn.datasets.make_blobs).
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
# make_blobs returns a tuple (X, y): data[0] is the feature array of shape (200, 2),
# and data[1] holds the integer label of the centre each sample was generated around.
data = make_blobs(n_samples = 200, n_features=2, centers=3, cluster_std=1.5)
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans.fit(data[0])
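# kmeans.labels_ holds the cluster index (0, 1 or 2) that KMeans assigned to each sample.
# Since we know the true labels here (data[1]), we can check how well KMeans recovers the clusters.
# Cluster indices are arbitrary, so the same group of points may get a different colour in each plot.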
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(12, 8))
ax1.set_title('Kmeans clusters')
ax1.scatter(data[0][:, 0], data[0][:, 1], c=kmeans.labels_, cmap='rainbow')
ax2.set_title('Actual dataset using labels from make_blobs')
ax2.scatter(data[0][:, 0], data[0][:, 1], c=data[1], cmap='rainbow')
plt.show()  # needed to display the figure when run as a plain script
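The two plots compare the clusterings by eye only. As a rough numeric check, the sketch below scores how well a set of cluster labels agrees with the true labels while ignoring the arbitrary numbering of the clusters. The helper name `best_match_accuracy` is hypothetical and not part of the original gist or of scikit-learn; it brute-forces every relabelling, so it is only sensible for a handful of clusters.

```python
from itertools import permutations


def best_match_accuracy(true_labels, cluster_labels):
    """Fraction of samples that agree under the best relabelling of clusters.

    Cluster indices are arbitrary, so every one-to-one mapping from cluster
    index to true label is tried and the highest agreement is kept.
    """
    true_labels = [int(t) for t in true_labels]
    cluster_labels = [int(c) for c in cluster_labels]
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")

    true_ids = sorted(set(true_labels))
    cluster_ids = sorted(set(cluster_labels))
    if len(cluster_ids) > len(true_ids):
        raise ValueError("more clusters than true classes; no one-to-one mapping")

    best_hits = 0
    # Try each assignment of distinct true labels to the cluster indices.
    for perm in permutations(true_ids, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_labels, true_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Toy check: same grouping, different numbering.
truth = [0, 0, 1, 1, 2, 2]
found = [2, 2, 0, 0, 1, 1]
print(best_match_accuracy(truth, found))
```

With the script above, this could be called as `best_match_accuracy(data[1], kmeans.labels_)`. scikit-learn also provides `sklearn.metrics.adjusted_rand_score`, which is permutation-invariant and is likely the better choice beyond a toy example.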