@sicongzhao
Created July 2, 2020 21:57
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
# Generate data
X, _ = make_blobs(n_samples=300, centers=5,
                  cluster_std=2, random_state=0)
# Fit K-means with different choice of K,
# and save the corresponding S
S_values = []
for i in range(1, 11):
    clust = KMeans(n_clusters=i).fit(X)
    # clust.inertia_ is the sum of squared distances from each sample
    # to its closest cluster center (the within-cluster sum of squares)
    S_values.append(clust.inertia_)
# Visualize the relationship between S and K
plt.figure(figsize=(10,6))
plt.plot(range(1, 11), S_values, color='#8B81C4', linewidth=2)
plt.xlabel("K")
plt.ylabel("Sum of Within Cluster Variance (S)")
plt.show()
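
The script above relies on `inertia_` as the value of S. As a rough sanity check, the sketch below recomputes that quantity by hand as the sum of squared distances from each sample to its assigned centroid and compares it with what scikit-learn reports. The choice of `n_clusters=5`, `n_init=10`, and `random_state=0`, as well as the names `model` and `manual_inertia`, are assumptions made for illustration and are not part of the original gist.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as in the script above
X, _ = make_blobs(n_samples=300, centers=5,
                  cluster_std=2, random_state=0)

# Fit a single model; K=5 and the seed are illustrative choices
model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Centroid assigned to each sample, shape (n_samples, n_features)
assigned_centers = model.cluster_centers_[model.labels_]

# Sum of squared Euclidean distances to the assigned centroids
manual_inertia = np.sum((X - assigned_centers) ** 2)

# The two values should agree up to floating-point rounding
print(manual_inertia, model.inertia_)
```

Because this is a sum of squared distances rather than a variance normalized by cluster size, it can only stay the same or shrink as K grows, which is why the plot is read for an "elbow" rather than a minimum.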