@calippo
Last active November 11, 2019 13:21
[scikit-learn/sklearn, pandas] Plot percent of variance explained for KMeans (Elbow Method)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
from sklearn.cluster import KMeans
import numpy as np
from scipy.spatial.distance import cdist, pdist
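def elbow(df, n):
    """Plot the between-cluster sum of squares (BSS) for k = 1 .. n-1."""
    ks = range(1, n)
    # Fit one KMeans model per candidate number of clusters.
    kMeansVar = [KMeans(n_clusters=k).fit(df.values) for k in ks]
    centroids = [X.cluster_centers_ for X in kMeansVar]
    # Distance from every point to every centroid, then to its nearest centroid.
    k_euclid = [cdist(df.values, cent) for cent in centroids]
    dist = [np.min(ke, axis=1) for ke in k_euclid]
    # Within-cluster sum of squares, as an array so the subtraction below is elementwise.
    wcss = np.array([np.sum(d**2) for d in dist])
    # Total sum of squares, computed from all pairwise distances.
    tss = np.sum(pdist(df.values)**2) / df.values.shape[0]
    bss = tss - wcss
    # Plot against k itself rather than the 0-based list index.
    plt.plot(ks, bss)
    plt.show()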
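The title mentions percent of variance explained, while the function above plots the raw between-cluster sum of squares. A minimal sketch of the percentage form follows, assuming `KMeans.inertia_` as the within-cluster sum of squares. The helper name `explained_variance_percent`, the `random_state` and `n_init` values, and the synthetic data are illustrative assumptions, not part of the original gist.

```python
import numpy as np
from sklearn.cluster import KMeans


def explained_variance_percent(X, max_k, random_state=0):
    """Return 100 * BSS / TSS for k = 1 .. max_k (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Total sum of squares: squared distances of all points to the global mean.
    tss = np.sum((X - X.mean(axis=0)) ** 2)
    # inertia_ is the within-cluster sum of squares of each fitted model.
    wcss = np.array([
        KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
        for k in range(1, max_k + 1)
    ])
    return 100.0 * (tss - wcss) / tss


# Synthetic data (an assumption for illustration): three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in centers])
percent = explained_variance_percent(X, max_k=6)
```

The returned array can be passed to `plt.plot(range(1, 7), percent)`. For data like this the curve should flatten after k = 3, which is the elbow.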
@cgrinaldi

cgrinaldi commented May 16, 2016

I think you are missing `from scipy.spatial.distance import cdist`. And thanks for posting!

@cheniel

cheniel commented May 17, 2016

Missing pdist as well:
from scipy.spatial.distance import cdist, pdist

@cheniel

cheniel commented May 18, 2016

@calippo
Author

calippo commented May 3, 2017

Thanks! I've included both imports.

@divyamounika

The `score` method of `KMeans` (from `sklearn.cluster`) gives the same graph pattern.
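A rough sketch of why that holds: `KMeans.score(X)` returns the negative of the k-means objective on `X`, so on the training data it should equal `-inertia_` up to a small numerical tolerance. The curve of scores is then the BSS curve shifted down by the constant TSS. The random data and parameter values below are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative random data; any numeric 2-D array would do.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

ks = range(1, 7)
models = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X) for k in ks]

# score(X) is the negated k-means objective evaluated on X.
scores = np.array([m.score(X) for m in models])
# inertia_ is the within-cluster sum of squares found during fitting.
wcss = np.array([m.inertia_ for m in models])

# Between-cluster sum of squares, as in the gist: BSS = TSS - WCSS.
tss = np.sum((X - X.mean(axis=0)) ** 2)
bss = tss - wcss
```

Plotting `scores` against `ks` therefore shows the same elbow as plotting `bss`; only the vertical offset differs.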

@ericbf

ericbf commented Apr 24, 2018

Did you purposely spell this eblow instead of elbow?

@calippo
Author

calippo commented Jul 19, 2018

@ericbf nope, didn't notice :). Thanks
