@seenimohamed
Last active April 11, 2022 00:49

Explainable cluster

Ever wondered whether there is a way to explain clustered data?

Machine learning practitioners know SHAP as a go-to tool for explaining ML models. Have you ever thought about how SHAP could be used to explain clustering results?

Here is one way to do it:

After normalising the data, run the K-Means algorithm. To find the optimal number of clusters, you can use the elbow method. The clusters are now formed.
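
The clustering step can be sketched as follows. This is a minimal illustration, not the original author's code: the toy data from `make_blobs`, the range of k values, and the final choice of `chosen_k = 4` are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 300 points with 4 features (stand-in for a real dataset).
X_raw, _ = make_blobs(n_samples=300, n_features=4, centers=4, random_state=0)

# Normalise so that every feature contributes on the same scale.
X = StandardScaler().fit_transform(X_raw)

# Elbow method: record the inertia (within-cluster sum of squares) for each k.
ks = list(range(1, 9))
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")

# The "elbow" is the k after which inertia stops dropping sharply.
# It is usually picked by eye from a plot; 4 is assumed here.
chosen_k = 4
kmeans = KMeans(n_clusters=chosen_k, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_  # one cluster id per data point
```

Plotting `inertias` against `ks` (for example with matplotlib) makes the elbow easier to spot than reading the printed numbers.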

Now comes the explanation part. We can take the cluster ID as the label for each data point (e.g., if the dataset has 4 features, it now has one more column, the cluster ID, which serves as the label).
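
As a small sketch of what that labelled dataset looks like, assuming toy data with 4 features and 4 clusters (both hypothetical choices for this example):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with 4 features.
X_raw, _ = make_blobs(n_samples=200, n_features=4, centers=4, random_state=1)
X = StandardScaler().fit_transform(X_raw)

# The cluster id assigned by K-Means becomes the label y.
y = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)

# 4 original feature columns plus 1 label column.
labelled = np.column_stack([X, y])
print(X.shape, y.shape, labelled.shape)  # (200, 4) (200,) (200, 5)
print("points per cluster:", np.bincount(y))
```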

Now train a RandomForestClassifier on this new dataset, with the original features as X and the cluster ID as y. Because the classifier learns to reproduce the cluster assignments, explaining it also explains the clusters. Feed the trained classifier to SHAP and draw a SHAP summary plot. We can now see which features drove the formation of a particular cluster.

#clustering #machinelearning #ml

Credits: https://towardsdatascience.com/how-to-make-clustering-explainable-1582390476cc
