Last active
September 25, 2018 18:25
H2O K-Means Auto-estimate K (wine data demo)
# H2O's K-Means algo can estimate the optimal number of clusters (method by Leland Wilkinson)
# http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/k-means.html#estimating-k-in-k-means
#
# This demo is an extension of Kasia's blog post here:
# https://kkulma.github.io/2017-04-24-determining-optimal-number-of-clusters-in-your-data/
library(h2o)    # H2O R package (h2o.init, as.h2o, h2o.kmeans)
library(rattle) # wine data
h2o.init() # Start a local H2O cluster (must be running before as.h2o() is called)
# Remove the factor col & convert to an H2O Frame
# Note: You can skip the scale() here since H2O K-Means standardizes automatically
data(wine)
wine <- as.h2o(wine[,-1])
# Train an H2O K-Means model and auto-estimate the best value for k
# (note: with estimate_k = TRUE, k is the maximum number of clusters to consider)
fit <- h2o.kmeans(training_frame = wine, k = 20, estimate_k = TRUE)
# It found 3 clusters to be optimal
print(fit)
# Model Details:
# ==============
#
# H2OClusteringModel: kmeans
# Model ID: KMeans_model_R_1503103871072_10
# Model Summary:
#   number_of_rows number_of_clusters number_of_categorical_columns
# 1            178                  3                             0
#   number_of_iterations within_cluster_sum_of_squares
# 1                   19                    1270.74912
#   total_sum_of_squares between_cluster_sum_of_squares
# 1           2301.00000                     1030.25088
#
#
# H2OClusteringMetrics: kmeans
# ** Reported on training data. **
#
#
# Total Within SS: 1270.749
# Between SS: 1030.251
# Total SS: 2301
# Centroid Statistics:
#   centroid     size within_cluster_sum_of_squares
# 1        1 51.00000                     326.35370
# 2        2 65.00000                     558.69710
# 3        3 62.00000                     385.69830
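
The printed totals can be sanity-checked without H2O. The sketch below is base R only and uses random stand-in data (an assumption, not the wine data) with the same shape as the frame above: 178 rows and 13 numeric columns. It illustrates why a Total SS of 2301 is consistent with the note that K-Means standardizes the columns: after standardizing to unit sample variance, each column contributes n - 1 to the total sum of squares, so the total is (178 - 1) * 13 = 2301. It also checks that the reported within and between sums of squares add up to the reported total. That H2O uses the sample (n - 1) standard deviation is inferred here from the reported total, not taken from the documentation.

```r
# Stand-in data (hypothetical): same shape as the wine frame after dropping
# the factor column, but random values rather than the real measurements.
set.seed(42)
n <- 178
p <- 13
x <- matrix(rnorm(n * p, mean = 5, sd = 3), nrow = n, ncol = p)

# Standardize each column to mean 0 and sample standard deviation 1.
z <- scale(x)

# Total sum of squares of the standardized data around its column means
# (which are all zero after scale()).
total_ss <- sum(z^2)
expected_total_ss <- (n - 1) * p

# Values copied from the printed model summary above.
within_ss         <- 1270.74912
between_ss        <- 1030.25088
reported_total_ss <- 2301

# Share of the total variation explained by the 3-cluster solution.
explained_share <- between_ss / reported_total_ss

cat("Total SS of standardized stand-in data:", total_ss, "\n")
cat("(n - 1) * p:", expected_total_ss, "\n")
cat("Reported within + between SS:", within_ss + between_ss, "\n")
cat("Between SS / Total SS:", round(explained_share, 3), "\n")
```

The ratio of Between SS to Total SS gives a rough sense of how much of the standardized variation the clusters account for; it is a descriptive figure, not the criterion H2O uses to choose k.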