@rrblogdatascience
Last active August 29, 2015 14:05
Computing k-means clustering in MADlib
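-- Assign each row of iris_data to its nearest k-means++ centroid.
-- The <Parameters.*> placeholders must be substituted with real values before the query is run.
SELECT data.*,
       -- column_id is the 0-based index of the closest centroid
       (madlib.closest_column(km.centroids, data.points)).column_id AS cluster_id
FROM public.iris_data AS data,
     (SELECT centroids
      FROM madlib.kmeanspp('iris_data', 'points',
                           <Parameters.K>,
                           <Parameters.distance function>,
                           <Parameters.aggregation method>,
                           <Parameters.max number of iterations>,
                           <Parameters.min frac reassigned >)) AS km
ORDER BY data.pid;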