@cjnolet
Last active August 17, 2022 05:35
Simple example of cuML's K-Means Single-GPU (SG) and Multi-Node Multi-GPU (MNMG) APIs compared to Scikit-learn and Dask-ML

Comparing cuML K-Means API Against Scikit-learn & Dask-ML

First, a quick code example of K-Means in Scikit-learn

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

n_centers = 5

X, _ = make_blobs(n_samples=10000, centers=n_centers)

k_means = KMeans(n_clusters=n_centers)
k_means.fit(X)

labels = k_means.predict(X)

To use cuML's Single-GPU API, we just change the imports

from cuml.cluster import KMeans
from cuml.datasets import make_blobs

n_centers = 5

X, _ = make_blobs(n_samples=10000, centers=n_centers)

k_means = KMeans(n_clusters=n_centers)
k_means.fit(X)

labels = k_means.predict(X)
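
Because the two estimators expose the same interface, a single script can serve either backend. The following is a minimal sketch rather than part of the original example: it assumes that a missing cuML installation surfaces as an ImportError, and the `backend` variable and `random_state=0` are added purely for illustration.

```python
# Prefer the GPU implementation when cuML is importable; otherwise fall back to Scikit-learn.
# Assumption: an absent cuML install raises ImportError (other failures are not handled here).
try:
    from cuml.cluster import KMeans
    from cuml.datasets import make_blobs
    backend = "cuml"
except ImportError:
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    backend = "sklearn"

n_centers = 5

# Both make_blobs functions accept n_samples, centers, and random_state.
X, _ = make_blobs(n_samples=10000, centers=n_centers, random_state=0)

k_means = KMeans(n_clusters=n_centers, random_state=0)
k_means.fit(X)

labels = k_means.predict(X)
print(backend, labels.shape)
```

Everything after the import block is identical for both libraries, which is the point of the comparison above.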

To use KMeans in Dask-ML, which is CPU-based, we change the KMeans import and create a Dask Client

from dask_ml.cluster import KMeans
from sklearn.datasets import make_blobs

from dask.distributed import Client
c = Client("<scheduler_address>")  # replace with your Dask scheduler's address

n_centers = 5

X, _ = make_blobs(n_samples=10000, centers=n_centers)

k_means = KMeans(n_clusters=n_centers)
k_means.fit(X)

labels = k_means.predict(X)

And to use the multi-node multi-GPU API, we just change the imports again

from cuml.dask.cluster import KMeans
from cuml.dask.datasets import make_blobs

from dask.distributed import Client
c = Client("<scheduler_address>")  # replace with your Dask scheduler's address

n_centers = 5

X, _ = make_blobs(n_samples=10000, centers=n_centers)

k_means = KMeans(n_clusters=n_centers)
k_means.fit(X)

labels = k_means.predict(X)

Note: cuml.dask.datasets.make_blobs is available in cuML as of version 0.10.
