cuML HDBSCAN

Basic Usage

Example of training an HDBSCAN model using the hdbscan Python package from scikit-learn-contrib:

from sklearn import datasets
from hdbscan import HDBSCAN

# make_moons returns a (points, labels) tuple; only the points are clustered
X, _ = datasets.make_moons(n_samples=50, noise=0.05)

model = HDBSCAN(min_samples=5)
y_hat = model.fit_predict(X)

And the same code using the GPU-accelerated HDBSCAN in cuML (spoiler alert: the only difference is the import):

from sklearn import datasets
from cuml.cluster import HDBSCAN

# make_moons returns a (points, labels) tuple; only the points are clustered
X, _ = datasets.make_moons(n_samples=50, noise=0.05)

model = HDBSCAN(min_samples=5)
y_hat = model.fit_predict(X)
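
Because the two snippets differ only in the import, a script can pick the backend at run time and fall back to the CPU package when cuML is not installed. The following is a minimal sketch, not part of either library: `load_estimator` is a hypothetical helper name, and it only handles a missing package (ImportError), not other failures such as a missing GPU driver.

```python
import importlib


def load_estimator(candidates=("cuml.cluster", "hdbscan"), attr="HDBSCAN"):
    """Return (class, module_name) from the first importable candidate module.

    Hypothetical helper: tries each module in order and returns the first
    one that can be imported and exposes `attr`.
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package not installed; try the next candidate.
            continue
        cls = getattr(module, attr, None)
        if cls is not None:
            return cls, name
    raise ImportError(f"none of {candidates!r} provides {attr!r}")
```

With this helper, `HDBSCAN, backend = load_estimator()` yields the cuML class when it is importable and the hdbscan class otherwise, and the rest of the script (`HDBSCAN(min_samples=5).fit_predict(X)`) stays unchanged.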

Plotting

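We can plot the minimum spanning tree the same way we would with the original hdbscan package. Note that the tree is only retained when the model is constructed with gen_min_span_tree=True, e.g. HDBSCAN(min_samples=5, gen_min_span_tree=True):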

model.minimum_spanning_tree_.plot()

The single linkage dendrogram and condensed tree can be plotted as well:

model.single_linkage_tree_.plot()
model.condensed_tree_.plot()