@cjnolet
Last active January 14, 2023 03:33
Demonstration of GPU-enabled HDBSCAN on Single-Cell RNA Dataset
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RAPIDS & Scanpy Single-Cell RNA-seq Workflow"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) 2020, NVIDIA CORPORATION.\n",
"\n",
"Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at\n",
"\n",
" http://www.apache.org/licenses/LICENSE-2.0 \n",
"\n",
"Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates a single-cell RNA analysis workflow that begins with preprocessing a count matrix of size `(n_cells, n_genes)` and results in a visualization of the clustered cells for further analysis."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For demonstration purposes, we use a dataset of ~70,000 human lung cells from Travaglini et al. 2020 (https://www.biorxiv.org/content/10.1101/742320v2) and label cells using the ACE2 and TMPRSS2 genes. See the README for instructions to download this dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import requirements"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import scanpy as sc\n",
"import anndata\n",
"\n",
"import time\n",
"import os\n",
"import wget  # used below to download the dataset if it is missing\n",
"\n",
"import cudf\n",
"import cupy as cp\n",
"\n",
"from cuml.decomposition import PCA\n",
"from cuml.manifold import TSNE\n",
"from cuml.cluster import KMeans\n",
"\n",
"import rapids_scanpy_funcs\n",
"\n",
"import warnings\n",
"warnings.filterwarnings('ignore', 'Expected ')\n",
"warnings.simplefilter('ignore')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use the RAPIDS memory manager on the GPU to control how memory is allocated."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import rmm\n",
"\n",
"rmm.reinitialize(\n",
" managed_memory=True, # Allows oversubscription\n",
" pool_allocator=False, # default is False\n",
" devices=0, # GPU device IDs to register. By default registers only GPU 0.\n",
")\n",
"\n",
"cp.cuda.set_allocator(rmm.rmm_cupy_allocator)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the cell below, we provide the path to the `.h5ad` file containing the count matrix to analyze. Please see the README for instructions on how to download the dataset we use here.\n",
"\n",
"We recommend saving count matrices in the sparse .h5ad format as it is much faster to load than a dense CSV file. To run this notebook using your own dataset, please see the README for instructions to convert your own count matrix into this format. Then, replace the path in the cell below with the path to your generated `.h5ad` file."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"input_file = \"../data/krasnow_hlca_10x.sparse.h5ad\"\n",
"\n",
"if not os.path.exists(input_file):\n",
" print('Downloading input file...')\n",
" os.makedirs('../data', exist_ok=True)\n",
" wget.download('https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/krasnow_hlca_10x.sparse.h5ad',\n",
" input_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set parameters"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# marker genes\n",
"RIBO_GENE_PREFIX = \"RPS\" # Prefix for ribosomal genes to regress out\n",
"markers = [\"ACE2\", \"TMPRSS2\", \"EPCAM\"] # Marker genes for visualization\n",
"\n",
"# filtering cells\n",
"min_genes_per_cell = 200 # Filter out cells with fewer genes than this expressed \n",
"max_genes_per_cell = 6000 # Filter out cells with more genes than this expressed \n",
"\n",
"# filtering genes\n",
"n_top_genes = 5000 # Number of highly variable genes to retain\n",
"\n",
"# PCA\n",
"n_components = 50 # Number of principal components to compute\n",
"\n",
"# t-SNE\n",
"tsne_n_pcs = 20 # Number of principal components to use for t-SNE\n",
"\n",
"# k-means\n",
"k = 35 # Number of clusters for k-means\n",
"\n",
"# KNN\n",
"n_neighbors = 15 # Number of nearest neighbors for KNN graph\n",
"knn_n_pcs = 50 # Number of principal components to use for finding nearest neighbors\n",
"\n",
"# UMAP\n",
"umap_min_dist = 0.3 \n",
"umap_spread = 1.0\n",
"\n",
"# Gene ranking\n",
"ranking_n_top_genes = 50 # Number of differential genes to compute for each cluster"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"start = time.time()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load and Prepare Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We load the sparse count matrix from an `h5ad` file using Scanpy. The sparse count matrix will then be placed on the GPU. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"data_load_start = time.time()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 125 ms, sys: 228 ms, total: 354 ms\n",
"Wall time: 353 ms\n"
]
}
],
"source": [
"%%time\n",
"adata = sc.read(input_file)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(65662, 26485)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adata.X.shape"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"a = np.diff(adata.X.indptr)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1347, 1713, 1185, ..., 651, 1050, 2218], dtype=int32)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"52985"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count the cells with fewer than 3000 non-zero genes\n",
"len(a[a<3000])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We keep the gene names in a cuDF Series and copy the sparse count matrix to the GPU:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 836 ms, sys: 586 ms, total: 1.42 s\n",
"Wall time: 1.45 s\n"
]
}
],
"source": [
"%%time\n",
"genes = cudf.Series(adata.var_names)\n",
"sparse_gpu_array = cp.sparse.csr_matrix(adata.X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Verify the shape of the resulting sparse matrix:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(65662, 26485)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sparse_gpu_array.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And the number of non-zero values in the matrix:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"126510394"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sparse_gpu_array.nnz"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total data load and format time: 1.8720729351043701\n"
]
}
],
"source": [
"data_load_time = time.time()\n",
"print(\"Total data load and format time: %s\" % (data_load_time-data_load_start))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"preprocess_start = time.time()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We filter the count matrix to remove cells with an extreme number of genes expressed."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 535 ms, sys: 304 ms, total: 839 ms\n",
"Wall time: 838 ms\n"
]
}
],
"source": [
"%%time\n",
"sparse_gpu_array = rapids_scanpy_funcs.filter_cells(sparse_gpu_array, min_genes=min_genes_per_cell, max_genes=max_genes_per_cell)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some genes will now have zero expression in all cells. We filter out such genes."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.03 s, sys: 162 ms, total: 1.19 s\n",
"Wall time: 1.19 s\n"
]
}
],
"source": [
"%%time\n",
"sparse_gpu_array, genes = rapids_scanpy_funcs.filter_genes(sparse_gpu_array, genes, min_cells=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The size of our count matrix is now reduced."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(65462, 22058)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sparse_gpu_array.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Normalize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We normalize the count matrix so that the total counts in each cell sum to 1e4."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 415 µs, sys: 599 µs, total: 1.01 ms\n",
"Wall time: 747 µs\n"
]
}
],
"source": [
"%%time\n",
"sparse_gpu_array = rapids_scanpy_funcs.normalize_total(sparse_gpu_array, target_sum=1e4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we log transform the count matrix."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 42.7 ms, sys: 52.4 ms, total: 95.1 ms\n",
"Wall time: 94.4 ms\n"
]
}
],
"source": [
"%%time\n",
"sparse_gpu_array = sparse_gpu_array.log1p()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Select Most Variable Genes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will now select the most variable genes in the dataset. However, we first save the 'raw' expression values of the ACE2 and TMPRSS2 genes to use for labeling cells afterward. We will also store the expression of an epithelial marker gene (EPCAM)."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 208 ms, sys: 175 ms, total: 383 ms\n",
"Wall time: 383 ms\n"
]
}
],
"source": [
"%%time\n",
"tmp_norm = sparse_gpu_array.tocsc()\n",
"marker_genes_raw = {\n",
" (\"%s_raw\" % marker): tmp_norm[:, genes[genes == marker].index[0]].todense().ravel()\n",
" for marker in markers\n",
"}\n",
"\n",
"del tmp_norm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we convert the count matrix to an AnnData object."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 178 ms, sys: 52.9 ms, total: 231 ms\n",
"Wall time: 229 ms\n"
]
}
],
"source": [
"%%time\n",
"adata = anndata.AnnData(sparse_gpu_array.get())\n",
"adata.var_names = genes.to_pandas()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using Scanpy, we filter the count matrix to retain only the 5000 most variable genes."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 977 ms, sys: 26.5 ms, total: 1 s\n",
"Wall time: 1 s\n"
]
}
],
"source": [
"%%time\n",
"sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes, flavor=\"cell_ranger\")\n",
"adata = adata[:, adata.var.highly_variable]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Regress out confounding factors (number of counts, ribosomal gene expression)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now perform regression on the count matrix to correct for confounding factors. For example purposes, we use the number of counts and the expression of ribosomal genes. Many workflows use the expression of mitochondrial genes (whose names start with `MT-`)."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"ribo_genes = adata.var_names.str.startswith(RIBO_GENE_PREFIX)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now calculate the total counts and the fraction of ribosomal counts for each cell."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 881 ms, sys: 32.3 ms, total: 914 ms\n",
"Wall time: 913 ms\n"
]
}
],
"source": [
"%%time\n",
"n_counts = adata.X.sum(axis=1)\n",
"percent_ribo = (adata.X[:,ribo_genes].sum(axis=1) / n_counts).ravel()\n",
"\n",
"n_counts = cp.array(n_counts).ravel()\n",
"percent_ribo = cp.array(percent_ribo).ravel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And perform regression:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 653 ms, sys: 114 ms, total: 767 ms\n",
"Wall time: 766 ms\n"
]
}
],
"source": [
"%%time\n",
"sparse_gpu_array = cp.sparse.csc_matrix(adata.X)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 31.8 s, sys: 10.5 s, total: 42.3 s\n",
"Wall time: 43.2 s\n"
]
}
],
"source": [
"%%time\n",
"sparse_gpu_array = rapids_scanpy_funcs.regress_out(sparse_gpu_array, n_counts, percent_ribo)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scale"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we scale the count matrix to obtain a z-score and apply a cutoff value of 10 standard deviations, obtaining the preprocessed count matrix."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 119 ms, sys: 169 ms, total: 289 ms\n",
"Wall time: 287 ms\n"
]
}
],
"source": [
"%%time\n",
"sparse_gpu_array = rapids_scanpy_funcs.scale(sparse_gpu_array, max_value=10)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total Preprocessing time: 48.95587778091431\n"
]
}
],
"source": [
"preprocess_time = time.time()\n",
"print(\"Total Preprocessing time: %s\" % (preprocess_time-preprocess_start))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cluster & Visualize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We store the preprocessed count matrix as an AnnData object, which is currently in host memory. We also add the expression levels of the marker genes as observations to the AnnData object."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 199 ms, sys: 92.1 ms, total: 292 ms\n",
"Wall time: 291 ms\n"
]
}
],
"source": [
"%%time\n",
"\n",
"genes = adata.var_names\n",
"adata = anndata.AnnData(sparse_gpu_array.get())\n",
"adata.var_names = genes\n",
"\n",
"for name, data in marker_genes_raw.items():\n",
" adata.obs[name] = data.get()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reduce"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use PCA to reduce the dimensionality of the matrix to its top 50 principal components."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.14 s, sys: 1.41 s, total: 2.55 s\n",
"Wall time: 2.55 s\n"
]
}
],
"source": [
"%%time\n",
"adata.obsm[\"X_pca\"] = PCA(n_components=n_components, output_type=\"numpy\").fit_transform(adata.X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### UMAP + Density clustering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We visualize the cells using the UMAP algorithm in RAPIDS. Before UMAP, we need to construct a k-nearest neighbors graph in which each cell is connected to its nearest neighbors. This can be done conveniently using RAPIDS functionality already integrated into Scanpy.\n",
"\n",
"Note that Scanpy uses an approximation to the nearest neighbors on the CPU while the GPU version performs an exact search. While both methods are known to yield useful results, some differences in the resulting visualization and clusters can be observed."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 4.04 s, sys: 276 ms, total: 4.32 s\n",
"Wall time: 4.3 s\n"
]
}
],
"source": [
"%%time\n",
"sc.pp.neighbors(adata, n_neighbors=n_neighbors, n_pcs=knn_n_pcs, method='rapids')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The UMAP function from RAPIDS is also integrated into Scanpy."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING: .obsp[\"connectivities\"] have not been computed using umap\n",
"CPU times: user 246 ms, sys: 229 ms, total: 475 ms\n",
"Wall time: 474 ms\n"
]
}
],
"source": [
"%%time\n",
"sc.tl.umap(adata, min_dist=umap_min_dist, spread=umap_spread, method='rapids')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we cluster the cells with HDBSCAN, a density-based clustering algorithm, using the GPU implementation in cuML. The clustering is run on the PCA embedding stored in `adata.obsm['X_pca']`."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 718 ms, sys: 465 ms, total: 1.18 s\n",
"Wall time: 1.19 s\n"
]
}
],
"source": [
"%%time\n",
"import pandas as pd\n",
"from cuml.cluster import HDBSCAN\n",
"hdbscan = HDBSCAN(min_samples=5, min_cluster_size=30)\n",
"adata.obs['hdbscan_gpu'] = pd.Categorical(pd.Series(hdbscan.fit_predict(adata.obsm['X_pca'])))"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 23.5 s, sys: 1.2 s, total: 24.7 s\n",
"Wall time: 29.6 s\n"
]
}
],
"source": [
"%%time\n",
"# CPU reference implementation from the hdbscan package, run with the same parameters\n",
"import pandas as pd\n",
"from hdbscan import HDBSCAN as refHDBSCAN\n",
"hdbscan = refHDBSCAN(min_samples=5, min_cluster_size=30, core_dist_n_jobs=-1)\n",
"adata.obs['hdbscan_cpu'] = pd.Categorical(pd.Series(hdbscan.fit_predict(adata.obsm['X_pca'])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We plot the cells using the UMAP visualization, with the GPU HDBSCAN clusters as labels."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 597 ms, sys: 12.2 ms, total: 609 ms\n",
"Wall time: 607 ms\n"
]
}
],
"source": [
"%%time\n",
"sc.pl.umap(adata, color=[\"hdbscan_gpu\"])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (cuml_2108_070921)",
"language": "python",
"name": "cuml_2108_070921"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
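The notebook stores two label sets, `hdbscan_gpu` and `hdbscan_cpu`, but does not compare them. As a rough sketch only, the adjusted Rand index (ARI) is one common way to measure agreement between two clusterings. The helper name `adjusted_rand_index` below is hypothetical and is not part of the notebook, cuML, or the hdbscan package. It is written in plain Python so that it runs without a GPU. Noise points, which HDBSCAN labels -1, are treated here as one more cluster, which is an assumption you may want to revisit.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same points.

    Hypothetical helper for illustration; label values only need to be hashable.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same number of points")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two points: the labelings agree trivially.
        return 1.0

    # Contingency counts: how many points share each (label_a, label_b) pair.
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)

    # Number of point pairs placed together by both, by A, and by B.
    together_both = sum(comb(c, 2) for c in joint.values())
    together_a = sum(comb(c, 2) for c in count_a.values())
    together_b = sum(comb(c, 2) for c in count_b.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Same partition with different label names gives perfect agreement.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

In the notebook, this could be called as `adjusted_rand_index(adata.obs['hdbscan_gpu'].tolist(), adata.obs['hdbscan_cpu'].tolist())`. scikit-learn's `adjusted_rand_score` computes the same quantity and is likely the better choice when it is installed.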
@amitlakhanpal

Minor question, in the notebook there is mention of computing Louvain clusters in the text, while the code seems to annotate the adata with the output of HDBSCAN. Is there meant to be a connection between Louvain and HDBSCAN?
Thanks!
