Creating a Symmetrized K-Nearest Neighbors Graph on GPU with cuML

Brief Example of Building a KNN Graph from cuML's NearestNeighbors Output

Import cupy and cuML's NearestNeighbors estimator

import cupy
from cuml.neighbors import NearestNeighbors

Set up some variables and create some random data on the GPU

k = 5
n_samples = 100
n_features = 100

X = cupy.random.random((n_samples, n_features))

Execute k-Nearest Neighbors to find neighborhoods

knn = NearestNeighbors(n_neighbors=k)
knn.fit(X)
distances, indices = knn.kneighbors(X)
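kneighbors returns two arrays of shape (n_samples, k). For row i, indices[i] lists the k nearest fitted points and distances[i] holds their distances. When X is also the data passed to fit, the query point itself is typically its own first neighbor. The NumPy sketch below is a hypothetical CPU stand-in (brute_force_knn is our own name, not a cuML function) for brute-force Euclidean search. It only makes those shapes concrete, and because it builds a full n x n distance matrix it is meant for small inputs only.

```python
import numpy as np

def brute_force_knn(X, k):
    """Hypothetical CPU stand-in for NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)."""
    # Pairwise Euclidean distances via broadcasting: (n, 1, d) - (1, n, d) -> (n, n)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # For each row, the k closest columns in ascending distance order
    indices = np.argsort(dist, axis=1, kind="stable")[:, :k]
    distances = np.take_along_axis(dist, indices, axis=1)
    return distances, indices

rng = np.random.default_rng(0)
X = rng.random((100, 100))
distances, indices = brute_force_knn(X, k=5)
print(distances.shape, indices.shape)  # (100, 5) (100, 5)
```

Each row of indices starts with the row's own index because a point's distance to itself is exactly 0 here.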

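Create a symmetrized KNN graph from the output arrays

import cupyx.scipy.sparse  # csr_matrix lives here; cupy.sparse is a deprecated alias in newer CuPy releases

# Flatten the (n_samples, k) outputs into 1-D arrays of length n_samples * k
distances = distances.reshape(n_samples * k)
indices = indices.reshape(n_samples * k)

# Row i's neighbors occupy positions i*k .. (i+1)*k - 1, so indptr has n_samples + 1 entries
indptr = cupy.arange(0, (k*n_samples)+1, k)
# Build the directed kNN graph on the GPU, then copy it to host as a SciPy CSR matrix with .get()
knn_graph = cupyx.scipy.sparse.csr_matrix((distances, indices, indptr), shape=(n_samples, n_samples)).get()
# Keep edge (i, j) if it appears in either direction
knn_graph_symmetrized = knn_graph.maximum(knn_graph.T)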
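To illustrate the CSR layout and the symmetrization step, here is a small worked example on hand-made toy data (4 points, k = 2; the values are invented, not real kNN output). It uses SciPy directly, since .get() hands back a SciPy CSR matrix and cupyx.scipy.sparse.csr_matrix accepts the same (data, indices, indptr) triple.

```python
import numpy as np
import scipy.sparse

# Toy kNN output for 4 points with k = 2 (each row: the point itself, then one neighbor)
k, n_samples = 2, 4
indices = np.array([[0, 1],
                    [1, 2],
                    [2, 1],
                    [3, 2]])
distances = np.array([[0.0, 1.0],
                      [0.0, 0.5],
                      [0.0, 0.5],
                      [0.0, 2.0]])

# Row i owns entries indptr[i]:indptr[i + 1], i.e. exactly k entries per row
indptr = np.arange(0, k * n_samples + 1, k)
knn_graph = scipy.sparse.csr_matrix(
    (distances.reshape(-1), indices.reshape(-1), indptr),
    shape=(n_samples, n_samples))

# Edge 0 -> 1 exists but 1 -> 0 does not; maximum with the transpose adds the reverse edge
knn_graph_symmetrized = knn_graph.maximum(knn_graph.T)
print(knn_graph_symmetrized.toarray())
```

Taking the element-wise maximum works as a union of edges because distances are non-negative, so a stored distance always beats an implicit zero. For a symmetric metric such as Euclidean distance, an edge present in both directions carries the same value either way.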
@anirbankonar123

Not working; I'm getting the error: data, indices, and indptr should be 1-D. All the input arrays are of shape n,1, so I can't figure out where the issue is.

@cjnolet
Author

cjnolet commented Aug 27, 2020

@anirbankonar123, can you share your version of cupy? (You should be able to run cupy.__version__ in the Python shell after importing it.)

The indptr, indices, and distances arrays should all be 1-D, i.e. each one's shape should be a one-element tuple.

Here's an example:

>>> import cupy
>>> from cuml.neighbors import NearestNeighbors
>>> k = 5
>>> n_samples = 100
>>> n_features = 100
>>> 
>>> X = cupy.random.random((n_samples, n_features))
>>> knn = NearestNeighbors(n_neighbors=k)
>>> knn.fit(X)
[W] [08:52:09.109331] Expected column ('F') major order, but got the opposite. Converting data, this will result in additional memory utilization.
NearestNeighbors(n_neighbors=5, verbose=4, handle=<cuml.common.handle.Handle object at 0x7ff9cb765bb0>, algorithm='brute', metric='euclidean', p=2, metric_params=None, output_type='cupy')
>>> distances, indices = knn.kneighbors(X)
[W] [08:52:09.482936] Expected column ('F') major order, but got the opposite. Converting data, this will result in additional memory utilization.
>>> distances = distances.reshape(n_samples * k)
>>> indices = indices.reshape(n_samples * k)
>>> indptr = cupy.arange(0, (k*n_samples)+1, k)
>>> knn_graph = cupy.sparse.csr_matrix((distances, indices, indptr), shape=(n_samples, n_samples)).get()
>>> knn_graph_symmetrized = knn_graph.maximum(knn_graph.T)
>>> indices
array([ 0,  1, 35, 38, 81,  1, 73,  0, 38, 44,  2, 15, 52, 71, 31,  3, 62,
       29, 79, 33,  4, 53, 15, 72, 33,  5, 99, 28, 61, 95,  6, 52, 86, 70,
       97,  7, 59, 53, 35, 39,  8, 28, 49, 31, 48,  9, 33, 16, 31, 94, 10,
       33, 43, 60, 53, 11, 48, 18, 66, 63, 12,  7, 73, 24, 53, 13, 48, 96,
       15, 27, 14, 35, 95, 21, 58, 15, 33,  2, 13, 51, 16, 92, 40, 71, 69,
       17, 82, 30, 31, 53, 18, 53, 90, 40, 21, 19, 43, 33, 94, 34, 20, 53,
       47, 49, 82, 21, 80, 53, 49,  7, 22, 69, 43, 41, 46, 23, 90, 47, 99,
       79, 24, 60, 55, 73, 13, 25, 82, 15, 60, 71, 26, 55, 88, 80, 15, 27,
       55, 53, 31, 29, 28, 92,  8,  5, 48, 29, 86, 42,  3, 38, 30, 63, 17,
       71, 74, 31, 94, 61, 17, 27, 32, 68, 62, 53, 54, 33, 15,  9, 53, 10,
       34, 43, 55, 94, 46, 35, 53, 95,  7, 39, 36, 74, 21, 86, 61, 37, 49,
       20, 73, 51, 38, 39, 53, 29,  1, 39, 96, 38, 76,  7, 40, 16, 51, 90,
       53, 41, 66, 49, 71,  2, 42, 29, 53, 39, 54, 43, 69, 55, 71, 86, 44,
        1, 35, 93, 65, 45, 92, 90, 79, 80, 46, 53, 71, 59, 21, 47, 53, 99,
       86, 20, 48, 63, 11, 13, 79, 49, 71, 37, 85, 74, 50, 73, 86, 82, 93,
       51, 88, 40, 71, 49, 52, 94,  6, 90,  2, 53, 90, 47, 54, 82, 54, 53,
       42, 35, 32, 55, 27, 43, 60, 26, 56, 85, 53, 84, 71, 57, 82, 24, 37,
       53, 58, 74, 79, 90, 45, 59,  7, 46, 47, 87, 60, 70, 24, 55, 75, 61,
       31,  5, 86, 74, 62, 74,  3, 32, 90, 63, 30, 48, 31, 35, 64, 90, 95,
       53, 40, 65, 99, 44, 15, 85, 66, 41, 90, 73,  1, 67, 95,  0, 64, 31,
       68, 32, 16, 94, 90, 69, 43, 74, 22, 87, 70, 60,  6, 73, 55, 71, 49,
       16, 46, 51, 72, 78, 42, 69, 87, 73, 86, 99, 50, 96, 74, 62, 69, 77,
       58, 75, 53, 60, 49, 94, 76, 39, 45, 33, 86, 77, 74, 96, 99, 39, 78,
       72, 48, 95, 71, 79, 45, 58, 74, 99, 80, 21, 45, 71, 26, 81,  0, 77,
       35, 46, 82, 53, 88, 17, 25, 83, 33, 90, 39, 15, 84, 95, 45, 53, 21,
       85, 49, 56, 74, 53, 86, 73, 93, 29, 47, 87, 53, 69, 59, 82, 88, 51,
       82, 91, 26, 89, 77, 37, 69,  7, 90, 53, 64, 40, 95, 91, 53, 88, 71,
       49, 92, 45, 16, 28, 82, 93, 86, 73, 50, 39, 94, 31, 52, 74, 75, 95,
       99, 35, 90, 53, 96, 39, 73, 13, 77, 97, 74, 48, 69, 76, 98, 56, 64,
       67, 51, 99, 73, 95,  5, 47])
>>> indices.shape
(500,)
>>> indptr.shape
(101,)
>>> distances.shape
(500,)
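Note that an array of shape (n, 1) is 2-D, not 1-D, even though it holds a single column. The sketch below adds a hypothetical helper, check_csr_inputs (our own name, not part of CuPy or SciPy), that checks the three arrays against the shapes shown above before they are passed to csr_matrix, plus a reminder that .ravel() or .reshape(-1) flattens an (n, 1) array.

```python
import numpy as np

def check_csr_inputs(data, indices, indptr, n_rows):
    """Hypothetical helper: sanity-check a (data, indices, indptr) triple for csr_matrix."""
    for name, arr in (("data", data), ("indices", indices), ("indptr", indptr)):
        if arr.ndim != 1:
            raise ValueError(f"{name} has shape {arr.shape}; expected a 1-D shape like (m,)")
    if indptr.shape[0] != n_rows + 1:
        raise ValueError("indptr must have n_rows + 1 entries")
    if data.shape[0] != indices.shape[0] or indptr[-1] != data.shape[0]:
        raise ValueError("data and indices must both have indptr[-1] entries")

column = np.zeros((500, 1))                # shape (500, 1) is 2-D
print(column.ndim, column.ravel().shape)   # 2 (500,)
```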

@anirbankonar123

My n_samples = 100000 and n_features = 400.
Could that cause the issue? I have 1-D tuples for the input shapes in the second-to-last step, but I still get the error. That is the only difference I see; you have n_samples and n_features both = 100.
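For reference, the shapes of the three CSR arrays depend only on n_samples and k; n_features only affects X. The sketch below builds the same structure at n_samples = 100000 with synthetic random neighbor indices and distances (stand-ins, not real kNN output) using SciPy on the CPU, to show that a larger input alone does not make the arrays 2-D.

```python
import numpy as np
import scipy.sparse

n_samples, k = 100_000, 5
rng = np.random.default_rng(0)
# Synthetic stand-ins for kneighbors output (random, not real neighbors)
indices = rng.integers(0, n_samples, size=(n_samples, k))
distances = rng.random((n_samples, k))

indptr = np.arange(0, k * n_samples + 1, k)
graph = scipy.sparse.csr_matrix(
    (distances.reshape(-1), indices.reshape(-1), indptr),
    shape=(n_samples, n_samples))
print(indptr.shape, indices.reshape(-1).shape)  # (100001,) (500000,)
```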

@anirbankonar123

[Screenshot: Capture]

@cjnolet
Author

cjnolet commented Aug 27, 2020

@anirbankonar123, can you share your version of cupy?

@anirbankonar123

7.7.0

@vdet

vdet commented Nov 21, 2022

Hi,

Has the above issue been solved? I also encountered it with cupy 11.0.0.

All the best
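One possible explanation, not confirmed against the setups reported in this thread: in the CuPy sources we are aware of, the csr_matrix constructor raises the same "should be 1-D" message when any of the three arrays is not a cupy.ndarray, even if it is 1-D. That could happen, for instance, if X is a NumPy array and cuML mirrors the input type, returning NumPy distances and indices while indptr is built with cupy.arange. The sketch below uses a hypothetical helper, as_1d_arrays (our own name), that converts every array to one array module and flattens it. It runs with NumPy here only so it works without a GPU; the assumed GPU usage is shown in comments.

```python
import numpy as np

def as_1d_arrays(xp, *arrays):
    """Hypothetical helper: convert each array to module `xp` (numpy or cupy) and flatten to 1-D."""
    return tuple(xp.ravel(xp.asarray(a)) for a in arrays)

# On a GPU (assumed usage, not run here):
#   import cupy, cupyx.scipy.sparse
#   data, idx, ptr = as_1d_arrays(cupy, distances, indices, indptr)
#   graph = cupyx.scipy.sparse.csr_matrix((data, idx, ptr), shape=(n_samples, n_samples))

n_samples, k = 100, 5
rng = np.random.default_rng(0)
distances = rng.random((n_samples, k))                      # stand-in for kneighbors distances
indices = rng.integers(0, n_samples, size=(n_samples, k))   # stand-in for kneighbors indices
indptr = np.arange(0, n_samples * k + 1, k)
distances, indices, indptr = as_1d_arrays(np, distances, indices, indptr)
print(distances.shape, indices.shape, indptr.shape)  # (500,) (500,) (101,)
```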
