@mdouze
Last active November 21, 2023 11:35
mylyu commented Dec 15, 2019 via email

mylyu commented Dec 19, 2019

@mdouze How should I save such an index with mixed GPU and CPU parts? I tried using write_index directly and there was no error. But when I reloaded the index on a CPU machine and added vectors, search was extremely slow. I checked:

il = faiss.extract_index_ivf(index).invlists
list_sizes = [il.list_size(i) for i in range(il.nlist)] 

The cluster sizes are very imbalanced.
[image: plot of cluster sizes]

Should I use something like:
index_ivf.clustering_index = faiss.index_gpu_to_cpu(index_ivf.clustering_index)
before saving?

mdouze (Author) commented Dec 19, 2019

Saving an index that resides on GPU is not supported; any index has to be converted to CPU (faiss.index_gpu_to_cpu) before writing it out.
The clustering_index is not stored when saving; it is intended only for use at training time.

mylyu commented Dec 19, 2019 via email

mylyu commented Dec 24, 2019 via email

naveenkumarmarri commented Mar 17, 2020

@mdouze

index = faiss.index_factory(2048, "PCA64,IVF1048576_HNSW32,Flat")
xt = faiss.rand((140000000, 2048))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/pyt1.4/lib/python3.7/site-packages/faiss/__init__.py", line 595, in rand
    res = np.empty(n, dtype='float32')
MemoryError: Unable to allocate 1.04 TiB for an array with shape (140000000, 2048) and data type float32

I get a memory error when generating 140M data points, which is the size of my dataset.
Any tips to overcome this? Even with 50% of the data (70M points), I get a memory error.

@QwertyJack

Your dataset alone requires about 1 TiB of RAM! Try reducing the dimension as well as the nlist, or you will never get this index trained.
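The 1 TiB figure follows directly from the shape in the traceback above:

```python
# Back-of-the-envelope memory cost of 140M vectors of dimension 2048 in float32.
n, d = 140_000_000, 2048
bytes_needed = n * d * 4        # 4 bytes per float32
tib = bytes_needed / 2**40      # TiB = 2**40 bytes
print(f"{tib:.2f} TiB")         # 1.04 TiB
```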

@vigneshmj1997

My dataset is being trained on the CPU rather than the GPU, in spite of using:
index_ivf = faiss.extract_index_ivf(index2)
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(64))
index_ivf.clustering_index = clustering_index
Can anyone help me?

anoubhav commented Jan 18, 2022

Can we use the GPU version of the Binary Flat index as the clustering index for the binary indexes? Like below:
faiss.index_cpu_to_all_gpus(faiss.IndexBinaryFlat(d))

@rajharshiitb

> My dataset is being trained on the CPU rather than the GPU, in spite of using index_ivf = faiss.extract_index_ivf(index2); clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(64)); index_ivf.clustering_index = clustering_index. Can anyone help me?

Do you have faiss-gpu installed? What do you get for faiss.get_num_gpus()?

fferroni commented Apr 13, 2023

Hi @mdouze, is it correct that something like:
"PCAR1024,IVF262144_HNSW32,PQ512x8" (with inner product)
cannot be trained on the GPU?
https://gist.github.com/mdouze/46d6bbbaabca0b9778fca37ed2bcccf6?permalink_comment_id=3111847#gistcomment-3111847
You mention that PQ quantisation is not implemented?
