@pandasMX that is "d" for dimension, which I think is a typo and should be 128 instead.
It's not a typo: there is a PCA64 transform in the index, so the IVF index takes 64-dim vectors.
I see ... you are right!
Am I correct that for "OPQ64_128,IVF16384_HNSW32,PQ64", we should still use
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(64))
rather than something like
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexPQ(64))
?
For this you should use
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(128))
because the intermediate dimension after the OPQ transform is 128 (the "_128" in OPQ64_128).
IndexPQ is not faster than IndexFlatL2, and it is not implemented on GPU anyway.
@mdouze How should I save such an index with mixed GPU and CPU parts? I tried write_index directly and there was no error, but after reloading the index and adding vectors on a CPU machine, search was extremely slow. I checked:
il = faiss.extract_index_ivf(index).invlists
list_sizes = [il.list_size(i) for i in range(il.nlist)]
The cluster sizes are very imbalanced.
Should I use something like:
index_ivf.clustering_index = faiss.index_gpu_to_cpu(index_ivf.clustering_index)
before saving?
Saving to GPU is not supported. Any index has to be converted to CPU first.
The clustering_index is not stored when saving; it is intended only for use at training time.
index = faiss.index_factory(2048, "PCA64,IVF1048576_HNSW32,Flat")
xt = faiss.rand((140000000, 2048))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/pyt1.4/lib/python3.7/site-packages/faiss/__init__.py", line 595, in rand
res = np.empty(n, dtype='float32')
MemoryError: Unable to allocate 1.04 TiB for an array with shape (140000000, 2048) and data type float32
I get a memory error when adding 140M data points, which is the size of my dataset. Any tips to overcome this? Even if I add 50% of the data (70M points), I get a memory error.
Your dataset requires 1 TiB of RAM! Try to reduce the dimension as well as the nlist, or you will never get this index trained.
My index is training on the CPU rather than the GPU, in spite of using:
index_ivf = faiss.extract_index_ivf(index2)
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(64))
index_ivf.clustering_index = clustering_index
Can anyone help me?
Can we use the GPU version of the Binary Flat index as the clustering index for the binary indexes? Like below:
faiss.index_cpu_to_all_gpus(faiss.IndexBinaryFlat(d))
My index is training on the CPU rather than the GPU, in spite of using:
index_ivf = faiss.extract_index_ivf(index2)
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(64))
index_ivf.clustering_index = clustering_index
Can anyone help me?
Do you have faiss-gpu installed? What do you get for faiss.get_num_gpus()?
Hi @mdouze , is it correct that something like:
"PCAR1024,IVF262144_HNSW32,PQ512x8" (with inner product)
cannot be trained on the GPU?
https://gist.github.com/mdouze/46d6bbbaabca0b9778fca37ed2bcccf6?permalink_comment_id=3111847#gistcomment-3111847
You mention that PQ quantisation is not implemented on the GPU?