Yes, use the same code.
I've tried that, but it complains with the error 'not implemented ...'.
Help please.
Could you please attach a minimal snippet of code using GPU to train/index/search a binary IVF?
Thanks in advance.
It should work, see https://gist.github.com/mdouze/dd11f1ebd2f1c2f3bcd74beee303e513
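For reference, here is a minimal sketch of the idea (the sizes, the random data, and the 0.5 binarization threshold are my assumptions, not necessarily the gist's exact code): run GPU k-means on a float view of the binary vectors, then install the binarized centroids as the coarse quantizer of a binary IVF index.
import numpy as np
import faiss
d, nlist = 256, 1024  # bits per vector and number of clusters (assumed)
xt = np.random.randint(256, size=(100000, d // 8), dtype='uint8')
# GPU k-means on a float {0,1} view of the binary training data
xt_f = np.unpackbits(xt, axis=1).astype('float32')
clus = faiss.Clustering(d, nlist)
clus_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(d))
clus.train(xt_f, clus_index)
# binarize the float centroids and install them as the coarse quantizer
centroids = faiss.vector_to_array(clus.centroids).reshape(nlist, d)
quantizer = faiss.IndexBinaryFlat(d)
quantizer.add(np.packbits(centroids > 0.5, axis=1))
index = faiss.IndexBinaryIVF(quantizer, d, nlist)
index.is_trained = True
index.add(xt)  # binary database vectors (here the training data itself)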
Hi @mdouze, can you explain what the 64 in faiss.IndexFlatL2(64) means?
@pandasMX that is d, for dimension, which I think is a typo and should be 128 instead.
It's not a typo: there's a PCA64 in the index, so the IVF index takes 64-dim vectors.
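To make the dimensions concrete, here is a minimal sketch of the pattern from the gist (the raw input dimension and the random training data are stand-ins):
import numpy as np
import faiss
d = 128  # raw input dimension (assumed)
xt = np.random.rand(100000, d).astype('float32')
index = faiss.index_factory(d, "PCA64,IVF16384_HNSW32,Flat")
index_ivf = faiss.extract_index_ivf(index)
# PCA64 reduces vectors to 64 dims before the IVF, so the GPU
# coarse-quantizer trainer must be 64-dimensional too
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(64))
index_ivf.clustering_index = clustering_index  # keep the Python ref alive
index.train(xt)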
I see... You are right!
Am I correct that for "OPQ64_128,IVF16384_HNSW32,PQ64", we should still use
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(64))
rather than something like
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexPQ(64))?
For this you should use
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(128))
because the intermediate dimension after the OPQ transform is 128 (OPQ64_128). IndexPQ is not faster than IndexFlatL2 and is not implemented on GPU anyway.
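A sketch of that configuration (the raw dimension and the random training data are assumptions):
import numpy as np
import faiss
d = 256  # raw input dimension (assumed); OPQ64_128 maps it to 128
xt = np.random.rand(200000, d).astype('float32')
index = faiss.index_factory(d, "OPQ64_128,IVF16384_HNSW32,PQ64")
index_ivf = faiss.extract_index_ivf(index)
# the coarse quantizer sees the 128-dim output of the OPQ transform
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(128))
index_ivf.clustering_index = clustering_index
index.train(xt)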
@mdouze How should I save such an index with mixed GPU and CPU parts? I tried using write_index directly and there was no error. Then I reloaded the index and added vectors on a CPU machine, but search was extremely slow. I checked:
il = faiss.extract_index_ivf(index).invlists
list_sizes = [il.list_size(i) for i in range(il.nlist)]
The cluster sizes are very imbalanced.
Should I use something like:
index_ivf.clustering_index = index_gpu_to_cpu(index_ivf.clustering_index)
before saving?
Saving to GPU is not supported. Any index has to be converted to CPU first.
The clustering_index is not stored when saving; it is intended only for use at training time.
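So, assuming the index was trained with a GPU clustering_index as in the snippets above, saving reduces to something like this (xb and xq stand in for your database and query matrices):
import faiss
# the factory-built index is a CPU index throughout; the GPU
# clustering_index helper is simply not serialized
faiss.write_index(index, "trained.index")
# later, on a CPU-only machine
index2 = faiss.read_index("trained.index")
index2.add(xb)
D, I = index2.search(xq, 10)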
index = faiss.index_factory(2048, "PCA64,IVF1048576_HNSW32,Flat")
xt = faiss.rand((140000000, 2048))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/pyt1.4/lib/python3.7/site-packages/faiss/__init__.py", line 595, in rand
res = np.empty(n, dtype='float32')
MemoryError: Unable to allocate 1.04 TiB for an array with shape (140000000, 2048) and data type float32
I am getting a memory error when adding 140M data points, which is the size of my dataset. Any tips to overcome this? Even if I add 50% of the data (70M points), I get a memory error.
Your dataset requires 1 TiB of RAM! Try to reduce the dimension as well as nlist, or you will never get this index trained.
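If the data lives on disk, one common pattern is to train on a subsample and add in slices, so the full 1 TiB matrix never has to exist in RAM. A rough sketch (load_chunk is a hypothetical reader returning a float32 (n, 2048) array, and the smaller nlist is also an assumption):
import faiss
index = faiss.index_factory(2048, "PCA64,IVF262144_HNSW32,Flat")
# train on a few million points rather than all 140M
xt = load_chunk(0, 2_000_000)  # hypothetical chunked reader
index.train(xt)
# add the database in slices so it never sits in RAM at once
for i in range(0, 140_000_000, 1_000_000):
    index.add(load_chunk(i, 1_000_000))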
My index is getting trained on the CPU rather than the GPU, in spite of using:
index_ivf = faiss.extract_index_ivf(index2)
clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(64))
index_ivf.clustering_index = clustering_index
Can anyone help me?
Can we use the GPU version of the Binary Flat index as the clustering index for the binary indexes? Like below:
faiss.index_cpu_to_all_gpus(faiss.IndexBinaryFlat(d))
Regarding the index that trains on CPU instead of GPU: do you have faiss-gpu installed? What do you get for faiss.get_num_gpus()?
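A quick way to check (a CPU-only build may not define the function at all, hence the getattr):
import faiss
print(getattr(faiss, "get_num_gpus", lambda: 0)())  # 0 means no GPU visible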
Hi @mdouze, is it correct that something like:
"PCAR1024,IVF262144_HNSW32,PQ512x8" (with inner product)
cannot be trained on the GPU?
https://gist.github.com/mdouze/46d6bbbaabca0b9778fca37ed2bcccf6?permalink_comment_id=3111847#gistcomment-3111847
You mention that PQ quantisation is not implemented?
Hi, my dataset has dim=768 and num=1 billion vectors. What type of IVF-PQ index is good on CPU? Thanks.
@mdouze Can the index2 from the code above be written directly, e.g. faiss.write_index(index2, "index2.idx")?
Is it possible to train a binary IVF index with a GPU?