Last active: February 11, 2024 16:08
bert_knn.ipynb
Hi @avidale! Thanks for the answer!

I tried converting the vectors to float16. It does help reduce the size, but not by much, since I am working with a large dataset.
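For reference, a minimal sketch of the float16 conversion being described (the array shape and variable names here are assumptions, not the original code):

```python
import numpy as np

# assumed: 1000 sentence embeddings of dimension 768 (BERT base)
emb = np.random.rand(1000, 768).astype("float32")
emb16 = emb.astype("float16")

# float16 halves storage: 4 bytes per value -> 2 bytes per value
print(emb.nbytes)    # 3072000
print(emb16.nbytes)  # 1536000
```

Halving is the best this can do, which is why it only shifts the problem for a large corpus rather than solving it.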
I tried the second approach with Faiss. It worked well with a Flat index, since I can add vectors to the index incrementally, but saving it to disk takes a lot of storage: approx 1 GB for 15K sentences. Here is what I did:
Then I tried faiss.IndexIVFPQ. It works well and the resulting index is small, but it does not work for incremental indexing because it needs training data: all the embeddings have to be computed first, then the index is trained and the vectors added. That training step takes too much RAM, which causes issues when working with large data. Here is what I did: