@mdouze
Created February 26, 2021 11:03
@splinter21

Is it faster to query each small index in turn or to query the merged large index (when the CPU is fully used)?

@Sankalp991

@mdouze This doesn't work with multiple indexes. It only considers the first index while searching, so the assertion fails. Do you have a solution for it?

@mdouze
Author

mdouze commented Dec 20, 2022

What do you mean by multiple indexes?

@Sankalp991

@mdouze The issue I'm encountering is this: given an index_1 file, an index_2 file, and an index_3 file, serving them individually returns results spread across all three. After running the merging procedure, I would expect the results to be the same, as you also mention in your code (in cell 7: # make sure the results are the same). Instead, the search tends to return only items from the index_1 file (none from the index_2 or index_3 files), which eventually leads to an assertion error, i.e. assert np.all(sI == I[i]). If I start with index_2 instead (in cell 6: faiss.extract_index_ivf(index).invlists), it returns items only from index_2. It seems that after merging, the search considers only the first index file it encounters.
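
To make the expected behavior concrete, here is a minimal sketch of the invariant that the assertion checks: the top-k results of a merged index should equal the per-shard top-k results combined and re-ranked. This is not the gist's code and it does not use Faiss; it is a brute-force, plain-Python stand-in, and the helper names (`search`, `merge_results`), shard sizes, and data are all hypothetical.

```python
import heapq
import random


def search(vectors, ids, query, k):
    """Brute-force k-NN: return the k smallest (squared L2 distance, id) pairs."""
    scored = [
        (sum((a - b) ** 2 for a, b in zip(vec, query)), vid)
        for vec, vid in zip(vectors, ids)
    ]
    return heapq.nsmallest(k, scored)


def merge_results(per_shard_results, k):
    """Combine per-shard top-k lists into one global top-k list."""
    return heapq.nsmallest(k, (hit for hits in per_shard_results for hit in hits))


random.seed(0)
dim, k = 8, 5

# Three hypothetical shards with disjoint id ranges (stand-ins for index_1..index_3).
shards = []
next_id = 0
for size in (40, 30, 50):
    vecs = [[random.random() for _ in range(dim)] for _ in range(size)]
    ids = list(range(next_id, next_id + size))
    shards.append((vecs, ids))
    next_id += size

query = [random.random() for _ in range(dim)]

# Search each shard separately, then merge the partial results.
sharded = merge_results([search(v, i, query, k) for v, i in shards], k)

# Search the concatenation of all shards (the "merged index").
all_vecs = [vec for v, _ in shards for vec in v]
all_ids = [vid for _, i in shards for vid in i]
merged = search(all_vecs, all_ids, query, k)

# The invariant: both approaches must return the same neighbors.
same_results = sharded == merged
assert same_results
```

If a merged index returned ids from only one shard's range, this comparison would fail in the same way as the assertion described above, assuming the true neighbors are spread across shards.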