In simple terms, the core idea behind late interaction as implemented in the ColPali paper is:
- We have query embeddings: an array of n vectors, each holding 128 floats.
- We have document embeddings: an array of 1038 vectors, each holding 128 floats.
- For each of the n query vectors, we find the most similar document vector by computing the dot product.

Here is simple code for a straightforward dot product similarity:
import numpy as np