The core idea of late interaction, as implemented in the ColPali paper, is simple:
- We have query embeddings: an array of n vectors, each of size 128 (floats).
- We have document embeddings: an array of 1038 vectors, each of size 128 (floats).
- For each of the n query vectors, we find the most similar document vector by computing the dot product.

Here is simple code for a straightforward dot-product similarity:
import numpy as np
# Define the query vector
query = np.array([1, 2, 3])
# Define the document vectors
document_a = np.array([4, 5, 6])
document_b = np.array([7, 8, 0])
# Compute the dot product between the query and each document
dot_product_a = np.dot(query, document_a) # 1*4 + 2*5 + 3*6 = 32
dot_product_b = np.dot(query, document_b) # 1*7 + 2*8 + 3*0 = 23
# Output the results
print(f"Dot product between query and document a: {dot_product_a}")
print(f"Dot product between query and document b: {dot_product_b}")
# document a is more similar than document b to our query
Late interaction works like this: take each query vector, find the document vector it matches best (using the dot product), and sum these best-match scores. This max-then-sum operation is known as MaxSim. Here's a simple example that shows exactly how it works:
import numpy as np
# One query generates multiple vectors (each vector is 128 floats)
query_vector_1 = np.array([1, 2, 3]) # simplified from 128 floats
query_vector_2 = np.array([0, 1, 1]) # simplified from 128 floats
query = [query_vector_1, query_vector_2] # one query -> n vectors
# One document generates multiple vectors (each vector is 128 floats)
document_vector_1 = np.array([4, 5, 6]) # simplified from 128 floats
document_vector_2 = np.array([7, 8, 0]) # simplified from 128 floats
document_vector_3 = np.array([1, 1, 1]) # simplified from 128 floats
document = [document_vector_1, document_vector_2, document_vector_3] # one document -> 1038 vectors
# For each vector in our query, find its maximum dot product with ANY vector in the document
# Then sum these maximums
# First query vector against ALL document vectors
dot_products_vector1 = [
np.dot(query_vector_1, document_vector_1), # 1*4 + 2*5 + 3*6 = 32
np.dot(query_vector_1, document_vector_2), # 1*7 + 2*8 + 3*0 = 23
np.dot(query_vector_1, document_vector_3) # 1*1 + 2*1 + 3*1 = 6
]
max_similarity_vector1 = max(dot_products_vector1) # 32
# Second query vector against ALL document vectors
dot_products_vector2 = [
np.dot(query_vector_2, document_vector_1), # 0*4 + 1*5 + 1*6 = 11
np.dot(query_vector_2, document_vector_2), # 0*7 + 1*8 + 1*0 = 8
np.dot(query_vector_2, document_vector_3) # 0*1 + 1*1 + 1*1 = 2
]
max_similarity_vector2 = max(dot_products_vector2) # 11
# Final similarity score is the sum of maximum similarities
final_score = max_similarity_vector1 + max_similarity_vector2 # 32 + 11 = 43
print(f"Final similarity score: {final_score}")
Let's break down the scale. For each comparison:
- Query: n vectors
- Document: 1038 vectors per page
- Corpus: 500 pages

So for ONE query against ONE page we're doing n * 1038 vector comparisons, and for a 500-page corpus: n * 1038 * 500 vector comparisons.
This creates a massive Cartesian product. For example, if n = 10 for the query:
- 10,380 comparisons per page
- 5,190,000 comparisons for 500 pages
The complexity per page is O(n * m), where:
- n = number of query vectors
- m = number of document vectors
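To make that concrete, here is a minimal sketch that scores one query against a whole corpus in a single batched operation. The shapes follow the numbers above (n = 10, 1038 vectors per page, 500 pages, 128 floats per vector); the random embeddings are stand-ins for real model output:

import numpy as np
rng = np.random.default_rng(0)
n, m, pages, dim = 10, 1038, 500, 128
query = rng.standard_normal((n, dim), dtype=np.float32)          # one query -> n vectors
corpus = rng.standard_normal((pages, m, dim), dtype=np.float32)  # 500 pages of 1038 vectors
# All pairwise dot products for every page at once: shape (pages, n, m)
similarities = np.einsum("nd,pmd->pnm", query, corpus)
# MaxSim per page: max over the document vectors, then sum over the query vectors
scores = similarities.max(axis=2).sum(axis=1)                    # shape (pages,)
print(f"Vector comparisons: {n * m * pages:,}")                  # 5,190,000
print(f"Best page: {scores.argmax()} (score {scores.max():.2f})")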