@Jonathan-Adly
Created December 2, 2024 16:18
The core idea of late interaction, as implemented in the ColPali paper, in simple terms:

  • We have query embeddings: an array with n vectors, each of size 128 (floats).

  • We have document embeddings: an array with 1038 vectors, each of size 128 (floats).

  • For each of the n query vectors, we find the most similar document vector by computing the dot product. Here is simple code for a straightforward dot-product similarity.

import numpy as np

# Define the query vector
query = np.array([1, 2, 3])

# Define the document vectors
document_a = np.array([4, 5, 6])
document_b = np.array([7, 8, 0])

# Compute the dot product between the query and each document
dot_product_a = np.dot(query, document_a)  # 1*4 + 2*5 + 3*6 = 32
dot_product_b = np.dot(query, document_b)  # 1*7 + 2*8 + 3*0 = 23

# Output the results
print(f"Dot product between query and document a: {dot_product_a}")
print(f"Dot product between query and document b: {dot_product_b}")
# document a is more similar than document b to our query

Late interaction works like this: take each query vector, find the document vector it matches best (using the dot product), and sum up all these best matches. This is the MaxSim operation. Here's a simple example that shows exactly how it works:

import numpy as np

# One query generates multiple vectors (each vector is 128 floats)
query_vector_1 = np.array([1, 2, 3])  # simplified from 128 floats
query_vector_2 = np.array([0, 1, 1])  # simplified from 128 floats
query = [query_vector_1, query_vector_2]  # one query -> n vectors

# One document generates multiple vectors (each vector is 128 floats)
document_vector_1 = np.array([4, 5, 6])  # simplified from 128 floats
document_vector_2 = np.array([7, 8, 0])  # simplified from 128 floats
document_vector_3 = np.array([1, 1, 1])  # simplified from 128 floats
document = [document_vector_1, document_vector_2, document_vector_3]  # one document -> 1038 vectors

# For each vector in our query, find its maximum dot product with ANY vector in the document
# Then sum these maximums

# First query vector against ALL document vectors
dot_products_vector1 = [
    np.dot(query_vector_1, document_vector_1),  # 1*4 + 2*5 + 3*6 = 32
    np.dot(query_vector_1, document_vector_2),  # 1*7 + 2*8 + 3*0 = 23
    np.dot(query_vector_1, document_vector_3)   # 1*1 + 2*1 + 3*1 = 6
]
max_similarity_vector1 = max(dot_products_vector1)  # 32

# Second query vector against ALL document vectors
dot_products_vector2 = [
    np.dot(query_vector_2, document_vector_1),  # 0*4 + 1*5 + 1*6 = 11
    np.dot(query_vector_2, document_vector_2),  # 0*7 + 1*8 + 1*0 = 8
    np.dot(query_vector_2, document_vector_3)   # 0*1 + 1*1 + 1*1 = 2
]
max_similarity_vector2 = max(dot_products_vector2)  # 11

# Final similarity score is the sum of maximum similarities
final_score = max_similarity_vector1 + max_similarity_vector2  # 32 + 11 = 43

print(f"Final similarity score: {final_score}")
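The two explicit loops above can be collapsed into a single matrix multiplication. This is a sketch of the same MaxSim computation with the toy vectors stacked into arrays (not the ColPali reference code):

```python
import numpy as np

# Same toy vectors as above, stacked into matrices
query = np.array([[1, 2, 3],
                  [0, 1, 1]])      # shape (n, d) = (2, 3)
document = np.array([[4, 5, 6],
                     [7, 8, 0],
                     [1, 1, 1]])   # shape (m, d) = (3, 3)

# All n * m dot products at once: (n, d) @ (d, m) -> (n, m)
similarity_matrix = query @ document.T

# MaxSim: best document vector for each query vector, then sum
score = similarity_matrix.max(axis=1).sum()
print(score)  # 43, matching the loop version above
```

With real embeddings the shapes would be (n, 128) and (1038, 128); the computation is identical.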

Let's break down the scale. For each comparison:

  • Query: n vectors

  • Document: 1038 vectors per page

  • Corpus: 500 pages

So for ONE query against ONE page we're doing n * 1038 vector comparisons, and for a 500-page corpus: n * 1038 * 500 vector comparisons.

This creates a massive Cartesian product. For example, if n=10 (for the query):

  • 10,380 comparisons per page

  • 5,190,000 comparisons for 500 pages
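These counts are simple arithmetic; this snippet just verifies them:

```python
n, m, pages = 10, 1038, 500

per_page = n * m          # 10 * 1038 = 10,380 comparisons per page
total = per_page * pages  # 10,380 * 500 = 5,190,000 for the corpus

print(per_page, total)  # 10380 5190000
```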

The per-page cost grows as O(n * m), where:

  • n = number of query vectors

  • m = number of document vectors
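As a sketch of what this looks like at corpus scale, here is the MaxSim score computed for every page with a single einsum. The sizes are scaled down from the real ones (d=128, m=1038, 500 pages) so it runs instantly, and the random arrays are stand-ins for actual model embeddings:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative sizes, scaled down from n=10, m=1038, d=128, 500 pages
n, m, d, pages = 10, 64, 32, 20

query = rng.standard_normal((n, d))
corpus = rng.standard_normal((pages, m, d))  # one (m, d) embedding matrix per page

# All pages * n * m dot products: (pages, m, d) x (n, d) -> (pages, n, m)
sims = np.einsum('pmd,nd->pnm', corpus, query)

# MaxSim per page: max over document vectors (m), sum over query vectors (n)
scores = sims.max(axis=2).sum(axis=1)  # shape (pages,)

print(scores.shape)          # (20,)
print(int(scores.argmax()))  # index of the best-matching page
```

Even vectorized, the number of dot products is unchanged; that is why systems built on late interaction typically add a cheaper first-stage filter before running full MaxSim.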
