# Qdrant

Vector databases are a relatively new way for interacting with abstract data representations derived from opaque machine learning models such as deep learning architectures. These representations are often called vectors or embeddings and they are a compressed version of the data used to train a machine learning model to accomplish a task like sentiment analysis, speech recognition, object detection, and many others.

These new databases shine in many applications like semantic search and recommendation systems, and here, we’ll learn about one of the most popular and fastest growing vector databases in the market, Qdrant.


Concepts

  1. Collection
  2. Payload
  3. Point
  4. Search
  5. Filtering
  6. Optimizer
  7. Storage
  8. Indexing
  9. Snapshots

Collections

A collection is a named set of points (vectors with a payload) among which you can search. The vector of each point within the same collection must have the same dimensionality and be compared by a single metric. Named vectors can be used to have multiple vectors in a single point, each of which can have their own dimensionality and metric requirements.

When should you create multiple collections?

  • When you have a limited number of users and you need isolation. This approach is flexible, but it may be more costly, since creating numerous collections may result in resource overhead. Also, you need to ensure that they do not affect each other in any way, including performance-wise.

Collection with multiple vectors

It is possible to have multiple vectors per record. This feature allows for multiple vector storages per collection. To distinguish vectors in one record, they should have a unique name defined when creating the collection. Each named vector in this mode has its own distance metric and dimensionality.
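
A minimal sketch with the Python client (qdrant-client), assuming a local instance at localhost:6333; the collection name, vector names, and sizes are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")  # assumes a local Qdrant instance

# Each named vector gets its own dimensionality and distance metric.
client.create_collection(
    collection_name="example_collection",
    vectors_config={
        "image": VectorParams(size=512, distance=Distance.DOT),
        "text": VectorParams(size=384, distance=Distance.COSINE),
    },
)
```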

Payload

One of the significant features of Qdrant is the ability to store additional information along with vectors. This information is called payload in Qdrant terminology.

Payload types

In addition to storing payloads, Qdrant also allows you to search based on certain kinds of values. This feature is implemented as additional filters during the search and enables you to incorporate custom logic on top of semantic similarity.

Payload indexing

To search more efficiently with filters, Qdrant allows you to create indexes for payload fields by specifying the field name and the type of data it holds. The indexed fields also affect the vector index.

In practice, we recommend creating an index on those fields that could potentially constrain the results the most. For example, an index on the object ID, which is unique for each record, will be much more efficient than an index on its color, which has only a few possible values.
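
A minimal sketch of creating such an index with the Python client; the collection name and the payload field "object_id" are placeholders:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Index a high-selectivity payload field so filters on it can be resolved quickly.
client.create_payload_index(
    collection_name="example_collection",
    field_name="object_id",
    field_schema="keyword",
)
```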

Points

The points are the central entity that Qdrant operates with. A point is a record consisting of a vector and an optional payload. You can search among the points grouped in one collection based on vector similarity.

Any point modification operation is asynchronous and takes place in 2 steps.

  • At the first stage, the operation is written to the Write-ahead-log. After this moment, the service will not lose the data, even if the machine loses power.
  • At the second stage, the operation is applied to the collection's segments.

Upload points

  • The Qdrant API supports two ways of creating batches - record-oriented and column-oriented.
  • Internally, these options do not differ and are made only for the convenience of interaction.

All APIs in Qdrant, including point loading, are idempotent. It means that executing the same method several times in a row is equivalent to a single execution.
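
A sketch of both batch formats with the Python client, assuming a hypothetical demo_collection configured with a single 4-dimensional vector:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Batch, PointStruct

client = QdrantClient(url="http://localhost:6333")

# Record-oriented: one PointStruct per record.
client.upsert(
    collection_name="demo_collection",
    points=[
        PointStruct(id=1, vector=[0.05, 0.61, 0.76, 0.74], payload={"color": "red"}),
        PointStruct(id=2, vector=[0.19, 0.81, 0.75, 0.11], payload={"color": "blue"}),
    ],
)

# Column-oriented: parallel lists of ids, vectors and payloads.
client.upsert(
    collection_name="demo_collection",
    points=Batch(
        ids=[3, 4],
        vectors=[[0.36, 0.55, 0.47, 0.94], [0.18, 0.01, 0.85, 0.80]],
        payloads=[{"color": "green"}, {"color": "red"}],
    ),
)
```

Because upserts are idempotent, re-running either call with the same ids simply overwrites the same points.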

Similarity search

Searching for the nearest vectors is at the core of many representational learning applications. Modern neural networks are trained to transform objects into vectors so that objects close in the real world appear close in vector space. It could be, for example, texts with similar meanings, visually similar pictures, or songs of the same genre.


Metrics

For cosine similarity, Qdrant computes the metric in 2 steps, which achieves a higher search speed.

  • The first step is to normalize the vector when adding it to the collection. It happens only once for each vector.
  • The second step is the comparison of vectors. With pre-normalized vectors, this becomes equivalent to a dot product - a very fast operation thanks to SIMD.
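
A small numpy sketch of why the two-step approach works: once both vectors have been normalized, cosine similarity reduces to a single dot product (numpy is used here only for illustration).

```python
import numpy as np

a = np.array([0.2, 0.9, 0.1])
b = np.array([0.3, 0.8, 0.2])

# Cosine similarity computed directly.
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Normalize once up front; the per-comparison cost is then just a dot product.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
assert np.isclose(cosine, a_n @ b_n)
```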

Query planning

Depending on the filter used in the search - there are several possible scenarios for query execution. Qdrant chooses one of the query execution options depending on the available indexes, the complexity of the conditions and the cardinality of the filtering result. This process is called query planning.

The strategy selection process relies heavily on heuristics and can vary from release to release. However, the general principles are:

  • planning is performed for each segment independently
  • prefer a full scan if the amount of points is below a threshold
  • estimate the cardinality of a filtered result before selecting a strategy
  • retrieve points using payload index (see indexing) if cardinality is below threshold
  • use filterable vector index if the cardinality is above a threshold

Search API

Values under the key params specify custom parameters for the search. Currently, it could be:

  • hnsw_ef - value that specifies the ef parameter of the HNSW algorithm.
  • exact - option to not use the approximate search (ANN). If set to true, the search may run for a long time as it performs a full scan to retrieve exact results.
  • indexed_only - With this option you can disable the search in those segments where the vector index is not built yet. This may be useful if you want to minimize the impact on search performance while the collection is also being updated. Using this option may lead to partial results if the collection is not fully indexed yet, so consider using it only if eventual consistency is acceptable for your use case.

Payload and vector in the result

By default, retrieval methods do not return any stored information such as payload and vectors. Additional parameters with_vectors and with_payload alter this behavior.
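A sketch combining the custom search params above with payload/vector retrieval, using the Python client's search call; the collection name and query vector are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import SearchParams

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="demo_collection",
    query_vector=[0.2, 0.1, 0.9, 0.7],
    search_params=SearchParams(hnsw_ef=128, exact=False),  # custom HNSW parameters
    with_payload=True,    # return the stored payload with each hit
    with_vectors=False,   # skip returning the vectors themselves
    limit=5,
)
```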

Batch search API

  • The batch search API enables you to perform multiple search requests via a single request.
  • Its semantics are straightforward: n batched search requests are equivalent to n singular search requests.
  • This approach has several advantages. Logically, fewer network connections are required, which can be very beneficial on its own.
  • More importantly, batched requests will be efficiently processed via the query planner, which can detect and optimize requests if they have the same filter.
  • This can have a great effect on latency for non-trivial filters, as the intermediary results can be shared among the requests.
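
A sketch using the Python client's search_batch call; the collection name and vectors are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import SearchRequest

client = QdrantClient(url="http://localhost:6333")

results = client.search_batch(
    collection_name="demo_collection",
    requests=[
        SearchRequest(vector=[0.2, 0.1, 0.9, 0.7], limit=3),
        SearchRequest(vector=[0.5, 0.3, 0.2, 0.3], limit=3),
    ],
)
# `results` holds one list of hits per request, in the same order as the requests.
```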

Recommendation API

In addition to the regular search, Qdrant also allows you to search based on multiple positive and negative examples. The API is called recommend, and the examples can be point IDs, so that you can leverage the already encoded objects; and, as of v1.6, you can also use raw vectors as input, so that you can create your vectors on the fly without uploading them as points.

Average vector strategy

The default and first strategy added to Qdrant is called average_vector. It preprocesses the input examples to create a single vector that is used for the search. Since the preprocessing step happens very fast, the performance of this strategy is on-par with regular search. The intuition behind this kind of recommendation is that each vector component represents an independent feature of the data, so, by averaging the examples, we should get a good recommendation.

Best score strategy

It is based on the idea that the best way to find similar vectors is to find the ones that are closer to a positive example, while avoiding the ones that are closer to a negative one. The way it works is that each candidate is measured against every example, then we select the best positive and best negative scores.

Since we are computing similarities to every example at each step of the search, the performance of this strategy will be linearly impacted by the amount of examples. This means that the more examples you provide, the slower the search will be. However, this strategy can be very powerful and should be more embedding-agnostic.

Using only negative examples

A beneficial side-effect of best_score strategy is that you can use it with only negative examples. This will allow you to find the most dissimilar vectors to the ones you provide. This can be useful for finding outliers in your data, or for finding the most dissimilar vectors to a given one. Combining negative-only examples with filtering can be a powerful tool for data exploration and cleaning.
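
A sketch of the recommend API with the Python client, assuming qdrant-client >= 1.6 so the strategy parameter is available; the point IDs and collection name are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import RecommendStrategy

client = QdrantClient(url="http://localhost:6333")

# Positive and negative examples given as point IDs, scored with best_score.
hits = client.recommend(
    collection_name="demo_collection",
    positive=[100, 231],
    negative=[718],
    strategy=RecommendStrategy.BEST_SCORE,
    limit=10,
)

# Negative-only: find the points most dissimilar to the example.
outliers = client.recommend(
    collection_name="demo_collection",
    negative=[718],
    strategy=RecommendStrategy.BEST_SCORE,
    limit=10,
)
```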

Pagination

Vector-based retrieval in general, and the HNSW index in particular, are not designed to be paginated. It is impossible to retrieve the Nth closest vector without retrieving the first N vectors first. However, using the offset parameter saves resources by reducing network traffic and the number of times the storage is accessed.
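
For example, requesting the second page of results with the Python client (collection name and query vector are placeholders); the engine still evaluates the first limit + offset candidates internally:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Second page of 10 results: 20 candidates are retrieved internally,
# but only the last 10 are returned to the caller.
page_2 = client.search(
    collection_name="demo_collection",
    query_vector=[0.2, 0.1, 0.9, 0.7],
    limit=10,
    offset=10,
)
```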

Grouping API

With the groups API, you will be able to get the best N points for each document, assuming that the payload of the points contains the document ID. Of course, there will be times when the best N points cannot be fulfilled, due to a lack of points or a large distance with respect to the query.
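
A sketch using the Python client's search_groups call; the collection name and the "document_id" payload key are placeholders:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Up to 2 best points per distinct "document_id" value, for at most 5 groups.
groups = client.search_groups(
    collection_name="demo_collection",
    query_vector=[0.2, 0.1, 0.9, 0.7],
    group_by="document_id",
    group_size=2,
    limit=5,
)
```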

Filtering

With Qdrant, you can set conditions when searching or retrieving points. For example, you can impose conditions on both the payload and the id of the point.
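
A sketch of a filtered search with the Python client, combining a payload condition with an id condition; all names and values are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, HasIdCondition, MatchValue

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="demo_collection",
    query_vector=[0.2, 0.1, 0.9, 0.7],
    query_filter=Filter(
        must=[
            FieldCondition(key="color", match=MatchValue(value="red")),  # payload condition
            HasIdCondition(has_id=[1, 3, 5]),                            # condition on point ids
        ]
    ),
    limit=5,
)
```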

Optimizer

It is much more efficient to apply changes in batches than perform each change individually, as many other databases do. Qdrant here is no exception. Since Qdrant operates with data structures that are not always easy to change, it is sometimes necessary to rebuild those structures completely.


Vacuum Optimizer

The simplest example of a case where you need to rebuild a segment repository is to remove points. Like many other databases, Qdrant does not delete entries immediately after a query. Instead, it marks records as deleted and ignores them for future queries.

This strategy allows us to minimize disk access - one of the slowest operations. However, a side effect of this strategy is that, over time, deleted records accumulate, occupy memory and slow down the system.

To avoid these adverse effects, Vacuum Optimizer is used. It is used if the segment has accumulated too many deleted records.

Merge Optimizer

The service may require the creation of temporary segments. Such segments, for example, are created as copy-on-write segments during optimization itself.

It is also essential to have at least one small segment that Qdrant will use to store frequently updated data. On the other hand, too many small segments lead to suboptimal search performance. There is the Merge Optimizer, which combines the smallest segments into one large segment. It is used if too many segments are created.

Indexing Optimizer

Qdrant allows you to choose the type of indexes and data storage methods used depending on the number of records. So, for example, if the number of points is less than 10000, using any index would be less efficient than a brute force scan. The Indexing Optimizer enables indexes and memmap storage once the minimal number of records is reached.
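
The threshold is part of the optimizer configuration and can be adjusted per collection. A sketch with the Python client, assuming the update_collection call with an optimizer config diff; the exact semantics of the threshold value (points vs. kilobytes) depend on the Qdrant version:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import OptimizersConfigDiff

client = QdrantClient(url="http://localhost:6333")

# Build the vector index only once a segment grows past the threshold.
client.update_collection(
    collection_name="demo_collection",
    optimizer_config=OptimizersConfigDiff(indexing_threshold=10000),
)
```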

Storage

All data within one collection is divided into segments. Each segment has its independent vector and payload storage as well as indexes. Data stored in segments usually do not overlap. However, storing the same point in different segments will not cause problems since the search contains a deduplication mechanism.

The segments consist of vector and payload storages, vector and payload indexes, and an id mapper, which stores the relationship between internal and external ids. A segment can be appendable or non-appendable depending on the type of storage and index used. You can freely add, delete and query data in an appendable segment. A non-appendable segment only allows you to read and delete data. The configuration of the segments in the collection can be different and independent of one another, but at least one appendable segment must be present in a collection.

Vector storage

Depending on the requirements of the application, Qdrant can use one of the data storage options. The choice has to be made between the search speed and the size of the RAM used.

  • In-memory storage - Stores all vectors in RAM, has the highest speed since disk access is required only for persistence.
  • Memmap storage - Creates a virtual address space associated with the file on disk. Mmapped files are not directly loaded into RAM. Instead, they use page cache to access the contents of the file. This scheme allows flexible use of available memory. With sufficient RAM, it is almost as fast as in-memory storage.
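
Memmap storage for the original vectors can be requested at collection-creation time. A sketch with the Python client, assuming a client version that supports the on_disk flag of VectorParams; the collection name and size are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# on_disk=True keeps the original vectors in memmap files instead of RAM.
client.create_collection(
    collection_name="demo_collection_on_disk",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE, on_disk=True),
)
```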

Versioning

To ensure data integrity, Qdrant performs all data changes in 2 stages. In the first stage, the data is written to the Write-ahead-log (WAL), which orders all operations and assigns them a sequential number. In the second stage, the operations are applied to the segments, each of which stores the version of the last change applied to it.

Indexing

A key feature of Qdrant is the effective combination of vector and traditional indexes. It is essential to have this because for vector search to work effectively with filters, having vector index only is not enough. In simpler terms, a vector index speeds up vector search, and payload indexes speed up filtering.

Payload Index

Payload index in Qdrant is similar to the index in conventional document-oriented databases. This index is built for a specific field and type, and is used for quick point requests by the corresponding filtering condition. The index is also used to accurately estimate the filter cardinality, which helps the query planning choose a search strategy.

A payload index may occupy some additional memory, so it is recommended to only index those fields that are used in filtering conditions. If you need to filter by many fields and the memory limits do not allow indexing all of them, it is recommended to choose the field that limits the search result the most.

Vector Index

A vector index is a data structure built on vectors through a specific mathematical model. Through the vector index, we can efficiently query several vectors similar to the target vector. Qdrant currently only uses HNSW as a vector index. HNSW (Hierarchical Navigable Small World Graph) is a graph-based indexing algorithm. It builds a multi-layer navigation structure over the data according to certain rules. In this structure, the upper layers are sparser and the distances between nodes are greater; the lower layers are denser and the distances between nodes are smaller. The search starts from the uppermost layer, finds the node closest to the target in that layer, and then enters the next layer to begin another search. After multiple iterations, it can quickly approach the target position.
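
HNSW parameters can be set when creating a collection. A sketch with the Python client; the m and ef_construct values below are illustrative, not recommendations:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, VectorParams

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="demo_collection_hnsw",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
    # m: number of graph links per node; ef_construct: candidate list size while building.
    hnsw_config=HnswConfigDiff(m=16, ef_construct=100),
)
```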

Snapshots

Snapshots are performed on a per-collection basis and consist of a tar archive file containing the necessary data to restore the collection at the time of the snapshot.
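
A sketch of creating and listing snapshots with the Python client; the collection name is a placeholder:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

snapshot_info = client.create_snapshot(collection_name="demo_collection")
print(snapshot_info)                              # snapshot name, creation time, size
print(client.list_snapshots("demo_collection"))   # all snapshots of this collection
```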

# HNSW

Graph based nearest neighbors

  1. Construct a proximity graph (connect each node to its closest neighbors).
  2. Given a query (Q), we choose a random initial node and make greedy steps towards the query on the graph.
  3. At each step on the graph, we evaluate the neighbors of the current element and pick the one closest to the query.
  4. We proceed until we cannot make any more steps.
  5. We either find the nearest neighbor or get stuck in a local minimum (see the sketch below).
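
A toy Python sketch of this greedy walk on a hand-made proximity graph; the graph, vectors and query are made up purely for illustration:

```python
import math

def greedy_search(graph, vectors, query, entry):
    """Walk the proximity graph from `entry`, always moving to the neighbor
    closest to `query`, until no neighbor improves the distance."""
    current = entry
    current_dist = math.dist(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = math.dist(vectors[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:                 # local minimum: no neighbor is closer
            return current, current_dist
        current, current_dist = best, best_dist

# Toy data: node id -> coordinates, and node id -> list of neighbor ids.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.2), 2: (2.1, 0.1), 3: (3.0, 0.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_search(graph, vectors, query=(2.9, 0.1), entry=0))  # ends at node 3
```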


HNSW


  1. HNSW has a multi-layer graph structure.
  2. The lowest layer contains all the elements.
  3. To create each next layer, we subsample a fraction of the elements from the previous layer.
  4. At each layer we create a proximity graph.
  5. The search starts at the top layer and greedily traverses the elements until a local minimum is reached.
  6. Then we switch to the lower layer.
  7. Continue down the layers and repeat (sketched below).
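
A toy Python sketch of the layered descent, reusing the greedy walk from the previous sketch as the per-layer step; the layers, vectors and query are made up purely for illustration:

```python
import math

def greedy_step(graph, vectors, query, entry):
    """Greedy walk inside a single layer (same routine as the earlier sketch)."""
    current = entry
    while True:
        closer = min(graph[current] + [current],
                     key=lambda n: math.dist(vectors[n], query))
        if closer == current:
            return current
        current = closer

def hnsw_search(layers, vectors, query):
    """Start at the sparse top layer and descend, reusing the local minimum
    of each layer as the entry point of the next, denser layer."""
    entry = next(iter(layers[-1]))       # any node of the top layer
    for graph in reversed(layers):       # top layer -> bottom layer
        entry = greedy_step(graph, vectors, query, entry)
    return entry

# Toy data: layer 0 holds all points, layer 1 holds a subsample of them.
vectors = {i: (float(i), 0.0) for i in range(8)}
layer0 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
layer1 = {0: [4], 4: [0, 7], 7: [4]}
print(hnsw_search([layer0, layer1], vectors, query=(5.2, 0.0)))  # returns 5
```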

Other techniques:

  1. Graph Diversification
  2. Beam Search