150-char summary: ruvector adds MUVERA Fixed Dimensional Encodings (NeurIPS 2024): compress ColBERT token sets to single vectors, search at 301× brute-force speed in pure safe Rust.
ColBERT-style multi-vector retrieval achieves state-of-the-art recall on text search tasks — but at a steep scalability cost. Every query requires computing MaxSim across all document token embeddings: for 5 million docs with 128 tokens each, that is tens of trillions of operations per query. Existing engines like PLAID work around this with custom centroid pruning infrastructure, but none generalise cleanly to standard HNSW or DiskANN indices.
ruvector-muvera implements MUVERA Fixed Dimensional Encodings (arXiv:2405.19504, NeurIPS 2024, Google Research) in pure safe Rust. FDE compresses each multi-vector document set into a single fixed-length vector via SimHash space partitioning and Rademacher random projection, enabling any standard ANN index (HNSW, IVF, DiskANN) to serve multi-vector queries with a formal ε-approximation guarantee on Chamfer/MaxSim similarity.
- 329× QPS improvement over brute-force MaxSim (FDE-small, 5K docs, d=128)
- 16× memory reduction per document (1 KB vs 16 KB for 32-token ColBERT doc)
- Zero training — data-oblivious random matrices, serialisable with serde
- VectorBackend trait — swap FlatBackend for HNSW or IVF without changing encoder
- Pure safe Rust — no
unsafe, no C bindings, no Python glue - Formal guarantee: 𝔼[⟨FDE(Q), FDE(S)⟩] = Chamfer(Q,S) ± ε (Theorem 2.1)
- 12/12 unit + doc tests, criterion benchmarks, two synthetic benchmark sections
| Variant | Latency / query | Throughput | Speedup |
|---|---|---|---|
| BruteForce MaxSim | 61.8 ms | 16.2K docs/s | 1× |
| FDE-small (B=8, dp=8, R=4) | 205 µs | 4.88M docs/s | 301× |
| FDE-medium (B=16, dp=16, R=4) | 865 µs | 1.16M docs/s | 71× |
| FDE-large (B=32, dp=16, R=4) | 1.87 ms | 533K docs/s | 33× |
| Config | Latency |
|---|---|
| B=8, dp=8, R=4 | 49 µs/doc |
| B=16, dp=16, R=4 | 178 µs/doc |
| B=32, dp=16, R=4 | 459 µs/doc |
| Variant | Recall@10 | QPS | Memory | Speedup |
|---|---|---|---|---|
| BruteForce MaxSim | 1.000 | 13 | 39.06 MB | 1× |
| FDE-small | 0.098 | 1,043 | 4.88 MB | 80× |
| FDE-medium | 0.169 | 257 | 19.53 MB | 20× |
Hardware: x86_64 Linux, 4 logical CPUs, rustc 1.94, no SIMD libraries.
| System | Multi-vector approach | Scalability | FDE support |
|---|---|---|---|
| Qdrant 1.11 | Late interaction re-ranking | Bounded by N×m ops | ❌ |
| Milvus 2.5 | Sparse+dense hybrid | N/A for ColBERT | ❌ |
| LanceDB 0.9 | XTR centroid approximation | Custom index | ❌ |
| Weaviate 1.27 | Single-vector only | N/A | ❌ |
| PLAID (ColBERT v2) | Centroid pruning + MaxSim | Custom inverted index | ❌ |
| ruvector-muvera | FDE + HNSW/IVF | O(log n) | ✅ (this PR) |
FDE divides the embedding space into B SimHash buckets using k_sim=log₂(B) random Gaussian hyperplanes. For a document token set, it computes per-bucket centroid vectors, then projects each centroid through a random ±1 Rademacher matrix to d_proj dimensions. Repeating this R times and concatenating produces a single vector of length R×B×d_proj whose inner product with a query FDE vector approximates the MaxSim score with provable bounds.
use ruvector_muvera::{FdeConfig, FdeEncoder, MuveraIndex};
use rand::SeedableRng;
let cfg = FdeConfig { dim: 128, buckets: 16, d_proj: 16, reps: 4 };
let mut rng = rand::rngs::StdRng::seed_from_u64(42);
let encoder = FdeEncoder::new(cfg, &mut rng).unwrap();
let mut index = MuveraIndex::new(encoder);
// Insert document with ColBERT-style token embeddings
index.insert("doc1".into(), &token_embeddings).unwrap();
// Search with query token embeddings → returns top-k by FDE dot-product
let results = index.search(&query_tokens, 10).unwrap();- HNSW backend: Plug
ruvector-coreHNSW viaVectorBackendtrait → O(log n) - SIMD dot products: AVX2 autovectorisation on the flat scan inner loop → 2-4×
- RaBitQ FDE compression: Apply ruvector's 1-bit rotation quantization to FDE vectors → 128× storage at small recall cost
- rayon parallelisation: Index build is embarrassingly parallel per-document
- Adaptive B: Choose B per-corpus based on token cluster density estimation
git clone https://github.com/ruvnet/RuVector.git
cd RuVector
git checkout research/nightly/2026-05-08-muvera-fde
# Run benchmarks
cargo run --release -p ruvector-muvera --bin muvera-demo
# Run tests
cargo test -p ruvector-muvera
# Criterion benchmarks
cargo bench -p ruvector-muvera- Repository: https://github.com/ruvnet/RuVector
- Research branch:
research/nightly/2026-05-08-muvera-fde - Draft PR: ruvnet/RuVector#436
- Research doc:
docs/research/nightly/2026-05-08-muvera-fde/README.md - ADR-193:
docs/adr/ADR-193-muvera-fde.md - Paper: arXiv:2405.19504 (NeurIPS 2024, Google Research)
Keywords: ruvector rust vector database multi-vector ColBERT late-interaction MUVERA FDE SimHash approximate nearest neighbor HNSW high-performance embedding search NeurIPS 2024