Model | Type | Strengths | Weaknesses | Best Use Case |
---|---|---|---|---|
DINOv2 (Meta, 2023) | Vision-only self-supervised | - Excellent pure visual embeddings (objects, scenes) - Strong on fine-grained features - Great for clustering & recognition | - No text alignment (needs external captioner + text embedder) - Very heavy (ViT-g/14 = 1B+ params) | Use when you need the best visual understanding (object-level indexing) and can afford GPU compute. |
CLIP (OpenAI, 2021) | Vision–Language joint model | - Strong cross-modal embeddings (text ↔ image) - Efficient for retrieval - Smaller versions available (ViT-B/32, ViT-L/14) - Widely adopted, easy to use | - Slightly weaker than DINOv2 in pure visual features - Does not generate captions itself (BLIP is better suited for descriptive captions) | Best all-rounder for retrieval. Works well when you want efficient, deployable text + image search. |
BLIP / BLIP-2 (Salesforce, 2022) | Captioning + Vision–Language | - Excellent caption generation - Good at VQA and multimodal tasks - Captions can |
Feature | Query Transformation | Multi-Query Generation | Query Decomposition |
---|---|---|---|
What it does | Rewrites a single query | Creates multiple query variants | Splits query into sub-questions |
Goal | Improve phrasing for better match | Broaden retrieval coverage | Handle complexity, improve accuracy |
Input | One original query | One original query | One multi-part or complex query |
Output | One transformed query | Several alternative queries | Several focused sub-questions |
When to use | Ambiguous or informal queries | Low recall or vocabulary mismatch | Multi-part or reasoning-heavy queries |
Pitfall | Issue | Solution |
---|---|---|
Embedding chunks too large | Silent truncation reduces accuracy. | Limit chunks to embedding model's maximum length. |
Zero overlap | Sentences split across chunks. | Use recommended overlaps (10–15%). |
Too many parent chunks per query | Introduces noise, reducing LLM accuracy. | Keep chunks-per-query (k) around 4–8. |
No reserved context space for the LLM’s response | Risk of truncation or rejected prompts. | Always reserve ~30% of context space. |
Child | Token range (start → end) | Effective length |
---|---|---|
1 | 0 → 399 | 400 |
2 | 339 → 474 | 136 |
Level | Recommended Overlap | Quick Formula |
---|---|---|
Parent | 8 – 12% | round(parent_chunk × 0.10) |
Child | 12 – 18% | round(child_chunk × 0.15) |
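
The quick formulas above can be sketched as a small helper. This is a minimal illustration, not a fixed recipe: the function name `overlap_tokens` and the example chunk sizes (1,500 and 400 tokens) are assumptions chosen from within the typical size ranges, and the ratios are the midpoints of the recommended bands.

```python
def overlap_tokens(chunk_size: int, ratio: float) -> int:
    """Return the overlap, in tokens, for a chunk of `chunk_size` tokens."""
    if not 0 <= ratio < 1:
        raise ValueError("ratio must be in [0, 1)")
    return round(chunk_size * ratio)


# Assumed example sizes: a 1,500-token parent and a 400-token child.
parent_overlap = overlap_tokens(1500, 0.10)  # midpoint of the 8-12% band
child_overlap = overlap_tokens(400, 0.15)    # midpoint of the 12-18% band

print(parent_overlap, child_overlap)
```
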
Term | How to pick it | Why |
---|---|---|
usable_window | context_window × 0.70 (reserve 30% for the system prompt, user question, and the model’s answer) | Prevents context overflow and truncation. |
k | Your chosen chunks-per-query (e.g. 6) | Spreads the available space evenly. |
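
One way to read the two terms above is as a token budget: take the usable window, then divide it evenly across the k retrieved chunks. The sketch below assumes exactly that split (the per-chunk division is an interpretation of "spreads the available space evenly", not a formula stated in the table), and the function name `chunk_budget` and the 4,096-token window are illustrative.

```python
def chunk_budget(context_window: int, k: int, reserve: float = 0.30) -> tuple[int, int]:
    """Return (usable_window, max_tokens_per_chunk) for a given context window.

    `reserve` is the fraction kept free for the system prompt, the user
    question, and the model's answer.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    usable_window = int(context_window * (1 - reserve))
    per_chunk = usable_window // k  # assumed: even split across k chunks
    return usable_window, per_chunk


# Assumed example: a 4,096-token context window and k = 6 chunks per query.
usable, per_chunk = chunk_budget(4096, 6)
print(usable, per_chunk)
```

With these inputs the budget comes out at roughly 2,867 usable tokens, or about 477 tokens per retrieved chunk.
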
Parameter | Example Value | Source of Information |
---|---|---|
LLM context window | phi-3-mini-4k (4,096 tokens) | Model configuration (max_position_embeddings) |
Embedding-model limit | BAAI/bge-large-en-v1.5 (512 tokens) | Model card (max_seq_length) |
Chunks-per-query (k) | Typically 6 parent chunks per user query | Pipeline design (commonly between 4–8) |
Level | Purpose in the Pipeline | Typical Size | Storage Location |
---|---|---|---|
Parent chunk | The passage sent directly to the LLM prompt. | ≈ 1,000 – 2,000 tokens | Stored in NoSQL (MongoDB) |
Child chunk | Smaller slices used for embedding and search. | ≈ 300 – 500 tokens | Stored in vector DB |
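
The two-level layout above can be sketched with a sliding window applied twice: once over the document to produce parents, then once over each parent to produce children that keep a reference to their parent. This is a simplified illustration under several assumptions: tokens are plain list items rather than the output of a real tokenizer, the `Chunk` class and function names are hypothetical, and plain Python lists stand in for the NoSQL store and the vector database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    chunk_id: str
    parent_id: Optional[str]  # None for parent chunks
    tokens: list


def sliding_windows(tokens: list, size: int, overlap: int) -> list:
    """Split `tokens` into windows of at most `size`, sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    windows, start = [], 0
    while start < len(tokens):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end
        start += step
    return windows


def build_hierarchy(tokens, parent_size=1500, parent_overlap=150,
                    child_size=400, child_overlap=60):
    """Return (parents, children); each child records its parent's id."""
    parents, children = [], []
    for p, p_tokens in enumerate(sliding_windows(tokens, parent_size, parent_overlap)):
        parent_id = f"p{p}"
        parents.append(Chunk(parent_id, None, p_tokens))
        for c, c_tokens in enumerate(sliding_windows(p_tokens, child_size, child_overlap)):
            children.append(Chunk(f"{parent_id}-c{c}", parent_id, c_tokens))
    return parents, children


# Assumed example: a 3,000-token document made of placeholder tokens.
tokens = [f"t{i}" for i in range(3000)]
parents, children = build_hierarchy(tokens)
print(len(parents), len(children))
```

At query time, the children would be searched by embedding similarity and each hit's `parent_id` used to fetch the larger passage that goes into the prompt.
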
Tool / Library | What It Is | How It Helps in This Architecture |
---|---|---|
MediatR | Lightweight in-process mediator (implements the Mediator pattern). | Dispatches Commands, Queries, and Domain Events without tight coupling. Keeps the Application layer free of direct service references—perfect for CQRS and Clean Architecture. |
MassTransit |
Architectural Part | Where It Appears | Key Insight |
---|---|---|
DDD | Account.Deposit + FundsDeposited | Invariant & business language live in the aggregate. |
Clean Architecture | DepositFundsHandler | Application layer orchestrates use-case; no framework coupling. |
Event-Driven (EDA) | PublishAsync + projector | Domain event travels via MediatR/outbox → Kafka → projector. |
CQRS | Write side = Account aggregate; read side = BalanceProjectionDb | Fast, denormalised reads with eventual consistency. |
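
The flow summarised in the table can be sketched end to end in a few classes. This is an illustrative Python rendering, not the original implementation: the in-memory `EventBus` stands in for the MediatR/outbox → Kafka path, `BalanceProjection` stands in for BalanceProjectionDb, and the "amount must be positive" invariant is an assumed example of a business rule. Because the bus here is synchronous, the read side updates immediately; with a real broker it would lag behind the write side.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class FundsDeposited:
    """Domain event raised by the aggregate."""
    account_id: str
    amount: Decimal


@dataclass
class Account:
    """Aggregate (write side): enforces the invariant and records events."""
    account_id: str
    balance: Decimal = Decimal("0")
    pending_events: list = field(default_factory=list)

    def deposit(self, amount: Decimal) -> None:
        if amount <= 0:  # assumed invariant, for illustration
            raise ValueError("Deposit amount must be positive")
        self.balance += amount
        self.pending_events.append(FundsDeposited(self.account_id, amount))


class EventBus:
    """In-memory stand-in for the mediator/outbox/broker path."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


class BalanceProjection:
    """Read side: a denormalised balance per account, built from events."""
    def __init__(self):
        self.balances = {}

    def on_funds_deposited(self, event: FundsDeposited) -> None:
        current = self.balances.get(event.account_id, Decimal("0"))
        self.balances[event.account_id] = current + event.amount


class DepositFundsHandler:
    """Application layer: orchestrates the use case, holds no business rules."""
    def __init__(self, accounts: dict, bus: EventBus):
        self._accounts = accounts
        self._bus = bus

    def handle(self, account_id: str, amount: Decimal) -> None:
        account = self._accounts[account_id]
        account.deposit(amount)
        for event in account.pending_events:
            self._bus.publish(event)
        account.pending_events.clear()


accounts = {"acc-1": Account("acc-1")}
bus = EventBus()
projection = BalanceProjection()
bus.subscribe(projection.on_funds_deposited)

DepositFundsHandler(accounts, bus).handle("acc-1", Decimal("25.00"))
print(projection.balances["acc-1"])
```
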