Brayden Siew (@Co-Messi)
Co-Messi / perplexity-deep-research-architecture.md
Created April 10, 2026 11:27
A complete architectural teardown of how Perplexity's deep research pipeline works — covering RAG orchestration, hybrid retrieval, multi-stage reranking, citation binding, Deep Research vs Standard mode, context window strategy, session memory, and a practical MVP-to-moat rebuild plan with open-source component recommendations.

Perplexity AI — Teardown and Rebuild Plan

A complete architecture reference for building a Perplexity-class AI search agent


Executive Summary

Perplexity is not a smarter model. It is a disciplined Retrieval-Augmented Generation (RAG) pipeline that treats retrieval, source ranking, and inline citation as first-class engineering concerns — not afterthoughts bolted onto a chatbot. The underlying LLMs it uses (GPT-4, Claude, Gemini, its own Sonar) are the same families everyone else has access to. What differentiates it is the orchestration layer around those models.

A competing system does not require secret prompts or proprietary models. It requires robust query analysis, hybrid retrieval (BM25 + dense), multi-layer reranking, structured prompt assembly with embedded citations, constrained LLM generation, and tight observability around citation quality and latency. This is non-trivial engineering, but it is all reproducible with off-the-shelf components.
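To make the "hybrid retrieval (BM25 + dense)" step concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a lexical result list with an embedding-based one without needing their raw scores to be comparable. The document IDs and the `k = 60` smoothing constant are illustrative assumptions, not Perplexity's actual parameters.

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    Each document earns 1 / (k + rank) per list it appears in; summing
    across lists rewards documents ranked highly by multiple retrievers.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical retriever outputs for one query:
bm25_hits = ["doc_a", "doc_b", "doc_c"]   # lexical (BM25) ranking
dense_hits = ["doc_b", "doc_d", "doc_a"]  # dense (embedding) ranking

fused = rrf_fuse([bm25_hits, dense_hits])
# doc_b wins: it ranks well in both lists even though neither
# retriever put it first.
```

In a production pipeline, the fused list would then feed the multi-layer reranking stage (e.g. a cross-encoder) rather than going straight to the LLM.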