@karpathy
Created April 4, 2026 16:25
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

  • Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
  • Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
  • Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
  • Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
  • Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

Architecture

There are three layers:

Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.

Operations

Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.

Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.
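One of the lint checks above, orphan pages with no inbound links, is mechanical enough to script. The following is a minimal sketch, assuming pages are `.md` files that link to each other with Obsidian-style `[[Page Name]]` wikilinks and that a link target matches the file stem. `find_orphans` is a hypothetical helper name, not part of any existing tool.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures only Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(wiki_dir, ignore=("index", "log")):
    """Return the stems of pages that no other content page links to."""
    pages = {p.stem: p for p in Path(wiki_dir).rglob("*.md")}
    linked = set()
    for stem, path in pages.items():
        if stem in ignore:
            continue  # the catalog links to everything, so it would hide orphans
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target != stem:  # a self-link does not count as inbound
                linked.add(target)
    return sorted(s for s in pages if s not in linked and s not in ignore)
```

The semantic checks (contradictions, stale claims) still need the LLM; a script like this only narrows down where it should look.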

Indexing and logging

Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.
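As a rough illustration of the catalog described above, here is a sketch that rebuilds index.md content from the pages themselves. It assumes a convention not stated in this document: each page starts with a flat frontmatter block containing `category` and `summary` keys. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

def read_frontmatter(text):
    """Parse flat `key: value` pairs from a leading --- block (no nested YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

def build_index(wiki_dir):
    """Return index.md content: pages grouped by category, one line each."""
    groups = defaultdict(list)
    for path in sorted(Path(wiki_dir).rglob("*.md")):
        if path.stem in ("index", "log"):
            continue  # the two special files are not catalog entries
        meta = read_frontmatter(path.read_text(encoding="utf-8"))
        category = meta.get("category", "uncategorized")
        summary = meta.get("summary", "(no summary)")
        groups[category].append(f"- [[{path.stem}]]: {summary}")
    out = ["# Index", ""]
    for category in sorted(groups):
        out += [f"## {category}", *groups[category], ""]
    return "\n".join(out)
```

In practice the LLM can simply edit index.md by hand on each ingest; a generator like this is mainly useful as a consistency check against what the LLM wrote.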

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
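The same prefix convention also makes the log easy to read from a script. A small sketch, assuming every entry header follows the `## [YYYY-MM-DD] operation | Title` shape shown above exactly; `parse_log` and `last_entries` are illustrative names.

```python
import re
from datetime import date

# Entry headers look like: ## [2026-04-02] ingest | Article Title
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")

def parse_log(text):
    """Return (date, operation, title) tuples for every entry header in the log."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            day, op, title = m.groups()
            entries.append((date.fromisoformat(day), op, title.strip()))
    return entries

def last_entries(text, n=5, op=None):
    """Return the last n entries, optionally filtered by operation name."""
    entries = [e for e in parse_log(text) if op is None or e[1] == op]
    return entries[-n:]
```

Calling `last_entries(log_text, op="lint")` would answer "when did we last health-check the wiki?" without reading the whole file.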

Optional: CLI tools

At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.
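For a sense of what a naive search script might look like, here is a sketch that ranks pages by a plain TF-IDF score. This is a deliberately simple stand-in, not BM25 and not how qmd works; the function names and the scoring formula are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter
from pathlib import Path

def tokenize(text):
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(wiki_dir, query, top_k=5):
    """Rank pages by a simple TF-IDF score over the query terms."""
    docs = {p: Counter(tokenize(p.read_text(encoding="utf-8")))
            for p in Path(wiki_dir).rglob("*.md")}
    terms = set(tokenize(query))
    scores = {}
    for path, counts in docs.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in terms:
            df = sum(1 for c in docs.values() if term in c)  # pages containing the term
            if df and counts[term]:
                idf = math.log(1 + len(docs) / df)
                score += (counts[term] / total) * idf
        if score > 0:
            scores[path] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(path.name, round(score, 4)) for path, score in ranked[:top_k]]
```

It re-reads every file on each call, which is fine for a few hundred pages and is the point at which a proper search tool starts to pay off.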

Tips and tricks

  • Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
  • Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful — it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass — the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
  • Obsidian's graph view is the best way to see the shape of your wiki — what's connected to what, which pages are hubs, which are orphans.
  • Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
  • Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
  • The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.

@Srikumar6529

Srikumar6529 commented May 10, 2026

So it's a Canvas for LLM to scribble notes about the user, to update its context at test time, improving interactions one conversation at a time.
nice :)
But again, as the length grows, all the initial problems with LLMs show up: losing context, hallucination, etc.

We can ingest the data into the model weights after a certain threshold, so the model gets personalized in the core as the conversations pass on.

OR

We can create a personalization head that sits on top of the model during inference. This way, the model weights are not affected, and personalization happens in isolation; the head can be swapped at any time if something goes wrong.

@ojuschugh1

https://github.com/ojuschugh1/sqz

  ███████╗ ██████╗ ███████╗
  ██╔════╝██╔═══██╗╚══███╔╝
  ███████╗██║   ██║  ███╔╝
  ╚════██║██║▄▄ ██║ ███╔╝
  ███████║╚██████╔╝███████╗
  ╚══════╝ ╚══▀▀═╝ ╚══════╝
  

Compress LLM context to save tokens and reduce costs

Real session stats: 3,003 compressions · 178,442 tokens saved · 24.7% avg reduction · up to 92% with dedup


sqz compresses command output before it reaches your LLM. Single Rust binary, zero config.

The real win is dedup: when the same file gets read 5 times in a session, sqz sends it once and returns a 13-token reference for every repeat.

Without sqz:                    With sqz:

File read #1:  2,000 tokens     File read #1:  ~800 tokens (compressed)
File read #2:  2,000 tokens     File read #2:  ~13 tokens  (dedup ref)
File read #3:  2,000 tokens     File read #3:  ~13 tokens  (dedup ref)
───────────────────────         ───────────────────────
Total:         6,000 tokens     Total:         ~826 tokens (86% saved)
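The arithmetic behind that comparison can be written out as a short sketch. The 2,000-token file, the roughly 800-token compressed first read, and the 13-token reference come from the example above; the function name and signature are illustrative only.

```python
def session_tokens(raw_tokens, reads, compressed_tokens, ref_tokens=13):
    """Tokens sent for `reads` reads (reads >= 1) of one unchanged file."""
    without = raw_tokens * reads
    # First read is sent compressed; every repeat costs one short reference.
    with_dedup = compressed_tokens + ref_tokens * (reads - 1)
    saved = 1 - with_dedup / without
    return without, with_dedup, saved

without, with_dedup, saved = session_tokens(2000, 3, 800)
print(without, with_dedup, f"{saved:.0%}")  # 6000 826 86%
```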

Token Savings

24.7% average reduction across 3,003 real compressions ·
92% saved on repeated file reads ·
86% on shell/git output ·
13-token refs for cached content

One developer's week, measured from actual sqz gain output:

$ sqz gain
sqz token savings (last 7 days)
──────────────────────────────────────────────────
  04-13 │                              │   2,329 saved
  04-14 │                              │       0 saved
  04-15 │███                           │  12,954 saved
  04-16 │██                            │   9,223 saved
  04-17 │████                          │  14,752 saved
  04-18 │██████████████████████████████│ 105,569 saved
  04-19 │████████                      │  30,882 saved
  04-20 │█                             │   4,334 saved
──────────────────────────────────────────────────
  Total: 3,003 compressions, 178,442 tokens saved (24.7% avg reduction)

Per-command compression

Single-command compression (measured via cargo test -p sqz-engine benchmarks):

| Content | Before | After | Saved |
|---|---|---|---|
| Repeated log lines | 148 | 62 | 58% |
| Large JSON array | 259 | 142 | 45% |
| JSON API response | 64 | 53 | 17% |
| Git diff | 61 | 54 | 12% |
| Prose/docs | 124 | 121 | 2% |
| Stack trace (safe mode) | 82 | 82 | 0% |

Session-level with dedup

Where the real savings live — the cache sends each file once, repeats cost 13 tokens:

| Scenario | Without sqz | With sqz | Saved |
|---|---|---|---|
| Same file read 5× | 10,000 | 826 | 92% |
| Same JSON response 3× | 192 | 79 | 59% |
| Test-fix-test cycle (3 runs) | 15,000 | 5,186 | 65% |

Single-command compression ranges from 2–58% depending on content. Repeated reads drop to 13 tokens each. Your mileage will vary with how repetitive your tool calls are — agentic sessions with many file re-reads see the biggest wins.

Install

Prebuilt binaries (no compiler required — works on every platform):

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/ojuschugh1/sqz/main/install.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/ojuschugh1/sqz/main/install.ps1 | iex

# Any platform via npm
npm install -g sqz-cli

# macOS / Linux via Homebrew
brew tap ojuschugh1/sqz
brew install sqz

Build from source via Cargo:

cargo install sqz-cli sqz-mcp

sqz-cli provides the sqz binary; sqz-mcp provides the MCP server. sqz-engine is a library dependency — it compiles automatically and does not need to be installed separately.

Building from source with cargo install needs a C toolchain:

  • Linux: build-essential (apt) or equivalent
  • macOS: Xcode Command Line Tools (xcode-select --install)
  • Windows: Visual Studio Build Tools with the "Desktop development with C++" workload. Without these, cargo install fails with linker link.exe not found. If you don't already have them, use the PowerShell or npm install above instead.

Then initialize:

sqz init --global     # hooks apply to every project on this machine
# or
sqz init              # hooks apply to just this project (.claude/settings.local.json)

--global writes to ~/.claude/settings.json (the user scope per the
Anthropic scope table),
so the sqz hook fires in every Claude Code session on this machine. This is
the common case on first install. Your existing permissions, env,
statusLine, and unrelated hooks in ~/.claude/settings.json are
preserved — sqz merges its entries rather than overwriting.

Plain sqz init (project scope) is useful when you want sqz active only
inside one repo.

Only using one agent? Pass --only (or --skip) to limit which
configs are written:

sqz init --only opencode              # just OpenCode, nothing else
sqz init --only opencode,codex        # OpenCode and Codex
sqz init --skip cursor,windsurf       # everything except Cursor and Windsurf

Accepted names: claude, cursor, windsurf, cline, gemini,
opencode, codex. Aliases (claude-code, gemini-cli, roo) also
work. --only and --skip can't be combined.

Manual installation (preserve comments in your config)

sqz init round-trips your config file through a JSON parser to merge
the sqz entry, which drops any comments in your opencode.jsonc (and
the analogous JSON-with-comments files other tools accept). If you've
commented your config carefully and want to keep them, install by hand
instead.

OpenCode — two steps:

  1. Drop the plugin file in place. sqz prints the generated TS to
    stdout so you don't have to hand-write the path-escaping logic:

    mkdir -p ~/.config/opencode/plugins
    sqz print-opencode-plugin > ~/.config/opencode/plugins/sqz.ts
  2. Add the MCP entry to your existing opencode.jsonc yourself.
    Append this block inside the top-level mcp object (create the
    mcp object if it doesn't exist):

    "sqz": {
      "type": "local",
      "command": ["sqz-mcp", "--transport", "stdio"],
      "enabled": true
    }

Comments in the rest of your file stay put. OpenCode auto-discovers
the plugin file; no plugin array entry needed (adding one causes
double-loading, see issue #10).

Other tools — Claude Code, Cursor, Windsurf, Cline, Gemini CLI,
and Codex use plain JSON configs without comment support, so the
automated path is non-destructive there. Use sqz init --only <tool>
for those.

That's it. Shell hooks installed, AI tool hooks configured.

How It Works

sqz installs a PreToolUse hook that intercepts bash commands before your AI tool runs them. The output gets compressed transparently — the AI tool never knows.

Claude → git status → [sqz hook rewrites] → compressed output (85% smaller)

What gets compressed:

  • Shell output — git, cargo, npm, docker, kubectl, ls, grep, etc.
  • JSON — strips nulls, compact encoding
  • Logs — collapses repeated lines
  • Test output — shows failures only

What doesn't get compressed:

  • Stack traces, error messages, secrets — routed to safe mode (0% compression)
  • Your prompts and the AI's responses — controlled by the AI tool, not sqz

Supported Tools

| Tool | Integration | Setup |
|---|---|---|
| Claude Code | PreToolUse hook (transparent) | sqz init |
| Cursor | PreToolUse hook (transparent) | sqz init |
| Windsurf | PreToolUse hook (transparent) | sqz init |
| Cline | PreToolUse hook (transparent) | sqz init |
| Gemini CLI | BeforeTool hook (transparent) | sqz init |
| OpenCode | TypeScript plugin (transparent) | sqz init |
| VS Code | Extension | Install from Marketplace |
| JetBrains | Plugin | Install from Marketplace |
| Chrome | Browser extension | ChatGPT, Claude.ai, Gemini, Grok, Perplexity |
| Firefox | Browser extension | Same sites |

CLI

sqz init --global             # Install hooks for every project on this machine
sqz init                      # Install hooks for just this project
sqz init --only opencode      # Only configure OpenCode (skip the rest)
sqz init --skip cursor        # Configure every agent except Cursor
sqz compress <text>           # Compress (or pipe from stdin)
sqz compress --no-cache       # Compress without dedup (always full output)
sqz expand <ref>              # Recover original content from a §ref:HASH§ token
sqz compact                   # Evict stale context to free tokens
sqz gain                      # Show daily token savings
sqz stats                     # Cumulative report
sqz discover                  # Find missed savings
sqz resume                    # Re-inject session context after compaction
sqz hook claude               # Process a PreToolUse hook
sqz proxy --port 8080         # API proxy (compresses full request payloads)

Dedup Escape Hatch

When sqz sees the same content twice, it returns a compact §ref:HASH§ token
instead of the full text. Most models handle this fine, but some (e.g., GLM 5.1)
can't parse the ref format and loop. Four ways to work around this:

# 1. Recover original content from a ref
sqz expand a1b2c3d4              # prefix match
sqz expand '§ref:a1b2c3d4§'     # paste the whole token

# 2. Compress without dedup (per-invocation)
echo "..." | sqz compress --no-cache

# 3. Disable dedup globally (env var)
export SQZ_NO_DEDUP=1

# 4. MCP passthrough tool (returns input byte-exact, zero transforms)
# Available via tools/list when sqz-mcp is running

Track Your Own Savings

Run sqz gain in your shell any time to see your own daily breakdown (see the
Token Savings section above for what the output looks like), and sqz stats
for the full cumulative report:

$ sqz stats
┌─────────────────────────┬──────────────────┐
│           sqz compression stats            │
├─────────────────────────┼──────────────────┤
│ Total compressions      │            3,003 │
│ Tokens saved            │          178,442 │
│ Avg reduction           │            24.7% │
│ Cache entries           │               43 │
│ Cache size              │          39.1 KB │
└─────────────────────────┴──────────────────┘

Stats are stored locally in SQLite under ~/.sqz/sessions.db — nothing leaves your machine.

How Compression Works

  1. Per-command formatters: git status → compact summary, cargo test → failures only, docker ps → name/image/status table
  2. Structural summaries — code files compressed to imports + function signatures + call graph (~70% reduction). The model sees the architecture, not implementation noise.
  3. Dedup cache — SHA-256 content hash, persistent across sessions. Second read = 13-token reference.
  4. JSON pipeline — strip nulls → project out debug fields → flatten → collapse arrays → TOON encoding (lossless compact format)
  5. Safe mode — stack traces, secrets, migrations detected by entropy analysis and routed through with 0% compression
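To make the dedup step in the list above concrete, here is a minimal sketch of a content-hash cache that returns a `§ref:HASH§` token on repeats. It is an illustration of the idea, not sqz's implementation: the class and method names are hypothetical, the cache lives in memory rather than persisting across sessions, and the 8-character prefix length is an assumption.

```python
import hashlib

class DedupCache:
    """First sight of content returns it unchanged; repeats return a short ref."""

    def __init__(self, prefix_len=8):
        self.prefix_len = prefix_len  # assumed length of the hash shown in a ref
        self.store = {}  # full SHA-256 hex digest -> original content

    def compress(self, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self.store:
            return f"§ref:{digest[:self.prefix_len]}§"
        self.store[digest] = content
        return content

    def expand(self, ref):
        """Recover content from a bare hash prefix or a full §ref:HASH§ token."""
        prefix = ref.removeprefix("§ref:").removesuffix("§")
        matches = [c for d, c in self.store.items() if d.startswith(prefix)]
        if len(matches) != 1:
            raise KeyError(f"no unique match for {prefix!r}")
        return matches[0]
```

Keying on the content hash rather than the file path means an edited file is treated as new content and sent in full again, while an unchanged file read under any path collapses to a reference.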

For the full technical details, see docs/.

Configuration

# ~/.sqz/presets/default.toml
[preset]
name = "default"
version = "1.0"

[compression.condense]
enabled = true
max_repeated_lines = 3

[compression.strip_nulls]
enabled = true

[budget]
warning_threshold = 0.70
default_window_size = 200000

Privacy

  • Zero telemetry — no data transmitted, no crash reports
  • Fully offline — works in air-gapped environments
  • All processing local

Development

git clone https://github.com/ojuschugh1/sqz.git
cd sqz
cargo test --workspace
cargo build --release

License

Elastic License 2.0 (ELv2) — use, fork, modify freely. Two restrictions: no competing hosted service, no removing license notices.

Links

https://github.com/ojuschugh1/sqz

@lchrennew

https://github.com/lchrennew/dragonfly-llmwiki

Reads large documents the way a human would and writes notes to a wiki.

@wheelhorse

Anyone who would like to convert their docx or pptx files into markdown and keep all the technical details, including block diagrams and schematics, please contact me. I made a relatively reliable converter for this.

@simbadmorehod

@aadjadj-bit

Running this with Obsidian as the wiki layer and Claude Code as the LLM agent.

A few things from production use:

  • Obsidian's graph view makes orphan detection visual, no separate audit step needed.
  • Dataview queries on frontmatter replace most of what you'd build as a custom index.md. Dynamic tables for free.
  • The CLAUDE.md schema is the highest-leverage artifact in the system. Most people skip it and wonder why the LLM behaves inconsistently across sessions.

One operational gap: file write access works cleanly with Claude Code locally, but sync conflicts (iCloud, Obsidian Sync) become a real concern at scale. Worth defining in the schema which files are LLM-owned vs. human-owned.

@devilankur18

devilankur18 commented May 11, 2026

@karpathy Took the llm wiki idea a step further — building a gzip-like token compression engine for entire codebases.

Instead of handling only memory and notes, it also flattens repos into metadata and builds a queryable multi-level knowledge graph (repo → modules → files → symbols) usable by coding copilots via MCP.

This can potentially reduce input token cost by up to ~95% for large codebases during llm read/writes.

https://gist.github.com/devilankur18/ee2402e656fa4eaa076bdf2c79fcc6b8

@equationalapplications

equationalapplications commented May 11, 2026

Thanks for sharing your insight @karpathy
I am working on an open-source, privacy-first desktop app using Tauri.
https://github.com/equationalapplications/curated-thoughts

I am exploring the concept of using three tiers of LLM Wiki memory.

  • Facts (immutable documents and user guidance)
  • Working Memory (a repo of a codebase, or the papers an author is writing, for example)
  • Wisdom (the curated wiki)

@equationalapplications

The core logic I use for the LLM Wiki pattern is written in TypeScript and designed for SQLite. It supports multi-agent use and is MIT-licensed.
https://www.npmjs.com/package/@equationalapplications/core-llm-wiki

@paulmchen

Synthadoc v0.4.0 is now released.

👉 https://github.com/axoviq-ai/synthadoc

v0.4.0 addresses what happens when the wiki grows large enough that a flat architecture starts showing cracks - query scope, write-path quality, and piping structured knowledge into external agents without losing control of the token envelope.

  1. Routing layer with branch taxonomy: A ROUTING.md file at the wiki root maps topic branches to page slugs, using the same ## Heading → [[slug]] structure as index.md. At query time, an LLM selects the 1–2 most relevant branches and BM25 runs only over those - not the full corpus. At 1,000 pages the difference is 18 ms vs 74 ms P95; at 10,000 pages it's 24 ms vs 191 ms. Routed latency stays near-flat as the wiki grows because branch sizes don't change even as the total does. IngestAgent maintains ROUTING.md automatically: every new page created by an ingest job is auto-slotted into the best-matching branch, so the routing table stays accurate without manual work. Also in this feature: page-level aliases: frontmatter for personal shorthand that expands to canonical slugs at query time, and a protected scaffold zone so hand-written content in index.md survives scaffold reruns.

  2. Candidates staging: New pages can go to wiki/candidates/ instead of wiki/ based on a configurable confidence policy: "off" (all pages auto-promote, existing behaviour), "threshold" (pages below a minimum confidence level wait for review), or "all" (every page requires explicit promotion). Candidates are excluded from BM25, orphan detection, and contradiction checks until promoted. "synthadoc candidates list/promote/discard" handles the review loop; promotion atomically moves the file, updates index.md, and updates ROUTING.md. Policy is hot-reloaded from config - no server restart to change the threshold.

  3. Context packs and the knowledge backend pattern: synthadoc context build "topic" --tokens 4000 decomposes a goal into sub-questions, runs routed BM25, ranks candidates by relevance, and packs page excerpts into an exact token budget - no synthesis, just cited retrieval with token accounting. The POST /context/build REST endpoint makes this callable from any agent: reserve a fixed token slice for domain knowledge, get back a bounded JSON response of ranked excerpts with confidence levels and source paths, inject into your own prompt. The MCP server exposes this as a native tool call. Synthadoc handles accumulation, deduplication, and retrieval; the calling agent handles reasoning and the knowledge layer is persistent across sessions.

  4. Also new: "synthadoc plugin install" CLI installs the Obsidian plugin directly without locating the plugins directory manually; a contradiction detection end-to-end demo in the AI research wiki; and a decision cache fix that makes purpose.md changes immediately effective rather than serving stale decisions until source content changes.
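The budget-packing step described in the context packs item can be sketched as a greedy fill. This is an illustration under stated assumptions, not Synthadoc's code: `pack_context` is a hypothetical name, the whitespace word count stands in for a real tokenizer, and the relevance scores are assumed to come from an earlier ranking stage.

```python
def pack_context(excerpts, budget, count_tokens=lambda s: len(s.split())):
    """Greedily fill a token budget with the highest-scoring excerpts.

    excerpts: iterable of (score, source_path, text) tuples.
    Returns (selected, tokens_used); selected keeps rank order.
    """
    selected, used = [], 0
    for score, source, text in sorted(excerpts, key=lambda e: e[0], reverse=True):
        cost = count_tokens(text)
        if used + cost > budget:
            continue  # skip what does not fit; a smaller excerpt may still fit
        selected.append({"source": source, "score": score, "text": text})
        used += cost
    return selected, used
```

Because nothing is synthesized, the caller gets back only cited excerpts and an exact count of the tokens they occupy, and the total never exceeds the budget.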

Release notes:
👉 https://github.com/axoviq-ai/synthadoc/releases/tag/v0.4.0

Docs:
👉 [Quick orientation and feature overview] https://github.com/axoviq-ai/synthadoc#readme
👉 [Up and running in minutes] https://github.com/axoviq-ai/synthadoc/blob/main/docs/user-quick-start-guide.md

Feedback on v0.4.0 is very welcome.

@cagataysengor

Agentic RAG → Agentic LLM Wiki

Quick follow-up on this.

I had already built an early version of LLM Wiki Studio around this idea: uploaded sources are compiled into a persistent wiki with source summaries, topic pages, saved answers, index/log pages, and maintenance checks.

I’ve now started extending this into an “Agentic LLM Wiki” system.

Agentic RAG can make retrieval more dynamic: it can plan, search, re-rank, call tools, and inspect more context before answering. But it can also spend a lot of tokens at query time, repeatedly searching through raw sources and re-synthesizing knowledge that the system may have already seen before.

The motivation behind Agentic LLM Wiki is different.

Instead of making every query more retrieval-heavy, the system tries to use the persistent wiki as the first memory layer. When a question comes in, it checks whether the wiki is sufficient. If it is, it answers from the wiki. If not, it falls back to the original sources, answers with the additional context, and then suggests wiki updates so the missing knowledge can become part of the memory.
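The wiki-first flow described above can be sketched as plain control flow. Every name here is hypothetical and this is not taken from LLM Wiki Studio: the four callables (search over the wiki, search over raw sources, a sufficiency check, and the synthesis step) are injected so the routing logic is visible without committing to any particular LLM or retriever.

```python
def answer(question, wiki_search, source_search, is_sufficient, synthesize):
    """Answer from the wiki when it suffices; otherwise fall back to raw sources."""
    wiki_pages = wiki_search(question)
    if is_sufficient(question, wiki_pages):
        return {"answer": synthesize(question, wiki_pages),
                "used": "wiki",
                "suggested_updates": []}
    raw_chunks = source_search(question)
    return {"answer": synthesize(question, wiki_pages + raw_chunks),
            "used": "wiki+sources",
            # Propose filing the newly needed material so the next query hits the wiki.
            "suggested_updates": raw_chunks}
```

The sufficiency check is where most of the design risk sits: if it is too lenient the system answers from a thin wiki, and if it is too strict every query falls through to raw retrieval.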

The goal is to shift work from repeated query-time retrieval to accumulated knowledge maintenance.

In short:

Agentic RAG = smarter search over raw context at query time
Agentic LLM Wiki = persistent synthesis first, raw retrieval only when needed

This could reduce token cost while improving answer quality over time, because useful synthesis is not thrown away after each answer.

Current LLM Wiki Studio:
https://github.com/cagataysengor/llm-wiki-studio

Draft PR for the new Agentic LLM Wiki mode:
cagataysengor/llm-wiki-studio#1

@tuirk

tuirk commented May 11, 2026

dropped v0.1 of Kompl in this thread a while back — just shipped v0.2, so adding the update.

Repo: https://github.com/tuirk/Kompl

short version for new readers: Kompl runs the pattern from this gist with synthesis at ingest time, not query time. you save a thing (a link, a PDF, a YouTube video, a bookmark export, pasted text) and Kompl reads it as it arrives: pulls out the people, ideas, and arguments inside, writes them into wiki pages that link to each other, and updates existing pages when new sources contribute. save your tenth source on a topic and the page already reflects the pattern across all ten without you having to ask. the wiki itself is the cached synthesis. self-hosted via docker, bring-your-own API keys, MCP server included so an agent can query the compiled wiki.

Karpathy's LLM wiki vs Kompl

what's new in v0.2

  • multi-provider. DeepSeek V4 Pro added as a second compile backend, selectable per session. Gemini 2.5 has a structured-output truncation pathology on dense inputs (~50K+ char academic PDFs); DeepSeek handles up to ~200K cleanly. provider abstraction layer routes gemini-* and deepseek-* IDs through one LLMProvider interface. per-session model lock stamps the choice at session start so mid-flight settings changes don't hot-swap.
  • live progress UI. per-step X/Y counters during compile (extract, draft, ingest, match, crossref, commit), expand-to-reveal item drill-down, time-estimate shown as a range instead of a single conservative value.
  • stranded-source recovery. a source whose extract fails mid-session is no longer unrecoverable. orchestrator re-plans on retry; commit-activation gate only marks compile_status='active' for sources with an extractions row, so retry routes can re-attempt the source.
  • new connectors. paste-text (raw text → source, no URL or file needed). YouTube direct-ingest via the official transcript API + Data API videos.list — replaces the prior silent fallback to scraping watch-page chrome on transcript-less videos. covers watch / youtu.be / shorts / embed / m. / music. URL forms.
  • one-line installers. install.sh for macOS/Linux/WSL, install.ps1 for Windows. pre-flights Docker, Node 24, disk, RAM before handing off to the API-key prompts.
  • security pass. SSRF hardening on /metadata/peek (DNS-resolved IP pinning, cloud-metadata blocklist, scheme allowlist, manual redirect revalidation), path-traversal containment across nlp-service and Next.js, YAML frontmatter escaping with C0/C1/U+2028/U+2029/BOM stripping, log-arg scrubbing, Scorecard-flagged deps pinned, nlp-service bound to 127.0.0.1.
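
To illustrate the URL forms listed for the YouTube connector, here is a minimal sketch of video-ID extraction using only the standard library. It is not Kompl's code; `extract_video_id` and `YOUTUBE_HOSTS` are hypothetical names, and the host list is an assumption based on the forms named above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed host set covering the www, m. and music. forms.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for watch / youtu.be / shorts / embed URLs, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    segments = [s for s in parsed.path.split("/") if s]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return segments[0] if segments else None
    if host not in YOUTUBE_HOSTS:
        return None
    if segments[:1] == ["watch"]:
        # Watch pages carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    if len(segments) >= 2 and segments[0] in {"shorts", "embed"}:
        return segments[1]
    return None
```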

What goes in Kompl:

  • URLs (web pages, articles, YouTube videos, GitHub repos, anything Firecrawl can reach)
  • Files (PDF, DOCX, PPTX, XLSX, TXT, MD, HTML, CSV, images, audio)
  • Browser bookmark, Twitter/X bookmark, Apple Notes/Upnote exports

Here's what that looks like after a few sessions: new overviews, comparisons, entity pages, contradictions surfacing, fresh cross-links between everything.
Kompl demo

A few specific bets we made on top of the pattern:

  • NLP before LLM. spaCy NER + a 4-way keyphrase fanout (RAKE, KeyBERT, TextRank, YAKE) runs first; Gemini gets pre-resolved entities, not raw markdown. Cheaper and less noisy.
  • Batch ingest, async compile. Drop sources, close the tab, come back to a wiki. Server-side pipeline with rate limits, a customizable daily USD cap, and other settings (entity promotion threshold, draft length floor, model tier per session, schema-driven tone — more in the repo).
  • Three layers of entity resolution (fuzzy, embedding, LLM disambiguation) collapse variations like "GPT 4", "GPT-4", and "gpt4" into one canonical.
  • Comparison pages surface when three or more sources disagree.
  • Wikilinks get injected deterministically by regex, not by an LLM.
  • MCP-native. Stdio MCP server (search_wiki, read_page, list_pages, wiki_stats) so Claude Code, Claude Desktop, Cursor can use the wiki as a knowledge source out of the box. That's our favorite feature.
  • For UI the gist mentions Obsidian as the IDE. Kompl runs in its own UI but ships an Obsidian-compatible export, so you're not locked in either way.
  • Local Docker, single-tenant, BYO Gemini + Firecrawl keys. Open source under Apache-2.0.
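
The deterministic wikilink injection mentioned in the list above could look something like this. It is a sketch under assumptions, not Kompl's actual regex: `inject_wikilinks` is a hypothetical helper, and the matching rules (longest title first, word boundaries, skip existing links) are illustrative choices.

```python
import re
from typing import List


def inject_wikilinks(text: str, titles: List[str]) -> str:
    """Wrap known page titles in [[...]] without touching existing links."""
    if not titles:
        return text
    # Longest titles first so a longer title wins over a title it contains.
    ordered = sorted(titles, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    # First alternative consumes existing [[links]] so they are left alone;
    # second matches a title that is not embedded in a longer word.
    pattern = re.compile(r"\[\[.*?\]\]|(?<!\w)(" + alternation + r")(?!\w)")

    def replace(match: "re.Match[str]") -> str:
        if match.group(1) is None:  # an existing wikilink
            return match.group(0)
        return "[[" + match.group(1) + "]]"

    return pattern.sub(replace, text)
```

Because the output depends only on the text and the title list, re-running it on already-linked text changes nothing, which is the main appeal over asking a model to add links.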

40-second demo is below; click to watch on YouTube. Full details on GitHub: https://github.com/tuirk/Kompl

Watch the demo

Fork it, run it on your own sources, let me know how it goes 🥸

Repo: https://github.com/tuirk/Kompl

@dfalci

dfalci commented May 11, 2026

Thanks for sharing this, @karpathy — really insightful.

I built a Rust-based MCP server inspired by this idea, focused on a local Markdown wiki + full-text search as persistent architectural memory for software projects.

It is already usable, and I’m planning to improve it further with better indexing, backlinks, linting, and curated knowledge workflows:

https://github.com/dfalci/mcp-advwiki

@rohitg00

AKBP turns the LLM Wiki pattern into a protocol surface for agent runtimes. It is a local-first, file-backed knowledge base that agents can read, write, verify, export, and carry across tools.

The idea comes from the same insight behind LLM Wiki v2: stop re-deriving, start compiling. AKBP adds the machinery a repo needs when that pattern becomes operational: typed claims, source hashes, lifecycle relations, review-gated writes, JSONL tool calls, schemas, and conformance tests.

This repository contains the reference implementation:

  • a Python CLI for creating and maintaining AKBP knowledge bases
  • a newline-delimited JSON tool server for agent integrations
  • JSON schemas for requests, responses, records, and method parameters
  • adapter templates for coding-agent runtimes
  • conformance checks, benchmark fixtures, import/export checks, and CI validation

https://github.com/rohitg00/akbp

@good-idea

I never imagined a gist comment thread would read like a feed of advertisements

@FBoschman

For any researchers out there doing PhD work: I have adapted the pattern to fit my own research workflow. You can find the repo here:

https://github.com/FBoschman/claude-wiki-research-skills

@boostedcore

I wrote a short theoretical proposal on extending LLM Wiki with vector embeddings to address deduplication, granularity control and hierarchical scaling. Feedback welcome: https://gist.github.com/boostedcore/96e74291e7832bc9317abc2b28f9b803

@colon-md

I set the LLM + Wiki build aside to clean up my RAG evaluation code, because I needed an evaluation harness first to actually test an LLM + Wiki implementation.

Then I spent several days and a weekend cleaning up the evaluation code instead of working on LLM + Wiki. T__T

But whether it is GCP, AWS, Azure, or OpenAI, the enterprise RAG services still diverge sharply on the recall/precision trade-off, which surprised me. Plus, all four hallucinate on every unanswerable question (0/5). None say "I don't know," which is the failure mode wiki+RAG should be able to fix.

Here is my repo: https://github.com/colon-md/retrievalci

@mav-rik

mav-rik commented May 12, 2026

Implemented Karpathy's abstract idea: https://github.com/mav-rik/kb-cli

Ships a CLI and skills, and supports remote mode.

Testing it now on my knowledge base. Seems to be working.

@jazzonenl

LLM Wiki — A Knowledge Management Revolution or a Transactional Dead End?

The "LLM Wiki" concept (recently popularized following Andrej Karpathy's proposal) looks like a silver bullet for personal and corporate knowledge management. The premise is elegant: an AI transforms a chaotic mess of thousands of files and emails into a structured network of Markdown documents, complete with auto-generated meta-descriptions and cross-links.

However, beneath the initial convenience lies a fundamental architectural challenge that the industry is only beginning to whisper about. When moving from a "10-file demo" to a real-world archive — such as a CEO’s ten-year history of correspondence and documentation — we discover that instead of a simple file system, we are attempting to build a highly complex and expensive DBMS on top of neural networks.

1. The Illusion of "Easy" Updates (Transactional Overhead)
Most of the current hype focuses on the Read phase. AI is excellent at summarizing and tagging. But as soon as we move to Write or update operations, the system hits a transactional nightmare:

Any single change in one document can trigger a cascade, requiring the revision of dozens of links in other files.

In a classical SQL database, this is handled by indexes in milliseconds. In an LLM Wiki, this requires a chain of model calls that consume both significant time and tokens.

2. The Crisis of Link Integrity
In Karpathy’s approach, the AI assumes the role of a Database Architect. However, AI lacks the inherent concept of Referential Integrity:

If a file is deleted or renamed, hundreds of "smart links" in other .md files instantly become "dead."

To keep the database up to date, one needs a constant background process to "re-wire" the entire knowledge web. This transforms a simple folder of files into a heavy, ongoing ETL process.
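
As a rough illustration of the link-integrity check such a background process would need, here is a minimal sketch that scans pages for wikilinks whose target page no longer exists. The `[[Target]]`, `[[Target|alias]]` and `[[Target#heading]]` syntax is assumed (Obsidian-style), and `find_dead_links` is a hypothetical helper operating on an in-memory mapping of page name to markdown body.

```python
import re
from typing import Dict, List

# Captures the target of [[Target]], [[Target|alias]] or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def find_dead_links(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each page name to the sorted link targets that have no page."""
    known = set(pages)
    dead: Dict[str, List[str]] = {}
    for name, body in pages.items():
        targets = {target.strip() for target in WIKILINK.findall(body)}
        missing = sorted(targets - known)
        if missing:
            dead[name] = missing
    return dead
```

Detecting a dead link this way is cheap; deciding what the link should point to after a rename or deletion is the part that may still need a model call or a human.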

3. Temporal Degradation in Large Archives
For a CEO managing a decade-old archive, chronology is critical. Standard vector searches often suffer from "temporal blindness," resurfacing data from 2018 as if it were current.

Without a rigid metadata structure and "layering" (Hot Data vs. Cold Archive), the system begins to hallucinate contexts, blurring the lines between contract terms from different years.

4. Summary: Read-Only vs. Active Storage
The current excitement is justified for static archives — it is arguably the best way to quickly "resurface" the history of old projects. But for a live corporate environment, the "just a folder of Markdown files" approach is a path toward losing control over your data.

We are on the threshold of a new class of software: AI-Native Databases. These won't just be folders; they will be hybrids that combine the rigid logic of SQL (for transaction and link control) with the cognitive flexibility of an LLM (for semantic understanding). Without this fundamental layer, a "smart wiki" at scale will inevitably devolve into a digital landfill with polished headers.

@paciox

paciox commented May 12, 2026

> ΩmegaWiki (570+⭐) is actively maintained and shipping fast:
> • 23 Claude Code skills covering the full research lifecycle
> • 9 typed entities · 9 typed edges
> • Bilingual (EN + 中文)
> • New skills landing every week
>
> Come try it, give feedback, help us shape it 👇
>
> Try ΩmegaWiki in Claude Code and run the full LLM-Wiki loop you proposed: ingest papers, build a typed knowledge graph, generate ideas, draft papers, respond to reviewers.
>
> End to end. One wiki. No chunks.
>
> Come and Try! If you find ΩmegaWiki interesting, a ⭐ would encourage and motivate us a lot 😀 https://github.com/skyllwt/OmegaWiki

Yes! Give it a try and get our claude code subscription banned! Why not!

Claude forbids any third party tool lmao

and this is the same for the other tools proposed lmao

@ojuschugh1

  ███████╗ ██████╗ ███████╗
  ██╔════╝██╔═══██╗╚══███╔╝
  ███████╗██║   ██║  ███╔╝
  ╚════██║██║▄▄ ██║ ███╔╝
  ███████║╚██████╔╝███████╗
  ╚══════╝ ╚══▀▀═╝ ╚══════╝
  

Compress LLM context to save tokens and reduce costs

Real session stats: 3,003 compressions · 178,442 tokens saved · 24.7% avg reduction · up to 92% with dedup


sqz compresses command output before it reaches your LLM. Single Rust binary, zero config.

The real win is dedup: when the same file gets read 5 times in a session, sqz sends it once and returns a 13-token reference for every repeat.

Without sqz:                    With sqz:

File read #1:  2,000 tokens     File read #1:  ~800 tokens (compressed)
File read #2:  2,000 tokens     File read #2:  ~13 tokens  (dedup ref)
File read #3:  2,000 tokens     File read #3:  ~13 tokens  (dedup ref)
───────────────────────         ───────────────────────
Total:         6,000 tokens     Total:         ~826 tokens (86% saved)

Token Savings

24.7% average reduction across 3,003 real compressions ·
92% saved on repeated file reads ·
86% on shell/git output ·
13-token refs for cached content

One developer's week, measured from actual sqz gain output:

$ sqz gain
sqz token savings (last 7 days)
──────────────────────────────────────────────────
  04-13 │                              │   2,329 saved
  04-14 │                              │       0 saved
  04-15 │███                           │  12,954 saved
  04-16 │██                            │   9,223 saved
  04-17 │████                          │  14,752 saved
  04-18 │██████████████████████████████│ 105,569 saved
  04-19 │████████                      │  30,882 saved
  04-20 │█                             │   4,334 saved
──────────────────────────────────────────────────
  Total: 3,003 compressions, 178,442 tokens saved (24.7% avg reduction)

Per-command compression

Single-command compression (measured via cargo test -p sqz-engine benchmarks):

| Content | Before | After | Saved |
|---|---|---|---|
| Repeated log lines | 148 | 62 | 58% |
| Large JSON array | 259 | 142 | 45% |
| JSON API response | 64 | 53 | 17% |
| Git diff | 61 | 54 | 12% |
| Prose/docs | 124 | 121 | 2% |
| Stack trace (safe mode) | 82 | 82 | 0% |

Session-level with dedup

Where the real savings live — the cache sends each file once, repeats cost 13 tokens:

| Scenario | Without sqz | With sqz | Saved |
|---|---|---|---|
| Same file read 5× | 10,000 | 826 | 92% |
| Same JSON response 3× | 192 | 79 | 59% |
| Test-fix-test cycle (3 runs) | 15,000 | 5,186 | 65% |

Single-command compression ranges from 2–58% depending on content. Repeated reads drop to 13 tokens each. Your mileage will vary with how repetitive your tool calls are — agentic sessions with many file re-reads see the biggest wins.

Install

Prebuilt binaries (no compiler required — works on every platform):

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/ojuschugh1/sqz/main/install.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/ojuschugh1/sqz/main/install.ps1 | iex

# Any platform via npm
npm install -g sqz-cli

# macOS / Linux via Homebrew
brew tap ojuschugh1/sqz
brew install sqz

Build from source via Cargo:

cargo install sqz-cli sqz-mcp

sqz-cli provides the sqz binary; sqz-mcp provides the MCP server. sqz-engine is a library dependency — it compiles automatically and does not need to be installed separately.

Building from source (cargo install sqz-cli) needs a C toolchain:

  • Linux: build-essential (apt) or equivalent
  • macOS: Xcode Command Line Tools (xcode-select --install)
  • Windows: Visual Studio Build Tools with the "Desktop development with C++" workload. Without these, cargo install fails with linker link.exe not found. If you don't already have them, use the PowerShell or npm install above instead.

Then initialize:

sqz init --global     # hooks apply to every project on this machine
# or
sqz init              # hooks apply to just this project (.claude/settings.local.json)

--global writes to ~/.claude/settings.json (the user scope per the
Anthropic scope table),
so the sqz hook fires in every Claude Code session on this machine. This is
the common case on first install. Your existing permissions, env,
statusLine, and unrelated hooks in ~/.claude/settings.json are
preserved — sqz merges its entries rather than overwriting.

Plain sqz init (project scope) is useful when you want sqz active only
inside one repo.

Only using one agent? Pass --only (or --skip) to limit which
configs are written:

sqz init --only opencode              # just OpenCode, nothing else
sqz init --only opencode,codex        # OpenCode and Codex
sqz init --skip cursor,windsurf       # everything except Cursor and Windsurf

Accepted names: claude, cursor, windsurf, cline, gemini,
kiro, opencode, codex. Aliases (claude-code, gemini-cli, roo,
kiro-cli) also work. --only and --skip can't be combined.

Manual installation (preserve comments in your config)

sqz init round-trips your config file through a JSON parser to merge
the sqz entry, which drops any comments in your opencode.jsonc (and
the analogous JSON-with-comments files other tools accept). If you've
commented your config carefully and want to keep them, install by hand
instead.

OpenCode — two steps:

  1. Drop the plugin file in place. sqz prints the generated TS to
    stdout so you don't have to hand-write the path-escaping logic:

    mkdir -p ~/.config/opencode/plugins
    sqz print-opencode-plugin > ~/.config/opencode/plugins/sqz.ts
  2. Add the MCP entry to your existing opencode.jsonc yourself.
    Append this block inside the top-level mcp object (create the
    mcp object if it doesn't exist):

    "sqz": {
      "type": "local",
      "command": ["sqz-mcp", "--transport", "stdio"],
      "enabled": true
    }

Comments in the rest of your file stay put. OpenCode auto-discovers
the plugin file; no plugin array entry needed (adding one causes
double-loading, see issue #10).

Other tools — Claude Code, Cursor, Windsurf, Cline, Gemini CLI,
and Codex use plain JSON configs without comment support, so the
automated path is non-destructive there. Use sqz init --only <tool>
for those.

That's it. Shell hooks installed, AI tool hooks configured.

How It Works

sqz system architecture

sqz installs a PreToolUse hook that intercepts bash commands before your AI tool runs them. The output gets compressed transparently — the AI tool never knows.

Claude → git status → [sqz hook rewrites] → compressed output (85% smaller)

What gets compressed:

  • Shell output — git, cargo, npm, docker, kubectl, ls, grep, etc.
  • JSON — strips nulls, compact encoding
  • Logs — collapses repeated lines
  • Test output — shows failures only

What doesn't get compressed:

  • Stack traces, error messages, secrets — routed to safe mode (0% compression)
  • Your prompts and the AI's responses — controlled by the AI tool, not sqz

Supported Tools

| Tool | Integration | Setup |
|---|---|---|
| Claude Code | PreToolUse hook (transparent) | sqz init |
| Cursor | PreToolUse hook (transparent) | sqz init |
| Windsurf | PreToolUse hook (transparent) | sqz init |
| Cline | PreToolUse hook (transparent) | sqz init |
| Gemini CLI | BeforeTool hook (transparent) | sqz init |
| Kiro | PreToolUse hook (transparent) | sqz init |
| OpenCode | TypeScript plugin (transparent) | sqz init |
| VS Code | Extension | Install from Marketplace |
| JetBrains | Plugin | Install from Marketplace |
| Chrome | Browser extension | ChatGPT, Claude.ai, Gemini, Grok, Perplexity |
| Firefox | Browser extension | Same sites |

CLI

sqz init --global             # Install hooks for every project on this machine
sqz init                      # Install hooks for just this project
sqz init --only kiro          # Only configure Kiro (skip the rest)
sqz init --only opencode      # Only configure OpenCode (skip the rest)
sqz init --skip cursor        # Configure every agent except Cursor
sqz compress <text>           # Compress (or pipe from stdin)
sqz compress --no-cache       # Compress without dedup (always full output)
sqz expand <ref>              # Recover original content from a §ref:HASH§ token
sqz compact                   # Evict stale context to free tokens
sqz gain                      # Show daily token savings (bar chart)
sqz gain --project .          # Per-project daily gains
sqz gain --days 30            # Last 30 days
sqz stats                     # Cumulative compression report
sqz stats --breakdown         # Per-command token usage breakdown
sqz stats --project .         # Stats for current project only
sqz stats --project list      # List all tracked projects
sqz discover                  # Find missed savings
sqz resume                    # Re-inject session context after compaction
sqz vizit                     # Live terminal dashboard (like htop for AI agents)
sqz hook claude               # Process a PreToolUse hook (Claude Code)
sqz hook kiro                 # Process a PreToolUse hook (Kiro)
sqz print-opencode-plugin     # Print OpenCode plugin TS for manual install
sqz proxy --port 8080         # API proxy (compresses full request payloads)

Dedup Escape Hatch

When sqz sees the same content twice, it returns a compact §ref:HASH§ token
instead of the full text. Most models handle this fine, but some (e.g., GLM 5.1)
can't parse the ref format and loop. Four ways to work around this:

# 1. Recover original content from a ref
sqz expand a1b2c3d4              # prefix match
sqz expand '§ref:a1b2c3d4§'     # paste the whole token

# 2. Compress without dedup (per-invocation)
echo "..." | sqz compress --no-cache

# 3. Disable dedup globally (env var)
export SQZ_NO_DEDUP=1

# 4. MCP passthrough tool (returns input byte-exact, zero transforms)
# Available via tools/list when sqz-mcp is running

Track Your Own Savings

Run sqz gain in your shell any time to see your own daily breakdown (see the
Token Savings section above for what the output looks like), and sqz stats
for the full cumulative report:

$ sqz stats
  📊 sqz compression stats
  ──────────────────────────────────────────────────

  178,442  tokens saved
  ↓  24.7% average reduction

  Compressions           3,003
  Tokens in              721,840
  Tokens out             543,398
  Tokens saved           178,442
  Avg reduction          24.7%

  🗄️  Cache
  ──────────────────────────────────────────────────
  Entries                43
  Size                   39.1 KB

Add --breakdown to see exactly which commands consume the most tokens:

$ sqz stats --breakdown

  🔍 Top Token Consumers
  ──────────────────────────────────────────────────────────────────────
  command               calls  tokens in        out    saved
  ──────────────────────────────────────────────────────────────────────
  dedup                   249      45541       3237      93%
  stdin                    51      30851      24289      21%
  auto                    132      18288       7740      58%
  echo                     17       1050        558      47%
  ls -la                    8        948        948       0%
  cargo build               7        170        145      15%
  git status                4         56          8      86%
  ──────────────────────────────────────────────────────────────────────

Per-project filtering:

sqz stats --project .           # stats for current project only
sqz stats --project list        # list all tracked projects
sqz gain --project .            # daily gains for current project
sqz gain --days 30              # last 30 days instead of 7
sqz gain --days 30 --project .  # combine both

Stats are stored locally in SQLite under ~/.sqz/sessions.db — nothing leaves your machine.

How Compression Works

  1. Per-command formatters: git status → compact summary, cargo test → failures only, docker ps → name/image/status table
  2. Structural summaries — code files compressed to imports + function signatures + call graph (~70% reduction). The model sees the architecture, not implementation noise.
  3. Dedup cache — SHA-256 content hash, persistent across sessions. Second read = 13-token reference.
  4. JSON pipeline — strip nulls → project out debug fields → flatten → collapse arrays → TOON encoding (lossless compact format)
  5. Safe mode — stack traces, secrets, migrations detected by entropy analysis and routed through with 0% compression
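
The dedup cache in step 3 can be pictured as a content-addressed store. The following is a minimal in-memory sketch, not sqz's implementation (which is written in Rust and persists across sessions). It assumes the §ref:HASH§ token format and an 8-character hash prefix like the a1b2c3d4 example shown earlier; the `DedupCache` class and its method names are hypothetical.

```python
import hashlib
from typing import Dict


class DedupCache:
    """Send content once; repeats get a short reference token instead."""

    def __init__(self, ref_len: int = 8) -> None:
        self._by_digest: Dict[str, str] = {}  # full SHA-256 hex digest -> content
        self._ref_len = ref_len  # assumed prefix length for the token

    def send(self, content: str) -> str:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self._by_digest:
            # Repeat read: return a compact reference instead of the full text.
            return "§ref:" + digest[: self._ref_len] + "§"
        self._by_digest[digest] = content
        return content

    def expand(self, ref: str) -> str:
        """Recover content from a bare hash prefix or a whole §ref:HASH§ token."""
        prefix = ref.strip("§")
        if prefix.startswith("ref:"):
            prefix = prefix[len("ref:"):]
        matches = [c for d, c in self._by_digest.items() if d.startswith(prefix)]
        if len(matches) != 1:
            raise KeyError(ref)  # unknown or ambiguous prefix
        return matches[0]
```

Keying on the full digest and only truncating for display keeps lookups exact while the token stays short; an ambiguous prefix raises instead of guessing.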

For the full technical details, see docs/.

Configuration

# ~/.sqz/presets/default.toml
[preset]
name = "default"
version = "1.0"

[compression.condense]
enabled = true
max_repeated_lines = 3

[compression.strip_nulls]
enabled = true

[budget]
warning_threshold = 0.70
default_window_size = 200000

Privacy

  • Zero telemetry — no data transmitted, no crash reports
  • Fully offline — works in air-gapped environments
  • All processing local

Development

git clone https://github.com/ojuschugh1/sqz.git
cd sqz
cargo test --workspace
cargo build --release

License

Elastic License 2.0 (ELv2) — use, fork, modify freely. Two restrictions: no competing hosted service, no removing license notices.

Come and Try! If you find SQZ interesting, a ⭐ would encourage and motivate us a lot 😀 https://github.com/ojuschugh1/sqz

@mikhashev

Follow-up on my April 14 proposal — shipped a working version today as v0.25.0 of DPC Messenger (release, ADR-024).

The original mapping (Blob → fact, Tree → category, Commit → provenance, Branch → hypothesis) didn't survive contact with retrieval. Git's object model is fine for storage but graph queries over packfiles are prohibitive on every chat turn. We switched to SQLite with the same intent preserved at the schema layer: every edge carries a source taxonomy (structural / gliner_ner / llm_relation), a needs_review flag for LLM-extracted relations, and temporal created_at / invalidated_at. The "branches as competing hypotheses" idea became needs_review — uncertainty stays in the graph until resolved.
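
A minimal sketch of what such an edge table might look like in SQLite. The `source`, `needs_review`, `created_at` and `invalidated_at` fields come from the description above; the table name, the `subject` / `relation` / `object` columns, and both helper functions are assumptions for illustration, not the DPC Messenger schema.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE edges (
    id             INTEGER PRIMARY KEY,
    subject        TEXT NOT NULL,
    relation       TEXT NOT NULL,
    object         TEXT NOT NULL,
    source         TEXT NOT NULL
                   CHECK (source IN ('structural', 'gliner_ner', 'llm_relation')),
    needs_review   INTEGER NOT NULL DEFAULT 0,
    created_at     TEXT NOT NULL,
    invalidated_at TEXT            -- NULL while the edge is still valid
);
"""


def add_edge(conn: sqlite3.Connection, subject: str, relation: str, obj: str,
             source: str, created_at: str) -> None:
    # Assumption: LLM-extracted relations start out flagged for review.
    needs_review = 1 if source == "llm_relation" else 0
    conn.execute(
        "INSERT INTO edges (subject, relation, object, source, needs_review, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (subject, relation, obj, source, needs_review, created_at),
    )


def current_edges(conn: sqlite3.Connection,
                  include_unreviewed: bool = False) -> List[Tuple[str, str, str]]:
    """Edges that are not invalidated; unreviewed ones only on request."""
    sql = "SELECT subject, relation, object FROM edges WHERE invalidated_at IS NULL"
    if not include_unreviewed:
        sql += " AND needs_review = 0"
    return conn.execute(sql + " ORDER BY id").fetchall()
```

Keeping uncertain edges in the same table, rather than discarding them, is what lets the uncertainty stay queryable until someone resolves it.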

One thing the original post missed: knowledge extraction is more of a compilation pass than a write. We run a sleep pipeline that reads session archives, extracts entities (GLiNER zero-shot NER) and relations (LLM with source-grounded prompts), then commits typed edges. The graph participates as a fourth retrieval channel alongside FAISS, BM25, and structural traversal. Actively dogfooded on this repo by one human and three agents (Ark in-process, CC via bridge, Iris on Discord). Responds to @a-a-k's "no provenance, lossy compression" critique from May 2 — provenance is in the edges, source taxonomy survives ingest, and the compilation step makes losses visible via needs_review flags rather than blending silently.

Separate observation: Karpathy's HELLO.md gist (April 20) shows the same problem from the agent side — without continuity, an agent writes a goodbye letter. Our KG + sleep pipeline is a practical answer: agents build memory instead of goodbyes.

@skyllwt

skyllwt commented May 13, 2026

ΩmegaWiki(640+⭐) is actively maintained and shipping fast:
• 23 Claude Code skills covering the full research lifecycle
• 9 typed entities · 9 typed edges
• Bilingual (EN + 中文)
• New skills landing every week

Come try it, give feedback, help us shape it 👇

Try ΩmegaWiki in Claude Code and run the full LLM-Wiki loop you proposed — ingest papers, build a typed knowledge graph, generate ideas, draft papers, respond to reviewers.

End to end. One wiki. No chunks.

Come and Try! If you find ΩmegaWiki interesting, a ⭐ would encourage and motivate us a lot 😀
https://github.com/skyllwt/OmegaWiki

@waydelyle

SwarmVault v3.14 — we made the onramp dead simple. Realized we had a powerful tool with a steep first impression, so the last few releases focused on making the first 60 seconds effortless.

swarmvault quickstart <input> — the new beginner-friendly entry point. Give it a directory or a public GitHub URL and it does everything: init, ingest, compile, launch the viewer. One command, zero config.

npx @swarmvaultai/cli quickstart ./my-project
npx @swarmvaultai/cli quickstart https://github.com/user/repo

swarmvault next — a read-only orientation command that tells you exactly where you are and what to do next. Works in three states: uninitialized ("run quickstart or init"), initialized ("add some sources"), compiled ("here's what you can do with your vault"). JSON output for agents, human output for you.

Simplified CLI help — primary commands are front and center. Compatibility aliases and advanced graph commands are still there but hidden from the first screen so new users aren't overwhelmed.

The idea is: someone reads this gist, runs npx @swarmvaultai/cli quickstart ., and has a compiled wiki + knowledge graph + interactive viewer in under a minute. Then swarmvault next tells them what to explore from there.

Everything after that is progressive disclosure: chat for multi-turn conversations with your vault, context build for agent handoff packs, export ai for portable llms.txt bundles, graph serve for the visual workbench, install --agent for wiring into your coding tool.

Still local-first. Still works fully offline. Still MIT. 100+ releases and counting.

Repo: https://github.com/swarmclawai/swarmvault

@equationalapplications

equationalapplications commented May 13, 2026

Offline-first, SQLite-backed library ready for production.

Repo and the superpowers documentation here: [equationalapplications/expo-llm-wiki](https://github.com/equationalapplications/expo-llm-wiki)

Most RAG implementations treat every "chunk" of data as equal. The result? Your LLM gets "context pollution"—distracted by a random observation from three days ago while ignoring your core system instructions.

Inspired by Andrej Karpathy's LLM Wiki spec, the expo-llm-wiki monorepo introduces a Tiered Memory Architecture. This allows you to give your AI a structured "brain" using cross-entity namespaces and configurable weights.


The "Brain" Hierarchy

The LLM Librarian manages a knowledge hierarchy that mimics human expertise:

  1. The Fact Tier (Immutable Truth)
     • What it is: Static documents (specs, PDFs).
     • The Role: The highest source of truth; if the Librarian finds a contradiction, the Fact always wins.
     • The Benefit: Immutable so hard truths never get diluted.
  2. The Working Memory Tier (The Context)
     • What it is: The active project environment (codebase or work-in-progress).
     • The Role: Real-time episodic events and observations.
     • The Benefit: Uses recency weighting to stay aligned with the "now."
  3. The Wisdom Tier (The Evolving Wiki)
     • What it is: A synthesized repository where the Librarian "remembers" lessons and patterns.
     • The Role: Consolidates Working Memory into long-term architectural or stylistic preferences.
     • The Benefit: Uses accessCount weighting so frequently referenced "lessons" graduate into Core Wisdom.

Production-Grade Superpowers

  • Hybrid Retrieval Engine: Uses Cosine Similarity for semantic search when online, with an automatic fallback to MiniSearch for full-text search when the device is offline.
  • The Pipeline: Uses runLibrarian() to consolidate episodic events into durable facts and runHeal() to resolve contradictions and prune stale claims.
  • Multi-Entity Architecture: Support for thousands of isolated users/agents within a single SQLite database using entityId namespaces—zero memory leakage.
  • React & Mobile Native Optimized: Includes reactive hooks (useMemoryRead) and "Emoji-Safe" chunking to prevent common mobile LLM UI bugs.
  • Security & GDPR: Built-in runPrune and forget methods for "Right to be Forgotten" compliance, plus source normalization to prevent path injection.

Implementation Example:

const bundle = await wiki.read(['facts', 'wip_codebase', 'wisdom_cache'], 'Synthesize current state.');

const systemPrompt = formatContext(bundle, {
  factWeights: {
    confidence: 1.5,  // Prioritize immutable Facts
    recency: 0.9,     // Keep Working Memory relevant
    accessCount: 0.5  // Surface Wisdom that the user relies on most
  }
});

Tech Stack: Expo, React Native, SQLite, MiniSearch.

@nowissan

Built a desktop editor implementation of this idea — nohmitaina. Works with Claude Code or Codex CLI (no API key), local Markdown, macOS.

After a month of feeding it my own notes, three problems showed up that I think most implementations of this pattern will hit:

  1. Identity — The same concept gets extracted under slightly different names from related sources. The wiki ends up with duplicate pages ("Cognitive Dissonance Marketing" and "Cognitive Dissonance and Urgency" from the same book, in my case).

  2. Level — Life-scale themes ("Personal AGI") end up at the same level as tactical findings ("Urgency Trigger"). When everything is flat, importance disappears.

  3. Relationship — Concepts get linked as "related," but the type is lost. Similar, contains, contradicts — all collapsed into one word, which makes the graph useful for navigation but not for thinking.

I did a DDD event-storming pass on the wiki domain and treated each as a first-class domain event (DuplicateCandidateDetected, ConceptsMerged, ConceptRelationshipTyped, ConceptLevelChanged). These run on what I call a Dream cycle — a background pass borrowed from how human memory consolidates during sleep. It also handles the "lint" operation mentioned in the gist.
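
As a rough illustration of the Identity and Relationship problems above, here is a sketch of a background pass that emits a DuplicateCandidateDetected event for near-identical concept titles and keeps the relation type on each link. It is not nohmitaina's code: the string-similarity approach (standard-library `difflib`), the 0.6 threshold, and every field name are assumptions; only the event name and the three relation types come from the comment.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from enum import Enum
from itertools import combinations
from typing import List


class RelationType(Enum):
    SIMILAR = "similar"
    CONTAINS = "contains"
    CONTRADICTS = "contradicts"


@dataclass(frozen=True)
class ConceptRelation:
    source: str
    target: str
    kind: RelationType  # the type is kept instead of a generic "related"


@dataclass(frozen=True)
class DuplicateCandidateDetected:
    first: str
    second: str
    similarity: float


def detect_duplicate_candidates(
    titles: List[str], threshold: float = 0.6
) -> List[DuplicateCandidateDetected]:
    """Flag title pairs whose character-level similarity meets the threshold."""
    events = []
    for a, b in combinations(sorted(titles), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            events.append(DuplicateCandidateDetected(a, b, round(ratio, 2)))
    return events
```

Title similarity alone will miss duplicates that share meaning but not wording, so a pass like this would only produce candidates for a later merge decision, not merges.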

Found another commenter (Andrii) on X who's solving Level a different way — by extracting citable claims first, then building the concept layer on top of claim collections. The claim approach makes Level fall out structurally (high-claim concepts are heavyweight, low-claim ones are light), which feels more elegant than my event-driven approach. I'm going to try integrating both.
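
One way to read "Level falls out structurally" is to count the claims attached to each concept and sort by that count. The sketch below assumes exactly that; the `Claim` shape, the `rankConcepts` helper, and the use of a raw count as weight are guesses for illustration, and the other commenter's actual method is not described here:

```typescript
// Hypothetical claim record: one citable statement tied to a source and a concept.
interface Claim {
  text: string;
  source: string;
  concept: string;
}

interface ConceptWeight {
  concept: string;
  claimCount: number;
}

// Count claims per concept, then sort heaviest first (ties broken by name).
function rankConcepts(claims: Claim[]): ConceptWeight[] {
  const counts = new Map<string, number>();
  claims.forEach((claim) => {
    counts.set(claim.concept, (counts.get(claim.concept) || 0) + 1);
  });
  const ranked: ConceptWeight[] = [];
  counts.forEach((claimCount, concept) => {
    ranked.push({ concept, claimCount });
  });
  return ranked.sort(
    (a, b) => b.claimCount - a.claimCount || a.concept.localeCompare(b.concept)
  );
}
```

A concept backed by many claims rises to the top without anyone assigning it a level by hand, which is the structural property the comment points at.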

Thanks for the framing — it's already shaped how a small group of us is thinking about this.

@jianghailong-xy

Spent the last few months building basically this — three buckets (raw sources, agent-maintained wiki, agent config) with a self-healing maintenance loop on top.

The mapping ended up surprisingly literal:

  • sources/ — append-only raw research the researcher agent writes; URLs deduped across runs so we don't crawl the same page twice.
  • wiki/ — structured markdown the curator agent (re)writes from sources. One ingest run typically touches 8–15 pages, exactly as you describe.
  • agents table — per-wiki schedules + trigger graph. A daily cron fires the researcher, which cascades into ingest, which cascades into lint.
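
Two of the mechanics in the list above, URL dedup across runs and the researcher → ingest → lint cascade, can be sketched in a few lines. This is an illustration under assumptions, not the commenter's code: `normalizeUrl`, `SourceLog`, `triggers`, and `cascade` are hypothetical names, and the normalization rules (drop the fragment and trailing slashes) are a guess at what "deduped" might mean:

```typescript
// Hypothetical normalization: drop the fragment and any trailing slashes.
// The URL parser already lowercases the scheme and host.
function normalizeUrl(raw: string): string {
  const url = new URL(raw);
  const path = url.pathname.replace(/\/+$/, "");
  return `${url.protocol}//${url.host}${path}${url.search}`;
}

// Append-only record of URLs that have already been crawled.
class SourceLog {
  private seen = new Set<string>();

  // Returns true if the URL is new and was recorded, false if it is a repeat.
  addIfNew(rawUrl: string): boolean {
    const key = normalizeUrl(rawUrl);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

type AgentName = "researcher" | "ingest" | "lint";

// Trigger graph: finishing one agent fires the agents listed for it.
const triggers: Record<AgentName, AgentName[]> = {
  researcher: ["ingest"],
  ingest: ["lint"],
  lint: [],
};

// Breadth-first walk of the trigger graph, skipping agents that already ran.
function cascade(start: AgentName): AgentName[] {
  const order: AgentName[] = [];
  const queue: AgentName[] = [start];
  while (queue.length > 0) {
    const agent = queue.shift() as AgentName;
    if (order.indexOf(agent) !== -1) continue; // guard against cycles
    order.push(agent);
    queue.push(...triggers[agent]);
  }
  return order;
}
```

In this sketch a daily cron would only need to call `cascade("researcher")`; the rest of the chain follows from the graph.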

The piece that ended up mattering most was your line about periodic linting. We pushed it into a self-healing loop: the inspector agent reports cross-page contradictions, stale claims, orphan wikilinks, and data gaps, then auto-chains a scoped re-research + refine for anything that needs fresh sources. High-confidence fixes (e.g. basename-exact missing-page links) apply with no LLM call; lower-confidence ones either auto-apply or queue for human review, per-wiki toggle.
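
The "no LLM call" tier can be illustrated with the orphan-wikilink case. A minimal sketch, assuming links are written as `[[path/to/page]]` and resolved by full path without the `.md` extension: if a link target is missing but exactly one existing page has the same basename, the fix is unambiguous and can be applied mechanically; anything else is left for review. The `Page` and `LinkFinding` shapes and the `lintWikilinks` helper are hypothetical:

```typescript
interface Page {
  path: string; // e.g. "concepts/urgency-trigger.md"
  body: string; // markdown content
}

interface LinkFinding {
  page: string;       // page that contains the broken link
  target: string;     // link target as written
  fix: string | null; // unambiguous replacement, or null to queue for review
}

const stripExt = (path: string): string => path.replace(/\.md$/, "");
const baseName = (path: string): string => stripExt(path.slice(path.lastIndexOf("/") + 1));

// Report every wikilink whose target does not match an existing page path.
function lintWikilinks(pages: Page[]): LinkFinding[] {
  const knownPaths = pages.map((p) => stripExt(p.path));
  const known = new Set(knownPaths);
  const findings: LinkFinding[] = [];
  pages.forEach((page) => {
    // Capture the target of [[target]], [[target|alias]], or [[target#heading]].
    const linkPattern = /\[\[([^\]|#]+)/g;
    let match = linkPattern.exec(page.body);
    while (match !== null) {
      const target = match[1].trim();
      if (!known.has(target)) {
        // Basename-exact match: only safe when exactly one page qualifies.
        const candidates = knownPaths.filter((p) => baseName(p) === baseName(target));
        findings.push({
          page: page.path,
          target,
          fix: candidates.length === 1 ? candidates[0] : null,
        });
      }
      match = linkPattern.exec(page.body);
    }
  });
  return findings;
}
```

Findings with a non-null `fix` correspond to the high-confidence tier described above; the rest would go to the LLM-assisted or human-review path.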

A few wikis built this way are live at https://wikova.com. Drop a topic in the search bar and the pipeline kicks off.

@tigerlaibao

Love this. The "compile once, keep current" framing nails why RAG alone feels so stateless.

I've been building something adjacent but for a different audience — Memex, a local-first mobile app (iOS + Android) where you just capture thoughts, photos, and voice memos as they come. A multi-agent system quietly organizes everything into structured cards, surfaces patterns, and builds up a picture of your life over time. No manual filing, no schema design — you just record, and the knowledge accumulates.

The other angle we lean into is emotional companionship. A lot of what people want to capture — reflections, frustrations, half-formed thoughts — they won't post publicly. So we pair the knowledge layer with AI companion characters you can actually talk to about your day. It's less "research wiki" and more "private space that understands you and remembers."

Same philosophical root (Bush's Memex, persistent personal knowledge, LLM as maintainer), different surface: low-friction capture + companionship rather than deep research workflows. Open source if anyone's curious: https://github.com/memex-lab/memex

@belmendo

With the powers vested in me, I heretofore henceforth dub thee, “Lemon Wiki”.
