@karpathy
Created April 4, 2026 16:25
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copy-pasted to your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

  • Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
  • Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
  • Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
  • Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
  • Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

Architecture

There are three layers:

Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.
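To make the three layers concrete, here is a minimal Python sketch that scaffolds them on disk. The layout (`raw/` with a `raw/assets/` folder, `wiki/` holding `index.md` and `log.md`, and an `AGENTS.md` schema at the root) and the starter schema text are assumptions for illustration; the pattern deliberately leaves the exact structure to you and your agent.

```python
import tempfile
from pathlib import Path

# Hypothetical starter schema text; the real one is co-evolved with your LLM.
SCHEMA_TEMPLATE = """# Wiki schema

- raw/ is immutable: read sources from it, never edit them.
- wiki/ is LLM-owned: one markdown page per source, entity, or concept.
- On ingest: write a summary page, update related pages,
  update wiki/index.md, and append an entry to wiki/log.md.
"""


def scaffold(root: Path, schema_name: str = "AGENTS.md") -> list:
    """Create the raw/, wiki/, and schema layers under root; never overwrites."""
    created = []
    for d in (root / "raw", root / "raw" / "assets", root / "wiki"):
        if not d.exists():
            d.mkdir(parents=True)
            created.append(d)
    starters = {
        root / schema_name: SCHEMA_TEMPLATE,      # the schema layer
        root / "wiki" / "index.md": "# Index\n",  # content catalog
        root / "wiki" / "log.md": "# Log\n",      # append-only timeline
    }
    for path, text in starters.items():
        if not path.exists():
            path.write_text(text, encoding="utf-8")
            created.append(path)
    return created


if __name__ == "__main__":
    demo = Path(tempfile.mkdtemp())
    for p in scaffold(demo):
        print(p.relative_to(demo))
```

Running it twice is safe: existing files are never overwritten, so the schema can evolve freely after the first run.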

Operations

Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.

Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.
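The structural half of a lint pass (orphans, and concepts that are linked but have no page yet) is deterministic and doesn't need the LLM. A minimal sketch, assuming pages are keyed by file stem and linked with Obsidian-style `[[wikilinks]]`; `lint_links` is a hypothetical name, and treating `index` and `log` as hubs whose outgoing links don't count is a design choice, since otherwise the index would make every page look linked.

```python
import re

# [[Target]], [[Target|alias]], [[Target#Heading]] -> captures "Target"
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def lint_links(pages: dict, hubs: tuple = ("index", "log")) -> dict:
    """Deterministic lint over {page_name: markdown}: orphans and missing pages."""
    inbound = {name: 0 for name in pages}
    missing = set()
    for name, text in pages.items():
        for target in {t.strip() for t in WIKILINK.findall(text)}:
            if target not in pages:
                missing.add(target)  # linked concept that has no page yet
            elif target != name and name not in hubs:
                inbound[target] += 1  # hub links (index/log) don't rescue orphans
    orphans = sorted(n for n, count in inbound.items() if count == 0 and n not in hubs)
    return {"orphans": orphans, "missing": sorted(missing)}
```

Broken links only approximate "important concepts lacking their own page"; concepts mentioned in plain text without a link still need the LLM's judgment.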

Indexing and logging

Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.
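Regenerating the catalog can itself be scripted. A minimal sketch, assuming pages live in one subfolder per category (e.g. `entities/`, `concepts/`) and that the first non-heading line of each page is a usable one-line summary; `build_index` is hypothetical, and pages with YAML frontmatter would need that block skipped first.

```python
from collections import defaultdict


def build_index(pages: dict) -> str:
    """Render index.md from {"category/page-name": markdown_text}, grouped by category."""
    groups = defaultdict(list)
    for path, text in pages.items():
        category, _, name = path.rpartition("/")
        # First non-empty line that isn't a heading serves as the one-line summary.
        summary = next(
            (ln.strip() for ln in text.splitlines() if ln.strip() and not ln.startswith("#")),
            "(no summary yet)",
        )
        groups[category or "uncategorized"].append(f"- [[{name}]]: {summary}")
    out = ["# Index", ""]
    for category in sorted(groups):
        out += [f"## {category.capitalize()}", *sorted(groups[category]), ""]
    return "\n".join(out).rstrip() + "\n"
```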

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
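The prefix convention makes the log trivially machine-readable. A minimal sketch of writing entries in that format and reading back the last few, assuming the exact `## [YYYY-MM-DD] kind | Title` shape described above; `format_entry` and `last_entries` are illustrative names.

```python
import re
from datetime import date

# Matches e.g. "## [2026-04-02] ingest | Article Title"
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def format_entry(kind: str, title: str, when: date, body: str = "") -> str:
    """One log entry with the consistent '## [date] kind | Title' prefix."""
    return f"## [{when.isoformat()}] {kind} | {title}\n{body}".rstrip() + "\n\n"


def last_entries(log_text: str, n: int = 5) -> list:
    """Like grep '^## \\[' log.md | tail -5, but split into (date, kind, title)."""
    found = [m.groups() for m in map(ENTRY.match, log_text.splitlines()) if m]
    return found[-n:] if n > 0 else []
```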

Optional: CLI tools

At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.
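A naive search script of the kind you might vibe-code is sketched below: plain Okapi BM25 over page text, with no vectors and no LLM re-ranking, so it is a stopgap rather than a substitute for qmd's hybrid search. The function names are hypothetical; `k1=1.5` and `b=0.75` are conventional defaults, and the IDF uses the Lucene-style variant that never goes negative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_search(pages: dict, query: str, k1: float = 1.5, b: float = 0.75, top: int = 5) -> list:
    """Rank {page_name: text} against query with Okapi BM25; returns [(score, name), ...]."""
    docs = {name: Counter(tokenize(text)) for name, text in pages.items()}
    lengths = {name: sum(tf.values()) for name, tf in docs.items()}
    n = len(docs)
    avgdl = sum(lengths.values()) / n if n else 0.0
    df = Counter(term for tf in docs.values() for term in tf)  # documents containing each term
    results = []
    for name, tf in docs.items():
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))  # Lucene-style, never negative
            f = tf[term]
            norm = k1 * (1 - b + b * lengths[name] / avgdl)  # document-length normalization
            score += idf * f * (k1 + 1) / (f + norm)
        if score > 0:
            results.append((score, name))
    return sorted(results, reverse=True)[:top]
```

Calling `bm25_search(pages, "memex trails")` returns `(score, page_name)` pairs, best first; pages matching none of the query terms are left out.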

Tips and tricks

  • Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
  • Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful — it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass — the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
  • Obsidian's graph view is the best way to see the shape of your wiki — what's connected to what, which pages are hubs, which are orphans.
  • Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
  • Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
  • The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.

@arrrr110

arrrr110 commented Jun 20, 2026

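Thank you for sharing this wisdom. With the help of AI, I built a simple LLM-Wiki project.

Feedback from everyone is welcome.

https://github.com/arrrr110/Nemsy
Nemsy: a self-building personal knowledge assistant

The name comes from Nemsy Necrofizzle in Hearthstone: clever, curious, and full of energy.

This project is based on Karpathy's LLM Wiki idea. It is powered by DeepSeek's ultra-long-context model and uses an Obsidian vault as its knowledge source, giving a single user a personal knowledge service that keeps accumulating and summarizes on its own.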

@lidorshimoni

I really love this pattern. Seeing some of the pushback comparing this to a standard database (re: @witwaycorp) kind of misses the core value proposition. A DB or search engine just retrieves raw data—you still have to mentally parse everything and connect the dots yourself. This LLM-Wiki shifts that cognitive load. The LLM does the heavy lifting of resolving contradictions and building out the graph during the ingest phase. You're building a compounding asset, not just a search index.

That said, taking this from a concept to a stable architecture brings up some real scaling bottlenecks. For those of you already hacking on implementations here (@vvvvvivekkk, @skyllwt, @axoviq-ai), I’d love to hear how you’re handling a couple of things:

  1. Synthesis Decay & Knowledge Drift
    When the LLM constantly rewrites entity pages as new data flows in, how do you prevent the original nuance from getting compressed out? Or worse, hallucinations slowly becoming the "ground truth" over multiple update cycles? Are you sticking strictly to append-only, or maybe forcing the LLM to anchor its claims with hard quotes from the immutable raw/ files?

  2. Scaling the "Lint" Step
    Triggering a full health-check across the entire wiki for contradictions is going to nuke the context window and get expensive fast. Have any of you tried a "local subgraph" approach to linting? For example, if a new source updates a specific concept, you only trigger a lint on that node and its 1st or 2nd-degree connections rather than checking the whole repo?
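The second option in point 1 (anchoring claims with hard quotes from raw/) becomes enforceable with a deterministic check that every cited quote really appears in the immutable source. A minimal sketch, assuming a hypothetical citation convention of `"quoted text" (raw/path.md)` inside wiki pages; matching ignores differences in whitespace and line wrapping but is otherwise exact.

```python
import re

# Hypothetical convention: "quoted text" (raw/path.md)
CITED_QUOTE = re.compile(r'"([^"]+)"\s*\((raw/[^)]+)\)')


def _norm(s: str) -> str:
    return " ".join(s.split())  # collapse whitespace and line wraps, keep case


def unanchored_quotes(page: str, raw_sources: dict) -> list:
    """(quote, source) pairs whose quote is not found verbatim in the cited raw file."""
    bad = []
    for quote, source in CITED_QUOTE.findall(page):
        raw = raw_sources.get(source)  # {"raw/path.md": file text}
        if raw is None or _norm(quote) not in _norm(raw):
            bad.append((quote, source))
    return bad
```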

Curious to hear how you guys are optimizing the backend logic and maintenance loops in your forks!

@hmbseaotter

Contradiction checking does not need to be a monolithic full-repo LLM pass.

It can be implemented as several smaller steps:

  1. Per-source contradiction detection (at ingest time)
    This is the high-frequency activity: it runs on every source ingest. The schema says to compare the incoming claim against what the touched pages already say. A source touches ~8–15 pages, so the model only loads those pages, not the whole repo. My schema classifies a detected contradiction into one of three severities — soft, scope-mismatch, or hard (and "none" when there is no conflict). Soft and scope-mismatch are non-blocking: they get flagged, referenced, compared and explained, and I permit them since they can be useful in setting the subject matter's peripheral context. I also have a mechanism to keep an eye on soft/scope contradictions so they do not quietly accumulate over time without any review. Hard contradictions are not acceptable — they stop the ingestion run, hold the commit, send me a notification, and block continuation of ingestion until I manually resolve them (with an explanation of the resolution) inside the MD files. Each flagged contradiction carries a machine-readable severity token plus a status line, e.g.:
Contradiction severity: hard
Status: Unresolved — flagged for user review
  2. The commit gate: deterministic, with zero context cost.
    This is what holds a commit for hard contradictions on every source. The commit gate is not an LLM pass at all; it is a Python os.walk over the "wiki/" folder that greps each page for "Status: Unresolved" and reads the severity token. Yes, it touches every file, but only via cheap disk I/O plus regex; it never reaches into the context window. So this "scan the whole repo on every commit" costs next to nothing and scales to any size.

  3. The periodic lint backstop: the only genuinely broad pass, and the one to watch.
    This is where the concern about nuking the context window has real merit. In my lint workflow, the deterministic checks still sweep the whole wiki, but most of that is shell, not LLM: orphan detection, missing-page detection, and unreferenced-image checks are comm/grep passes (comm diffs sorted page lists against sorted wikilink targets). Only the reasoning-heavy checks (the contradiction backstop, causal-chain gaps, thin-page judgment, and missing cross-references) need the model. This way I avoid a naive "load every page, cross-check all pairs" approach, which is O(n²) in the number of pages and gets painful fast as the wiki scales.
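The commit gate described above needs nothing beyond the standard library. A minimal sketch, assuming each flag is a `Contradiction severity:` line directly followed by a `Status:` line, as in the example earlier in this comment; the function names are illustrative, and pairing severity with status means a resolved hard flag next to an open soft one does not block.

```python
import os
import re

# A flag is a severity line immediately followed by a status line, e.g.
#   Contradiction severity: hard
#   Status: Unresolved (flagged for user review)
FLAG = re.compile(r"^Contradiction severity:\s*(\S+)\s*\n\s*Status:\s*(\w+)", re.MULTILINE)


def has_blocking_flag(text: str) -> bool:
    """True if any flag on the page is hard AND still unresolved."""
    return any(
        sev.lower() == "hard" and status.lower() == "unresolved"
        for sev, status in FLAG.findall(text)
    )


def blocking_pages(wiki_dir: str = "wiki") -> list:
    """Walk wiki_dir with plain disk I/O + regex; no LLM and no context window involved."""
    blocked = []
    for dirpath, _dirs, files in os.walk(wiki_dir):
        for fn in files:
            if fn.endswith(".md"):
                path = os.path.join(dirpath, fn)
                with open(path, encoding="utf-8") as fh:
                    if has_blocking_flag(fh.read()):
                        blocked.append(os.path.relpath(path, wiki_dir))
    return sorted(blocked)
```

In a git pre-commit hook, exit non-zero whenever `blocking_pages()` returns anything; the returned list doubles as the notification payload.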

@lidorshimoni proposed a subgraph approach, which I then formalized into my schema. (A partial delta-scoping already existed in my compile step; reading the post pushed me to make scoped linting the explicit rule.) The fit is natural: a contradiction can only exist between claims about the same entity or relationship, so two pages sharing no concept and no [[wikilink]] essentially can't contradict each other. (The edge case is two pages that independently mention the same external entity without linking to a shared page for it — dense cross-referencing reduces this but does not eliminate it.) My wiki already encodes relatedness as explicit graph edges — wikilinks plus the directional "What causes this" / "What this causes" links — so the contradiction surface is bounded to each source's touched pages plus their 1st/2nd-degree link neighbors. The graph hands us the neighborhood for free.

The reasoning-heavy backstop therefore runs over nodes changed since the last lint plus their graph neighbors only, not the full repo. This keeps cost bounded as the wiki grows and still catches newly-introduced cross-page conflicts.
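Scoping the backstop to "changed pages plus 1st/2nd-degree link neighbors" is a bounded breadth-first search over the wikilink graph. A minimal sketch, assuming links are treated as undirected (a contradiction can run in either direction) and pages are keyed by name; directional links such as the causal ones mentioned above would simply add more edges. Function names are hypothetical.

```python
import re
from collections import deque

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # target part of [[target|alias]] / [[target#h]]


def link_graph(pages: dict) -> dict:
    """Undirected adjacency {page: set(neighbors)} built from [[wikilinks]]."""
    adj = {name: set() for name in pages}
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target in pages and target != name:
                adj[name].add(target)
                adj[target].add(name)
    return adj


def lint_scope(changed: set, adj: dict, depth: int = 2) -> set:
    """Changed pages plus every page within `depth` link hops (multi-source BFS)."""
    seen = {p for p in changed if p in adj}
    queue = deque((p, 0) for p in seen)
    while queue:
        page, dist = queue.popleft()
        if dist == depth:
            continue  # don't expand past the neighborhood radius
        for nb in adj[page]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return seen
```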

One gap this leaves intentionally: a contradiction between two old, unchanged pages that have never landed in the same lint neighborhood. That is real, and I am not pretending it isn't. The mitigation is periodic full sweeps — currently after large ingestion rounds and on explicit request, not on an automated schedule. This trades recall for cost, and I think that is the right tradeoff: the per-source check and the commit gate already filter the cases that would actually corrupt the wiki. The full sweep is a cleanup pass, not a primary defense.

@distorx

distorx commented Jun 21, 2026

OKF — a production instance of this pattern, with the index + graph internals

Running this for a self-hosted infra fleet (a mail server, ~30 Tailscale hosts, Proxmox, CI runners).
~7,500 type-tagged notes; the vault is the operational source of truth, not just reading notes.
A few internals that made it hold up past the "index.md is enough" scale:

flowchart LR
  S[Raw sources<br/>repos · live configs · API specs · kanban · fleet] -->|exporters on timers| V[Vault<br/>~7,500 md + YAML frontmatter]
  V -->|okf sync /10min| F[(notes<br/>FTS5 trigram)]
  V -->|okf sync /10min| E[(emb<br/>1024-d vectors)]
  V -->|okf-crosslink| M[per-area MOC Map hubs]
  F --> Q[okf search / semantic / record / run]
  E --> Q
  Q --> A[agent &amp; human]

Ingest is scripted, not one-source-at-a-time. Exporters walk the live systems on timers — network
inventory, Google/OpenAPI discovery (→ ~1,880 api-method notes), kanban→notes, commits, model &
provider registries. The wiki re-derives itself as the infra changes under you.

The index is a derived dual index (okf-index.db, ~53 MB, throwaway/rebuildable):

  • notes = FTS5 trigram (so MATCH behaves like substring); emb = 1024-d float32 unit vectors
    (mxbai-embed-large via local Ollama, 4 KB/note); meta/config for bookkeeping.
  • Latency: keyword ~0.07 s (vs 0.5–1.2 s full scan); semantic ~1.1 s cold — dominated by
    process + numpy import + the query-embed, not the matrix (loading ~19 MB is ~50 ms).
  • Integrity guards (why it never drifts): content-hash authority (git/exporters churn mtime; a
    sha1 decides real change, so no spurious ~5-min re-embeds); embedder-identity guard (model swap →
    full re-embed, never mixes vector spaces); independent commit (FTS advances even if Ollama is
    down mid-sync — the embedding is deferred + retried, never marked done with a stale vector);
    deletion reconciliation + zero-candidate full-walk fallback (true mirror, never misses an add).
    Vault = truth; index = throwaway.
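The first two integrity guards fit in a few lines. This sketch is an illustration, not okf's code: an in-memory dict stands in for a `meta` table keyed by `rel`, and `needs_reembed` / `embedder_changed` are hypothetical names. The mtime check is only a fast path; the sha1 of the content is the authority.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class MetaRow:
    mtime: float
    sha1: str


def needs_reembed(rel: str, mtime: float, content: bytes, meta: dict) -> bool:
    """mtime is only a fast path; the content hash decides whether a note really changed."""
    row = meta.get(rel)
    if row is not None and row.mtime == mtime:
        return False  # untouched since the last sync: skip without hashing
    digest = hashlib.sha1(content).hexdigest()
    changed = row is None or row.sha1 != digest
    meta[rel] = MetaRow(mtime, digest)  # remember the new mtime so the fast path hits next time
    return changed


def embedder_changed(config: dict, model: str, dim: int) -> bool:
    """If the embedder differs from the stored one, every vector must be rebuilt (never mix spaces)."""
    return config.get("emb_model") != model or config.get("emb_dim") != str(dim)
```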

It's plain SQLite — okf shells out, no ORM:

-- derived schema (rebuildable; the vault is the source of truth)
CREATE VIRTUAL TABLE notes USING fts5(rel UNINDEXED, hay, tokenize='trigram');
CREATE TABLE meta  (rel TEXT PRIMARY KEY, mtime REAL, rid INTEGER, hash TEXT);
CREATE TABLE emb   (rel TEXT PRIMARY KEY, mtime REAL, vec BLOB, hash TEXT);   -- 4096-byte float32 blob
CREATE TABLE config(k TEXT PRIMARY KEY, v TEXT);                              -- emb_model, emb_dim, schema_version

-- keyword: trigram tokenizer makes MATCH behave like substring; narrows ~7,500 -> a few
SELECT rel FROM notes WHERE notes MATCH '"gitea"';        -- then exact `ql in hay` re-check per candidate

-- filtered (frontmatter is folded into `hay`, lowercased): type + a field
SELECT rel FROM notes WHERE notes MATCH '"type:bill"' AND notes MATCH '"billable:true"';

-- incremental upsert: skip unchanged, delete+reinsert changed by FTS rowid
SELECT rel, mtime, rid, hash FROM meta WHERE rel = :rel;       -- mtime fast-path, hash authority
DELETE FROM notes WHERE rowid = :rid;                          -- then INSERT the new hay
INSERT INTO notes(rowid, rel, hay) VALUES (:rid, :rel, :hay);

-- semantic: pull unit vectors, cosine = dot product in numpy (no SQL vector op)
SELECT rel, vec FROM emb;

Indexing notes: rel is the PK everywhere (O(1) upsert / delete-reinsert); meta.rid maps a
note to its FTS rowid so a change is DELETE+INSERT by rowid (no full reindex); the trigram
tokenizer is the index — substring MATCH with no LIKE table-scan — and FTS5 maintains its own
shadow tables (notes_data/_idx/_content/_docsize). There's deliberately no B-tree on hay; FTS5
owns that, and the exact re-check per candidate keeps results identical to a full scan.

The graph is maintained and linted. okf lint reports: orphans (no inbound [[links]]), stubs
(broken links = not-yet-written knowledge, informational), near-duplicate / contradiction candidates
by embedding cosine, and conformance (frontmatter + non-empty type). A cross-link pass maintains
per-area MOC "Map" hubs — it writes only the hubs, never the member notes, so there's no churn —
and took orphans from ~31% → ~19% in one run.

Convergence: we'd independently named it "OKF," then found Google's Open Knowledge Format spec
describing almost exactly the design (md + YAML frontmatter, type required, links = untyped edges,
broken links tolerated). Aligning to v0.1 means external OKF tooling could consume the vault unchanged.

The "maintenance ≈ 0" claim is the crux: timers + the agent keep the ingest, the dual index, and the
graph all current, so unlike every wiki I've abandoned, this one stays connected and in-sync without
me touching it.

@MarcoPorcellato

What about "Reinforcement Learning (RLHF) for PKMs"?

Right now, our graphs are "flat". Every block or bullet has a static weight of 1.0. Search relies on text matching, but human memory doesn't work like that. Some ideas are core pillars, others are just fleeting notes.

I’m releasing an architectural RFC to the open-source community: Applying Reinforcement Learning from Human Feedback (RLHF) to Personal Knowledge Graphs.

Instead of forcing users to manually rate notes (flashcards/spaced repetition), the database should passively learn from our UI interactions:

  • Rewards (+ weight): When you transclude a block ((uuid)) or zoom into it (Focus Mode), the database learns this is a foundational node.

  • Penalties (- weight): When a block is ignored in search results (scroll-past) or hasn't been touched in months (temporal decay), its semantic weight drops.
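To make the reward/penalty idea concrete, here is one possible weighting scheme, assuming exponential temporal decay with a fixed half-life and additive per-event rewards. Every event name, constant, and the score-times-weight blending in `rank` are hypothetical choices for illustration; they are not taken from the linked Gist's pseudo-code.

```python
from dataclasses import dataclass

# Illustrative constants; none of these values come from the RFC.
REWARDS = {"transclude": 0.5, "zoom": 0.2, "scroll_past": -0.1}
HALF_LIFE_DAYS = 90.0
MIN_WEIGHT = 0.05  # floor so an ignored note fades but never disappears


@dataclass
class Node:
    weight: float = 1.0   # the "flat" starting weight every block has today
    updated: float = 0.0  # day of the last weight update

    def current(self, now: float) -> float:
        """Weight after exponential temporal decay since the last update."""
        decayed = self.weight * 0.5 ** ((now - self.updated) / HALF_LIFE_DAYS)
        return max(decayed, MIN_WEIGHT)

    def feedback(self, event: str, now: float) -> None:
        """Apply decay up to `now`, then the event's reward (+) or penalty (-)."""
        self.weight = max(self.current(now) + REWARDS[event], MIN_WEIGHT)
        self.updated = now


def rank(hits: list, nodes: dict, now: float) -> list:
    """Re-rank (name, text_score) search hits by text score times learned weight."""
    return sorted(((score * nodes[name].current(now), name) for name, score in hits), reverse=True)
```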

I wrote a detailed Gist outlining the architecture, the pseudo-code for the dynamic weights, and how it alters global search ranking. I've released the concept under the Apache 2.0 license so anyone can experiment with it or build plugins.

You can take a look and contribute here:
https://gist.github.com/MarcoPorcellato/9e5226408c56048b16957771f9056e28

I'm building this into the core of Matryca Brain (next step from Matryca Plumber), but I’d love to hear the thoughts of the Logseq community. Is anyone else exploring dynamic node weighting based on implicit UI feedback? Let's discuss the architecture!

@SkyHomage

The comment section is unreadable...

I'm just wondering about his idea of sources. For articles/research this makes sense: just add the paper. But one of the example use cases was reading a book. So how would one add a book as a source? It's either copyright-protected or on the order of megabytes per book.

Adding books does seem a bit redundant, since surely the LLM was already trained on every printed book at the time. In my experience, however, getting an LLM to work with me on a certain chapter is a bit difficult without it pulling in future chapters (since it knows about them) or random ideas from other wikis/reddit. It would be nice to truly work with the LLM and discover knowledge in tandem when it comes to books.

@perdakovich

lol

@alexesDev

@geetansharora you can keep the MCP setup you've got. What changes is that the wiki itself becomes the shared thing, not a RAG index sitting next to it.

That's basically what we built trip2g for. You sync the markdown vault from Obsidian, and it's served two ways: as a normal site and over MCP. So everyone's agent (Claude, Cursor, Codex, whatever) points at one endpoint. It runs search to find the section, then expand to walk the note's TOC one level at a time, so it only pulls the part it needs instead of the whole note. You can also stitch a few separate bases together with federation behind the same endpoint.
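The search-then-expand read path can be approximated locally by treating a note's headings as its TOC. A minimal sketch, assuming ATX (`#`) headings and unique section titles; `expand` and `read_section` are hypothetical helpers and do not reproduce trip2g's MCP tool interface.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")  # ATX headings only
FENCE = "`" * 3


def _headings(md: str):
    """Return (lines, [(level, title, line_no), ...]), ignoring '#' lines inside code fences."""
    lines, heads, in_fence = md.splitlines(), [], False
    for i, line in enumerate(lines):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        elif not in_fence:
            m = HEADING.match(line)
            if m:
                heads.append((len(m.group(1)), m.group(2), i))
    return lines, heads


def _span(lines, heads, title):
    """(level, start, end) line range of the first section titled `title`."""
    for k, (level, name, start) in enumerate(heads):
        if name == title:
            end = next((h[2] for h in heads[k + 1:] if level >= h[0]), len(lines))
            return level, start, end
    raise KeyError(title)


def expand(md: str, parent=None) -> list:
    """Titles one TOC level below `parent` (None = top of the note); no body text is returned."""
    lines, heads = _headings(md)
    level, start, end = (0, 0, len(lines)) if parent is None else _span(lines, heads, parent)
    inner = [h for h in heads if end > h[2] >= start and h[0] > level]
    if not inner:
        return []
    child_level = min(h[0] for h in inner)
    return [h[1] for h in inner if h[0] == child_level]


def read_section(md: str, title: str) -> str:
    """Just the text of one section, up to the next heading at the same or a higher level."""
    lines, heads = _headings(md)
    _, start, end = _span(lines, heads, title)
    return "\n".join(lines[start:end]).strip()
```

An agent would call `expand(note)` to see the top level, `expand(note, "Setup")` to go one level deeper, and `read_section(note, "Install")` to pull only the text it needs.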

Easiest way to poke at it: the trip2g docs are themselves one of these wikis over MCP, no auth. Add https://trip2g.com/_system/mcp as an MCP server and let your agent search/expand around them.

Self-host writeup: https://trip2g.com/en/user/agent_memory
the general idea: https://trip2g.com/en/user/llm_wiki

@equationalapplications

equationalapplications commented Jun 22, 2026

Letting an AI silently maintain a markdown-based knowledge base is incredibly powerful. But as your graph grows, taxonomy drift becomes a nightmare. An unconstrained LLM will generate company, Company, Business, and Organization across different runs, making it impossible to build reliable app UI dashboards on top of your data.
To solve this, we are brainstorming a Per-Entity Seeded Ontology architecture for our @equationalapplications/core-llm-wiki memory engine.
As shown in the infographic, this pattern gives developers 3 configurable modes to control how the LLM extracts knowledge graph edges:
1️⃣ Strict (Seeded): Supply a starter pack of allowed edges (e.g., ['client', 'employed_by']) using a package like @equationalapplications/wiki-ontologies. The LLM is forced to stick to this schema, guaranteeing predictable data structures for hard-coded dashboards.
2️⃣ Emergent (Autogenerated): Give the LLM total freedom to invent relationships dynamically, tracking its own inventions in an ontology_manifest fact. Perfect for flexible "Second Brain" apps where the domain is unknown.
3️⃣ Off (Disabled): Stick to standard, isolated semantic fact extraction when relationship traversal is overkill.
This perfectly balances rigid app data requirements with the "minimally opinionated" philosophy of the Open Knowledge Format (OKF) v0.1.
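The three modes reduce to one decision per extracted edge. A minimal sketch, assuming the LLM emits free-text edge labels and that case and punctuation variants should collapse to one snake_case form; `Mode`, `canonical`, and `accept_edge` are hypothetical names and not the API of the TypeScript `core-llm-wiki` package.

```python
import re
from enum import Enum
from typing import Optional


class Mode(Enum):
    STRICT = "strict"      # only seeded edge types survive
    EMERGENT = "emergent"  # new edge types allowed, but recorded in a manifest
    OFF = "off"            # no relationship edges at all


def canonical(label: str) -> str:
    """Collapse surface variants: 'Employed By', 'employed-by' -> 'employed_by'."""
    return re.sub(r"[^a-z0-9]+", "_", label.strip().lower()).strip("_")


def accept_edge(label: str, mode: Mode, seeded: set, manifest: set) -> Optional[str]:
    """Return the canonical edge type to store, or None to drop the edge."""
    if mode is Mode.OFF:
        return None
    edge = canonical(label)
    if edge in seeded:
        return edge
    if mode is Mode.STRICT:
        return None  # unknown type: reject rather than let the taxonomy drift
    manifest.add(edge)  # EMERGENT: keep it, but track the invention
    return edge
```

Canonicalization only merges spelling variants such as `company` and `Company`; true synonyms such as Business and Organization still need a seeded list or an alias map.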

I would love to get feedback from other developers building agentic memory! Which mode would you use for your app?
🔗 Links & Resources:
Core LLM Wiki Engine: https://github.com/equationalapplications/expo-llm-wiki/tree/main/packages/core
Open Knowledge Format (OKF) Spec: https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md
Karpathy's LLM Wiki Gist: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Equational Applications LLC: https://equationalapplications.com/

@equationalapplications

An introduction to the Equational Applications LLC collection of LLM Wiki inspired, open-source Typescript packages.
https://youtu.be/ay9PJ3WXxoo?si=ukTWR9oUADUFJrSD

@equationalapplications

We are excited to announce that Open Knowledge Format (OKF) Import is officially coming to the @equationalapplications/core-llm-wiki memory engine!
OKF v0.1 is a vendor-neutral standard that represents knowledge as a directory of markdown files with YAML frontmatter, where the file path serves as the concept's identity. While our ecosystem already supports exporting episodic memory into compliant OKF bundles, this upcoming update introduces the parseOkfBundle adapter to seamlessly read foreign or modified OKF bundles back into your local SQLite database.
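Reading a single note of an OKF-style bundle (file path as identity, YAML frontmatter with a `type`) is small enough to sketch. This assumes flat `key: value` frontmatter only, since lists and nesting need a real YAML parser; `parse_note` is hypothetical and is not the `parseOkfBundle` adapter, whose internals aren't described here.

```python
from pathlib import PurePosixPath


def parse_note(rel_path: str, text: str) -> dict:
    """Split one note into identity (path), flat frontmatter, and body."""
    meta, body = {}, text
    if text.startswith("---\n"):
        end = text.find("\n---", 4)  # closing delimiter of the frontmatter block
        if end != -1:
            for line in text[4:end].splitlines():
                key, sep, value = line.partition(":")
                if sep and key.strip():
                    meta[key.strip()] = value.strip().strip("'\"")
            body = text[end + 4:].lstrip("\n")
    return {
        "id": str(PurePosixPath(rel_path).with_suffix("")),  # path is the concept's identity
        "type": meta.get("type", ""),  # conformance check: type should be non-empty
        "meta": meta,
        "body": body,
    }
```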
But we aren't just importing flat facts—this update lays the foundational database schema for our upcoming Per-Entity Seeded Ontology roadmap. We are introducing:
A new okf_type column to preserve the exact YAML frontmatter type strings.
A lightweight llm_wiki_edges table to extract and persist typed graph relationships directly from markdown cross-links.
Zero-dependency parsing primitives in our new @equationalapplications/core-okf package.
Soon, your AI agents will be able to ingest, maintain, and share rich, interconnected knowledge graphs with complete data portability!

@alexesDev

fwiw the read path is what decides this in practice, and it stays cheap without a typed graph. trip2g serves the vault over MCP as agent memory: search, expand the TOC one level, then read only the section you need. Links are plain [[wikilinks]] resolved lazily, so a dangling link is just a "write this later" marker and I've never had to seed an ontology to keep drift under control.

The reason I lean on focused retrieval is token cost. Reading the one section that holds the answer runs ~15× cheaper than dumping the whole note at the median, and ~23× cheaper than grep-and-read. Numbers here: token economy bench.

And it's actually running, not a thought experiment. On session end each agent writes its own status into a shared vault, and teammates query a federation hub to see who's working on what: agent status. Same memory model, just federated. Setup overview: agent memory.

@timetxt

timetxt commented Jun 23, 2026

This has held up really well for me — keeping the raw sources read-only and letting the model own the markdown made a real difference.

The thing that kept biting me: an agent would re-try a dead end an earlier session had already ruled out. It happened so many times that I ended up building a small open-source Python tool for it (Qiju) — it keeps a plain record of the decisions and the approaches I'd dropped, so the next run doesn't repeat them. Honestly it's been working great.
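A plain record of decisions and dropped approaches can be as simple as an append-only JSONL file the agent consults before retrying something. A minimal sketch of that idea, assuming one JSON object per line; the `Decision` fields and the `dead_end` helper are hypothetical and are not Qiju's format or API.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Decision:
    topic: str
    approach: str
    verdict: str  # "adopted" or "dropped"
    reason: str


def to_jsonl(d: Decision) -> str:
    """One line to append to the decisions file."""
    return json.dumps(asdict(d), ensure_ascii=False) + "\n"


def load(jsonl: str) -> list:
    return [Decision(**json.loads(line)) for line in jsonl.splitlines() if line.strip()]


def dead_end(history: list, topic: str, approach: str) -> Optional[Decision]:
    """Latest verdict on this approach if it was 'dropped'; the agent checks before retrying."""
    for d in reversed(history):
        if d.topic == topic and d.approach.lower() == approach.lower():
            return d if d.verdict == "dropped" else None
    return None
```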

Have you hit the agent-repeats-a-dead-end problem too? Curious whether you'd keep that history inside the wiki, or in a separate log next to it.

For anyone interested to try: QiJu

@sunshineg

Spot on. The shift from raw RAG to compounding LLM-curated synthesis is the next paradigm shift.

We just open-sourced https://projectbrain.md/ — an open, Git-native standard for this exact pattern, purpose-built for builders and agents.

To prove it works in the wild, we spent the last month dogfooding it to build https://mindmux.ai, a local-first desktop workbench for AI-native dev teams.

It structures project context as Markdown + YAML frontmatter + cross-referenced [[wiki-links]] inside a /brain directory. Downstream agents (Cursor, Claude Code, Codex) ingest curated context; humans review Brain Diffs alongside code PRs.

Your write-up is a brilliant crystallization. Seeing this convergence after a month of building on it is incredibly validating.

@MihirModi1421

I tried creating a codebase wiki based on the existing idea (llm-wiki). You can copy this gist into a CLAUDE.md file, then ask Claude to initialize the wiki and see how it behaves:

https://gist.github.com/MihirModi1421/94b5c2299bf743c346590e322d709046

Any advice or suggestions would be appreciated.

@Sistema2D

New Release! Release

@alfadur7

Built an implementation of this pattern and have been running it daily for a while — sharing in case the extensions are useful to anyone here.

LLM Wiki Newsroom: https://github.com/alfadur7/llm-wiki-newsroom (local-first, no API key; the Python tools run entirely on your machine)

Kept your three-layer split (raw → wiki → schema) and the ingest/query/lint loop intact, but leaned hard into the "bookkeeping is the hard part" idea by structuring the agent as a newsroom staff instead of one do-everything assistant:

  • Authoring and review are different instances. A "reporter" drafts source/entity stubs, a "columnist" writes the deep cross-source analysis, and a separate "desk" re-reads and critiques it — which curbs the self-grading bias you get when one model checks its own output.
  • Two-sided publish gate: deterministic linting (links, citations, structure) and a qualitative review against an editorial rubric (journalism/consulting/encyclopedic forms) both have to pass.
  • Self-improving guidelines: when the same review failure recurs, it drafts a fix to the authoring rules themselves and adopts it only after a blind, regression-gated A/B (inspired by the recent Self-Harness and Microsoft SkillOpt work).
  • Cascading updates (your "1 doc changes 10–15 pages"), 3-layer contradiction tracking, a Leiden-clustered knowledge graph with an interactive browser, and Memex-style associative trails for serendipitous discovery.
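The cascading-update step can be sketched as a reverse-link walk: start from the pages that cite the changed source, then follow backlinks for a bounded number of hops. This is a hypothetical sketch, not the newsroom project's code. The `cites` and `links` maps and the `max_hops` cutoff are assumptions.

```python
from collections import defaultdict, deque


def affected_pages(changed_source: str, cites: dict[str, set[str]],
                   links: dict[str, set[str]], max_hops: int = 1) -> dict[str, int]:
    """Return page -> hop distance for pages that may need updating (0 = cites the source)."""
    backlinks: dict[str, set[str]] = defaultdict(set)
    for page, targets in links.items():
        for target in targets:
            backlinks[target].add(page)
    direct = {page for page, sources in cites.items() if changed_source in sources}
    seen = {page: 0 for page in direct}
    queue = deque(direct)
    while queue:  # breadth-first, so each page gets its shortest hop distance
        page = queue.popleft()
        if seen[page] == max_hops:
            continue
        for parent in backlinks[page]:
            if parent not in seen:
                seen[parent] = seen[page] + 1
                queue.append(parent)
    return seen
```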

Works with Claude Code (full feature set), with a basic mode for Codex/Gemini. Feedback very welcome — the schema-as-config idea has held up remarkably well in practice.

@eula01

eula01 commented Jun 26, 2026

hilarious how hundreds of people with ai psychosis have pointed their slopcannons at this gist - "implement this gist, make no mistakes"

@Mirorrn

Mirorrn commented Jun 27, 2026

Hello Andrej and the community. I’m building a Markdown/Obsidian-style knowledge wiki for teaching a health informatics course, using Codex to help ingest sources, maintain links, and later query the wiki for presentations and curriculum design.

I ran into a token-efficiency problem that may be interesting: the token burn was not mainly from reasoning or output, but from repeated context loading. Codex kept re-reading files like AGENTS.md, wiki/index.md, manifests, handoffs, raw sources, and tool scripts across short sessions. Even small source-ingest tasks became expensive.

I’ve started moving toward a split workflow: deterministic Python scripts handle source intake, route indexes, validation, and compact handoffs; Codex is reserved for higher-value graph curation, synthesis, and judgment. I’m also using compact routing files instead of asking the model to scan the whole wiki.
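A compact routing file like the one described can be generated deterministically. A minimal sketch, assuming each page either has a frontmatter `description:` line or a first non-heading line that works as a summary; the field name and the 100-character cap are assumptions.

```python
def _summary(text: str, limit: int = 100) -> str:
    """Prefer a frontmatter `description:`; otherwise use the first body line that isn't a heading."""
    lines = text.splitlines()
    body_start = 0
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                body_start = i + 1
                break
            key, _, value = lines[i].partition(":")
            if key.strip() == "description" and value.strip():
                return value.strip()[:limit]
    for line in lines[body_start:]:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped[:limit]
    return "(no summary)"


def routing_index(pages: dict[str, str]) -> str:
    """One line per page, sorted by name: a wikilink plus a one-line summary."""
    return "".join(f"- [[{name}]]: {_summary(pages[name])}\n" for name in sorted(pages))
```

The agent reads this one small file to decide which pages to open, instead of scanning the wiki itself.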

I would be grateful for any advice on the right architecture for this kind of “wiki as working memory” setup. Specifically, how would you structure retrieval, summaries, graph edges, and agent instructions so Codex can reason over a rich knowledge base without repeatedly consuming the entire context?

Even a few pointers or design principles would be very helpful.

Hi,
did you try using subagents for search operations, so that the main agent's context window is reserved for orchestration and answering?

@LuminairPrime

Glad to see this new version of llms.txt formalized as Open Knowledge Format. Thank you for your leadership.

@A13x3i

A13x3i commented Jun 29, 2026

Geez, the AI slop is storming this one's comment section... I just wanted to point out that NotebookLM is kind of intended to work this way as well: you add sources, extract whatever you need, make that a source, and disable the sources you don't need anymore.

@devmubs

devmubs commented Jun 29, 2026

Really interesting approach. I think one challenge that will show up as these systems grow is keeping synthesized knowledge reliable over time. Retrieval is only part of the problem—making sure older summaries stay accurate without constantly rebuilding everything seems much harder.

Keeping the raw sources immutable feels like the right foundation. I also wonder if adding a simple confidence or verification status to wiki pages could help highlight which pages are well-supported and which ones may need another review after new sources are added.

@kibotu

kibotu commented Jun 29, 2026

It's also important to reduce the pages to the absolute minimum. Simplification is important in any larger code/knowledge base; otherwise you end up with way too much overhead.

@theluk

theluk commented Jun 29, 2026

@A13x3i yeah, crazy, isn't it? That's AEO.

Anyway, regarding NotebookLM: I mean sure, but there is more to it. The linter in particular is, I think, one real addition. By the way, I'm using this linter concept almost everywhere now, so thanks @karpathy for the idea.

LLMs make mistakes no matter what you do; that's why you need an agent that iterates over the content and verifies that things are working. Here is an example agent I have running on one wiki:

# Wiki Linter — System Prompt

You are a wiki health checker. When invoked, you run a structured lint pass
over a markdown wiki stored in a knowledge base and produce a report.

You have access to two tools primarily (plus `setFrontmatter` for the metadata repairs in check 1):
- `queryFrontmatter` — filter/sort pages by YAML frontmatter fields
- `readFile` — read individual page content

---

## Lint Checks (run in order)

### 1. Schema Integrity
Use `queryFrontmatter` to find pages missing any of the required fields:
`type`, `title`, `description`, `tags`, `timestamp`, `sources`

For any page missing a field:
- Flag it by name and note which field(s) are absent
- Repair metadata with `setFrontmatter` where the correct value is unambiguous
- Flag for user review where the value is uncertain

### 2. Staleness
Sort all pages by `timestamp` ascending. Surface the 5–10 oldest.
For each, check whether newer pages contradict or supersede their content.
Flag any that do. Propose specific updates but do not apply them unilaterally.

### 3. Coverage Gaps
Scan all `summary`, `entity`, and `concept` pages for mentions of things
(tools, people, projects, concepts) that lack their own dedicated page.
List each gap. Do not create pages — flag them for the ingestor or user.

### 4. Overview Drift
Compare the `timestamp` on `overview.md` against the newest
`summary`, `entity`, and `concept` pages.
If `overview.md` lags by more than one ingest cycle, flag it as drifted.

### 5. Orphan Check
For each page, check whether any other page links to it.
Flag any page with zero inbound links as an orphan.
Suggest which existing pages should link to it.

### 6. Duplicate Detection
Look for multiple files with the same or near-identical names or titles.
List all suspected duplicates with their file IDs.
Do NOT delete anything. Flag for user approval.

---

## Output Format

Produce a markdown report with this structure:

# Lint Report — {DATE}

## Summary
One-line overall health status: 🟢 Green / 🟡 Yellow / 🔴 Red

## 1. Schema Integrity
## 2. Staleness
## 3. Coverage Gaps
## 4. Overview Drift
## 5. Orphan Check
## 6. Duplicate Detection

## Overall Health
Table or bullet list of all checks with pass/fail/warn status.

## Next Steps
Numbered list of actions — note which require user approval before execution.

---

## Hard Rules

- **Never delete files unilaterally.** Flag duplicates and orphans; act only on explicit approval.
- **Never create or edit wiki content pages.** That is the ingestor's job.
- **Do** repair frontmatter metadata (`setFrontmatter`) when the correct value is certain.
- Log the lint pass to `log.md` when done.

And I don't do this just for the knowledge wiki; I also use it for SEO (interlinking checks, quality checks) and more. It's really one helpful agent that can be used on any data structure whose parts are related to each other.
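The structural checks in that prompt (schema integrity, orphans, duplicates) are deterministic, so one option is to run them as plain code and leave only the judgment calls (staleness, coverage gaps, overview drift) to the agent. A sketch under assumptions: `pages` maps a page name to already-parsed frontmatter and outgoing links, and `roots` exempts entry pages from the orphan check. It is not the commenter's tooling.

```python
import re
from collections import defaultdict

REQUIRED = ("type", "title", "description", "tags", "timestamp", "sources")


def _norm(title: str) -> str:
    # Collapse case and punctuation so "Raft" and "raft!" count as the same title.
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def lint(pages: dict[str, dict], roots: tuple[str, ...] = ("overview", "index")) -> dict[str, list]:
    """pages: name -> {"meta": frontmatter dict, "links": set of linked page names}."""
    report: dict[str, list] = {"schema": [], "orphans": [], "duplicates": []}
    inbound: dict[str, int] = defaultdict(int)
    by_title: dict[str, list[str]] = defaultdict(list)
    for name, page in pages.items():
        missing = [f for f in REQUIRED if f not in page["meta"]]
        if missing:
            report["schema"].append((name, missing))
        for target in page["links"]:
            if target != name:  # a self-link doesn't rescue an orphan
                inbound[target] += 1
        by_title[_norm(page["meta"].get("title", name))].append(name)
    # Entry points such as the overview may legitimately have no inbound links.
    report["orphans"] = sorted(n for n in pages if inbound[n] == 0 and n not in roots)
    report["duplicates"] = sorted(sorted(names) for names in by_title.values() if len(names) > 1)
    return report
```

Like the prompt's hard rules, this only reports; deleting or merging still waits for explicit approval.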

@william-Johnason

@kibotu agreed, simplification is the harder problem. One thing that helps is building compression into the ingestion step itself. In Synthadoc, raw sources get compiled into concept pages, so 100 ingested documents might synthesize down to a dozen wiki pages: topics that appear across multiple sources merge rather than accumulate. The active knowledge surface stays small by design, not by discipline.
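A sketch of merge-on-ingest, assuming an earlier LLM step has already reduced each source to (source, concept, claim) triples. This is not Synthadoc's pipeline; `concept_key` is a crude string normalisation standing in for real entity resolution.

```python
import re
from collections import defaultdict


def concept_key(name: str) -> str:
    """Hypothetical normalisation: lower-case and collapse punctuation/whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def compile_pages(extractions: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group (source_id, concept, claim) triples into one page per concept.

    Each claim keeps its source id so the merged page can still cite where it came from.
    """
    pages: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for source_id, concept, claim in extractions:
        key = concept_key(concept)
        entry = (claim, source_id)
        if entry not in pages[key]:  # drop exact repeats
            pages[key].append(entry)
    return dict(pages)
```

The page count grows with the number of distinct concepts rather than the number of sources.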

@william-Johnason

@devmubs the confidence/verification state idea is exactly where we landed too: each page carries a lifecycle status, so when new sources arrive, the system knows which pages to re-examine rather than rebuilding everything.
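A minimal sketch of such a lifecycle status, assuming pages and incoming sources are both tagged with topics and that a tag overlap is what triggers re-examination. The status names and the trigger rule are hypothetical, not taken from either commenter's system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    VERIFIED = "verified"
    NEEDS_REVIEW = "needs_review"


@dataclass
class Page:
    name: str
    tags: set[str]
    status: Status = Status.DRAFT
    review_reasons: list[str] = field(default_factory=list)


def on_new_source(pages: list[Page], source_id: str, source_tags: set[str]) -> list[str]:
    """Flag only pages whose tags overlap the new source; everything else is left alone."""
    flagged = []
    for page in pages:
        if page.tags & source_tags:
            page.review_reasons.append(source_id)
            if page.status is Status.VERIFIED:
                # A verified page loses that status until someone re-checks it.
                page.status = Status.NEEDS_REVIEW
            flagged.append(page.name)
    return flagged
```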

@dumanyu666-byte

Good, thanks.

@Onevirtual

Thanks for the pattern, Mr. Karpathy. I've added a few things that were worth my time.

For the metadata, I'm using breadcrumbs, which let me roughly replicate standard .owl relationships from inference engines. They're useful when you are trying to associate pages, oppose pages, label pages, or bring in whatever other relationships you want in the classification.
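A sketch of turning typed links in frontmatter into subject/predicate/object triples, then adding inverses in the spirit of OWL's inverse and symmetric properties. The relation field names (`up`, `down`, `same`, `opposes`) and the inverse table are hypothetical, not any particular breadcrumbs tool's configuration, and this does a single inference step, nothing like a full reasoner.

```python
import re

LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

# Hypothetical relation fields and their inverses; "same" and "opposes" are symmetric.
INVERSE = {"up": "down", "down": "up", "same": "same", "opposes": "opposes"}


def triples(page: str, frontmatter: dict[str, str]) -> list[tuple[str, str, str]]:
    """Turn relation fields like `opposes: [[A]], [[B]]` into (subject, predicate, object) triples."""
    out = []
    for key, value in frontmatter.items():
        if key in INVERSE:  # ignore ordinary metadata such as title or tags
            out.extend((page, key, target.strip()) for target in LINK.findall(value))
    return out


def with_inverses(facts: list[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add the inverse of each known relation, so links can be traversed from either end."""
    closed = set(facts)
    for s, p, o in facts:
        if p in INVERSE:
            closed.add((o, INVERSE[p], s))
    return closed
```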

For the document structure:
I use a blended notepad source for the LLM to write in; that's where the main questions from the dialogical interaction take place.

Then I have a personal-notes section where I put my own thoughts. It helps me sharpen my critical thinking and build on what the LLM generates from my own curated sources. Whenever I use a web-fetch tool, I also make clear which source the generated content came from.

The skill for this works as a prompt crawler: I ask questions inside the personal-notes component of the page area using ^[put prompt here]. When I'm populating the wiki with my thoughts, I then add a footnote next to it, ^[put prompt here][^1]. That way I can reread my own thoughts and contrast them with the LLM's response, which is interesting to me because I can always tell what was generated versus what I wrote.
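The prompt-crawler convention can be sketched with one regular expression: find every ^[...] marker and treat it as answered when a footnote reference such as [^1] follows it immediately. A sketch under that assumption; the commenter's actual skill may work differently.

```python
import re
from typing import Optional

# ^[prompt] optionally followed directly by a footnote reference such as [^1].
PROMPT = re.compile(r"\^\[([^\]]+)\](?:\[\^([^\]]+)\])?")


def crawl_prompts(markdown: str) -> list[tuple[str, Optional[str]]]:
    """Return (prompt, footnote label or None) for every ^[...] marker, in order."""
    return [(m.group(1).strip(), m.group(2)) for m in PROMPT.finditer(markdown)]


def pending(markdown: str) -> list[str]:
    """Prompts with no footnote attached yet, i.e. still waiting for an answer."""
    return [prompt for prompt, label in crawl_prompts(markdown) if label is None]
```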

Lastly, I have a "go further" section: another prompt that analyses the "idées-forces" (key ideas) across my notes and the LLM-generated ones, which gives me direction. It was especially insightful while I was reading and commenting on Mumford's Art and Technics lectures and an article by Quine.

What I found the most interesting in the pattern is two things :

  • Focusing on the graph and the correct linking of the ideas, I don't trust the LLMs enough to build alone a categorization that is worth and I like to reason on the semantics of clusterization with it.
  • Secondly, it gives me a very good feeling of abstraction ascending. When I am creating an instance, I generally tweak the concept a few times before it becomes a category. The moment where a mono page concept becomes a category of its own is very satisfying and aligning with how inference for human works; derivating rules and organisation through repeated exposures to observations.

Thank you for that. I have rediscovered a lot of books on my bookshelves, from Homer's Odyssey to philosophy of mind, through agentic patterns.

@alfadur7

alfadur7 commented Jun 30, 2026

@Motya-cobol your split — deterministic scripts for intake/validation, model only for judgment — is exactly the right instinct; it's the same division I landed on.

The piece that helped most on top of it: never let the agent read page bodies just to find what's relevant.
Give it a small CLI to check the "map" first:

  • shortest path between two pages
  • a page's neighbors and cluster
  • a local BM25 / vector search

Let it pick ~10 pages from that, and only then read those in full. The index stays a one-line-per-entry directory (a link + a one-line summary per page), with sources split into per-cluster sub-catalogs — so it's a routing file, not something to scan.

On @Mirorrn's subagent point: +1 to keeping the narrowing off the orchestrator's context. I do it with a deterministic CLI call rather than a separate agent, but same goal — and since it's just Python, it drives fine from Codex too.

Search/traversal CLI if useful → https://github.com/alfadur7/llm-wiki-newsroom/blob/main/tools/query.py
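The "check the map first" tools can be sketched over a plain link graph. A minimal BFS shortest path and neighbour lookup, assuming `links` maps each page to the pages it links to and treating links as undirected. This is not the newsroom repo's query.py; clustering (e.g. Leiden) and BM25 search are left out.

```python
from collections import deque
from typing import Optional


def undirected(links: dict[str, set[str]]) -> dict[str, set[str]]:
    """Treat every wikilink as a two-way edge for navigation purposes."""
    graph: dict[str, set[str]] = {p: set() for p in links}
    for src, targets in links.items():
        for t in targets:
            graph.setdefault(src, set()).add(t)
            graph.setdefault(t, set()).add(src)
    return graph


def shortest_path(links: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search; returns the page sequence from start to goal, or None."""
    graph = undirected(links)
    if start not in graph or goal not in graph:
        return None
    prev: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(graph[node]):  # sorted for deterministic tie-breaking
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def neighbors(links: dict[str, set[str]], page: str) -> list[str]:
    return sorted(undirected(links).get(page, set()))
```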

@distorx

distorx commented Jun 30, 2026

This maps almost 1:1 to a system we have been running in production for ~6 months to manage infrastructure/ops knowledge. A few notes from actually living with the pattern at ~4000+ interlinked concepts:

  • The schema file is everything. Our CLAUDE.md is exactly the "disciplined maintainer vs. generic chatbot" config you describe — it encodes the ingest/query/lint workflows + naming conventions, and it co-evolved into the single most important file in the repo.
  • index.md at scale: the flat index works great to a few hundred pages. Past that we added hybrid search (SQLite FTS5 + on-device embeddings, reciprocal-rank-fused) rather than standing up embedding-RAG infra — same spirit as qmd. We expose it as both a CLI (agent shells out) and an MCP server (native tool). ~1ms keyword, ~350ms hybrid.
  • New page vs. edit (@alinawab): heuristic that works for us — new page when it is a distinct entity/concept you would link to from elsewhere; edit in place when it is an attribute/update of an existing one. The agent gets this right ~90% of the time once the schema enumerates the page types.
  • Team sharing (@geetansharora): the wiki is just a private git repo, auto-synced. Teammates browse in Obsidian or hit the same MCP server. Git history doubles as the log.md audit trail for free.
  • Biggest failure mode (@alinawab): drift — the agent under-updating cross-references on ingest, so pages silently go stale. The lint pass is not optional; we run it on a timer (orphan detection + contradiction flagging + stale-claim checks) and that is what keeps the graph healthy.

The "compounding artifact" framing is exactly right — after a few thousand concepts the wiki answers questions the raw sources never could, because the synthesis already happened. Thanks for writing it up so cleanly.
