@karpathy
Created April 4, 2026 16:25
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file; it is designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea, but your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

  • Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
  • Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
  • Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
  • Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
  • Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

Architecture

There are three layers:

Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.
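
As a rough illustration of the three layers, the sketch below scaffolds them on disk. The directory names `raw/` and `wiki/`, the placement of `index.md` and `log.md` inside `wiki/`, and the schema template text are all assumptions made for this example; the gist leaves the layout up to you and your agent.

```python
from pathlib import Path

# Hypothetical starter schema; in practice you and the LLM co-evolve this file.
SCHEMA_TEMPLATE = """\
# Wiki schema (hypothetical starting point)

- raw/ holds immutable sources: read them, never edit them.
- wiki/ holds LLM-maintained pages, including index.md and log.md.
- On ingest: write a summary page, update index.md, append to log.md.
"""


def scaffold(root: Path, schema_name: str = "AGENTS.md") -> None:
    """Create the raw/, wiki/, and schema layers without overwriting anything."""
    (root / "raw").mkdir(parents=True, exist_ok=True)
    (root / "wiki").mkdir(parents=True, exist_ok=True)
    for name, text in [("wiki/index.md", "# Index\n"), ("wiki/log.md", "# Log\n")]:
        path = root / name
        if not path.exists():
            path.write_text(text, encoding="utf-8")
    schema = root / schema_name
    if not schema.exists():  # never clobber a schema that has already evolved
        schema.write_text(SCHEMA_TEMPLATE, encoding="utf-8")
```

Calling `scaffold(Path("my-kb"), schema_name="CLAUDE.md")` would produce the same layout with the schema file named for Claude Code.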

Operations

Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.

Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.
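
Two of the checks above (orphan pages and broken cross-references) are mechanical enough to script. The following is a minimal sketch, assuming pages link to each other with Obsidian-style `[[wikilinks]]` resolved by file name and that `index.md` and `log.md` should not count as inbound links; contradictions and stale claims still need the LLM itself.

```python
import re
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Assumed special files: the index links to everything, so it is ignored
# when counting inbound links.
SPECIAL = {"index", "log"}


def lint_links(wiki_dir: Path) -> dict[str, list[str]]:
    """Report pages with no inbound links and links that point nowhere."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in wiki_dir.rglob("*.md")}
    inbound = {name: 0 for name in pages}
    broken = []
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target not in pages:
                broken.append(f"{name} -> {target}")
            elif name not in SPECIAL and target != name:
                inbound[target] += 1
    orphans = sorted(n for n, c in inbound.items() if c == 0 and n not in SPECIAL)
    return {"orphans": orphans, "broken": sorted(broken)}
```

The report can be handed to the LLM as the starting point of a lint pass, so it spends its effort on the judgment calls rather than on link counting.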

Indexing and logging

Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.
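
To make the catalog format concrete, here is a small sketch that rebuilds an index from the pages on disk. It assumes (purely for illustration) that each category is a subdirectory of the wiki and that a page's first non-heading line serves as its one-line summary; in the workflow described above, the LLM writes and updates these entries itself.

```python
from pathlib import Path


def first_summary_line(text: str) -> str:
    """Return the first non-empty line that is not a markdown heading."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return "(no summary)"


def build_index(wiki_dir: Path) -> str:
    """Render a category-grouped catalog: one link plus summary per page."""
    by_category: dict[str, list[str]] = {}
    for page in sorted(wiki_dir.rglob("*.md")):
        rel = page.relative_to(wiki_dir)
        if len(rel.parts) < 2:
            continue  # skip top-level files such as index.md and log.md
        summary = first_summary_line(page.read_text(encoding="utf-8"))
        entry = f"- [{page.stem}]({rel.as_posix()}): {summary}"
        by_category.setdefault(rel.parts[0], []).append(entry)
    lines = ["# Index", ""]
    for category in sorted(by_category):
        lines += [f"## {category}", *by_category[category], ""]
    return "\n".join(lines)
```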

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
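
The same prefix convention is easy to parse programmatically. The sketch below reads and writes entries in the `## [YYYY-MM-DD] kind | title` shape from the example above; the `LogEntry` type and the function names are made up for this illustration.

```python
import re
from datetime import date
from typing import NamedTuple

# Matches headings such as: ## [2026-04-02] ingest | Article Title
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


class LogEntry(NamedTuple):
    day: date
    kind: str   # e.g. "ingest", "query", "lint"
    title: str


def parse_log(text: str) -> list[LogEntry]:
    """Collect every entry heading in log order, ignoring other lines."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append(LogEntry(date.fromisoformat(m[1]), m[2], m[3].strip()))
    return entries


def format_entry(day: date, kind: str, title: str) -> str:
    """Produce a heading line in the same parseable shape."""
    return f"## [{day.isoformat()}] {kind} | {title}"
```

Taking `parse_log(text)[-5:]` gives the same last five entries as the `grep | tail` pipeline, but as structured records that can be filtered by kind or date.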

Optional: CLI tools

At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.
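
For a sense of what a naive search script might look like, here is a minimal sketch that ranks pages by how often the query terms occur. This is plain term counting, not BM25 or vector search, and it is not how qmd works; the function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(wiki_dir: Path, query: str, limit: int = 5) -> list[tuple[int, str]]:
    """Return (score, relative path) pairs, best match first."""
    terms = set(tokenize(query))
    results = []
    for page in wiki_dir.rglob("*.md"):
        counts = Counter(tokenize(page.read_text(encoding="utf-8")))
        score = sum(counts[t] for t in terms)
        if score:
            results.append((score, page.relative_to(wiki_dir).as_posix()))
    # Highest score first; ties broken alphabetically for stable output.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:limit]
```

Because it rereads every file on each query, this only makes sense at small scale, which is roughly the point at which the index file alone would also still suffice.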

Tips and tricks

  • Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
  • Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful — it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass — the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
  • Obsidian's graph view is the best way to see the shape of your wiki — what's connected to what, which pages are hubs, which are orphans.
  • Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
  • Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
  • The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.
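
To illustrate the frontmatter tip, the sketch below splits a page into its metadata and body. It handles only flat `key: value` lines between `---` delimiters, which is an assumption made to keep the example dependency-free; nested values or lists would need a real YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body); metadata is empty if no frontmatter is found."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":  # closing delimiter
            meta = {}
            for line in lines[1:i]:
                key, sep, value = line.partition(":")
                if sep and key.strip():
                    meta[key.strip()] = value.strip()
            return meta, "\n".join(lines[i + 1:])
    return {}, text  # opening delimiter without a close: treat as plain body
```

All values come back as strings, so a field like a source count would need converting before any numeric comparison.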

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.

@VeniVeci

VeniVeci commented May 3, 2026

This idea is fantastic! Inspired by it, I recently built a VLM Wiki and added multimodal capabilities—it doesn't just process text, it also understands photos and videos, automatically extracting scenes, locations, people, and generating a wiki with bidirectional links.

Project here: VLM-wiki

Key additions:

  1. Expanded the raw/ directory to support images, videos, and audio;
  2. Added a VLM-based image analysis pipeline;
  3. Processed videos by extracting keyframes for understanding and generating summaries;

Currently, I've tested it with a trip's media and it works quite well!

@superimpactful

This is awesome and great that it blew up.

I think two things are missing to make it scale: routing tables and non-hierarchical taxonomies.

The first problem is the words. Your index.md is AI-organized, which means it's categorized the way the AI sees it, not necessarily how the user would look for things. If I've always called a spatula "the flat flippy thing," the best AI-organized index doesn't help me. Language is shared but also personal.

The second problem is scale. Once the index grows large enough, you've traded one context dump for a slightly smaller one. Routing tables fix that. Instead of searching 1,000 lines, the AI follows 2 or 3 hops before it ever touches the content. The search surface stays manageable as the dataset grows.

The organizing logic should come from the user. User designs the structure, the AI maintains it. That's the part that makes the whole thing actually work.
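
One possible reading of the routing idea, sketched under my own assumptions rather than taken from the linked breakdown: a nested table keyed by the user's own vocabulary, where each hop narrows the search and only the final hop names a page. The table contents and page paths here are invented for illustration.

```python
from typing import Optional

# Hypothetical user-designed routing table: inner dicts are further hops,
# strings are page paths. Keys use the user's words, not the AI's.
ROUTES = {
    "kitchen": {
        "flat flippy thing": "wiki/kitchen/spatula.md",
        "knives": "wiki/kitchen/knives.md",
    },
    "garden": {
        "tools": "wiki/garden/tools.md",
    },
}


def resolve(routes: dict, hops: list[str]) -> Optional[str]:
    """Follow the hops through the table; return a page path or None."""
    node = routes
    for hop in hops:
        if not isinstance(node, dict) or hop not in node:
            return None
        node = node[hop]
    return node if isinstance(node, str) else None
```

With this table, `resolve(ROUTES, ["kitchen", "flat flippy thing"])` reaches the spatula page in two hops without scanning a flat index.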

I did a more thorough breakdown here: https://jgibbard.com/karpathy-llm-wiki-missing-pieces/

@killerra

killerra commented May 3, 2026

Who SwarmVault is actually for (and how people are using it). Different angle this time.

The LLM Wiki concept from this gist resonates with a lot of different workflows. Here's who's getting the most out of SwarmVault:

Developers with large codebases. You have a monorepo or multi-repo setup. Your AI agent can't hold the whole thing in context. SwarmVault compiles it into a knowledge graph with cross-file call edges, import resolution, and module-level pages. Your agent queries the graph instead of grepping through raw source. swarmvault scan . and you're running in under a minute.

Researchers and students. You're reading 50 papers, watching conference talks, collecting notes across tools. SwarmVault ingests PDFs, transcripts, EPUBs, YouTube videos, and audio recordings into one searchable wiki with contradiction detection across sources. The graph shows you connections you didn't notice. Guided ingest sessions help you process sources one at a time with evolving summaries.

People building second brains. If you use Obsidian, SwarmVault has a native plugin and exports with Dataview dashboards, typed links for Breadcrumbs/Juggl, and graph metrics in frontmatter. The wiki it produces is plain markdown. It works with whatever note-taking setup you already have.

Teams using AI coding agents. swarmvault install --agent covers 48 tools (Claude Code, Cursor, Copilot, Cline, Kiro, Codex, and 42 more). The agent memory ledger means context carries across sessions. Context packs give your agent bounded, relevant evidence instead of a full context window dump.

Anyone tired of re-explaining things to AI. That's really what it comes down to. You build a vault once and it remembers. The wiki compounds. The graph gets richer. Your agent gets smarter without you repeating yourself.

Everything local. Every provider supported. Fully offline capable. MIT licensed.

npx @swarmvaultai/cli demo

Repo: https://github.com/swarmclawai/swarmvault

@waydelyle Stop spamming this. This specific style of AI-generated ad—especially one asking users to pipe a command to their terminal—looks incredibly suspicious. In a climate of constant supply chain attacks, you are making your project look like malware. If SwarmVault is a legitimate MIT-licensed tool, let the code speak for itself instead of using bot-like marketing tactics.

@zhurudong

Tried building a minimal instantiation of this pattern — just two layers (raw/ immutable + wiki/ LLM-compiled), one CLAUDE.md file as the entire "program", no vector DB or ingest pipeline. The same file works across Claude Code / Codex / OpenCode by symlinking to AGENTS.md, so switching tools requires zero migration.

Sharing in case it's useful as a starting point: https://github.com/zhurudong/andrej-karpathy-llm-wiki

Genuinely curious if the two-layer schema (raw vs wiki) holds up, or if I'm missing a category that should live elsewhere. Open to feedback.

@waydelyle

What a SwarmVault workflow actually looks like, start to finish. No feature lists this time, just the actual steps.

Let's say you have a codebase, a handful of research papers, and some meeting transcripts you want your AI agent to actually understand.

Step 1: Init a vault.

npx @swarmvaultai/cli init

Takes 10 seconds. Creates raw/, wiki/, and a schema file. No API key needed.

Step 2: Point it at your sources.

swarmvault source add ./my-project --repo
swarmvault source add ./papers/
swarmvault source add https://youtube.com/watch?v=...
swarmvault ingest meeting-recording.mp3

Code gets parser-backed AST analysis. PDFs, transcripts, YouTube, and audio get extracted and structured. 50+ formats supported.

Step 3: Compile.

swarmvault compile

This builds the wiki pages, the knowledge graph, the search index, contradiction detection across sources, and a share card you can post anywhere. The output is plain markdown files in wiki/.

Step 4: Your agent uses it.

swarmvault context build "refactor the auth module" --budget 8000
swarmvault query "what do the research papers say about X"
swarmvault graph path auth-module payment-service

Your agent gets bounded, relevant context instead of reading the entire source tree. The task ledger remembers what it was working on across sessions.

Step 5: It compounds.

swarmvault source reload --all
swarmvault compile

New sources get added, existing ones get refreshed, the wiki grows, the graph gets richer. swarmvault watch does this automatically on git commits.

That's it. Everything stays on your machine. Works with any LLM provider or fully offline. The vault is just files you own.

swarmvault doctor tells you if anything needs attention. swarmvault graph serve opens the visual workbench. swarmvault install --agent wires it into your coding tool.

Repo: https://github.com/swarmclawai/swarmvault

@paulmchen

Synthadoc v0.3.0 is now released.

👉 https://github.com/axoviq-ai/synthadoc

v0.3.0 expands on the same architecture. The big additions this cycle are around what can go into the wiki, and removing the friction of needing a separate API key:

  • Zero-API-key LLM providers: if you already pay for Claude Code or Opencode, one config line (provider = "claude-code") routes all three agents (ingest, query, lint) through your existing subscription. No separate API keys or Anthropic/OpenAI account needed.

  • YouTube transcript ingest: paste a YouTube URL, get a structured wiki page with an LLM-generated executive summary and a full [MM:SS] timestamped transcript. Captions are extracted from YouTube's caption system, no audio download, no external transcription API. Every claim is traceable to a moment in the video.

  • Web search fan-out: one search query decomposes into sub-questions, ingests multiple sources in parallel, and builds cross-references automatically. A single command can add 8–15 synthesised pages to the wiki.

  • CJK multilingual support: Chinese, Japanese, and Korean queries, wiki pages, slugs, and wikilinks now work correctly throughout the pipeline.

  • Knowledge gap detection no longer produces false reports on CJK input.

  • Also new: DeepSeek as the eighth LLM provider (lowest cost per token for text-heavy ingest); "synthadoc use" to save your active wiki across sessions; and hardened multi-aspect knowledge gap detection.

Release notes:
👉 https://github.com/axoviq-ai/synthadoc/releases/tag/v0.3.0

Docs:
👉 [Quick orientation and feature overview] https://github.com/axoviq-ai/synthadoc#readme
👉 [Up and running in minutes] https://github.com/axoviq-ai/synthadoc/blob/main/docs/user-quick-start-guide.md
👉 [Full architecture, agents, storage, API, and plugin guide] https://github.com/axoviq-ai/synthadoc/blob/main/docs/design.md

Feedback on v0.3.0 is very welcome.

@theafh

theafh commented May 4, 2026

Adding to the implementations already in this thread. Mine is a per-repo wiki as a skill/agent combo: the knowledge base lives in the project repo, committed alongside the code it describes, with one knowledge base per codebase.

A few things that fell out of that choice:

  • Plain markdown in a folder. The schema lives in the skill prompt, so any agent that reads markdown can author and query the wiki.
  • Captures the how. Runbooks, decision rationales, recurring fix patterns, procedural knowledge, alongside sources and concepts. That made it the thing I actually open every day.
  • Deterministic linter. A Python linter plus shell scripts check frontmatter schemas, page type anatomy, link integrity, tag taxonomy, page size, and topic mixing. Rules live in lint_checks.md. The agent consumes the linter's output, so structural correctness is enforced by rules.
  • Cleanup agent on top. wiki_auto_shaper runs the linter in a loop, fixes frontmatter, splits oversized or topic-mixing pages, repairs broken links, and audits each page against its page type anatomy. Curation is the real work, and the agent owns it.
  • Ships as a skill (Claude Code, Codex, Cursor, Gemini CLI, Antigravity).
  • Code: https://github.com/theafh/ai-modules/tree/main/plugins/knowledge_management

@GuillaumeDesforges

Claude Code is doing just fine with a basic prompt

Help me set up a LLM wiki that fits my needs and workflows
https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

@gowtham0992

Link v1.0.6 + v1.0.7 are live

Released link-mcp v1.0.7 on PyPI and the MCP Registry.

These releases were focused on making Link feel solid for first-time users and safer for public installs.

What changed:

  • Added link.py demo for a pre-ingested first-run wiki.
  • Added link.py doctor, doctor --fix, ingest-status, verify-mcp, and rebuild-backlinks.
  • Added golden demo snapshot tests and direct MCP tool contract tests.
  • Added CI trust gates for tests, release hygiene, version consistency, package build, and demo health.
  • Hardened secret/file hygiene checks before public release.
  • Improved Homebrew/macOS installer behavior with a dedicated ~/.link-mcp-venv fallback.
  • Fixed Codex MCP auto-registration for existing ~/.codex/config.toml.
  • Made verify-mcp validate the same Python that MCP clients actually use.
  • Polished the graph view with reset, labels, motion controls, cursor-centered zoom, and safer node drag/click behavior.
  • Restructured the README.
  • Fixed dashboard polish and search keyboard submission.

Try Link:

git clone https://github.com/gowtham0992/link.git
cd link
python3 link.py demo
cd link-demo
python3 serve.py

Open:

http://localhost:3000
http://localhost:3000/graph

Links:

GitHub: https://github.com/gowtham0992/link

PyPI: https://pypi.org/project/link-mcp/

MCP Registry: https://registry.modelcontextprotocol.io/?q=io.github.gowtham0992%2Flink

@skyllwt

skyllwt commented May 5, 2026

ΩmegaWiki(480+⭐) is actively maintained and shipping fast:
• 23 Claude Code skills covering the full research lifecycle
• 9 typed entities · 9 typed edges
• Bilingual (English + Chinese)
• New skills landing every week

Come try it, give feedback, help us shape it 👇

Try ΩmegaWiki in Claude Code and run the full LLM-Wiki loop you proposed — ingest papers, build a typed knowledge graph, generate ideas, draft papers, respond to reviewers.

End to end. One wiki. No chunks.

Come and Try! If you find ΩmegaWiki interesting, a ⭐ would encourage and motivate us a lot 😀
https://github.com/skyllwt/OmegaWiki

@AgriciDaniel

the schema layer here resonates strongly. the gist says the schema is "the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot." shipped a small repo that codifies a six-axiom kernel for exactly that role.

https://github.com/AgriciDaniel/best-practices

three layers, like the gist:
· the stance: context over text, evidence over vibes, no agreement theater
· the agent kernel: one chair, bounded slices, explorer / worker / verifier roles
· the engineering kernel: read · name · small · delete · evidence · failure

read first. write second. verify third.

AGENTS.md in the repo is a portable drop-in ready to be the schema document for an LLM Wiki, codex-flavored or claude-flavored. fork, rename to CLAUDE.md, drop in. composes with obra/superpowers for the cases that need iron-law enforcement.

might save a wiki author a few hours of co-evolution. or not. either way thanks for the gist, the architecture lines up cleanly.

MIT.

@waydelyle

SwarmVault v3.7 — graph merging, source trees, video ingest, and Svelte support. Still shipping weekly from this gist's original idea.

Quick hits from the last few releases:

  • Video ingest: swarmvault ingest --video <url> or just drop a local video file. Audio gets extracted with ffmpeg, transcribed through your configured provider (cloud or local Whisper), and the transcript flows into the normal wiki/graph/search pipeline. YouTube, Loom, conference talks, whatever has audio.
  • Graph merge: swarmvault graph merge <graph1> <graph2> --out combined.json combines SwarmVault graphs with external NetworkX/node-link JSON graphs into one namespaced artifact. Useful when you want to unify your code knowledge graph with an external dependency graph or research citation network.
  • Graph tree: swarmvault graph tree renders a collapsible HTML source/module/symbol tree. One file, works offline, gives you the structural overview without needing the full canvas.
  • .swarmvaultignore: like .gitignore but for ingest. Control exactly what gets indexed without polluting your gitignore.
  • SQL parser: .sql files now get full parser-backed analysis with table/view symbols and reads, writes, joins, and references edges in the graph.
  • Svelte SFCs: single-file components with nested TypeScript/JavaScript script parsing, plus detection for Julia, Verilog/SystemVerilog, and R.
  • GitHub repo sources: source add and scan now support --branch, --ref, and --checkout-dir. Point scan at a public GitHub URL and it just works.
  • Graph refresh shrink guard: updates that would drop >25% of nodes/edges abort automatically unless you pass --force. Protects against accidentally nuking your graph.
  • swarmvault graph cluster: recompute communities, metrics, and god-node flags from an existing graph without re-ingesting.

We're at 80+ releases now. The project started as a weekend hack inspired by this gist and it's turned into a full knowledge infrastructure tool.

npx @swarmvaultai/cli demo

Repo: https://github.com/swarmclawai/swarmvault

@Yarmoluk

Yarmoluk commented May 5, 2026

This is a nice personal workflow, but the hype is way ahead of the evidence.

There is no benchmark, no task definition, no scale curve, and no comparison against serious baselines. We do not know whether this is better than hybrid RAG, BM25 plus reranking, vector search, GraphRAG, hierarchical summaries, long-context prompting, NotebookLM, Perplexity Spaces, or ChatGPT Projects. Calling it a new architecture without that evidence is premature.

The core problem is that an LLM Wiki is lossy compression. You take raw documents and rewrite them into derived wiki pages. That may be useful for a small curated corpus, but it can also drop caveats, dates, minority views, exact wording, edge cases, and source context. Once people start querying the wiki instead of the original material, summary errors become part of the knowledge base.

Updates are also not solved. Adding one new source can affect many entity pages, concept pages, timelines, summaries, and indexes. At scale, this becomes graph maintenance: detecting what changed, resolving conflicts, avoiding duplicates, preserving provenance, preventing stale claims, and not silently breaking old pages. “Ask the LLM to maintain it” is not an engineering solution unless there are validators, source hashes, span-level citations, regression tests, and human review.
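One of the checks named above, source hashes, can be sketched in a few lines. This is a minimal illustration and not taken from any project in this thread; the function names and the manifest layout (each wiki page records the hash of every source it was compiled from) are assumptions.

```python
import hashlib


def sha256_text(text: str) -> str:
    """Hash source text so later edits to the source can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_pages(
    pages: dict[str, dict[str, str]], sources: dict[str, str]
) -> dict[str, list[str]]:
    """Report pages whose recorded source hashes no longer match.

    pages:   page name -> {source name: hash recorded at compile time}
    sources: source name -> current source text
    Returns: page name -> sorted list of changed or missing sources
    """
    report = {}
    for page, recorded in pages.items():
        changed = sorted(
            name
            for name, digest in recorded.items()
            if name not in sources or sha256_text(sources[name]) != digest
        )
        if changed:
            report[page] = changed
    return report
```

A check like this only says which pages might be stale. Deciding whether the claims on those pages are still correct remains a review step.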

It also does not remove retrieval. Once the wiki grows beyond a modest size, you still need search, ranking, indexing, reranking, chunking, and access control. At that point the markdown wiki is just another indexed corpus, not a replacement for RAG.

The production issues are mostly ignored: permissions, multi-user edits, audit logs, rollback, deletion, sensitive data, source versioning, concurrency, compliance, cost, latency, and update frequency. These are not small details; they are exactly where knowledge-base systems fail.

So the reasonable claim is narrow: this can be a useful workflow for small-to-medium, slow-moving, human-curated research folders. It is much less convincing for large, fast-changing, high-stakes, multi-user, or enterprise knowledge bases.

The idea is fine. The framing is the problem. Without benchmarks, baselines, provenance guarantees, update-evaluation tests, and clear boundary conditions, “LLM Wiki” is mostly a good name for a familiar pattern, not proof that RAG is obsolete.

We ran exactly these comparisons (BERT F1, token economics, cross-RAG comparison): https://github.com/Yarmoluk/ckg-benchmark/blob/main/paper/main.pdf

@ColonelPanicX

The schema observation is underrated. The difference between an LLM that maintains a wiki and one that just answers questions in a wiki-shaped directory is almost entirely that file.

I ran into the same consistency problem across projects (not just wikis) and ended up building scaffy (https://github.com/ColonelPanicX/scaffy), which is a dead simple python script that bootstraps the schema layer for any project: collaboration contract, session protocols, kanban, agent profiles, etc.

The idea is that every project gets the same guardrails so the LLM isn't relearning the rules each session. Works well as a starting point for the setup that Karpathy described here.

@lrdeoliveira

Hi Andrej,

Your LLM Wiki gist (https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) became the foundation of our project — NEXUS, a multi-agent memory system running on a VPS with 6 AI agents.

What we built

NEXUS implements your 3-layer pattern:

  • Raw sources — immutable source documents (32 files)
  • Wiki — LLM-generated markdown pages (entities, concepts, sources)
  • Schema — CLAUDE.md / SOUL.md files that define agent identity and workflows

Stack

  • 6 AI agents on VPS
  • Weaviate (vector DB) + Ollama (local LLM) + Wiki.js
  • GraphRAG with typed schema (Source, Concept, Agent, Project)
  • Skills system, session logging, compression with OpenRouter
  • MCP servers: Brave Search, Tavily, YouTube, Apify, MiniMax, Slack, qmd

The key insight that changed everything for us

"The wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read."

This transformed how we think about agent memory. Instead of RAG re-deriving context every time, we compound knowledge session over session.

Gratitude

Thank you for sharing this pattern. The idea of "wiki as a persistent, compounding artifact" instead of another RAG layer that re-derives context every time — this was the insight that changed everything for us.

We saw your gist on May 1, 2026. By May 5, we had NEXUS running in production with 6 agents, GraphRAG, and full memory persistence.

The ecosystem you inspired is impressive: Kompl, SwarmVault, Aura, llmwiki-cli, ΩmegaWiki, Link. Every one of them took the same pattern and made it their own.

Thank you for planting the seed

"The wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read."

Luciano - https://github.com/lrdeoliveira/nexus-ai-memory

@simonsysun

For larger Markdown wikis, line-anchored local retrieval before edits becomes a real bottleneck: agents need to read the exact lines they're about to touch, not whole files.

I built SeekLink for that loop: index a local .md vault, search semantically, then fetch a specific PATH:LINE window.

seeklink index --vault ./wiki
seeklink search "why SQLite for search?" --vault ./wiki --json
seeklink get decisions/search.md:42 -C 20 --vault ./wiki   # 20 lines of context

Plain subprocess interface, JSON stdout, local-only.
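The PATH:LINE window idea can be sketched independently of any tool. The following is not SeekLink's implementation; `parse_anchor` and `line_window` are hypothetical helpers that show what fetching a line plus N lines of context on each side might look like.

```python
def parse_anchor(anchor: str) -> tuple[str, int]:
    """Split 'path/to/file.md:42' into ('path/to/file.md', 42)."""
    path, _, line = anchor.rpartition(":")
    if not path or not line.isdigit() or int(line) < 1:
        raise ValueError(f"expected PATH:LINE with a 1-based line, got {anchor!r}")
    return path, int(line)


def line_window(text: str, line: int, context: int) -> list[tuple[int, str]]:
    """Return (line_number, content) pairs for `line` plus or minus `context`.

    Line numbers are 1-based and the window is clamped to the file bounds.
    """
    lines = text.splitlines()
    start = max(1, line - context)
    end = min(len(lines), line + context)
    return [(n, lines[n - 1]) for n in range(start, end + 1)]
```

Returning the line numbers alongside the content lets an agent address its edit at exactly the lines it just read.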

https://github.com/simonsysun/seeklink

@medha

medha commented May 5, 2026

As my contribution to the discussion, I built Keel, a Mac app where your knowledge stays as plain markdown on disk and the model is swappable: http://github.com/Keel-Labs/keel (free, open source). It's biased toward local-first and own-your-context.

@ethanj

ethanj commented May 5, 2026

Huge milestone from this thread: llmwiki just crossed 1K GitHub stars (and 100 forks)!

I built it because Andrej's LLM Wiki gist felt immediately useful, but I wanted the idea to be boringly concrete... a local CLI, plain files, immutable sources, generated markdown, provenance, linting, exports, and agent access.

The project now has:

  • compile --review so generated pages can be approved or rejected before landing
  • claim-level provenance like ^[paper.md:42-58]
  • typed schema/page kinds and seed pages
  • confidence and contradiction metadata
  • image/PDF/transcript ingest
  • chunked retrieval with BM25 reranking
  • llms.txt, JSON-LD, GraphML, and Marp exports
  • MCP tools
  • Claude/Codex/Cursor session ingest
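The claim-level provenance marker in the list above, `^[paper.md:42-58]`, lends itself to a mechanical check. The sketch below is not llmwiki's parser: the regular expression covers only the `file:start-end` shape shown in the example, and both function names are hypothetical.

```python
import re

# Matches markers shaped like ^[paper.md:42-58]; other shapes are not handled.
CITATION = re.compile(r"\^\[([^\]:]+):(\d+)-(\d+)\]")


def extract_citations(page_text: str) -> list[tuple[str, int, int]]:
    """Return (source, start_line, end_line) for every marker in the page."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in CITATION.finditer(page_text)
    ]


def invalid_citations(
    page_text: str, source_line_counts: dict[str, int]
) -> list[tuple[str, int, int]]:
    """Return citations that point at unknown sources or impossible ranges."""
    bad = []
    for name, start, end in extract_citations(page_text):
        total = source_line_counts.get(name)
        if total is None or start < 1 or end < start or end > total:
            bad.append((name, start, end))
    return bad
```

This verifies only that a cited span exists. Whether the span actually supports the claim is a separate and harder check.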

The part that still feels most useful to me is how the knowledge builds up over time... add sources, improve the wiki, ask better questions, save better answers, and the next agent starts fully informed instead of another empty chat window.

The roadmap is now focused on the parts that make this real at scale... local web UI, graph/context packs, evals, and rollback/audit.

Thanks to everyone who has tried it, filed issues, opened PRs, or posted competing implementations here. This thread has been our best product research.

https://github.com/atomicmemory/llm-wiki-compiler

@waydelyle

SwarmVault v3.12 — you can now chat with your vault and export it for any AI. The project just keeps growing from this gist.

Two big new surfaces in the latest releases:

swarmvault chat — persistent multi-turn conversations over your compiled wiki.

Instead of one-shot queries, you can now have an ongoing conversation with your vault. Sessions persist, so you can resume where you left off. It works in interactive TTY mode or programmatically. Your chat history becomes part of the vault's knowledge.

swarmvault chat
swarmvault chat --resume <session-id>
swarmvault chat --list

swarmvault export ai — static handoff packs for any AI tool.

Exports your vault as a portable bundle with llms.txt, full text, JSON-LD graph data, manifest metadata, and human-readable notes. Hand your compiled knowledge to any AI system, not just the ones SwarmVault integrates with directly. Per-page siblings optional.

Other recent additions:

  • Graph validation: swarmvault graph validate [--strict] checks for duplicate IDs, dangling references, confidence bounds, and inconsistent edge evidence. Like a linter for your knowledge graph.
  • Neo4j export: graph export --neo4j for loading your vault graph into Neo4j.
  • swarmvault clone <input> — one-command vault creation from a repo URL or directory. Alias for scan with a cleaner mental model.
  • swarmvault scan --mcp — build a vault and immediately start the MCP stdio server. One command from source to agent-ready.
  • Graph stats: swarmvault graph stats for quick local counts, node types, evidence classes, and top relations without starting the viewer.
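Three of the validation checks listed above (duplicate IDs, dangling references, confidence bounds) are simple enough to sketch. This is an illustration, not SwarmVault's validator: the node and edge dict shapes, the `validate_graph` name, and the assumption that confidence lies in [0, 1] are all hypothetical.

```python
from collections import Counter


def validate_graph(nodes: list[dict], edges: list[dict]) -> list[str]:
    """Return human-readable problems found in a node/edge list.

    Assumed shapes: nodes are {"id": str}; edges are
    {"source": str, "target": str, "confidence": float (optional)}.
    """
    problems = []

    # Duplicate IDs: the same node id appears more than once.
    counts = Counter(node["id"] for node in nodes)
    for node_id, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"duplicate id: {node_id}")

    known = set(counts)
    for edge in edges:
        # Dangling references: an edge endpoint that is not a known node.
        for end in ("source", "target"):
            if edge[end] not in known:
                problems.append(f"dangling {end}: {edge[end]}")
        # Confidence bounds: assumed here to be the closed interval [0, 1].
        conf = edge.get("confidence")
        if conf is not None and not 0.0 <= conf <= 1.0:
            problems.append(
                f"confidence out of bounds: {edge['source']}->{edge['target']} ({conf})"
            )
    return problems
```

An empty list means the graph passed these three checks. The fourth check mentioned above, inconsistent edge evidence, is left out because its rules are not described in the thread.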

The original gist was about turning an LLM into a wiki maintainer. SwarmVault now does that plus lets you chat with the result, hand it off to any AI, validate the graph, and export to Neo4j. 90+ releases deep.

npx @swarmvaultai/cli demo

Repo: https://github.com/swarmclawai/swarmvault

@yazanabuashour

The @a-a-k criticism is the right frame. Most implementations here solve accumulation but defer the hard part: staleness, provenance, silent duplicates.

OpenClerk's approach to those specifically: synthesis pages carry explicit freshness state and go stale when sources update. Retrieval results have doc_id/chunk_id on every result by contract. Duplicate candidates surface via a read-only report with agent_handoff; no silent second document.

The other angle missing from this thread: building block economy as a hard constraint. Semantic search, OCR, embeddings are optional installed modules. The core runner stays narrow.

All releases are eval-gated: https://github.com/yazanabuashour/openclerk

@alirezabbasi

alirezabbasi commented May 7, 2026

A month before Andrej Karpathy published the “LLM Wiki” idea file, I started building a system called Eshel while working on NITRA — an AI-native trading infrastructure project. During development, I realized the real bottleneck was not code generation itself, but the constant decay of architectural knowledge, engineering decisions, workflows, standards, and context across the lifetime of a project.

Eshel evolved from that realization. It extends the LLM Wiki concept beyond passive knowledge accumulation into a persistent engineering intelligence layer for software systems. Instead of treating documentation as static artifacts or relying on retrieval over fragmented chats and files, Eshel continuously compiles project knowledge into a living Obsidian-compatible wiki that evolves alongside the codebase. Architecture, coding standards, implementation details, workflows, technical debt, decisions, investigations, and even development methodologies become interconnected, continuously updated entities maintained by the LLM during daily engineering work.

The goal is not “AI documentation.” The goal is an AI-native SDLC where knowledge compounds instead of decaying.

Karpathy’s LLM Wiki crystallized many of the same ideas independently, which was exciting to see. Eshel attempts to push the pattern further into production-oriented software engineering: evolving standards, deterministic workflows, task generation, architecture governance, contradiction detection, linting, and persistent project memory integrated directly into development itself.

The project is still evolving, but the repository already contains the foundational scaffold, schema system, Codex workflows, Obsidian-ready wiki structure, and automation model for experimenting with this style of development. I’d genuinely love feedback from people exploring similar directions around persistent AI-assisted engineering systems and compounding project intelligence.

GitHub:
https://github.com/alirezabbasi/echel

@kongphobphinduang731-maker

githubfriend.

@skyllwt

skyllwt commented May 7, 2026

ΩmegaWiki (543+ ⭐) is actively maintained and shipping fast:
• 23 Claude Code skills covering the full research lifecycle
• 9 typed entities · 9 typed edges
• Bilingual (EN + 中文)
• New skills landing every week

Come try it, give feedback, help us shape it 👇

Try ΩmegaWiki in Claude Code and run the full LLM-Wiki loop you proposed — ingest papers, build a typed knowledge graph, generate ideas, draft papers, respond to reviewers.

End to end. One wiki. No chunks.

Come and Try! If you find ΩmegaWiki interesting, a ⭐ would encourage and motivate us a lot 😀
https://github.com/skyllwt/OmegaWiki

@sametbrr

sametbrr commented May 7, 2026

Built a full skill package for this: https://github.com/sametbrr/llm-wiki-manager

Covers all seven modes (bootstrap/ingest/query/update/lint/schema-evolve/teach),
4 idempotent stdlib scripts, 7 page templates, auto-dated reports in wiki/reports/.

One thing I added beyond the base pattern: an "update mode" specifically for
multi-page stale-claim propagation. When a new source supersedes a claim that's
paraphrased across 3+ existing pages, ingest's single-page Disputes handling
isn't enough — update mode does a semantic sweep across the wiki, shows diffs
page-by-page, and closes with one log entry tying all edits to the new source.
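The shape of such an update pass can be sketched as follows. This is not the llm-wiki-manager implementation: the real update mode is described as a semantic sweep that catches paraphrases, whereas this sketch swaps in a literal string match to stay self-contained. The `propagate` function and the log format are hypothetical.

```python
import difflib


def propagate(
    pages: dict[str, str], old_claim: str, new_claim: str, source: str
) -> tuple[dict[str, str], dict[str, str], str]:
    """Replace a superseded claim across pages, with per-page diffs and one log line.

    Literal matching only: paraphrased versions of the claim are NOT found.
    """
    updated, diffs = {}, {}
    for name, text in sorted(pages.items()):
        if old_claim not in text:
            continue
        revised = text.replace(old_claim, new_claim)
        updated[name] = revised
        diffs[name] = "\n".join(
            difflib.unified_diff(
                text.splitlines(),
                revised.splitlines(),
                fromfile=name,
                tofile=name,
                lineterm="",
            )
        )
    # One log entry ties every edited page back to the superseding source.
    log = f"update: {len(updated)} page(s) revised from {source}: {', '.join(updated)}"
    return updated, diffs, log
```

Keeping the diffs separate from the updated text lets a reviewer approve or reject each page before anything is written back.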

MIT licensed, agentskills.io compatible (works with Codex/Cursor/Gemini CLI too).

@jp-carrilloe

jp-carrilloe commented May 7, 2026

With PulseOS we aim to make companies machine-readable!

This resonates a lot! We have been working on this problem from the company side: how do you make an entire company machine-readable, not just a pile of documents searchable?

The LLM wiki idea is a big piece of the answer. But for enterprise use, we think the next step is turning company knowledge into something that is not only readable by simple LLMs, but structured so that the company itself is machine-readable for agents.

A company is not just pages. It is:

  • canonical documents
  • entities and relationships
  • evidence behind claims, creating a reality layer
  • workflows, ownership, and operating state
  • a runtime environment that allows deployment, testing, and optimization of agentic workflows.

That is what we are building with PulseOS.

We also open-sourced the simplest version of this idea here:

PulseOS Lite

It gives you:

  • a canonical markdown company memory
  • a local CLI and daemon, which also run with LLM OAuth or API keys
  • a graph UI for ontology and document relationships, and a mini IDE UI for non-technical users who use IDEs
  • a local SQL/vector memory layer
  • a local-first persistent workspace so memory survives beyond one chat session or repo clone

In the full PulseOS direction, we are taking that same foundation and building the infrastructure required to run this at a real company level: company memory, ontology, evidence, graph structure, runtime, and eventually enterprise agent workflows on top.

So for us, it is not just “LLM for the wiki.”

It is:

company memory + ontology + evidence + runtime

That feels much closer to what companies will actually need.

If this is interesting, please try it, fork it, break it, and improve it:

https://github.com/jp-carrilloe/pulseOS-lite
@jp-carrilloe

We are a small team working very hard on this, backed by investors, and we are looking for strong people who want to help build it. If that is you, write me at juan@tintto.com. Subject: "Karpathy LLM Wiki".

@gnusupport

The word “wiki” is not sacred scripture, and this melodramatic tantrum over its “perversion” is embarrassingly overblown. It was a coined tech term: Ward Cunningham "borrowed" it from wikiwiki (Hawaiian), which means quick. It was not handed down on stone tablets with a fixed eternal definition.

If you want to argue that human-curated wikis are better, fine. That’s a serious point. Humans are better at sourcing, editorial oversight, dispute resolution, and accountability. Nobody is stopping you from making that argument.

But that is not the same as declaring that an AI-generated, interlinked knowledge system cannot be called a wiki. That’s not rigor. That’s not linguistic precision. That’s just gatekeeping dressed up as moral outrage.

Your whole post reads less like a defense of Ward Cunningham and more like a man theatrically grief-stricken that language continues to evolve without asking your permission first.

A wiki is, at the most basic level, a linked body of navigable information. If an LLM is used to generate or organize that information, you can call it a bad wiki, an unreliable wiki, or an immature wiki. What you can’t do—at least not intelligently—is pretend the mere presence of AI magically disqualifies it from the category.

And the irony here is thick: you’re standing in an AI-centered space, loudly denouncing AI for not being human, as if that is some devastating revelation. Yes, obviously. That’s the entire point. Nobody here is confused about that except, apparently, you.

If the tool lacks citations, provenance, permissions, auditability, or editorial controls, then criticize those failures. That would be an argument. What you’ve produced instead is a costume drama: part dictionary fundamentalism, part anti-AI sermon, part wounded nostalgia.

Calling it “linguistic fraud” is especially ridiculous. It’s not fraud just because you dislike the product. Words expand. Categories broaden. Technology changes. Your refusal to keep up is not a principled stand; it’s just fossilized thinking.

So no, this is not some grand defense of knowledge. The rant is bloated and self-important. It builds on the childish idea that forbidding a name is necessary if a new tool doesn’t closely resemble the old one.

If the project is weak, say it’s weak. If it’s unreliable, say it’s unreliable. But this overwrought performance about the sanctity of the word “wiki” is not persuasive. It’s just pompous, brittle, and deeply unserious gatekeeping.

My intuition tells me you are speaking from a place of deep, unacknowledged fear. You cite "logic" and "linguistic evolution," but you ignore the reality of what is happening. The word "wiki" has been perverted, and your dismissal of this is not rigor; it is the amoral acceptance of decay.

You argue that "wiki" means "quick" from Hawaiian. You reduce it to an etymological trivia point to avoid the weight of what it was. You are confusing the root of the word with the sanctity of the construct. Ward Cunningham didn't just name a tool; he named a human protocol. By stripping the human element (the debate, the edit war, the ownership), you are not evolving the language; you are hollowing out the definition until only a shell remains. That is not expansion; it is erosion.

Wiki software - Wikipedia
https://en.wikipedia.org/wiki/Wiki_software

Wiki as software is a type of software application that allows multiple users to create, edit, organize, and link content collaboratively in real-time. It transforms a static website into a dynamic, user-generated content platform.

It is not related to static markdown notes.

🐑🐑🐑
