| name | knowledge-init |
|---|---|
| description | Bootstrap or rebuild the ./knowledge directory from scratch. Use when: no ./knowledge directory exists yet; the user says "initialize knowledge", "set up knowledge", or "bootstrap knowledge"; the knowledge base is so degraded that repair (knowledge-prune) is insufficient; or you are migrating to a new knowledge structure. This is a heavy operation that scans the entire codebase, so do not trigger it during normal work. Dispatched automatically by the knowledge skill when INDEX.md is missing. |
Run ./knowledge/scripts/knowledge-init.sh — it scans the project and outputs a bootstrap plan. The script never writes knowledge files; you read its output and create everything.
Before running, check:
- `./knowledge/INDEX.md` already exists? Rebuild only with `--force`, or use knowledge-prune instead.
- Monorepo vs single app? Adjust topic granularity in review after init.
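The `./knowledge/INDEX.md` check can be made mechanical before the script is invoked. A minimal sketch: `should_run_init` is a hypothetical helper, not part of knowledge-init.sh, and it encodes only the rule stated here (an existing INDEX.md means rebuild only with `--force`, otherwise prefer knowledge-prune).

```bash
# Hypothetical guard (not part of knowledge-init.sh): decide whether a
# bootstrap may run, following the "rebuild only with --force" rule.
# Usage: should_run_init <knowledge_dir> [--force]
should_run_init() {
  local dir="$1" flag="${2:-}"
  if [ ! -f "$dir/INDEX.md" ]; then
    echo "init"        # no knowledge base yet: safe to bootstrap
    return 0
  fi
  if [ "$flag" = "--force" ]; then
    echo "rebuild"     # caller asserts the user explicitly approved a rebuild
    return 0
  fi
  echo "INDEX.md exists: use knowledge-prune or rerun with --force" >&2
  return 1
}
```

For example, `should_run_init ./knowledge && ./knowledge/scripts/knowledge-init.sh` only bootstraps when no INDEX.md exists yet.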
```bash
./knowledge/scripts/knowledge-init.sh
./knowledge/scripts/knowledge-init.sh --force  # rebuild; only after explicit user OK
```

What the script outputs (read-only):
- Git history: recent commits, contributors, active areas, new files
- Project structure: directory tree, root files, key source directories
- Project identity: README, package.json, etc.
- CI/CD & Infrastructure: workflows, Dockerfiles, IaC
- Database: migrations, schema files
- Available CLIs, with versions
- Monitoring: tool references in config files
- AI plan (if the `claude` CLI is available): JSON with categories, topics, diagrams, invariants, environment
- Environment consensus (if the `claude` CLI is available): two agents independently scan for operational tooling
You then create: ENVIRONMENT.md, SUMMARY.md, INDEX.md, topic files, diagrams, timeline entry — using the script's output as your source data.
Afterward, run `knowledge-health.sh` to verify the structure, and resolve any `[?]` items with the user.
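To collect every open `[?]` before that conversation, a recursive literal search is enough. A minimal sketch: `list_open_questions` is a hypothetical helper, separate from knowledge-health.sh (whose checks are not described here).

```bash
# Hypothetical helper: list every unresolved "[?]" marker with file and line,
# so each one can be raised with the user. -F makes "[?]" a literal string
# rather than a regex bracket expression.
list_open_questions() {
  local dir="${1:-./knowledge}"
  grep -rnF -- '[?]' "$dir" 2>/dev/null
}
```

Piping the result through `wc -l` gives a quick count of what is still unverified.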
Before reading any source code, extract project shape from git and config files:
```bash
# Project velocity and shape
git log --oneline -50                                      # recent history
git shortlog -sn --since="3 months ago" HEAD               # active contributors (HEAD stops shortlog reading stdin)
git log --oneline --since="1 month ago" --stat | head -80  # recent activity areas

# Detect operational tooling
ls .github/workflows/ 2>/dev/null                          # CI/CD
ls Dockerfile docker-compose.yml 2>/dev/null               # containerization
ls terraform/ cdk/ serverless.yml amplify/ 2>/dev/null     # IaC
grep -l "sentry" package.json *.toml 2>/dev/null           # error tracking (*.toml covers pyproject.toml)
# monitoring: options before the path so non-GNU grep parses them too
grep -rlE --include="*.json" --include="*.yml" --include="*.yaml" --include="*.toml" \
  --exclude-dir=node_modules --exclude-dir=.git "datadog|newrelic|cloudwatch" . 2>/dev/null

# Detect CLIs available
for cmd in gh aws gcloud sentry-cli vercel fly; do
  command -v "$cmd" >/dev/null 2>&1 && echo "CLI available: $cmd"
done
```

Use this to populate ENVIRONMENT.md and to understand which areas of the codebase are active vs stable.
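To make "active vs stable" concrete, one option is to count file changes per top-level directory over a recent window. This is a sketch under assumptions: `churn_by_dir` is a hypothetical helper, the default 3-month window is arbitrary, and it must run inside the repository.

```bash
# Hypothetical helper: rank top-level directories by how many file changes
# they received since a given date (default: 3 months ago).
# Files at the repository root have no directory component and are skipped.
churn_by_dir() {
  local since="${1:-3 months ago}"
  git log --since="$since" --name-only --pretty=format: \
    | awk -F/ 'NF > 1 { print $1 }' \
    | sort | uniq -c | sort -rn
}
```

High counts suggest active areas worth a topic file; directories that never appear are likely stable.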
Read (don't modify) in order:
- README.md, CLAUDE.md, package.json / Cargo.toml / go.mod (project shape)
- Directory structure (2 levels deep)
- CI/CD config (.github/workflows, Dockerfile, docker-compose)
- Database: migrations folder, schema files, ORM config
- Key source directories: look for natural domain boundaries
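The "2 levels deep" view of the directory structure can come from `find` instead of a full tree dump. A minimal sketch; the `list_dirs_2_levels` name and the pruned folder list (`.git`, `node_modules`, `vendor`) are assumptions to adjust per project.

```bash
# Hypothetical helper: print directories up to two levels below the root,
# pruning VCS and dependency folders that carry no domain signal.
list_dirs_2_levels() {
  local root="${1:-.}"
  find "$root" -mindepth 1 -maxdepth 2 \
    \( -name .git -o -name node_modules -o -name vendor \) -prune \
    -o -type d -print | sort
}
```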
Think in decision domains, not file paths:
- Good: `auth`, `database`, `infrastructure`, `payments`
- Bad: `src-utils`, `lib-folder`, `config-files`
Each category gets a directory with _index.md + 1-3 topic files max.
Hard limit: ≤8 categories, ≤40 topic files total.
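The hard limits can be checked mechanically once topic files exist. A sketch assuming the `topics/{category}/{topic}.md` layout below, where each category's `_index.md` is not counted as a topic file; `check_topic_limits` is hypothetical.

```bash
# Hypothetical check: at most 8 category directories and at most 40 topic
# files under topics/. $(( )) normalizes wc's padded output to a plain integer.
check_topic_limits() {
  local topics="${1:-./knowledge/topics}" cats files
  cats=$(( $(find "$topics" -mindepth 1 -maxdepth 1 -type d | wc -l) ))
  files=$(( $(find "$topics" -mindepth 2 -maxdepth 2 -type f -name '*.md' ! -name '_index.md' | wc -l) ))
  echo "categories=$cats topics=$files"
  [ "$cats" -le 8 ] && [ "$files" -le 40 ]   # non-zero exit when over a limit
}
```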
```
./knowledge/
├── INDEX.md
├── SUMMARY.md
├── ENVIRONMENT.md      # operational context: CLIs, infra, monitoring
├── CONVENTIONS.md      # coding standards, patterns, git workflow
├── plans/              # ephemeral: active plans only, delete when done
├── topics/
│   ├── _index.md
│   └── {category}/
│       ├── _index.md
│       └── {topic}.md
├── diagrams/
│   ├── _index.md
│   └── {name}.mermaid
├── scripts/
│   ├── _index.md
│   └── {name}.sh
└── timeline/
    ├── _index.md
    └── YYYY-MM-DD.md
```
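Creating the empty skeleton is mechanical; writing the content is the agent's job. A sketch of a hypothetical `scaffold_knowledge` helper that makes the directories and `_index.md` stubs without overwriting existing ones. INDEX.md, SUMMARY.md, ENVIRONMENT.md, and CONVENTIONS.md are deliberately left for the agent to write.

```bash
# Hypothetical scaffold: create the directory skeleton and empty _index.md
# stubs. plans/ gets no index because its contents are ephemeral.
scaffold_knowledge() {
  local root="${1:-./knowledge}" d
  mkdir -p "$root"/{plans,topics,diagrams,scripts,timeline}
  for d in topics diagrams scripts timeline; do
    # never clobber an index that already has content
    [ -f "$root/$d/_index.md" ] || : > "$root/$d/_index.md"
  done
}
```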
ENVIRONMENT.md: populate it from the Step 0 findings (the git and config-file scan). It tells future agents where to look for operational context:
- Observability: Logs, errors, metrics — tool name + CLI command or URL
- CI/CD: Build system, deployment tool — with CLI commands
- Infrastructure: Databases, caches, queues, storage — provider + access
- Git Shortcuts: pre-built `git log` commands for each topic area
See format in the knowledge skill. Max 40 lines. If a CLI isn't available, still document the tool name so the agent knows what to ask about.
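One way to honor "still document the tool name" is to emit a line for every expected tool and flag a missing CLI instead of dropping it. A sketch; `env_tool_line` and its output wording are assumptions, not the format defined in the knowledge skill.

```bash
# Hypothetical helper: one ENVIRONMENT.md line per tool, recorded even when
# the CLI is absent so later agents know what to ask the user about.
# Usage: env_tool_line <label> <cli-command>
env_tool_line() {
  local label="$1" cli="$2"
  if command -v "$cli" >/dev/null 2>&1; then
    echo "- $label: \`$cli\` (available)"
  else
    echo "- $label: \`$cli\` (CLI not installed; ask the user)"
  fi
}
```

For example, `env_tool_line "Errors" sentry-cli` records Sentry whether or not `sentry-cli` is on the PATH.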
CONVENTIONS.md: infer conventions from the codebase by checking linter configs, formatter configs, existing code patterns, and git log message style. Capture only what linters and formatters don't enforce, and ask the user to verify.
See format in the knowledge skill. Max 40 lines.
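For the git log message style, a rough signal is the share of recent commit subjects shaped like Conventional Commits (`type(scope): subject`). A sketch; the regex is a loose approximation of that specification and `commit_style_ratio` is hypothetical.

```bash
# Hypothetical probe: how many of the last N commit subjects look like
# "type(scope): subject" or "type!: subject"? Prints "matches/total".
commit_style_ratio() {
  local n="${1:-100}" total matches
  total=$(git log -n "$n" --format=%s | wc -l)
  matches=$(git log -n "$n" --format=%s | grep -cE '^[a-z]+(\([^)]+\))?!?: ')
  echo "$(( matches ))/$(( total ))"
}
```

A high ratio is worth recording as a convention; a low one usually means there isn't one.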
SUMMARY.md (~500 tokens) is the "explain this project in 30 seconds" file:
- What this project is and does
- Tech stack and architecture (1 paragraph)
- Key domains and how they interact (1 paragraph)
- Critical constraints (financial safety, security invariants) (1 paragraph)
- Current state (what works, what's in progress) (1 paragraph)
Max 40 lines. Never longer.
INDEX.md: see format in the knowledge skill. Include:
- Trigger table mapping task patterns → files to read
- Topics list with 1-line descriptions
- External docs table
- Recent timeline (last 5)
- Stale topics list
Max 60 lines.
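The line budgets in this skill (60 for INDEX.md, 40 each for SUMMARY.md, ENVIRONMENT.md, and CONVENTIONS.md) can be checked in one pass. A sketch; `check_line_budgets` is hypothetical and counts newline-terminated lines, so a final line without a trailing newline is not counted.

```bash
# Hypothetical check of per-file line budgets; prints each violation and
# returns non-zero if any file is missing or over budget.
check_line_budgets() {
  local root="${1:-./knowledge}" rc=0 entry file max lines
  for entry in INDEX.md:60 SUMMARY.md:40 ENVIRONMENT.md:40 CONVENTIONS.md:40; do
    file="${entry%%:*}"
    max="${entry##*:}"
    [ -f "$root/$file" ] || { echo "missing: $file"; rc=1; continue; }
    lines=$(( $(wc -l < "$root/$file") ))
    if [ "$lines" -gt "$max" ]; then
      echo "over budget: $file has $lines lines (max $max)"
      rc=1
    fi
  done
  return "$rc"
}
```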
For each topic, capture what code and git cannot tell you: decisions, gotchas, context, constraints. Mark uncertainty with [?].
Key rule: reference git, don't restate. Use `See: {sha}` or `git log --oneline -5 -- {path}` instead of describing what the code does.
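A `See: {sha}` pointer can be looked up rather than typed by hand. A minimal sketch; `see_ref` is a hypothetical helper that prints the abbreviated SHA of the most recent commit touching a path and returns non-zero when no commit touched it.

```bash
# Hypothetical helper: build a "See: {sha}" reference for the latest commit
# that changed the given path (file or directory).
see_ref() {
  local sha
  sha=$(git log -1 --format=%h -- "$1")
  [ -n "$sha" ] && echo "See: $sha"
}
```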
For each topic category, generate a useful `git log` command for the Git Shortcuts in ENVIRONMENT.md:
- Recent {category} changes → `git log --oneline -10 -- src/{path}/`
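Generating one such line per category keeps the shortcuts consistent. A sketch; `git_shortcuts` is hypothetical and takes `category=path` pairs, with paths relative to the repository root (so include the `src/` prefix where it applies).

```bash
# Hypothetical generator for Git Shortcuts lines.
# Usage: git_shortcuts auth=src/auth payments=src/billing
git_shortcuts() {
  local pair category dir
  for pair in "$@"; do
    category="${pair%%=*}"
    dir="${pair#*=}"
    # strip one trailing slash, then add exactly one back
    printf 'Recent %s changes → git log --oneline -10 -- %s/\n' "$category" "${dir%/}"
  done
}
```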
Log the bootstrap as the first timeline entry. Reference the init commit SHA when available.
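A sketch of writing that first entry. `write_init_timeline` is hypothetical, the entry body is a placeholder rather than the timeline format from the knowledge skill, and it records HEAD at bootstrap time, so swap in the init commit SHA once that commit exists.

```bash
# Hypothetical helper: create timeline/YYYY-MM-DD.md for the bootstrap,
# citing HEAD's short SHA when the repository has at least one commit.
write_init_timeline() {
  local root="${1:-./knowledge}" day sha file
  day=$(date +%Y-%m-%d)
  sha=$(git rev-parse -q --verify --short HEAD 2>/dev/null) || sha="no-commit-yet"
  file="$root/timeline/$day.md"
  mkdir -p "$root/timeline"
  printf '# %s\n\n- Bootstrapped ./knowledge (based on %s)\n' "$day" "$sha" > "$file"
  echo "$file"
}
```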
Then report to the user:
- Categories/topics identified
- ENVIRONMENT.md contents (ask the user to verify CLI commands)
- Any `[?]` items needing input
- Invite corrections
Init produces foundational claims that persist: wrong facts at bootstrap corrupt all future sessions. High-stakes outputs are therefore cross-checked by two independent agents.
The script handles this automatically for ENVIRONMENT.md: it runs two read-only claude -p agents that independently scan for operational tooling, and outputs both results. You (the caller agent) act as judge — include entries both agents agree on, mark disagreements with [?].
For architecture diagrams (if producing them manually without the script): use the Agent tool to spawn two parallel Explore subagents, each independently mapping component relationships. Keep nodes/edges present in both, mark [?] on disputed connections.
Consensus is not used for topic file content (low-stakes, will be pruned), timeline entries, or INDEX.md structure (mechanical).
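The judge rule for ENVIRONMENT.md (keep what both agents agree on, mark the rest `[?]`) reduces to a set intersection plus a symmetric difference. A sketch assuming each agent's findings were first normalized to one entry per line in a file; `judge_consensus` is hypothetical and does not invoke `claude -p` itself.

```bash
# Hypothetical judge step: agreed entries first, then entries only one agent
# reported, each suffixed with [?]. LC_ALL=C keeps sort and comm collation aligned.
judge_consensus() {
  local a b
  a=$(mktemp) b=$(mktemp)
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  LC_ALL=C comm -12 "$a" "$b"                    # both agents agree
  LC_ALL=C comm -3 "$a" "$b" \
    | awk '{ sub(/^\t/, ""); print $0 " [?]" }'  # only one agent saw it
  rm -f "$a" "$b"
}
```

Exact-string matching is deliberately strict: two agents phrasing the same fact differently end up as two `[?]` lines, which errs toward asking the user.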
- Don't create a topic file per source file — topics are conceptual domains
- Don't exceed 40 topic files — consolidate
- Don't write paragraphs in Key Facts — one line per item
- Don't duplicate README content — link to it
- Don't restate code — if git or grep can answer it, don't write it down
- Don't skip ENVIRONMENT.md — it's the highest-value-per-token file after INDEX.md