Claude Code skills for managing persistent codebase knowledge. Five skills (knowledge, knowledge-init, knowledge-prune, knowledge-rollup, knowledge-insights) plus read-only analysis scripts. Explained at https://saadkhalid.com/codebase-knowledge
#!/usr/bin/env bash
# Git hook: post-commit
# Purpose: Lightweight timeline entry after manual (non-Claude) git commits
# Install: cp to .git/hooks/post-commit && chmod +x .git/hooks/post-commit
set -euo pipefail
# Only run if knowledge dir exists and claude CLI is available
[[ -d "./knowledge" ]] || exit 0
command -v claude &>/dev/null || exit 0
COMMIT_SHA=$(git rev-parse --short HEAD)
COMMIT_MSG=$(git log -1 --pretty=%s)
FILES_CHANGED=$(git diff-tree --no-commit-id --name-only -r HEAD 2>/dev/null | head -10 || true)
# Skip trivial commits
echo "$COMMIT_MSG" | grep -qiE '^(fix typo|fmt|format|lint|merge|bump|chore|wip)$' && exit 0
# Skip non-code changes
CODE_FILES=$(echo "$FILES_CHANGED" | grep -cE '\.(ts|tsx|js|jsx|py|rs|go|sql|prisma)$' || true)  # grep -c still prints 0 on no match; || true guards set -e without duplicating the output
(( CODE_FILES == 0 )) && exit 0
# Run in background so git doesn't hang
(
TIMELINE_FILE="./knowledge/timeline/$(date +%Y-%m-%d).md"
claude -p "Add a 2-line timeline entry to $TIMELINE_FILE (create if needed).
Commit: $COMMIT_SHA — $COMMIT_MSG
Files: $FILES_CHANGED
Read ./knowledge/INDEX.md to find the right topic slug. Update the > summary line. Max 3 tool calls." \
--allowedTools "Read,Write" \
--max-turns 3 \
> /dev/null 2>&1 || true
) &
exit 0
#!/usr/bin/env bash
# Install knowledge hooks for Claude Code and git
# Usage: bash install-hooks.sh [--claude] [--git] [--all]
# --claude Install Claude Code Stop + PostToolUse hooks
# --git Install git post-commit hook in current repo
# --all Both of the above
# (no args) Interactive menu
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
HOOKS_DIR="$HOME/.claude/hooks"
check_deps() {
if ! command -v jq &>/dev/null; then
echo "Error: jq is required. Install with: brew install jq (mac) or apt install jq (linux)"
exit 1
fi
if ! command -v claude &>/dev/null; then
echo "Warning: claude CLI not found. Hooks will fail until it's installed."
fi
}
install_claude_hooks() {
echo "═══ Installing Claude Code hooks ═══"
mkdir -p "$HOOKS_DIR"
# Copy hook scripts
for hook in knowledge-on-stop.sh knowledge-on-commit.sh; do
if [[ -f "$SCRIPT_DIR/$hook" ]]; then
cp "$SCRIPT_DIR/$hook" "$HOOKS_DIR/$hook"
chmod +x "$HOOKS_DIR/$hook"
echo " ✓ $hook → $HOOKS_DIR/"
else
echo " ✗ $hook not found in $SCRIPT_DIR"
return 1
fi
done
# Merge into ~/.claude/settings.json
SETTINGS="$HOME/.claude/settings.json"
[[ -f "$SETTINGS" ]] || echo '{}' > "$SETTINGS"
EXISTING=$(cat "$SETTINGS")
# Check if hooks already installed
if echo "$EXISTING" | jq -e '.hooks.Stop[]?.hooks[]? | select(.command | test("knowledge-on-stop"))' > /dev/null 2>&1; then
echo " ⚠ Stop hook already installed — skipping"
else
EXISTING=$(echo "$EXISTING" | jq '
.hooks //= {} |
.hooks.Stop //= [] |
.hooks.Stop += [{
"matcher": "",
"hooks": [{
"type": "command",
"command": "bash ~/.claude/hooks/knowledge-on-stop.sh",
"timeout": 60,
"async": true
}]
}]
')
echo " ✓ Added Stop hook (knowledge-on-stop)"
fi
if echo "$EXISTING" | jq -e '.hooks.PostToolUse[]?.hooks[]? | select(.command | test("knowledge-on-commit"))' > /dev/null 2>&1; then
echo " ⚠ PostToolUse commit hook already installed — skipping"
else
EXISTING=$(echo "$EXISTING" | jq '
.hooks //= {} |
.hooks.PostToolUse //= [] |
.hooks.PostToolUse += [{
"matcher": "Bash",
"hooks": [{
"type": "command",
"if": "Bash(git commit*)",
"command": "bash ~/.claude/hooks/knowledge-on-commit.sh",
"timeout": 60,
"async": true
}]
}]
')
echo " ✓ Added PostToolUse hook (knowledge-on-commit)"
fi
echo "$EXISTING" > "$SETTINGS"
echo " ✓ Saved $SETTINGS"
echo ""
echo " Both hooks run async — they won't slow down your sessions."
echo " Restart Claude Code for hooks to take effect."
}
install_git_hook() {
echo "═══ Installing git post-commit hook ═══"
if [[ ! -d .git ]]; then
echo " ✗ Not in a git repo. Run from your project root."
return 1
fi
TARGET=".git/hooks/post-commit"
if [[ -f "$TARGET" ]]; then
if grep -q "knowledge" "$TARGET" 2>/dev/null; then
echo " ⚠ Knowledge hook already in post-commit — skipping"
return 0
fi
echo " ⚠ $TARGET exists. Appending knowledge hook..."
cat >> "$TARGET" << 'HOOK'
# Knowledge auto-update on commit (added by knowledge hooks installer)
if [[ -d "./knowledge" ]] && command -v claude &>/dev/null; then
COMMIT_SHA=$(git rev-parse --short HEAD)
COMMIT_MSG=$(git log -1 --pretty=%s)
FILES=$(git diff-tree --no-commit-id --name-only -r HEAD | head -10)
echo "$COMMIT_MSG" | grep -qiE '^(fix typo|fmt|format|lint|merge|bump|chore|wip)$' && exit 0
(claude -p "Add a 2-line entry to ./knowledge/timeline/$(date +%Y-%m-%d).md for commit $COMMIT_SHA — $COMMIT_MSG. Read INDEX.md first." --allowedTools "Read,Write" --max-turns 3 > /dev/null 2>&1 || true) &
fi
HOOK
echo " ✓ Appended to existing post-commit hook"
else
if [[ -f "$SCRIPT_DIR/post-commit" ]]; then
cp "$SCRIPT_DIR/post-commit" "$TARGET"
chmod +x "$TARGET"
echo " ✓ Installed $TARGET"
else
echo " ✗ post-commit template not found in $SCRIPT_DIR"
return 1
fi
fi
}
# ── Main ──
check_deps
if [[ $# -eq 0 ]]; then
echo "Knowledge Hooks Installer"
echo ""
echo " 1) Claude Code hooks — auto-update knowledge after sessions + commits"
echo " 2) Git post-commit hook — timeline entries on manual commits"
echo " 3) All of the above"
echo ""
read -rp "Choice [1-3]: " choice
case "$choice" in
1) install_claude_hooks ;;
2) install_git_hook ;;
3) install_claude_hooks; echo ""; install_git_hook ;;
*) echo "Invalid choice"; exit 1 ;;
esac
else
for arg in "$@"; do
case "$arg" in
--claude) install_claude_hooks ;;
--git) install_git_hook ;;
--all) install_claude_hooks; echo ""; install_git_hook ;;
*) echo "Unknown: $arg. Use --claude, --git, or --all"; exit 1 ;;
esac
done
fi
#!/usr/bin/env bash
# SCRIPT: knowledge-health.sh
# PURPOSE: Quick structural health check of ./knowledge — no AI needed
# TIER: safe
# USAGE: ./knowledge/scripts/knowledge-health.sh
# REQUIRES: bash, find, wc
# TOPICS: knowledge-management
# OUTPUT: Health report with file counts, size violations, orphan detection
set -euo pipefail
KNOWLEDGE_DIR="${1:-./knowledge}"
RED='\033[0;31m'
YELLOW='\033[1;33m'
GREEN='\033[0;32m'
NC='\033[0m'
ISSUES=0
WARNINGS=0
header() { echo -e "\n${GREEN}═══ $1 ═══${NC}"; }
issue() { echo -e " ${RED}✗ $1${NC}"; ISSUES=$((ISSUES + 1)); }
warn() { echo -e " ${YELLOW}⚠ $1${NC}"; WARNINGS=$((WARNINGS + 1)); }
ok() { echo -e " ${GREEN}✓ $1${NC}"; }
echo "Knowledge Health Check — $(date +%Y-%m-%d)"
echo "Directory: $KNOWLEDGE_DIR"
# ── Structural checks ──
header "Structure"
[[ -f "$KNOWLEDGE_DIR/INDEX.md" ]] && ok "INDEX.md exists" || issue "INDEX.md missing"
[[ -f "$KNOWLEDGE_DIR/SUMMARY.md" ]] && ok "SUMMARY.md exists" || warn "SUMMARY.md missing (recommended)"
[[ -f "$KNOWLEDGE_DIR/topics/_index.md" ]] && ok "topics/_index.md exists" || issue "topics/_index.md missing"
[[ -f "$KNOWLEDGE_DIR/timeline/_index.md" ]] && ok "timeline/_index.md exists" || issue "timeline/_index.md missing"
[[ -f "$KNOWLEDGE_DIR/diagrams/_index.md" ]] && ok "diagrams/_index.md exists" || warn "diagrams/_index.md missing"
[[ -f "$KNOWLEDGE_DIR/scripts/_index.md" ]] && ok "scripts/_index.md exists" || warn "scripts/_index.md missing"
# ── File counts ──
header "File Counts"
TOPIC_FILES=$(find "$KNOWLEDGE_DIR/topics" -name "*.md" ! -name "_index.md" 2>/dev/null | wc -l | tr -d ' ')
TIMELINE_FILES=$(find "$KNOWLEDGE_DIR/timeline" -name "*.md" ! -name "_index.md" 2>/dev/null | wc -l | tr -d ' ')
DIAGRAM_FILES=$(find "$KNOWLEDGE_DIR/diagrams" -name "*.mermaid" 2>/dev/null | wc -l | tr -d ' ')
SCRIPT_FILES=$(find "$KNOWLEDGE_DIR/scripts" -type f ! -name "_index.md" ! -name "*.md" 2>/dev/null | wc -l | tr -d ' ')
TOTAL_FILES=$(find "$KNOWLEDGE_DIR" -type f 2>/dev/null | wc -l | tr -d ' ')
echo " Topics: $TOPIC_FILES | Timeline: $TIMELINE_FILES | Diagrams: $DIAGRAM_FILES | Scripts: $SCRIPT_FILES | Total: $TOTAL_FILES"
(( TOPIC_FILES > 50 )) && issue "Topic files ($TOPIC_FILES) exceed budget of 50" || ok "Topic count within budget"
(( TIMELINE_FILES > 15 )) && warn "Timeline files ($TIMELINE_FILES) should be consolidated (budget: 10 active + archives)" || ok "Timeline count OK"
(( TOTAL_FILES > 80 )) && issue "Total files ($TOTAL_FILES) is very high — run rollup" || ok "Total file count OK"
# ── Size violations ──
header "Size Violations"
SIZE_ISSUES=0
TOPIC_LIST=$(find "$KNOWLEDGE_DIR/topics" -name "*.md" 2>/dev/null || true)
for f in $TOPIC_LIST; do
[[ -f "$f" ]] || continue
LINES=$(wc -l < "$f" | tr -d ' ')
if (( LINES > 100 )); then
issue "$(basename "$f"): $LINES lines (max 100)"
SIZE_ISSUES=$((SIZE_ISSUES + 1))
fi
done
if [[ -f "$KNOWLEDGE_DIR/INDEX.md" ]]; then
LINES=$(wc -l < "$KNOWLEDGE_DIR/INDEX.md" | tr -d ' ')
(( LINES > 60 )) && issue "INDEX.md: $LINES lines (max 60)" || ok "INDEX.md size OK ($LINES lines)"
fi
(( SIZE_ISSUES == 0 )) && ok "All topic files within size budget"
# ── Category sizes ──
header "Categories"
for dir in "$KNOWLEDGE_DIR/topics"/*/; do
[[ -d "$dir" ]] || continue
CAT=$(basename "$dir")
COUNT=$(find "$dir" -name "*.md" ! -name "_index.md" | wc -l | tr -d ' ')
if (( COUNT > 5 )); then
issue "$CAT/: $COUNT files (max 5 per category)"
else
ok "$CAT/: $COUNT files"
fi
done
# ── Staleness ──
header "Staleness (topics not updated in 30+ days)"
STALE=0
THIRTY_DAYS_AGO=$(date -d "30 days ago" +%Y-%m-%d 2>/dev/null || date -v-30d +%Y-%m-%d 2>/dev/null || echo "")
if [[ -n "$THIRTY_DAYS_AGO" ]]; then
STALE_LIST=$(find "$KNOWLEDGE_DIR/topics" -name "*.md" ! -name "_index.md" 2>/dev/null || true)
for f in $STALE_LIST; do
[[ -f "$f" ]] || continue
UPDATED=$(head -3 "$f" | grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' 2>/dev/null | head -1 || true)
if [[ -n "$UPDATED" && "$UPDATED" < "$THIRTY_DAYS_AGO" ]]; then
warn "$(basename "$f"): last updated $UPDATED"
STALE=$((STALE + 1))
fi
done
(( STALE == 0 )) && ok "No stale topics detected"
else
warn "Could not determine date 30 days ago — skipping staleness check"
fi
# ── Orphan detection ──
header "Orphan Detection"
if [[ -f "$KNOWLEDGE_DIR/topics/_index.md" ]]; then
ORPHANS=0
ORPHAN_LIST=$(find "$KNOWLEDGE_DIR/topics" -name "*.md" ! -name "_index.md" 2>/dev/null || true)
for f in $ORPHAN_LIST; do
[[ -f "$f" ]] || continue
BASENAME=$(basename "$f" .md)
if ! grep -q "$BASENAME" "$KNOWLEDGE_DIR/topics/_index.md" 2>/dev/null; then
CATDIR=$(dirname "$f")
if [[ -f "$CATDIR/_index.md" ]] && grep -q "$BASENAME" "$CATDIR/_index.md" 2>/dev/null; then
continue
fi
warn "Possible orphan: $f (not found in any _index.md)"
ORPHANS=$((ORPHANS + 1))
fi
done
(( ORPHANS == 0 )) && ok "No orphan topic files detected"
fi
# ── Summary ──
header "Summary"
echo -e " Issues: ${RED}$ISSUES${NC} | Warnings: ${YELLOW}$WARNINGS${NC}"
if (( ISSUES > 0 )); then
echo -e " ${RED}Action needed — run knowledge-rollup or knowledge-prune${NC}"
elif (( WARNINGS > 3 )); then
echo -e " ${YELLOW}Consider running knowledge-rollup${NC}"
else
echo -e " ${GREEN}Knowledge base is healthy${NC}"
fi
exit $ISSUES
#!/usr/bin/env bash
# SCRIPT: knowledge-init.sh
# PURPOSE: Scan codebase and generate a bootstrap plan — agent creates the files
# TIER: safe (read-only — all claude -p calls use read-only tools, output to stdout)
# USAGE: ./knowledge/scripts/knowledge-init.sh [--force]
# REQUIRES: bash, git, claude CLI
# TOPICS: knowledge-management
# OUTPUT: Project scan data + AI-generated plan to stdout — agent writes all files
set -euo pipefail
KNOWLEDGE_DIR="./knowledge"
FORCE=false
[[ "${1:-}" == "--force" ]] && FORCE=true
if [[ -f "$KNOWLEDGE_DIR/INDEX.md" && "$FORCE" == false ]]; then
echo "Error: $KNOWLEDGE_DIR/INDEX.md exists (already bootstrapped)."
echo " Use --force to rebuild"
echo " Or use knowledge-prune.sh to repair instead"
exit 1
fi
READ_TOOLS="Read,Glob,Grep,Bash(git*),Bash(find*),Bash(ls*),Bash(head*),Bash(wc*),Bash(cat*),Bash(command*)"
echo "╔══════════════════════════════════════╗"
echo "║ Knowledge Init — Project Scan ║"
echo "║ $(date +%Y-%m-%d) ║"
echo "╚══════════════════════════════════════╝"
# Create directory structure
mkdir -p "$KNOWLEDGE_DIR"/{topics,timeline,diagrams,scripts,plans}
# ── Section 1: Git history ──
echo ""
echo "═══ Git History ═══"
if command -v git &>/dev/null && [[ -d .git ]]; then
echo "--- Recent commits (last 50) ---"
git log --oneline -50 2>/dev/null || echo "(no git history)"
echo ""
echo "--- Active contributors (3 months) ---"
git shortlog -sn --since="3 months" 2>/dev/null || echo "(none)"
echo ""
echo "--- Recently active areas ---"
git log --oneline --since="1 month" --name-only --format="" 2>/dev/null | sort | uniq -c | sort -rn | head -20 || echo "(none)"
echo ""
echo "--- Files added recently (1 month) ---"
git log --since="1 month" --diff-filter=A --name-only --format="" 2>/dev/null | head -20 || echo "(none)"
else
echo "(not a git repository)"
fi
# ── Section 2: Project structure ──
echo ""
echo "═══ Project Structure ═══"
echo "--- Directory tree (2 levels) ---"
find . -maxdepth 2 -type d ! -path '*/node_modules/*' ! -path '*/.git/*' ! -path '*/vendor/*' ! -path '*/__pycache__/*' ! -path '*/.next/*' ! -path '*/dist/*' ! -path '*/.turbo/*' 2>/dev/null | sort
echo ""
echo "--- Root files ---"
ls -la 2>/dev/null | head -30
# ── Section 3: Project identity ──
echo ""
echo "═══ Project Identity ═══"
for f in README.md readme.md README.rst CLAUDE.md; do
if [[ -f "$f" ]]; then
echo "--- $f (first 60 lines) ---"
head -60 "$f"
echo ""
fi
done
for f in package.json Cargo.toml go.mod pyproject.toml setup.py pom.xml build.gradle composer.json Gemfile; do
if [[ -f "$f" ]]; then
echo "--- $f ---"
cat "$f"
echo ""
fi
done
# ── Section 4: CI/CD & Infrastructure ──
echo ""
echo "═══ CI/CD & Infrastructure ═══"
if [[ -d .github/workflows ]]; then
echo "--- GitHub Actions workflows ---"
for wf in .github/workflows/*.yml .github/workflows/*.yaml; do
[[ -f "$wf" ]] || continue
echo " $wf:"
head -10 "$wf"
echo ""
done
fi
for f in Dockerfile docker-compose.yml docker-compose.yaml; do
[[ -f "$f" ]] && echo "Found: $f"
done
for d in terraform cdk cloudformation pulumi serverless amplify; do
[[ -d "$d" ]] && echo "Found IaC: $d/"
done
# ── Section 5: Database ──
echo ""
echo "═══ Database ═══"
for d in migrations db/migrations prisma/migrations alembic/versions drizzle; do
if [[ -d "$d" ]]; then
echo "--- Migrations: $d/ ---"
ls "$d/" 2>/dev/null | tail -10
fi
done
for f in prisma/schema.prisma schema.sql drizzle.config.ts knexfile.js; do
[[ -f "$f" ]] && echo "Found schema: $f"
done
# ── Section 6: Available CLIs ──
echo ""
echo "═══ Available CLIs ═══"
for cmd in gh aws gcloud az sentry-cli vercel fly kubectl docker terraform pulumi; do
if command -v "$cmd" &>/dev/null; then
version=$("$cmd" --version 2>/dev/null | head -1 || echo "unknown version")
echo " ✓ $cmd — $version"
fi
done
# ── Section 7: Monitoring & Observability ──
echo ""
echo "═══ Monitoring & Observability ═══"
for tool in sentry datadog newrelic cloudwatch posthog segment mixpanel grafana prometheus; do
hits=$(grep -rl "$tool" --include="*.json" --include="*.yml" --include="*.yaml" --include="*.toml" --include="*.ts" --include="*.js" --include="*.py" --include="*.env.example" . 2>/dev/null | grep -v node_modules | grep -v .git | head -3 || true)
[[ -n "$hits" ]] && echo " $tool referenced in: $hits"
done
# ── Section 8: Source structure ──
echo ""
echo "═══ Source Structure ═══"
echo "--- Key source directories ---"
for d in src app lib pkg internal cmd api server client components pages routes services models controllers middleware utils helpers; do
if [[ -d "$d" ]]; then
count=$(find "$d" -type f ! -path '*/node_modules/*' 2>/dev/null | wc -l)
echo " $d/ — $count files"
fi
done
for d in src/*/; do
[[ -d "$d" ]] || continue
count=$(find "$d" -type f 2>/dev/null | wc -l)
echo " $d — $count files"
done
# ── Section 9: Conventions detection ──
echo ""
echo "═══ Conventions ═══"
for f in .eslintrc .eslintrc.js .eslintrc.json .eslintrc.yml eslint.config.js eslint.config.mjs eslint.config.ts; do
[[ -f "$f" ]] && { echo "--- ESLint: $f ---"; head -20 "$f"; echo ""; }
done
for f in .prettierrc .prettierrc.json .prettierrc.yml .prettierrc.js prettier.config.js; do
[[ -f "$f" ]] && { echo "--- Prettier: $f ---"; cat "$f"; echo ""; }
done
for f in tsconfig.json tsconfig.base.json; do
[[ -f "$f" ]] && { echo "--- TypeScript: $f ---"; cat "$f"; echo ""; }
done
for f in .editorconfig biome.json deno.json; do
[[ -f "$f" ]] && { echo "--- $f ---"; cat "$f"; echo ""; }
done
echo "--- Git commit message style (last 20) ---"
git log --oneline -20 2>/dev/null || echo "(no git)"
echo ""
echo "--- Branch naming ---"
git branch -r 2>/dev/null | head -15 || echo "(no remotes)"
# ── Section 10: AI-powered deep scan ──
echo ""
echo "═══ AI Analysis — Codebase Plan ═══"
if command -v claude &>/dev/null; then
claude -p "You are scanning a codebase to plan a knowledge base bootstrap. Output ONLY a JSON plan (no markdown fences, no preamble):
{
\"project_summary\": \"2-3 sentence description\",
\"tech_stack\": \"1 sentence\",
\"categories\": [
{\"slug\": \"name\", \"summary\": \"1-line\", \"topics\": [
{\"slug\": \"name\", \"summary\": \"1-line\", \"key_paths\": [\"relevant/paths\"]}
]}
],
\"diagrams\": [{\"name\": \"name\", \"description\": \"what it shows\"}],
\"critical_invariants\": [\"1-line each\"],
\"environment\": {
\"observability\": [{\"signal\": \"logs|errors|metrics\", \"tool\": \"name\", \"access\": \"cli command or url\"}],
\"cicd\": [{\"stage\": \"ci|deploy|preview\", \"tool\": \"name\", \"access\": \"command\"}],
\"infrastructure\": [{\"resource\": \"name\", \"provider\": \"name\", \"access\": \"command or endpoint\"}]
}
}
RULES:
- Max 8 categories, max 5 topics per category, total ≤40
- Categories are conceptual domains (auth, database, infrastructure), NOT file paths
- Only create topics for things needing context beyond reading code
- For environment: only include tools you can confirm from config files or CLI availability
- Mark uncertain environment entries with [?]
Respond with ONLY the JSON." \
--allowedTools "$READ_TOOLS" \
--max-turns 15 \
--output-format text 2>/dev/null || echo "(claude analysis failed — agent should plan manually from data above)"
# ── Section 11: Consensus on ENVIRONMENT.md claims ──
echo ""
echo "═══ AI Consensus — Environment Verification ═══"
# Two agents independently verify environment claims (use temp files for background capture)
ENV_TMP_A=$(mktemp)
ENV_TMP_B=$(mktemp)
trap "rm -f $ENV_TMP_A $ENV_TMP_B" EXIT
claude -p "Scan this codebase for operational tooling. List ONLY tools you can confirm exist from config files, package.json, CI workflows, or CLI availability. Format each as: TOOL | SIGNAL_TYPE | ACCESS_COMMAND. Do not guess — if uncertain, skip it." \
--allowedTools "$READ_TOOLS" \
--max-turns 10 \
--output-format text > "$ENV_TMP_A" 2>/dev/null &
PID_A=$!
claude -p "Scan this codebase for operational tooling. List ONLY tools you can confirm exist from config files, package.json, CI workflows, or CLI availability. Format each as: TOOL | SIGNAL_TYPE | ACCESS_COMMAND. Do not guess — if uncertain, skip it." \
--allowedTools "$READ_TOOLS" \
--max-turns 10 \
--output-format text > "$ENV_TMP_B" 2>/dev/null &
PID_B=$!
wait $PID_A $PID_B 2>/dev/null || true
echo "--- Agent A ---"
cat "$ENV_TMP_A"
echo ""
echo "--- Agent B ---"
cat "$ENV_TMP_B"
echo ""
echo "--- Consensus ---"
echo "(Caller agent: include entries BOTH agents agree on. Mark disagreements with [?].)"
else
echo "(claude CLI not available — agent should plan manually from data above)"
fi
# ── Done ──
echo ""
echo "═══ Scan Complete ═══"
echo "The caller agent should now use this data to create:"
echo " 1. ENVIRONMENT.md (from CLIs, monitoring, CI/CD + consensus section)"
echo " 2. CONVENTIONS.md (from linter/formatter configs + git style)"
echo " 3. SUMMARY.md (from project identity + AI plan)"
echo " 4. INDEX.md (trigger table based on topic categories)"
echo " 5. Topic files (from AI plan + source structure)"
echo " 6. Timeline entry for today's bootstrap"
name: knowledge-init
description: Bootstrap or rebuild the ./knowledge directory from scratch. Use when: (1) no ./knowledge directory exists yet; (2) the user says "initialize knowledge", "set up knowledge", or "bootstrap knowledge"; (3) the knowledge base is so degraded that repair (knowledge-prune) is insufficient; (4) migrating to a new knowledge structure. This is a heavy operation — it scans the entire codebase. Do not trigger during normal work. Dispatched automatically by the knowledge skill when INDEX.md is missing.

Knowledge Init — Bootstrap from Scratch

Run ./knowledge/scripts/knowledge-init.sh — it scans the project and outputs a bootstrap plan. The script never writes knowledge files; you read its output and create everything.

Prerequisites

  1. ./knowledge/INDEX.md already exists? Rebuild only with --force or use knowledge-prune instead.
  2. Monorepo vs single app? Adjust topic granularity in review after init.

Primary path: knowledge-init.sh

./knowledge/scripts/knowledge-init.sh
./knowledge/scripts/knowledge-init.sh --force   # rebuild; only after explicit user OK

What the script outputs (read-only):

  1. Git history — recent commits, contributors, active areas, new files
  2. Project structure — directory tree, root files, key source directories
  3. Project identity — README, package.json, etc.
  4. CI/CD & Infrastructure — workflows, Dockerfiles, IaC
  5. Database — migrations, schema files
  6. Available CLIs — with versions
  7. Monitoring — tool references in config files
  8. AI plan (if claude CLI available) — JSON with categories, topics, diagrams, invariants, environment
  9. Environment consensus (if claude CLI available) — two agents independently scan for operational tooling

You then create: ENVIRONMENT.md, SUMMARY.md, INDEX.md, topic files, diagrams, timeline entry — using the script's output as your source data.

Afterward: Run knowledge-health.sh to verify structure. Resolve [?] items with user.
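For example, a minimal sketch of the flow:

./knowledge/scripts/knowledge-init.sh > /tmp/init-scan.out
# read the scan, then create INDEX.md, SUMMARY.md, ENVIRONMENT.md, topics/, timeline/
./knowledge/scripts/knowledge-health.sh   # verify the structure you created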


Fallback: manual bootstrap (no CLI or scripts)

Step 0: Git & environment scan (do this FIRST)

Before reading any source code, extract project shape from git and config files:

# Project velocity and shape
git log --oneline -50                                    # recent history
git shortlog -sn --since="3 months"                      # active contributors
git log --oneline --since="1 month" --stat | head -80    # recent activity areas

# Detect operational tooling
ls .github/workflows/ 2>/dev/null                        # CI/CD
ls Dockerfile docker-compose.yml 2>/dev/null             # containerization
ls terraform/ cdk/ serverless.yml amplify/ 2>/dev/null   # IaC
grep -l "sentry" package.json pyproject.toml *.toml 2>/dev/null  # error tracking
grep -l "datadog\|newrelic\|cloudwatch" . -r --include="*.json" --include="*.yml" --include="*.toml" -l 2>/dev/null  # monitoring

# Detect CLIs available
for cmd in gh aws gcloud sentry-cli vercel fly; do
  command -v $cmd &>/dev/null && echo "CLI available: $cmd"
done

Use this to populate ENVIRONMENT.md and to understand which areas of the codebase are active vs stable.

Step 1: Scan the codebase

Read (don't modify) in order:

  • README.md, CLAUDE.md, package.json / Cargo.toml / go.mod (project shape)
  • Directory structure (2 levels deep)
  • CI/CD config (.github/workflows, Dockerfile, docker-compose)
  • Database: migrations folder, schema files, ORM config
  • Key source directories: look for natural domain boundaries

Step 2: Identify 4-8 topic categories

Think in decision domains, not file paths:

  • Good: auth, database, infrastructure, payments
  • Bad: src-utils, lib-folder, config-files

Each category gets a directory with _index.md + 1-3 topic files max. Hard limit: ≤8 categories, ≤40 topic files total.

Step 3: Create directory structure

./knowledge/
├── INDEX.md
├── SUMMARY.md
├── ENVIRONMENT.md          # operational context — CLIs, infra, monitoring
├── CONVENTIONS.md          # coding standards, patterns, git workflow
├── plans/                  # ephemeral — active plans only, delete when done
├── topics/
│   ├── _index.md
│   └── {category}/
│       ├── _index.md
│       └── {topic}.md
├── diagrams/
│   ├── _index.md
│   └── {name}.mermaid
├── scripts/
│   ├── _index.md
│   └── {name}.sh
└── timeline/
    ├── _index.md
    └── YYYY-MM-DD.md

Step 4: Create ENVIRONMENT.md

Populate from Step 0 findings. This file tells future agents where to look for operational context:

  • Observability: Logs, errors, metrics — tool name + CLI command or URL
  • CI/CD: Build system, deployment tool — with CLI commands
  • Infrastructure: Databases, caches, queues, storage — provider + access
  • Git Shortcuts: Pre-built git log commands for each topic area

See format in the knowledge skill. Max 40 lines. If a CLI isn't available, still document the tool name so the agent knows what to ask about.
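A minimal sketch of the shape (every tool, command, and path here is hypothetical; substitute what Step 0 actually found):

# ENVIRONMENT
Updated: YYYY-MM-DD

## Observability
- Errors: Sentry (sentry-cli issues list) [?]
- Logs: CloudWatch (aws logs tail /app/api --since 1h) [?]

## CI/CD
- CI: GitHub Actions (gh run list --limit 10)

## Infrastructure
- DB: Postgres on RDS (psql "$DATABASE_URL") [?]

## Git Shortcuts
- Recent auth changes → git log --oneline -10 -- src/auth/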

Step 4.5: Create CONVENTIONS.md

Infer conventions from the codebase — check linter configs, formatter configs, existing code patterns, git log message style. Only capture what linters/formatters don't enforce. Ask the user to verify.

See format in the knowledge skill. Max 40 lines.
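For example (a hypothetical sketch; capture only what linters can't enforce):

# CONVENTIONS
Updated: YYYY-MM-DD

- Commits: conventional style (feat:/fix:/chore:), imperative mood
- Branches: {user}/{short-topic} [?]
- Errors: throw domain errors in services; map to HTTP codes at the route layer
- Tests: colocated as {file}.test.ts; integration tests live under tests/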

Step 5: Create SUMMARY.md

~500 tokens. The "explain this project in 30 seconds" file:

  • What this project is and does
  • Tech stack and architecture (1 paragraph)
  • Key domains and how they interact (1 paragraph)
  • Critical constraints (financial safety, security invariants) (1 paragraph)
  • Current state (what works, what's in progress) (1 paragraph)

Max 40 lines. Never longer.

Step 6: Create INDEX.md

See format in the knowledge skill. Include:

  • Trigger table mapping task patterns → files to read
  • Topics list with 1-line descriptions
  • External docs table
  • Recent timeline (last 5)
  • Stale topics list

Max 60 lines.
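A compressed sketch (slugs and paths are hypothetical):

# INDEX
Updated: YYYY-MM-DD

## Triggers
auth, sessions, tokens → topics/auth/_index.md, ENVIRONMENT.md
schema, migrations     → topics/database/migrations.md
deploys, CI failures   → ENVIRONMENT.md, latest timeline/

## Topics
- auth/ (2): login flows, token rotation
- database/ (3): schema decisions, migration ordering

## External docs
- API reference → docs/api.md

## Recent timeline
- YYYY-MM-DD: bootstrap

## Stale topics
(none yet)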

Step 7: Create initial topic files

For each topic, capture what code and git cannot tell you: decisions, gotchas, context, constraints. Mark uncertainty with [?].

Key rule: reference git, don't restate. Use See: {sha} or git log --oneline -5 -- {path} instead of describing what code does.
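For instance, a hypothetical topics/auth/token-rotation.md:

# Token Rotation
Updated: YYYY-MM-DD

## Key Facts
- Refresh tokens rotate on every use; reuse revokes the family (see: git log --oneline -5 -- src/auth/)
- Clock-skew tolerance is 30s; lowering it once broke mobile clients [?]

## Decisions
- [YYYY-MM-DD] DECIDED: opaque refresh tokens over JWT — revocability | {sha}

## Gotchas
- E2E rotation tests flake unless AUTH_FAKE_CLOCK=1 is set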

Step 8: Seed git shortcuts in ENVIRONMENT.md

For each topic category, generate a useful git log command:

Recent {category} changes → git log --oneline -10 -- src/{path}/

Step 9: Create timeline entry for today

Log the bootstrap as the first timeline entry. Reference the init commit SHA when available.
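For example (values hypothetical):

# YYYY-MM-DD
> Bootstrapped knowledge base: 6 categories, 18 topics, ENVIRONMENT.md seeded.

## 14:30 — knowledge bootstrap
Initial scan via knowledge-init.sh. Commit: {sha}. Refs: topics/_index.md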

Step 10: Report to user

  • Categories/topics identified
  • ENVIRONMENT.md contents (ask user to verify CLI commands)
  • Any [?] items needing input
  • Invite corrections

Consensus protocol (parallel verification during init)

Init produces foundational claims that persist — wrong facts at bootstrap corrupt all future sessions.

The script handles this automatically for ENVIRONMENT.md: it runs two read-only claude -p agents that independently scan for operational tooling, and outputs both results. You (the caller agent) act as judge — include entries both agents agree on, mark disagreements with [?].

For architecture diagrams (if producing them manually without the script): use the Agent tool to spawn two parallel Explore subagents, each independently mapping component relationships. Keep nodes/edges present in both, mark [?] on disputed connections.

Not used for: Topic file content (low-stakes, will be pruned), timeline entries, INDEX.md structure (mechanical).

Anti-patterns

  • Don't create a topic file per source file — topics are conceptual domains
  • Don't exceed 40 topic files — consolidate
  • Don't write paragraphs in Key Facts — one line per item
  • Don't duplicate README content — link to it
  • Don't restate code — if git or grep can answer it, don't write it down
  • Don't skip ENVIRONMENT.md — it's the highest-value-per-token file after INDEX.md
#!/usr/bin/env bash
# SCRIPT: knowledge-insights.sh
# PURPOSE: Generate targeted codebase analysis — agent decides what to persist
# TIER: safe (read-only — all claude -p calls use read-only tools, output to stdout)
# USAGE: ./knowledge/scripts/knowledge-insights.sh <analysis-type>
# REQUIRES: bash, claude CLI (optional for some types), git
# TOPICS: knowledge-management
# OUTPUT: Analysis report to stdout — agent writes to knowledge if requested
set -euo pipefail
KNOWLEDGE_DIR="./knowledge"
usage() {
cat <<EOF
Usage: $(basename "$0") <analysis-type>
Analysis types:
dependencies — Dependency graph, unused/outdated/risky deps
dead-code — Dead code, unused exports, unreachable paths
complexity — Most complex files/functions, refactor candidates
security — Quick security surface scan (NOT a full audit)
api-surface — Map all API endpoints, auth, validation
db-schema — Schema analysis, missing indexes, N+1 risks
test-coverage — Untested critical paths, test priorities
architecture — Architecture diagrams (Mermaid)
debt — TODOs, hacks, skipped tests, tech debt
onboarding — New developer guide
git-history — Hotspots, churn, velocity (no AI needed)
operations — Cloud resources, CI/CD, monitoring, env vars
ci-health — Recent CI runs, failure patterns, flaky tests
All output goes to stdout. The caller agent decides what to save.
Examples:
$(basename "$0") security
$(basename "$0") git-history
$(basename "$0") architecture
EOF
exit 1
}
ANALYSIS="${1:-}"
[[ -z "$ANALYSIS" ]] && usage
READ_TOOLS="Read,Glob,Grep,Bash(git*),Bash(find*),Bash(grep*),Bash(cat*),Bash(head*),Bash(ls*),Bash(wc*),Bash(tail*),Bash(sort*),Bash(uniq*),Bash(command*)"
run_analysis() {
local prompt="$1"
if command -v claude &>/dev/null; then
claude -p "$prompt" \
--allowedTools "$READ_TOOLS" \
--max-turns 30 \
--output-format text 2>/dev/null || echo "(claude analysis failed)"
else
echo "(claude CLI not available — analysis requires it for: $ANALYSIS)"
fi
}
# For security + architecture: dual-agent consensus
run_consensus() {
local prompt="$1"
local type="$2"
if ! command -v claude &>/dev/null; then
echo "(claude CLI not available — cannot run consensus analysis)"
return
fi
echo "Running dual-agent consensus analysis for $type..."
echo ""
# Two agents independently analyze (use temp files for background capture)
local tmp_a tmp_b
tmp_a=$(mktemp)
tmp_b=$(mktemp)
trap "rm -f $tmp_a $tmp_b" EXIT
claude -p "$prompt" \
--allowedTools "$READ_TOOLS" \
--max-turns 30 \
--output-format text > "$tmp_a" 2>/dev/null &
PID_A=$!
claude -p "$prompt" \
--allowedTools "$READ_TOOLS" \
--max-turns 30 \
--output-format text > "$tmp_b" 2>/dev/null &
PID_B=$!
wait $PID_A $PID_B 2>/dev/null || true
echo "--- Agent A ---"
cat "$tmp_a"
echo ""
echo "--- Agent B ---"
cat "$tmp_b"
echo ""
echo "--- Consensus Rules for Caller Agent ---"
if [[ "$type" == "security" ]]; then
echo " BOTH found it → high confidence, include as-is"
echo " ONE found it → include with [single-agent, verify] tag"
echo " DISAGREE (vulnerable vs safe) → include as [disputed, verify]"
echo " Never drop a finding — false negatives are worse than false positives"
else
echo " BOTH agree on relationship → include as-is"
echo " ONE agent shows it → include with [?]"
echo " DISAGREE on direction/relationship → mark [disputed]"
fi
}
case "$ANALYSIS" in
dependencies)
run_analysis "Analyze this project's dependencies.
1. Read package.json / lock files / go.mod / Cargo.toml across all packages
2. Identify: unused dependencies (imported nowhere), outdated major versions, packages with known security advisories, duplicate packages across workspaces, heaviest dependencies by install size
3. Output a prioritized list of actions
Be specific — name the packages and where they're used (or not used).
$(command -v gh &>/dev/null && echo "Also check: gh api repos/{owner}/{repo}/dependabot/alerts 2>/dev/null for known vulnerabilities")"
;;
dead-code)
run_analysis "Find dead code in this project.
1. Scan for exported functions/types never imported elsewhere
2. Find files never imported
3. Look for commented-out code blocks (>5 lines)
4. Find unreachable code paths (early returns, impossible conditions)
5. Check for unused route handlers, database queries, orphan components
Focus on the most impactful findings. Skip test files and node_modules."
;;
complexity)
run_analysis "Find the most complex parts of this codebase.
1. Files with the most lines of code (top 10)
2. Functions/methods with deep nesting (>3 levels) or many branches
3. Modules with the most dependencies (imports)
4. Files imported by the most other files (high fan-in = high risk)
5. Suggest which files would benefit most from refactoring and why
Be specific — give file paths and function names."
;;
security)
run_consensus "Quick security surface scan of this codebase. NOT a full audit — flag obvious risks.
Check for:
1. Hardcoded secrets, API keys, private keys (grep for common patterns)
2. SQL injection risks (string concatenation in queries)
3. Missing input validation on API endpoints
4. Overly permissive CORS or auth configurations
5. Unsafe deserialization or eval usage
6. Missing rate limiting on sensitive endpoints
7. Error messages that leak internal details
$(command -v gh &>/dev/null && echo "8. Check gh api repos/{owner}/{repo}/dependabot/alerts for known vulnerabilities")
$(command -v sentry-cli &>/dev/null && echo "9. Check sentry-cli issues list for recent error patterns")
Flag severity (CRITICAL/HIGH/MEDIUM) for each finding." \
"security"
;;
api-surface)
run_analysis "Map the complete API surface of this project.
For each endpoint:
- Method + path
- Auth required? (which middleware/guard)
- Input validation (what schema/validation library)
- Response shape (key fields)
- Rate limited?
Output as a table. Also note any endpoints that lack auth or validation."
;;
db-schema)
run_analysis "Analyze the database schema and data access patterns.
1. Read migrations/schema files for current schema
2. Check for: missing indexes on foreign keys, missing indexes on WHERE columns, tables without timestamps, potential N+1 patterns, missing cascade deletes, columns that should be NOT NULL
3. Map which parts of the code access which tables
Output a prioritized list of schema improvements."
;;
test-coverage)
run_analysis "Analyze test coverage gaps.
1. Find all test files and what they test
2. Find critical paths LACKING tests: financial calculations, auth flows, payment processing, data mutations, API endpoints
3. Rank untested paths by risk (financial > auth > data integrity > UI)
4. Suggest specific test cases that would add the most value
$(command -v gh &>/dev/null && echo "5. Check gh run list --limit 10 for recent CI pass/fail rates")
Focus on what's missing, not what's there."
;;
architecture)
run_consensus "Generate architecture diagrams for this codebase using Mermaid syntax.
1. Read the project structure and key source files
2. Generate:
- System overview: high-level component relationships
- Data flow: how requests move through the system
- Package dependency graph: which packages depend on which
3. Use 'graph TD' for hierarchies, 'graph LR' for flows, 'sequenceDiagram' for request flows
4. Label ALL edges with the relationship type (REST, gRPC, import, pub/sub, etc.)
5. Each diagram: first 2 lines are %% comments with description and related topics
Output Mermaid source only." \
"architecture"
;;
debt)
run_analysis "Find tech debt in this codebase.
Search for:
1. TODO/FIXME/HACK/TEMP/WORKAROUND comments — list each with file path and context
2. Skipped tests (@skip, .skip, xit, xdescribe)
3. Disabled linting rules (eslint-disable, @ts-ignore, @ts-expect-error)
4. Hardcoded values that should be config (magic numbers, hardcoded URLs)
5. Deprecated API usage
6. Copy-pasted code (similar blocks in different files)
Prioritize by impact: correctness > performance > maintainability."
;;
onboarding)
run_analysis "Generate a new developer onboarding guide for this codebase.
Cover:
1. What is this project? (2-3 sentences)
2. Tech stack and architecture (1 paragraph + key packages)
3. How to set up the dev environment (step by step)
4. Where to find things: key directories and what they contain
5. How to run tests
6. Common development tasks and workflows
7. Key concepts a new developer needs to understand
8. Critical rules: things that will break if violated
9. Useful commands and shortcuts
Write it as a practical guide. Target: competent developer, new to this project. Max 200 lines."
;;
git-history)
# Pure git — no AI needed for data gathering
echo "═══ Git History Analysis ═══"
echo ""
echo "--- Hotspot files (most frequently changed, 3 months) ---"
git log --since="3 months" --name-only --format="" 2>/dev/null | sort | uniq -c | sort -rn | head -20 || echo "(none)"
echo ""
echo "--- Churn by directory ---"
git log --since="3 months" --name-only --format="" 2>/dev/null | xargs -I{} dirname {} 2>/dev/null | sort | uniq -c | sort -rn | head -15 || echo "(none)"
echo ""
echo "--- Recent major changes (10+ files in one commit) ---"
git log --since="6 months" --oneline --shortstat 2>/dev/null | grep -E '[0-9]+ files? changed' | awk -F, '{gsub(/[^0-9]/,"",$1); if ($1 >= 10) print}' | head -10 || echo "(none)"
echo ""
echo "--- Contributors by area (3 months) ---"
for d in src app lib pkg; do
[[ -d "$d" ]] && { echo " $d/:"; git shortlog -sn --since="3 months" -- "$d/" 2>/dev/null | head -5; echo ""; }
done
echo "--- Velocity (commits per week, last 8 weeks) ---"
git log --since="8 weeks" --format="%aI" 2>/dev/null | cut -c1-10 | sort | uniq -c || echo "(none)"
# Optionally summarize with AI
if command -v claude &>/dev/null; then
echo ""
echo "--- AI Summary ---"
HIST_DATA=$(git log --since="3 months" --name-only --format="" 2>/dev/null | sort | uniq -c | sort -rn | head -30)
claude -p "Summarize these git hotspots. Identify: areas with most activity, potential risk areas (high churn), areas that seem stable. Be brief.
$HIST_DATA" \
--max-turns 3 \
--output-format text 2>/dev/null || true
fi
;;
operations)
echo "═══ Operations Analysis ═══"
echo ""
echo "--- CI/CD ---"
[[ -d .github/workflows ]] && { echo "GitHub Actions:"; ls .github/workflows/ 2>/dev/null; echo ""; }
[[ -f Jenkinsfile ]] && echo "Found: Jenkinsfile"
[[ -f .circleci/config.yml ]] && echo "Found: CircleCI"
[[ -f .gitlab-ci.yml ]] && echo "Found: GitLab CI"
echo ""
echo "--- Containers ---"
for f in Dockerfile docker-compose.yml docker-compose.yaml; do
[[ -f "$f" ]] && echo "Found: $f"
done
echo ""
echo "--- Infrastructure as Code ---"
for d in terraform cdk cloudformation pulumi; do
[[ -d "$d" ]] && { echo "Found: $d/"; ls "$d/" 2>/dev/null | head -10; echo ""; }
done
for f in serverless.yml serverless.yaml; do
[[ -f "$f" ]] && echo "Found: $f"
done
echo ""
echo "--- Environment variables ---"
for f in .env.example .env.template .env.sample; do
[[ -f "$f" ]] && { echo "--- $f ---"; cat "$f"; echo ""; }
done
echo ""
echo "--- Available CLIs ---"
for cmd in gh aws gcloud az sentry-cli vercel fly kubectl docker terraform pulumi; do
command -v "$cmd" &>/dev/null && echo " ✓ $cmd"
done
echo ""
echo "--- Monitoring references ---"
for tool in sentry datadog newrelic cloudwatch posthog grafana prometheus; do
hits=$(grep -rl "$tool" --include="*.json" --include="*.yml" --include="*.yaml" --include="*.toml" --include="*.ts" --include="*.js" --include="*.py" . 2>/dev/null | grep -v node_modules | grep -v .git | head -3 || true)
[[ -n "$hits" ]] && echo " $tool: $hits"
done
if command -v claude &>/dev/null; then
echo ""
echo "--- AI Summary ---"
run_analysis "Based on the project files, summarize the operational setup: how is this deployed, what monitoring exists, what CI/CD pipeline runs. Be brief — just the facts."
fi
;;
ci-health)
echo "═══ CI Health Analysis ═══"
echo ""
if command -v gh &>/dev/null; then
echo "--- Recent workflow runs (last 20) ---"
gh run list --limit 20 2>/dev/null || echo "(failed to list runs)"
echo ""
echo "--- Failed runs ---"
FAILED=$(gh run list --limit 20 --status failure --json databaseId,displayTitle,conclusion,createdAt 2>/dev/null || echo "")
if [[ -n "$FAILED" && "$FAILED" != "[]" ]]; then
echo "$FAILED"
echo ""
# Get failure logs for most recent failure
FAIL_ID=$(echo "$FAILED" | grep -oE '"databaseId":[0-9]+' | head -1 | grep -oE '[0-9]+')
if [[ -n "$FAIL_ID" ]]; then
echo "--- Failure log (run $FAIL_ID) ---"
gh run view "$FAIL_ID" --log-failed 2>/dev/null | tail -50 || echo "(could not fetch logs)"
fi
else
echo "No recent failures"
fi
else
echo "(gh CLI not available — checking workflow files only)"
[[ -d .github/workflows ]] && ls .github/workflows/ 2>/dev/null
fi
;;
*)
echo "Unknown analysis type: $ANALYSIS"
usage
;;
esac
echo ""
echo "═══ Analysis Complete ═══"
echo "The caller agent decides what to persist to ./knowledge."
name: knowledge-insights
description: Run targeted codebase analysis via the knowledge-insights shell script (Claude CLI). Use when: (1) the user wants dependency, security, dead-code, API surface, or tech-debt analysis; (2) onboarding documentation or architecture diagrams should be saved into ./knowledge; (3) ad-hoc "scan the repo for X" with optional persistence to the knowledge base; (4) git history analysis (hotspots, churn, contributor mapping); (5) operational health checks (CI status, recent errors, deploy state). Requires the claude CLI and, to persist results, write access to ./knowledge. Dispatched by the knowledge skill when deep analysis is requested.

Knowledge Insights — Targeted Analysis

Run from the project root:

./knowledge/scripts/knowledge-insights.sh <analysis-type>

Analysis types

Type           Focus
dependencies   Dependency graph, unused/outdated/risky deps, duplicates, weight
dead-code      Unused exports, unimported files, orphan routes, commented blocks
complexity     Large files, deep nesting, high fan-in, refactor candidates
security       Surface scan (secrets, SQLi, validation, CORS, rate limits) + gh dependabot if available
api-surface    Endpoints, auth, validation, response shapes
db-schema      Schema/migrations, indexes, N+1 risks, migration gaps
test-coverage  Gaps in critical paths, prioritized test ideas + latest CI results via gh run list
architecture   Mermaid diagrams: system overview, data flow, package graph
debt           TODO/FIXME, skipped tests, lint suppressions, magic values
onboarding     New-developer guide (setup, layout, tests, workflows)
git-history    Hotspot files, churn by directory, major refactors, contributor map, velocity trends
operations     Cloud resources, deployment config, env vars, secrets management, CI pipeline structure
ci-health      Recent CI runs, failure patterns, flaky tests, build times

Git-history analysis (no API cost)

This analysis uses only git commands — no AI calls needed for data gathering:

# Hotspot files (most frequently changed)
git log --since="3 months" --name-only --format="" | sort | uniq -c | sort -rn | head -20

# Churn by directory
git log --since="3 months" --name-only --format="" | xargs -I{} dirname {} | sort | uniq -c | sort -rn | head -15

# Recent major refactors (commits touching 10+ files)
git log --since="6 months" --oneline --shortstat | awk '/files? changed/ && $1 >= 10'

# Contributor map by area
git shortlog -sn --since="3 months" -- src/

# Velocity (commits per week, last 8 weeks)
git log --since="8 weeks" --format="%aI" | cut -c1-10 | uniq -c

The AI step summarizes findings and identifies patterns.

Operations analysis

Scans for and reports on:

  • CI/CD: .github/workflows/, Jenkinsfile, Dockerfile, docker-compose.yml
  • IaC: terraform/, cdk/, serverless.yml, cloudformation/, pulumi/
  • Monitoring: Sentry, Datadog, New Relic, CloudWatch config
  • Available CLIs and their capabilities
  • Environment variable patterns (.env.example, config files)

When persisting: update ENVIRONMENT.md with the discovered tooling (the script itself only reports).

CI-health analysis

Pulls live data when gh CLI is available:

gh run list --limit 20                          # recent runs
gh run view {id} --log-failed                   # failure details
gh api repos/{owner}/{repo}/actions/runs --jq '.workflow_runs[] | select(.conclusion=="failure")' | head -20

Identifies: flaky tests, slow builds, frequently failing workflows.

Script output → agent writes

The script always outputs to stdout — it never writes to knowledge files. You (the caller agent) decide what to persist:

Analysis type   Where to save
architecture    ./knowledge/diagrams/*.mermaid + diagrams/_index.md
onboarding      ./knowledge/ONBOARDING.md
operations      ./knowledge/ENVIRONMENT.md
git-history     Relevant topic files + timeline entry
All others      Timeline entry + update relevant topics
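For example, persisting an architecture run (paths hypothetical):

./knowledge/scripts/knowledge-insights.sh architecture > /tmp/arch.out
# review /tmp/arch.out, merge the two agents' diagrams, then write:
#   ./knowledge/diagrams/system-overview.mermaid
#   plus a one-line entry in ./knowledge/diagrams/_index.md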

External enrichment

When CLIs exist, the script automatically pulls live data:

Analysis       CLI          Enrichment
security       gh           gh api repos/{o}/{r}/dependabot/alerts — known vulnerabilities
security       sentry-cli   Recent error patterns
test-coverage  gh           gh run list — latest CI pass/fail
ci-health      gh           gh run view {id} --log-failed — failure logs
operations     aws/gcloud   Live resource state
dependencies   gh           Dependabot alerts for outdated deps

Graceful fallback to static analysis if CLI unavailable.

Consensus protocol (parallel verification)

The script automatically runs dual-agent consensus for security and architecture analyses — two read-only claude -p agents independently analyze, then output both results side-by-side. You act as judge.

Merge rules:

  • Both agents found it → high confidence, include as-is
  • Only one found it → include with [single-agent, verify] tag (false negatives are worse than false positives)
  • Agents contradict → include as [disputed, verify]
  • When saving: only high-confidence findings go to topics. Tagged findings go to timeline for user review.
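When persisting, a tagged finding might look like (hypothetical):

- [single-agent, verify] /api/export has no rate limiting (flagged by Agent A only)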

Not used for: dead-code, debt, complexity, git-history, onboarding — low stakes, easily verified, or mechanical.

Relationship to other skills

  • knowledge: Day-to-day read/write; insights are optional deep dives.
  • knowledge-prune: Fixes drift in existing knowledge; insights produce new findings.
  • knowledge-rollup: Compression; not the same as running an insight pass.
#!/usr/bin/env bash
# Hook: knowledge-on-commit.sh
# Event: PostToolUse (Bash) with if: "Bash(git commit*)"
# Purpose: Add a timeline entry when Claude makes a git commit
# Location: ~/.claude/hooks/knowledge-on-commit.sh
# Config: async: true, if: "Bash(git commit*)"
set -euo pipefail
INPUT=$(cat)
CWD=$(echo "$INPUT" | jq -r '.cwd // empty')
# Only if knowledge dir exists
[[ -z "$CWD" ]] && exit 0
[[ -d "$CWD/knowledge" ]] || exit 0
# Get commit info from the repo
COMMIT_SHA=$(cd "$CWD" && git rev-parse --short HEAD 2>/dev/null || echo "unknown")
COMMIT_MSG=$(cd "$CWD" && git log -1 --pretty=%s 2>/dev/null || echo "unknown")
FILES_CHANGED=$(cd "$CWD" && git diff-tree --no-commit-id --name-only -r HEAD 2>/dev/null | head -10 || echo "")
# Skip trivial commits
if echo "$COMMIT_MSG" | grep -qiE '^(fix typo|fmt|format|lint|merge|bump|chore|wip)'; then
exit 0
fi
cd "$CWD"
TIMELINE_FILE="./knowledge/timeline/$(date +%Y-%m-%d).md"
claude -p "Add a brief timeline entry to $TIMELINE_FILE (create if needed).
Commit: $COMMIT_SHA — $COMMIT_MSG
Files: $FILES_CHANGED
Format:
## $(date +%H:%M) — $COMMIT_MSG
Commit $COMMIT_SHA. Files: $(echo "$FILES_CHANGED" | head -5 | tr '\n' ', ')
Refs: topics/{relevant-slug}
Read ./knowledge/INDEX.md first to find the right topic slug. Update the > summary line. That's it — max 3 tool calls." \
--allowedTools "Read,Write" \
--max-turns 3 \
> /dev/null 2>&1 || true
exit 0
#!/usr/bin/env bash
# Hook: knowledge-on-stop.sh
# Event: Stop
# Purpose: After a meaningful session, spawn a sub-agent to update knowledge
# Location: ~/.claude/hooks/knowledge-on-stop.sh
# Config: async: true (runs in background, doesn't block Claude)
set -euo pipefail
# Read Stop event JSON from stdin
INPUT=$(cat)
CWD=$(echo "$INPUT" | jq -r '.cwd // empty')
STOP_HOOK_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active // false')
LAST_MSG=$(echo "$INPUT" | jq -r '.last_assistant_message // empty')
TRANSCRIPT=$(echo "$INPUT" | jq -r '.transcript_path // empty')
# Don't run if CWD is empty or knowledge dir doesn't exist
[[ -z "$CWD" ]] && exit 0
[[ -d "$CWD/knowledge" ]] || exit 0
# Prevent infinite loops — if stop hook already triggered this turn, skip
[[ "$STOP_HOOK_ACTIVE" == "true" ]] && exit 0
# Skip trivial sessions: count file-modifying tool uses in transcript
if [[ -n "$TRANSCRIPT" && -f "$TRANSCRIPT" ]]; then
WRITE_COUNT=$(grep -cE '"tool_name"[[:space:]]*:[[:space:]]*"(Write|Edit|MultiEdit)"' "$TRANSCRIPT" 2>/dev/null || true)  # grep -c prints 0 even on no match; || true guards set -e
# Only update knowledge if files were actually modified
(( WRITE_COUNT < 1 )) && exit 0
fi
# Truncate last message to keep prompt small
CONTEXT="${LAST_MSG:0:2000}"
cd "$CWD"
claude -p "You are a knowledge maintenance agent. A Claude Code session just finished.
Claude's final message (truncated):
$CONTEXT
Your ONLY job: check if anything should be captured in ./knowledge/.
1. Read ./knowledge/INDEX.md
2. If code changes were made: add a 2-3 line timeline entry to ./knowledge/timeline/$(date +%Y-%m-%d).md (create if needed)
3. If a bug was fixed: add to relevant topic ## Postmortems
4. If a decision was made: add to relevant topic ## Decisions with DECIDED: on line 1
5. If a gotcha was discovered: add to relevant topic ## Gotchas
6. Update the day's > summary line
7. If nothing worth capturing happened, do nothing
Be extremely brief. Max 5 tool calls." \
--allowedTools "Read,Write" \
--max-turns 5 \
> /dev/null 2>&1 || true
exit 0
#!/usr/bin/env bash
# SCRIPT: knowledge-prune.sh
# PURPOSE: Detect drift, staleness, and contradictions — agent applies fixes
# TIER: safe (read-only — all claude -p calls use read-only tools, output to stdout)
# USAGE: ./knowledge/scripts/knowledge-prune.sh [./knowledge]
# REQUIRES: bash, git, claude CLI (optional — falls back to data dump)
# TOPICS: knowledge-management
# OUTPUT: Structured report with drift findings + consensus results — agent writes fixes
set -euo pipefail
KNOWLEDGE_DIR="${1:-./knowledge}"
if [[ ! -d "$KNOWLEDGE_DIR" ]]; then
echo "Error: $KNOWLEDGE_DIR not found. Run from project root."
exit 1
fi
READ_TOOLS="Read,Glob,Grep,Bash(git*),Bash(find*),Bash(ls*),Bash(head*),Bash(wc*),Bash(cat*),Bash(grep*),Bash(command*)"
echo "╔══════════════════════════════════════╗"
echo "║ Knowledge Prune — Scan ║"
echo "║ $(date +%Y-%m-%d) ║"
echo "╚══════════════════════════════════════╝"
# ── Phase 0: Structural health ──
echo ""
echo "═══ Phase 0: Structural Health ═══"
if [[ -f "$KNOWLEDGE_DIR/scripts/knowledge-health.sh" ]]; then
bash "$KNOWLEDGE_DIR/scripts/knowledge-health.sh" "$KNOWLEDGE_DIR" || true
else
echo "(health script not found — skipping structural check)"
fi
# ── Phase 1: Git-based staleness pre-filter ──
echo ""
echo "═══ Phase 1: Staleness Pre-filter ═══"
echo "Comparing topic Updated: dates against git history of referenced paths..."
echo ""
STALE_TOPICS=""
STALE_COUNT=0
TOPIC_LIST=$(find "$KNOWLEDGE_DIR/topics" -name "*.md" ! -name "_index.md" 2>/dev/null | sort || true)
for topic in $TOPIC_LIST; do
[[ -f "$topic" ]] || continue
updated=$(head -3 "$topic" | grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' 2>/dev/null | head -1 || true)
[[ -z "$updated" ]] && { echo " ⚠ $(basename "$topic"): no Updated: date found"; continue; }
paths=$(grep -oE '`[^`]+\.(ts|tsx|js|jsx|py|go|rs|rb|java|sql|sh)`' "$topic" 2>/dev/null | tr -d '`' | sort -u || true)
[[ -z "$paths" ]] && continue
changes=""
while IFS= read -r ref_path; do
[[ -z "$ref_path" ]] && continue
if [[ -f "$ref_path" ]]; then
recent=$(git log --oneline --since="$updated" -- "$ref_path" 2>/dev/null | head -3)
[[ -n "$recent" ]] && changes="${changes}\n ${ref_path}:\n${recent}"
elif [[ -n "$(git log --oneline -1 --diff-filter=D -- "$ref_path" 2>/dev/null)" ]]; then  # git log exits 0 even with no output, so test the output
changes="${changes}\n ${ref_path}: FILE DELETED"
fi
done <<< "$paths"
if [[ -n "$changes" ]]; then
echo " STALE: $(basename "$topic") (updated: $updated)"
echo -e "$changes"
echo ""
STALE_TOPICS="${STALE_TOPICS} ${topic}"
STALE_COUNT=$((STALE_COUNT + 1))
fi
done
echo "Stale topics found: $STALE_COUNT"
# ── Phase 2: AI drift detection with consensus (only for stale topics) ──
echo ""
echo "═══ Phase 2: Drift Detection (Consensus) ═══"
if [[ $STALE_COUNT -gt 0 ]] && command -v claude &>/dev/null; then
STALE_LIST=$(echo "$STALE_TOPICS" | tr ' ' '\n' | grep -v '^$' | tr '\n' ',' | sed 's/,$//')
echo "Running dual-agent drift analysis on stale topics..."
echo ""
# Agent A & B: independent drift analysis (use temp files for background capture)
DRIFT_TMP_A=$(mktemp)
DRIFT_TMP_B=$(mktemp)
trap "rm -f $DRIFT_TMP_A $DRIFT_TMP_B" EXIT
claude -p "You are a knowledge base auditor. Check ONLY these topic files for drift against the actual codebase:
${STALE_LIST}
For each topic:
1. Read the topic file
2. For every verifiable claim (file paths, config values, versions, APIs, schema), check the actual code
3. List each claim and whether it's ACCURATE, STALE, or WRONG
Format:
TOPIC: {filename}
- CLAIM: {what topic says} → STATUS: {ACCURATE|STALE|WRONG} → ACTUAL: {what code shows}
Be precise. Only flag things you can verify from the code." \
--allowedTools "$READ_TOOLS" \
--max-turns 25 \
--output-format text > "$DRIFT_TMP_A" 2>/dev/null &
PID_A=$!
claude -p "You are a knowledge base auditor. Check ONLY these topic files for drift against the actual codebase:
${STALE_LIST}
For each topic:
1. Read the topic file
2. For every verifiable claim (file paths, config values, versions, APIs, schema), check the actual code
3. List each claim and whether it's ACCURATE, STALE, or WRONG
Format:
TOPIC: {filename}
- CLAIM: {what topic says} → STATUS: {ACCURATE|STALE|WRONG} → ACTUAL: {what code shows}
Be precise. Only flag things you can verify from the code." \
--allowedTools "$READ_TOOLS" \
--max-turns 25 \
--output-format text > "$DRIFT_TMP_B" 2>/dev/null &
PID_B=$!
wait $PID_A $PID_B 2>/dev/null || true
echo "--- Agent A Findings ---"
cat "$DRIFT_TMP_A"
echo ""
echo "--- Agent B Findings ---"
cat "$DRIFT_TMP_B"
echo ""
echo "--- Consensus Rules for Caller Agent ---"
echo " BOTH say WRONG → apply correction with [was: old_value]"
echo " BOTH say ACCURATE → no action"
echo " DISAGREE → mark [?] in topic, do not change the value"
echo " ONE says WRONG, other silent → add [?] verify: {concern}"
else
if [[ $STALE_COUNT -gt 0 ]]; then
echo "(claude CLI not available — dumping topic content for manual review)"
echo ""
for topic in $STALE_TOPICS; do
[[ -f "$topic" ]] || continue
echo "--- $(basename "$topic") ---"
cat "$topic"
echo ""
done
else
echo "No stale topics — skipping drift detection"
fi
fi
# ── Phase 3: Postmortem gaps ──
echo ""
echo "═══ Phase 3: Postmortem Gaps ═══"
if command -v git &>/dev/null && [[ -d .git ]]; then
echo "Recent bug-fix commits (30 days):"
git log --oneline --since="30 days" --grep="fix\|bug\|hotfix\|revert\|patch" -i 2>/dev/null || echo "(none)"
echo ""
echo "Existing postmortem entries:"
grep -rn "## Postmortems" -A 5 "$KNOWLEDGE_DIR/topics/" 2>/dev/null || echo "(none found)"
else
echo "(no git — skipping)"
fi
# ── Phase 4: Decision format check ──
echo ""
echo "═══ Phase 4: Decision Format ═══"
echo "All ## Decisions entries:"
grep -rn "DECIDED\|## Decisions" -A 3 "$KNOWLEDGE_DIR/topics/" 2>/dev/null || echo "(none)"
echo ""
echo "Decisions in git commit messages (for cross-reference):"
git log --oneline --all --grep="DECIDED\|decision\|chose\|migrate" -i --since="90 days" 2>/dev/null | head -20 || echo "(none)"
# ── Phase 5: ENVIRONMENT.md check ──
echo ""
echo "═══ Phase 5: ENVIRONMENT.md ═══"
if [[ -f "$KNOWLEDGE_DIR/ENVIRONMENT.md" ]]; then
echo "--- Current ENVIRONMENT.md ---"
cat "$KNOWLEDGE_DIR/ENVIRONMENT.md"
echo ""
echo "CLI availability check:"
grep -oE '`(aws|gcloud|gh|sentry-cli|vercel|fly|kubectl|docker|terraform)[^`]*`' "$KNOWLEDGE_DIR/ENVIRONMENT.md" 2>/dev/null | tr -d '`' | while read -r cmd; do
base=$(echo "$cmd" | awk '{print $1}')
if command -v "$base" &>/dev/null; then
echo " ✓ $base available"
else
echo " ✗ $base NOT FOUND"
fi
done
else
echo "⚠ ENVIRONMENT.md missing — agent should create it"
fi
# ── Done ──
echo ""
echo "═══ Scan Complete ═══"
echo "The caller agent should now:"
echo " 1. Apply consensus-agreed drift corrections (Phase 2)"
echo " 2. Add missing postmortems for non-trivial bug fixes (Phase 3)"
echo " 3. Fix decision format issues (Phase 4)"
echo " 4. Update ENVIRONMENT.md if needed (Phase 5)"
echo " 5. Review changes with: git diff $KNOWLEDGE_DIR"
name: knowledge-prune
description: Detect and fix problems in the ./knowledge directory: contradictions, stale facts, drift from codebase, broken references, missing postmortems, undocumented decisions. Use when: (1) the user says "prune/fix/repair knowledge" or "check knowledge health"; (2) you notice a topic file contradicts actual code; (3) you suspect knowledge has drifted from reality; (4) before a major refactor (to ensure you're working from accurate context). Diagnostic + repair operation. For bulk compression, use knowledge-rollup instead. Dispatched by the knowledge skill when drift is detected.

Knowledge Prune — Detect & Fix Problems

Run ./knowledge/scripts/knowledge-prune.sh — it scans for issues and outputs a report. The script never modifies files; you read its output and apply fixes. Review your changes with git diff ./knowledge.

For bulk file reduction, use knowledge-rollup instead.

Primary workflow

./knowledge/scripts/knowledge-prune.sh              # scan and report (read-only)
./knowledge/scripts/knowledge-prune.sh ./knowledge   # explicit knowledge dir

The script outputs: structural health, staleness pre-filter, drift analysis (with dual-agent consensus if claude CLI is available), postmortem gaps, decision format issues, ENVIRONMENT.md verification. You apply the fixes.

Phases

Phase 0: Git-based staleness pre-filter (cheap — do this first)

Before expensive drift detection, use git to identify which topics might have drifted:

# For each topic, check if related code changed after the topic's Updated: date
shopt -s globstar nullglob   # enable ** recursion; loop runs zero times if no matches
for topic in ./knowledge/topics/**/*.md; do
  updated=$(grep -m1 'Updated:' "$topic" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')
  [ -z "$updated" ] && continue   # no Updated: date, can't pre-filter this topic
  # Extract backtick-quoted code paths mentioned in the topic (-E is portable; GNU-only -P is not)
  paths=$(grep -oE '`[^`]+\.(ts|py|go|rs|js|tsx|jsx)`' "$topic" | tr -d '`')
  for path in $paths; do
    [ -f "$path" ] && git log --oneline --since="$updated" -- "$path" | head -3
  done
done

This identifies topics where the underlying code changed since the last knowledge update. Only those topics need full drift detection — skip the rest.

Output: List of topics with stale indicators, sorted by staleness severity.

Phase 1: Structural health

Runs ./knowledge/scripts/knowledge-health.sh if present (counts, sizes, orphans, staleness).

Phase 2: Drift detection

Compare topic claims to the repo — but only for topics flagged by Phase 0. Fix contradictions, resolve [?] where possible, log timeline entries when fixing.

Git-powered drift signals:

Signal                         Check
Topic says "we use X"          git log --oneline -5 --all -- {related path} — any recent migration away from X?
Topic references a file path   Does the file still exist? git log --diff-filter=D -- {path} — was it deleted?
Topic has a decision           git log --grep="{keyword}" --oneline — was it reversed in a commit message?
Topic claims a version/dep     Check package.json/go.mod/etc. for current version
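
A version claim, for instance, can be verified in one pass. A minimal sketch, assuming a hypothetical topic file topics/data/orm.md that claims a prisma version (both names are placeholders; requires jq):

# Sketch: compare a topic's dependency-version claim to package.json
claimed=$(grep -m1 -oE 'prisma@?[0-9][0-9.x]*' ./knowledge/topics/data/orm.md || true)
actual=$(jq -r '.dependencies.prisma // .devDependencies.prisma // "not installed"' package.json)
echo "topic claims: ${claimed:-nothing} | package.json has: $actual"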

Phase 3: Postmortems

Match recent bug fixes to ## Postmortems in topics. Check git for non-trivial fixes:

# Find commits that look like bug fixes without corresponding postmortems
git log --oneline --since="30 days" --grep="fix\|bug\|hotfix\|revert" -i

Add missing postmortem entries for non-trivial fixes (>15 min to diagnose, or affected production).
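
One way to surface gaps mechanically, assuming postmortem entries cite the short commit SHA (a convention, not a guarantee):

# Sketch: fix-like commits from the last 30 days whose SHA appears nowhere
# in topics/ are candidates for a missing postmortem
git log --oneline --since="30 days" --grep="fix\|bug\|hotfix\|revert" -i |
while read -r sha msg; do
  grep -rq "$sha" ./knowledge/topics/ || echo "no postmortem: $sha $msg"
done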

Phase 4: Decisions

Ensure ## Decisions lines use format: [YYYY-MM-DD] DECIDED: {outcome} — {rationale} | {sha}.

Cross-reference git log --grep="DECIDED\|decision\|chose\|migrate" to catch decisions recorded in commit messages but missing from topics.
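
A rough shape check for those lines (a sketch; it validates the line format only, not the content):

# Sketch: flag DECIDED lines that don't match
# [YYYY-MM-DD] DECIDED: {outcome} — {rationale} | {sha}
grep -rh 'DECIDED' ./knowledge/topics/ |
grep -vE '\[[0-9]{4}-[0-9]{2}-[0-9]{2}\] DECIDED: .+ — .+ \| [0-9a-f]{7,}' || echo "(all well-formed)"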

Phase 5: ENVIRONMENT.md verification

If ENVIRONMENT.md exists, verify:

  • Listed CLIs still exist (command -v {cli})
  • Git shortcut commands still return results (paths haven't moved)
  • CI/CD references are current (workflow files still exist)
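
The last check scripts easily; a sketch assuming GitHub Actions workflows are referenced by path:

# Sketch: confirm workflow files referenced in ENVIRONMENT.md still exist
grep -oE '\.github/workflows/[A-Za-z0-9_.-]+\.ya?ml' ./knowledge/ENVIRONMENT.md |
sort -u | while read -r wf; do
  [ -f "$wf" ] || echo "missing workflow: $wf"
done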

Consensus protocol (parallel verification)

Drift corrections and postmortem root causes are high-stakes writes — a single agent can hallucinate a "fix" that corrupts accurate knowledge.

The script handles Phase 2 consensus automatically when claude CLI is available: it runs two read-only agents that independently check stale topics for drift, then outputs both results side-by-side with merge rules. You (the caller agent) act as judge and apply only agreed corrections.

When the script isn't available (or for Phase 3 postmortems), use the Agent tool to spawn two parallel Explore subagents that independently verify the same claim, then merge their results.

Phase                  Trigger                                           Why
Phase 2: Drift fixes   Any correction that changes a factual claim       Wrong "fix" = corrupted knowledge that cascades
Phase 3: Postmortems   Root cause attribution for production incidents   Wrong cause → wrong prevention → repeat incident

Merge rules:

  • Agreement (both say the same thing) → apply the correction, add [was: {old}] trail
  • Disagreement (agents contradict each other) → mark with [?], do not apply, flag for user
  • One says change, one says keep → keep existing value, add [?] verify: {concern}
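
Applied to a single Key Fact, the three outcomes look like this (values are hypothetical):

Agreement:     Rate limit: 600 req/min [was: 100 req/min]
Disagreement:  Rate limit: 100 req/min [?] agents disagree (600 vs 1000), flag for user
One objects:   Rate limit: 100 req/min [?] verify: second agent says the limiter moved to middleware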

When NOT to use consensus:

  • Structural fixes (missing _index.md, format issues) — deterministic
  • Decision format cleanup — mechanical
  • ENVIRONMENT.md CLI checks — verifiable with command -v

Manual fallback (no CLI)

Structural integrity

  • INDEX.md exists with trigger table
  • SUMMARY.md exists, <40 lines
  • ENVIRONMENT.md exists and CLIs are current
  • Every category has _index.md
  • No orphan topic files; budgets met (≤50 topics, ≤10 timeline files)
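
Without the health script, these reduce to a few lines of shell; a sketch against the budgets above:

# Sketch: minimal structural checks
[ -f ./knowledge/INDEX.md ]   || echo "missing INDEX.md"
[ -f ./knowledge/SUMMARY.md ] && [ "$(wc -l < ./knowledge/SUMMARY.md)" -ge 40 ] && echo "SUMMARY.md over 40 lines"
for d in ./knowledge/topics/*/; do
  [ -f "${d}_index.md" ] || echo "missing _index.md in $d"
done
echo "topics: $(find ./knowledge/topics -name '*.md' ! -name '_index.md' | wc -l) (budget ≤50)"
echo "timeline: $(find ./knowledge/timeline -maxdepth 1 -name '*.md' ! -name '_index.md' | wc -l) (budget ≤10)"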

Quick git-based check

# What changed recently that knowledge might not reflect?
git log --oneline --since="2 weeks" --stat | head -40
# Compare against recent timeline entries
ls ./knowledge/timeline/

Contradictions and drift

Verify paths, config values, versions, APIs, schema, and architecture claims against the repo. On fix: update the topic, reference the SHA, add timeline note: Prune: corrected {fact} in {topic}.

Staleness, decisions, postmortems

  • Spot-check old topics; refresh dates or fix drift.
  • Reconcile reversed decisions with new entries.
  • Add postmortems for non-trivial bug fixes missing from topics.
  • Resolve [?] from code or safe scripts; list what remains.

Report template

Knowledge Health Report — YYYY-MM-DD
─────────────────────────────────────
Files: N topics, N timeline, N diagrams, N scripts
Topics checked: N (skipped N — no code changes since last update)
Contradictions found: N (fixed: N)
Stale topics refreshed: N
Decisions without outcomes: N (fixed: N)
Missing postmortems added: N
[?] items resolved: N / remaining: N
ENVIRONMENT.md: {current | needs update | missing}
#!/usr/bin/env bash
# SCRIPT: knowledge-rollup.sh
# PURPOSE: Analyze knowledge base for bloat and recommend consolidation — agent applies changes
# TIER: safe (read-only — all claude -p calls use read-only tools, output to stdout)
# USAGE: ./knowledge/scripts/knowledge-rollup.sh [./knowledge]
# REQUIRES: bash, git, claude CLI (optional — falls back to data dump)
# TOPICS: knowledge-management
# OUTPUT: Consolidation plan to stdout — agent writes all changes
set -euo pipefail
KNOWLEDGE_DIR="${1:-./knowledge}"
if [[ ! -d "$KNOWLEDGE_DIR" ]]; then
echo "Error: $KNOWLEDGE_DIR not found. Run from project root."
exit 1
fi
READ_TOOLS="Read,Glob,Grep,Bash(git*),Bash(find*),Bash(ls*),Bash(head*),Bash(wc*),Bash(cat*)"
echo "╔══════════════════════════════════════╗"
echo "║ Knowledge Rollup — Analysis ║"
echo "║ $(date +%Y-%m-%d) ║"
echo "╚══════════════════════════════════════╝"
# ── Section 1: Timeline analysis ──
echo ""
echo "═══ Section 1: Timeline ═══"
# || true: a missing timeline/ dir must not abort the script under pipefail
TIMELINE_FILES=$(find "$KNOWLEDGE_DIR/timeline" -name "*.md" ! -name "_index.md" ! -path "*/archive/*" 2>/dev/null | sort || true)
if [[ -z "$TIMELINE_FILES" ]]; then
TIMELINE_COUNT=0
else
TIMELINE_COUNT=$(echo "$TIMELINE_FILES" | wc -l | tr -d ' ')
fi
echo "Active timeline files: $TIMELINE_COUNT (budget: ≤10)"
if (( TIMELINE_COUNT > 10 )); then
echo "⚠ OVER BUDGET — needs compression"
echo ""
echo "--- Timeline files by date ---"
echo "$TIMELINE_FILES"
echo ""
echo "--- Content of older files (candidates for archive/promotion) ---"
echo "$TIMELINE_FILES" | head -$((TIMELINE_COUNT - 10)) | while read -r f; do
[[ -f "$f" ]] || continue
echo "--- $(basename "$f") ---"
cat "$f"
echo ""
done
else
echo "✓ Within budget"
fi
# ── Section 2: Category sizes ──
echo ""
echo "═══ Section 2: Category Sizes ═══"
BLOATED_CATS=""
for dir in "$KNOWLEDGE_DIR/topics"/*/; do
[[ -d "$dir" ]] || continue
CAT=$(basename "$dir")
COUNT=$(find "$dir" -name "*.md" ! -name "_index.md" 2>/dev/null | wc -l | tr -d ' ')
if (( COUNT > 5 )); then
echo "⚠ $CAT/: $COUNT files (max 5) — needs consolidation"
BLOATED_CATS="${BLOATED_CATS} ${dir}"
echo " Files:"
find "$dir" -name "*.md" ! -name "_index.md" -exec sh -c 'echo " $(basename "$1") — $(wc -l < "$1") lines"' _ {} \;
else
echo "✓ $CAT/: $COUNT files"
fi
done
# ── Section 3: Oversized files ──
echo ""
echo "═══ Section 3: Oversized Files ═══"
OVERSIZED=""
SIZE_LIST=$(find "$KNOWLEDGE_DIR/topics" -name "*.md" 2>/dev/null || true)
for f in $SIZE_LIST; do
[[ -f "$f" ]] || continue
lines=$(wc -l < "$f" | tr -d ' ')
if (( lines > 100 )); then
echo "⚠ $(basename "$f"): $lines lines (max 100)"
OVERSIZED="${OVERSIZED} ${f}"
fi
done
if [[ -f "$KNOWLEDGE_DIR/INDEX.md" ]]; then
lines=$(wc -l < "$KNOWLEDGE_DIR/INDEX.md" | tr -d ' ')
(( lines > 60 )) && echo "⚠ INDEX.md: $lines lines (max 60)"
fi
[[ -z "$OVERSIZED" ]] && echo "✓ All files within size budget"
# ── Section 4: AI consolidation plan (if needed) ──
echo ""
echo "═══ Section 4: Consolidation Plan ═══"
NEEDS_WORK=false
(( TIMELINE_COUNT > 10 )) && NEEDS_WORK=true
[[ -n "$BLOATED_CATS" ]] && NEEDS_WORK=true
[[ -n "$OVERSIZED" ]] && NEEDS_WORK=true
if [[ "$NEEDS_WORK" == true ]] && command -v claude &>/dev/null; then
claude -p "You are analyzing a knowledge base for consolidation. Read the current state and produce a consolidation plan.
Knowledge directory: $KNOWLEDGE_DIR
Issues found:
- Timeline files: $TIMELINE_COUNT (budget: 10)
- Bloated categories:$BLOATED_CATS
- Oversized files:$OVERSIZED
For each issue, output a specific plan:
TIMELINE PLAN:
- Which day files to archive (>30 days old)
- Which entries contain decisions/gotchas to promote to topics BEFORE archiving
- Which entries are just commit restatements that can be deleted entirely
CATEGORY PLAN:
- Which files to merge and proposed merged filename
- Which content to keep vs compress
TRIM PLAN:
- For each oversized file: which sections are verbose, which restate code (replace with path refs)
- Never delete decisions, gotchas, or postmortems
Also check: do any referenced file paths in topics still exist?
List any STALE PATHS found.
Output the plan — do NOT modify any files." \
--allowedTools "$READ_TOOLS" \
--max-turns 20 \
--output-format text 2>/dev/null || echo "(claude analysis failed — agent should plan manually)"
else
if [[ "$NEEDS_WORK" == false ]]; then
echo "✓ Knowledge base is within all budgets — no rollup needed"
else
echo "(claude CLI not available — agent should plan consolidation from data above)"
fi
fi
# ── Section 5: Stale path check ──
echo ""
echo "═══ Section 5: Stale Path Check ═══"
echo "Checking file paths referenced in topics..."
STALE_PATHS=0
# Feed the loop from process substitution, not a pipe, so the counter
# increments in the current shell instead of a throwaway subshell
while read -r ref; do
[[ -z "$ref" ]] && continue
if [[ ! -f "$ref" ]]; then
echo " STALE: $ref"
STALE_PATHS=$((STALE_PATHS + 1))
fi
done < <({ grep -rohE '`[^`]*/[^`]*\.(ts|tsx|js|jsx|py|go|rs|rb|java|sql|sh)`' "$KNOWLEDGE_DIR/topics/" 2>/dev/null || true; } | tr -d '`' | sort -u)
echo "Done. Stale paths found: $STALE_PATHS"
# ── Done ──
echo ""
echo "═══ Analysis Complete ═══"
echo "The caller agent should now:"
echo " 1. Apply the consolidation plan (timeline → archive, categories → merge, files → trim)"
echo " 2. Fix any stale paths found"
echo " 3. Rebuild INDEX.md, topics/_index.md, timeline/_index.md"
echo " 4. Review changes with: git diff $KNOWLEDGE_DIR"
name knowledge-rollup
description Consolidate and compress the ./knowledge directory. Use when:
  • The user says "roll up knowledge", "consolidate knowledge", "compress timeline"
  • Timeline has 15+ day files
  • A topic category has more than 5 files
  • The knowledge base feels bloated or slow to navigate
  • At the end of a sprint/week as routine maintenance
Medium-weight operation that reads most of the knowledge base and rewrites portions. Dispatched by the knowledge skill when size budgets are exceeded.

Knowledge Rollup — Consolidate & Compress

Run ./knowledge/scripts/knowledge-rollup.sh — it analyzes the knowledge base for bloat and outputs a consolidation plan. The script never modifies files; you read its output and apply changes. Review with git diff ./knowledge.

Primary workflow

./knowledge/scripts/knowledge-rollup.sh              # analyze and plan (read-only)
./knowledge/scripts/knowledge-rollup.sh ./knowledge   # explicit knowledge dir

The script outputs: timeline file counts, bloated categories, oversized files, stale paths, and (if claude CLI is available) an AI-generated consolidation plan. You apply the changes.

Phases

Phase 1: Timeline compression

If >10 day files:

  1. Promote decisions/gotchas/postmortems from old day files into topic files (if not already there).
  2. Strip what git already knows. If a timeline entry is just "implemented X" with no why context, delete it — git log has this.
  3. Archive remaining days older than ~30 days to timeline/archive/YYYY-MM.md.
  4. Keep recent day files lean. Update timeline/_index.md.

Key principle: After compression, timeline entries should contain only things you can't get from git log --oneline --since="30 days". If an entry adds no semantic value beyond the commit message, it should be deleted, not archived.
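
Step 3 above can be scripted once steps 1 and 2 are done by hand; a sketch assuming day files are named YYYY-MM-DD.md (the date fallback covers GNU and BSD):

# Sketch: fold day files older than ~30 days into monthly archives
mkdir -p ./knowledge/timeline/archive
cutoff=$(date -d '30 days ago' +%F 2>/dev/null || date -v-30d +%F)
for f in ./knowledge/timeline/????-??-??.md; do
  [ -e "$f" ] || continue
  day=$(basename "$f" .md)
  if [[ "$day" < "$cutoff" ]]; then   # ISO dates compare lexicographically
    cat "$f" >> "./knowledge/timeline/archive/${day%-*}.md" && rm "$f"
  fi
done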

Phase 2: Topic consolidation

For each category with >5 topic files:

  1. Merge related files (e.g., multiple audits → one table).
  2. Preserve all ## Decisions, ## Gotchas, ## Postmortems — these are never deleted.
  3. Replace code restatements with git references: See: {sha} or git log --oneline -5 -- {path}.
  4. Update category _index.md.
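
Step 3 in practice (path and line counts are hypothetical):

Before: 25 lines pasting the retry helper from src/lib/retry.ts verbatim
After:  Retry/backoff behavior: src/lib/retry.ts (history: git log --oneline -5 -- src/lib/retry.ts)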

Phase 3: Oversized trim

Topic files >100 lines:

  1. Compress without deleting decisions/gotchas/postmortems.
  2. Replace inline code snippets with path references.
  3. Collapse verbose Key Facts into tighter single lines.
  4. Replace prose Context sections with 2-sentence summaries.

Phase 4: Index refresh

Rebuild topics/_index.md, timeline/_index.md, INDEX.md, and SUMMARY.md if system changed.

Phase 5: Post-rollup git verification

After compression, spot-check that the compressed knowledge still aligns with current code:

# Quick sanity check — are paths referenced in topics still valid?
grep -rohE '`[^`]*/[^`]*\.(ts|tsx|py|go|rs|js)`' ./knowledge/topics/ | tr -d '`' | sort -u | while read -r f; do
  [ -f "$f" ] || echo "STALE PATH: $f"
done

Flag any stale paths found during compression.

Constraints

  • Timeline: ≤10 active day files; older → monthly archives or deleted if git covers it.
  • Categories: ≤5 topic files per category; ≤~50 total.
  • Topic files: ≤100 lines; INDEX.md ≤60 lines; ENVIRONMENT.md ≤40 lines.
  • Audit pattern: Multiple findings → one topic file with summary table.

--dry-run

The script itself is read-only, so every run is effectively a dry run: it prints what would happen for timeline/category/trim and skips index-refresh. Applying the plan (and rebuilding indexes) is always the agent's job.

Manual fallback

  1. Timeline: Promote decisions/gotchas from old days into topics. Delete entries that just restate commit messages. Archive days >30 days old.
  2. Topics: Merge overloaded categories. Use summary tables for multiple findings.
  3. Trim: Shorten >100 line files. Replace code with path refs. Collapse Key Facts.
  4. Indexes: Refresh INDEX.md (trigger table, counts, recent timeline, stale list).
  5. Verify: Check that referenced paths still exist in the repo.

Report

After rollup, tell the user:

  • Files before / after
  • What was archived, merged, or deleted
  • Any stale paths found
  • Any [?] items remaining

Knowledge Management Scripts

6 scripts in ./knowledge/scripts/ | Last updated: 2026-04-10

All scripts output to stdout — they never modify knowledge files. The caller agent reads output and applies changes.

Name                   Tier   Description                                                                   Requires                AI-powered?
knowledge-health.sh    safe   Structural health check — file counts, size violations, orphans, staleness   bash                    No
knowledge-init.sh      safe   Scan codebase + generate bootstrap plan + environment consensus               bash, git, claude CLI   Yes (read-only)
knowledge-rollup.sh    safe   Analyze bloat + generate consolidation plan                                   bash, git, claude CLI   Yes (read-only)
knowledge-prune.sh     safe   Detect staleness, drift (dual-agent consensus), postmortem gaps               bash, git, claude CLI   Yes (read-only)
knowledge-insights.sh  safe   Targeted codebase analysis (13 types, some with consensus)                    bash, git, claude CLI   Yes (read-only)
knowledge-install.sh   safe   Copy script suite into a project's ./knowledge/scripts/                       bash                    No
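
A typical maintenance pass, for illustration:

./knowledge/scripts/knowledge-health.sh               # cheap structural check first
./knowledge/scripts/knowledge-prune.sh ./knowledge    # drift + postmortem report
./knowledge/scripts/knowledge-rollup.sh ./knowledge   # consolidation plan
# then apply fixes and review: git diff ./knowledge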