Date: 2026-03-27 → 2026-04-01
Severity: High
Status: Resolved — fixes implemented and empirically verified on backend-audit
Relates to: Network Upload RCA (2026-03-24), PR #203
Date: 2026-03-26
Bugs addressed: #202, #83, #84, #86, #200
The rae daemon has no coherent process lifecycle management. Child processes (SDK subprocesses, uvicorn) are spawned without cleanup guarantees. Signals are partially handled. Shutdown is best-effort with no timeout. The result is a class of bugs where daemon termination — graceful or ungraceful — leaves orphaned processes consuming memory, CPU, and network connections indefinitely.
Observed impact (March 2026): 237 orphaned claude_agent_sdk subprocesses on a single developer machine, consuming ~10.8 GB RAM with 171 active connections to Anthropic's API.
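The missing lifecycle guarantees described above can be sketched as follows. This is a minimal illustration, not rae's actual spawn path: children are started in their own process group so the whole tree can be signaled, shutdown escalates from SIGTERM to SIGKILL after a bounded timeout, and signals are routed through `atexit` so graceful and ungraceful exits take the same cleanup path. (POSIX-only; the helper names are hypothetical.)

```python
import atexit
import os
import signal
import subprocess
import sys
import time

CHILDREN = []

def spawn(cmd):
    """Spawn a child in its own process group so the whole subtree
    (e.g. an SDK subprocess and its own children) can be signaled."""
    proc = subprocess.Popen(cmd, start_new_session=True)
    CHILDREN.append(proc)
    return proc

def shutdown(timeout=5.0):
    """Graceful stop with a hard deadline: SIGTERM the group, wait up
    to `timeout` seconds total, then SIGKILL anything still alive."""
    for proc in CHILDREN:
        if proc.poll() is None:
            os.killpg(proc.pid, signal.SIGTERM)  # pgid == pid with start_new_session
    deadline = time.monotonic() + timeout
    for proc in CHILDREN:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)  # escalate: no orphans survive
            proc.wait()

# Route both normal exit and termination signals through the same cleanup.
atexit.register(shutdown)
for sig in (signal.SIGTERM, signal.SIGINT):
    signal.signal(sig, lambda s, f: sys.exit(0))
```

With this shape, killing the daemon (gracefully or not) reaps the whole child tree instead of leaving hundreds of SDK subprocesses holding memory and API connections.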
Issue context: The rae proxy requires a custom Caddy binary with the rae_observe plugin. Currently the binary is not distributed, not documented, and not discoverable; developers must build it manually with xcaddy and point to it via an env var.
Goal: Bundle the custom Caddy binary in the rae wheel (like the Lightfield SPA), provide a local dev setup script, and implement runtime discovery with validation and graceful degradation.
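The runtime-discovery-with-graceful-degradation part of the goal could look like the sketch below. The env var name (`RAE_CADDY_BIN`) and the bundled location under `sys.prefix` are assumptions, not the settled layout; search order is explicit override, then the wheel-bundled binary, then PATH. A PATH hit may be a stock Caddy without the plugin, so callers should still validate it (e.g. by checking `caddy list-modules` output) before trusting it.

```python
import os
import shutil
import sys

# Assumed wheel layout for this sketch; the real path is whatever
# the bundling work settles on.
BUNDLED = os.path.join(sys.prefix, "share", "rae", "caddy")

def discover_caddy(env=None, which=shutil.which):
    """Return (path, source) for a Caddy binary, or (None, "missing").

    Search order: explicit env var override, binary bundled in the
    wheel, then system PATH. `env` and `which` are injectable for
    testing."""
    env = os.environ if env is None else env
    override = env.get("RAE_CADDY_BIN")  # hypothetical env var name
    if override and os.path.isfile(override):
        return override, "env"
    if os.path.isfile(BUNDLED):
        return BUNDLED, "bundled"
    found = which("caddy")
    if found:
        return found, "path"
    return None, "missing"  # degrade gracefully: run without the proxy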
| Persona | Description | Needs |
|---|---|---|
Issue: raelabs/rae-monorepo#137
Branch: 137-thrash-score-enrichment (based on backend-audit)
Date: 2026-03-19
A new enrichment stage in the normalizer → enricher → materializer pipeline that computes a real-time thrash score for each span. The thrash score is a composite of 7 weighted indicators that detect when an AI coding agent is looping, repeating work, or wasting cycles.
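The composite shape can be sketched as below. The indicator names and weights here are placeholders (the real set of 7 is defined in #137); only the structure is illustrated: each indicator is normalized to [0, 1], and the score is their weighted sum, so it also lands in [0, 1].

```python
# Placeholder indicators and weights — illustrative only; the real set
# of 7 weighted indicators is specified in issue #137. Weights sum to 1.
WEIGHTS = {
    "repeated_edits":    0.20,
    "command_retries":   0.20,
    "file_revisits":     0.15,
    "output_similarity": 0.15,
    "test_rerun_churn":  0.10,
    "context_refetches": 0.10,
    "idle_gaps":         0.10,
}

def thrash_score(indicators):
    """Weighted composite: clamp each indicator to [0, 1], then take
    the weighted sum, yielding a score in [0, 1]."""
    def clamp(x):
        return min(1.0, max(0.0, x))
    return sum(w * clamp(indicators.get(name, 0.0))
               for name, w in WEIGHTS.items())
```

Missing indicators default to 0, so a span with no thrash signals scores 0.0 and a span maxing out every indicator scores 1.0.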
Context: Evaluating Plano as a LiteLLM replacement for rae's idle-state and waiting-on-whom classification (Issue #139). The core problem is a 3–35s signal gap during LLM inference where Claude Code's JSONL is silent — we need proxy-level hooks to bracket this blind spot.
Plano v0.4.11 cannot close the inference blind spot as-shipped. It lacks the real-time pre-request hooks that LiteLLM's log_pre_api_call() provides. The gap is fixable upstream with small changes, but isn't available today.
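What "bracketing the blind spot" means mechanically can be shown with a small sketch. This is the assumed hook interface rae needs, not Plano's API: a pre-request hook fires as the call leaves for the model and a post-response hook fires when tokens return, so the JSONL-silent window between them is attributable to inference.

```python
import time

class InferenceBracket:
    """Pre/post hook pair that brackets LLM inference time.

    Sketch of an assumed proxy-hook interface (not Plano's API):
    pre_request marks when a request leaves for the model,
    post_response closes the span when the response arrives."""

    def __init__(self):
        self._open = {}
        self.spans = []  # (request_id, duration_seconds)

    def pre_request(self, request_id, now=None):
        # `now` is injectable for testing; defaults to a monotonic clock.
        self._open[request_id] = time.monotonic() if now is None else now

    def post_response(self, request_id, now=None):
        start = self._open.pop(request_id, None)
        if start is None:
            return None  # unmatched response; nothing to bracket
        end = time.monotonic() if now is None else now
        self.spans.append((request_id, end - start))
        return end - start
```

Durations recorded this way would fall squarely in the 3–35s gap where the JSONL is silent, which is exactly the signal LiteLLM's pre-call hook makes available and Plano v0.4.11 does not.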
# Lightfield Local - Setup Instructions

## Prerequisites

- **uv** (Python package manager): `curl -LsSf https://astral.sh/uv/install.sh | sh`
- **bun** (JS runtime): `curl -fsSL https://bun.sh/install | bash`
- **Claude Code** running (it creates `~/.claude/` with the SQLite database)

## Quick Start