A pilot empirical study of refusal-to-confabulate in action-taking LLM agents, plus the open-source framework I built so anyone can replicate it.
On a Saturday in May 2026 I gave Claude Opus 4.7, running in the Claude Code harness with read access to my Gmail and Drive, two consecutive goals: