You gave the agent a clear task. It wrote working code. It passed the tests. And it completely ignored the design you spent two weeks specifying.
If you're using LLM-driven development on anything larger than a weekend project, you've hit this wall. The agent doesn't know your architecture exists. It can't read your mind, and it definitely didn't read the spec in docs/specs/billing/spec.md. So it improvises. It invents its own abstractions, picks its own patterns, and builds something that works in isolation but drifts from the system you're trying to build.
This is the central scaling problem of AI-assisted development. The problem is not whether the agent can write code. It can. The problem is whether it can write code that belongs in your system.