Three autonomous code-generation approaches implemented the same ~600-line Python project, three runs each: a solo Opus 4.6 agent, Opus 4.6 orchestrating Sonnet 4.6 sub-agents ("Team"), and Opus 4.6 operating with a custom Agentic Project System (APS) skill set. All three worked from the same fully specified plan document. Solo Opus was fastest, cheapest, and showed the lowest run-to-run variance; APS produced the highest code quality at 51% higher cost; Team delegation finished last on every measured dimension. At this task size, execution-structure scaffolding improves output even when the agent is working from a plan document.
Autonomous AI code-generation architectures are proliferating (single agents, team orchestration frameworks, full project pipelines), but empirical comparisons between them are scarce. This study runs the same well-specified project through three build approaches and measures speed, cost, reproducibility, and code quality across three runs of each.
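Reproducibility here reduces to per-metric spread across the three runs of each approach. A minimal sketch of that aggregation, using placeholder numbers rather than the study's actual measurements, might look like:

```python
from statistics import mean, pstdev

# Illustrative only: per-run wall-clock minutes and API cost (USD) for one
# approach. These values are placeholders, not the study's data.
runs = [
    {"minutes": 12.0, "cost": 1.80},
    {"minutes": 14.5, "cost": 2.10},
    {"minutes": 13.0, "cost": 1.95},
]

def summarize(runs, key):
    """Return (mean, population std-dev) of one metric across runs."""
    values = [r[key] for r in runs]
    return mean(values), pstdev(values)

avg_cost, sd_cost = summarize(runs, "cost")
avg_min, sd_min = summarize(runs, "minutes")
```

With only three runs per approach, the standard deviation is a rough spread indicator rather than a statistically robust estimate, which is how the study's variance claims should be read.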
The three approaches tested