@michaellady
Created May 16, 2026 19:05
Seven broadly-applicable lessons from a 1M-line AI-orchestrated rewrite — Bun PR #30412 Zig→Rust port (May 2026). For the underlying investigation see https://gist.github.com/michaellady/7d552137fb1e37ab9bf637e450016c25

Seven lessons from a 1M-line AI-orchestrated rewrite — broadly applicable

Distilled from a multi-week investigation of Bun PR #30412 (the AI-orchestrated Zig→Rust port, merged 2026-05-14). The underlying analysis went through five rounds of adversarial review. The specifics are Bun's; the lessons travel.

For the full investigation: https://gist.github.com/michaellady/7d552137fb1e37ab9bf637e450016c25


1. Facts before plans

PR #30412's most original move was extracting verified source facts before writing the rewrite plan. A LIFETIMES.tsv (2,253 rows pre-classifying every pointer's ownership class) and a 1,355-line verified-claims corpus (each claim cited to file:line, surviving 3-vote adversarial review) preceded the architecture doc. The plan was then a derived artifact of agreed-upon source facts.

Why it works: when agents disagree mid-port, the tiebreak references an immutable peer-verified fact instead of re-litigating from source. This inverts the usual order, in which teams write the plan first and only then verify it against reality.

Take it home: write the facts file before the plan file. For a rewrite, that's a per-field ownership table. For a feature, that's a per-call-site inventory. For a refactor, that's a list of every contract the change must preserve.
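A minimal sketch of what "facts first" can look like mechanically: every claim must carry a file:line citation before any plan is allowed to lean on it. The `Fact` struct, the function names, and the `path:LINE` citation format are assumptions for illustration; the actual layout of LIFETIMES.tsv and the verified-claims corpus is not shown in the PR summary above.

```rust
/// One row of a hypothetical facts file: a claim plus the source location backing it.
struct Fact<'a> {
    claim: &'a str,
    citation: &'a str, // assumed form: "path/to/file:LINE"
}

/// Parses "path:line" into its parts; returns None when the citation is unusable.
fn parse_citation(citation: &str) -> Option<(&str, u32)> {
    let (path, line) = citation.rsplit_once(':')?;
    let line: u32 = line.parse().ok()?;
    if path.is_empty() || line == 0 {
        return None;
    }
    Some((path, line))
}

/// Claims that may not anchor a plan yet, because nothing verifiable backs them.
fn uncited_claims<'a>(facts: &[Fact<'a>]) -> Vec<&'a str> {
    facts
        .iter()
        .filter(|fact| parse_citation(fact.citation).is_none())
        .map(|fact| fact.claim)
        .collect()
}
```

Running `uncited_claims` over the facts file before the plan is written gives a concrete "not ready" list: anything it returns is an opinion, not yet a fact.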


2. Default-deny on AI verification

The literal prompt in PR #30412's tiebreak workflow: "Default confirmed=false unless you verify against .zig." Verifiers across phases were biased toward "refuted": they could confirm only by citing the source file:line, the target file:line, and an observable divergence between the two.

Why it works: cost asymmetry. A false-positive flag wastes one agent-round; a false-negative ships a bug to production. Bias the system toward catching too much.

Take it home: when you write an AI-verifier prompt, the default answer should be "this is broken." Make the agent prove the code is correct, not the other way around. Add the literal sentence to the prompt — model behavior follows model instructions.
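The same default can be enforced in whatever code consumes the verifier's output, so a missing field can never be read as a "yes". This is a sketch under assumptions: the `Verdict` struct and its field names are hypothetical, not taken from the PR's workflow files.

```rust
/// Evidence a verifier must produce before a claim counts as confirmed.
#[derive(Default)]
struct Verdict {
    source_cite: Option<String>, // file:line in the original code
    target_cite: Option<String>, // file:line in the ported code
    divergence: Option<String>,  // observable difference in behavior
}

impl Verdict {
    /// Default-deny: confirmed only when all three pieces of evidence exist.
    fn confirmed(&self) -> bool {
        self.source_cite.is_some() && self.target_cite.is_some() && self.divergence.is_some()
    }
}
```

Because `Verdict::default()` has no evidence, an agent that returns nothing, or only part of the evidence, lands on `confirmed() == false` without any extra logic.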


3. The class/instance gap: systemic fix + per-site re-audit

The Bun audit found 23 ASM-confirmed LLVM noalias miscompiles — a whole class of bugs caused by Rust's &mut self promising exclusivity that the compiler exploited across re-entrant JS callbacks. The team did exactly the right thing: they fixed the codegen layer (generate-classes.ts now defaults to &self + interior mutability for all JS-exposed host functions). Then they patched the 23 sites individually.

One escaped. handle_reading in the SSL wrapper still ships with &mut self and no launder — while its three sibling methods (handle_writing, update_handshake_state, shutdown) all got the fix with explicit comments. The audit listed all four as the same cluster. One was missed.
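For readers who have not met the pattern, here is a minimal sketch of "&self + interior mutability" with a re-entrant callback. The `Connection` type and `on_data` method are hypothetical stand-ins, not Bun's generated code, and the sketch does not show the launder approach mentioned above.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a JS-exposed object (not Bun's actual type).
struct Connection {
    bytes_seen: Cell<usize>,
}

impl Connection {
    /// Takes `&self`, so the signature makes no exclusivity promise
    /// across the callback. Mutable state lives in a `Cell` instead.
    fn on_data(&self, len: usize, callback: &dyn Fn(&Connection)) -> usize {
        self.bytes_seen.set(self.bytes_seen.get() + len);
        callback(self); // the callback may re-enter and mutate this object
        self.bytes_seen.get() // re-read: reflects whatever the callback changed
    }
}
```

The point of the shape is that the value read after the callback is a fresh read, and nothing in the signature tells the compiler it may assume the object was untouched in between.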

The transferable lesson: class-level systemic fixes are necessary but not sufficient. After the systemic fix lands, re-run the per-site audit against the new code, and expect at least one instance to sit in a path the systemic change doesn't cover.

Take it home: fix the class. Then re-audit the instances. They escape.
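The re-audit can be as crude as a text scan over the audited names. The sketch below assumes each signature fits on a single line and uses a hypothetical function name, `unfixed_sites`; a real pass would want a proper parser, but even this level of check flags a site that kept `&mut self`.

```rust
/// Returns the audited function names whose signature line still takes `&mut self`.
/// Assumption: each signature fits on one line of `source`.
fn unfixed_sites<'a>(source: &str, audited: &[&'a str]) -> Vec<&'a str> {
    audited
        .iter()
        .copied()
        .filter(|name| {
            let needle = format!("fn {name}(");
            source
                .lines()
                .any(|line| line.contains(&needle) && line.contains("&mut self"))
        })
        .collect()
}
```

Feed it the list of sites from the original audit, not a list derived from the fix, so that a site the fix never touched still gets checked.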


4. Agents find well, repair poorly — always gate the fixes

Eight days after merge, the team landed a 36-finding security hardening PR. Three of those findings were originally auto-applied agent fixes that had to be dropped because each one introduced a new bug of an adjacent class:

  • An SSL-leak fix that introduced a use-after-free
  • A YAML-merge dedup that stored a non-'static byte view in a 'static field
  • An archive-overwrite precheck that added dead gating that did not close the traversal

A human reviewer caught all three.

The transferable lesson: agents are good at finding bugs and decent at proposing fixes. They're not good at validating that their own fix didn't introduce a new bug of an adjacent class. If you let agents apply fixes, install an independent gate that re-runs the safety property against the fix, not just against the original bug.

Take it home: trust agents to find. Distrust agents to repair. Always run a second pass — independent agent, human, or property test — that asks "does this fix introduce a new problem?"
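One way to build that second pass is to re-run the whole safety property against the patched code, not only the input from the original report. The sketch below uses a made-up archive-path check; `Case`, `CASES`, `gate`, and both candidate fixes are hypothetical and are not the fixes from the hardening PR.

```rust
/// One safety expectation for a hypothetical archive-path check.
struct Case {
    name: &'static str,
    path: &'static str,
    allowed: bool,
}

/// The safety property: the original report plus its neighbors.
const CASES: [Case; 4] = [
    Case { name: "original report", path: "../secret", allowed: false },
    Case { name: "nested traversal", path: "a/../../secret", allowed: false },
    Case { name: "absolute path", path: "/etc/passwd", allowed: false },
    Case { name: "benign path", path: "docs/readme.txt", allowed: true },
];

/// Independent gate: returns the name of every case the candidate fix gets wrong.
fn gate(fix: fn(&str) -> bool, cases: &[Case]) -> Vec<&'static str> {
    cases
        .iter()
        .filter(|case| fix(case.path) != case.allowed)
        .map(|case| case.name)
        .collect()
}

/// A fix that only targets the reported input.
fn narrow_fix(path: &str) -> bool {
    !path.starts_with("../")
}

/// A fix that rejects absolute paths and any `..` segment.
fn broader_fix(path: &str) -> bool {
    !path.starts_with('/') && path.split('/').all(|segment| segment != "..")
}
```

`narrow_fix` passes the original report and would look done to the agent that wrote it; the gate still fails it on the nested-traversal and absolute-path cases. The benign case matters too, since it catches a fix that closes the hole by rejecting everything.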


5. Bounded loops with explicit IOUs

Every phase in the PR's orchestration had a round cap: 12 rounds for the test phase, 80 for the structural-fix phase, 100 for the per-crate shard phase. When an agent couldn't fix something within the cap, it wrote todo!("blocked_on: X::Y") — an explicit IOU that a later phase processes.

At merge time, the count of unresolved blocked_on: markers was zero. The loops emptied their own queues.

Why it matters: "agent runs in a loop forever" is one of the most common failure modes of multi-agent orchestration. The recipe to prevent it has three parts: a hard cap, an IOU mechanism, and a phase that consumes the IOUs.

Take it home: never deploy an unbounded agent loop. Always bound it (round count or time budget). Always have a structured way for the agent to say "I can't do this, here's what's blocked." Always have a downstream consumer of those IOUs.
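The three parts fit in a few lines. This is a sketch, not the PR's orchestration code: `Outcome`, `run_bounded`, and `pending_ious` are hypothetical names, and the agent is reduced to a closure that reports whether a round succeeded.

```rust
/// Result of one bounded work item.
#[derive(Debug, PartialEq)]
enum Outcome {
    Fixed { rounds: u32 },
    Blocked { blocked_on: String }, // the explicit IOU
}

/// Runs `attempt` at most `max_rounds` times, then gives up with an IOU.
fn run_bounded(
    max_rounds: u32,
    blocked_on: &str,
    mut attempt: impl FnMut(u32) -> bool,
) -> Outcome {
    for round in 1..=max_rounds {
        if attempt(round) {
            return Outcome::Fixed { rounds: round };
        }
    }
    Outcome::Blocked { blocked_on: blocked_on.to_string() }
}

/// The downstream consumer: collects every IOU still waiting for a later phase.
fn pending_ious(outcomes: &[Outcome]) -> Vec<&str> {
    outcomes
        .iter()
        .filter_map(|outcome| match outcome {
            Outcome::Blocked { blocked_on } => Some(blocked_on.as_str()),
            Outcome::Fixed { .. } => None,
        })
        .collect()
}
```

A merge gate can then be a single condition: `pending_ious(...)` must be empty, which mirrors the zero unresolved `blocked_on:` markers reported at merge time.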


6. Cheap continuous gates compound

The merged Bun codebase contains 148 const _: () = assert!(...) compile-time layout asserts. Each one is ~5 lines, costs effectively nothing at runtime, and catches a whole class of cross-language drift — when a type's size, repr, or padding changes, the build breaks. PR #30722's post-merge security audit even added offset_of! asserts for struct padding bytes in serialization paths — same pattern, finer grain.
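For reference, the pattern looks like this. `WireHeader` is a hypothetical struct, not one of Bun's; the expected numbers assume a target where `u32` is 4-byte aligned, and `offset_of!` needs Rust 1.77 or later.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical header shared with non-Rust code; `repr(C)` pins field order.
#[allow(dead_code)]
#[repr(C)]
struct WireHeader {
    tag: u8,    // offset 0, followed by 3 padding bytes
    len: u32,   // offset 4
    flags: u16, // offset 8, followed by 2 trailing padding bytes
}

// Each assert is evaluated at compile time; a layout change breaks the build.
const _: () = assert!(size_of::<WireHeader>() == 12);
const _: () = assert!(align_of::<WireHeader>() == 4);
const _: () = assert!(offset_of!(WireHeader, len) == 4);
const _: () = assert!(offset_of!(WireHeader, flags) == 8);
```

If someone reorders the fields or widens `tag`, the crate stops compiling at the assert, before any test or reviewer has to notice.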

Why it matters: mechanical gates that block PRs are the cheapest way to prevent a class of bug from ever recurring. If a gate is one regex + one CI line, ship 100 of them.

Take it home: the cheapest gate that catches a whole class of bug is worth shipping repeatedly. Compile-time asserts, lint rules, regex in CI, schema validators — all of these compound when there are 100 of them in the tree.


7. Methodology preservation = peer review

PR #30412 was squash-merged into main as a single commit. The 5,512 first-parent commits, 32 claude/* sub-branches, 87 merge points, and 49+ self-correcting Revert commits — none of them appear in main's history. A future reader of git log main sees one line: "Rewrite Bun in Rust (#30412)."

In the commit before merge, every audit document, every workflow file, every spec, and the LIFETIMES.tsv were deleted from the tree. The most rigorous AI orchestration I've ever examined left no preserved trail of its own rigor in main.

The transferable lesson: if you orchestrate AI work, preserve the orchestration. The branch graph, the audit docs, the workflow JSON, the closed-item history — all of it is what makes the methodology peer-reviewable. Without it, the methodology is unfalsifiable: you have to take the author's word that it ran.

Take it home: don't squash an AI-orchestrated PR. Don't delete the audit docs at merge. Strike through closed items; don't remove them. Future-you, maintaining the code, needs the trail to trust-but-verify.


The meta-lesson

The methodology works at the class level and leaks at the instance level.

Every failure I found in PR #30412 — the escaped handle_reading miscompile, the 36 post-merge security findings, the 3 buggy agent auto-fixes — is the long tail of class-level work. The audit caught the class of bug. The codegen change addressed the class. The per-site pass closed 22 of 23 instances. The one that escaped was an instance, not a class.

The fix is not "less AI." It's class-level rigor (which AI does well) plus a human-or-tool gate that re-runs the per-class check against what actually shipped (which AI does poorly).

If you take one thing from PR #30412's methodology: invest in the class-level work, and then invest equally in the re-audit pass that catches the instances class-level work missed.


Investigation methodology: I cloned the PR locally, recovered six deleted audit documents from the pre-deletion commit, dispatched five parallel research agents (branch ecosystem, workflow internals, code-vs-audit verification, post-merge bugs, missed angles), then ran five rounds of three-reviewer adversarial review (Claude + Codex + Gemini) on every claim above. The handle_reading miscompile, the R-2 codegen-fix verification, the 32-branch census, and the post-merge bug taxonomy were all verified directly against origin/main two days post-merge.
