I decided to run a proper benchmark to see how these patterns actually behave under the hood. Tested on Windows x64 (LuaJIT 2.1) with process isolation and ASM inspection. (Full disclosure: I was too lazy to boot into Linux for this, even though perf and jit.v support is admittedly better over there, but the results are clear enough.)
Each of the 5 base layouts is tested in two semantically equivalent variants:
br = branchful: if active then x += vx end
nbr = branchless: x += vx * active
With active ∈ {true/1, false/0}, both produce identical output.
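
For anyone who wants to see the two variants as real code, here is a minimal Lua sketch. It is not the benchmark source: the function names, the plain-array layout, and the sample values are all made up for illustration. Lua has no `+=`, so the shorthand above is written out in full.

```lua
-- Illustrative sketch only: hypothetical names and data, not the benchmark code.

-- br: branchful. active[i] is a boolean.
local function update_branchful(x, vx, active, n)
  for i = 1, n do
    if active[i] then
      x[i] = x[i] + vx[i]
    end
  end
end

-- nbr: branchless. active[i] is the number 1 or 0.
local function update_branchless(x, vx, active, n)
  for i = 1, n do
    x[i] = x[i] + vx[i] * active[i]
  end
end

-- Same input, with the flags encoded as booleans for br and as 1/0 for nbr.
local n = 4
local vx = { 1, 2, 3, 4 }
local xa = { 0, 10, 20, 30 }
local xb = { 0, 10, 20, 30 }

update_branchful(xa, vx, { true, false, true, false }, n)
update_branchless(xb, vx, { 1, 0, 1, 0 }, n)

-- Both variants should leave the same values behind.
for i = 1, n do
  assert(xa[i] == xb[i], "mismatch at index " .. i)
end
```

One caveat on the equivalence: it relies on vx being finite. If vx is inf or NaN, vx * 0 is NaN, so the branchless version would corrupt x where the branchful one leaves it untouched.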