Per,
Patching inline caches is rare, so taking a lock & doing single-threaded updates is fine. CAS is not needed for performance, and false sharing is not an issue either (this data is also code, so it nearly always resides alongside other read-only code).
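Here is a minimal C11 sketch of that arrangement. It assumes a single global spin lock (`patch_lock`, a hypothetical name) serializes all patchers, and that an ordinary `_Atomic uint64_t` stands in for an 8-byte-aligned word of code. A real patcher would also have to make the code page writable and follow the processor's rules for cross-modifying code, which this sketch ignores. The lock keeps patching single-threaded. The CAS is there only so that the 8-byte write appears all at once to other CPUs, not for speed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global patch lock: patching is rare, so one lock for all
   patchers is fine and contention is negligible. */
static atomic_flag patch_lock = ATOMIC_FLAG_INIT;

static void patch_lock_acquire(void) {
    while (atomic_flag_test_and_set_explicit(&patch_lock, memory_order_acquire))
        ; /* spin; with rare patching this almost never waits */
}

static void patch_lock_release(void) {
    atomic_flag_clear_explicit(&patch_lock, memory_order_release);
}

/* Replace one aligned 8-byte word, which stands in for a word of code here.
   Under the lock only one patcher runs at a time. The CAS makes the 8-byte
   write appear all at once and fails if the word no longer holds `expected`. */
static bool patch_word(_Atomic uint64_t *word, uint64_t expected,
                       uint64_t desired) {
    assert(((uintptr_t)word & 7u) == 0); /* 8-byte CAS wants 8-byte alignment */
    patch_lock_acquire();
    bool ok = atomic_compare_exchange_strong(word, &expected, desired);
    patch_lock_release();
    return ok;
}
```

A spin lock is enough here only because patching is rare. A blocking mutex would work equally well.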
Patching typically covers a handful of X86 instructions: at least 2, sometimes 3 or 4. If any update writes only part of an instruction, another CPU executing that code concurrently might see a partially updated instruction.
Patching covers a set of X86 instruction words, more than fits in an 8-byte CAS. A 16-byte CAS (CMPXCHG16B) requires its memory operand to be 16-byte aligned.
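To make the size and alignment constraint concrete, here is a hypothetical C helper, `fits_in_one_cas`. It checks whether a byte range lies inside a single naturally aligned 8- or 16-byte window, which is the condition for one CAS of that width to cover it. As an illustrative sequence (not one taken from the text above), a 10-byte `mov r64, imm64` followed by a 5-byte `call rel32` is 15 bytes. That is too big for an 8-byte CAS, and a 16-byte CAS can cover it only if it does not cross a 16-byte boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [addr, addr + len) lies entirely inside one
   naturally aligned `width`-byte window (8 for CMPXCHG, 16 for CMPXCHG16B),
   i.e. a single CAS of that width could write the whole range. */
static bool fits_in_one_cas(uintptr_t addr, size_t len, size_t width) {
    assert(width == 8 || width == 16);
    assert(len > 0);
    uintptr_t mask = ~(uintptr_t)(width - 1);
    /* First and last byte must land in the same aligned window. */
    return (addr & mask) == ((addr + len - 1) & mask);
}
```

For example, the 15-byte sequence fits in one 16-byte CAS at `0x2000`, but not at `0x2008`, where it would straddle `0x2010`.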
Putting all this together, these constraints imply:
- Updates are done via CAS covering whole instructions. No CAS spans a partial instruction.
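The rule in the list above can be checked mechanically. The sketch below is a hypothetical planner (`count_cas_windows` is an illustrative name). It lays the patched instructions out back to back from a start address and requires every instruction to fall inside one aligned CAS window. It reports how many CAS operations the patch needs, or -1 if some instruction would be split between two CASes. It checks only this per-instruction rule and says nothing about how separate CASes are ordered relative to a racing CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical planner: instructions of byte lengths lens[0..n) sit back to
   back starting at `addr`. Each must lie inside one naturally aligned
   `width`-byte window so no CAS writes part of an instruction. Returns the
   number of distinct windows (CAS operations) needed, or -1 if an
   instruction straddles a window boundary. */
static int count_cas_windows(uintptr_t addr, const size_t *lens, size_t n,
                             size_t width) {
    assert(width == 8 || width == 16);
    uintptr_t mask = ~(uintptr_t)(width - 1);
    uintptr_t prev_window = 0;
    int windows = 0;
    for (size_t i = 0; i < n; i++) {
        assert(lens[i] > 0);
        uintptr_t first = addr & mask;
        uintptr_t last = (addr + lens[i] - 1) & mask;
        if (first != last)
            return -1;            /* instruction crosses a window edge */
        if (windows == 0 || first != prev_window) {
            windows++;            /* instruction starts a new CAS window */
            prev_window = first;
        }
        addr += lens[i];
    }
    return windows;
}
```

With 8-byte windows, a 3-byte and a 5-byte instruction starting at `0x1000` share one window (one CAS). Two 5-byte instructions starting there return -1, because the second one spans `0x1005` to `0x1009`.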