All assembly snippets are generated from intx. Compiled with clang 9 (trunk), -O3 -march=skylake.
The first one, mul256_x86_64.s, is the one evmone is using.
It exploits the fact that x86 has a mul instruction performing a full 64x64 -> 128 multiplication.
See the umul() procedure in intx.
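For illustration, here is a minimal sketch of a full 64x64 -> 128 multiplication in C++. It is not the actual intx code: the names wide_product and umul_sketch are hypothetical, and it relies on the unsigned __int128 extension available in GCC and Clang. On x86-64 a compiler can typically lower this to a single widening multiply.

```cpp
#include <cstdint>

// Hypothetical result type: the two 64-bit halves of a 128-bit product.
struct wide_product
{
    uint64_t lo;
    uint64_t hi;
};

// Illustrative stand-in for a umul()-style helper; the real intx
// signature may differ. Uses the GCC/Clang unsigned __int128 extension.
inline wide_product umul_sketch(uint64_t x, uint64_t y)
{
    const unsigned __int128 p = static_cast<unsigned __int128>(x) * y;
    return {static_cast<uint64_t>(p), static_cast<uint64_t>(p >> 64)};
}
```

A 256-bit multiplication can then be built from these 64-bit partial products, which is why the availability of a widening multiply matters for the instruction count.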
This instruction is not available in wasm,
but I'm working on teaching LLVM to recognize the 64x64 -> 128 mul pattern.
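To make the pattern concrete: when no widening multiply is available (wasm's i64.mul yields only the low 64 bits), the 128-bit product has to be assembled from four 32x32 -> 64 partial products. The sketch below shows one common way to write it. The names product128 and umul_portable are hypothetical, and whether this exact shape is what LLVM would match is an assumption.

```cpp
#include <cstdint>

// Hypothetical result type: the two 64-bit halves of a 128-bit product.
struct product128
{
    uint64_t lo;
    uint64_t hi;
};

// Portable 64x64 -> 128 multiplication using only 64-bit arithmetic.
// Illustrative only; not taken from intx or LLVM.
inline product128 umul_portable(uint64_t x, uint64_t y)
{
    const uint64_t xl = x & 0xffffffff;
    const uint64_t xh = x >> 32;
    const uint64_t yl = y & 0xffffffff;
    const uint64_t yh = y >> 32;

    // Each sum below fits in 64 bits:
    // (2^32 - 1)^2 + (2^32 - 1) < 2^64.
    const uint64_t t0 = xl * yl;
    const uint64_t t1 = xh * yl + (t0 >> 32);
    const uint64_t t2 = xl * yh + (t1 & 0xffffffff);

    const uint64_t lo = (t2 << 32) | (t0 & 0xffffffff);
    const uint64_t hi = xh * yh + (t1 >> 32) + (t2 >> 32);
    return {lo, hi};
}
```

This takes four multiplications plus shifts, masks and additions, versus a single instruction on x86-64.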
Instruction count: