The goal is to specify a syntax for an EVM assembly language that can be used across various tools.
The format should be human-readable, map to the EVM as closely as possible, allow for comments and avoid complex syntax.
1. Opcodes are upper case only
2. Opcodes are separated by white space (including, but not limited to, space, tab and new line)
3. Every EVM instruction is a valid opcode
4. With the exception of PUSH, none of the opcodes have an argument
5. The argument of PUSH is also separated by white space
6. The argument of PUSH is either a decimal or a hexadecimal number (prefixed with 0x)
7. PUSH1 .. PUSH32 push data of exactly the given length (1 to 32 bytes)
8. PUSH is an alias for PUSH32
9. Comments are denoted by ;; and the rest of the line is ignored
10. PUSH accepts a special syntax for jump labels (PUSH [labelname])
11. Labels are identifiers followed by a colon. When referenced in a push, their offset in the bytecode is pushed to the stack. Note: JUMPDEST needs to follow a label.
12. Literal data, not to be processed by the assembler, must be hexadecimal digits following the pseudo opcode LIT
Rules 1 to 8 are already followed by many tools; even the standard tests comply with them.
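A minimal Python sketch of a tokenizer may help pin down how the rules interact. The opcode set is a hypothetical subset (rule 3 would require every EVM instruction), and requiring LIT data to have an even number of hex digits, so that it forms whole bytes, is an assumption the rules do not state. The [end - start] form used in the constructor example further down is not covered by these rules and is rejected here.

```python
import re

# Hypothetical subset of the EVM instruction set; rule 3 would require all of them.
OPCODES = {"STOP", "ADD", "MUL", "MSTORE", "JUMP", "JUMPI", "JUMPDEST",
           "DUP1", "CODECOPY", "RETURN"}
PUSHES = {"PUSH"} | {f"PUSH{n}" for n in range(1, 33)}

LABEL_DEF = re.compile(r"[A-Za-z_]\w*:")        # rule 11: identifier followed by a colon
LABEL_REF = re.compile(r"\[([A-Za-z_]\w*)\]")   # rule 10: PUSH [labelname]
NUMBER = re.compile(r"0x[0-9a-fA-F]+|[0-9]+")   # rule 6: hex with 0x, or decimal
HEX_BYTES = re.compile(r"(?:[0-9a-fA-F]{2})+")  # assumption: LIT data is whole bytes


def tokenize(source: str) -> list[tuple]:
    """Return (kind, value, argument) tuples for source written per rules 1-12."""
    words = []
    for line in source.splitlines():
        words.extend(line.split(";;", 1)[0].split())  # rule 9, then rule 2
    tokens = []
    it = iter(words)
    for word in it:
        if LABEL_DEF.fullmatch(word):
            tokens.append(("label", word[:-1], None))
        elif word == "LIT":                            # rule 12
            data = next(it, "")
            if not HEX_BYTES.fullmatch(data):
                raise ValueError(f"LIT expects hex digits, got {data!r}")
            tokens.append(("lit", data.lower(), None))
        elif word in PUSHES:                           # rules 4 and 5
            arg = next(it, "")
            ref = LABEL_REF.fullmatch(arg)
            if ref and word == "PUSH":
                tokens.append(("push_label", word, ref.group(1)))
            elif NUMBER.fullmatch(arg):
                value = int(arg, 16) if arg.startswith("0x") else int(arg)
                tokens.append(("push", word, value))
            else:
                raise ValueError(f"bad argument for {word}: {arg!r}")
        elif word in OPCODES:                          # rules 1, 3 and 4
            tokens.append(("op", word, None))
        else:
            raise ValueError(f"unknown token {word!r} (opcodes are upper case)")
    return tokens
```

For example, tokenize("PUSH [tag1] JUMP") returns [("push_label", "PUSH", "tag1"), ("op", "JUMP", None)], while tokenize("mstore") raises ValueError under rule 1.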
TBD
List of opcodes, no comments (what the usual Ethereum tests look like):
PUSH1 0x60 PUSH1 0x40 MSTORE PUSH1 0x8 JUMP JUMPDEST PUSH1 0x2 JUMP
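To show how this plain listing maps to bytes, here is a Python sketch of an encoder for label-free code with explicit PUSH1..PUSH32. The byte values come from the EVM opcode table; only the opcodes used in the example are included, and assemble_plain is a hypothetical name.

```python
# Byte values from the EVM opcode table, limited to what the example uses.
OPCODE_BYTES = {"MSTORE": 0x52, "JUMP": 0x56, "JUMPDEST": 0x5B}


def assemble_plain(source: str) -> str:
    """Assemble label-free code that uses explicit PUSH1..PUSH32 only."""
    words = source.split()
    out = bytearray()
    i = 0
    while i < len(words):
        word = words[i]
        if word.startswith("PUSH") and word[4:].isdigit():
            size = int(word[4:])                 # rule 7: PUSHn pushes exactly n bytes
            if not 1 <= size <= 32:
                raise ValueError(f"no such opcode: {word}")
            arg = words[i + 1]
            value = int(arg, 16) if arg.startswith("0x") else int(arg)
            out.append(0x5F + size)              # PUSH1 = 0x60 ... PUSH32 = 0x7f
            out += value.to_bytes(size, "big")   # OverflowError if the value is too wide
            i += 2
        else:
            out.append(OPCODE_BYTES[word])       # KeyError for opcodes outside the subset
            i += 1
    return out.hex()                             # hex digits without a leading 0x
```

Run on the listing above it gives 60606040526008565b600256: offset 8 holds the JUMPDEST targeted by the first JUMP, while the second JUMP targets offset 2, the PUSH1 0x40 instruction, which is not a JUMPDEST.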
Using jump labels and comments:
PUSH 0x60 ;; contract A {\n}
ErrorTag:
PUSH 0x40 ;; contract A {\n}
MSTORE ;; contract A {\n}
PUSH [tag1] ;; contract A {\n}
JUMP ;; contract A {\n}
tag1: ;; contract A {\n}
JUMPDEST ;; contract A {\n}
PUSH [ErrorTag] ;; contract A {\n}
JUMP ;; contract A {\n}
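Because rule 8 makes PUSH an alias for PUSH32, every instruction, including PUSH [label], has a size that is known before any label is resolved, so two passes suffice: one to record label offsets and one to emit bytes. The Python sketch below assumes that reading; assemble_with_labels and the opcode subset are illustrative.

```python
# Byte values from the EVM opcode table, limited to what the example uses.
OPCODE_BYTES = {"MSTORE": 0x52, "JUMP": 0x56, "JUMPDEST": 0x5B}
PUSH32 = 0x7F


def parse_number(text: str) -> int:
    return int(text, 16) if text.startswith("0x") else int(text)


def assemble_with_labels(source: str) -> tuple[str, dict[str, int]]:
    """Two-pass assembly where PUSH always means PUSH32 (rule 8)."""
    words = []
    for line in source.splitlines():
        words.extend(line.split(";;", 1)[0].split())

    # Pass 1: record label offsets. Sizes are known up front because a PUSH
    # is always 1 opcode byte + 32 data bytes, even when its argument is a label.
    labels, offset, i = {}, 0, 0
    while i < len(words):
        if words[i].endswith(":"):
            labels[words[i][:-1]] = offset
            i += 1
        elif words[i] == "PUSH":
            offset += 33
            i += 2
        else:
            offset += 1
            i += 1

    # Pass 2: emit bytes, substituting the recorded offsets for [label].
    out = bytearray()
    i = 0
    while i < len(words):
        word = words[i]
        if word.endswith(":"):
            i += 1
        elif word == "PUSH":
            arg = words[i + 1]
            value = labels[arg[1:-1]] if arg.startswith("[") else parse_number(arg)
            out.append(PUSH32)
            out += value.to_bytes(32, "big")
            i += 2
        else:
            out.append(OPCODE_BYTES[word])
            i += 1
    return out.hex(), labels
```

Applied to the example above it yields ErrorTag = 33 and tag1 = 101, rather than 2 and 8 as in the PUSH1 listing, since each PUSH now takes 33 bytes.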
Constructor code:
PUSH 0x60 ;; contract A {\n function a()...
PUSH 0x40 ;; contract A {\n function a()...
MSTORE ;; contract A {\n function a()...
PUSH [end - start]
DUP1 ;; contract A {\n function a()...
PUSH [start]
PUSH 0 ;; contract A {\n function a()...
CODECOPY ;; contract A {\n function a()...
PUSH 0 ;; contract A {\n function a()...
RETURN ;; contract A {\n function a()...
start:
LIT 60606040526000357c0100000000000000000000000000000000000000000000000000000000900480630dbe671f146039576035565b6002565b3460025760486004805050604a565b005b6001604a025b56
end:
(These examples are based on Solidity output.)
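The rules do not define [end - start] (a question also raised in the reply below). One plausible reading, assumed in this sketch, is the difference between the two label offsets, i.e. the length of the LIT data, while PUSH [start] yields the offset where that data begins; CODECOPY then copies the runtime code into memory and RETURN hands it back. The label-expression grammar, evaluate and assemble_constructor are hypothetical.

```python
import re

# Byte values from the EVM opcode table, limited to what the example uses.
OPCODE_BYTES = {"MSTORE": 0x52, "DUP1": 0x80, "CODECOPY": 0x39, "RETURN": 0xF3}
PUSH32 = 0x7F
TOKEN = re.compile(r"\[[^\]]*\]|\S+")                   # keeps "[end - start]" as one token
LABEL_EXPR = re.compile(r"\[\s*(\w+)\s*(?:-\s*(\w+)\s*)?\]")


def evaluate(expr: str, labels: dict[str, int]) -> int:
    """Assumed meaning: [a] is a's offset, [a - b] is offset(a) - offset(b)."""
    m = LABEL_EXPR.fullmatch(expr)
    if m is None:
        raise ValueError(f"unsupported label expression {expr!r}")
    value = labels[m.group(1)]
    if m.group(2):
        value -= labels[m.group(2)]
    return value


def assemble_constructor(source: str) -> tuple[str, dict[str, int]]:
    words = []
    for line in source.splitlines():
        words.extend(TOKEN.findall(line.split(";;", 1)[0]))

    # Pass 1: label offsets (PUSH is PUSH32; LIT occupies len(hex) / 2 bytes).
    labels, offset, i = {}, 0, 0
    while i < len(words):
        word = words[i]
        if word.endswith(":"):
            labels[word[:-1]] = offset
            i += 1
        elif word in ("PUSH", "LIT"):
            offset += 33 if word == "PUSH" else len(words[i + 1]) // 2
            i += 2
        else:
            offset += 1
            i += 1

    # Pass 2: emit bytes; LIT data is copied verbatim (rule 12).
    out = bytearray()
    i = 0
    while i < len(words):
        word = words[i]
        if word.endswith(":"):
            i += 1
        elif word == "LIT":
            out += bytes.fromhex(words[i + 1])
            i += 2
        elif word == "PUSH":
            arg = words[i + 1]
            if arg.startswith("["):
                value = evaluate(arg, labels)
            else:
                value = int(arg, 16) if arg.startswith("0x") else int(arg)
            out.append(PUSH32)
            out += value.to_bytes(32, "big")
            i += 2
        else:
            out.append(OPCODE_BYTES[word])
            i += 1
    return out.hex(), labels
```

With the constructor above and a hypothetical five-byte payload, LIT 6001600201, in place of the Solidity runtime code, start lands at offset 202 and PUSH [end - start] pushes 5.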
The standard output of the assembler is the bytecode as hex digits, without a leading 0x.
Q: support lowercase opcodes?
Q: should PUSH be an alias for the smallest PUSH that the literal fits into?
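If the answer were yes, the encoder would choose n as the number of bytes the value needs, with a minimum of one. A Python sketch, where smallest_push is a hypothetical helper and 0 is encoded as PUSH1 0x00 because this proposal only knows PUSH1..PUSH32:

```python
def smallest_push(value: int) -> str:
    """Encode value with the narrowest PUSH1..PUSH32 and return hex (no 0x)."""
    if not 0 <= value < 2**256:
        raise ValueError("a PUSH argument must fit in 32 bytes")
    size = max(1, (value.bit_length() + 7) // 8)   # 0 still needs one data byte
    opcode = 0x5F + size                            # PUSH1 = 0x60 ... PUSH32 = 0x7f
    return bytes([opcode]).hex() + value.to_bytes(size, "big").hex()
```

For instance, smallest_push(0x60) gives 6060 (PUSH1 0x60) and smallest_push(0x100) gives 610100 (PUSH2 0x0100). One consequence worth weighing: label pushes would then be variable-width, and since a label's offset depends on the widths of earlier pushes, an assembler would have to repeat the layout until the offsets stop changing. Fixing PUSH to PUSH32, as rule 8 does, avoids that.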
Q: should a functional syntax also be supported? e.g.:
loop:
JUMPI(MUL(1, 2), [loop])
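One way to define the functional form is as sugar that flattens to the plain syntax: each argument is emitted in order, then the opcode. For the example to jump to loop, [loop] must end up on top of the stack (JUMPI takes the destination from the top and the condition from below it), which implies arguments are emitted left to right so that the last one is on top. The Python sketch assumes exactly that; tokenize and desugar are hypothetical, and label definitions such as loop: are left to the plain-syntax assembler.

```python
import re

# Label references, hex or decimal literals, upper-case opcodes, punctuation.
TOKEN = re.compile(r"\s*(\[[A-Za-z_]\w*\]|0x[0-9a-fA-F]+|\d+|[A-Z][A-Z0-9]*|[(),])")


def tokenize(text: str) -> list[str]:
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def desugar(text: str) -> list[str]:
    """Flatten OP(arg1, ..., argN) into plain syntax, emitting args left to right."""
    tokens = tokenize(text)
    out: list[str] = []

    def expr(i: int) -> int:
        if i >= len(tokens) or tokens[i] in ("(", ")", ","):
            raise ValueError(f"expected an expression at token {i}")
        tok = tokens[i]
        if tok.startswith("[") or tok[0].isdigit():
            out.extend(["PUSH", tok])            # literals and labels become pushes
            return i + 1
        i += 1
        if i < len(tokens) and tokens[i] == "(":
            i += 1
            if i < len(tokens) and tokens[i] == ")":
                i += 1                           # OP() with no arguments
            else:
                i = expr(i)
                while i < len(tokens) and tokens[i] == ",":
                    i = expr(i + 1)
                if i >= len(tokens) or tokens[i] != ")":
                    raise ValueError(f"missing ')' after arguments of {tok}")
                i += 1
        out.append(tok)                          # the opcode follows its arguments
        return i

    i = 0
    while i < len(tokens):
        i = expr(i)
    return out
```

Here desugar("JUMPI(MUL(1, 2), [loop])") gives PUSH 1 PUSH 2 MUL PUSH [loop] JUMPI. Note that Solidity inline assembly uses the opposite convention (the first argument ends up on top, as in jumpi(label, cond)), so adopting its order would mean writing this example as JUMPI([loop], MUL(1, 2)).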
Q: should Solidity's PUSHLIB be supported, or should it be a special syntax of PUSH?
Option A:
PUSHLIB LibraryName
Option B:
PUSH {LibraryName}
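Either option could parse to the same library reference, leaving only the surface syntax to decide; resolving the reference is then a linker's job. The sketch assumes the linker substitutes a 20-byte address pushed with PUSH20 (0x73); library_reference, link and the address map are hypothetical.

```python
import re
from typing import Dict, Optional

PUSH20 = 0x73  # pushes 20 bytes, the size of an address

OPTION_A = re.compile(r"PUSHLIB\s+([A-Za-z_]\w*)")     # PUSHLIB LibraryName
OPTION_B = re.compile(r"PUSH\s+\{([A-Za-z_]\w*)\}")    # PUSH {LibraryName}


def library_reference(line: str) -> Optional[str]:
    """Return the library name if line uses either proposed syntax."""
    for pattern in (OPTION_A, OPTION_B):
        m = pattern.fullmatch(line.strip())
        if m:
            return m.group(1)
    return None


def link(name: str, addresses: Dict[str, str]) -> str:
    """Hypothetical linker step: emit PUSH20 <address> as hex without 0x."""
    address = addresses[name]
    if address.startswith("0x"):
        address = address[2:]
    raw = bytes.fromhex(address)
    if len(raw) != 20:
        raise ValueError(f"address for {name} must be 20 bytes")
    return bytes([PUSH20]).hex() + raw.hex()
```

Both library_reference("PUSHLIB Math") and library_reference("PUSH {Math}") return "Math", so the choice between options A and B would not affect the linker step.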
I assume that this is meant to describe an assembly language that can be compiled to EVM bytecode. If you modeled it closely on the Solidity assembly output, please note that most features of the Solidity assembly output are not meant to be used as input. The Solidity assembly output is only meant to be human-readable, and it lacks some crucial data needed to read it back into the internal representation.
It would be great if we could get closer to Solidity inline assembly. For that, I would prefer // to be used as the comment character. Furthermore, using PUSH should be optional for literals and jump labels.
Why are jump labels treated differently? What about just using push label or label?
How is [end - start] to be interpreted?
Concerning libraries: this is basically invoking a linker feature. I'm not sure we are ready to make that part of the assembly language without defining a linker.
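The variant suggested here, with // comments and optional PUSH, could be handled by a normalizing pass in front of the assembler. This Python sketch assumes that any word that is not an opcode, a label definition or a number is a label name; the opcode set is a hypothetical subset, and normalize is an illustrative name.

```python
# Hypothetical subset of the opcode table.
OPCODES = {"MSTORE", "JUMP", "JUMPI", "JUMPDEST", "MUL"}
PUSHES = {"PUSH"} | {f"PUSH{n}" for n in range(1, 33)}


def as_push_argument(word: str) -> str:
    if word[0].isdigit() or word.startswith("["):
        return word                    # number, or a label already in [brackets]
    return f"[{word}]"                 # bare label name


def normalize(source: str) -> list[str]:
    """Rewrite // comments and optional PUSH into the proposal's syntax."""
    words = []
    for line in source.splitlines():
        words.extend(line.split("//", 1)[0].split())
    out, i = [], 0
    while i < len(words):
        word = words[i]
        if word in PUSHES:                       # explicit push: normalize its argument
            if i + 1 == len(words):
                raise ValueError(f"{word} is missing its argument")
            out += [word, as_push_argument(words[i + 1])]
            i += 2
            continue
        if word.endswith(":") or word in OPCODES:
            out.append(word)                     # label definition or plain opcode
        else:
            out += ["PUSH", as_push_argument(word)]  # implicit push of literal or label
        i += 1
    return out
```

The trade-off is that a misspelled opcode such as MSTOR silently becomes a label reference and is only reported when label resolution fails.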