The goal is to specify a syntax for an EVM assembly language, which can be used across various tools.
The format should be human readable, map EVM as closely as possible, allow for comments and refrain from complex syntax.
1. Opcodes are upper case only.
2. Opcodes are separated by white space (including, but not limited to, space, tab, new line).
3. Every EVM instruction is a valid opcode.
4. With the exception of `PUSH`, none of the opcodes have an argument.
5. The argument of `PUSH` is also separated by white space.
6. The argument of `PUSH` is either a decimal or a hexadecimal number (prefixed with `0x`).
7. `PUSH1..32` is defined to push data with exact length.
8. `PUSH` is an alias to `PUSH32`.
9. Comments are denoted by `;;` and the rest of the line is ignored.
10. `PUSH` accepts a special syntax for jump labels (`PUSH [labelname]`).
11. Labels are identifiers followed by a colon. When referenced in a push, their offset in the bytecode is pushed to the stack. Note: `JUMPDEST` needs to follow a label.
12. Literal data, not to be processed by the assembler, must be hexadecimal digits following the pseudo opcode `LIT`.
Rules 1 to 8 are already followed by many tools; even the standard tests comply with them.
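To make the lexical rules concrete, here is a minimal Python sketch of a tokenizer for the plain opcode form. The function names `tokenize` and `parse_number` are hypothetical and not part of the proposal. The sketch deliberately ignores bracketed label references such as `PUSH [end - start]`, which contain white space and would need extra handling.

```python
def tokenize(source):
    """Split assembly source into tokens, dropping ';;' comments."""
    tokens = []
    for line in source.splitlines():
        # A comment starts at ';;' and runs to the end of the line.
        code = line.split(";;", 1)[0]
        # Any run of white space (space, tab, new line) separates tokens.
        tokens.extend(code.split())
    return tokens


def parse_number(text):
    """Parse a PUSH argument: hexadecimal if prefixed with 0x, else decimal."""
    if text.startswith("0x"):
        return int(text[2:], 16)
    return int(text, 10)
```

With this sketch, `tokenize("PUSH1 0x60 ;; comment\nMSTORE")` yields the three tokens `PUSH1`, `0x60` and `MSTORE`.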
TBD
List of opcodes, no comments (what the usual Ethereum tests look like):
PUSH1 0x60 PUSH1 0x40 MSTORE PUSH1 0x8 JUMP JUMPDEST PUSH1 0x2 JUMP
Using jump labels and comments:
PUSH 0x60 ;; contract A {\n}
ErrorTag:
PUSH 0x40 ;; contract A {\n}
MSTORE ;; contract A {\n}
PUSH [tag1] ;; contract A {\n}
JUMP ;; contract A {\n}
tag1: ;; contract A {\n}
JUMPDEST ;; contract A {\n}
PUSH [ErrorTag] ;; contract A {\n}
JUMP ;; contract A {\n}
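Because a label may be referenced before it is defined (as with tag1 above), an assembler needs two passes or some form of back-patching. The following Python sketch shows one possible two-pass approach. It rests on assumptions the proposal does not state: a label reference is encoded as PUSH2 with a 2-byte offset, the input is already tokenized with comments removed, and only the opcodes used in the example are known. The name `assemble` is hypothetical.

```python
# Only the opcodes used in the example; a real assembler needs the full table.
OPCODES = {"MSTORE": 0x52, "JUMP": 0x56, "JUMPDEST": 0x5B}
PUSH2, PUSH32 = 0x61, 0x7F


def assemble(tokens):
    """Two-pass assembly of a token list into hex digits without a 0x prefix."""
    items, labels, offset, i = [], {}, 0, 0
    # Pass 1: record each label's byte offset and the size of every item.
    while i < len(tokens):
        tok = tokens[i]
        if tok.endswith(":"):
            labels[tok[:-1]] = offset
            i += 1
        elif tok == "PUSH":
            arg = tokens[i + 1]
            if arg.startswith("["):
                items.append(("label", arg[1:-1]))
                offset += 3  # assumption: PUSH2 plus a 2-byte offset
            else:
                value = int(arg, 16) if arg.startswith("0x") else int(arg)
                items.append(("push32", value))
                offset += 33  # PUSH is an alias to PUSH32
            i += 2
        else:
            items.append(("op", OPCODES[tok]))
            offset += 1
            i += 1
    # Pass 2: emit bytes, now that forward references can be resolved.
    out = bytearray()
    for kind, value in items:
        if kind == "op":
            out.append(value)
        elif kind == "push32":
            out.append(PUSH32)
            out += value.to_bytes(32, "big")
        else:
            out.append(PUSH2)
            out += labels[value].to_bytes(2, "big")
    return out.hex()
```

Under these assumptions, `assemble(["PUSH", "[tag1]", "JUMP", "tag1:", "JUMPDEST"])` returns `610004565b`: the label resolves to offset 4, where the JUMPDEST byte sits.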
Constructor code:
PUSH 0x60 ;; contract A {\n function a()...
PUSH 0x40 ;; contract A {\n function a()...
MSTORE ;; contract A {\n function a()...
PUSH [end - start]
DUP1 ;; contract A {\n function a()...
PUSH [start]
PUSH 0 ;; contract A {\n function a()...
CODECOPY ;; contract A {\n function a()...
PUSH 0 ;; contract A {\n function a()...
RETURN ;; contract A {\n function a()...
start:
LIT 60606040526000357c0100000000000000000000000000000000000000000000000000000000900480630dbe671f146039576035565b6002565b3460025760486004805050604a565b005b6001604a025b56
end:
(These examples are based on Solidity output.)
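The constructor example relies on two things beyond plain opcodes: LIT data that is copied through verbatim, and a label expression (`[end - start]`) that yields the length of that data. A rough Python sketch of both follows. The helper names `lit_bytes` and `eval_label_expr` are hypothetical, and the sketch assumes that an expression is either a single label or the difference of two labels, since the proposal shows no other forms.

```python
def lit_bytes(hex_digits):
    """Return the raw bytes for a LIT argument (hex digits, no 0x prefix)."""
    if len(hex_digits) % 2 != 0:
        raise ValueError("LIT data must contain an even number of hex digits")
    return bytes.fromhex(hex_digits)


def eval_label_expr(expr, labels):
    """Evaluate 'name' or 'a - b' against a table of label byte offsets."""
    parts = [part.strip() for part in expr.split("-")]
    if len(parts) == 1:
        return labels[parts[0]]
    if len(parts) == 2:
        return labels[parts[0]] - labels[parts[1]]
    raise ValueError("unsupported label expression: " + expr)
```

Since start precedes the LIT data and end follows it, `eval_label_expr("end - start", labels)` equals the number of bytes that `lit_bytes` returns for that data, which is the size CODECOPY needs.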
Standard output of the assembler is the bytecode in hex digits, without a leading `0x`.
Q: support lowercase opcodes?
Q: should PUSH be an alias for smallest PUSH the literal fits into?
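If PUSH were defined this way, choosing the opcode would be a matter of counting the bytes needed for the literal. A small Python sketch of that selection is below. The function name `smallest_push` is hypothetical, and encoding zero as a single zero byte is an assumption.

```python
def smallest_push(value):
    """Return the narrowest PUSHn mnemonic and big-endian payload for value."""
    if value < 0 or value >= 1 << 256:
        raise ValueError("value does not fit into 32 bytes")
    # Round the bit length up to whole bytes; zero still needs one byte.
    size = max(1, (value.bit_length() + 7) // 8)
    return "PUSH%d" % size, value.to_bytes(size, "big")
```

For example, `smallest_push(0x60)` gives `("PUSH1", b"\x60")`, whereas the current rule (PUSH as an alias to PUSH32) would emit 33 bytes for the same literal.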
Q: should a functional syntax also be supported? e.g.:
loop:
JUMPI(MUL(1, 2), [loop])
Q: should Solidity's `PUSHLIB` be supported or should it be a special syntax of PUSH?
Option A:
PUSHLIB LibraryName
Option B:
PUSH {LibraryName}
Whatever the details, I support cleaning up the gratuitous differences between our assembly syntaxes.