@axic
Last active April 2, 2024 13:13
EVM Assembly Language

Motivation

The goal is to specify a syntax for an EVM assembly language, which can be used across various tools.

The format should be human readable, map to the EVM as closely as possible, allow for comments, and refrain from complex syntax.

Specification

  1. Opcodes are upper case only
  2. Opcodes are separated with white space (including, but not limited to, space, tab, new line)
  3. Every EVM instruction is a valid opcode
  4. With the exception of PUSH, none of the opcodes have an argument
  5. The argument of PUSH is also separated by white space
  6. Argument of PUSH is either a decimal or a hexadecimal number (prefixed with 0x)
  7. PUSH1..PUSH32 push data of exactly the stated length (1 to 32 bytes)
  8. PUSH is an alias to PUSH32
  9. Comments are denoted by ;; and the rest of the line is ignored
  10. PUSH accepts a special syntax for jump labels (PUSH [labelname])
  11. Labels are identifiers followed by a colon. When referenced in a push, their offset in the bytecode is pushed to the stack. Note: JUMPDEST needs to follow a label.
  12. Literal data, not to be processed by the assembler, must be hexadecimal digits following the pseudo opcode LIT

Rules 1 to 8 are already followed by many tools; even the standard tests comply with them.
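
The rules above can be sketched as a small two-pass assembler. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names (tokenize, parse, assemble) are hypothetical, the opcode table is deliberately partial, and label expressions such as [end - start] are not handled because the specification does not define them.

```python
# Partial opcode table (assumption: a real tool would list every EVM instruction).
OPCODES = {
    "STOP": 0x00, "ADD": 0x01, "MUL": 0x02, "CODECOPY": 0x39,
    "MSTORE": 0x52, "JUMP": 0x56, "JUMPI": 0x57, "JUMPDEST": 0x5B,
    "RETURN": 0xF3,
}
OPCODES.update({f"DUP{n}": 0x7F + n for n in range(1, 17)})


def tokenize(source):
    """Drop ';;' comments (rule 9) and split on white space (rule 2)."""
    tokens = []
    for line in source.splitlines():
        tokens.extend(line.split(";;", 1)[0].split())
    return tokens


def parse(source):
    """Pass 1: collect items and record the byte offset of every label."""
    tokens = tokenize(source)
    items, labels, offset, i = [], {}, 0, 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.endswith(":"):                      # rule 11: label definition
            labels[tok[:-1]] = offset
            i += 1
        elif tok == "LIT":                         # rule 12: raw hex data
            data = bytes.fromhex(tokens[i + 1])
            items.append(("raw", data))
            offset += len(data)
            i += 2
        elif tok.startswith("PUSH"):               # rules 4 to 8 and 10
            width = 32 if tok == "PUSH" else int(tok[4:])
            items.append(("push", width, tokens[i + 1]))
            offset += 1 + width
            i += 2
        else:                                      # plain opcode, no argument
            items.append(("raw", bytes([OPCODES[tok]])))  # KeyError if unknown
            offset += 1
            i += 1
    return items, labels


def assemble(source):
    """Pass 2: resolve labels and return hex digits without a leading 0x."""
    items, labels = parse(source)
    out = bytearray()
    for item in items:
        if item[0] == "raw":
            out += item[1]
            continue
        _, width, arg = item
        if arg.startswith("["):                    # PUSH [labelname]
            value = labels[arg[1:-1]]
        else:                                      # decimal or 0x-prefixed hex
            value = int(arg, 0)
        out.append(0x5F + width)                   # PUSH1 is 0x60, PUSH32 is 0x7F
        out += value.to_bytes(width, "big")        # raises if the value is too wide
    return out.hex()


print(assemble("PUSH1 0x60 PUSH1 0x40 MSTORE PUSH1 0x8 JUMP JUMPDEST PUSH1 0x2 JUMP"))
# prints 60606040526008565b600256
```

Because every PUSH has a fixed width in this reading of the rules (plain PUSH is always 32 bytes), label offsets are known after a single pass over the tokens.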

Grammar

TBD

Examples

List of opcodes, no comments (what the usual Ethereum tests look like):

PUSH1 0x60 PUSH1 0x40 MSTORE PUSH1 0x8 JUMP JUMPDEST PUSH1 0x2 JUMP

Using jump labels and comments:

  PUSH 0x60        ;; contract A {\n}
ErrorTag:
  PUSH 0x40        ;; contract A {\n}
  MSTORE           ;; contract A {\n}
  PUSH [tag1]      ;; contract A {\n}
  JUMP             ;; contract A {\n}
tag1:              ;; contract A {\n}
  JUMPDEST         ;; contract A {\n}
  PUSH [ErrorTag]  ;; contract A {\n}
  JUMP             ;; contract A {\n}

Constructor code:

  PUSH 0x60           ;; contract A {\n    function a()...
  PUSH 0x40           ;; contract A {\n    function a()...
  MSTORE              ;; contract A {\n    function a()...
  PUSH [end - start]
  DUP1                ;; contract A {\n    function a()...
  PUSH [start]
  PUSH 0              ;; contract A {\n    function a()...
  CODECOPY            ;; contract A {\n    function a()...
  PUSH 0              ;; contract A {\n    function a()...
  RETURN              ;; contract A {\n    function a()...
start:
  LIT 60606040526000357c0100000000000000000000000000000000000000000000000000000000900480630dbe671f146039576035565b6002565b3460025760486004805050604a565b005b6001604a025b56
end:

(These examples are based on Solidity output.)

Assembler output

Standard output of the assembler is the bytecode in hex digits, without a leading 0x.

Questions

Q: should lowercase opcodes be supported?

Q: should PUSH be an alias for smallest PUSH the literal fits into?
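
To make this option concrete, here is a minimal sketch of how the narrowest PUSH could be chosen for a literal. The helper name smallest_push is hypothetical, and treating zero as a one-byte literal is an assumption of this sketch.

```python
def smallest_push(value):
    """Return (mnemonic, operand bytes) for the narrowest PUSHn that holds value."""
    if not 0 <= value < 2**256:
        raise ValueError("literal does not fit into 32 bytes")
    # Number of bytes needed; zero is assumed to still take one byte.
    width = max(1, (value.bit_length() + 7) // 8)
    return f"PUSH{width}", value.to_bytes(width, "big")


print(smallest_push(0x60))   # ('PUSH1', b'`')
print(smallest_push(256))    # ('PUSH2', b'\x01\x00')
```

One consequence to keep in mind: if label references also used the narrowest PUSH, the width of a label push would depend on the label's offset, which in turn depends on the widths of earlier pushes, so an assembler might need more than one layout pass.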

Q: should a functional syntax also be supported? e.g.:

loop:
JUMPI(MUL(1, 2), [loop])
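
As a rough sketch of what such a functional syntax could desugar to, the snippet below flattens a nested expression into the flat form used in the rest of this document. Everything here is an assumption: the function name flatten is hypothetical, and arguments are emitted in written order so that the last argument ends up on top of the stack. That order makes the condition-first example above work with JUMPI, which takes the destination from the top of the stack; the opposite order is equally conceivable and the question is left open here.

```python
import re

# Tokens: [label], bare words (opcodes and literals), and punctuation.
TOKEN = re.compile(r"\s*(\[\w+\]|\w+|[(),])")


def flatten(expr):
    """Translate e.g. 'JUMPI(MUL(1, 2), [loop])' into flat assembly text."""
    tokens = TOKEN.findall(expr)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1                      # consume "("
            code = []
            while tokens[pos] != ")":
                code += parse()           # arguments in written order
                if tokens[pos] == ",":
                    pos += 1
            pos += 1                      # consume ")"
            return code + [tok]           # opcode comes after its arguments
        return ["PUSH", tok]              # literal or [label] reference

    return " ".join(parse())


print(flatten("JUMPI(MUL(1, 2), [loop])"))
# prints PUSH 1 PUSH 2 MUL PUSH [loop] JUMPI
```

A limitation of this sketch is that an opcode written without parentheses would be treated as a literal, so zero-argument instructions have to be written with empty parentheses.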

Q: should Solidity's PUSHLIB be supported or should it be a special syntax of PUSH?

Option A:

PUSHLIB LibraryName

Option B:

PUSH {LibraryName}
@chriseth

I assume that this is meant to describe an assembly language that can be compiled to evm bytecode. If you modeled that close to the solidity assembly output, then please note that most of the features of solidity assembly output are not meant to be used as input. Solidity assembly output is only meant to be human-readable and it misses some crucial data to be read back into the internal representation.

It would be great if we could get closer to solidity inline assembly. For that, I would prefer // to be used as comment character. Furthermore, using PUSH should be optional for literals and jump labels.

Why are jump labels treated differently? What about just using push label or label?

How is [end - start] to be interpreted?

Concerning libraries: This is basically invoking a linker feature, I'm not sure if we are ready to use that as part of the assembly language without defining a linker.

@gcolvin

gcolvin commented Oct 17, 2016

Whatever the details, I support cleaning up the gratuitous differences between our assembly syntaxes.