axic/EVMASM.md

EVM Assembly Language

Motivation

The goal is to specify a syntax for an EVM assembly language, which can be used across various tools.
The format should be human readable, map to the EVM instruction set as closely as possible, allow for comments, and refrain from complex syntax.

Specification


1. Opcodes are upper case only
2. Opcodes are separated by white space (including, but not limited to, space, tab, new line)
3. Every EVM instruction is a valid opcode
4. With the exception of PUSH, none of the opcodes take an argument
5. The argument of PUSH is also separated by white space
6. The argument of PUSH is either a decimal or a hexadecimal number (the latter prefixed with 0x)
7. PUSH1..32 is defined to push data with exact length
8. PUSH is an alias for PUSH32
9. Comments are denoted by ;; and the rest of the line is ignored
10. PUSH accepts a special syntax for jump labels (PUSH [labelname])
11. Labels are identifiers followed by a colon. When referenced in a push, their offset in the bytecode is pushed to the stack. Note: JUMPDEST needs to follow a label.
12. Literal data, not to be processed by the assembler, must be hexadecimal digits following the pseudo opcode LIT
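As an illustration of the lexical rules above (upper-case opcodes, white space separation, PUSH arguments, and ;; comments), here is a minimal tokenizer sketch in Python. The function name `tokenize` and the tuple layout are hypothetical choices made for this example; labels and LIT are deliberately left out.

```python
import re

# Hypothetical helper patterns for this sketch (not part of the specification).
OPCODE_RE = re.compile(r"[A-Z][A-Z0-9]*")                    # upper case only
PUSH_RE = re.compile(r"PUSH([1-9]|[12][0-9]|3[0-2])?")       # PUSH, PUSH1..PUSH32


def tokenize(source):
    """Return a list of (opcode, argument) pairs; argument is None unless PUSH."""
    words = []
    for line in source.splitlines():
        # Everything after ";;" on a line is a comment and is ignored.
        code = line.split(";;", 1)[0]
        # Any run of white space (space, tab, new line) separates tokens.
        words.extend(code.split())

    instructions = []
    i = 0
    while i < len(words):
        opcode = words[i]
        if not OPCODE_RE.fullmatch(opcode):
            raise ValueError(f"not an upper-case opcode: {opcode!r}")
        argument = None
        if PUSH_RE.fullmatch(opcode):
            if i + 1 >= len(words):
                raise ValueError(f"{opcode} is missing its argument")
            text = words[i + 1]
            # The argument is hexadecimal when prefixed with 0x, decimal otherwise.
            argument = int(text, 16) if text.startswith("0x") else int(text, 10)
            i += 1
        instructions.append((opcode, argument))
        i += 1
    return instructions
```

Calling `tokenize("PUSH1 0x60 ;; comment\nPUSH 8\tMSTORE")` yields three pairs, with the comment dropped and both number formats parsed to integers.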

Rules 1 to 8 are already followed by many tools; even the standard tests comply with them.

Grammar

TBD

Examples

List of opcodes, no comments (what the usual Ethereum tests look like):
PUSH1 0x60 PUSH1 0x40 MSTORE PUSH1 0x8 JUMP JUMPDEST PUSH1 0x2 JUMP
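To make the mapping from this flat listing to bytecode concrete, the following Python sketch encodes it into hex digits. The `OPCODES` table is a small subset chosen for this example, and `assemble` is a hypothetical name; a real assembler would need the full instruction table.

```python
import re

# Subset of the EVM opcode table, just enough for the listing above.
OPCODES = {"MSTORE": 0x52, "JUMP": 0x56, "JUMPDEST": 0x5B}


def assemble(source):
    """Encode a comment-free, label-free opcode listing as hex digits."""
    words = source.split()
    out = bytearray()
    i = 0
    while i < len(words):
        word = words[i]
        push = re.fullmatch(r"PUSH(\d*)", word)
        if push:
            # A bare PUSH is treated as PUSH32, as the specification states.
            width = int(push.group(1) or 32)
            if not 1 <= width <= 32:
                raise ValueError(f"unsupported push width: {word}")
            text = words[i + 1]
            value = int(text, 16) if text.startswith("0x") else int(text, 10)
            out.append(0x5F + width)             # PUSH1 is 0x60, PUSH32 is 0x7f
            out += value.to_bytes(width, "big")  # exact length, zero padded
            i += 2
        else:
            out.append(OPCODES[word])
            i += 1
    # Hex digits without a leading 0x.
    return out.hex()


print(assemble("PUSH1 0x60 PUSH1 0x40 MSTORE PUSH1 0x8 JUMP JUMPDEST PUSH1 0x2 JUMP"))
# prints 60606040526008565b600256
```

The JUMPDEST lands at byte offset 8, which is the value pushed before the first JUMP.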

Using jump labels and comments:
  PUSH 0x60        ;; contract A {\n}
ErrorTag:
  PUSH 0x40        ;; contract A {\n}
  MSTORE           ;; contract A {\n}
  PUSH [tag1]      ;; contract A {\n}
  JUMP             ;; contract A {\n}
tag1:              ;; contract A {\n}
  JUMPDEST         ;; contract A {\n}
  PUSH [ErrorTag]  ;; contract A {\n}
  JUMP             ;; contract A {\n}
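One way to resolve labels such as those above is a two-pass scheme: the first pass records the byte offset of every label, the second emits bytecode and substitutes the offsets. The Python sketch below assumes that a bare PUSH is always 32 bytes wide (so every instruction size is known up front) and only handles single-name operands of the form `[label]`. All names in it are hypothetical.

```python
import re

# Subset of the EVM opcode table, enough for the example source below.
OPCODES = {"MSTORE": 0x52, "JUMP": 0x56, "JUMPDEST": 0x5B}

EXAMPLE = """
  PUSH 0x60        ;; first instruction
ErrorTag:
  PUSH 0x40
  MSTORE
  PUSH [tag1]
  JUMP
tag1:
  JUMPDEST
  PUSH [ErrorTag]
  JUMP
"""


def assemble_with_labels(source):
    """Return (bytecode, labels) for a listing that may use labels."""
    words = []
    for line in source.splitlines():
        words.extend(line.split(";;", 1)[0].split())  # strip comments

    # Pass 1: compute the offset of every label.
    labels, items, offset, i = {}, [], 0, 0
    while i < len(words):
        word = words[i]
        push = re.fullmatch(r"PUSH(\d*)", word)
        if word.endswith(":"):
            labels[word[:-1]] = offset
            i += 1
        elif push:
            width = int(push.group(1) or 32)  # bare PUSH is PUSH32
            items.append((width, words[i + 1]))
            offset += 1 + width
            i += 2
        else:
            items.append((0, word))
            offset += 1
            i += 1

    # Pass 2: emit bytes, replacing [label] with the recorded offset.
    out = bytearray()
    for width, text in items:
        if width == 0:
            out.append(OPCODES[text])
            continue
        if text.startswith("[") and text.endswith("]"):
            value = labels[text[1:-1]]
        elif text.startswith("0x"):
            value = int(text, 16)
        else:
            value = int(text, 10)
        out.append(0x5F + width)
        out += value.to_bytes(width, "big")
    return bytes(out), labels


code, labels = assemble_with_labels(EXAMPLE)
```

Under these assumptions `ErrorTag` resolves to offset 33 and `tag1` to offset 101, because each bare PUSH occupies 33 bytes.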

Constructor code:
  PUSH 0x60           ;; contract A {\n    function a()...
  PUSH 0x40           ;; contract A {\n    function a()...
  MSTORE              ;; contract A {\n    function a()...
  PUSH [end - start]
  DUP1                ;; contract A {\n    function a()...
  PUSH [start]
  PUSH 0              ;; contract A {\n    function a()...
  CODECOPY            ;; contract A {\n    function a()...
  PUSH 0              ;; contract A {\n    function a()...
  RETURN              ;; contract A {\n    function a()...
start:
  LIT 60606040526000357c0100000000000000000000000000000000000000000000000000000000900480630dbe671f146039576035565b6002565b3460025760486004805050604a565b005b6001604a025b56
end:
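The constructor example relies on two things the rules do not spell out in detail: the LIT payload being copied verbatim, and a label expression such as `[end - start]`. The Python sketch below shows one possible reading, assuming that subtraction of two labels is the only expression form (the example shows nothing else, and the grammar is still TBD). The function names are hypothetical.

```python
import re


def literal_bytes(hex_digits):
    """Decode a LIT payload; the bytes are emitted without further processing."""
    return bytes.fromhex(hex_digits)


def eval_label_operand(operand, labels):
    """Evaluate '[name]' or '[a - b]' against a table of label offsets."""
    inner = operand.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError(f"not a label operand: {operand!r}")
    match = re.fullmatch(r"\s*(\w+)\s*(?:-\s*(\w+)\s*)?", inner[1:-1])
    if not match:
        raise ValueError(f"unsupported label expression: {operand!r}")
    left, right = match.groups()
    value = labels[left]
    if right is not None:
        # Assumed semantics: difference of the two byte offsets.
        value -= labels[right]
    return value


# Hypothetical offsets, for illustration only.
payload = literal_bytes("6002565b")
offsets = {"start": 10, "end": 10 + len(payload)}
size = eval_label_operand("[end - start]", offsets)  # length of the LIT payload
```

With this reading, `[end - start]` is the size of the literal data, which is what the constructor passes to CODECOPY and RETURN.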


(These examples are based on Solidity output.)

Assembler output

Standard output of the assembler is the bytecode in hex digits, without a leading 0x.

Questions

Q: should lowercase opcodes be supported?
Q: should PUSH be an alias for the smallest PUSH that the literal fits into?
Q: should a functional syntax also be supported? e.g.:
loop:
JUMPI(MUL(1, 2), [loop])

Q: should Solidity's PUSHLIB be supported or should it be a special syntax of PUSH?
Option A:
PUSHLIB LibraryName

Option B:
PUSH {LibraryName}
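For the question about making PUSH an alias for the smallest fitting PUSH, the selection could look like the Python sketch below. This is only an illustration of that option, not a decision; `smallest_push` is a hypothetical name, and only PUSH1..32 as defined in this document are considered.

```python
def smallest_push(value):
    """Return the name of the narrowest PUSH1..PUSH32 that can hold value."""
    if value < 0 or value >= 2 ** 256:
        raise ValueError("value does not fit into 32 bytes")
    # Round the bit length up to whole bytes; zero still needs one byte.
    width = max(1, (value.bit_length() + 7) // 8)
    return f"PUSH{width}"
```

Under this option `PUSH 0x60` would assemble as PUSH1 and `PUSH 0x100` as PUSH2, so bytecode size would depend on the literal rather than being fixed at 33 bytes.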