## translate.lhs
External storage on the EVM: a static recompilation approach        7 May 2017
Luke Williams <shmookey@shmookey.net>                                   Rev. 3


This document describes a way of converting ordinary compiled EVM contracts to
a form suitable for use with an off-chain storage backend, such as the system
described by Smolenski for storing contract state on IPFS. This technique is
completely invisible to the contract developer and requires no modification to
contract logic. It is also generic to any EVM contract, regardless of the high
level language it was written in, and compatible with the public blockchain.

This is a self-implementing specification written in literate Haskell. It may
be compiled and run unmodified with any recent version of GHC to yield a
command-line tool for converting contracts.

I hereby release this document under an MIT-style license. It is free for all
commercial, non-commercial, quasi-commercial and infomercial use - but please
don't sue me if you use this and lose all your money. I don't have any either.

------------------------------------------------------------------------------

  BACKGROUND

Smolenski [1] describes an integrated approach to off-blockchain data storage
for EVM smart contracts, targeting an IPFS storage backend. A wide variety of
use cases are supported by a range of features, including private data storage
with encryption, access control and revocation, user authentication and more.
A core idea of the proposal is the notion of contracts as state transformers -
rather than modeling contracts as objects with a hidden internal structure, we
can regard them as pure code, as functions of a state value. This is certainly
a useful way of looking at contracts, if we wish to work with data stored off-
chain.

In Smolenski's proposal this view is given form at the method level - stateful
methods are written with an extra argument and return value for the state. I
argue that there are many advantages and few disadvantages if instead the view
of contracts as state transformers is applied at the contract level - outside
of Solidity, in a hidden layer of “plumbing” beneath the high-level logic that
is expressed by the contract's source code - essentially, abstracting from the
idea of contract methods as state transformer functions to view the contract
itself as a state monad. This allows us to unify our view of internal/external
storage into a single model, where the options are distinguished only by the
compile-time choice of which monad's “plumbing” to use. Then if we know what
each option looks like at the EVM opcode level, we can change our minds later:
we can “re-compile” the contract by replacing all the plumbing, reliably and
automatically, without access to the source code.

The advantages of this approach are not limited to a unity of abstractions: we
can save much developer time and effort by avoiding the need to rewrite parts
of contracts, and we preserve the cumulative experience developers have gained
writing contracts the traditional way. In a strategic sense, more developers
are likely to try it if the barriers to entry are low. Finally, there is a
practical benefit too:
explicitly threading around a state object ties up at least one “reachable”
stack slot (sometimes more) anywhere that state is available, further reducing
the already constraining limit of 16 arguments + function locals. The hidden
state approach does not affect a contract's use of the stack.

We should be careful to consider potential risks. Hiding the technical details
is counterproductive (and frustrating) if those details turn out to have high-
level consequences, which may not be clear to the developer. In this case, the
only such detail I have been able to identify is that the gas costs of storage
operations may be difficult to predict - but almost certain to be smaller than
the equivalent native storage operation, usually much smaller. A more detailed
comparison will require more information about the proposed implementation.

[1] Permissioned Blocks White Paper, Michael Smolenski, 2017
https://github.com/autocontracts/permissioned-blocks/blob/master/whitepaper.md

  AIM

Our goal is to transform an ordinary contract into one that takes an external
state structure as a hidden extra parameter and returns an updated copy of the
structure in a similar way, without any expensive EVM storage operations.

The resulting contract should be ABI-compatible with Smolenski's proposal.

This process should work for all existing contracts without modification and
be suitable for implementation within a smart contract. A Solidity version is
certain to be tedious and verbose; for this demonstration we will construct a
simple but powerful static recompiler in Haskell, reading compiled contracts
and operating on the EVM bytecode level to produce a new binary, suitable for
immediate deployment to the blockchain.


  DESIGN

EVM contract storage is essentially a key-value store allowing arbitrary 256-
bit keys and values. Solidity addresses storage using two conventions. Scalar
values are packed as tightly as possible (without changing their ordering) in
sequence starting from slot 0. For mappings and dynamically sized arrays, the
slot assigned by the member's position in that sequence is hashed (for
mappings, concatenated with the key) to determine where the data is stored.
Implementing a key-value mapping in memory is left as a future exercise. For
now, we will only consider member variables with scalar types.

When a contract reads or writes storage data, it operates on one full storage
slot at a time using the EVM opcodes `SLOAD` and `SSTORE`. When these opcodes
are encountered, the value at the top of the stack is taken as the referenced
storage key. In the case of `SSTORE`, the next value on the stack is used for
the value and the operation leaves the stack with 2 fewer items. For `SLOAD`,
the value located at the given storage key is pushed onto the stack, leaving
the total number of items unchanged.
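
As an illustration only, the stack effects just described can be modelled in a
few lines of Haskell. This is a toy sketch and not part of the converter: the
names `Stack`, `Storage`, `sstore` and `sload` are hypothetical, and storage
is assumed to read as zero for keys that were never written.

```haskell
import qualified Data.Map as M

-- Hypothetical toy model: the head of the list is the top of the stack.
type Stack   = [Integer]
type Storage = M.Map Integer Integer

-- SSTORE pops the key (top) and then the value: 2 fewer items afterwards.
sstore :: (Stack, Storage) -> (Stack, Storage)
sstore (k : v : rest, st) = (rest, M.insert k v st)
sstore _                  = error "stack underflow"

-- SLOAD pops the key and pushes the stored value: item count unchanged.
sload :: (Stack, Storage) -> (Stack, Storage)
sload (k : rest, st) = (M.findWithDefault 0 k st : rest, st)
sload _              = error "stack underflow"

main :: IO ()
main = do
  let (s1, st1) = sstore ([5, 42, 99], M.empty)  -- key 5, value 42
      (s2, _)   = sload (5 : s1, st1)            -- read key 5 back
  print s1   -- [99]
  print s2   -- [42,99]
```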

Contracts may also access cheaper volatile working memory using the operations
`MLOAD` and `MSTORE`, which are analogous to the storage opcodes we just saw,
except that they access a byte-addressed and contiguous range of memory. The
stack arguments consumed by these operations mirror those for storage, but we
clearly will need to do some kind of address translation to map between byte-
and word-addressed layouts, and also to avoid clashes with the contract's own
use of working memory. Solidity implements a (very) simple memory management
scheme which may be used to achieve a seamless integration in the general case
but for now we will simply choose a high starting offset to avoid overlapping
regions.

Before we dive into writing our translation functions, we need to enable some
handy language extensions and import a few things from the core libraries. Our
code will otherwise be entirely self-contained. A basic installation of GHC or
Haskell Platform is all you need.

> {-# LANGUAGE ViewPatterns #-}
> import Control.Monad (forM_)
> import Data.Word (Word8)
> import Data.Bits (Bits, shiftL, shiftR, (.|.))
> import System.Environment (getArgs)
> import qualified Data.ByteString as B

To turn a storage key into a memory address, we simply multiply by 32 (bytes,
for a 256 bit key size) and add the offset. 64k ought to be enough for anyone,
so we shall define our storage region to begin on the next byte after the 16-
bit range ends:

> offset :: Integer
> offset = 0x010000      -- 2^16 = 65536 (3 bytes)

Given a replacement instruction, i.e. `MSTORE` or `MLOAD`, we can generate the
translated code:

> relocate :: Op -> [Op]
> relocate x =
>   [ Push1 0x20         -- 32 byte slots (1 byte)
>   , Mul
>   , Push3 offset
>   , Add
>   , x
>   ]

Note that the length of this operation is 9 bytes: 5 opcodes (1 byte each) and
2 constants of 1 byte and 3 bytes respectively. We will replace each instance
of `SSTORE` and `SLOAD` with the output of this function called with `MSTORE`
and `MLOAD` respectively.
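
To make the arithmetic concrete, here is a small stand-alone sketch, separate
from the converter itself. `slotAddress` and `relocatedStore` are names made
up for this example; the byte values are the EVM encodings that also appear in
Appendix C.

```haskell
import Data.Word (Word8)
import Numeric (showHex)

-- Memory address of a storage slot: 32 bytes per slot, based at 0x010000.
slotAddress :: Integer -> Integer
slotAddress key = key * 32 + 0x010000

-- Hand-assembled replacement for SSTORE:
-- PUSH1 0x20, MUL, PUSH3 0x010000, ADD, MSTORE
relocatedStore :: [Word8]
relocatedStore = [0x60, 0x20, 0x02, 0x62, 0x01, 0x00, 0x00, 0x01, 0x52]

main :: IO ()
main = do
  putStrLn (showHex (slotAddress 0) "")   -- 10000
  putStrLn (showHex (slotAddress 1) "")   -- 10020
  print (length relocatedStore)           -- 9
```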

Execution on the EVM may halt for a number of reasons, either gracefully or in
case of error. In the latter case the entire transaction is “rolled back”, so
we will only concern ourselves with graceful termination. We consider two ways
that a contract may terminate gracefully: the `STOP` and `RETURN` instructions
(`SUICIDE` also halts normally, but is not handled here). Both halt execution,
but `RETURN` also copies a region of memory and “returns” it to the caller.

We need to return the updated state data along with any data returned by the
contract, which means we must replace all `STOP`s with `RETURN`s, and insert
new instructions to place the storage memory region at the end of the returned
data. At this point we will make another simplifying assumption and constrain
the storage size to a rather small 1 KiB (32 slots), for the sole purpose of
avoiding loops in our demonstration code:

> storageSize :: Integer
> storageSize = 0x400     -- 2^10 = 1024 (2 bytes)

We could trivially use a larger value, but there is little point calculating
an accurate one. A fully realized implementation will already be maintaining
an exact value for the mapping structure which can be used instead.

When there is no return data, i.e. we encounter a `STOP`, we simply push the
offset and size of the storage memory region to the stack, then `RETURN`:

> returnState :: [Op]
> returnState =
>   [ Push2 storageSize -- Push storage region length to stack
>   , Push3 offset      -- Push storage region offset to stack
>   , Return
>   ]

The length of this operation is 8 bytes. This code will replace the `STOP`
instruction.

When there is other return data, we will copy the storage region to the end of
the return data region and return with the length value increased by the size
of the storage data, i.e. 1 KiB. We could do it the other way around, but at
the time of writing there is no EVM instruction to copy a range of memory, and
it will be simpler for our purposes to work with a whole number of 32-byte
words than with individual bytes. Knowing the first and second stack values
contain the offset and length of the main return data respectively, we copy
the storage region, one word at a time, to the address just past the return
data. Our function will append this code to the end of a contract:

> returnData :: [Op] -> [Op]
> returnData contract =
>   contract ++ prepare ++ concatMap copy [0..31] ++ complete
>   where
>     prepare =
>       [ JumpLabel 0                 -- Allow jumping to this code by label
>       , Dup1                        -- Duplicate first stack value (offset)
>       , Dup3                        -- Duplicate third stack value (length)
>       , Add                         -- Add to get offset for storage data
>       ]
>     copy x =
>       [ Push3 (x*32 + offset)       -- Pointer to requested storage word
>       , MLoad                       -- Load the word onto the stack
>       , Dup2                        -- Duplicate the current offset
>       , MStore                      -- Store in new position
>       , Push1 0x20                  -- Amount to increment offset
>       , Add                         -- Increment current offset
>       ]
>     complete =
>       [ Pop                         -- Discard final offset
>       , Swap1                       -- Swap offset and length on stack
>       , Push2 storageSize           -- Push storage size to stack
>       , Add                         -- Add storage size to return size
>       , Swap1                       -- Put return args back in order
>       , Return                      -- Finally, we can now return
>       ]
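
The stack discipline of a single copy step is easy to get wrong, so it may
help to trace one on a toy machine. The sketch below is an assumption-laden
model, not the converter: `Ins`, `Mem`, `step` and `copyWord` are hypothetical
names, and memory is modelled per word rather than per byte. It shows that the
destination pointer has to be duplicated from beneath the loaded word before
the store, and that it ends up advanced by 32.

```haskell
import qualified Data.Map as M

-- Hypothetical miniature instruction set, just enough for one copy step.
data Ins = Push Integer | MLoad | MStore | Dup Int | Add

type Mem = M.Map Integer Integer   -- word-granular memory (simplification)

-- Execute one instruction; the head of the list is the top of the stack.
step :: ([Integer], Mem) -> Ins -> ([Integer], Mem)
step (s, m)         (Push x) = (x : s, m)
step (a : s, m)     MLoad    = (M.findWithDefault 0 a m : s, m)
step (a : v : s, m) MStore   = (s, M.insert a v m)
step (s, m)         (Dup n)  = (s !! (n - 1) : s, m)
step (a : b : s, m) Add      = (a + b : s, m)
step _              _        = error "stack underflow"

-- Copy the word at src to the destination pointer held on top of the stack.
copyWord :: Integer -> [Ins]
copyWord src = [Push src, MLoad, Dup 2, MStore, Push 32, Add]

main :: IO ()
main = do
  let mem0         = M.fromList [(0x10000, 7)]
      (stack, mem) = foldl step ([0x80], mem0) (copyWord 0x10000)
  print stack                 -- [160]
  print (M.lookup 0x80 mem)   -- Just 7
```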

The length of this operation is 332 bytes. This is too long to insert anywhere
we find a `RETURN`, so instead we will append this code as a subroutine at the
end of the contract. We can then replace existing `RETURN`s by jumping to the
new routine using a special meta-instruction `PushLabel`, which is translated
to the appropriate `PUSH` opcode and constant during the final assembly stage:

> returnJump :: [Op]
> returnJump =
>   [ PushLabel 0
>   , Jump
>   ]
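
The 332-byte figure quoted for the copy routine can be checked with a little
arithmetic. This is only a sanity-check sketch; the names `pushLen`,
`prepareLen`, `copyLen`, `completeLen` and `routineLen` are invented for the
example.

```haskell
-- A PUSHn occupies 1 opcode byte plus n bytes of immediate data; every other
-- instruction used by the routine is a single byte.
pushLen :: Int -> Int
pushLen n = 1 + n

prepareLen, copyLen, completeLen, routineLen :: Int
prepareLen  = 4                              -- JUMPDEST, DUP, DUP, ADD
copyLen     = pushLen 3 + 3 + pushLen 1 + 1  -- PUSH3, MLOAD, DUP, MSTORE,
                                             -- PUSH1, ADD
completeLen = 2 + pushLen 2 + 3              -- POP, SWAP1, PUSH2, ADD, SWAP1,
                                             -- RETURN
routineLen  = prepareLen + 32 * copyLen + completeLen

main :: IO ()
main = print routineLen   -- 332
```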

When an EVM contract is called, the message data must be read from a separate
address space. The EVM provides opcodes for loading a single word to the stack
(`CALLDATALOAD`) and copying a range into memory (`CALLDATACOPY`). We will use
the latter to copy the storage data at the end of the call data to the storage
region in memory, using `CALLDATASIZE` to determine the length. Our function
will prepend this code to the beginning of a contract:

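> initStorage :: [Op] -> [Op]
> initStorage contract =
>   [ Push2 storageSize -- Number of bytes to copy
>   , Push2 storageSize
>   , CallDataSize
>   , Sub               -- Source: CALLDATASIZE - storageSize (top - second)
>   , Push3 offset      -- Destination: storage region in memory
>   , CallDataCopy
>   ] ++ contract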

The length of this operation is 13 bytes. We prepend this code to the contract
to ensure that the storage memory region is populated throughout the contract
execution.
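
Seen from the caller's side, the call data is simply the ordinary arguments
followed by a fixed-size state block. The sketch below models that layout on
plain byte lists; `splitCallData` is a hypothetical helper, and `storageSize`
is repeated here only to keep the example self-contained.

```haskell
import Data.Word (Word8)

storageSize :: Int
storageSize = 0x400

-- Split call data into (ordinary arguments, trailing state block). The split
-- point is the same source offset as CALLDATASIZE - storageSize.
splitCallData :: [Word8] -> ([Word8], [Word8])
splitCallData cd = splitAt (length cd - storageSize) cd

main :: IO ()
main = do
  let args   = [0xde, 0xad, 0xbe, 0xef]     -- e.g. a 4-byte method selector
      state  = replicate storageSize 0      -- an empty state block
      (a, s) = splitCallData (args ++ state)
  print (a == args)   -- True
  print (length s)    -- 1024
```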

With all of our translations now defined, we can write our opcode translation
function:

> translate :: Op -> [Op]
> translate x =
>   case x of SLoad  -> relocate MLoad
>             SStore -> relocate MStore
>             Stop   -> returnState
>             Return -> returnJump
>             _      -> [x]

This function takes an opcode and returns a list of opcodes to replace it. The
replacement list for any opcode we do not translate is simply the singleton
list of that opcode.

Inserting new code presents us with a problem. Control flow on the EVM is by
the use of 2 jump opcodes: `JUMP` and `JUMPI`, where the latter is conditional
on the second stack value being nonzero, and both use the first stack value as
the target bytecode position - and these targets may no longer be valid.

Addressing this issue, or rather re-addressing it, will be a little trickier.
Jump targets may not always be constants pushed onto the stack in the opcode
immediately prior, in fact it is quite common for these values to be kept on
the stack for later use. It is however almost universal for them to originate
as constants in `PUSH` opcodes, and we can determine which constants likely
correspond to jump targets by correlating them against the positions of every
`JUMPDEST` opcode, which all jump operations must land on - at the risk of
mistakenly identifying ordinary constants as jump labels. There are several
approaches to re-addressing which avoid these mistakes. In general the only
perfect solution is a runtime lookup table, but in practice compile-time
solutions can be just as viable with no runtime overhead. (I got flawless
results from a dynamic dependency analysis approach, but that would have made
this write-up an order of magnitude longer.)

After reading a contract but before doing any translation of it we locate all
`JUMPDEST` opcodes, convert them to position-tagged `JumpLabel` meta-opcodes,
then replace any constants that occur in the set of “labels” with the
equivalent `PushLabel` meta-opcode. This effectively gives us
position-independent jumps, enabling us to safely insert new opcodes. The
process entails two passes. The first pass creates the labels and collects
them into a list:

> label :: [(Integer, Op)] -> ([Integer], [Op])
> label =
>   foldr f ([], [])
>   where
>     f (k, op) (ks, ops) = case op of
>       JumpDest -> (k : ks, JumpLabel k : ops)
>       _        -> (ks    , op          : ops)

The second pass takes the output of the first and translates `PUSH` opcodes
with constants that occur in the label list into `PushLabel` meta-opcodes.

> labelPush :: ([Integer], [Op]) -> [Op]
> labelPush (ks, ops) =
>   map f ops
>   where
>     f op = case op of
>       (push -> P k _) | k `elem` ks -> PushLabel k
>       _                             -> op
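
To see the two labelling passes in action without the full opcode table, here
is a stand-alone sketch on a cut-down instruction type. The `Op` type below is
a hypothetical miniature invented for the example, and `label` and `labelPush`
are simplified re-statements of the functions above rather than the real ones.

```haskell
-- Hypothetical cut-down instruction type, for illustration only.
data Op = Push Integer | Jump | JumpDest | Stop
        | JumpLabel Integer | PushLabel Integer
  deriving (Eq, Show)

-- Pass 1: turn each JUMPDEST into a label named after its old position.
label :: [(Integer, Op)] -> ([Integer], [Op])
label = foldr f ([], [])
  where
    f (k, JumpDest) (ks, ops) = (k : ks, JumpLabel k : ops)
    f (_, op)       (ks, ops) = (ks, op : ops)

-- Pass 2: turn pushes of known label positions into symbolic pushes.
labelPush :: ([Integer], [Op]) -> [Op]
labelPush (ks, ops) = map f ops
  where
    f (Push k) | k `elem` ks = PushLabel k
    f op                     = op

main :: IO ()
main = print . labelPush . label $
  [(0, Push 3), (2, Jump), (3, JumpDest), (4, Push 7), (6, Stop)]
  -- [PushLabel 3,Jump,JumpLabel 3,Push 7,Stop]
```

The constant 7 is left alone because no `JUMPDEST` sits at position 7, while
the constant 3 becomes symbolic. A constant that merely happened to equal 3
would be relabelled too, which is the misidentification risk noted earlier.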

In order to reassemble the contract back into EVM bytecode, we need to perform
the same process in reverse, replacing label meta-opcodes with real opcodes
based on their new positions. This is also a two pass process. First we remove
the `JumpLabel`s, whose actual positions will now be different to their labels,
and build a list of mappings. Note that we must keep track of the current byte
position by summing the lengths of the opcodes seen so far.

> unlabel :: [Op] -> ([(Integer, Integer)], [Op])
> unlabel =
>   finish . foldl f (0, [], [])
>   where
>     -- Fold from the left so that k is the byte offset from the start of
>     -- the code; opcodes accumulate in reverse and are restored at the end.
>     finish (_, ks, ops) = (ks, reverse ops)
>     f (k, ks, ops) op = case op of
>       JumpLabel k'    -> (k+1,   (k',k):ks, JumpDest : ops)
>       PushLabel _     -> (k+3,   ks,        op       : ops)
>       (push -> P _ n) -> (k+n+1, ks,        op       : ops)
>       _               -> (k+1,   ks,        op       : ops)

The second pass takes the output of the first and translates `PushLabel` meta-
opcodes to `PUSH2` opcodes with the mapped offset. This is the first time that
we introduce the possibility of error into our program, since the mapping may
fail if we have made a mistake somewhere. We won't handle that possibility for
this demonstration, but we will at least acknowledge it with an error message
and abort.

> unlabelPush :: ([(Integer, Integer)], [Op]) -> [Op]
> unlabelPush (kvs, ops) =
>   map f ops
>   where
>     f op = case op of
>       PushLabel k -> Push2 (remap k)
>       _           -> op
>     remap k = case lookup k kvs of
>       Just k' -> k'
>       Nothing -> error $ "no such label: " ++ show k
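
The same idea can be sketched compactly with a running sum of instruction
sizes. This is an alternative formulation for illustration, not the code used
by the converter: the miniature `Op` type and the names `size`,
`labelPositions` and `resolve` are all hypothetical.

```haskell
-- Hypothetical cut-down instruction type, for illustration only.
data Op = Push1 Integer | Push2 Integer | Jump | JumpDest | Stop
        | JumpLabel Integer | PushLabel Integer
  deriving (Eq, Show)

-- Encoded size in bytes of each instruction.
size :: Op -> Integer
size (Push1 _)     = 2
size (Push2 _)     = 3
size (PushLabel _) = 3   -- will be emitted as a PUSH2
size _             = 1

-- Map each label to its byte position counted from the start of the code.
labelPositions :: [Op] -> [(Integer, Integer)]
labelPositions ops = [ (l, pos) | (JumpLabel l, pos) <- zip ops starts ]
  where starts = scanl (+) 0 (map size ops)

-- Replace the meta-instructions with real ones at their new positions.
resolve :: [Op] -> [Op]
resolve ops = map f ops
  where
    table = labelPositions ops
    f (JumpLabel _) = JumpDest
    f (PushLabel l) = maybe (error "no such label") Push2 (lookup l table)
    f op            = op

main :: IO ()
main = print (resolve [Push1 0, Push1 0, PushLabel 3, Jump, JumpLabel 3, Stop])
  -- [Push1 0,Push1 0,Push2 8,Jump,JumpDest,Stop]
```

The label that used to live at position 3 now resolves to position 8, because
seven bytes of code precede the `JUMPDEST` and one more precedes the jump.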

We now have all the machinery we need to translate contracts, and can finally
connect them all up in a neat pipeline. This function will take the list of
position-tagged opcodes from a disassembler function, and return a final list
of opcodes to provide the assembler.

> convert :: [(Integer, Op)] -> [Op]
> convert =
>     unlabelPush
>   . unlabel
>   . returnData
>   . initStorage
>   . concatMap translate
>   . labelPush
>   . label

That's basically it. All that remains is the main IO to load the contract and
save the translated version. A basic assembler/disassembler is provided in the
appendix in order to make this document a self-contained program, but we need
not concern ourselves with the details. Tying it all together, our program
will accept any number of input files as command line arguments, and for each
input write the translated output to a new file with the extension `.out`:

> main :: IO ()
> main = do
>   inputs <- getArgs   -- getArgs already excludes the program name
>   forM_ inputs $ \file -> do
>     contract <- disassemble <$> B.readFile file
>     let file' = file ++ ".out"
>     B.writeFile file' . assemble $ convert contract

The translated contracts are ABI-compatible with Smolenski's proposal, in that
every method effectively has an extra argument and return value for the state
data, but this demonstration does not include a mechanism for automatically
updating the `solc`-generated ABI file. For now, these changes will have to be
added manually.

One final caveat is that contracts compiled by `solc` begin with a preamble or
“installer” section which initialises the contract and performs the vital task
of returning the actual contract bytecode, which is then stored with the newly
created account. As a simple workaround, the `--bin-runtime` option to `solc`
will cause it to emit just the part that should be translated, which can then
be spliced back in to the full bytecode. Note that therefore the initial state
of the contract will be empty; contracts that require initialisation should do
so by exposing an “initialise” method in their public API.


  CONCLUSION

We have built a static recompiler for converting smart contracts which use the
native EVM storage facility into contracts which transform state values passed
via transaction inputs and outputs. It is a general solution applicable to all
EVM contracts, though the version we have built in this document is subject to
several limitations which are described in the DESIGN section with a suggested
solution in each case - these are “engineering problems”, in the sense that an
appropriate implementation is known.

There is one feature absent from this demonstration which is an important part
of Smolenski's proposal, namely support for endorser-oracle state validation.
This involves the contract using its native EVM storage to store a hash of the
last known valid state, which can only be updated if the new hash is endorsed
by the oracle by signing it. There are two ways we could approach this: either
as a completely transparent process (a “hidden method”), or by a mechanism to
allow contracts to validate their own “state update” requests, where the lower
level code provides a way for the high level code to store the new hash. This
question is left as a topic for future discussion.

The high and ever increasing price of native EVM storage for smart contracts
is an issue which is only likely to worsen over time as Ethereum's popularity
grows along with the volume of contract data that peers are required to store
on their machines. On the other hand, a more popular network is a more useful
one, and if contract developers are able to take advantage of cheaper storage
while still leveraging the reach of the Ethereum blockchain, many will likely
opt to do so. Interest in privacy and access control is also likely to become
stronger as distributed applications become more sophisticated. This proposal
is meant as a small contribution to that conversation - in the hope that it
might help us to answer: what on earth are we going to do about this problem?


-----------------------------------------------------------------------------


                  APPENDIX A - ASSEMBLER AND DISASSEMBLER

These are the basic functions for disassembly and assembly. They will cause an
error if the operation fails.

> disassemble :: B.ByteString -> [(Integer, Op)]
> disassemble bs = f 0
>   where f i | i >= B.length bs = []
>             | otherwise        = let (n, x) = getOp i bs
>                                  in (toInteger i, x) : f (n+i)

> assemble :: [Op] -> B.ByteString
> assemble = mconcat . map toSym

> getOp :: Int -> B.ByteString -> (Int, Op)
> getOp p bs =
>   if p >= 0 && p < B.length bs
>   then case sym $ B.index bs p of
>     Right op     -> (1,   op)
>     Left (n, op) -> (n+1, op $ getData (p+1) n bs)
>   else
>     error $ "out of bounds " ++ show p

> getData :: Int -> Int -> B.ByteString -> Integer
> getData p n bs =
>   if p >= 0 && p + n <= B.length bs
>   then roll . B.take n $ B.drop p bs
>   else error "unexpected end of input"
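
The only subtlety in disassembly is that a `PUSHn` opcode is followed by n
bytes of immediate data which must not be decoded as instructions. The sketch
below shows just that aspect on plain byte lists. `Ins` and `decode` are
hypothetical names, and unlike the functions above this version silently
truncates a push that runs past the end of the input instead of raising an
error.

```haskell
import Data.Word (Word8)

-- A push with its size and immediate bytes, or any other single-byte opcode.
data Ins = Push Int [Word8] | Other Word8
  deriving (Eq, Show)

-- Decode bytes into (position, instruction) pairs.
decode :: [Word8] -> [(Int, Ins)]
decode = go 0
  where
    go _ [] = []
    go i (b : rest)
      | b >= 0x60 && b <= 0x7f =             -- PUSH1 .. PUSH32
          let n          = fromIntegral b - 0x5f
              (imm, bs') = splitAt n rest
          in (i, Push n imm) : go (i + 1 + n) bs'
      | otherwise = (i, Other b) : go (i + 1) rest

main :: IO ()
main =
  -- PUSH1 3, JUMP, JUMPDEST, PUSH1 7, STOP
  print (map fst (decode [0x60, 0x03, 0x56, 0x5b, 0x60, 0x07, 0x00]))
  -- [0,2,3,4,6]
```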


                      APPENDIX B - PUSH VIEW PATTERN

This code defines the `PushN` type and the `push` view function used in the
view patterns above.

> data PushN = P Integer Integer | NotPush
> push :: Op -> PushN
> push op = case op of
>   Push1  x -> P x 1  ; Push17 x -> P x 17
>   Push2  x -> P x 2  ; Push18 x -> P x 18
>   Push3  x -> P x 3  ; Push19 x -> P x 19
>   Push4  x -> P x 4  ; Push20 x -> P x 20
>   Push5  x -> P x 5  ; Push21 x -> P x 21
>   Push6  x -> P x 6  ; Push22 x -> P x 22
>   Push7  x -> P x 7  ; Push23 x -> P x 23
>   Push8  x -> P x 8  ; Push24 x -> P x 24
>   Push9  x -> P x 9  ; Push25 x -> P x 25
>   Push10 x -> P x 10 ; Push26 x -> P x 26
>   Push11 x -> P x 11 ; Push27 x -> P x 27
>   Push12 x -> P x 12 ; Push28 x -> P x 28
>   Push13 x -> P x 13 ; Push29 x -> P x 29
>   Push14 x -> P x 14 ; Push30 x -> P x 30
>   Push15 x -> P x 15 ; Push31 x -> P x 31
>   Push16 x -> P x 16 ; Push32 x -> P x 32
>   _ -> NotPush


                         APPENDIX C - SYMBOL TABLES

This code declares the `Op` type and bytecode conversions.

> type I = Integer
> data Op
>   = Stop       | Lt       | Pop      | BlockHash    | Address
>   | Add        | Gt       | MLoad    | Coinbase     | Balance
>   | Mul        | SLT      | MStore   | Timestamp    | Origin
>   | Sub        | SGT      | MStore8  | Number       | Caller
>   | Div        | Eq       | SLoad    | Difficulty   | CallValue
>   | SDiv       | IsZero   | SStore   | GasLimit     | CallDataLoad
>   | Mod        | And      | Jump     | Create       | CallDataSize
>   | SMod       | Or       | JumpI    | Call         | CallDataCopy
>   | AddMod     | Xor      | PC       | CallCode     | CodeSize
>   | MulMod     | Not      | MSize    | Return       | CodeCopy
>   | Exp        | Byte     | Gas      | DelegateCall | GasPrice
>   | SignExtend | SHA3     | JumpDest | Suicide      | ExtCodeSize
>   | Push1  I   | Push17 I | Dup1     | Swap1        | ExtCodeCopy
>   | Push2  I   | Push18 I | Dup2     | Swap2        | Log0
>   | Push3  I   | Push19 I | Dup3     | Swap3        | Log1
>   | Push4  I   | Push20 I | Dup4     | Swap4        | Log2
>   | Push5  I   | Push21 I | Dup5     | Swap5        | Log3
>   | Push6  I   | Push22 I | Dup6     | Swap6        | Log4
>   | Push7  I   | Push23 I | Dup7     | Swap7        -- Meta:
>   | Push8  I   | Push24 I | Dup8     | Swap8        | Invalid Word8
>   | Push9  I   | Push25 I | Dup9     | Swap9        | JumpLabel I
>   | Push10 I   | Push26 I | Dup10    | Swap10       | PushLabel I
>   | Push11 I   | Push27 I | Dup11    | Swap11
>   | Push12 I   | Push28 I | Dup12    | Swap12
>   | Push13 I   | Push29 I | Dup13    | Swap13
>   | Push14 I   | Push30 I | Dup14    | Swap14
>   | Push15 I   | Push31 I | Dup15    | Swap15
>   | Push16 I   | Push32 I | Dup16    | Swap16

> sym :: Word8 -> Either (Int, Integer -> Op) Op
> sym x = case x of
>   0x00 -> r Stop         ; 0x33 -> r Caller       ; 0x5b -> r JumpDest
>   0x01 -> r Add          ; 0x34 -> r CallValue    ; 0xa0 -> r Log0
>   0x02 -> r Mul          ; 0x35 -> r CallDataLoad ; 0xa1 -> r Log1
>   0x03 -> r Sub          ; 0x36 -> r CallDataSize ; 0xa2 -> r Log2
>   0x04 -> r Div          ; 0x37 -> r CallDataCopy ; 0xa3 -> r Log3
>   0x05 -> r SDiv         ; 0x38 -> r CodeSize     ; 0xa4 -> r Log4
>   0x06 -> r Mod          ; 0x39 -> r CodeCopy     ; 0xf0 -> r Create
>   0x07 -> r SMod         ; 0x3a -> r GasPrice     ; 0xf1 -> r Call
>   0x08 -> r AddMod       ; 0x3b -> r ExtCodeSize  ; 0xf2 -> r CallCode
>   0x09 -> r MulMod       ; 0x3c -> r ExtCodeCopy  ; 0xf3 -> r Return
>   0x0A -> r Exp          ; 0x40 -> r BlockHash    ; 0xf4 -> r DelegateCall
>   0x0b -> r SignExtend   ; 0x41 -> r Coinbase     ; 0xff -> r Suicide
>   0x10 -> r Lt           ; 0x42 -> r Timestamp    ; 0x90 -> r Swap1
>   0x11 -> r Gt           ; 0x43 -> r Number       ; 0x91 -> r Swap2
>   0x12 -> r SLT          ; 0x44 -> r Difficulty   ; 0x92 -> r Swap3
>   0x13 -> r SGT          ; 0x45 -> r GasLimit     ; 0x93 -> r Swap4
>   0x14 -> r Eq           ; 0x50 -> r Pop          ; 0x94 -> r Swap5
>   0x15 -> r IsZero       ; 0x51 -> r MLoad        ; 0x95 -> r Swap6
>   0x16 -> r And          ; 0x52 -> r MStore       ; 0x96 -> r Swap7
>   0x17 -> r Or           ; 0x53 -> r MStore8      ; 0x97 -> r Swap8
>   0x18 -> r Xor          ; 0x54 -> r SLoad        ; 0x98 -> r Swap9
>   0x19 -> r Not          ; 0x55 -> r SStore       ; 0x99 -> r Swap10
>   0x1a -> r Byte         ; 0x56 -> r Jump         ; 0x9a -> r Swap11
>   0x20 -> r SHA3         ; 0x57 -> r JumpI        ; 0x9b -> r Swap12
>   0x30 -> r Address      ; 0x58 -> r PC           ; 0x9c -> r Swap13
>   0x31 -> r Balance      ; 0x59 -> r MSize        ; 0x9d -> r Swap14
>   0x32 -> r Origin       ; 0x5a -> r Gas          ; 0x9e -> r Swap15
>   0x60 -> l (1,  Push1)  ; 0x70 -> l (17, Push17) ; 0x80 -> r Dup1
>   0x61 -> l (2,  Push2)  ; 0x71 -> l (18, Push18) ; 0x81 -> r Dup2
>   0x62 -> l (3,  Push3)  ; 0x72 -> l (19, Push19) ; 0x82 -> r Dup3
>   0x63 -> l (4,  Push4)  ; 0x73 -> l (20, Push20) ; 0x83 -> r Dup4
>   0x64 -> l (5,  Push5)  ; 0x74 -> l (21, Push21) ; 0x84 -> r Dup5
>   0x65 -> l (6,  Push6)  ; 0x75 -> l (22, Push22) ; 0x85 -> r Dup6
>   0x66 -> l (7,  Push7)  ; 0x76 -> l (23, Push23) ; 0x86 -> r Dup7
>   0x67 -> l (8,  Push8)  ; 0x77 -> l (24, Push24) ; 0x87 -> r Dup8
>   0x68 -> l (9,  Push9)  ; 0x78 -> l (25, Push25) ; 0x88 -> r Dup9
>   0x69 -> l (10, Push10) ; 0x79 -> l (26, Push26) ; 0x89 -> r Dup10
>   0x6a -> l (11, Push11) ; 0x7a -> l (27, Push27) ; 0x8a -> r Dup11
>   0x6b -> l (12, Push12) ; 0x7b -> l (28, Push28) ; 0x8b -> r Dup12
>   0x6c -> l (13, Push13) ; 0x7c -> l (29, Push29) ; 0x8c -> r Dup13
>   0x6d -> l (14, Push14) ; 0x7d -> l (30, Push30) ; 0x8d -> r Dup14
>   0x6e -> l (15, Push15) ; 0x7e -> l (31, Push31) ; 0x8e -> r Dup15
>   0x6f -> l (16, Push16) ; 0x7f -> l (32, Push32) ; 0x8f -> r Dup16
>   0x9f -> r Swap16       ; _    -> r (Invalid x)
>  where l = Left ; r = Right

> toSym :: Op -> B.ByteString
> toSym op = case op of
>   Stop       -> b 0x00 ; Caller       -> b 0x33 ; JumpDest     -> b 0x5b
>   Add        -> b 0x01 ; CallValue    -> b 0x34 ; Log0         -> b 0xa0
>   Mul        -> b 0x02 ; CallDataLoad -> b 0x35 ; Log1         -> b 0xa1
>   Sub        -> b 0x03 ; CallDataSize -> b 0x36 ; Log2         -> b 0xa2
>   Div        -> b 0x04 ; CallDataCopy -> b 0x37 ; Log3         -> b 0xa3
>   SDiv       -> b 0x05 ; CodeSize     -> b 0x38 ; Log4         -> b 0xa4
>   Mod        -> b 0x06 ; CodeCopy     -> b 0x39 ; Create       -> b 0xf0
>   SMod       -> b 0x07 ; GasPrice     -> b 0x3a ; Call         -> b 0xf1
>   AddMod     -> b 0x08 ; ExtCodeSize  -> b 0x3b ; CallCode     -> b 0xf2
>   MulMod     -> b 0x09 ; ExtCodeCopy  -> b 0x3c ; Return       -> b 0xf3
>   Exp        -> b 0x0A ; BlockHash    -> b 0x40 ; DelegateCall -> b 0xf4
>   SignExtend -> b 0x0b ; Coinbase     -> b 0x41 ; Suicide      -> b 0xff
>   Lt         -> b 0x10 ; Timestamp    -> b 0x42 ; Swap1        -> b 0x90
>   Gt         -> b 0x11 ; Number       -> b 0x43 ; Swap2        -> b 0x91
>   SLT        -> b 0x12 ; Difficulty   -> b 0x44 ; Swap3        -> b 0x92
>   SGT        -> b 0x13 ; GasLimit     -> b 0x45 ; Swap4        -> b 0x93
>   Eq         -> b 0x14 ; Pop          -> b 0x50 ; Swap5        -> b 0x94
>   IsZero     -> b 0x15 ; MLoad        -> b 0x51 ; Swap6        -> b 0x95
>   And        -> b 0x16 ; MStore       -> b 0x52 ; Swap7        -> b 0x96
>   Or         -> b 0x17 ; MStore8      -> b 0x53 ; Swap8        -> b 0x97
>   Xor        -> b 0x18 ; SLoad        -> b 0x54 ; Swap9        -> b 0x98
>   Not        -> b 0x19 ; SStore       -> b 0x55 ; Swap10       -> b 0x99
>   Byte       -> b 0x1a ; Jump         -> b 0x56 ; Swap11       -> b 0x9a
>   SHA3       -> b 0x20 ; JumpI        -> b 0x57 ; Swap12       -> b 0x9b
>   Address    -> b 0x30 ; PC           -> b 0x58 ; Swap13       -> b 0x9c
>   Balance    -> b 0x31 ; MSize        -> b 0x59 ; Swap14       -> b 0x9d
>   Origin     -> b 0x32 ; Gas          -> b 0x5a ; Swap15       -> b 0x9e
>   Dup1       -> b 0x80 ; Dup7         -> b 0x86 ; Dup13        -> b 0x8c
>   Dup2       -> b 0x81 ; Dup8         -> b 0x87 ; Dup14        -> b 0x8d
>   Dup3       -> b 0x82 ; Dup9         -> b 0x88 ; Dup15        -> b 0x8e
>   Dup4       -> b 0x83 ; Dup10        -> b 0x89 ; Dup16        -> b 0x8f
>   Dup5       -> b 0x84 ; Dup11        -> b 0x8a ; Swap16       -> b 0x9f
>   Dup6       -> b 0x85 ; Dup12        -> b 0x8b ; Invalid x    -> unroll x
>   (push -> P x n) -> p x (fromIntegral n)
>  where p x n = (n - 1 + 0x60) `B.cons` word n (unroll x)
>        b     = B.singleton


                        APPENDIX D - HELPER FUNCTIONS

> roll :: B.ByteString -> Integer
> roll = B.foldl' unstep 0
>   where unstep a b = a `shiftL` 8 .|. fromIntegral b

> unroll :: (Integral a, Bits a) => a -> B.ByteString
> unroll = B.reverse . B.unfoldr step
>   where
>     step 0 = Nothing
>     step i = Just (fromIntegral i, i `shiftR` 8)

> word :: Integral a => a -> B.ByteString -> B.ByteString
> word x bs =
>   if B.length bs >= n
>   then
>     B.drop (max 0 $ B.length bs - n) bs
>   else
>     let len = B.length bs
>         z   = B.pack $ take (n - len) (repeat 0)
>     in z `B.append` bs
>  where n = fromIntegral x
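
For readers who want to experiment with these helpers outside the converter,
here is a list-based sketch of the same three functions. It is a re-statement
on `[Word8]` made up for this example (the functions above operate on
`ByteString`), and it shows how a constant becomes the big-endian immediate
bytes of a `PUSH` of a given width.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.List (unfoldr)
import Data.Word (Word8)

-- Big-endian bytes to an unbounded integer.
roll :: [Word8] -> Integer
roll = foldl (\a b -> a `shiftL` 8 .|. fromIntegral b) 0

-- Integer to its minimal big-endian byte string (empty for zero).
unroll :: Integer -> [Word8]
unroll = reverse . unfoldr step
  where
    step 0 = Nothing
    step i = Just (fromIntegral i, i `shiftR` 8)

-- Left-pad with zeros, or keep only the low-order bytes, to exactly n bytes.
word :: Int -> [Word8] -> [Word8]
word n bs
  | len >= n  = drop (len - n) bs
  | otherwise = replicate (n - len) 0 ++ bs
  where len = length bs

main :: IO ()
main = do
  print (unroll 0x010000)           -- [1,0,0]
  print (word 3 (unroll 0x20))      -- [0,0,32]
  print (roll [0x01, 0x00, 0x00])   -- 65536
```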


------------------------------------------------------------------------------
End of transmission.