bradoyler/ETHEREUM-yellowpaper.md

## ETHEREUM-yellowpaper.md

      
    Raw
  

              ETHEREUM-yellowpaper.md
            
          
    source: https://github.com/ethereum/yellowpaper
pdf: http://yellowpaper.io/
Ethereum: A secure decentralized generalized transaction ledger

1 Introduction

With ubiquitous internet connections in most places of the world, global information transmission has become incredibly cheap. Technology-rooted movements like Bitcoin have demonstrated, through the power of the default, consensus mechanisms and voluntary respect of the social contract that it is possible to use the internet to make a decentralised value-transfer system, shared across the world and virtually free to use. This system can be said to be a very specialised version of a cryptographically secure, transaction-based state machine. Follow-up systems such as Namecoin adapted this original currency application of the technology into other applications albeit rather simplistic ones. Ethereum is a project which attempts to build the generalised technology; technology on which all transaction-based state machine concepts may be built. Moreover it aims to provide to the end-developer a tightly integrated end-to-end system for building software on a hitherto unexplored compute paradigm in the mainstream: a trustful object messaging compute framework.
1.1 Driving Factors

There are many goals of this project; one key goal is to facilitate transactions between consenting individuals who would otherwise have no means to trust one another. This may be due to geographical separation, interfacing difficulty, or perhaps the incompatibility, incompetence, unwillingness, expense, uncertainty, inconvenience or corruption of existing legal systems. By specifying a state-change system through a rich and unambiguous language, and furthermore architecting a system such that we can reasonably expect that an agreement will be thus enforced autonomously, we can provide a means to this end. Dealings in this proposed system would have several attributes not often found in the real world. The incorruptibility of judgement, often difficult to find, comes naturally from a disinterested algorithmic interpreter. Transparency, or being able to see exactly how a state or judgement came about through the transaction log and rules or instructional codes, never happens perfectly in human-based systems since natural language is necessarily vague, information is often lacking, and plain old prejudices are difficult to shake. Overall, I wish to provide a system such that users can be guaranteed that no matter with which other individuals, systems or organisations they interact, they can do so with absolute confidence in the possible outcomes and how those outcomes might come about.
1.2 Previous Work
first proposed the kernel of this work in late November, 2013. Though now evolved in many ways, the key functionality of a block-chain with a Turing-complete language and an effectively unlimited inter-transaction storage capability remains unchanged. provided the first work into the usage of a cryptographic proof of computational expenditure (proof-of-work) as a means of transmitting a value signal over the Internet. The value-signal was utilised here as a spam deterrence mechanism rather than any kind of currency, but critically demonstrated the potential for a basic data channel to carry a \textitstrong economic signal, allowing a receiver to make a physical assertion without having to rely upon \textittrust. later produced a system in a similar vein. The first example of utilising the proof-of-work as a strong economic signal to secure a currency was by . In this instance, the token was used to keep peer-to-peer file trading in check, ensuring consumers¢¢ be able to make micro-payments to suppliers¢¢ for their services. The security model afforded by the proof-of-work was augmented with digital signatures and a ledger in order to ensure that the historical record couldn¢t be corrupted and that malicious actors could not spoof payment or unjustly complain about service delivery. Five years later, introduced another such proof-of-work-secured value token, somewhat wider in scope. The fruits of this project, Bitcoin, became the first widely adopted global decentralised transaction ledger. Other projects built on Bitcoin¢s success; the alt-coins introduced numerous other currencies through alteration to the protocol. Some of the best known are Litecoin and Primecoin, discussed by . Other projects sought to take the core value content mechanism of the protocol and repurpose it; discusses, for example, the Namecoin project which aims to provide a decentralised name-resolution system. Other projects still aim to build upon the Bitcoin network itself, leveraging the large amount of value placed in the system and the vast amount of computation that goes into the consensus mechanism. The Mastercoin project, first proposed by , aims to build a richer protocol involving many additional high-level features on top of the Bitcoin protocol through utilisation of a number of auxiliary parts to the core protocol. The Coloured Coins project, proposed by , takes a similar but more simplified strategy, embellishing the rules of a transaction in order to break the fungibility of Bitcoin¢s base currency and allow the creation and tracking of tokens through a special chroma-wallet¢¢-protocol-aware piece of software. Additional work has been done in the area with discarding the decentralisation foundation; Ripple, discussed by , has sought to create a federated¢¢ system for currency exchange, effectively creating a new financial clearing system. It has demonstrated that high efficiency gains can be made if the decentralisation premise is discarded. Early work on smart contracts has been done by and . Around the 1990s it became clear that algorithmic enforcement of agreements could become a significant force in human cooperation. Though no specific system was proposed to implement such a system, it was proposed that the future of law would be heavily affected by such systems. In this light, Ethereum may be seen as a general implementation of such a \textitcrypto-law system.
2 The Blockchain Paradigm

Ethereum, taken as a whole, can be viewed as a transaction-based state machine: we begin with a genesis state and incrementally execute transactions to morph it into some final state. It is this final state which we accept as the canonical ``version¢¢ of the world of Ethereum. The state can include such information as account balances, reputations, trust arrangements, data pertaining to information of the physical world; in short, anything that can currently be represented by a computer is admissible. Transactions thus represent a valid arc between two states; the `valid¢ part is important---there exist far more invalid state changes than valid state changes. Invalid state changes might, \eg, be things such as reducing an account balance without an equal and opposite increase elsewhere. A valid state transition is one which comes about through a transaction. Formally:
\boldsymbolst+1 º U(\boldsymbolst, T)    (0)
where U is the Ethereum state transition function. In Ethereum, U, together with \boldsymbols are considerably more powerful then any existing comparable system; U allows components to carry out arbitrary computation, while \boldsymbols allows components to store arbitrary state between transactions. Transactions are collated into blocks; blocks are chained together using a cryptographic hash as a means of reference. Blocks function as a journal, recording a series of transactions together with the previous block and an identifier for the final state (though do not store the final state itself---that would be far too big). They also punctuate the transaction series with incentives for nodes to \textitmine. This incentivisation takes place as a state-transition function, adding value to a nominated account. Mining is the process of dedicating effort (working) to bolster one series of transactions (a block) over any other potential competitor block. It is achieved thanks to a cryptographically secure proof. This scheme is known as a proof-of-work and is discussed in detail in section (162). Formally, we expand to:
\boldsymbolst+1
º
P(\boldsymbolst, B)
(1)
B
º
(..., (T0, T1, ...) )
(2)
P(\boldsymbols, B)
º
W(B, U(U(\boldsymbols, T0), T1) ...)
(3)
Where W is the block-finalisation state transition function (a function that rewards a nominated party); B is this block, which includes a series of transactions amongst some other components; and P is the block-level state-transition function. This is the basis of the blockchain paradigm, a model that forms the backbone of not only Ethereum, but all decentralised consensus-based transaction systems to date.
2.1 Value
In order to incentivise computation within the network, there needs to be an agreed method for transmitting value. To address this issue, Ethereum has an intrinsic currency, Ether, known also as ETH and sometimes referred to by the Old English \DH. The smallest subdenomination of Ether, and thus the one in which all integer values of the currency are counted, is the Wei. One Ether is defined as being 1018Wei. There exist other subdenominations of Ether:
\begincenter \begintabularrl Trule Multiplier & Name
| rule 100 & Wei
1012& Szabo
1015& Finney
1018& Ether
\bottomrule \endtabular \endcenter
Throughout the present work, any reference to value, in the context of Ether, currency, a balance or a payment, should be assumed to be counted in Wei.
2.2 Which History?
Since the system is decentralised and all parties have an opportunity to create a new block on some older pre-existing block, the resultant structure is necessarily a tree of blocks. In order to form a consensus as to which path, from root (the genesis block) to leaf (the block containing the most recent transactions) through this tree structure, known as the blockchain, there must be an agreed-upon scheme. If there is ever a disagreement between nodes as to which root-to-leaf path down the block tree is the `best¢ blockchain, then a \textitfork occurs. This would mean that past a given point in time (block), multiple states of the system may coexist: some nodes believing one block to contain the canonical transactions, other nodes believing some other block to be canonical, potentially containing radically different or incompatible transactions. This is to be avoided at all costs as the uncertainty that would ensue would likely kill all confidence in the entire system. The scheme we use in order to generate consensus is a simplified version of the GHOST protocol introduced by . This process is described in detail in section (143). Sometimes, a path follows a new protocol from a particular height. This document describes one version of the protocol. In order to follow back the history of a path, one might need to reference multiple versions of this document.
3 Conventions
I use a number of typographical conventions for the formal notation, some of which are quite particular to the present work: The two sets of highly structured, top-level¢, state values, are denoted with bold lowercase Greek letters. They fall into those of world-state, which are denoted \boldsymbols (or a variant thereupon) and those of machine-state, \boldsymbolm. Functions operating on highly structured values are denoted with an upper-case Greek letter, \eg U, the Ethereum state transition function. For most functions, an uppercase letter is used, e.g. C, the general cost function. These may be subscripted to denote specialised variants, \eg C\text\tiny SSTORE, the cost function for the \tiny SSTORE operation. For specialised and possibly externally defined functions, I may format as typewriter text, \eg the Keccak-256 hash function (as per the winning entry to the SHA-3 contest) is denoted \textttKEC (and generally referred to as plain Keccak). Also \textttKEC512 is referring to the Keccak 512 hash function. Tuples are typically denoted with an upper-case letter, \eg T, is used to denote an Ethereum transaction. This symbol may, if accordingly defined, be subscripted to refer to an individual component, \eg Tn, denotes the nonce of said transaction. The form of the subscript is used to denote its type; \eg uppercase subscripts refer to tuples with subscriptable components. Scalars and fixed-size byte sequences (or, synonymously, arrays) are denoted with a normal lower-case letter, \eg n is used in the document to denote a transaction nonce. Those with a particularly special meaning may be Greek, \eg d, the number of items required on the stack for a given operation. Arbitrary-length sequences are typically denoted as a bold lower-case letter, \eg \mathbfo is used to denote the byte sequence given as the output data of a message call. For particularly important values, a bold uppercase letter may be used. Throughout, we assume scalars are positive integers and thus belong to the set \mathbbP. The set of all byte sequences is \mathbbB, formally defined in Appendix (163). If such a set of sequences is restricted to those of a particular length, it is denoted with a subscript, thus the set of all byte sequences of length 32 is named \mathbbB32 and the set of all positive integers smaller than 2256is named \mathbbP256. This is formally defined in section (16). Square brackets are used to index into and reference individual components or subsequences of sequences, \eg \boldsymbolm\mathbfs[0] denotes the first item on the machine¢s stack. For subsequences, ellipses are used to specify the intended range, to include elements at both limits, \eg \boldsymbolm\mathbfm[0..31] denotes the first 32 items of the machine¢s memory. In the case of the global state \boldsymbols, which is a sequence of accounts, themselves tuples, the square brackets are used to reference an individual account. When considering variants of existing values, I follow the rule that within a given scope for definition, if we assume that the unmodified input¢ value be denoted by the placeholder [¯] then the modified and utilisable value is denoted as [¯]¢, and intermediate values would be [¯], [¯]**&c. On very particular occasions, in order to maximise readability and only if unambiguous in meaning, I may use alpha-numeric subscripts to denote intermediate values, especially those of particular note. When considering the use of existing functions, given a function f, the function f denotes a similar, element-wise version of the function mapping instead between sequences. It is formally defined in section (16). I define a number of useful functions throughout. One of the more common is l, which evaluates to the last item in the given sequence:
l(\mathbfx) º \mathbfx[\lVert \mathbfx \rVert - 1]    (4)
4 Blocks, State and Transactions
Having introduced the basic concepts behind Ethereum, we will discuss the meaning of a transaction, a block and the state in more detail.
4.1 World State
The world state (\textitstate), is a mapping between addresses (160-bit identifiers) and account states (a data structure serialised as RLP, see Appendix (163)). Though not stored on the blockchain, it is assumed that the implementation will maintain this mapping in a modified Merkle Patricia tree (\textittrie, see Appendix (180)). The trie requires a simple database backend that maintains a mapping of bytearrays to bytearrays; we name this underlying database the state database. This has a number of benefits; firstly the root node of this structure is cryptographically dependent on all internal data and as such its hash can be used as a secure identity for the entire system state. Secondly, being an immutable data structure, it allows any previous state (whose root hash is known) to be recalled by simply altering the root hash accordingly. Since we store all such root hashes in the blockchain, we are able to trivially revert to old states. The account state comprises the following four fields: \begindescription
nonce  A scalar value equal to the number of transactions sent from this address or, in the case of accounts with associated code, the number of contract-creations made by this account. For account of address a in state \boldsymbols, this would be formally denoted \boldsymbols[a]n.
balance  A scalar value equal to the number of Wei owned by this address. Formally denoted \boldsymbols[a]b.
storageRoot  A 256-bit hash of the root node of a Merkle Patricia tree that encodes the storage contents of the account (a mapping between 256-bit integer values), encoded into the trie as a mapping from the Keccak 256-bit hash of the 256-bit integer keys to the RLP-encoded 256-bit integer values. The hash is formally denoted \boldsymbols[a]s.
codeHash  The hash of the EVM code of this account---this is the code that gets executed should this address receive a message call; it is immutable and thus, unlike all other fields, cannot be changed after construction. All such code fragments are contained in the state database under their corresponding hashes for later retrieval. This hash is formally denoted \boldsymbols[a]c, and thus the code may be denoted as \mathbfb, given that \texttt KEC(\mathbfb) = \boldsymbols[a]c. \enddescription Since I typically wish to refer not to the trie¢s root hash but to the underlying set of key/value pairs stored within, I define a convenient equivalence:
\texttt TRIE\big(LI*(\boldsymbols[a]\mathbfs)\big) º \boldsymbols[a]s    (5)
The collapse function for the set of key/value pairs in the trie, LI*, is defined as the element-wise transformation of the base function LI, given as:
LI\big( (k, v) \big) º \big(\texttt KEC(k), \texttt RLP(v)\big)    (6)
where:
k Î \mathbbB32     Ù     v Î \mathbbP    (7)
It shall be understood that \boldsymbols[a]\mathbfs is not a `physical¢ member of the account and does not contribute to its later serialisation. If the \textbfcodeHash field is the Keccak-256 hash of the empty string, i.e. \boldsymbols[a]c = \texttt KEC\big(()\big), then the node represents a simple account, sometimes referred to as a ``non-contract¢¢ account. Thus we may define a world-state collapse function LS:
LS(\boldsymbols) º { p(a): \boldsymbols[a] ¹ \varnothing }    (8)
where
p(a) º \big(\texttt KEC(a), \texttt RLP\big( (\boldsymbols[a]n, \boldsymbols[a]b, \boldsymbols[a]s, \boldsymbols[a]c) \big) \big)    (9)
This function, LS, is used alongside the trie function to provide a short identity (hash) of the world state. We assume:
" a: \boldsymbols[a] = \varnothing    Ú    (a Î \mathbbB20    Ù    v(\boldsymbols[a]))    (10)
where v is the account validity function:
v(x) º xn Î \mathbbP256 Ù xb Î \mathbbP256 Ù xs Î \mathbbB32 Ù xc Î \mathbbB32    (11)
4.2 The Transaction
A transaction (formally, T) is a single cryptographically-signed instruction constructed by an actor externally to the scope of Ethereum. While it is assumed that the ultimate external actor will be human in nature, software tools will be used in its construction and dissemination Notably, such tools¢ could ultimately become so causally removed from their human-based initiation---or humans may become so causally-neutral---that there could be a point at which they rightly be considered autonomous agents. \eg contracts may offer bounties to humans for being sent transactions to initiate their execution.. There are two types of transactions: those which result in message calls and those which result in the creation of new accounts with associated code (known informally as contract creation¢). Both types specify a number of common fields: \begindescription
nonce  A scalar value equal to the number of transactions sent by the sender; formally Tn.
gasPrice  A scalar value equal to the number of Wei to be paid per unit of \textitgas for all computation costs incurred as a result of the execution of this transaction; formally Tp.
gasLimit  A scalar value equal to the maximum amount of gas that should be used in executing this transaction. This is paid up-front, before any computation is done and may not be increased later; formally Tg.
to  The 160-bit address of the message call¢s recipient or, for a contract creation transaction, \varnothing, used here to denote the only member of \mathbbB0 ; formally Tt.
value  A scalar value equal to the number of Wei to be transferred to the message call¢s recipient or, in the case of contract creation, as an endowment to the newly created account; formally Tv.
v, r, s  Values corresponding to the signature of the transaction and used to determine the sender of the transaction; formally Tw, Tr and Ts. This is expanded in Appendix (215). \enddescription Additionally, a contract creation transaction contains: \begindescription
init  An unlimited size byte array specifying the EVM-code for the account initialisation procedure, formally T\mathbfi. \enddescription \textbfinit is an EVM-code fragment; it returns the \textbfbody, a second fragment of code that executes each time the account receives a message call (either through a transaction or due to the internal execution of code). \textbfinit is executed only once at account creation and gets discarded immediately thereafter. In contrast, a message call transaction contains: \begindescription
data  An unlimited size byte array specifying the input data of the message call, formally T\mathbfd. \enddescription Appendix (215)specifies the function, S, which maps transactions to the sender, and happens through the ECDSA of the SECP-256k1 curve, using the hash of the transaction (excepting the latter three signature fields) as the datum to sign. For the present we simply assert that the sender of a given transaction T can be represented with S(T).
LT(T) º \begincases (Tn, Tp, Tg, Tt, Tv, T\mathbfi, Tw, Tr, Ts) & \textif    Tt = \varnothing
(Tn, Tp, Tg, Tt, Tv, T\mathbfd, Tw, Tr, Ts) & \textotherwise \endcases    (12)
Here, we assume all components are interpreted by the RLP as integer values, with the exception of the arbitrary length byte arrays T\mathbfi and T\mathbfd.
Tn Î \mathbbP256
Ù
Tv Î \mathbbP256
Ù
Tp Î \mathbbP256
Ù
Tg Î \mathbbP256
Ù
Tw Î \mathbbP5
Ù
Tr Î \mathbbP256
Ù
Ts Î \mathbbP256
Ù
T\mathbfd Î \mathbbB
Ù
T\mathbfi Î \mathbbB
(13)
where
\mathbbPn = { P: P Î \mathbbP Ù P < 2n }    (14)
The address hash T\mathbft is slightly different: it is either a 20-byte address hash or, in the case of being a contract-creation transaction (and thus formally equal to \varnothing), it is the RLP empty byte sequence and thus the member of \mathbbB0:
T\mathbft Î \begincases \mathbbB20 & \textif     Tt ¹ \varnothing
\mathbbB0 & \textotherwise\endcases    (15)
4.3 The Block
The block in Ethereum is the collection of relevant pieces of information (known as the block \textitheader), H, together with information corresponding to the comprised transactions, \mathbfT, and a set of other block headers \mathbfU that are known to have a parent equal to the present block¢s parent¢s parent (such blocks are known as \textitommers \textitommer is the most prevalent (not saying much) gender-neutral term to mean ``sibling of parent¢¢; see \urlhttps://nonbinary.miraheze.org/wiki/Gender_neutral_language#Aunt.2FUncle). The block header contains several pieces of information: \begindescription
parentHash  The Keccak 256-bit hash of the parent block¢s header, in its entirety; formally Hp.
ommersHash  The Keccak 256-bit hash of the ommers list portion of this block; formally Ho.
beneficiary  The 160-bit address to which all fees collected from the successful mining of this block be transferred; formally Hc.
stateRoot  The Keccak 256-bit hash of the root node of the state trie, after all transactions are executed and finalisations applied; formally Hr.
transactionsRoot  The Keccak 256-bit hash of the root node of the trie structure populated with each transaction in the transactions list portion of the block; formally Ht.
receiptsRoot  The Keccak 256-bit hash of the root node of the trie structure populated with the receipts of each transaction in the transactions list portion of the block; formally He.
logsBloom  The Bloom filter composed from indexable information (logger address and log topics) contained in each log entry from the receipt of each transaction in the transactions list; formally Hb.
difficulty  A scalar value corresponding to the difficulty level of this block. This can be calculated from the previous block¢s difficulty level and the timestamp; formally Hd.
number  A scalar value equal to the number of ancestor blocks. The genesis block has a number of zero; formally Hi.
gasLimit  A scalar value equal to the current limit of gas expenditure per block; formally Hl.
gasUsed  A scalar value equal to the total gas used in transactions in this block; formally Hg.
timestamp  A scalar value equal to the reasonable output of Unix¢s time() at this block¢s inception; formally Hs.
extraData  An arbitrary byte array containing data relevant to this block. This must be 32 bytes or fewer; formally Hx.
mixHash  A 256-bit hash which proves combined with the nonce that a sufficient amount of computation has been carried out on this block; formally Hm.
nonce  A 64-bit hash which proves combined with the mix-hash that a sufficient amount of computation has been carried out on this block; formally Hn. \enddescription The other two components in the block are simply a list of ommer block headers (of the same format as above) and a series of the transactions. Formally, we can refer to a block B:
B º (BH, B\mathbfT, B\mathbfU)    (16)
4.3.1 Transaction Receipt
In order to encode information about a transaction concerning which it may be useful to form a zero-knowledge proof, or index and search, we encode a receipt of each transaction containing certain information from concerning its execution. Each receipt, denoted B\mathbfR[i] for the ith transaction, is placed in an index-keyed trie and the root recorded in the header as He. The transaction receipt is a tuple of four items comprising the post-transaction state, R\boldsymbols, the cumulative gas used in the block containing the transaction receipt as of immediately after the transaction has happened, Ru, the set of logs created through execution of the transaction, R\mathbfl and the Bloom filter composed from information in those logs, Rb:
R º (R\boldsymbols, Ru, Rb, R\mathbfl)    (17)
The function LR trivially prepares a transaction receipt for being transformed into an RLP-serialised byte array:
LR(R) º (\mathtt TRIE(LS(R\boldsymbols)), Ru, Rb, R\mathbfl)    (18)
thus the post-transaction state, R\boldsymbols is encoded into a trie structure, the root of which forms the first item. We assert Ru, the cumulative gas used is a positive integer and that the logs Bloom, Rb, is a hash of size 2048 bits (256 bytes):
Ru Î \mathbbP     Ù     Rb Î \mathbbB256    (19)
The log entries, R\mathbfl, is a series of log entries, termed, for example, (O0, O1, ...). A log entry, O, is a tuple of a logger¢s address, Oa, a series of 32-bytes log topics, O\mathbft and some number of bytes of data, O\mathbfd:
O º (Oa, (O\mathbft0, O\mathbft1, ...), O\mathbfd)    (20)
Oa Î \mathbbB20     Ù     "t Î O\mathbft: t Î \mathbbB32     Ù     O\mathbfd Î \mathbbB    (21)
We define the Bloom filter function, M, to reduce a log entry into a single 256-byte hash:
M(O) º Út Î {Oa} È O\mathbft \big( M3:2048(t) \big)    (22)
where M3:2048 is a specialised Bloom filter that sets three bits out of 2048, given an arbitrary byte sequence. It does this through taking the low-order 11 bits of each of the first three pairs of bytes in a Keccak-256 hash of the byte sequence. Formally:
M3:2048(\mathbfx: \mathbfx Î \mathbbB)
º
\mathbfy: \mathbfy Î \mathbbB256     \textwhere:
(23)
\mathbfy

(0, 0, ..., 0)     \textexcept:
(24)
"i Î {0, 2, 4}
:
\mathcalBm(\mathbfx, i)(\mathbfy) = 1
(25)
m(\mathbfx, i)
º
\mathtt\tiny KEC(\mathbfx)[i, i + 1] \bmod 2048
(26)
where \mathcalB is the bit reference function such that \mathcalBj(\mathbfx) equals the bit of index j (indexed from 0) in the byte array \mathbfx.
4.3.2 Holistic Validity
We can assert a block¢s validity if and only if it satisfies several conditions: it must be internally consistent with the ommer and transaction block hashes and the given transactions B\mathbfT (as specified in sec (145)), when executed in order on the base state \boldsymbols (derived from the final state of the parent block), result in a new state of the identity Hr:
Hr
º
\mathtt TRIE(LS(P(\boldsymbols, B)))
Ù
Ho
º
\mathtt KEC(\mathtt RLP(LH*(B\mathbfU)))
Ù
Ht
º
\mathtt TRIE({" i < \lVert B\mathbfT \rVert, i Î \mathbbP: p(i, LT(B\mathbfT[i]))})
Ù
He
º
\mathtt TRIE({" i < \lVert B\mathbfR \rVert, i Î \mathbbP: p(i, LR(B\mathbfR[i]))})
Ù
Hb
º
Ú\mathbfr Î B\mathbfR \big( \mathbfrb \big)
(27)
where p(k, v) is simply the pairwise RLP transformation, in this case, the first being the index of the transaction in the block and the second being the transaction receipt:
p(k, v) º \big( \mathtt RLP(k), \mathtt RLP(v) \big)    (28)
Furthermore:
\mathtt TRIE(LS(\boldsymbols)) = P(BH)Hr    (29)
Thus \texttt TRIE(LS(\boldsymbols)) is the root node hash of the Merkle Patricia tree structure containing the key-value pairs of the state \boldsymbols with values encoded using RLP, and P(BH) is the parent block of B, defined directly. The values stemming from the computation of transactions, specifically the transaction receipts, B\mathbfR, and that defined through the transactions state-accumulation function, P, are formalised later in section (153).
4.3.3 Serialisation
The function LB and LH are the preparation functions for a block and block header respectively. Much like the transaction receipt preparation function LR, we assert the types and order of the structure for when the RLP transformation is required:
LH(H)
º
(
Hp, Ho, Hc, Hr, Ht, He, Hb, Hd,
Hi, Hl, Hg, Hs, Hx, Hm, Hn    )
(30)
LB(B)
º
\big( LH(BH), LT*(B\mathbfT), LH*(B\mathbfU) \big)
(31)
With LT* and LH* being element-wise sequence transformations, thus:
f*\big( (x0, x1, ...) \big) º \big( f(x0), f(x1), ... \big)     \textfor any function    f    (32)
The component types are defined thus:
Hp Î \mathbbB32
Ù
Ho Î \mathbbB32
Ù
Hc Î \mathbbB20
Ù
Hr Î \mathbbB32
Ù
Ht Î \mathbbB32
Ù
He Î \mathbbB32
Ù
Hb Î \mathbbB256
Ù
Hd Î \mathbbP
Ù
Hi Î \mathbbP
Ù
Hl Î \mathbbP
Ù
Hg Î \mathbbP
Ù
Hs Î \mathbbP256
Ù
Hx Î \mathbbB
Ù
Hm Î \mathbbB32
Ù
Hn Î \mathbbB8
(33)
where
\mathbbBn = { B: B Î \mathbbB Ù \lVert B \rVert = n }    (34)
We now have a rigorous specification for the construction of a formal block structure. The RLP function \texttt RLP (see Appendix (163)) provides the canonical method for transforming this structure into a sequence of bytes ready for transmission over the wire or storage locally.
4.3.4 Block Header Validity
We define P(BH) to be the parent block of B, formally:
P(H) º B¢: \mathtt\tiny KEC(\mathtt\tiny RLP(B¢H)) = Hp    (35)
The block number is the parent¢s block number incremented by one:
Hi º P(H)Hi + 1    (36)
\newcommand\mindifficultyD_0 \newcommand\homesteadmod\ensuremathV_2 \newcommandexpdiffsymb\ensuremathe\newcommand\diffadjustmentx The canonical difficulty of a block of header H is defined as D(H):
D(H) º \begindcases \mindifficulty & \textif     Hi = 0
\textmax	 	\mindifficulty, P(H)Hd + \diffadjustment×\homesteadmod + expdiffsymb	 	& \textotherwise
\enddcases    (37)
where:
\mindifficulty º 131072    (38)
\diffadjustment º lfloor
P(H)Hd
2048
rfloor    (39)
\homesteadmod º \textmax	 	1 - lfloor
Hs - P(H)Hs
10
rfloor, -99	 	    (40)
expdiffsymb º lfloor 2 ë Hi ¸ 100000 û - 2		rfloor    (41)
The canonical gas limit Hl of a block of header H must fulfil the relation:
Hl < P(H)Hl + lfloor
P(H)Hl
1024
rfloor     Ù
(42)
Hl > P(H)Hl - lfloor
P(H)Hl
1024
rfloor     Ù
(43)
Hl ³ slant 5000
(44)
Hs is the timestamp of block H and must fulfil the relation:
Hs > P(H)Hs    (45)
This mechanism enforces a homeostasis in terms of the time between blocks; a smaller period between the last two blocks results in an increase in the difficulty level and thus additional computation required, lengthening the likely next period. Conversely, if the period is too large, the difficulty, and expected time to the next block, is reduced. The nonce, Hn, must satisfy the relations:
n £ slant
2256
Hd
Ù     m = Hm    (46)
with (n, m) = \mathttPoW(H\hcanceln, Hn, \mathbfd). Where H\hcanceln is the new block¢s header H, but \textitwithout the nonce and mix-hash components, \mathbfd being the current DAG, a large data set needed to compute the mix-hash, and \mathttPoW is the proof-of-work function (see section (162)): this evaluates to an array with the first item being the mix-hash, to proof that a correct DAG has been used, and the second item being a pseudo-random number cryptographically dependent on H and \mathbfd. Given an approximately uniform distribution in the range [0, 264), the expected time to find a solution is proportional to the difficulty, Hd. This is the foundation of the security of the blockchain and is the fundamental reason why a malicious node cannot propagate newly created blocks that would otherwise overwrite (``rewrite¢¢) history. Because the nonce must satisfy this requirement, and because its satisfaction depends on the contents of the block and in turn its composed transactions, creating new, valid, blocks is difficult and, over time, requires approximately the total compute power of the trustworthy portion of the mining peers. Thus we are able to define the block header validity function V(H):
V(H)
º
n £ slant
2256
Hd
Ù m = Hm     Ù
(47)
Hd = D(H)     Ù
(48)
Hg £ Hl     Ù
(49)
Hl < P(H)Hl + lfloor
P(H)Hl
1024
rfloor     Ù
(50)
Hl > P(H)Hl - lfloor
P(H)Hl
1024
rfloor     Ù
(51)
Hl ³ slant 5000     Ù
(52)
Hs > P(H)Hs     Ù
(53)
Hi = P(H)Hi +1     Ù
(54)
\lVert Hx \rVert £ 32
(55)
where (n, m) = \mathttPoW(H\hcanceln, Hn, \mathbfd) Noting additionally that \textbfextraData must be at most 32 bytes.
5 Gas and Payment
In order to avoid issues of network abuse and to sidestep the inevitable questions stemming from Turing completeness, all programmable computation in Ethereum is subject to fees. The fee schedule is specified in units of \textitgas (see Appendix (226)for the fees associated with various computation). Thus any given fragment of programmable computation (this includes creating contracts, making message calls, utilising and accessing account storage and executing operations on the virtual machine) has a universally agreed cost in terms of gas. Every transaction has a specific amount of gas associated with it: \textbfgasLimit. This is the amount of gas which is implicitly purchased from the sender¢s account balance. The purchase happens at the according \textbfgasPrice, also specified in the transaction. The transaction is considered invalid if the account balance cannot support such a purchase. It is named \textbfgasLimit since any unused gas at the end of the transaction is refunded (at the same rate of purchase) to the sender¢s account. Gas does not exist outside of the execution of a transaction. Thus for accounts with trusted code associated, a relatively high gas limit may be set and left alone. In general, Ether used to purchase gas that is not refunded is delivered to the \textitbeneficiary address, the address of an account typically under the control of the miner. Transactors are free to specify any \textbfgasPrice that they wish, however miners are free to ignore transactions as they choose. A higher gas price on a transaction will therefore cost the sender more in terms of Ether and deliver a greater value to the miner and thus will more likely be selected for inclusion by more miners. Miners, in general, will choose to advertise the minimum gas price for which they will execute transactions and transactors will be free to canvas these prices in determining what gas price to offer. Since there will be a (weighted) distribution of minimum acceptable gas prices, transactors will necessarily have a trade-off to make between lowering the gas price and maximising the chance that their transaction will be mined in a timely manner.
6 Transaction Execution
The execution of a transaction is the most complex part of the Ethereum protocol: it defines the state transition function U. It is assumed that any transactions executed first pass the initial tests of intrinsic validity. These include:
The transaction is well-formed RLP, with no additional trailing bytes;
the transaction signature is valid;
the transaction nonce is valid (equivalent to the sender account¢s current nonce);
the gas limit is no smaller than the intrinsic gas, g0, used by the transaction;
the sender account balance contains at least the cost, v0, required in up-front payment.
Formally, we consider the function U, with T being a transaction and \boldsymbols the state:
\boldsymbols¢ = U(\boldsymbols, T)    (56)
Thus \boldsymbols¢ is the post-transactional state. We also define Ug to evaluate to the amount of gas used in the execution of a transaction and U\mathbflto evaluate to the transaction¢s accrued log items, both to be formally defined later.
6.1 Substate
Throughout transaction execution, we accrue certain information that is acted upon immediately following the transaction. We call this \textittransaction substate, and represent it as A, which is a tuple:
A º (A\mathbfs, A\mathbfl, Ar)    (57)
The tuple contents include A\mathbfs, the self-destruct set: a set of accounts that will be discarded following the transaction¢s completion. A\mathbfl is the log series: this is a series of archived and indexable `checkpoints¢ in VM code execution that allow for contract-calls to be easily tracked by onlookers external to the Ethereum world (such as decentralised application front-ends). Finally there is Ar, the refund balance, increased through using the SSTORE instruction in order to reset contract storage to zero from some non-zero value. Though not immediately refunded, it is allowed to partially offset the total execution costs. For brevity, we define the empty substate A0 to have no self-destructs, no logs and a zero refund balance:
A0 º (\varnothing, (), 0)    (58)
6.2 Execution
We define intrinsic gas g0, the amount of gas this transaction requires to be paid prior to execution, as follows: \beginalign g_0 º & å_i Î T_\mathbfi, T_\mathbfd \begincases G_txdatazero & \textif     i = 0
G_txdatanonzero & \textotherwise \endcases
& + \begincases G_\texttxcreate & \textif     T_t = \varnothing
0 & \textotherwise \endcases
& + G_transaction \endalign where T\mathbfi,T\mathbfd means the series of bytes of the transaction¢s associated data and initialisation EVM-code, depending on whether the transaction is for contract-creation or message-call. G\texttxcreate is added if the transaction is contract-creating, but not if a result of EVM-code. G is fully defined in Appendix (226). The up-front cost v0 is calculated as:
v0 º Tg Tp + Tv    (59)
The validity is determined as:
S(T)
¹
\varnothing     Ù
\boldsymbols[S(T)]
¹
\varnothing     Ù
Tn

\boldsymbols[S(T)]n     Ù
g0
£ slant
Tg     Ù
v0
£ slant
\boldsymbols[S(T)]b     Ù
Tg
£ slant
BHl - l(B\mathbfR)u
(60)
Note the final condition; the sum of the transaction¢s gas limit, Tg, and the gas utilised in this block prior, given by l(B\mathbfR)u, must be no greater than the block¢s \textbfgasLimit, BHl. The execution of a valid transaction begins with an irrevocable change made to the state: the nonce of the account of the sender, S(T), is incremented by one and the balance is reduced by part of the up-front cost, TgTp. The gas available for the proceeding computation, g, is defined as Tg - g0. The computation, whether contract creation or a message call, results in an eventual state (which may legally be equivalent to the current state), the change to which is deterministic and never invalid: there can be no invalid transactions from this point. We define the checkpoint state \boldsymbols0:
\boldsymbols0
º
\boldsymbols     \textexcept:
(61)
\boldsymbols0[S(T)]b
º
\boldsymbols[S(T)]b - Tg Tp
(62)
\boldsymbols0[S(T)]n
º
\boldsymbols[S(T)]n + 1
(63)
Evaluating \boldsymbolsP from \boldsymbols0 depends on the transaction type; either contract creation or message call; we define the tuple of post-execution provisional state \boldsymbolsP, remaining gas g¢ and substate A:
(\boldsymbolsP, g¢, A) º \begincasesL(\boldsymbols0, S(T), To, &
g, Tp, Tv, T\mathbfi, 0) & \textif     Tt = \varnothing
Q3(\boldsymbols0, S(T), To, &
Tt, Tt, g, Tp, Tv, Tv, T\mathbfd, 0) & \textotherwise \endcases    (64)
where g is the amount of gas remaining after deducting the basic amount required to pay for the existence of the transaction:
g º Tg - g0    (65)
and To is the original transactor, which can differ from the sender in the case of a message call or contract creation not directly triggered by a transaction but coming from the execution of EVM-code. Note we use Q3 to denote the fact that only the first three components of the function¢s value are taken; the final represents the message-call¢s output value (a byte array) and is unused in the context of transaction evaluation. After the message call or contract creation is processed, the state is finalised by determining the amount to be refunded, g* from the remaining gas, g¢, plus some allowance from the refund counter, to the sender at the original rate.
g* º g¢ + \min { \Bigë \dfracTg - g¢2 \Bigû, Ar }    (66)
The total refundable amount is the legitimately remaining gas g¢, added to Ar, with the latter component being capped up to a maximum of half (rounded down) of the total amount used Tg - g¢. The Ether for the gas is given to the miner, whose address is specified as the beneficiary of the present block B. So we define the pre-final state \boldsymbols* in terms of the provisional state \boldsymbolsP:
\boldsymbols*
º
\boldsymbolsP     \textexcept
(67)
\boldsymbols*[S(T)]b
º
\boldsymbolsP[S(T)]b + g* Tp
(68)
\boldsymbols*[m]b
º
\boldsymbolsP[m]b + (Tg - g*) Tp
(69)
m
º
BHc
(70)
The final state, \boldsymbols¢, is reached after deleting all accounts that appear in the self-destruct set:
\boldsymbols¢
º
\boldsymbols*     \textexcept
(71)
" i Î A\mathbfs: \boldsymbols¢[i]
º
\varnothing
(72)
And finally, we specify Ug, the total gas used in this transaction and U\mathbfl, the logs created by this transaction:
Ug(\boldsymbols, T)
º
Tg - g¢
(73)
U\mathbfl	(\boldsymbols, T)
º
A\mathbfl
(74)
These are used to help define the transaction receipt, discussed later.
7 Contract Creation
There are a number of intrinsic parameters used when creating an account: sender (s), original transactor (o), available gas (g), gas price (p), endowment (v) together with an arbitrary length byte array, \mathbfi, the initialisation EVM code and finally the present depth of the message-call/contract-creation stack (e). We define the creation function formally as the function L, which evaluates from these values, together with the state \boldsymbols to the tuple containing the new state, remaining gas and accrued transaction substate (\boldsymbols¢, g¢, A), as in section (56):
(\boldsymbols¢, g¢, A) º L(\boldsymbols, s, o, g, p, v, \mathbfi, e)    (75)
The address of the new account is defined as being the rightmost 160 bits of the Keccak hash of the RLP encoding of the structure containing only the sender and the nonce. Thus we define the resultant address for the new account a:
a º \mathcalB96..255\Big(\mathtt\tiny KEC\Big(\mathtt\tiny RLP\big(  (s, \boldsymbols[s]n - 1)  \big)\Big)\Big)    (76)
where \mathtt\tiny KEC is the Keccak 256-bit hash function, \mathtt\tiny RLP is the RLP encoding function, \mathcalBa..b(X) evaluates to binary value containing the bits of indices in the range [a, b] of the binary data X and \boldsymbols[x] is the address state of x or \varnothing if none exists. Note we use one fewer than the sender¢s nonce value; we assert that we have incremented the sender account¢s nonce prior to this call, and so the value used is the sender¢s nonce at the beginning of the responsible transaction or VM operation. The account¢s nonce is initially defined as zero, the balance as the value passed, the storage as empty and the code hash as the Keccak 256-bit hash of the empty string; the sender¢s balance is also reduced by the value passed. Thus the mutated state becomes \boldsymbols*:
\boldsymbols* º \boldsymbols     \textexcept:    (77)
\boldsymbols*[a]
º
\big( 0, v + v¢, \mathtt\tiny TRIE(\varnothing), \mathtt\tiny KEC\big(()\big) \big)
(78)
\boldsymbols*[s]b
º
\boldsymbols[s]b - v
(79)
where v¢ is the account¢s pre-existing value, in the event it was previously in existence:
v¢ º \begincases 0 & \textif     \boldsymbols[a] = \varnothing
\boldsymbols[a]b & \textotherwise \endcases    (80)
Finally, the account is initialised through the execution of the initialising EVM code \mathbfi according to the execution model (see section (113)). Code execution can effect several events that are not internal to the execution state: the account¢s storage can be altered, further accounts can be created and further message calls can be made. As such, the code execution function X evaluates to a tuple of the resultant state \boldsymbols**, available gas remaining g**, the accrued substate A and the body code of the account \mathbfo.
(\boldsymbols**	, g**	, A, \mathbfo) º X(\boldsymbols*, g, I)
(81)
where I contains the parameters of the execution environment as defined in section (113), that is:
Ia
º
a
(82)
Io
º
o
(83)
Ip
º
p
(84)
I\mathbfd
º
()
(85)
Is
º
s
(86)
Iv
º
v
(87)
I\mathbfb
º
\mathbfi
(88)
Ie
º
e
(89)
I\mathbfd evaluates to the empty tuple as there is no input data to this call. IH has no special treatment and is determined from the blockchain. Code execution depletes gas, and gas may not go below zero, thus execution may exit before the code has come to a natural halting state. In this (and several other) exceptional cases we say an out-of-gas (OOG) exception has occurred: The evaluated state is defined as being the empty set, \varnothing, and the entire create operation should have no effect on the state, effectively leaving it as it was immediately prior to attempting the creation. If the initialization code completes successfully, a final contract-creation cost is paid, the code-deposit cost, c, proportional to the size of the created contract¢s code:
c º Gcodedeposit × |\mathbfo|    (90)
If there is not enough gas remaining to pay this, ie g**< c, then we also declare an out-of-gas exception. The gas remaining will be zero in any such exceptional condition, ie if the creation was conducted as the reception of a transaction, then this doesn¢t affect payment of the intrinsic cost of contract creation; it is paid regardless. However, the value of the transaction is not transferred to the aborted contract¢s address when we are out-of-gas. If such an exception does not occur, then the remaining gas is refunded to the originator and the now-altered state is allowed to persist. Thus formally, we may specify the resultant state, gas and substate as (\boldsymbols¢, g¢, A) where: \beginalign     g¢ & º \begincases 0 & \textif     F
g**- c & \textotherwise
\endcases
\boldsymbols¢ & º \begincases \boldsymbols & \textif     F
\boldsymbols**    \textexcept: &
\boldsymbols¢[a]_c = \texttt KEC(\mathbfo) & \textotherwise \endcases
\textwhere
F & º \big(\boldsymbols**= \varnothing Ú g**< c Ú |\mathbfo| > 24576\big) \endalign The exception in the determination of \boldsymbols¢ dictates that \mathbfo, the resultant byte sequence from the execution of the initialisation code, specifies the final body code for the newly-created account. Note that intention is that the result is either a successfully created new contract with its endowment, or no new contract with no transfer of value.
7.1 Subtleties
Note that while the initialisation code is executing, the newly created address exists but with no intrinsic body code. Thus any message call received by it during this time causes no code to be executed. If the initialisation execution ends with a SELFDESTRUCT instruction, the matter is moot since the account will be deleted before the transaction is completed. For a normal STOP code, or if the code returned is otherwise empty, then the state is left with a zombie account, and any remaining balance will be locked into the account forever.
8 Message Call
In the case of executing a message call, several parameters are required: sender (s), transaction originator (o), recipient (r), the account whose code is to be executed (c, usually the same as recipient), available gas (g), value (v) and gas price (p) together with an arbitrary length byte array, \mathbfd, the input data of the call and finally the present depth of the message-call/contract-creation stack (e). Aside from evaluating to a new state and transaction substate, message calls also have an extra component---the output data denoted by the byte array \mathbfo. This is ignored when executing transactions, however message calls can be initiated due to VM-code execution and in this case this information is used.
(\boldsymbols¢, g¢, A, \mathbfo) º Q(\boldsymbols, s, o, r, c, g, p, v, ~ev, \mathbfd, e)
Note that we need to differentiate between the value that is to be transferred, v, from the value apparent in the execution context, ~ev, for the DELEGATECALL instruction. We define \boldsymbols1, the first transitional state as the original state but with the value transferred from sender to recipient:
\boldsymbols1[r]b º \boldsymbols[r]b + v    Ù    \boldsymbols1[s]b º \boldsymbols[s]b - v    (91)
unless s = r. Throughout the present work, it is assumed that if \boldsymbols1[r] was originally undefined, it will be created as an account with no code or state and zero balance and nonce. Thus the previous equation should be taken to mean:
\boldsymbols1 º \boldsymbols1¢     \textexcept:
(92)
\boldsymbols1[s]b º \boldsymbols1¢[s]b - v    (93)
\textand    \boldsymbols1¢ º \boldsymbols     \textexcept:
(94)
\begincases \boldsymbols1¢[r] º (0, v, \mathtt\tiny TRIE(\varnothing), \mathtt\tiny KEC(())) & \textif     \boldsymbols[r] = \varnothing
\boldsymbols1¢[r]b º \boldsymbols[r]b + v & \textotherwise \endcases    (95)
The account¢s associated code (identified as the fragment whose Keccak hash is \boldsymbols[c]c) is executed according to the execution model (see section (113)). Just as with contract creation, if the execution halts in an exceptional fashion (i.e. due to an exhausted gas supply, stack underflow, invalid jump destination or invalid instruction), then no gas is refunded to the caller and the state is reverted to the point immediately prior to balance transfer (i.e. \boldsymbols).
\boldsymbols¢
º
\begincases \boldsymbols
\textif     \boldsymbols**	= \varnothing
(96)
\boldsymbols**
\textotherwise \endcases
(97)
g¢
º
\begincases 0
\textif     \boldsymbols**	= \varnothing
(98)
g**
\textotherwise \endcases
(99)
(\boldsymbols**	, g**	, A, \mathbfo)
º
\begincasesX\mathttECREC(\boldsymbols1, g, I)
\textif     r = 1
(100)
X\mathttSHA256(\boldsymbols1, g, I)
\textif     r = 2
(101)
X\mathttRIP160(\boldsymbols1, g, I)
\textif     r = 3
(102)
X\mathttID(\boldsymbols1, g, I)
\textif     r = 4
(103)
X(\boldsymbols1, g, I)
\textotherwise \endcases
(104)
Ia
º
r
(105)
Io
º
o
(106)
Ip
º
p
(107)
I\mathbfd
º
\mathbfd
(108)
Is
º
s
(109)
Iv
º
~ev
(110)
Ie
º
e
(111)
\textLet    \mathtt\tiny KEC(I\mathbfb)

\boldsymbols[c]c
(112)
It is assumed that the client will have stored the pair (\mathtt\tiny KEC(I\mathbfb), I\mathbfb) at some point prior in order to make the determination of I\mathbfb feasible. As can be seen, there are four exceptions to the usage of the general execution framework X for evaluation of the message call: these are four so-called `precompiled¢ contracts, meant as a preliminary piece of architecture that may later become \textitnative extensions. The four contracts in addresses 1, 2, 3 and 4 execute the elliptic curve public key recovery function, the SHA2 256-bit hash scheme, the RIPEMD 160-bit hash scheme and the identity function respectively. Their full formal definition is in Appendix (188).
9 Execution Model
The execution model specifies how the system state is altered given a series of bytecode instructions and a small tuple of environmental data. This is specified through a formal model of a virtual state machine, known as the Ethereum Virtual Machine (EVM). It is a \textitquasi-Turing-complete machine; the \textitquasi qualification comes from the fact that the computation is intrinsically bounded through a parameter, \textitgas, which limits the total amount of computation done.
9.1 Basics
The EVM is a simple stack-based architecture. The word size of the machine (and thus size of stack item) is 256-bit. This was chosen to facilitate the Keccak-256 hash scheme and elliptic-curve computations. The memory model is a simple word-addressed byte array. The stack has a maximum size of 1024. The machine also has an independent storage model; this is similar in concept to the memory but rather than a byte array, it is a word-addressable word array. Unlike memory, which is volatile, storage is non volatile and is maintained as part of the system state. All locations in both storage and memory are well-defined initially as zero. The machine does not follow the standard von Neumann architecture. Rather than storing program code in generally-accessible memory or storage, it is stored separately in a virtual ROM interactable only through a specialised instruction. The machine can have exceptional execution for several reasons, including stack underflows and invalid instructions. Like the out-of-gas exception, they do not leave state changes intact. Rather, the machine halts immediately and reports the issue to the execution agent (either the transaction processor or, recursively, the spawning execution environment) which will deal with it separately.
9.2 Fees Overview
Fees (denominated in gas) are charged under three distinct circumstances, all three as prerequisite to the execution of an operation. The first and most common is the fee intrinsic to the computation of the operation (see Appendix (226)). Secondly, gas may be deducted in order to form the payment for a subordinate message call or contract creation; this forms part of the payment for CREATE, CALL and CALLCODE. Finally, gas may be paid due to an increase in the usage of the memory. Over an account¢s execution, the total fee for memory-usage payable is proportional to smallest multiple of 32 bytes that are required such that all memory indices (whether for read or write) are included in the range. This is paid for on a just-in-time basis; as such, referencing an area of memory at least 32 bytes greater than any previously indexed memory will certainly result in an additional memory usage fee. Due to this fee it is highly unlikely addresses will ever go above 32-bit bounds. That said, implementations must be able to manage this eventuality. Storage fees have a slightly nuanced behaviour---to incentivise minimisation of the use of storage (which corresponds directly to a larger state database on all nodes), the execution fee for an operation that clears an entry in the storage is not only waived, a qualified refund is given; in fact, this refund is effectively paid up-front since the initial usage of a storage location costs substantially more than normal usage. See Appendix (226)for a rigorous definition of the EVM gas cost.
9.3 Execution Environment
In addition to the system state \boldsymbols, and the remaining gas for computation g, there are several pieces of important information used in the execution environment that the execution agent must provide; these are contained in the tuple I:
Ia, the address of the account which owns the code that is executing.
Io, the sender address of the transaction that originated this execution.
Ip, the price of gas in the transaction that originated this execution.
I\mathbfd, the byte array that is the input data to this execution; if the execution agent is a transaction, this would be the transaction data.
Is, the address of the account which caused the code to be executing; if the execution agent is a transaction, this would be the transaction sender.
Iv, the value, in Wei, passed to this account as part of the same procedure as execution; if the execution agent is a transaction, this would be the transaction value.
I\mathbfb, the byte array that is the machine code to be executed.
IH, the block header of the present block.
Ie, the depth of the present message-call or contract-creation (i.e. the number of CALLs or CREATEs being executed at present).
The execution model defines the function X, which can compute the resultant state \boldsymbols¢, the remaining gas g¢, the accrued substate A and the resultant output, \mathbfo, given these definitions. For the present context, we will define it as:
(\boldsymbols¢, g¢, A, \mathbfo) º X(\boldsymbols, g, I)    (113)
where we will remember that A, the accrued substate is defined as the tuple of the suicides set \mathbfs, the log series \mathbfl and the refunds r:
A º (\mathbfs, \mathbfl, r)    (114)
9.4 Execution Overview
We must now define the X function. In most practical implementations this will be modelled as an iterative progression of the pair comprising the full system state, \boldsymbols and the machine state, \boldsymbolm. Formally, we define it recursively with a function X. This uses an iterator function O (which defines the result of a single cycle of the state machine) together with functions Z which determines if the present state is an exceptional halting state of the machine and H, specifying the output data of the instruction if and only if the present state is a normal halting state of the machine. The empty sequence, denoted (), is not equal to the empty set, denoted \varnothing; this is important when interpreting the output of H, which evaluates to \varnothing when execution is to continue but a series (potentially empty) when execution should halt.
X(\boldsymbols, g, I)
º
(\boldsymbols¢, \boldsymbolm¢g, A, \mathbfo)
(115)
(\boldsymbols, \boldsymbolm¢, A, ..., \mathbfo)
º
X\big((\boldsymbols, \boldsymbolm, A0, I)\big)
(116)
\boldsymbolmg
º
g
(117)
\boldsymbolmpc
º
0
(118)
\boldsymbolm\mathbfm
º
(0, 0, ...)
(119)
\boldsymbolmi
º
0
(120)
\boldsymbolm\mathbfs
º
()
(121)
X\big( (\boldsymbols, \boldsymbolm, A, I) \big) º \begincases \big(\varnothing, \boldsymbolm, A0, I, ()\big) & \textif     Z(\boldsymbols, \boldsymbolm, I)
O(\boldsymbols, \boldsymbolm, A, I) · \mathbfo & \textif     \mathbfo ¹ \varnothing
X\big(O(\boldsymbols, \boldsymbolm, A, I)\big) & \textotherwise
\endcases    (122)
where
\mathbfo
º
H(\boldsymbolm, I)
(123)
(a, b, c, d) · e
º
(a, b, c, d, e)
(124)
Note that, when we evaluate X, we drop the fourth element I¢ and extract the remaining gas \boldsymbolm¢g from the resultant machine state \boldsymbolm¢. X is thus cycled (recursively here, but implementations are generally expected to use a simple iterative loop) until either Z becomes true indicating that the present state is exceptional and that the machine must be halted and any changes discarded or until H becomes a series (rather than the empty set) indicating that the machine has reached a controlled halt.
9.4.1 Machine State
The machine state \boldsymbolm is defined as the tuple (g, pc, \mathbfm, i, \mathbfs) which are the gas available, the program counter pc Î \mathbbP256 , the memory contents, the active number of words in memory (counting continuously from position 0), and the stack contents. The memory contents \boldsymbolm\mathbfm are a series of zeroes of size 2256. For the ease of reading, the instruction mnemonics, written in small-caps (\eg \space ADD), should be interpreted as their numeric equivalents; the full table of instructions and their specifics is given in Appendix (226). For the purposes of defining Z, H and O, we define w as the current operation to be executed:
w º \begincases I\mathbfb[\boldsymbolmpc] & \textif     \boldsymbolmpc < \lVert I\mathbfb \rVert
\text STOP & \textotherwise \endcases    (125)
We also assume the fixed amounts of \mathbfd and \mathbfa, specifying the stack items removed and added, both subscriptable on the instruction and an instruction cost function C evaluating to the full cost, in gas, of executing the given instruction.
9.4.2 Exceptional Halting
The exceptional halting function Z is defined as:
Z(\boldsymbols, \boldsymbolm, I) º
\boldsymbolmg < C(\boldsymbols, \boldsymbolm, I)     Ú
\mathbfdw = \varnothing     Ú
\lVert\boldsymbolm\mathbfs\rVert < \mathbfdw     Ú
( w Î { \text JUMP, \text JUMPI }     Ù
\boldsymbolm\mathbfs[0] Ï D(I\mathbfb) )     Ú
\lVert\boldsymbolm\mathbfs\rVert - \mathbfdw + \mathbfaw > 1024

(126)
This states that the execution is in an exceptional halting state if there is insufficient gas, if the instruction is invalid (and therefore its d subscript is undefined), if there are insufficient stack items, if a JUMP/ JUMPI destination is invalid or the new stack size would be larger than 1024. The astute reader will realise that this implies that no instruction can, through its execution, cause an exceptional halt.
9.4.3 Jump Destination Validity
We previously used D as the function to determine the set of valid jump destinations given the code that is being run. We define this as any position in the code occupied by a JUMPDEST instruction. All such positions must be on valid instruction boundaries, rather than sitting in the data portion of PUSH operations and must appear within the explicitly defined portion of the code (rather than in the implicitly defined STOP operations that trail it). Formally:
D(\mathbfc) º DJ(\mathbfc, 0)    (127)
where:
DJ(\mathbfc, i) º \begincases {} & \textif     i ³ slant |\mathbfc|
{ i } È DJ(\mathbfc, N(i, \mathbfc[i])) & \textif     \mathbfc[i] = \text JUMPDEST
DJ(\mathbfc, N(i, \mathbfc[i])) & \textotherwise
\endcases    (128)
where N is the next valid instruction position in the code, skipping the data of a PUSH instruction, if any:
N(i, w) º \begincases i + w - \text PUSH1 + 2 & \textif     w Î [\text PUSH1, \text PUSH32]
i + 1 & \textotherwise \endcases    (129)
9.4.4 Normal Halting
The normal halting function H is defined:
H(\boldsymbolm, I) º \begincases H\text\tiny RETURN(\boldsymbolm) & \textif     w = \text RETURN
() & \textif     w Î { \text STOP, \text SELFDESTRUCT }
\varnothing & \textotherwise \endcases    (130)
The data-returning halt operation, \text RETURN, has a special function H\text\tiny RETURN, defined in Appendix (226).
9.5 The Execution Cycle
Stack items are added or removed from the left-most, lower-indexed portion of the series; all other items remain unchanged:
O\big((\boldsymbols, \boldsymbolm, A, I)\big)
º
(\boldsymbols¢, \boldsymbolm¢, A¢, I)
(131)
D
º
\mathbfaw - \mathbfdw
(132)
\lVert\boldsymbolm¢\mathbfs\rVert
º
\lVert\boldsymbolm\mathbfs\rVert + D
(133)
" x Î [\mathbfaw, \lVert\boldsymbolm¢\mathbfs\rVert): \boldsymbolm¢\mathbfs[x]
º
\boldsymbolm\mathbfs[x-D]
(134)
The gas is reduced by the instruction¢s gas cost and for most instructions, the program counter increments on each cycle, for the three exceptions, we assume a function J, subscripted by one of two instructions, which evaluates to the according value:
\boldsymbolm¢g
º
\boldsymbolmg - C(\boldsymbols, \boldsymbolm, I)
(135)
\boldsymbolm¢pc
º
\begincases J\textJUMP(\boldsymbolm)
\textif     w = \text JUMP
(136)
J\textJUMPI(\boldsymbolm)
\textif     w = \text JUMPI
(137)
N(\boldsymbolmpc, w)
\textotherwise \endcases
(138)
In general, we assume the memory, self-destruct set and system state don¢t change:
\boldsymbolm¢\mathbfm
º
\boldsymbolm\mathbfm
(139)
\boldsymbolm¢i
º
\boldsymbolmi
(140)
A¢
º
A
(141)
\boldsymbols¢
º
\boldsymbols
(142)
However, instructions do typically alter one or several components of these values. Altered components listed by instruction are noted in Appendix (226), alongside values for a and d and a formal description of the gas requirements.
10 Blocktree to Blockchain
The canonical blockchain is a path from root to leaf through the entire block tree. In order to have consensus over which path it is, conceptually we identify the path that has had the most computation done upon it, or, the \textitheaviest path. Clearly one factor that helps determine the heaviest path is the block number of the leaf, equivalent to the number of blocks, not counting the unmined genesis block, in the path. The longer the path, the greater the total mining effort that must have been done in order to arrive at the leaf. This is akin to existing schemes, such as that employed in Bitcoin-derived protocols. Since a block header includes the difficulty, the header alone is enough to validate the computation done. Any block contributes toward the total computation or \textittotal difficulty of a chain. Thus we define the total difficulty of block B recursively as:
Bt
º
B¢t + Bd
(143)
B¢
º
P(BH)
(144)
As such given a block B, Bt is its total difficulty, B¢ is its parent block and Bd is its difficulty.
11 Block Finalisation
The process of finalising a block involves four stages:
Validate (or, if mining, determine) ommers;
validate (or, if mining, determine) transactions;
apply rewards;
verify (or, if mining, compute a valid) state and nonce.
11.1 Ommer Validation
The validation of ommer headers means nothing more than verifying that each ommer header is both a valid header and satisfies the relation of Nth-generation ommer to the present block where N £ 6. The maximum of ommer headers is two. Formally:
\lVert B\mathbfU \rVert £ slant 2 ÙU Î B\mathbfU V(U)    Ù    k(U, P(BH)H, 6)    (145)
where k denotes the is-kin¢¢ property: k(U, H, n) º \begincases false & \textif     n = 0  s(U, H) & Ú    k(U, P(H)H, n - 1) & \textotherwise \endcases    (146) and s denotes the is-sibling¢¢ property:
s(U, H) º (P(H) = P(U)   Ù    H ¹ U    Ù    U Ï B(H)\mathbfU)    (147)
where B(H) is the block of the corresponding header H.
11.2 Transaction Validation
The given \textbfgasUsed must correspond faithfully to the transactions listed: BHg, the total gas used in the block, must be equal to the accumulated gas used according to the final transaction:
BHg = l(\mathbfR)u    (148)
11.3 Reward Application
The application of rewards to a block involves raising the balance of the accounts of the beneficiary address of the block and each ommer by a certain amount. We raise the block¢s beneficiary account by Rb; for each ommer, we raise the block¢s beneficiary by an additional (1)/(32) of the block reward and the beneficiary of the ommer gets rewarded depending on the block number. Formally we define the function W:
W(B, \boldsymbols)
º
\boldsymbols¢: \boldsymbols¢ = \boldsymbols     \textexcept:
(149)
\boldsymbols¢[BHc]b

\boldsymbols[BHc]b + (1 +
\lVert B\mathbfU\rVert
32
)Rb
(150)
"U Î B\mathbfU:
(151)
\boldsymbols¢[Uc]b

\boldsymbols[Uc]b + (1 +
1
8
(Ui - BHi)) Rb
If there are collisions of the beneficiary addresses between ommers and the block (i.e. two ommers with the same beneficiary address or an ommer with the same beneficiary address as the present block), additions are applied cumulatively. We define the block reward as 5 Ether:
\textLet     Rb = 5 × 1018	    (152)
11.4 State & Nonce Validation
We may now define the function, G, that maps a block B to its initiation state:
G(B) º \begincases \boldsymbols0 & \textif     P(BH) = \varnothing
\boldsymbolsi: \mathtt TRIE(LS(\boldsymbolsi)) = P(BH)Hr & \textotherwise \endcases    (153)
Here, \mathtt TRIE(LS(\boldsymbolsi)) means the hash of the root node of a trie of state \boldsymbolsi; it is assumed that implementations will store this in the state database, trivial and efficient since the trie is by nature an immutable data structure. And finally define F, the block transition function, which maps an incomplete block B to a complete block B¢:
F(B)
º
B¢:     B¢ = B*     \textexcept:
(154)
B¢n

n:     x £ slant
2256
Hd
(155)
B¢m

m     \textwith (x, m) = \mathttPoW(B*\hcanceln, n, \mathbfd)
(156)
B*
º
B     \textexcept:     B*r = r(P(G(B), B))
(157)
With \mathbfd being a dataset as specified in appendix (233). As specified at the beginning of the present work, P is the state-transition function, which is defined in terms of W, the block finalisation function and U, the transaction-evaluation function, both now well-defined. As previously detailed, \mathbfR[n]\boldsymbols, \mathbfR[n]\mathbfl and \mathbfR[n]u are the nth corresponding states, logs and cumulative gas used after each transaction (\mathbfR[n]b, the fourth component in the tuple, has already been defined in terms of the logs). The former is defined simply as the state resulting from applying the corresponding transaction to the state resulting from the previous transaction (or the block¢s initial state in the case of the first such transaction):
\mathbfR[n]\boldsymbols = \begincases G(B) & \textif     n < 0
U(\mathbfR[n - 1]\boldsymbols, B\mathbfT[n]) & \textotherwise \endcases    (158)
In the case of B\mathbfR[n]u, we take a similar approach defining each item as the gas used in evaluating the corresponding transaction summed with the previous item (or zero, if it is the first), giving us a running total:
\mathbfR[n]u = \begincases 0 & \textif     n < 0
Ug(\mathbfR[n - 1]\boldsymbols, B\mathbfT[n])
+ \mathbfR[n-1]u
& \textotherwise \endcases    (159)
For \mathbfR[n]\mathbfl, we utilise the U\mathbflfunction that we conveniently defined in the transaction execution function.
\mathbfR[n]\mathbfl =U\mathbfl	(\mathbfR[n - 1]\boldsymbols, B\mathbfT[n])    (160)
Finally, we define P as the new state given the block reward function W applied to the final transaction¢s resultant state, l(B\mathbfR)\boldsymbols:
P(\boldsymbols, B) º W(B, l(\mathbfR)\boldsymbols)    (161)
Thus the complete block-transition mechanism, less \mathttPoW, the proof-of-work function is defined.
11.5 Mining Proof-of-Work
The mining proof-of-work (PoW) exists as a cryptographically secure nonce that proves beyond reasonable doubt that a particular amount of computation has been expended in the determination of some token value n. It is utilised to enforce the blockchain security by giving meaning and credence to the notion of difficulty (and, by extension, total difficulty). However, since mining new blocks comes with an attached reward, the proof-of-work not only functions as a method of securing confidence that the blockchain will remain canonical into the future, but also as a wealth distribution mechanism. For both reasons, there are two important goals of the proof-of-work function; firstly, it should be as accessible as possible to as many people as possible. The requirement of, or reward from, specialised and uncommon hardware should be minimised. This makes the distribution model as open as possible, and, ideally, makes the act of mining a simple swap from electricity to Ether at roughly the same rate for anyone around the world. Secondly, it should not be possible to make super-linear profits, and especially not so with a high initial barrier. Such a mechanism allows a well-funded adversary to gain a troublesome amount of the network¢s total mining power and as such gives them a super-linear reward (thus skewing distribution in their favour) as well as reducing the network security. One plague of the Bitcoin world is ASICs. These are specialised pieces of compute hardware that exist only to do a single task. In Bitcoin¢s case the task is the SHA256 hash function. While ASICs exist for a proof-of-work function, both goals are placed in jeopardy. Because of this, a proof-of-work function that is ASIC-resistant (i.e. difficult or economically inefficient to implement in specialised compute hardware) has been identified as the proverbial silver bullet. Two directions exist for ASIC resistance; firstly make it sequential memory-hard, i.e. engineer the function such that the determination of the nonce requires a lot of memory and bandwidth such that the memory cannot be used in parallel to discover multiple nonces simultaneously. The second is to make the type of computation it would need to do general-purpose; the meaning of specialised hardware¢¢ for a general-purpose task set is, naturally, general purpose hardware and as such commodity desktop computers are likely to be pretty close to specialised hardware¢¢ for the task. For Ethereum 1.0 we have chosen the first path. More formally, the proof-of-work function takes the form of \mathttPoW:
m = Hm     Ù     n £ slant
2256
Hd
\textwith     (m, n) = \mathttPoW(H\hcanceln, Hn, \mathbfd)    (162)
Where H\hcanceln is the new block¢s header but \textitwithout the nonce and mix-hash components; Hn is the nonce of the header; \mathbfd is a large data set needed to compute the mixHash and Hd is the new block¢s difficulty value (i.e. the block difficulty from section (143)). \mathttPoW is the proof-of-work function which evaluates to an array with the first item being the mixHash and the second item being a pseudo-random number cryptographically dependent on H and \mathbfd. The underlying algorithm is called Ethash and is described below.
11.5.1 Ethash
Ethash is the PoW algorithm for Ethereum 1.0. It is the latest version of Dagger-Hashimoto, introduced by and , although it can no longer appropriately be called that since many of the original features of both algorithms have been drastically changed in the last month of research and development. The general route that the algorithm takes is as follows: There exists a seed which can be computed for each block by scanning through the block headers up until that point. From the seed, one can compute a pseudorandom cache, Jcacheinit bytes in initial size. Light clients store the cache. From the cache, we can generate a dataset, Jdatasetinit bytes in initial size, with the property that each item in the dataset depends on only a small number of items from the cache. Full clients and miners store the dataset. The dataset grows linearly with time. Mining involves grabbing random slices of the dataset and hashing them together. Verification can be done with low memory by using the cache to regenerate the specific pieces of the dataset that you need, so you only need to store the cache. The large dataset is updated once every Jepoch blocks, so the vast majority of a miner¢s effort will be reading the dataset, not making changes to it. The mentioned parameters as well as the algorithm is explained in detail in appendix (233).
12 Implementing Contracts
There are several patterns of contracts engineering that allow particular useful behaviours; two of these that I will briefly discuss are data feeds and random numbers.
12.1 Data Feeds
A data feed contract is one which provides a single service: it gives access to information from the external world within Ethereum. The accuracy and timeliness of this information is not guaranteed and it is the task of a secondary contract author---the contract that utilises the data feed---to determine how much trust can be placed in any single data feed. The general pattern involves a single contract within Ethereum which, when given a message call, replies with some timely information concerning an external phenomenon. An example might be the local temperature of New York City. This would be implemented as a contract that returned that value of some known point in storage. Of course this point in storage must be maintained with the correct such temperature, and thus the second part of the pattern would be for an external server to run an Ethereum node, and immediately on discovery of a new block, creates a new valid transaction, sent to the contract, updating said value in storage. The contract¢s code would accept such updates only from the identity contained on said server.
12.2 Random Numbers
Providing random numbers within a deterministic system is, naturally, an impossible task. However, we can approximate with pseudo-random numbers by utilising data which is generally unknowable at the time of transacting. Such data might include the block¢s hash, the block¢s timestamp and the block¢s beneficiary address. In order to make it hard for malicious miner to control those values, one should use the BLOCKHASH operation in order to use hashes of the previous 256 blocks as pseudo-random numbers. For a series of such numbers, a trivial solution would be to add some constant amount and hashing the result.
13 Future Directions
The state database won¢t be forced to maintain all past state trie structures into the future. It should maintain an age for each node and eventually discard nodes that are neither recent enough nor checkpoints; checkpoints, or a set of nodes in the database that allow a particular block¢s state trie to be traversed, could be used to place a maximum limit on the amount of computation needed in order to retrieve any state throughout the blockchain. Blockchain consolidation could be used in order to reduce the amount of blocks a client would need to download to act as a full, mining, node. A compressed archive of the trie structure at given points in time (perhaps one in every 10,000th block) could be maintained by the peer network, effectively recasting the genesis block. This would reduce the amount to be downloaded to a single archive plus a hard maximum limit of blocks. Finally, blockchain compression could perhaps be conducted: nodes in state trie that haven¢t sent/received a transaction in some constant amount of blocks could be thrown out, reducing both Ether-leakage and the growth of the state database.
13.1 Scalability
Scalability remains an eternal concern. With a generalised state transition function, it becomes difficult to partition and parallelise transactions to apply the divide-and-conquer strategy. Unaddressed, the dynamic value-range of the system remains essentially fixed and as the average transaction value increases, the less valuable of them become ignored, being economically pointless to include in the main ledger. However, several strategies exist that may potentially be exploited to provide a considerably more scalable protocol. Some form of hierarchical structure, achieved by either consolidating smaller lighter-weight chains into the main block or building the main block through the incremental combination and adhesion (through proof-of-work) of smaller transaction sets may allow parallelisation of transaction combination and block-building. Parallelism could also come from a prioritised set of parallel blockchains, consolidated each block and with duplicate or invalid transactions thrown out accordingly. Finally, verifiable computation, if made generally available and efficient enough, may provide a route to allow the proof-of-work to be the verification of final state.
14 Conclusion
I have introduced, discussed and formally defined the protocol of Ethereum. Through this protocol the reader may implement a node on the Ethereum network and join others in a decentralised secure social operating system. Contracts may be authored in order to algorithmically specify and autonomously enforce rules of interaction.
15 Acknowledgements
Many thanks to Aeron Buchanan for authoring the Homestead revisions, Christoph Jentzsch for authoring the Ethash algorithm and Yoichi Hirai for doing most of the EIP-150 changes. Important maintenance, useful corrections and suggestions were provided by a number of others from the Ethereum DEV organisation and Ethereum community at large including Gustav Simonsson, Pawe\l Bylica, Jutta Steiner, Nick Savers, Viktor Tr\¢on, Marko Simovic, Giacomo Tazzari and, of course, Vitalik Buterin.
16 Availability
The source of this paper is maintained at \urlhttps://github.com/ethereum/yellowpaper/. An auto-generated PDF is located at \urlhttps://ethereum.github.io/yellowpaper/paper.pdf. \bibliographyBiblio \bibliographystyleplainnat \endmulticols \appendix
17 Terminology
\begindescription
External Actor  A person or other entity able to interface to an Ethereum node, but external to the world of Ethereum. It can interact with Ethereum through depositing signed Transactions and inspecting the blockchain and associated state. Has one (or more) intrinsic Accounts.
Address  A 160-bit code used for identifying Accounts.
Account  Accounts have an intrinsic balance and transaction count maintained as part of the Ethereum state. They also have some (possibly empty) EVM Code and a (possibly empty) Storage State associated with them. Though homogenous, it makes sense to distinguish between two practical types of account: those with empty associated EVM Code (thus the account balance is controlled, if at all, by some external entity) and those with non-empty associated EVM Code (thus the account represents an Autonomous Object). Each Account has a single Address that identifies it.
Transaction  A piece of data, signed by an External Actor. It represents either a Message or a new Autonomous Object. Transactions are recorded into each block of the blockchain.
Autonomous Object  A notional object existent only within the hypothetical state of Ethereum. Has an intrinsic address and thus an associated account; the account will have non-empty associated EVM Code. Incorporated only as the Storage State of that account.
Storage State  The information particular to a given Account that is maintained between the times that the Account¢s associated EVM Code runs.
Message  Data (as a set of bytes) and Value (specified as Ether) that is passed between two Accounts, either through the deterministic operation of an Autonomous Object or the cryptographically secure signature of the Transaction.
Message Call  The act of passing a message from one Account to another. If the destination account is associated with non-empty EVM Code, then the VM will be started with the state of said Object and the Message acted upon. If the message sender is an Autonomous Object, then the Call passes any data returned from the VM operation.
Gas  The fundamental network cost unit. Paid for exclusively by Ether (as of PoC-4), which is converted freely to and from Gas as required. Gas does not exist outside of the internal Ethereum computation engine; its price is set by the Transaction and miners are free to ignore Transactions whose Gas price is too low.
Contract  Informal term used to mean both a piece of EVM Code that may be associated with an Account or an Autonomous Object.
Object  Synonym for Autonomous Object.
App  An end-user-visible application hosted in the Ethereum Browser.
Ethereum Browser  (aka Ethereum Reference Client) A cross-platform GUI of an interface similar to a simplified browser (a la Chrome) that is able to host sandboxed applications whose backend is purely on the Ethereum protocol.
Ethereum Virtual Machine  (aka EVM) The virtual machine that forms the key part of the execution model for an Account¢s associated EVM Code.
Ethereum Runtime Environment  (aka ERE) The environment which is provided to an Autonomous Object executing in the EVM. Includes the EVM but also the structure of the world state on which the EVM relies for certain I/O instructions including CALL & CREATE.
EVM Code  The bytecode that the EVM can natively execute. Used to formally specify the meaning and ramifications of a message to an Account.
EVM Assembly  The human-readable form of EVM-code.
LLL  The Lisp-like Low-level Language, a human-writable language used for authoring simple contracts and general low-level language toolkit for trans-compiling to. \enddescription
18 Recursive Length Prefix
This is a serialisation method for encoding arbitrarily structured binary data (byte arrays). We define the set of possible structures \mathbbT:
\mathbbT
º
\mathbbL È \mathbbB
(163)
\mathbbL
º
{ \mathbft: \mathbft = ( \mathbft[0], \mathbft[1], ... )    Ù    "n < \lVert \mathbft \rVert    \mathbft[n] Î \mathbbT }
(164)
\mathbbB
º
{ \mathbfb: \mathbfb = ( \mathbfb[0], \mathbfb[1], ... )    Ù    "n < \lVert \mathbfb \rVert    \mathbfb[n] Î \mathbbO }
(165)
Where \mathbbO is the set of bytes. Thus \mathbbB is the set of all sequences of bytes (otherwise known as byte-arrays, and a leaf if imagined as a tree), \mathbbL is the set of all tree-like (sub-)structures that are not a single leaf (a branch node if imagined as a tree) and \mathbbT is the set of all byte-arrays and such structural sequences. We define the RLP function as \mathtt\tiny RLP through two sub-functions, the first handling the instance when the value is a byte array, the second when it is a sequence of further values:
\mathtt\tiny RLP(\mathbfx) º \begincases Rb(\mathbfx) & \textif     \mathbfx Î \mathbbB
Rl(\mathbfx) & \textotherwise \endcases    (166)
If the value to be serialised is a byte-array, the RLP serialisation takes one of three forms:
If the byte-array contains solely a single byte and that single byte is less than 128, then the input is exactly equal to the output.
If the byte-array contains fewer than 56 bytes, then the output is equal to the input prefixed by the byte equal to the length of the byte array plus 128.
Otherwise, the output is equal to the input prefixed by the minimal-length byte-array which when interpreted as a big-endian integer is equal to the length of the input byte array, which is itself prefixed by the number of bytes required to faithfully encode this length value plus 183.
Formally, we define Rb:
Rb(\mathbfx)
º
\begincases \mathbfx
\textif     \lVert \mathbfx \rVert = 1 Ù \mathbfx[0] < 128
(167)
(128 + \lVert \mathbfx \rVert) · \mathbfx
\textelse if     \lVert \mathbfx \rVert < 56
(168)
\big(183 + \big\lVert \mathtt\tiny BE(\lVert \mathbfx \rVert) \big\rVert \big) · \mathtt\tiny BE(\lVert \mathbfx \rVert) · \mathbfx
\textotherwise \endcases
(169)
\mathtt\tiny BE(x)
º
(b0, b1, ...): b0 ¹ 0 Ù x =	ån = 0n < \lVert \mathbfb \rVert	bn · 256\lVert \mathbfb \rVert - 1 - n
(170)
(a) · (b, c) · (d, e)

(a, b, c, d, e)
(171)
Thus \mathtt\tiny BE is the function that expands a positive integer value to a big-endian byte array of minimal length and the dot operator performs sequence concatenation. If instead, the value to be serialised is a sequence of other items then the RLP serialisation takes one of two forms:
If the concatenated serialisations of each contained item is less than 56 bytes in length, then the output is equal to that concatenation prefixed by the byte equal to the length of this byte array plus 192.
Otherwise, the output is equal to the concatenated serialisations prefixed by the minimal-length byte-array which when interpreted as a big-endian integer is equal to the length of the concatenated serialisations byte array, which is itself prefixed by the number of bytes required to faithfully encode this length value plus 247.
Thus we finish by formally defining Rl:
Rl(\mathbfx)
º
\begincases (192 + \lVert s(\mathbfx) \rVert) · s(\mathbfx)
\textif     \lVert s(\mathbfx) \rVert < 56
(172)
\big(247 + \big\lVert \mathtt\tiny BE(\lVert s(\mathbfx) \rVert) \big\rVert \big) · \mathtt\tiny BE(\lVert s(\mathbfx) \rVert) · s(\mathbfx)
\textotherwise \endcases
(173)
s(\mathbfx)
º
\mathtt\tiny RLP(\mathbfx0) · \mathtt\tiny RLP(\mathbfx1) ...
(174)
If RLP is used to encode a scalar, defined only as a positive integer (\mathbbP or any x for \mathbbPx), it must be specified as the shortest byte array such that the big-endian interpretation of it is equal. Thus the RLP of some positive integer i is defined as:
\mathtt\tiny RLP(i : i Î \mathbbP) º \mathtt\tiny RLP(\mathtt\tiny BE(i))    (175)
When interpreting RLP data, if an expected fragment is decoded as a scalar and leading zeroes are found in the byte sequence, clients are required to consider it non-canonical and treat it in the same manner as otherwise invalid RLP data, dismissing it completely. There is no specific canonical encoding format for signed or floating-point values.
19 Hex-Prefix Encoding
Hex-prefix encoding is an efficient method of encoding an arbitrary number of nibbles as a byte array. It is able to store an additional flag which, when used in the context of the trie (the only context in which it is used), disambiguates between node types. It is defined as the function \mathtt\tiny HP which maps from a sequence of nibbles (represented by the set \mathbbY) together with a boolean value to a sequence of bytes (represented by the set \mathbbB):
\mathtt\tiny HP(\mathbfx, t): \mathbfx Î \mathbbY
º
\begincases (16f(t), 16\mathbfx[0] + \mathbfx[1], 16\mathbfx[2] + \mathbfx[3], ...)
\textif     \lVert \mathbfx \rVert    \textis even
(176)
(16(f(t) + 1) + \mathbfx[0], 16\mathbfx[1] + \mathbfx[2], 16\mathbfx[3] + \mathbfx[4], ...)
\textotherwise \endcases
(177)
f(t)
º
\begincases 2
\textif     t ¹ 0
(178)
0
\textotherwise \endcases
(179)
Thus the high nibble of the first byte contains two flags; the lowest bit encoding the oddness of the length and the second-lowest encoding the flag t. The low nibble of the first byte is zero in the case of an even number of nibbles and the first nibble in the case of an odd number. All remaining nibbles (now an even number) fit properly into the remaining bytes.
20 Modified Merkle Patricia Tree
The modified Merkle Patricia tree (trie) provides a persistent data structure to map between arbitrary-length binary data (byte arrays). It is defined in terms of a mutable data structure to map between 256-bit binary fragments and arbitrary-length binary data, typically implemented as a database. The core of the trie, and its sole requirement in terms of the protocol specification is to provide a single value that identifies a given set of key-value pairs, which may be either a 32 byte sequence or the empty byte sequence. It is left as an implementation consideration to store and maintain the structure of the trie in a manner that allows effective and efficient realisation of the protocol. Formally, we assume the input value \mathfrakI, a set containing pairs of byte sequences:
\mathfrakI = { (\mathbfk0 Î \mathbbB, \mathbfv0 Î \mathbbB), (\mathbfk1 Î \mathbbB, \mathbfv1 Î \mathbbB), ... }    (180)
When considering such a sequence, we use the common numeric subscript notation to refer to a tuple¢s key or value, thus:
"I Î \mathfrakI I º (I0, I1)    (181)
Any series of bytes may also trivially be viewed as a series of nibbles, given an endian-specific notation; here we assume big-endian. Thus:
y(\mathfrakI)

{ (\mathbfk0¢ Î \mathbbY, \mathbfv0 Î \mathbbB), (\mathbfk1¢ Î \mathbbY, \mathbfv1 Î \mathbbB), ... }
(182)
"n     "i: i < 2\lVert\mathbfkn\rVert     \mathbfkn¢[i]
º
\begincasesë \mathbfkn[i ¸ 2] ¸ 16 û
\textif    i    \textis even
(183)
\mathbfkn[ë i ¸ 2 û] \bmod 16
\textotherwise \endcases
(184)
We define the function \texttt TRIE, which evaluates to the root of the trie that represents this set when encoded in this structure:
\texttt TRIE(\mathfrakI) º \texttt KEC(c(\mathfrakI, 0))    (185)
We also assume a function n, the trie¢s node cap function. When composing a node, we use RLP to encode the structure. As a means of reducing storage complexity, for nodes whose composed RLP is fewer than 32 bytes, we store the RLP directly; for those larger we assert prescience of the byte array whose Keccak hash evaluates to our reference. Thus we define in terms of c, the node composition function:
n(\mathfrakI, i) º \begincases () & \textif     \mathfrakI = \varnothing
c(\mathfrakI, i) & \textif     \lVert c(\mathfrakI, i)\rVert < 32
\texttt KEC(c(\mathfrakI, i)) & \textotherwise \endcases    (186)
In a manner similar to a radix tree, when the trie is traversed from root to leaf, one may build a single key-value pair. The key is accumulated through the traversal, acquiring a single nibble from each branch node (just as with a radix tree). Unlike a radix tree, in the case of multiple keys sharing the same prefix or in the case of a single key having a unique suffix, two optimising nodes are provided. Thus while traversing, one may potentially acquire multiple nibbles from each of the other two node types, extension and leaf. There are three kinds of nodes in the trie: \begindescription
Leaf  A two-item structure whose first item corresponds to the nibbles in the key not already accounted for by the accumulation of keys and branches traversed from the root. The hex-prefix encoding method is used and the second parameter to the function is required to be true.
Extension  A two-item structure whose first item corresponds to a series of nibbles of size greater than one that are shared by at least two distinct keys past the accumulation of nibbles keys and branches as traversed from the root. The hex-prefix encoding method is used and the second parameter to the function is required to be false.
Branch  A 17-item structure whose first sixteen items correspond to each of the sixteen possible nibble values for the keys at this point in their traversal. The 17th item is used in the case of this being a terminator node and thus a key being ended at this point in its traversal. \enddescription A branch is then only used when necessary; no branch nodes may exist that contain only a single non-zero entry. We may formally define this structure with the structural composition function c:
c(\mathfrakI, i) º \begincases \texttt RLP\Big( \big(\texttt HP(I0[i .. (\lVert I0\rVert - 1)], true), I1 \big) \Big) & \textif     \lVert \mathfrakI \rVert = 1     \textwhere    $ I: I Î \mathfrakI
\texttt RLP\Big( \big(\texttt HP(I0[i .. (j - 1)], false), n(\mathfrakI, j) \big) \Big) & \textif     i \ne j     \textwhere    j = \arg \maxx : $ \mathbfl: \lVert \mathbfl \rVert = x : "I Î \mathfrakI: I0[0 .. (x - 1)] = \mathbfl
\texttt RLP\Big( (u(0), u(1), ..., u(15), v) \Big) & \textotherwise     \textwhere
u(j)
º
n({ I : I Î \mathfrakI Ù I0[i] = j }, i + 1)
v

\begincases I1
\textif     $ I: I Î \mathfrakI Ù \lVert I0 \rVert = i
()
\textotherwise \endcases
\endcases    (187)
20.1 Trie Database
Thus no explicit assumptions are made concerning what data is stored and what is not, since that is an implementation-specific consideration; we simply define the identity function mapping the key-value set \mathfrakI to a 32-byte hash and assert that only a single such hash exists for any \mathfrakI, which though not strictly true is accurate within acceptable precision given the Keccak hash¢s collision resistance. In reality, a sensible implementation will not fully recompute the trie root hash for each set. A reasonable implementation will maintain a database of nodes determined from the computation of various tries or, more formally, it will memoise the function c. This strategy uses the nature of the trie to both easily recall the contents of any previous key-value set and to store multiple such sets in a very efficient manner. Due to the dependency relationship, Merkle-proofs may be constructed with an O(\log N) space requirement that can demonstrate a particular leaf must exist within a trie of a given root hash.
21 Precompiled Contracts
For each precompiled contract, we make use of a template function, X\mathttPRE, which implements the out-of-gas checking.
X\mathttPRE(\boldsymbols, g, I) º \begincases (\varnothing, 0, A0, ()) & \textif     g < gr
(\boldsymbols, g - gr, A0, \mathbfo) & \textotherwise\endcases    (188)
The precompiled contracts each use these definitions and provide specifications for the \mathbfo (the output data) and gr, the gas requirements. For the elliptic curve DSA recover VM execution function, we also define \mathbfd to be the input data, well-defined for an infinite length by appending zeroes as required. Importantly in the case of an invalid signature (\mathtt\tiny ECDSARECOVER(h, v, r, s) = \varnothing), then we have no output.
X\mathttECREC
º
X\mathttPRE     \textwhere:
(189)
gr

3000
(190)
|\mathbfo|

\begincases 0
\textif     \mathtt\tiny ECDSARECOVER(h, v, r, s) = \varnothing
(191)
32
\textotherwise \endcases
(192)
\textif     |\mathbfo| = 32:
(193)
\mathbfo[0..11]

0
(194)
\mathbfo[12..31]

\mathtt\tiny KEC\big(\mathtt\tiny ECDSARECOVER(h, v, r, s)\big)[12..31]     \textwhere:
(195)
\mathbfd[0..(|I\mathbfd|-1)]

I\mathbfd
(196)
\mathbfd[|I\mathbfd|..]

(0, 0, ...)
(197)
h

\mathbfd[0..31]
(198)
v

\mathbfd[32..63]
(199)
r

\mathbfd[64..95]
(200)
s

\mathbfd[96..127]
(201)
The two hash functions, RIPEMD-160 and SHA2-256 are more trivially defined as an almost pass-through operation. Their gas usage is dependent on the input data size, a factor rounded up to the nearest number of words.
X\mathttSHA256
º
X\mathttPRE     \textwhere:
(202)
gr

60 + 12\Bigé \dfrac|I\mathbfd|32 \Bigù
(203)
\mathbfo[0..31]

\mathtt\tiny SHA256(I\mathbfd)
(204)
X\mathttRIP160
º
X\mathttPRE     \textwhere:
(205)
gr

600 + 120\Bigé \dfrac|I\mathbfd|32 \Bigù
(206)
\mathbfo[0..11]

0
(207)
\mathbfo[12..31]

\mathtt\tiny RIPEMD160(I\mathbfd)
(208)
(209)
For the purposes here, we assume we have well-defined standard cryptographic functions for RIPEMD-160 and SHA2-256 of the form:
\mathtt SHA256(\mathbfi Î \mathbbB)
º
o Î \mathbbB32
(210)
\mathtt RIPEMD160(\mathbfi Î \mathbbB)
º
o Î \mathbbB20
(211)
Finally, the fourth contract, the identity function X\mathttID simply defines the output as the input:
X\mathttID
º
X\mathttPRE     \textwhere:
(212)
gr

15 + 3\Bigé \dfrac|I\mathbfd|32 \Bigù
(213)
\mathbfo

I\mathbfd
(214)
22 Signing Transactions
The method of signing transactions is similar to the Electrum style signatures¢; it utilises the SECP-256k1 curve as described by . It is assumed that the sender has a valid private key pr, which is a randomly selected positive integer (represented as a byte array of length 32 in big-endian form) in the range \hbox[1, \mathtt\tiny secp256k1n - 1]. We assert the functions \mathtt ECDSASIGN, \mathtt ECDSARECOVER and \mathtt ECDSAPUBKEY. These are formally defined in the literature. \mathtt ECDSAPUBKEY(pr Î \mathbbB32) º pu Î \mathbbB64 (215) \mathtt ECDSASIGN(e Î \mathbbB32, pr Î \mathbbB32) º (v Î \mathbbB1, r Î \mathbbB32, s Î \mathbbB32) (216) \mathtt ECDSARECOVER(e Î \mathbbB32, v Î \mathbbB1, r Î \mathbbB32, s Î \mathbbB32) º pu Î \mathbbB64 (217) Where pu is the public key, assumed to be a byte array of size 64 (formed from the concatenation of two positive integers each < 2256) and pr is the private key, a byte array of size 32 (or a single positive integer in the aforementioned range). It is assumed that v is the recovery id¢, a 1 byte value specifying the sign and finiteness of the curve point; this value is in the range of [27, 30], however we declare the upper two possibilities, representing infinite values, invalid. \newcommand\slimit\ensuremath\texts-limit We declare that a signature is invalid unless all the following conditions are true: \beginalign 0 < r &< \mathtt\tiny secp256k1n
0 < s &< \mathtt\tiny secp256k1n ¸ 2 + 1
v & Î {27,28} \endalign where: \beginalign \mathtt\tiny secp256k1n &= 115792089237316195423570985008687907852837564279074904382605163141518161494337 \endalign For a given private key, pr, the Ethereum address A(pr) (a 160-bit value) to which it corresponds is defined as the right most 160-bits of the Keccak hash of the corresponding ECDSA public key:
A(pr) = \mathcalB96..255\big(\mathtt\tiny KEC\big( \mathtt ECDSAPUBKEY(pr) \big) \big)    (218)
The message hash, h(T), to be signed is the Keccak hash of the transaction without the latter three signature components, formally described as Tr, Ts and Tw:
LS(T)
º
\begincases (Tn, Tp, Tg, Tt, Tv, T\mathbfi)
\textif    Tt = 0
(219)
(Tn, Tp, Tg, Tt, Tv, T\mathbfd)
\textotherwise \endcases
(220)
h(T)
º
\mathtt KEC( LS(T) )
(221)
The signed transaction G(T, pr) is defined as:
G(T, pr) º T     \textexcept:
(222)
(Tw, Tr, Ts) = \mathtt ECDSASIGN(h(T), pr)
(223)
We may then define the sender function S of the transaction as:
S(T) º \mathcalB96..255\big(\mathtt\tiny KEC\big( \mathtt ECDSARECOVER(h(T), Tw, Tr, Ts) \big) \big)    (224)
The assertion that the sender of a signed transaction equals the address of the signer should be self-evident:
" T: " pr: S(G(T, pr)) º A(pr)    (225)
23 Fee Schedule
The fee schedule G is a tuple of 31 scalar values corresponding to the relative costs, in gas, of a number of abstract operations that a transaction may effect. \begintabular*\columnwidth[h]lrl Trule Name & Value & Description*
| rule Gzero & 0 & Nothing paid for operations of the set Wzero.
Gbase & 2 & Amount of gas to pay for operations of the set Wbase.
Gverylow & 3 & Amount of gas to pay for operations of the set Wverylow.
Glow & 5 & Amount of gas to pay for operations of the set Wlow.
Gmid & 8 & Amount of gas to pay for operations of the set Wmid.
Ghigh & 10 & Amount of gas to pay for operations of the set Whigh.
Gextcode & 700 & Amount of gas to pay for operations of the set Wextcode.
Gbalance & 400 & Amount of gas to pay for a BALANCE operation.
Gsload & 200 & Paid for a SLOAD operation.
Gjumpdest & 1 & Paid for a JUMPDEST operation.
Gsset & 20000 & Paid for an SSTORE operation when the storage value is set to non-zero from zero.
Gsreset & 5000 & Paid for an SSTORE operation when the storage value¢s zeroness remains unchanged or is set to zero.
Rsclear & 15000 & Refund given (added into refund counter) when the storage value is set to zero from non-zero.
Rselfdestruct & 24000 & Refund given (added into refund counter) for self-destructing an account.
Gselfdestruct & 5000 & Amount of gas to pay for a SELFDESTRUCT operation.
Gcreate & 32000 & Paid for a CREATE operation.
Gcodedeposit & 200 & Paid per byte for a CREATE operation to succeed in placing code into state.
Gcall & 700 & Paid for a CALL operation.
Gcallvalue & 9000 & Paid for a non-zero value transfer as part of the CALL operation.
Gcallstipend & 2300 & A stipend for the called contract subtracted from Gcallvalue for a non-zero value transfer.
Gnewaccount & 25000 & Paid for a CALL or SELFDESTRUCT operation which creates an account.
Gexp & 10 & Partial payment for an EXP operation.
Gexpbyte & 50 & Partial payment when multiplied by é\log256(exponent)ù for the EXP operation.
Gmemory & 3 & Paid for every additional word when expanding memory.
G\texttxcreate & 32000 & Paid by all contract-creating transactions after the Homestead transition.
Gtxdatazero & 4 & Paid for every zero byte of data or code for a transaction.
Gtxdatanonzero & 68 & Paid for every non-zero byte of data or code for a transaction.
Gtransaction & 21000 & Paid for every transaction.
Glog & 375 & Partial payment for a LOG operation.
Glogdata & 8 & Paid for each byte in a LOG operation¢s data.
Glogtopic & 375 & Paid for each topic of a LOG operation.
Gsha3 & 30 & Paid for each SHA3 operation.
Gsha3word & 6 & Paid for each word (rounded up) for input data to a SHA3 operation.
Gcopy & 3 & Partial payment for COPY operations, multiplied by words copied, rounded up.
Gblockhash & 20 & Payment for BLOCKHASH operation.
\bottomrule \endtabular
24 Virtual Machine Specification
When interpreting 256-bit binary values as integers, the representation is big-endian. When a 256-bit machine datum is converted to and from a 160-bit address or hash, the rightwards (low-order for BE) 20 bytes are used and the left most 12 are discarded or filled with zeroes, thus the integer values (when the bytes are interpreted as big-endian) are equivalent.
24.1 Gas Cost
The general gas cost function, C, is defined as:
C(\boldsymbols, \boldsymbolm, I) º Cmem(\boldsymbolm¢i)-Cmem(\boldsymbolmi) + \begincases C\text\tiny SSTORE(\boldsymbols, \boldsymbolm) & \textif     w = \text SSTORE
Gexp & \textif     w = \text EXP Ù \boldsymbolm\mathbfs[1] = 0
Gexp + Gexpbyte×(1+ë\log256(\boldsymbolm\mathbfs[1])û) & \textif     w = \text EXP Ù \boldsymbolm\mathbfs[1] > 0
Gverylow + Gcopy×é\boldsymbolm\mathbfs[2] ¸ 32ù & \textif     w = \text CALLDATACOPY \lor \text CODECOPY
Gextcode + Gcopy×é\boldsymbolm\mathbfs[3] ¸ 32ù & \textif     w = \text EXTCODECOPY
Glog+Glogdata×\boldsymbolm\mathbfs[1] & \textif     w = \text LOG0
Glog+Glogdata×\boldsymbolm\mathbfs[1]+Glogtopic & \textif     w = \text LOG1
Glog+Glogdata×\boldsymbolm\mathbfs[1]+2Glogtopic & \textif     w = \text LOG2
Glog+Glogdata×\boldsymbolm\mathbfs[1]+3Glogtopic & \textif     w = \text LOG3
Glog+Glogdata×\boldsymbolm\mathbfs[1]+4Glogtopic & \textif     w = \text LOG4
C\text\tiny CALL(\boldsymbols, \boldsymbolm) & \textif     w = \text CALL \lor \text CALLCODE \lor \text DELEGATECALL
C\text\tiny SELFDESTRUCT(\boldsymbols, \boldsymbolm) & \textif     w = \text SELFDESTRUCT
Gcreate & \textif     w = \text CREATE
Gsha3+Gsha3word é \mathbfs[1] ¸ 32 ù & \textif     w = \text SHA3
Gjumpdest & \textif     w = \text JUMPDEST
Gsload & \textif     w = \text SLOAD
Gzero & \textif     w Î Wzero
Gbase & \textif     w Î Wbase
Gverylow & \textif     w Î Wverylow
Glow & \textif     w Î Wlow
Gmid & \textif     w Î Wmid
Ghigh & \textif     w Î Whigh
Gextcode & \textif     w Î Wextcode
Gbalance & \textif     w = \text BALANCE
Gblockhash & \textif     w = \text BLOCKHASH
\endcases    (226)
w º \begincases I\mathbfb[\boldsymbolmpc] & \textif     \boldsymbolmpc < \lVert I\mathbfb \rVert
\text STOP & \textotherwise \endcases    (227)
where:
Cmem(a) º Gmemory · a + \Bigë \dfraca2512 \Bigû    (228)
with C\text\tiny CALL, C\text\tiny SELFDESTRUCT and C\text\tiny SSTORE as specified in the appropriate section below. We define the following subsets of instructions: Wzero = { STOP, RETURN} Wbase = { ADDRESS, ORIGIN, CALLER, CALLVALUE, CALLDATASIZE, CODESIZE, GASPRICE, COINBASE,\newline
\hspace*1cm TIMESTAMP, NUMBER, DIFFICULTY, GASLIMIT, POP, PC, MSIZE, GAS} Wverylow = { ADD, SUB, NOT, LT, GT, SLT, SGT, EQ, ISZERO, AND, OR, XOR, BYTE, CALLDATALOAD, \newline
\hspace1cm MLOAD, MSTORE, MSTORE8, PUSH, DUP*, SWAP*} Wlow = { MUL, DIV, SDIV, MOD, SMOD, SIGNEXTEND} Wmid = { ADDMOD, MULMOD, JUMP} Whigh = { JUMPI} Wextcode = { EXTCODESIZE} Note the memory cost component, given as the product of Gmemory and the maximum of 0 & the ceiling of the number of words in size that the memory must be over the current number of words, \boldsymbolmi in order that all accesses reference valid memory whether for read or write. Such accesses must be for non-zero number of bytes. Referencing a zero length range (e.g. by attempting to pass it as the input range to a CALL) does not require memory to be extended to the beginning of the range. \boldsymbolm¢i is defined as this new maximum number of words of active memory; special-cases are given where these two are not equal. Note also that Cmem is the memory cost function (the expansion function being the difference between the cost before and after). It is a polynomial, with the higher-order coefficient divided and floored, and thus linear up to 724B of memory used, after which it costs substantially more. While defining the instruction set, we defined the memory-expansion for range function, M, thus:
M(s, f, l) º \begincases s & \textif     l = 0
\max(s, \ceil (f + l) ¸ 32 ) & \textotherwise \endcases    (229)
Another useful function is ``all but one 64th¢¢ function L defined as:
L(n) º n - ë n / 64 û    (230)
24.2 Instruction Set
As previously specified in section (113), these definitions take place in the final context there. In particular we assume O is the EVM state-progression function and define the terms pertaining to the next cycle¢s state (\boldsymbols¢, \boldsymbolm¢) such that:
O(\boldsymbols, \boldsymbolm, A, I) º (\boldsymbols¢, \boldsymbolm¢, A¢, I)     \textwith exceptions, as noted    (231)
Here given are the various exceptions to the state transition rules given in section (113)specified for each instruction, together with the additional instruction-specific definitions of J and C. For each instruction, also specified is a, the additional items placed on the stack and d, the items removed from stack, as defined in section (113). \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbf0s: Stop and Arithmetic Operations
mlticolumn5lAll arithmetic is modulo 2256unless otherwise noted. The zero-th power of zero 00 is defined to be one. \vspace5pt
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0x00 & STOP & 0 & 0 & Halts execution.
| rule 0x01 & ADD & 2 & 1 & Addition operation.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfs[0] + \boldsymbolm\mathbfs[1]
| rule 0x02 & MUL & 2 & 1 & Multiplication operation.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfs[0] × \boldsymbolm\mathbfs[1]
| rule 0x03 & SUB & 2 & 1 & Subtraction operation.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfs[0] - \boldsymbolm\mathbfs[1]
| rule 0x04 & DIV & 2 & 1 & Integer division operation.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases0 & \textif     \boldsymbolm\mathbfs[1] = 0
ë\boldsymbolm\mathbfs[0] ¸ \boldsymbolm\mathbfs[1]û & \textotherwise\endcases
| rule 0x05 & SDIV & 2 & 1 & Signed integer division operation (truncated).
&&&& \boldsymbolm¢\mathbfs[0] º \begincases0 & \textif     \boldsymbolm\mathbfs[1] = 0
-2255& \textif     \boldsymbolm\mathbfs[0] = -2255Ù     \boldsymbolm\mathbfs[1] = -1
\mathbfsgn (\boldsymbolm\mathbfs[0] ¸ \boldsymbolm\mathbfs[1]) ë |\boldsymbolm\mathbfs[0] ¸ \boldsymbolm\mathbfs[1]| û & \textotherwise\endcases
&&&& Where all values are treated as two¢s complement signed 256-bit integers.
&&&& Note the overflow semantic when -2255is negated.
| rule 0x06 & MOD & 2 & 1 & Modulo remainder operation.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases0 & \textif     \boldsymbolm\mathbfs[1] = 0
\boldsymbolm\mathbfs[0] \bmod \boldsymbolm\mathbfs[1] & \textotherwise\endcases
| rule 0x07 & SMOD & 2 & 1 & Signed modulo remainder operation.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases0 & \textif     \boldsymbolm\mathbfs[1] = 0
\mathbfsgn (\boldsymbolm\mathbfs[0]) (|\boldsymbolm\mathbfs[0]| \bmod |\boldsymbolm\mathbfs[1]|) & \textotherwise\endcases
&&&& Where all values are treated as two¢s complement signed 256-bit integers.
| rule 0x08 & ADDMOD & 3 & 1 & Modulo addition operation.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases0 & \textif     \boldsymbolm\mathbfs[2] = 0
(\boldsymbolm\mathbfs[0] + \boldsymbolm\mathbfs[1]) \mod \boldsymbolm\mathbfs[2] & \textotherwise\endcases
&&&& All intermediate calculations of this operation are not subject to the 2256modulo.
| rule 0x09 & MULMOD & 3 & 1 & Modulo multiplication operation.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases0 & \textif     \boldsymbolm\mathbfs[2] = 0
(\boldsymbolm\mathbfs[0] × \boldsymbolm\mathbfs[1]) \mod \boldsymbolm\mathbfs[2] & \textotherwise\endcases
&&&& All intermediate calculations of this operation are not subject to the 2256modulo.
| rule 0x0a & EXP & 2 & 1 & Exponential operation.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfs[0] \boldsymbolm\mathbfs[1]
| rule 0x0b & SIGNEXTEND & 2 & 1 & Extend length of two¢s complement signed integer.
&&&& " i Î [0..255]: \boldsymbolm¢\mathbfs[0]i º \begincases \boldsymbolm\mathbfs[1]t &\textif     i £ slant t     \textwhere    t = 256 - 8(\boldsymbolm\mathbfs[0] + 1)
\boldsymbolm\mathbfs[1]i &\textotherwise \endcases
mlticolumn5l\boldsymbolm\mathbfs[x]i gives the ith bit (counting from zero) of \boldsymbolm\mathbfs[x] \vspace5pt
\endtabular* \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbf10s: Comparison & Bitwise Logic Operations
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0x10 & LT & 2 & 1 & Less-than comparison.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases 1 & \textif     \boldsymbolm\mathbfs[0] < \boldsymbolm\mathbfs[1]
0 & \textotherwise \endcases
| rule 0x11 & GT & 2 & 1 & Greater-than comparison.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases 1 & \textif     \boldsymbolm\mathbfs[0] > \boldsymbolm\mathbfs[1]
0 & \textotherwise \endcases
| rule 0x12 & SLT & 2 & 1 & Signed less-than comparison.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases 1 & \textif     \boldsymbolm\mathbfs[0] < \boldsymbolm\mathbfs[1]
0 & \textotherwise \endcases
&&&& Where all values are treated as two¢s complement signed 256-bit integers.
| rule 0x13 & SGT & 2 & 1 & Signed greater-than comparison.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases 1 & \textif     \boldsymbolm\mathbfs[0] > \boldsymbolm\mathbfs[1]
0 & \textotherwise \endcases
&&&& Where all values are treated as two¢s complement signed 256-bit integers.
| rule 0x14 & EQ & 2 & 1 & Equality comparison.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases 1 & \textif     \boldsymbolm\mathbfs[0] = \boldsymbolm\mathbfs[1]
0 & \textotherwise \endcases
| rule 0x15 & ISZERO & 1 & 1 & Simple not operator.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases 1 & \textif     \boldsymbolm\mathbfs[0] = 0
0 & \textotherwise \endcases
| rule 0x16 & AND & 2 & 1 & Bitwise AND operation.
&&&& " i Î [0..255]: \boldsymbolm¢\mathbfs[0]i º \boldsymbolm\mathbfs[0]i Ù \boldsymbolm\mathbfs[1]i
| rule 0x17 & OR & 2 & 1 & Bitwise OR operation.
&&&& " i Î [0..255]: \boldsymbolm¢\mathbfs[0]i º \boldsymbolm\mathbfs[0]i Ú \boldsymbolm\mathbfs[1]i
| rule 0x18 & XOR & 2 & 1 & Bitwise XOR operation.
&&&& " i Î [0..255]: \boldsymbolm¢\mathbfs[0]i º \boldsymbolm\mathbfs[0]i Å \boldsymbolm\mathbfs[1]i
| rule 0x19 & NOT & 1 & 1 & Bitwise NOT operation.
&&&& " i Î [0..255]: \boldsymbolm¢\mathbfs[0]i º \begincases 1 & \textif     \boldsymbolm\mathbfs[0]i = 0
0 & \textotherwise \endcases
| rule 0x1a & BYTE & 2 & 1 & Retrieve single byte from word.
&&&& " i Î [0..255]: \boldsymbolm¢\mathbfs[0]i º \begincases \boldsymbolm\mathbfs[1](i + 8\boldsymbolm\mathbfs[0]) & \textif     i < 8 Ù \boldsymbolm\mathbfs[0] < 32
0 & \textotherwise \endcases
&&&& For Nth byte, we count from the left (i.e. N=0 would be the most significant in big endian).
\bottomrule \endtabular* \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbf20s: SHA3 \vspace5pt
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0x20 & SHA3 & 2 & 1 & Compute Keccak-256 hash.
&&&& \boldsymbolm¢\mathbfs[0] º \mathtt\tiny Keccak(\boldsymbolm\mathbfm[ \boldsymbolm\mathbfs[0] \dots (\boldsymbolm\mathbfs[0] + \boldsymbolm\mathbfs[1] - 1) ])
&&&& \boldsymbolm¢i º M(\boldsymbolmi, \boldsymbolm\mathbfs[0], \boldsymbolm\mathbfs[1])
\bottomrule \endtabular* \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbf30s: Environmental Information \vspace5pt
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0x30 & ADDRESS & 0 & 1 & Get address of currently executing account.
&&&& \boldsymbolm¢\mathbfs[0] º Ia
| rule 0x31 & BALANCE & 1 & 1 & Get balance of the given account.
&&&& \boldsymbolm¢\mathbfs[0] º \begincases\boldsymbols[\boldsymbolm\mathbfs[0]]b& \textif     \boldsymbols[\boldsymbolm\mathbfs[0] \mod 2160] ¹ \varnothing
0&\textotherwise\endcases
| rule 0x32 & ORIGIN & 0 & 1 & Get execution origination address.
&&&& \boldsymbolm¢\mathbfs[0] º Io
&&&& This is the sender of original transaction; it is never an account with non-empty
&&&& associated code.
| rule 0x33 & CALLER & 0 & 1 & Get caller address.
&&&& \boldsymbolm¢\mathbfs[0] º Is
&&&& This is the address of the account that is directly responsible for this execution.
| rule 0x34 & CALLVALUE & 0 & 1 & Get deposited value by the instruction/transaction responsible for this execution.
&&&& \boldsymbolm¢\mathbfs[0] º Iv
| rule 0x35 & CALLDATALOAD & 1 & 1 & Get input data of current environment.
&&&& \boldsymbolm¢\mathbfs[0] º I\mathbfd[ \boldsymbolm\mathbfs[0] \dots (\boldsymbolm\mathbfs[0] + 31) ]     \textwith     I\mathbfd[x] = 0     \textif     x ³ slant \lVert I\mathbfd \rVert
&&&& This pertains to the input data passed with the message call instruction or transaction.
| rule 0x36 & CALLDATASIZE & 0 & 1 & Get size of input data in current environment.
&&&& \boldsymbolm¢\mathbfs[0] º \lVert I\mathbfd \rVert
&&&& This pertains to the input data passed with the message call instruction or transaction.
| rule 0x37 & CALLDATACOPY & 3 & 0 & Copy input data in current environment to memory.
&&&& "i Î { 0 \dots \boldsymbolm\mathbfs[2] - 1} \boldsymbolm¢\mathbfm[\boldsymbolm\mathbfs[0] + i ] º \begincases I\mathbfd[\boldsymbolm\mathbfs[1] + i] & \textif     \boldsymbolm\mathbfs[1] + i < \lVert I\mathbfd \rVert
0 & \textotherwise \endcases
&&&& The additions in \boldsymbolm\mathbfs[1] + i are not subject to the 2256modulo.
&&&& \boldsymbolm¢i º M(\boldsymbolmi, \boldsymbolm\mathbfs[0], \boldsymbolm\mathbfs[2])
&&&& This pertains to the input data passed with the message call instruction or transaction.
| rule 0x38 & CODESIZE & 0 & 1 & Get size of code running in current environment.
&&&& \boldsymbolm¢\mathbfs[0] º \lVert I\mathbfb \rVert
| rule 0x39 & CODECOPY & 3 & 0 & Copy code running in current environment to memory.
&&&& "i Î { 0 \dots \boldsymbolm\mathbfs[2] - 1} \boldsymbolm¢\mathbfm[\boldsymbolm\mathbfs[0] + i ] º \begincases I\mathbfb[\boldsymbolm\mathbfs[1] + i] & \textif     \boldsymbolm\mathbfs[1] + i < \lVert I\mathbfb \rVert
\text STOP & \textotherwise \endcases
&&&& \boldsymbolm¢i º M(\boldsymbolmi, \boldsymbolm\mathbfs[0], \boldsymbolm\mathbfs[2])
&&&& The additions in \boldsymbolm\mathbfs[1] + i are not subject to the 2256modulo.
| rule 0x3a & GASPRICE & 0 & 1 & Get price of gas in current environment.
&&&& \boldsymbolm¢\mathbfs[0] º Ip
&&&& This is gas price specified by the originating transaction.
| rule 0x3b & EXTCODESIZE & 1 & 1 & Get size of an account¢s code.
&&&& \boldsymbolm¢\mathbfs[0] º \lVert \boldsymbols[\boldsymbolm\mathbfs[0] \mod 2160]c \rVert
| rule 0x3c & EXTCODECOPY & 4 & 0 & Copy an account¢s code to memory.
&&&& "i Î { 0 \dots \boldsymbolm\mathbfs[3] - 1} \boldsymbolm¢\mathbfm[\boldsymbolm\mathbfs[1] + i ] º \begincases \mathbfc[\boldsymbolm\mathbfs[2] + i] & \textif     \boldsymbolm\mathbfs[2] + i < \lVert \mathbfc \rVert
\text STOP & \textotherwise \endcases
&&&& where \mathbfc º \boldsymbols[\boldsymbolm\mathbfs[0] \mod 2160]c
&&&& \boldsymbolm¢i º M(\boldsymbolmi, \boldsymbolm\mathbfs[1], \boldsymbolm\mathbfs[3])
&&&& The additions in \boldsymbolm\mathbfs[2] + i are not subject to the 2256modulo.
\bottomrule \endtabular* \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbf40s: Block Information \vspace5pt
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0x40 & BLOCKHASH & 1 & 1 & Get the hash of one of the 256 most recent complete blocks.
&&&& \boldsymbolm¢\mathbfs[0] º P(IHp, \boldsymbolm\mathbfs[0], 0)
&&&& where P is the hash of a block of a particular number, up to a maximum age.
&&&& 0 is left on the stack if the looked for block number is greater than the current block number
&&&& or more than 256 blocks behind the current block.
&&&& P(h, n, a) º \begincases 0 & \textif     n > Hi Ú a = 256 Ú h = 0
h & \textif     n = Hi
P(Hp, n, a + 1) & \textotherwise \endcases
&&&& and we assert the header H can be determined as its hash is the parent hash
&&&& in the block following it.
| rule 0x41 & COINBASE & 0 & 1 & Get the block¢s beneficiary address.
&&&& \boldsymbolm¢\mathbfs[0] º IHc
| rule 0x42 & TIMESTAMP & 0 & 1 & Get the block¢s timestamp.
&&&& \boldsymbolm¢\mathbfs[0] º IHs
| rule 0x43 & NUMBER & 0 & 1 & Get the block¢s number.
&&&& \boldsymbolm¢\mathbfs[0] º IHi
| rule 0x44 & DIFFICULTY & 0 & 1 & Get the block¢s difficulty.
&&&& \boldsymbolm¢\mathbfs[0] º IHd
| rule 0x45 & GASLIMIT & 0 & 1 & Get the block¢s gas limit.
&&&& \boldsymbolm¢\mathbfs[0] º IHl
\bottomrule \endtabular* \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbf50s: Stack, Memory, Storage and Flow Operations \vspace5pt
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0x50 & POP & 1 & 0 & Remove item from stack.
| rule 0x51 & MLOAD & 1 & 1 & Load word from memory.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfm[\boldsymbolm\mathbfs[0] \dots (\boldsymbolm\mathbfs[0] + 31) ]
&&&& \boldsymbolm¢i º \max(\boldsymbolmi, \ceil (\boldsymbolm\mathbfs[0] + 32) ¸ 32 )
&&&& The addition in the calculation of \boldsymbolm¢i is not subject to the 2256modulo.
| rule 0x52 & MSTORE & 2 & 0 & Save word to memory.
&&&& \boldsymbolm¢\mathbfm[ \boldsymbolm\mathbfs[0] \dots (\boldsymbolm\mathbfs[0] + 31) ] º \boldsymbolm\mathbfs[1]
&&&& \boldsymbolm¢i º \max(\boldsymbolmi, \ceil (\boldsymbolm\mathbfs[0] + 32) ¸ 32 )
&&&& The addition in the calculation of \boldsymbolm¢i is not subject to the 2256modulo.
| rule 0x53 & MSTORE8 & 2 & 0 & Save byte to memory.
&&&& \boldsymbolm¢\mathbfm[ \boldsymbolm\mathbfs[0] ] º (\boldsymbolm\mathbfs[1] \bmod 256)
&&&& \boldsymbolm¢i º \max(\boldsymbolmi, \ceil (\boldsymbolm\mathbfs[0] + 1) ¸ 32 )
&&&& The addition in the calculation of \boldsymbolm¢i is not subject to the 2256modulo.
| rule 0x54 & SLOAD & 1 & 1 & Load word from storage.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbols[Ia]\mathbfs[\boldsymbolm\mathbfs[0]]
| rule 0x55 & SSTORE & 2 & 0 & Save word to storage.
&&&& \boldsymbols¢[Ia]\mathbfs[ \boldsymbolm\mathbfs[0] ] º \boldsymbolm\mathbfs[1]
&&&& C\text\tiny SSTORE(\boldsymbols, \boldsymbolm) º \begincases Gsset & \textif     \boldsymbolm\mathbfs[1] ¹ 0    Ù    \boldsymbols[Ia]\mathbfs[\boldsymbolm\mathbfs[0]] = 0
Gsreset & \textotherwise \endcases
&&&& A¢r º Ar + \begincases Rsclear & \textif     \boldsymbolm\mathbfs[1] = 0    Ù    \boldsymbols[Ia]\mathbfs[\boldsymbolm\mathbfs[0]] ¹ 0
0 & \textotherwise \endcases
| rule 0x56 & JUMP & 1 & 0 & Alter the program counter.
&&&& J\text\tiny JUMP(\boldsymbolm) º \boldsymbolm\mathbfs[0]
&&&& This has the effect of writing said value to \boldsymbolmpc. See section (113).
| rule 0x57 & JUMPI & 2 & 0 & Conditionally alter the program counter.
&&&& J\text\tiny JUMPI(\boldsymbolm) º \begincases \boldsymbolm\mathbfs[0] & \textif     \boldsymbolm\mathbfs[1] ¹ 0
\boldsymbolmpc + 1 & \textotherwise \endcases
&&&& This has the effect of writing said value to \boldsymbolmpc. See section (113).
| rule 0x58 & PC & 0 & 1 & Get the value of the program counter \textitprior to the increment
&&&& corresponding to this instruction.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolmpc
| rule 0x59 & MSIZE & 0 & 1 & Get the size of active memory in bytes.
&&&& \boldsymbolm¢\mathbfs[0] º 32\boldsymbolmi
| rule 0x5a & GAS & 0 & 1 & Get the amount of available gas, including the corresponding reduction
&&&& for the cost of this instruction.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolmg
| rule 0x5b & JUMPDEST & 0 & 0 & Mark a valid destination for jumps.
&&&& This operation has no effect on machine state during execution.
\bottomrule \endtabular* \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbf60s & 70s: Push Operations \vspace5pt
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0x60 & PUSH1 & 0 & 1 & Place 1 byte item on stack.
&&&& \boldsymbolm¢\mathbfs[0] º c(\boldsymbolmpc + 1)
&&&& \textwhere     c(x) º \begincases I\mathbfb[x] & \textif     x < \lVert I\mathbfb \rVert
0 & \textotherwise \endcases
&&&& The bytes are read in line from the program code¢s bytes array.
&&&& The function c ensures the bytes default to zero if they extend past the limits.
&&&& The byte is right-aligned (takes the lowest significant place in big endian).
| rule 0x61 & PUSH2 & 0 & 1 & Place 2-byte item on stack.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolc\big( (\boldsymbolmpc + 1) \dots (\boldsymbolmpc + 2) \big)
&&&& with \boldsymbolc(\boldsymbolx) º (c(\boldsymbolx0), ..., c(\boldsymbolx\lVert x \rVert -1)) with c as defined as above.
&&&& The bytes are right-aligned (takes the lowest significant place in big endian).
| rulemlticolumn1c: & mlticolumn1c: & : & : & mlticolumn1c:
| rule 0x7f & PUSH32 & 0 & 1 & Place 32-byte (full word) item on stack.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolc\big((\boldsymbolmpc + 1) \dots (\boldsymbolmpc + 32) \big)
&&&& where \boldsymbolc is defined as above.
&&&& The bytes are right-aligned (takes the lowest significant place in big endian).
\bottomrule \endtabular* \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbf80s: Duplication Operations \vspace5pt
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0x80 & DUP1 & 1 & 2 & Duplicate 1st stack item.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfs[0]
| rule 0x81 & DUP2 & 2 & 3 & Duplicate 2nd stack item.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfs[1]
| rulemlticolumn1c: & mlticolumn1c: & : & : & mlticolumn1c:
| rule 0x8f & DUP16 & 16 & 17 & Duplicate 16th stack item.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfs[15]
\bottomrule \endtabular* \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbf90s: Exchange Operations \vspace5pt
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0x90 & SWAP1 & 2 & 2 & Exchange 1st and 2nd stack items.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfs[1]
&&&& \boldsymbolm¢\mathbfs[1] º \boldsymbolm\mathbfs[0]
| rule 0x91 & SWAP2 & 3 & 3 & Exchange 1st and 3rd stack items.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfs[2]
&&&& \boldsymbolm¢\mathbfs[2] º \boldsymbolm\mathbfs[0]
| rulemlticolumn1c: & mlticolumn1c: & : & : & mlticolumn1c:
| rule 0x9f & SWAP16 & 17 & 17 & Exchange 1st and 17th stack items.
&&&& \boldsymbolm¢\mathbfs[0] º \boldsymbolm\mathbfs[16]
&&&& \boldsymbolm¢\mathbfs[16] º \boldsymbolm\mathbfs[0]
\bottomrule \endtabular* \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbfa0s: Logging Operations \vspace5pt
mlticolumn5lFor all logging operations, the state change is to append an additional log entry on to the substate¢s log series:
mlticolumn5lA¢\mathbfl º A\mathbfl · (Ia, \mathbft, \boldsymbolm\mathbfm[ \boldsymbolm\mathbfs[0] \dots (\boldsymbolm\mathbfs[0] + \boldsymbolm\mathbfs[1] - 1) ])
mlticolumn5land to update the memory consumption counter:
mlticolumn5l\boldsymbolm¢i º M(\boldsymbolmi, \boldsymbolm\mathbfs[0], \boldsymbolm\mathbfs[1])
mlticolumn5lThe entry¢s topic series, \mathbft, differs accordingly:\vspace5pt
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0xa0 & LOG0 & 2 & 0 & Append log record with no topics.
&&&& \mathbft º ()
| rule 0xa1 & LOG1 & 3 & 0 & Append log record with one topic.
&&&& \mathbft º (\boldsymbolm\mathbfs[2])
| rulemlticolumn1c: & mlticolumn1c: & : & : & mlticolumn1c:
| rule 0xa4 & LOG4 & 6 & 0 & Append log record with four topics.
&&&& \mathbft º (\boldsymbolm\mathbfs[2], \boldsymbolm\mathbfs[3], \boldsymbolm\mathbfs[4], \boldsymbolm\mathbfs[5])
\bottomrule \endtabular* \begintabular*\columnwidth[h]rlrrl Trulemlticolumn5c\textbff0s: System operations \vspace5pt
\textbfValue & \textbfMnemonic & d & a & \textbfDescription \vspace5pt
0xf0 & CREATE & 3 & 1 & Create a new account with associated code.
&&&& \mathbfi º \boldsymbolm\mathbfm[ \boldsymbolm\mathbfs[1] \dots (\boldsymbolm\mathbfs[1] + \boldsymbolm\mathbfs[2] - 1) ]
&&&& (\boldsymbols¢, \boldsymbolm¢g, A+) º \begincasesL(\boldsymbols*, Ia, Io, L(\boldsymbolmg), Ip, \boldsymbolm\mathbfs[0], \mathbfi, Ie + 1) & \textif     \boldsymbolm\mathbfs[0] £ slant \boldsymbols[Ia]b   Ù   Ie < 1024
\big(\boldsymbols, \boldsymbolmg, \varnothing\big) & \textotherwise \endcases
&&&& \boldsymbols* º \boldsymbols     \textexcept     \boldsymbols*[Ia]n = \boldsymbols[Ia]n + 1
&&&& A¢ º A \Cup A+ which abbreviates: A¢\mathbfs º A\mathbfs È A+\mathbfs     Ù     A¢\mathbfl º A\mathbfl · A+\mathbfl     Ù     A¢\mathbfr º A\mathbfr + A+\mathbfr
&&&& \boldsymbolm¢\mathbfs[0] º x
&&&& where x=0 if the code execution for this operation failed due to an exceptional halting
&&&& Z(\boldsymbols*, \boldsymbolm, I) = T or Ie = 1024
&&&& (the maximum call depth limit is reached) or \boldsymbolm\mathbfs[0] > \boldsymbols[Ia]b (balance of the caller is too
&&&& low to fulfil the value transfer); and otherwise x=A(Ia, \boldsymbols[Ia]n), the address of the newly
&&&& created account, otherwise.
&&&& \boldsymbolm¢i º M(\boldsymbolmi, \boldsymbolm\mathbfs[1], \boldsymbolm\mathbfs[2])
&&&& Thus the operand order is: value, input offset, input size.
| rule 0xf1 & CALL & 7 & 1 & Message-call into an account.
&&&& \mathbfi º \boldsymbolm\mathbfm[ \boldsymbolm\mathbfs[3] \dots (\boldsymbolm\mathbfs[3] + \boldsymbolm\mathbfs[4] - 1) ]
&&&& (\boldsymbols¢, g¢, A+, \mathbfo) º \begincases
Q(\boldsymbols, Ia, Io, t, t,
C\text\tiny CALLGAS(\boldsymbolm), Ip, \boldsymbolm\mathbfs[2], \boldsymbolm\mathbfs[2], \mathbfi, Ie + 1)
&
\textif     \boldsymbolm\mathbfs[2] £ slant \boldsymbols[Ia]b   Ù
Ie < 1024
(\boldsymbols, g, \varnothing, ()) & \textotherwise \endcases
&&&& n º \min({ \boldsymbolm\mathbfs[6], |\mathbfo|})
&&&& \boldsymbolm¢\mathbfm[ \boldsymbolm\mathbfs[5] \dots (\boldsymbolm\mathbfs[5] + n - 1) ] = \mathbfo[0 \dots (n - 1)]
&&&& \boldsymbolm¢g º \boldsymbolmg + g¢
&&&& \boldsymbolm¢\mathbfs[0] º x
&&&& A¢ º A \Cup A+
&&&& t º \boldsymbolm\mathbfs[1] \mod 2160
&&&& where x=0 if the code execution for this operation failed due to an exceptional halting
&&&& Z(\boldsymbols, \boldsymbolm, I) = T or if
&&&& \boldsymbolm\mathbfs[2] > \boldsymbols[Ia]b (not enough funds) or Ie = 1024 (call depth limit reached); x=1
&&&& otherwise.
&&&& \boldsymbolm¢i º M(M(\boldsymbolmi, \boldsymbolm\mathbfs[3], \boldsymbolm\mathbfs[4]), \boldsymbolm\mathbfs[5], \boldsymbolm\mathbfs[6])
&&&& Thus the operand order is: gas, to, value, in offset, in size, out offset, out size.
&&&& C\text\tiny CALL(\boldsymbols, \boldsymbolm) º C\text\tiny GASCAP(\boldsymbols, \boldsymbolm) + C\text\tiny EXTRA(\boldsymbols, \boldsymbolm)
&&&& C\text\tiny CALLGAS(\boldsymbols, \boldsymbolm) º \begincases C\text\tiny GASCAP(\boldsymbols, \boldsymbolm) + Gcallstipend & \textif     \boldsymbolm\mathbfs[2] ¹ 0
C\text\tiny GASCAP(\boldsymbols, \boldsymbolm) & \textotherwise \endcases
&&&& C\text\tiny GASCAP(\boldsymbols, \boldsymbolm) º \begincases \min{ L(\boldsymbolmg - C\text\tiny EXTRA(\boldsymbols, \boldsymbolm)), \boldsymbolm\mathbfs[0] } & \textif     \boldsymbolmg ³ C\text\tiny EXTRA(\boldsymbols, \boldsymbolm)
\boldsymbolm\mathbfs[0] & \textotherwise \endcases
&&&& C\text\tiny EXTRA(\boldsymbols, \boldsymbolm) º Gcall + C\text\tiny XFER(\boldsymbolm) + C\text\tiny NEW(\boldsymbols, \boldsymbolm)
&&&& C\text\tiny XFER(\boldsymbolm) º \begincases Gcallvalue & \textif     \boldsymbolm\mathbfs[2] ¹ 0
0 & \textotherwise \endcases
&&&& C\text\tiny NEW(\boldsymbols, \boldsymbolm) º \begincases Gnewaccount & \textif     \boldsymbols[\boldsymbolm\mathbfs[1] \mod 2160] = \varnothing
0 & \textotherwise \endcases
| rule 0xf2 & CALLCODE & 7 & 1 & Message-call into this account with an alternative account¢s code.
&&&& Exactly equivalent to CALL except:
&&&& (\boldsymbols¢, g¢, A+, \mathbfo) º \begincases
Q(\boldsymbols*, Ia, Io, Ia, t,
C\text\tiny CALLGAS(\boldsymbolm), Ip, \boldsymbolm\mathbfs[2], \boldsymbolm\mathbfs[2], \mathbfi, Ie + 1)
&
\textif     \boldsymbolm\mathbfs[2] £ slant \boldsymbols[Ia]b   Ù
Ie < 1024
(\boldsymbols, g, \varnothing, ()) & \textotherwise \endcases
&&&& Note the change in the fourth parameter to the call Q from the 2nd stack value \boldsymbolm\mathbfs[1]
&&&& (as in CALL) to the present address Ia. This means that the recipient is in fact the
&&&& same account as at present, simply that the code is overwritten.
| rule 0xf3 & RETURN & 2 & 0 & Halt execution returning output data.
&&&& H\text\tiny RETURN(\boldsymbolm) º \boldsymbolm\mathbfm[ \boldsymbolm\mathbfs[0] \dots ( \boldsymbolm\mathbfs[0] + \boldsymbolm\mathbfs[1] - 1 ) ]
&&&& This has the effect of halting the execution at this point with output defined.
&&&& See section (113).
&&&& \boldsymbolm¢i º M(\boldsymbolmi, \boldsymbolm\mathbfs[0], \boldsymbolm\mathbfs[1])
\endtabular* \begintabular*\columnwidth[h]rlrrl | rule 0xf4 & DELEGATECALL & 6 & 1 & Message-call into this account with an alternative account¢s code, but persisting
&&&& the current values for sender and value.
&&&& Compared with CALL, DELEGATECALL takes one fewer arguments. The omitted
&&&& argument is \boldsymbolm\mathbfs[2]. As a result, \boldsymbolm\mathbfs[3], \boldsymbolm\mathbfs[4], \boldsymbolm\mathbfs[5] and \boldsymbolm\mathbfs[6] in the definition of CALL
&&&& should respectively be replaced with \boldsymbolm\mathbfs[2], \boldsymbolm\mathbfs[3], \boldsymbolm\mathbfs[4] and \boldsymbolm\mathbfs[5].
&&&& Otherwise exactly equivalent to CALL except:
&&&& (\boldsymbols¢, g¢, A+, \mathbfo) º \begincases
Q(\boldsymbols*, Is, Io, Ia, t,
\boldsymbolm\mathbfs[0], Ip, 0, Iv, \mathbfi, Ie + 1)
& \textif     Iv £ slant \boldsymbols[Ia]b   Ù   Ie < 1024
(\boldsymbols, g, \varnothing, ()) & \textotherwise \endcases
&&&& Note the changes (in addition to that of the fourth parameter) to the second
&&&& and ninth parameters to the call Q.
&&&& This means that the recipient is in fact the same account as at present, simply
&&&& that the code is overwritten and the context is almost entirely identical.
| rule 0xfe & INVALID & \varnothing & \varnothing & Designated invalid instruction.
| rule 0xff & SELFDESTRUCT & 1 & 0 & Halt execution and register account for later deletion.
&&&& A¢\mathbfs º A\mathbfs È { Ia }
&&&& \boldsymbols¢[\boldsymbolm\mathbfs[0] \mod 2160]b º \boldsymbols[\boldsymbolm\mathbfs[0] \mod 2160]b + \boldsymbols[Ia]b
&&&& \boldsymbols¢[Ia]b º 0
&&&& A¢r º Ar + \begincases Rselfdestruct & \textif     Ia Ï A\mathbfs
0 & \textotherwise \endcases
&&&& C\text\tiny SELFDESTRUCT(\boldsymbols, \boldsymbolm) º Gselfdestruct + \begincases Gnewaccount & \textif     \boldsymbols[\boldsymbolm\mathbfs[0] \mod 2160] = \varnothing
0 & \textotherwise \endcases
\bottomrule \endtabular*
25 Genesis Block
The genesis block is 15 items, and is specified thus:
\big( \big( 0256, \mathtt\tiny KEC\big(\mathtt\tiny RLP\big( () \big)\big), 0160, stateRoot, 0, 0, 02048, 217	, 0, 0, 3141592, time, 0, 0256, \mathtt\tiny KEC\big( (42) \big) \big), (), () \big)    (232)
Where 0256 refers to the parent hash, a 256-bit hash which is all zeroes; 0160 refers to the beneficiary address, a 160-bit hash which is all zeroes; 02048 refers to the log bloom, 2048-bit of all zeros; 217refers to the difficulty; the transaction trie root, receipt trie root, gas used, block number and extradata are both 0, being equivalent to the empty byte array. The sequences of both ommers and transactions are empty and represented by (). \mathtt\tiny KEC\big( (42) \big) refers to the Keccak hash of a byte array of length one whose first and only byte is of value 42, used for the nonce. \mathtt\tiny KEC\big(\mathtt\tiny RLP\big( () \big)\big) value refers to the hash of the ommer lists in RLP, both empty lists. The proof-of-concept series include a development premine, making the state root hash some value stateRoot. Also time will be set to the initial timestamp of the genesis block. The latest documentation should be consulted for those values.
26 Ethash
26.1 Definitions
We employ the following definitions: \begintabular*\columnwidth[h]lrl Trule Name & Value & Description
| rule Jwordbytes & 4 & Bytes in word.
Jdatasetinit & 230& Bytes in dataset at genesis.
Jdatasetgrowth & 223& Dataset growth per epoch.
Jcacheinit & 224& Bytes in cache at genesis.
Jcachegrowth & 217& Cache growth per epoch.
Jepoch & 30000 & Blocks per epoch.
Jmixbytes & 128 & mix length in bytes.
Jhashbytes & 64 & Hash length in bytes.
Jparents & 256 & Number of parents of each dataset element.
Jcacherounds & 3 & Number of rounds in cache production.
Jaccesses & 64 & Number of accesses in hashimoto loop.
\bottomrule \endtabular*
26.2 Size of dataset and cache
The size for Ethash¢s cache \mathbfc Î \mathbbB and dataset \mathbfd Î \mathbbB depend on the epoch, which in turn depends on the block number.
Eepoch(Hi) = lfloor
Hi
Jepoch
rfloor    (233)
The size of the dataset growth by Jdatasetgrowth bytes, and the size of the cache by Jcachegrowth bytes, every epoch. In order to avoid regularity leading to cyclic behavior, the size must be a prime number. Therefore the size is reduced by a multiple of Jmixbytes, for the dataset, and Jhashbytes for the cache. Let dsize = \lVert \mathbfd \rVert be the size of the dataset. Which is calculated using
dsize = Eprime(Jdatasetinit + Jdatasetgrowth · Eepoch - Jmixbytes, Jmixbytes)    (234)
The size of the cache, csize, is calculated using
csize = Eprime(Jcacheinit + Jcachegrowth · Eepoch - Jhashbytes, Jhashbytes)    (235)
Eprime(x, y) = \begincases x & \textif     x / y Î \mathbbP
Eprime((x - 1) · y, y) & \textotherwise \endcases    (236)
26.3 Dataset generation
In order to generate the dataset we need the cache \mathbfc, which is an array of bytes. It depends on the cache size csize and the seed hash \mathbfs Î \mathbbB32.
26.3.1 Seed hash
The seed hash is different for every epoch. For the first epoch it is the Keccak-256 hash of a series of 32 bytes of zeros. For every other epoch it is always the Keccak-256 hash of the previous seed hash:
\mathbfs = Cseedhash(Hi)    (237)
Cseedhash(Hi) = \begincases \textttKEC(\mathbf032) & \textif     Eepoch(Hi) = 0

\textttKEC(Cseedhash(Hi - Jepoch)) & \textotherwise \endcases    (238)
With \mathbf032 being 32 bytes of zeros.
26.3.2 Cache
The cache production process involves using the seed hash to first sequentially filling up csize bytes of memory, then performing Jcacherounds passes of the RandMemoHash algorithm created by . The initial cache \mathbfc¢, being an array of arrays of single bytes, will be constructed as follows. We define the array \mathbfci, consisting of 64 single bytes, as the ith element of the initial cache:
\mathbfci = \begincases \textttKEC512(\mathbfs) & \textif     i = 0

\textttKEC512(\mathbfci-1) & \textotherwise \endcases    (239)
Therefore \mathbfc¢ can be defined as
\mathbfc¢[i] = \mathbfci     "     i < n    (240)
n = lfloor
csize
Jhashbytes
rfloor    (241)
The cache is calculated by performing Jcacherounds rounds of the RandMemoHash algorithm to the inital cache \mathbfc¢:
\mathbfc = Ecacherounds(\mathbfc¢, Jcacherounds)    (242)
Ecacherounds(\mathbfx, y) = \begincases \mathbfx & \textif     y = 0

E\text\tiny RMH(\mathbfx) & \textif     y = 1

Ecacherounds(E\text\tiny RMH(\mathbfx), y -1 ) & \textotherwise \endcases    (243)
Where a single round modifies each subset of the cache as follows:
E\text\tiny RMH(\mathbfx) = \big( Ermh(\mathbfx, 0), Ermh(\mathbfx, 1), ... , Ermh(\mathbfx, n - 1) \big)    (244)
\beginmultline E_rmh(\mathbfx, i) = \textttKEC512(\mathbfx¢[(i - 1 + n) \mod n] Å \mathbfx¢[\mathbfx¢[i][0] \mod n])
\textwith     \mathbfx¢ = \mathbfx     \textexcept     \mathbfx¢[j] = E_rmh(\mathbfx, j)     "     j < i \endmultline
26.3.3 Full dataset calculation
Essentially, we combine data from Jparents pseudorandomly selected cache nodes, and hash that to compute the dataset. The entire dataset is then generated by a number of items, each Jhashbytes bytes in size:
\mathbfd[i] = Edatasetitem(\mathbfc, i)     "     i < lfloor
dsize
Jhashbytes
rfloor    (245)
In order to calculate the single item we use an algorithm inspired by the FNV hash () in some cases as a non-associative substitute for XOR.
E\text\tiny FNV(\mathbfx, \mathbfy) = (\mathbfx · (\mathrm0x01000193 Å \mathbfy)) \mod 232	    (246)
The single item of the dataset can now be calculated as:
Edatasetitem(\mathbfc, i) = Eparents(\mathbfc, i, -1, \varnothing)    (247)
Eparents(\mathbfc, i, p, \mathbfm) = \begincases Eparents(\mathbfc, i, p +1, Emix(\mathbfm, \mathbfc, i, p + 1)) & \textif     p < Jparents -2
Emix(\mathbfm, \mathbfc, i, p + 1) & \textotherwise \endcases    (248)
Emix(\mathbfm, \mathbfc, i, p) = \begincases \textttKEC512(\mathbfc[i \mod csize] Å i) & \textif     p = 0
E\text\tiny FNV\big(\mathbfm, \mathbfc[E\text\tiny FNV(i Å p, \mathbfm[p \mod ë Jhashbytes / Jwordbytes û]) \mod csize] \big) & \textotherwise \endcases    (249)
26.4 Proof-of-work function
Essentially, we maintain a mix" Jmixbytes bytes wide, and repeatedly sequentially fetch Jmixbytes bytes from the full dataset and use the E\text\tiny FNV function to combine it with the mix. Jmixbytes bytes of sequential access are used so that each round of the algorithm always fetches a full page from RAM, minimizing translation lookaside buffer misses which ASICs would theoretically be able to avoid. If the output of this algorithm is below the desired target, then the nonce is valid. Note that the extra application of \textttKEC at the end ensures that there exists an intermediate nonce which can be provided to prove that at least a small amount of work was done; this quick outer PoW verification can be used for anti-DDoS purposes. It also serves to provide statistical assurance that the result is an unbiased, 256 bit number. The PoW-function returns an array with the compressed mix as its first item and the Keccak-256 hash of the concatenation of the compressed mix with the seed hash as the second item: \mathttPoW(H\hcanceln, Hn, \mathbfd) = \lbrace \mathbfmc(\mathtt KEC(\mathtt RLP(LH(H\hcanceln))), Hn, \mathbfd), \textttKEC(\mathbfsh(\mathtt KEC(\mathtt RLP(LH(H\hcanceln))), Hn) + \mathbfmc(\mathtt KEC(\mathtt RLP(LH(H\hcanceln))), Hn, \mathbfd)) \rbrace    (250) With H\hcanceln being the hash of the header without the nonce. The compressed mix \mathbfmc is obtained as follows: \mathbfmc(\mathbfh, \mathbfn, \mathbfd) = Ecompress(Eaccesses(\mathbfd,	åi = 0nmix	\mathbfsh(\mathbfh, \mathbfn), \mathbfsh(\mathbfh, \mathbfn), -1), -4)    (251) The seed hash being: \mathbfsh(\mathbfh, \mathbfn) = \textttKEC512(\mathbfh + Erevert(\mathbfn))    (252) Erevert(\mathbfn) returns the reverted bytes sequence of the nonce \mathbfn: Erevert(\mathbfn)[i] = \mathbfn[\lVert \mathbfn \rVert -i]    (253) We note that the +¢¢-operator between two byte sequences results in the concatenation of both sequences. The dataset \mathbfd is obtained as described in section (245). The number of replicated sequences in the mix is:
nmix = lfloor
Jmixbytes
Jhashbytes
rfloor    (254)
In order to add random dataset nodes to the mix, the Eaccesses function is used:
Eaccesses(\mathbfd, \mathbfm, \mathbfs, i) = \begincases Emixdataset(\mathbfd, \mathbfm, \mathbfs, i) & \textif     i = Jaccesses -2
Eaccesses(Emixdataset(\mathbfd, \mathbfm, \mathbfs, i), \mathbfs, i + 1) & \textotherwise \endcases    (255)
Emixdataset(\mathbfd, \mathbfm, \mathbfs, i) = E\text\tiny FNV(\mathbfm, Enewdata(\mathbfd, \mathbfm, \mathbfs, i)    (256)
Enewdata returns an array with nmix elements:
Enewdata(\mathbfd, \mathbfm, \mathbfs, i)[j] = \mathbfd[E\text\tiny FNV(i Å \mathbfs[0], \mathbfm[i \mod lfloor
Jmixbytes
Jwordbytes
rfloor]) \mod lfloor
dsize / Jhashbytes
nmix
rfloor · nmix + j]     "     j < nmix    (257)
The mix is compressed as follows:
Ecompress(\mathbfm, i) = \begincases \mathbfm & \textif     i ³ slant \lVert \mathbfm \rVert - 8
Ecompress(E\text\tiny FNV(E\text\tiny FNV(E\text\tiny FNV(\mathbfm[i + 4], \mathbfm[i + 5]), \mathbfm[i + 6]), \mathbfm[i + 7]), i + 8) & \textotherwise \endcases    (258)
\enddocument