In previous articles of this series, we've learned the basics of EVM assembly, as well as how ABI encoding allows the outside world to communicate with a contract. In this article, we'll see how a contract is created.
The EVM bytecode we've seen so far is pretty straightforward conceptually. It is just a sequence of instructions that the EVM executes from top to bottom. The bytecode for contract creation is more fun, in that it blurs the barrier between code and data.
Sometimes data is code, and sometimes code is data.
Let's create a simple (and completely useless) contract:
https://gist.github.com/0d100bc90f86d9cc5000d0051ced6ff7
Compile it:
https://gist.github.com/b670d309a5a4ad3ff39a59f5e0500ebf
And the bytecode is:
https://gist.github.com/69dbe99f08c4b82b4ee57fc695c91d9b
There is no special RPC call or transaction type to create a contract. The same transaction mechanism is used for other purposes as well:
- Transferring Ether to an account or contract.
- Calling a contract's method with parameters.
Depending on which parameters you specify, Ethereum interprets the transaction differently. To create a contract, the to address should be null (or left out).
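To make the idea concrete, here is a minimal Python sketch of how one transaction shape can cover all three cases. The classify_transaction helper and the dict-shaped transaction are hypothetical, invented for this illustration. A real client works on signed transactions and decides between a transfer and a call by checking whether the recipient account has code.

```python
def classify_transaction(tx: dict) -> str:
    """Guess what a transaction does from its fields (illustrative only)."""
    to = tx.get("to")              # missing or None means "no recipient"
    data = tx.get("data", "0x")    # hex string, "0x" when empty

    if to is None:
        # No recipient: the data is executed as deploy code.
        return "create contract"
    if data not in ("", "0x"):
        # Simplification: assume non-empty data sent to an address is a method call.
        return "call contract method"
    return "transfer ether"


print(classify_transaction({"data": "0x6060"}))
print(classify_transaction({"to": "0xabc", "value": 1}))
print(classify_transaction({"to": "0xabc", "data": "0x1234"}))
```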
I've created the example contract with this transaction:
https://rinkeby.etherscan.io/tx/0x58f36e779950a23591aaad9e4c3c3ac105547f942f221471bf6ffce1d40f8401
Opening Etherscan, you should see that the input data for this transaction is the bytecode produced by the Solidity compiler.
When processing this transaction, the EVM would have executed the input data as code. And voila, a contract was born.
We can break the bytecode into three separate chunks:
https://gist.github.com/b38241dc783a7b689b9fbdca4a9162d8
- Deploy code runs when the contract is being created.
- Contract code runs after the contract has been created, whenever its methods are called.
- (optional) Auxdata is the cryptographic fingerprint of the source code, used for verification. This is just data, and never executed by the EVM.
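The three-way split can be sketched in Python. This is only an illustration: split_bytecode is a hypothetical helper, and the deploy code and auxdata below are zero-filled placeholders, not the real bytes. Only the 11-byte contract code and the lengths (28 bytes of deploy code, 54 bytes for contract code plus auxdata) come from this article.

```python
def split_bytecode(bytecode: bytes, deploy_len: int, auxdata_len: int):
    """Split creation bytecode into (deploy code, contract code, auxdata)."""
    end = len(bytecode) - auxdata_len
    deploy = bytecode[:deploy_len]
    contract = bytecode[deploy_len:end]
    auxdata = bytecode[end:]
    return deploy, contract, auxdata


# Placeholders: zero bytes stand in for the real deploy code and auxdata.
deploy_stub = bytes(28)
contract_code = bytes.fromhex("60606040525b600080fd00")
auxdata_stub = bytes(43)   # 54 - 11 = 43 bytes

deploy, contract, auxdata = split_bytecode(
    deploy_stub + contract_code + auxdata_stub, deploy_len=28, auxdata_len=43
)
print(contract.hex())  # 60606040525b600080fd00
```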
The deploy code has two main purposes:
- Run the constructor function, and set up initial storage variables (like contract owner).
- Calculate the contract code, and return it to the EVM.
The deploy code generated by the Solidity compiler loads the bytes 60606040525b600080fd00 from the bytecode into memory, then returns them as the contract code. In this case, the "calculation" is merely reading a chunk of data into memory. In principle, we could programmatically generate the contract code.
So we know that the deploy code returns the contract code. Then what? How does Ethereum create a contract from the returned contract code? To learn the details, let's dig into the Ethereum source code together.
I've found the Go-Ethereum implementation to be the easiest reference for finding the information I need. We get proper variable names, static type info, and symbol cross-references. Try beating that, Yellow Paper!
Let's read the evm.Create method on SourceGraph, from top to bottom, to see what it does.
Check whether caller has enough balance to make a transfer:
https://gist.github.com/bd740fbc600dc1110524c1ad3749209c
Derive the new contract's address from the caller's address and the caller account's nonce:
https://gist.github.com/e9614f0d69e5ccf12f3cdcb19c2a289f
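As far as I can tell, the derivation RLP-encodes the pair (caller address, nonce), hashes the result with Keccak-256, and keeps the last 20 bytes. The Python sketch below builds only the RLP preimage. The helper names are hypothetical, and the hashing step is left out because Python's standard library has no Keccak-256 (hashlib.sha3_256 uses different padding and gives a different digest). The all-zero sender address is a placeholder.

```python
def rlp_encode_bytes(b: bytes) -> bytes:
    """RLP-encode a byte string shorter than 56 bytes."""
    if len(b) == 1 and b[0] < 0x80:
        return b                      # a single small byte encodes as itself
    assert len(b) < 56, "long strings are out of scope for this sketch"
    return bytes([0x80 + len(b)]) + b


def rlp_encode_list(items) -> bytes:
    """RLP-encode a short list of byte strings."""
    payload = b"".join(rlp_encode_bytes(item) for item in items)
    assert len(payload) < 56, "long lists are out of scope for this sketch"
    return bytes([0xC0 + len(payload)]) + payload


def address_preimage(sender: bytes, nonce: int) -> bytes:
    """Bytes that get hashed to derive the new contract's address."""
    # Integers are encoded big-endian with no leading zeros; 0 becomes b"".
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")
    return rlp_encode_list([sender, nonce_bytes])


sender = bytes(20)  # placeholder 20-byte address
print(address_preimage(sender, 1).hex())
```

Because the nonce goes up with every transaction the caller sends, the same caller never derives the same contract address twice.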
Create the new contract account using the derived contract address:
https://gist.github.com/4e89a1a19ab170de795f2699ed6b6ed9
Transfer the initial Ether endowment from caller to the new contract:
https://gist.github.com/67764797ef9d01fe216104f11569d7e9
Execute the contract's deploy code with the EVM. The ret variable is the returned contract code:
https://gist.github.com/9de95bf725caebf30554c2aed44b913e
Check for errors, and fail if the returned contract code is too big. Otherwise, charge the user gas for storing the code, then set the contract code:
https://gist.github.com/03669acb493542092d8cf018bd0a5011
That's about it.
Feel free to explore the go-ethereum code base by clicking on the symbols you are interested in.
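Putting the steps above together, here is a rough Python sketch of the flow. It is not a port of evm.Create: the dict-based state, the create function, and the run_deploy_code callback are all hypothetical, and the sketch does not roll back state on failure the way a real client does. The two constants are the values I believe go-ethereum uses, so treat them as assumptions.

```python
MAX_CODE_SIZE = 24576    # assumed limit on contract code size, in bytes
CREATE_DATA_GAS = 200    # assumed gas charged per byte of stored code


class CreateError(Exception):
    pass


def create(state, caller, new_address, value, gas, run_deploy_code):
    """state maps an address to {"balance": int, "code": bytes}."""
    # 1. The caller must be able to afford the endowment.
    if state[caller]["balance"] < value:
        raise CreateError("insufficient balance")
    # 2. Create the account at the (already derived) address.
    state[new_address] = {"balance": 0, "code": b""}
    # 3. Transfer the endowment.
    state[caller]["balance"] -= value
    state[new_address]["balance"] += value
    # 4. Run the deploy code; its return value is the contract code.
    ret = run_deploy_code()
    # 5. Enforce the size limit, charge for storage, then set the code.
    if len(ret) > MAX_CODE_SIZE:
        raise CreateError("contract code too big")
    cost = len(ret) * CREATE_DATA_GAS
    if cost > gas:
        raise CreateError("not enough gas to store the code")
    state[new_address]["code"] = ret
    return new_address, gas - cost


state = {"alice": {"balance": 10, "code": b""}}
code = bytes.fromhex("60606040525b600080fd00")
print(create(state, "alice", "new", 3, 10000, lambda: code))
```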
Let's analyze the assembly code for the example contract:
https://gist.github.com/64780ec9861d7269a01643b050015a28
The bytecode for this contract broken into separate chunks:
https://gist.github.com/6760d8000851cf48665819fae0dbccc6
The assembly for the deploy code is:
https://gist.github.com/a92209730c29dd2b4fc8fbf97da00382
Tracing through the above assembly for returning the contract code:
https://gist.github.com/56b3fefd3e381a6b4dbfca6cbd35279a
dataSize(sub_0) and dataOffset(sub_0) are in fact PUSH instructions that put constants onto the stack. The two constants, 0x36 (54) for the size and 0x1C (28) for the offset, specify the bytecode substring to return.
The deploy code assembly roughly corresponds to the following Python code:
https://gist.github.com/74e1ac1061815c696fc7862b3daa092e
The resulting memory content is:
https://gist.github.com/045cb8e86acadf288d01c974a23fdc34
Which corresponds to the assembly:
https://gist.github.com/58dc576189d57efe24223586fb319a5b
Looking again at Etherscan, this is exactly what was deployed as the contract code:
https://rinkeby.etherscan.io/address/0x2c7f561f1fc5c414c48d01e480fdaae2840b8aa2#code
The behaviour of the codecopy instruction is less obvious than that of the others. Instead of looking it up in the Yellow Paper, it's convenient to refer to the go-ethereum source code to see what it does. See CODECOPY:
https://gist.github.com/83b2bb5363ed937210010cc15a5822b4
Yay no Greek letters!
(The line evm.interpreter.intPool.put(memOffset, codeOffset, length) recycles objects for later use. It is just an efficiency optimization.)
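For readers who would rather not read Go, here is a small Python sketch of what I understand CODECOPY to do: copy a slice of the running code into memory, padding with zeros if the slice runs past the end of the code. The codecopy function and the bytearray memory model are my own simplifications. The real EVM also grows memory in 32-byte words and charges gas, which this sketch ignores, and the 28 zero bytes stand in for the real deploy code.

```python
def codecopy(memory: bytearray, code: bytes,
             mem_offset: int, code_offset: int, length: int) -> None:
    """Copy code[code_offset : code_offset + length] into memory at mem_offset."""
    end = mem_offset + length
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))    # grow memory with zeros
    chunk = code[code_offset:code_offset + length]
    # Reads past the end of the code come back as zero bytes.
    memory[mem_offset:end] = chunk.ljust(length, b"\x00")


# Placeholder deploy code (28 zero bytes) followed by the 11-byte contract code.
code = bytes(28) + bytes.fromhex("60606040525b600080fd00")
memory = bytearray()
codecopy(memory, code, mem_offset=0, code_offset=28, length=11)
print(memory.hex())  # 60606040525b600080fd00
```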
Aside from returning the contract code, the other purpose of the deploy code is to run the constructor to set things up. If there are constructor arguments, the deploy code needs to somehow load the arguments data from somewhere.
The Solidity convention for passing constructor arguments is to append the ABI-encoded parameter values to the end of the bytecode when calling eth_sendTransaction. The RPC call passes the bytecode and the ABI-encoded params together as input data:
https://gist.github.com/c33b512541a0b7bac9d7363cab31ac4d
Let's look at an example contract with one constructor argument:
https://gist.github.com/d9b6076a8b9e08753497f49d56da7e16
I've created this contract, passing in the value 66. The transaction on Etherscan:
https://rinkeby.etherscan.io/tx/0x2f409d2e186883bd3319a8291a345ddbc1c0090f0d2e182a32c9e54b5e3fdbd8
The input data is:
https://gist.github.com/3b6e21be8faaa0cbbade2cfa022e4c0a
To process the arguments in the constructor, the deploy code copies the ABI-encoded parameters from the end of the input data (which the EVM is executing as code at this point) into memory, then from memory onto the stack.
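Here is a small Python sketch of both sides of that convention, assuming the constructor takes a single uint256, which ABI-encodes as one 32-byte big-endian word. The two helper functions are hypothetical, and the bytecode is a short stand-in, not the real bytecode of the example contract.

```python
def append_constructor_arg(bytecode: bytes, arg: int) -> bytes:
    """Build the transaction input data: bytecode + one ABI-encoded uint256."""
    return bytecode + arg.to_bytes(32, "big")


def read_constructor_arg(input_data: bytes) -> int:
    """What the deploy code effectively does: read the last 32-byte word."""
    return int.from_bytes(input_data[-32:], "big")


bytecode_stub = bytes.fromhex("60606040525b600080fd00")  # stand-in bytecode
data = append_constructor_arg(bytecode_stub, 66)

print(data[-32:].hex())             # ends in ...42, since 66 == 0x42
print(read_constructor_arg(data))   # 66
```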
The FooFactory contract can create new instances of Foo by calling makeNewFoo:
https://gist.github.com/c88fed373fc877159a44b21caae06423
The full assembly for this contract is in this gist: https://gist.github.com/hayeah/a94aa4e87b7b42e9003adf64806c84e4
The only thing we've not seen before is the CREATE instruction. Like the eth_sendTransaction RPC call, it provides a way to create new contracts.
See opCreate for the Go implementation. This instruction calls evm.Create to create a contract:
https://gist.github.com/ad989633b5d9d4b680bb02a74400773c
We've seen evm.Create earlier, but this time the caller is a smart contract, not a human.
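In Python terms, my reading of the instruction is roughly the sketch below. It pops the endowment, a memory offset, and a size off the stack, reads the deploy code from memory, hands it to the creation routine, and pushes the new address (or 0 on failure). The op_create function, the list-based stack, and the create_fn callback are hypothetical stand-ins for the interpreter internals, and gas handling is left out.

```python
def op_create(stack: list, memory: bytes, create_fn) -> None:
    """Sketch of CREATE; the end of the list is the top of the stack."""
    value = stack.pop()     # Ether endowment for the new contract
    offset = stack.pop()    # where the deploy code starts in memory
    size = stack.pop()      # how many bytes of deploy code to read
    init_code = bytes(memory[offset:offset + size])
    try:
        address = create_fn(init_code, value)
    except Exception:
        address = 0         # creation failed: push 0 instead of an address
    stack.append(address)


def fake_create(init_code: bytes, value: int) -> int:
    # Stand-in for evm.Create: pretend every creation lands at 0x1234.
    return 0x1234


stack = [2, 1, 5]           # size=2, offset=1, value=5 (value on top)
op_create(stack, b"\xaa\xbb\xcc", fake_create)
print(stack)  # [4660]
```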
Challenge
Write a blog article explaining how the assembly works. Send me a link to your article, and I'll include it in the Deep Dive EVM series : )
If you absolutely must know all about what auxdata is, read Contract Metadata. The gist of it is that auxdata is a hash that you can use to fetch metadata about the deployed contract.
The format of auxdata is:
https://gist.github.com/716be212a099f035d8dfe3e4fe44d826
Deconstructing the auxdata byte sequence we've seen previously:
https://gist.github.com/2efd5c4373faa68a8140f7d7ff8534d8
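If you'd like to pull the hash out programmatically, here is a small Python sketch. It assumes the layout I understand this compiler version to emit: a 9-byte prefix (0xa1 0x65, the ASCII text "bzzr0", then 0x58 0x20), a 32-byte Swarm hash, and the 2-byte suffix 0x00 0x29. The parse_auxdata function is hypothetical, and the hash used in the example is a made-up placeholder.

```python
AUX_PREFIX = bytes.fromhex("a165627a7a72305820")  # 0xa1 0x65 "bzzr0" 0x58 0x20
AUX_SUFFIX = bytes.fromhex("0029")                # assumed trailing length bytes


def parse_auxdata(auxdata: bytes) -> bytes:
    """Return the 32-byte hash embedded in the auxdata."""
    well_formed = (
        len(auxdata) == len(AUX_PREFIX) + 32 + len(AUX_SUFFIX)
        and auxdata.startswith(AUX_PREFIX)
        and auxdata.endswith(AUX_SUFFIX)
    )
    if not well_formed:
        raise ValueError("not in the expected auxdata format")
    return auxdata[len(AUX_PREFIX):-len(AUX_SUFFIX)]


placeholder_hash = bytes(range(32))   # made-up hash, not from a real contract
auxdata = AUX_PREFIX + placeholder_hash + AUX_SUFFIX
print(len(auxdata))                   # 43
print(parse_auxdata(auxdata).hex())
```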
There you go, one more piece of Ethereum trivia for party night.
The way contracts are created is similar to how a self-extracting software installer works. When the installer runs, it configures the system environment, then extracts the target program onto the system by reading from its program bundle.
This design:
- Enforces separation between "install time" and "run time". No way to run the constructor twice.
- Enables smart contracts to create other smart contracts.
- Is easy for a pre-Solidity low-level language to implement.
What I found interesting about the design of contract creation is that the different parts of the "smart contract installer" are packed together in the transaction as a single data byte string:
https://gist.github.com/34858fbb071cd2bc63b2bddd63c754f5
How data should be encoded is not obvious from reading the documentation for eth_sendTransaction. I couldn't figure out how constructor arguments are passed into a transaction until a friend told me that they are ABI-encoded, then appended to the end of the bytecode.
An alternative design that would've made it clearer is perhaps to send these parts as separate properties in a transaction:
https://gist.github.com/4e3a59d59a42aadd422eb76742c87f22
On further thought, though, I think it's actually very powerful that the Transaction object is so simple. To a Transaction, data is just a byte string, and it doesn't dictate a language model for how the data should be interpreted. By keeping the Transaction object simple, language implementers have a blank canvas for design and experiments.
Indeed, data could even be interpreted by a different virtual machine in the future.