@hayeah

hayeah/blog.md Secret

Created September 3, 2017 10:31

In previous articles of this series we've seen how Solidity represents data structures in storage.

To review, you might read

  • Part 1
  • Part 2
  • Part 3

Data sits in a contract's storage, but it's useless if there's no way to interact with that data from the outside world.

In this article we'll see exactly how Solidity and the EVM make it possible for external programs to call a contract's method and cause its state to change.

The "external program" is not limited to DApp/JavaScript. Any program that can communicate with an Ethereum node using HTTP RPC can interact with any contract deployed on the blockchain by creating transactions.

You can change the state of a contract by creating a transaction. Creating a transaction is like making an HTTP request. A web server would accept your HTTP request and make changes to the database. A transaction would be accepted by the network, and the underlying blockchain extended to include the state changes.

Transactions are to Smart Contracts as HTTP requests are to web services.

Contract Transaction

Let's look at a transaction that sets a state variable to 0x1. The contract we want to interact with has a setter and a getter for the variable a:

https://gist.github.com/9e090c8487d20c53a350cb0a44f797d0

This contract is deployed on the test network Rinkeby. You can inspect it using Etherscan at the address 0x62650ae5....

I've created a transaction that makes the call setA(1). Inspect the transaction with the hash 0x7db471e5....

The transaction's input data is the most important part:

https://gist.github.com/8fe6c49f79471c73d747cb6216e7fb5d

The first four bytes are the method selector. The rest of the input data is the arguments, in chunks of 32 bytes. In this case there is only one argument, 0x1.
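To make that layout concrete, here is a minimal sketch that splits input data into the selector and its 32-byte argument words. The function name split_calldata is made up for illustration, and the selector bytes in the sample input are an assumed value for setA(uint256), not something to rely on.

```python
def split_calldata(calldata_hex):
    """Split transaction input data into a 4-byte selector and 32-byte words."""
    if calldata_hex.startswith("0x"):
        calldata_hex = calldata_hex[2:]
    raw = bytes.fromhex(calldata_hex)
    selector, body = raw[:4], raw[4:]
    if len(body) % 32 != 0:
        raise ValueError("argument data is not a multiple of 32 bytes")
    words = [body[i:i + 32] for i in range(0, len(body), 32)]
    return selector, words


# Sample input: an assumed selector for setA(uint256), then the argument 0x1
# left-padded to 32 bytes.
calldata = "0xee919d50" + "00" * 31 + "01"
selector, words = split_calldata(calldata)
arguments = [int.from_bytes(w, "big") for w in words]
print(selector.hex(), arguments)
```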

The method selector is the first four bytes of the keccak256 hash of the method signature, which is the method's name followed by its argument types. In this case the method signature is setA(uint256).

It is easy to calculate the method selector in Python:

https://gist.github.com/1f5fdce9cecefdb32148ff8d068c7eb1

Then take the first 4 bytes of the hash:

https://gist.github.com/7479874912442aae26a77555bf14b130
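If you don't have an Ethereum library installed, note that Python's hashlib.sha3_256 is not a drop-in replacement: it implements the final NIST SHA-3, which pads differently from the original Keccak that Ethereum uses. The sketch below is a small, slow, dependency-free Keccak-256 written for illustration only (the helper names keccak256 and selector are my own), so prefer a maintained library for real work.

```python
def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (padding byte 0x01), the variant Ethereum uses."""
    rate = 136  # bytes absorbed per block: (1600 - 2 * 256) / 8
    mask = (1 << 64) - 1

    def rol(value, n):
        n %= 64
        return ((value << n) | (value >> (64 - n))) & mask

    def permute(lanes):
        lfsr = 1  # generates the round constants on the fly
        for _ in range(24):
            # theta: mix column parities into every lane
            c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
                 for x in range(5)]
            d = [c[(x + 4) % 5] ^ rol(c[(x + 1) % 5], 1) for x in range(5)]
            lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
            # rho and pi: rotate each lane and move it to a new position
            x, y = 1, 0
            current = lanes[x][y]
            for t in range(24):
                x, y = y, (2 * x + 3 * y) % 5
                current, lanes[x][y] = lanes[x][y], rol(current, (t + 1) * (t + 2) // 2)
            # chi: non-linear step along each row
            for y in range(5):
                row = [lanes[x][y] for x in range(5)]
                for x in range(5):
                    lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
            # iota: XOR the round constant into lane (0, 0)
            for j in range(7):
                lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
                if lfsr & 2:
                    lanes[0][0] ^= 1 << ((1 << j) - 1)
        return lanes

    # Pad: 0x01, zeros up to a full block, then set the top bit of the last byte.
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80

    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = permute(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def selector(signature: str) -> bytes:
    """First four bytes of the hash of a signature such as 'setA(uint256)'."""
    return keccak256(signature.encode("ascii"))[:4]


print(selector("setA(uint256)").hex())
```

The signature string must contain no spaces and must use canonical type names (uint256 rather than uint), otherwise the hash, and therefore the selector, will differ.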

The ABI

As far as the EVM is concerned, the transaction's input data is just a sequence of bytes. The EVM has no built-in support for calling methods. A smart contract can choose to simulate a method call by processing the input data in a structured way.

If different languages agree on how input data should be interpreted, then they can easily interoperate with each other. The Contract Application Binary Interface (the ABI) specifies a common encoding scheme so that different EVM languages can talk to each other.

If we know that the method signature is setA(uint256), then we'd be able to call a method regardless of what language is used to implement the contract.

Calling The Getter

Changing the state requires the entire network to agree, costing you gas. Reading the state requires only your local Ethereum node to carry out the transaction, and it's free. A local transaction is like a cached HTTP GET request.

  • It doesn't change the global state.
  • The data from cache may be slightly behind the latest global state.

Let's make an eth_call to invoke the getA method, getting the state a in return.

First, calculate the method selector:

https://gist.github.com/c668589d50aba5481067c1313e476b1d

Since there is no argument, the input data is just the method selector by itself. We can send the request to any Ethereum node. In this example, let's send to an Ethereum node hosted by infura.io:
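The original request isn't reproduced here, so here is a sketch of what the JSON-RPC body for such an eth_call could look like. It only builds and prints the payload; you would POST it to your node's HTTP endpoint with Content-Type: application/json. The contract address is a placeholder (substitute the real one), and the getA() selector is an assumed value that you should recompute from keccak256("getA()") before relying on it.

```python
import json


def eth_call_payload(to_address, data_hex, block="latest", request_id=1):
    """Build the JSON-RPC request body for a read-only eth_call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_call",
        "params": [{"to": to_address, "data": data_hex}, block],
    }


# Hypothetical placeholder: substitute the deployed contract's 20-byte address.
CONTRACT_ADDRESS = "0x" + "00" * 20
# Assumed first 4 bytes of keccak256("getA()"); recompute before relying on it.
GET_A_SELECTOR = "0xd46300fd"

payload = eth_call_payload(CONTRACT_ADDRESS, GET_A_SELECTOR)
body = json.dumps(payload, indent=2)
print(body)
```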


The EVM carries out the computation and returns raw bytes as the result:

https://gist.github.com/1908c734e64fd91b604f0ae7cafd3038

According to the ABI, the bytes should be interpreted as the value 0x1.
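Decoding a single uint256 return value is just a big-endian integer conversion. A minimal sketch, with a made-up helper name and a sample result string that stands in for whatever the node actually returns:

```python
def decode_uint256(result_hex):
    """Interpret a 32-byte eth_call result as an unsigned big-endian integer."""
    if result_hex.startswith("0x"):
        result_hex = result_hex[2:]
    raw = bytes.fromhex(result_hex)
    if len(raw) != 32:
        raise ValueError("expected exactly 32 bytes, got %d" % len(raw))
    return int.from_bytes(raw, "big")


# Sample result: the value 0x1 left-padded to 32 bytes.
sample_result = "0x" + "00" * 31 + "01"
value = decode_uint256(sample_result)
print(value)
```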

Assembly

We've seen the ABI calling convention. Now let's see how the compiled contract processes the raw input data to make a method call. This contract defines setA(uint256):

https://gist.github.com/de74c658301ec14e4c56d10dd645e1bc

Compile:

https://gist.github.com/b3cca27dda95de47fbca1e201558e611

The assembly code for the methods being called is in the body of the contract, organized under "sub_0":

https://gist.github.com/a24299d6543c6490ab46a4037250f890

There are two pieces of boilerplate code that are irrelevant to this discussion, but FYI:

  • mstore(0x40, 0x60) at the very top reserves the first 64 bytes in memory for sha3 hashing. This is always done whether the contract needs it or not.
  • auxdata at the very bottom is used to verify that the published source code is the same as the deployed bytecode.

We can break the remaining assembly code into two parts for easier analysis:

  1. Matching the selector and jumping to a method.
  2. Loading the arguments, executing the method, and returning from it.

First, the annotated assembly for matching the selector:

https://gist.github.com/59ac1d75794aae34d21a164f61c87727

It's straightforward except for the bit-shuffling at the beginning to load 4 bytes from call data. For clarity, the assembly logic in low-level pseudocode is like:

https://gist.github.com/d5a414b9c55be6033f2d850c29715317
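The bit-shuffling is easier to follow when simulated. The sketch below assumes the compiled code loads the first 32-byte word of call data, divides it by 2^224 to shift the top 4 bytes down, and masks the result with 0xffffffff. The calldataload helper imitates the EVM instruction of the same name, and the selector in the sample is an assumed value.

```python
def calldataload(calldata: bytes, offset: int) -> int:
    """Imitate CALLDATALOAD: read 32 bytes at offset, zero-padded past the end."""
    word = calldata[offset:offset + 32].ljust(32, b"\x00")
    return int.from_bytes(word, "big")


def extract_selector(calldata: bytes) -> int:
    """Shift the first 4 bytes of the first word down, then mask them."""
    first_word = calldataload(calldata, 0)
    return (first_word // 2 ** 224) & 0xFFFFFFFF


# Assumed selector for setA(uint256) followed by the argument 0x1.
calldata = bytes.fromhex("ee919d50" + "00" * 31 + "01")
print(hex(extract_selector(calldata)))
```

Dividing by 2^224 is the same as shifting right by 28 bytes, which leaves only the leading 4 bytes of the 32-byte word.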

The annotated assembly for the actual method call:

https://gist.github.com/2b65f315fb031eeeacb5d51eca4407d7

Before entering into the method body, the assembly does two things:

  1. Saves the position to return to after method call.
  2. Loads the arguments from call data onto the stack.

In low-level pseudocode:

https://gist.github.com/fd0f9eb41728e4107ed9a957847e148a

Finally, combine the two parts together:

https://gist.github.com/fe8df911649e07099974723b21afcb09

Trivia: The opcode for revert is fd. At the time of writing, you won't find a specification for it in the Yellow Paper, or an implementation in code. It turns out that fd doesn't actually exist! It's an invalid op. When the EVM encounters an invalid op, it gives up and reverts state as a side effect. (The Byzantium hard fork later made 0xfd a real REVERT opcode through EIP-140.)

Handling Multiple Methods

How about a contract that has multiple methods?

https://gist.github.com/58fb7b09aeb306b2ab5c23bf33a49b32

Simple. Just more if-else branches one after another:

https://gist.github.com/03976a31758e08548b804030f2caad54

In pseudocode:

https://gist.github.com/420ad4735e5d3ed3a3ed0788d937a302
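The same if-else chain can be mimicked in Python with a toy contract. Everything here is illustrative: the class, the method names setA and setB, and especially the selector constants, which are placeholder bytes rather than real hashes.

```python
SET_A = bytes.fromhex("aaaaaaaa")  # hypothetical placeholder, not a real selector
SET_B = bytes.fromhex("bbbbbbbb")  # hypothetical placeholder, not a real selector


class ToyContract:
    """Dispatches on the first 4 bytes of call data, one branch per method."""

    def __init__(self):
        self.a = 0
        self.b = 0

    @staticmethod
    def _load_uint256(calldata, offset):
        # Read one 32-byte argument word, zero-padded if call data is short.
        return int.from_bytes(calldata[offset:offset + 32].ljust(32, b"\x00"), "big")

    def call(self, calldata):
        selector = calldata[:4]
        if selector == SET_A:
            self.a = self._load_uint256(calldata, 4)
        elif selector == SET_B:
            self.b = self._load_uint256(calldata, 4)
        else:
            # No branch matched: give up, like the revert at the end of the chain.
            raise RuntimeError("revert: no method matches the selector")


contract = ToyContract()
contract.call(SET_A + (1).to_bytes(32, "big"))
contract.call(SET_B + (2).to_bytes(32, "big"))
print(contract.a, contract.b)
```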

ABI Encoding For Complex Method Calls

To make a method call, the first four bytes of the transaction input data are always the method selector, followed by arguments in chunks of 32 bytes. The ABI Encoding Specification that details how the arguments are encoded can be extremely painful to read.

Another strategy to learn the ABI is to use pyethereum's ABI encoding function to investigate how different types of data are encoded. We can start from simple cases, and build up to more complex types.

First, import the encode_abi function:

https://gist.github.com/e0012aba0b88f56570593fd7fec71045

For a method that has three uint256 arguments (e.g. foo(uint256 a, uint256 b, uint256 c)), the encoded arguments are simply uint256 numbers one after another:

https://gist.github.com/ecfcdf3545f1e5589222d6dac9b8575c
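The same layout is easy to reproduce by hand without any library. A minimal sketch, with helper names of my own choosing:

```python
def encode_uint256(value):
    """One uint256 is exactly one 32-byte big-endian word."""
    if not 0 <= value < 2 ** 256:
        raise ValueError("value does not fit in uint256")
    return value.to_bytes(32, "big")


def encode_uint256_args(values):
    """Static uint256 arguments are simply concatenated in order."""
    return b"".join(encode_uint256(v) for v in values)


encoded = encode_uint256_args([1, 2, 3])
for i in range(0, len(encoded), 32):
    print(encoded[i:i + 32].hex())
```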

Types smaller than 32 bytes are padded to 32 bytes:

https://gist.github.com/40baf723f972318c0a574400ef292c37

For fixed-size arrays, the elements are again 32-byte chunks (possibly padded), laid out one after another:

https://gist.github.com/23a1ed64ddddb8d001288ed352fdde96
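A hand-rolled sketch of the padding rules, as I understand them from the ABI specification: unsigned integers are zero-padded on the left, signed integers are sign-extended, fixed-length bytesN values are zero-padded on the right, and a fixed-size array is its padded elements back to back. The function names are made up for illustration.

```python
def encode_uint(value, bits=256):
    """Left-pad an unsigned integer of the given width to one 32-byte word."""
    if not 0 <= value < 2 ** bits:
        raise ValueError("value out of range for uint%d" % bits)
    return value.to_bytes(32, "big")


def encode_int(value, bits=256):
    """Sign-extend a signed integer to 32 bytes (two's complement)."""
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        raise ValueError("value out of range for int%d" % bits)
    return value.to_bytes(32, "big", signed=True)


def encode_fixed_bytes(data):
    """bytes1 .. bytes32 are padded on the right rather than the left."""
    if not 1 <= len(data) <= 32:
        raise ValueError("bytesN holds between 1 and 32 bytes")
    return data.ljust(32, b"\x00")


def encode_fixed_array(values, bits=256):
    """A fixed-size array of integers is its padded elements, one after another."""
    return b"".join(encode_uint(v, bits) for v in values)


print(encode_uint(1, bits=8).hex())               # uint8
print(encode_int(-1, bits=8).hex())               # int8
print(encode_fixed_bytes(b"\xab\xcd").hex())      # bytes2
print(encode_fixed_array([1, 2, 3], bits=8).hex())  # uint8[3]
```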

ABI Encoding for Dynamic Arrays

The ABI introduces a layer of indirection to encode dynamic arrays, following a scheme called head-tail encoding.

The idea is that the elements of the dynamic arrays are packed at the tail-end of the transaction's calldata, and the arguments (the "head" part) are the positions for where to load the array elements.

If we call a method with 3 dynamic arrays, the arguments are encoded like this:

https://gist.github.com/f9f19372a86012d89f07cdfbabe52058

So the head section has three 32-byte arguments, pointing to locations in the tail section, which contains the actual data for the three dynamic arrays.

It is possible to mix dynamic and static arguments. Here's an example with (static, dynamic, static) arguments. The static arguments are encoded as is, while the data for the second argument, the dynamic array, is placed in the tail section:

https://gist.github.com/0ece4a779a11689f09af0c0c2a1bfd8e
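Head-tail encoding is short enough to write by hand. In this sketch a Python int stands for a static uint256 argument and a Python list stands for a dynamic uint256[] argument; the function names are my own. Offsets are counted in bytes from the start of the argument block, that is, right after the 4-byte selector.

```python
def word(value):
    """One 32-byte big-endian word."""
    return value.to_bytes(32, "big")


def encode_dynamic_array(values):
    """A uint256[] in the tail: its length, then its elements."""
    return word(len(values)) + b"".join(word(v) for v in values)


def encode_args(args):
    """Head-tail encode a mix of uint256 (int) and uint256[] (list) arguments."""
    head_size = 32 * len(args)  # every argument takes one word in the head
    head, tail = b"", b""
    for arg in args:
        if isinstance(arg, list):
            # Dynamic: the head holds the byte offset of the data in the tail.
            head += word(head_size + len(tail))
            tail += encode_dynamic_array(arg)
        else:
            # Static: the head holds the value itself.
            head += word(arg)
    return head + tail


mixed = encode_args([0xAAAA, [0xB1, 0xB2, 0xB3], 0xBBBB])
for i in range(0, len(mixed), 32):
    print(mixed[i:i + 32].hex())
```

For the mixed example the head is three words (the value, an offset of 0x60, the value), and the tail starts at byte 0x60 with the array length 3 followed by its three elements.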

Encoding Bytes

Strings and Byte Arrays are also head-tail encoded. The only difference is that the bytes are packed tightly in chunks of 32 bytes, like so:

https://gist.github.com/714d500b5c6cc3d4a31ee9d5389fe170

If the string is larger than 32 bytes, then multiple 32-byte chunks are used:

https://gist.github.com/9762bdf4b471e87b4dcf95fe5e961944
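A sketch of the same rule by hand, for a method taking a single bytes or string argument: the head is one offset word, and the tail is the length in bytes followed by the data, zero-padded on the right to a multiple of 32. The helper names are illustrative.

```python
def encode_bytes_tail(data):
    """Length word, then the raw bytes right-padded to a multiple of 32."""
    padded_length = (len(data) + 31) // 32 * 32
    return len(data).to_bytes(32, "big") + data.ljust(padded_length, b"\x00")


def encode_single_bytes_arg(data):
    """One dynamic argument: a head of one offset word (0x20), then the tail."""
    return (32).to_bytes(32, "big") + encode_bytes_tail(data)


short = encode_single_bytes_arg(b"hello world")
long = encode_single_bytes_arg(b"a" * 48)
for label, encoded in (("short", short), ("long", long)):
    print(label)
    for i in range(0, len(encoded), 32):
        print(" ", encoded[i:i + 32].hex())
```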

Nested Arrays

Nested arrays have one indirection per level of nesting.

https://gist.github.com/2bfd185b94b1f16159a867f45995c982

Ya, lots of zeroes.
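To see where each indirection comes from, here is a hand-written sketch for a single uint256[][] argument. It reflects my reading of the ABI specification: the outer array stores its length, then one offset per inner array (counted from just after the outer length word), then the inner arrays themselves. The function names are made up.

```python
def word(value):
    """One 32-byte big-endian word."""
    return value.to_bytes(32, "big")


def encode_uint_array(values):
    """uint256[]: length, then elements."""
    return word(len(values)) + b"".join(word(v) for v in values)


def encode_nested_array(rows):
    """uint256[][]: length, then offsets to each inner array, then the arrays."""
    head_size = 32 * len(rows)
    head, tail = b"", b""
    for row in rows:
        # Offsets are relative to the start of the element area (after the length).
        head += word(head_size + len(tail))
        tail += encode_uint_array(row)
    return word(len(rows)) + head + tail


def encode_single_nested_arg(rows):
    """As a method argument: one more offset word (0x20) in front."""
    return word(32) + encode_nested_array(rows)


nested = encode_single_nested_arg([[0xA1, 0xA2, 0xA3], [0xB1, 0xB2, 0xB3], [0xC1, 0xC2, 0xC3]])
for i in range(0, len(nested), 32):
    print(nested[i:i + 32].hex())
```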

Considering The Cost

Why does the ABI truncate the method selector to only 4 bytes? Could there be unlucky collisions between different methods because we don't use the full 32 bytes of the keccak256 hash? If the truncation is to save cost, why bother saving a mere 28 bytes in the method selector when way more bytes are wasted on zero-padding?

These two design choices seem contradictory... until we consider the gas costs for a transaction.

  • 21000 paid for every transaction.
  • 4 paid for every zero byte of data or code for a transaction.
  • 68 paid for every non-zero byte of data or code for a transaction.

Ah ha! Zeroes are 17 times cheaper, so zero-padding isn't as bad as it seems.

The method selector is a cryptographic hash, which is pseudorandom. For a random string, each byte has only about a 0.4% chance (1 in 256) of being 0, so the whole hash would have mostly non-zero bytes.

  • 0x1 padded to 32 bytes costs 192 gas: 4 * 31 (zeroes) + 68 (1 non-zero).
  • A full keccak256 hash is likely to have 32 non-zero bytes, which costs about 2176 gas (32 * 68).
  • A keccak256 hash truncated to 4 bytes costs about 272 gas (4 * 68).
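The arithmetic above can be checked with a few lines of Python. This is a sketch that uses the per-byte gas figures quoted in this article as constants. The all-0xff byte strings stand in for a hash whose bytes all happen to be non-zero.

```python
ZERO_BYTE_GAS = 4       # per zero byte of transaction data, as quoted above
NONZERO_BYTE_GAS = 68   # per non-zero byte of transaction data, as quoted above


def calldata_gas(data):
    """Gas charged for the data bytes alone, excluding the 21000 base cost."""
    zero_bytes = data.count(0)
    return zero_bytes * ZERO_BYTE_GAS + (len(data) - zero_bytes) * NONZERO_BYTE_GAS


padded_one = (1).to_bytes(32, "big")   # 0x1 padded to 32 bytes
full_hash = b"\xff" * 32               # stand-in for 32 non-zero bytes
short_selector = b"\xff" * 4           # stand-in for 4 non-zero bytes

print(calldata_gas(padded_one))        # 192
print(calldata_gas(full_hash))         # 2176
print(calldata_gas(short_selector))    # 272

# Chance that all 32 bytes of a random hash are non-zero.
print((255 / 256) ** 32)
```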

This is yet another example of quirky low-level design incentivized by the gas cost structure.

Conclusion

To interact with a Smart Contract, you send it some raw bytes. It does some computation, possibly changing its own state, and then sends you raw bytes in return. Method calling is actually a collective illusion enabled by the ABI.

The ABI is specified like a low-level format, but in function it's more like a serialization format for a cross-language RPC framework.

It's possible to draw analogies between the architectural tiers of DApp and Web App:

  • The blockchain is like the database.
  • A contract is like a web service.
  • A transaction is like a request.
  • The ABI is the data-interchange format, like Protocol Buffers.

Or something.
