Skip to content

Instantly share code, notes, and snippets.

@holiman
Created October 4, 2019 17:59
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save holiman/aafd9a46f6849deb380004178d44c434 to your computer and use it in GitHub Desktop.
Save holiman/aafd9a46f6849deb380004178d44c434 to your computer and use it in GitHub Desktop.

Trie-dependent opcodes

Problem description

As the ethereum trie becomes more and more saturated, the number of disk lookups that a node is required to do in order to access a piece of state increases too. This means that checking the EXTCODEHASH of an account at block 5 was inherently a cheaper operation that it is at, say 8.5M.

So far, the solution to these problems has been in raising the gas cost. From a node perspective, a node can (and does) use various caching mechanisms to cope with the problem, but there's an inherent problem with caches: when the yield a 'hit', they're great, but when they 'miss', they're useless.

This is attackable. By forcing a node to lookup non-existant keys, an attacker can maximize the number of disk lookups.

Sidenote: even if the 'non-existence' is cached, it's trivial to use a new non-existent key the next time, and never hit the same non-existent key again. Thus: caching 'non-existence' might be dangerous, since it will evict 'good' entries.

And there we have a problem: should we price these ops according to:

  • The happy-path, where items are cached?
    • No, that would obviously be extremely dangerous
  • The 'normal' usage, based on benchmarks of actual usage?
    • This is basically what we do now,
  • The 'paranoid' case: price everything as if caching did not exist?
    • This would severely harm basically every contract

From an engineering point of view, we're left with few options:

  • Implement bloom filters for existence. This is difficult, not least because of the eternal problems of reorgs, and the fact that it's difficult to undo bloom filter modifications.
  • Implement flattened account databases. This is also difficult, both because of reorgs and also because it needs to be an additional data structure aside from the trie -- we need the trie for consensus. So it's an extra data structure of around 15G that needs to be kept in check. Nevertheless, this is something we're actively working on within the geth team, specifically @karalabe.

Towards a solution

So, is there some way that we can be nice to those who play nice, but penalize those who do not? Yes, there is.

For a trie-dependent opcode, such as EXTCODEHASH, let

  • base be the base cost (currently 700)
  • penalty be the penalty cost (e.g. 2000).

Now, if extcodehash(addr) is performed, and addr does not exist, then penalty gas is deducted from the gas.

The addition of a penalty, depending on the result of the execution of an opcode, is something completely new. Currently, all ops are either fully static in cost, or their cost depends on the stack arguments, but no current opcode has a post-execution cost. While that would be a totally new thing, I don't believe it would be very difficult to implement.

The upside of this scheme is that we could continue to price these operations based on the 'normal' usage, but gain protection from inexistence-attacks that try to maximize disk lookups.

@MatthiasEgli
Copy link

I believe that the hard but ultimately only path which ensures fast lookups with growing state is the flattened account database with it's O(1) lookup. Having this makes it unnecessary to introduce the penalty, and given the speed in which EIPs go into network upgrades we will likely have stable clients with this long before we have the EIP included.

Long-term we have to think about this for storage operations though which are fundamentally dependent on the state size. Currently I have the feeling that we are for those operations actually using the 'paranoid' case, making sure that even in non-optimized clients it is really expensive and not a valid path for any attack. I might be wrong though, given that it might be possible to design more or less expensive storage operations combined with triggering cache invalidations.

In the end, the clients must be DoS proof and if the only way to achieve that is increasing gas cost, it has to be done.

@satoshinakamoto501
Copy link

Code Is The Road on the Boardvitalik means We Talk or Walkie-talkie.
Let's Code the Universal Virtuality Of Everything.
EVM=MVA
Ethereum Virtual Machine= Make Virtual Era.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment