holiman/trie_opcodes.md Secret

## trie_opcodes.md

      
    Raw
  

              trie_opcodes.md
            
          
    Trie-dependent opcodes

Problem description

As the ethereum trie becomes more and more saturated, the number of disk lookups that a node is required to do in order to access a piece of state increases too. This means that checking the EXTCODEHASH of an account at block 5 was inherently a cheaper operation that it is at, say 8.5M.
So far, the solution to these problems has been in raising the gas cost. From a node perspective, a node can (and does) use various caching mechanisms to cope with the problem, but there's an inherent problem with caches: when the yield a 'hit', they're great, but when they 'miss', they're useless.
This is attackable. By forcing a node to lookup non-existant keys, an attacker can maximize the number of disk lookups.
Sidenote: even if the 'non-existence' is cached, it's trivial to use a new non-existent key the next time, and never hit the same non-existent key again. Thus: caching 'non-existence' might be dangerous, since it will evict 'good' entries.
And there we have a problem: should we price these ops according to:

The happy-path, where items are cached?

No, that would obviously be extremely dangerous


The 'normal' usage, based on benchmarks of actual usage?

This is basically what we do now,


The 'paranoid' case: price everything as if caching did not exist?

This would severely harm basically every contract


From an engineering point of view, we're left with few options:

Implement bloom filters for existence. This is difficult, not least because of the eternal problems of reorgs, and the fact that it's difficult to undo bloom filter modifications.
Implement flattened account databases. This is also difficult, both because of reorgs and also because it needs to be an additional data structure aside from the trie -- we need the trie for consensus. So it's an extra data structure of around 15G that needs to be kept in check. Nevertheless, this is something we're actively working on within the geth team, specifically @karalabe.

Towards a solution

So, is there some way that we can be nice to those who play nice, but penalize those who do not? Yes, there is.
For a trie-dependent opcode, such as EXTCODEHASH, let

base be the base cost (currently 700)
penalty be the penalty cost (e.g. 2000).

Now, if extcodehash(addr) is performed, and addr does not exist, then penalty gas is deducted from the gas.
The addition of a penalty, depending on the result of the execution of an opcode, is something completely new. Currently, all ops are either fully static in cost, or their cost depends on the stack arguments, but no current opcode has a post-execution cost. While that would be a totally new thing, I don't believe it would be very difficult to implement.
The upside of this scheme is that we could continue to price these operations based on the 'normal' usage, but gain protection from inexistence-attacks that try to maximize disk lookups.