As the ethereum trie becomes more and more saturated, the number of disk lookups that a node is required to do in order to access a piece of state increases too. This means that checking the EXTCODEHASH
of an account at block 5
was inherently a cheaper operation that it is at, say 8.5M
.
So far, the solution to these problems has been in raising the gas cost. From a node perspective, a node can (and does) use various caching mechanisms to cope with the problem, but there's an inherent problem with caches: when the yield a 'hit', they're great, but when they 'miss', they're useless.
This is attackable. By forcing a node to lookup non-existant keys, an attacker can maximize the number of disk lookups.
Sidenote: even if the 'non-existence' is cached, it's trivial to use a new non-existent key the next time, and never hit the same non-existent key again. Thus: caching 'non-existence' might be dangerous, since it will evict 'good' entries.
And there we have a problem: should we price these ops according to:
- The happy-path, where items are cached?
- No, that would obviously be extremely dangerous
- The 'normal' usage, based on benchmarks of actual usage?
- This is basically what we do now,
- The 'paranoid' case: price everything as if caching did not exist?
- This would severely harm basically every contract
From an engineering point of view, we're left with few options:
- Implement bloom filters for existence. This is difficult, not least because of the eternal problems of reorgs, and the fact that it's difficult to undo bloom filter modifications.
- Implement flattened account databases. This is also difficult, both because of reorgs and also because it needs to be an additional data structure aside from the
trie
-- we need thetrie
for consensus. So it's an extra data structure of around15G
that needs to be kept in check. Nevertheless, this is something we're actively working on within the geth team, specifically @karalabe.
So, is there some way that we can be nice to those who play nice, but penalize those who do not? Yes, there is.
For a trie-dependent opcode, such as EXTCODEHASH
, let
base
be the base cost (currently700
)penalty
be the penalty cost (e.g.2000
).
Now, if extcodehash(addr)
is performed, and addr
does not exist, then penalty
gas is deducted from the gas
.
The addition of a penalty
, depending on the result of the execution of an opcode, is something completely new. Currently, all ops are either fully static in cost, or their cost depends on the stack arguments, but no current opcode has a post-execution cost. While that would be a totally new thing, I don't believe it would be very difficult to implement.
The upside of this scheme is that we could continue to price these operations based on the 'normal' usage, but gain protection from inexistence-attacks that try to maximize disk lookups.
I believe that the hard but ultimately only path which ensures fast lookups with growing state is the flattened account database with it's
O(1)
lookup. Having this makes it unnecessary to introduce the penalty, and given the speed in which EIPs go into network upgrades we will likely have stable clients with this long before we have the EIP included.Long-term we have to think about this for storage operations though which are fundamentally dependent on the state size. Currently I have the feeling that we are for those operations actually using the 'paranoid' case, making sure that even in non-optimized clients it is really expensive and not a valid path for any attack. I might be wrong though, given that it might be possible to design more or less expensive storage operations combined with triggering cache invalidations.
In the end, the clients must be DoS proof and if the only way to achieve that is increasing gas cost, it has to be done.