Proposal for a light client friendly log event search data structure

Author: Zsolt Felföldi (zsfelfoldi@ethereum.org)

Note: this is an informal write-up and the exact details of the data structure presented here might need further clarification. If there is interest in this idea and no one finds any major holes in it, I intend to write a proper EIP for it.

Here I attempt to solve two problems with our current bloom filter based log event search mechanism. One of them concerns both full and light clients: the bloom filter size is fixed and does not adapt to the increasing number of log events per block caused by the increased gas limit. This leads to too many false positive matches, rendering the whole bloom filter mechanism almost ineffective. The other issue concerns light clients: even if we increase the bloom filter size, using the filters with a light client is only possible if the client syncs up the header chain. The Geth light client did that until now, but after the merge this practice becomes entirely obsolete. With the sync committees available in the consensus layer it is possible to fetch only one light client update per 8192-slot sync committee period and then prove the latest exec layer head from the beacon state of the latest beacon header signed by the sync committee. This is great because it is going to drastically reduce the bandwidth requirement of light clients, hopefully helping with the service bandwidth availability problem that LES has been constantly facing. On the other hand, this development makes it impossible to rely on bloom filtering, which is already barely usable in practice anyway.

The proposed new data structure is based on the idea of the bloombits data structure that's already internally generated and used for event searching by Geth full nodes. This structure is also merkle tree hashed and included in the light client's current "trusted checkpoint", making it accessible to light clients as well (a mechanism that also becomes obsolete with ETH2). A few years ago, when the bloom filter size was still sufficient, this structure sped up log event search drastically on full nodes (a fraction of a second for searching millions of blocks) and for a short time it even made it viable for light clients to search for events over such a long chain history. Unfortunately, not long after this feature was released in LES/2, the false positive rate increased, forcing light clients to download too many block receipts and making log search inefficient again. After moving away from header syncing it will also not be possible for light clients to keep this structure updated. This write-up describes an improved version of this data structure that has multiple advantages (including a long term solution to the filter size issue) and proposes making this structure part of consensus in order to make it accessible for light clients after the merge.

What the existing bloombits structure does is relatively simple: for each 32768 block long chain section it takes the 2048 bit long bloom filters from each header and does a 90 degree rotation on this 2k * 32k bit map, creating 32k long bit vectors (each bit corresponding to a block) for each of the 2k bloom filter bit positions. This is useful because for each event search we only need to read/fetch 3 of these bit vectors and binary AND them together, yielding all the filter matches for a 32k block long chain section. These bit vectors are also compressed, making the processing of an adequately sparse filter even more efficient. Of course, the filter data is currently not sparse enough, but with this data structure making it more sparse has only a logarithmic cost (making the average distance between "1" bits twice as big adds only one extra compressed bit per "1" bit in the original filter). So the point of this proposal is to create a sparse bloombits structure, put it in a merkle tree and add the root hash to each block header, making it possible for light clients to receive merkle proofs for log event filter matches for the entire chain history, proven from the most recent header. Making this structure a standard would also hugely benefit the log search performance of all exec layer full nodes.
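For illustration, here is a minimal Go sketch of the rotation and row-AND idea described above. The names and layout are my own and do not necessarily match Geth's internal bloombits code.

```go
package bloombits

// Illustrative parameters of the existing scheme.
const (
	filterBits  = 2048  // bits per per-block bloom filter
	sectionSize = 32768 // blocks per section
)

// rotateSection turns sectionSize per-block filters (each filterBits/8 bytes)
// into filterBits row vectors, one per bloom bit position, each sectionSize
// bits long.
func rotateSection(blockFilters [][]byte) [][]byte {
	rows := make([][]byte, filterBits)
	for bit := range rows {
		rows[bit] = make([]byte, sectionSize/8)
	}
	for block, filter := range blockFilters {
		for bit := 0; bit < filterBits; bit++ {
			if filter[bit/8]&(0x80>>(bit%8)) != 0 {
				rows[bit][block/8] |= 0x80 >> (block % 8)
			}
		}
	}
	return rows
}

// matchSection ANDs the row vectors belonging to the three bloom bit
// positions of a searched value; set bits in the result mark the potentially
// matching blocks of the section.
func matchSection(rows [][]byte, bitPositions [3]uint) []byte {
	result := make([]byte, sectionSize/8)
	copy(result, rows[bitPositions[0]])
	for _, pos := range bitPositions[1:] {
		for i := range result {
			result[i] &= rows[pos][i]
		}
	}
	return result
}
```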

Making the filter more sparse is possible either by making the bloom filter larger or by using multiple filters per block. My proposal is to do both: increase the bloom filter length to somewhere between 16k and 64k (I did the calculations with a 64k size, see the note at the end) and create a separate filter for each searchable value (contract addresses and event identifier hashes). Since false positive rates will be extremely low anyway, 2 bits per filter is probably enough. This means that blocks usually generate hundreds or thousands of 64k bit long filter columns, each of which contains exactly 2 "1" bits (of course these can be handled in a compressed form). 1M (2^20) such columns form a section, so when the section is completed we get 64k horizontal bit vectors, 1M bits long each. These vectors are usually super sparse, having one "1" bit per 32k bits on average (32 per vector), which means that they can be compressed to approx. 64 bytes on average (in the worst case an individual vector can be 128 kilobytes though, if it's not compressible at all, but that's still manageable). These compressed vectors are hashed into a binary merkle tree, first horizontally (max 16M entries per row should be enough for approx. 6000 years) then vertically (64k rows). This structure makes it efficient to retrieve the two relevant rows per searched value with merkle proofs. The storage burden on full nodes is 4 bytes of compressed bit vector data per searchable value, which is much less than the 32 bytes of the value itself stored in the receipts. The entire binary merkle tree (when all internal nodes are stored) adds another 4 bytes per value, but that data only needs to be stored by full nodes wanting to serve light clients.
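The following Go sketch shows how a filter column for a searchable value could be derived and how a sparse row of a completed section could be compressed. The hashing scheme and the varint delta encoding used here are assumptions made for the sake of the example, not specified parts of the proposal.

```go
package logindex

import (
	"encoding/binary"

	"golang.org/x/crypto/sha3"
)

// Illustrative parameters matching the numbers used in the text.
const (
	rowCount     = 65536 // 64k filter rows (the proposed bloom filter length)
	bitsPerValue = 2     // "1" bits marked per searchable value
)

// columnBits derives the row indices marked in the filter column of a
// searchable log value (a contract address or event identifier hash). The
// hashing scheme is an assumption for this example.
func columnBits(value []byte) [bitsPerValue]uint32 {
	hash := sha3.Sum256(value)
	var bits [bitsPerValue]uint32
	for i := range bits {
		bits[i] = binary.BigEndian.Uint32(hash[4*i:]) % rowCount
	}
	return bits
}

// compressRow encodes a sparse row of a completed section as varint-encoded
// deltas between set bit positions; a simple stand-in for the compressed
// sparse bit vectors described above (the exact encoding is left open here).
func compressRow(setPositions []uint32) []byte {
	var out []byte
	last := uint32(0)
	for _, pos := range setPositions { // positions must be sorted ascending
		out = binary.AppendUvarint(out, uint64(pos-last))
		last = pos
	}
	return out
}
```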

Now we have a linear index space of searchable log values, so we need some additional pointers in order to find/prove values belonging to each block/transaction. We maintain a global log value counter (LVC) which is the index of the next log value to be added. Its current value is included in each block header (before processing transactions) and in each individual block receipt (before processing that individual transaction). Switching from per-block indexing to individual value indexing presents some additional complexity, but it is offset by some cool advantages. First of all, it automatically scales with the number of values generated per block and is affected by neither gas limit changes nor short term fluctuations. Also, since the values are no longer merged together into a single filter per block, it is possible to search for sequences/patterns in a more sophisticated way. For example, let's say we have a token contract that adds its own ID and two other values for each transaction: <tokenContract> <sender> <recipient>. If we want to search for transactions with a specific recipient then it's possible to do that without getting filter matches for every block where <tokenContract> and <recipient> appear in any combination. We can search specifically for the <tokenContract> * <recipient> pattern by fetching the relevant rows for <tokenContract> and <recipient>, offsetting the rows by two bits and then doing a binary AND operation. If the search is done by a light client then another advantage is that, based on the matching LVC position, the server can provide just the relevant receipt of the individual transaction instead of all receipts of an entire block (the LVC stored in the receipt along with the individual values proves that the requested LVC belongs to the given transaction). This is an important advantage if there are many matches, because the receipts of an entire block can be pretty big.
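The pattern search can be illustrated with a short Go sketch operating on decompressed rows given as sorted lists of set log value indices (hypothetical helper code; a real search would first AND the two rows belonging to each value).

```go
package logindex

// matchPattern returns the log value indices where <tokenContract> is followed
// two positions later by <recipient>, implementing the two-bit offset and AND
// described above on index lists instead of raw bit vectors.
func matchPattern(tokenContractRow, recipientRow []uint64) []uint64 {
	recipientSet := make(map[uint64]struct{}, len(recipientRow))
	for _, idx := range recipientRow {
		recipientSet[idx] = struct{}{}
	}
	var matches []uint64
	for _, idx := range tokenContractRow {
		// In the <tokenContract> <sender> <recipient> pattern the recipient
		// value is emitted two log value positions after the contract address.
		if _, ok := recipientSet[idx+2]; ok {
			matches = append(matches, idx)
		}
	}
	return matches
}
```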

One last thing to consider is the "data churn" generated by updating these merkle trees. Adding a new section or altering the last section of each of the 64k rows generates a lot of new merkle tree nodes because of the high number of rows. If our horizontal section index space is 16M long then it's 24 * 64k hashes updated, plus another 64k for hashing together the row roots. This is 50 megabytes of re-hashed data per update, most of which can be discarded after the next update, but still, it probably should not be done for each block (not very SSD friendly). Adding completed sections is not too expensive because it's not so frequent (even if we assume 1000 values per block, which is 10x more than what our current bloom filter can handle, it's one section per 1000 blocks, so it's 50k per block). The problem though is that we also want to make the last incomplete section searchable, and we have already stated that updating the last section of each row in each block is probably too expensive. One way to reduce this is to maintain a separate tree for the last incomplete section which is only hashed vertically (so the long term horizontally hashed structure does not need to be touched until the section is completed). This means hashing max 64 * 64k bytes of incomplete section data plus another 32 * 64k bytes of internal tree nodes, resulting in 6 megabytes of data churn per block. This is probably still too much, so we can also use "shorter" and "wider" sections here, so that instead of compressing 1M bit long horizontal bit vectors, we compress 16k * 64 bit maps, covering the last 1M log value range with 64 * 1k of these "short and wide" sections. Now we have 10 levels of vertical and 6 levels of horizontal hashing in our incomplete section structure, meaning that an update costs (64 + 7*32) * 1k = 288 kilobytes of data churn per block, which should probably be acceptable.
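For reference, the churn figures of this paragraph restated as Go constants, assuming 32-byte hashes and the parameters used above (64k rows, a 16M wide horizontal section index space, 64 * 1k "short and wide" sections for the incomplete range).

```go
package logindex

const (
	hashSize = 32
	numRows  = 64 * 1024

	// Completing a section re-hashes log2(16M) = 24 tree levels per row plus
	// one more pass to hash the row roots together: roughly 50 megabytes,
	// but only once per ~1000 blocks at 1000 log values per block.
	completedSectionChurn = (24*numRows + numRows) * hashSize

	// Per-block update of the incomplete-section tree: 64 bytes of compressed
	// filter data plus 7 hashes for each of the 1k affected subtrees, as
	// estimated above.
	perBlockChurn = (64 + 7*hashSize) * 1024 // 294912 bytes, ~288k
)
```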

About the data efficiency of the light client proofs: when searching long block ranges, many sections can be proven with a single multi-proof per row, so proving a single section of a single row costs little more than just the compressed filter data size (64 bytes). Let us assume 1000 log values per block (note that our current bloom filter already degrades in performance with 100 values per block, especially with the fluctuation of the number of values per block, which is not an issue with the new structure). This means one section per 1000 blocks. Searching for a single value (two rows) in a million blocks requires fetching 2 * 1000 sections (approx. 128k of data), so searching for a single value costs about 1/8 byte per block, which is very good. Retrieving proofs from the last (incomplete) section is less efficient but that's fine because it's not a lot of data in total. If this section is almost complete (consisting of 16 "short and wide" sections horizontally) then we have to download 16 * 64 bytes of filter data plus 10 * 32 bytes of merkle proofs per row, which is 1.3k per row or 2.6k per searched value.
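The cost estimate for completed sections can be expressed as a small helper (a rough sketch using the averages assumed above; multi-proof overhead is ignored).

```go
package logindex

// proofBytesPerValue roughly estimates the data a light client fetches when
// searching for one value (two rows) over a range of completed sections,
// assuming ~64 bytes of compressed data per row and section on average.
func proofBytesPerValue(blocks, logValuesPerBlock uint64) uint64 {
	const (
		sectionSize        = 1 << 20 // log values per section
		bytesPerRowSection = 64      // average compressed row size per section
		rowsPerValue       = 2       // 2 filter bits per searchable value
	)
	sections := (blocks*logValuesPerBlock + sectionSize - 1) / sectionSize
	return sections * rowsPerValue * bytesPerRowSection
}
```

With 1000 log values per block, proofBytesPerValue(1000000, 1000) comes out to roughly 122 kilobytes, in line with the approx. 128k figure above.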

A note on the row count choice: it is basically a tradeoff between the data churn rate for full nodes maintaining the structure and the bandwidth efficiency for light clients doing a search (or the amount of database reads for full nodes doing the search, though that's less of an issue here). The total amount of long term data stored by full nodes is only affected logarithmically because of the sparse bit vector compression, so it's not a major factor (with any sensible row count choice it's going to be close to 4 bytes per value). The data churn rate, though, is more or less proportional to the row count, while the light client bandwidth is inversely proportional to it. In the above calculations the data churn is still dominated by the per-block updates of the last section, so if it needs to be further reduced then I believe it is better to first tweak the "short and wide" ratio of the last section further instead of reducing the row count.
