richardkiss/block-compression.md

## block-compression.md

      
    Raw
  

              block-compression.md
            
          
    Block Generators

Block generators in Chia are clvm programs that, when run with particular arguments, produce a list of coin spends. This provides opportunity to do compression by recognizing patterns and encoding.
CLVM Serialization

The standard serialization mechanism of clvm is defined here.
If you review this specification, you'll see that deserialization begins by inspecting the first byte. If the byte is 0xff, the object is a pair. If the byte is 0x00 to 0xfb, the object is an atom whose length is partially or fully encoded by this first byte.
However, the initial bytes 0xfc through 0xfe do not correspond to anything (and in fact, even an encoding that begins with 0xfb is an atom too long to be practical). This means these bytes can be repurposed as extensions to serialization for compression purposes.
During deserialization, objects are read one at a time and place onto a stack. A second stack of operators is used to determine what happens next: either we read another object onto the stack, or we do a pop-cons-push operation, which pops the top two objects off the stack, puts them together into a cons, and pushes the cons onto the stack. Any clvm tree can be create with just these two operations.
Chia generators expand to structures that contain reveals of puzzles, which can often be quite large. Puzzles are generally constructed by taking one of just a few MOD generic puzzle cores and currying arguments (such as public keys) onto the generic puzzle, making it specific to a user. When expanding, we often see the same large MOD appear over and over again, such as the core for NFTs or CATs, each of which has over 1K of code. This becomes a clear target for compression.
Instead of blindingly serializing repeated clvm trees, at each step we look to see if the next clvm sub-tree to be serialized has been encountered already. If it has, that means it's already in the stack of partial clvm objects. This also means, if you consider the stack to be a clvm object (and there's an obvious way to do so where a push is a cons c and a pop is a f/r), you can find a clvm path to that object.
We define 0xfe to mean "read the next clvm subtree, which must be an atom, and treat it like a path into the clvm object stack. Push that exact object onto the clvm object stack".
It's clear that kind of decompression can be very inexpensive. We can also easily implement it in chialisp, and have done so here.
With this technique, references to MOD cores, which can be over 1k, can be compressed into just a few bytes, so larger blocks compress even better. Eventually generator size costs becomes overwhelmed by puzzle execution costs and CREATE_COIN costs.