md-5/beat.md

## beat.md

      
    Raw
  

              beat.md
            
          
    Variable-length number encoding

Rather than limit the maximum file size supported to 16MB (24-bit) or 4GB (32-bit), beat patches use a variable-length encoding to support any number of bits, and thus, any possible file size.
The basic idea is that we encode the lowest seven bits of the number, and then the eighth bit of each byte is a flag to say whether the full number has been represented or not. If set, this is the last byte of the number. If not, then we shift out the low seven bits and repeat until the number is fully encoded.
One last optimization is to subtract one after each encode. Without this, one could encode '1' with 0x81 or 0x01 0x80, producing an ambiguity.
Decoding is the inverse of the above process.
Below are C++ implementations of this idea. Note that we are using uint64 for the data type: this will limit beat patches created with these algorithms to 64-bit file sizes. If 128-bit integers were available, they could be used instead. Of course, it's silly to even imagine patching a file larger than 16 exabytes, but beat does allow it.
Encoding

void encode(uint64 data) {
  while(true) {
    uint8 x = data & 0x7f;
    data >>= 7;
    if(data == 0) {
      write(0x80 | x);
      break;
    }
    write(x);
    data--;
  }
}
Decoding

uint64 decode() {
  uint64 data = 0, shift = 1;
  while(true) {
    uint8 x = read();
    data += (x & 0x7f) * shift;
    if(x & 0x80) break;
    shift <<= 7;
    data += shift;
  }
  return data;
}