Skip to content

Instantly share code, notes, and snippets.

@md-5
Created December 12, 2012 02:51
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save md-5/4264443 to your computer and use it in GitHub Desktop.
Save md-5/4264443 to your computer and use it in GitHub Desktop.
Variable length number encoding. Taken from http://byuu.org/programming/beat/.

Variable-length number encoding

Rather than limit the maximum file size supported to 16MB (24-bit) or 4GB (32-bit), beat patches use a variable-length encoding to support any number of bits, and thus, any possible file size.

The basic idea is that we encode the lowest seven bits of the number, and then the eighth bit of each byte is a flag to say whether the full number has been represented or not. If set, this is the last byte of the number. If not, then we shift out the low seven bits and repeat until the number is fully encoded.

One last optimization is to subtract one after each encode. Without this, one could encode '1' with 0x81 or 0x01 0x80, producing an ambiguity.

Decoding is the inverse of the above process.

Below are C++ implementations of this idea. Note that we are using uint64 for the data type: this will limit beat patches created with these algorithms to 64-bit file sizes. If 128-bit integers were available, they could be used instead. Of course, it's silly to even imagine patching a file larger than 16 exabytes, but beat does allow it.

Encoding

void encode(uint64 data) {
  while(true) {
    uint8 x = data & 0x7f;
    data >>= 7;
    if(data == 0) {
      write(0x80 | x);
      break;
    }
    write(x);
    data--;
  }
}

Decoding

uint64 decode() {
  uint64 data = 0, shift = 1;
  while(true) {
    uint8 x = read();
    data += (x & 0x7f) * shift;
    if(x & 0x80) break;
    shift <<= 7;
    data += shift;
  }
  return data;
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment