A data format designed to represent complex OCaml values. The format is designed for 0-copy parsing (or, not much copy anyway) and reasonable compactness.
The encoding relies on a first byte to disambiguate between values. The first byte is a bitfield [kind:4 low:4] where:
- [kind] is the kind of value (int, float, etc.)
- [low] is a small integer value whose meaning depends on the kind. It can be a small integer, or a length, or a special value, or a constructor index. In case the value needs an integer argument [n] (length, actual integer, element count, etc.), this integer argument is encoded in [low], but if [n>=15] then [low = 15] and [n-15] is encoded as LEB128 immediately after the first byte.
kinds:
- 0: special. n=0: false n=1: true n=2: nil Other values of [n] are reserved.
- 1: positive int, value is [n].
- 2: negative int, value is [-n-1]
- 3: float. if [n=0] then it's a float32, 4 bytes follow (little-endian); if [n=1] then it's a float64, 8 bytes follow (little-endian). Other values of [n] are reserved.
- 4: string, length in bytes is [n]. [n] bytes follow.
- 5: binary blob, length in bytes is [n]. [n] bytes follow.
- 6: array, number of elements is [n]. Immediate elements follow.
- 7: dict, number of key/value pairs is [n]. [2*n] immediate elements follow.
- 8: tag, the tag number is [n]. A single immediate element follows.
- 9: reserved
- 10: cstor0, constructor index is [n]
- 11: cstor1, constructor index is [n], a single immediate argument follows
- 12: cstorN, constructor index is [n], length as LEB128 follows (probably just one byte), then immediate arguments follow
- 13, 14: reserved (possible extension: 14: pointer with metadata? [n] is relative offset, [n2:LEB128] is pointer to metadata)
- 15: pointer, relative offset to the pointed value is [n]. If we're at offset [i], then the pointer points to [i-n-1].
The toplevel value is written last, and to find it, a valid Twine blob must end with a byte [n:u8] that indicates where this last value starts. If the last byte [n] is at address [off], then the actual last value starts at [off-n-1]. If the last value is too big, it's turned into a pointer and that's what that last byte targets.
An immediate element is, basically, a non-recursive case that is easy to {b skip} over to get to the next element. It includes booleans, integers, floats, strings, pointers, but not arrays or dictionaries (which must be encoded somewhere else and referred to by a pointer).