ItemBox is a binary format for serializing data in a language-independent way, similar to JSON, YAML, BSON, MessagePack, and other formats. ItemBox is specifically designed for encoding and decoding speed, space efficiency, and ease of implementation.
ItemBox can represent any of these basic data types:
- Null
A false value distinct from all other values.
- Boolean
Either false or true.
- Integer
A number, signed or unsigned, without a fractional part.
- Floating-point
An IEEE 754 double-precision floating point value.
- Unicode string
A sequence of zero or more Unicode characters.
- Bytestring
A sequence of zero or more octets.
- Array
A list of sequential values. The contained values can be of any type, including other containers.
- Mapping
An unordered set of name/value pairs. Both the keys and the values can be of any type.
- Tagged value
A combination of another value with a Unicode "tag" that describes the value's format. For example, a timestamp may be stored as an integer value with the tag :timestamp. The tag is not part of the value itself, but merely describes its interpretation.
A representation of a value is referred to as a "term." Terms are always at least one byte, but often are longer and can have variable length. A term of length zero is an error, and should be treated as such.
The first byte of a term is referred to as the "type code." In addition, there may be a "payload," which can be fixed-length data, an array of bytes, or a string. No delimiters separate or end terms.
When there are multiple valid representations for a value, an encoder may use any of them. Conversely, a decoder must be able to accept all of the possible representations of a given value.
Here are a few definitions used in the encoding, for simplicity:
- Term
A single value, consisting of at least a type code.
- Uint32
An unsigned 32-bit integer in network byte order.
- Uint16
An unsigned 16-bit integer in network byte order.
- Pair
Two terms in sequence.
- ByteArray
A sequence of bytes, without terminators or separators. Its length is given by a preceding integer, or the type code.
- TermList
A sequence of terms, one after the other. Its length is given by a preceding integer, or the type code.
- PairList
A sequence of Pairs, one after the other. Its length is given by a preceding integer, or the type code. The length counts pairs, so the number of terms is twice the length.
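As a minimal sketch of the definitions above, the following Python helpers read a Uint16, a Uint32, and a ByteArray from a buffer. The function names and the (value, new position) return convention are assumptions made for illustration; they are not part of the format.

```python
import struct

def read_uint16(buf, pos):
    """Read a Uint16 (network byte order) and return (value, new_pos)."""
    (value,) = struct.unpack_from(">H", buf, pos)
    return value, pos + 2

def read_uint32(buf, pos):
    """Read a Uint32 (network byte order) and return (value, new_pos)."""
    (value,) = struct.unpack_from(">I", buf, pos)
    return value, pos + 4

def read_byte_array(buf, pos, length):
    """Read a ByteArray whose length was given by a preceding integer
    or by the type code. No terminator or separator is consumed."""
    end = pos + length
    if end > len(buf):
        raise ValueError("truncated ByteArray")
    return buf[pos:end], end
```

A TermList or PairList would be read the same way, by decoding one term at a time and advancing the position.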
Terms with type code 0 simply represent a null. (Note that a "zero byte" is not the same thing as no data at all - a zero-length term is an error!)
Payload: None.
The type code 1 represents a Boolean true, and 2 represents a Boolean false.
Payload: None.
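The three payload-free type codes so far can be sketched in Python as follows. The function names are hypothetical, and mapping null to Python's None is an assumption of this example.

```python
def encode_simple(value):
    """Encode None, True, or False as a one-byte term."""
    if value is None:
        return bytes([0])   # type code 0: null
    if value is True:
        return bytes([1])   # type code 1: Boolean true
    if value is False:
        return bytes([2])   # type code 2: Boolean false
    raise TypeError("not a null or Boolean value")

def decode_simple(data):
    """Decode a one-byte null or Boolean term."""
    if len(data) == 0:
        # A zero-length term is an error, not a null.
        raise ValueError("zero-length term")
    type_code = data[0]
    if type_code == 0:
        return None
    if type_code == 1:
        return True
    if type_code == 2:
        return False
    raise ValueError("not a null or Boolean type code")
```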
The type code 3 indicates that an integer follows.
Payload: 32-bit integer, including a sign bit.
The type code 4 also indicates an integer, in this case 64 bits.
Payload: 64-bit integer, including a sign bit.
The type code 5 indicates that a floating-point number follows.
Payload: IEEE 754 floating-point number (64 bits long).
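A sketch of the fixed-width numeric terms in Python follows. The text above does not state the byte order or the signed representation of these payloads, so this example assumes network byte order (as for Uint16 and Uint32) and two's complement. Picking the narrower integer form when the value fits is also just one encoder choice; the function names are hypothetical.

```python
import struct

def encode_int(n):
    """Encode an integer with type code 3 (32-bit) or 4 (64-bit).
    Assumes big-endian two's complement, which the text does not specify."""
    if -2**31 <= n < 2**31:
        return b"\x03" + struct.pack(">i", n)
    if -2**63 <= n < 2**63:
        return b"\x04" + struct.pack(">q", n)
    raise OverflowError("integer does not fit in 64 bits")

def encode_float(x):
    """Encode a double with type code 5 (big-endian assumed)."""
    return b"\x05" + struct.pack(">d", x)
```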
The type code 6 indicates a Unicode string encoded in UTF-8. Improperly formatted UTF-8 should be treated as an error.
- Payload: Uint32 indicating the length of the string in bytes;
ByteArray that many bytes long.
The type code 7 indicates a bytestring. This has no restrictions on what octets may be included.
- Payload: Uint32 indicating the length of the bytestring;
ByteArray that many bytes long.
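The length-prefixed string and bytestring terms might be handled as in this Python sketch. The function names are hypothetical. Note that the Uint32 counts bytes, not characters, so a non-ASCII string has a length larger than its character count.

```python
import struct

def encode_string(s):
    """Type code 6: Uint32 byte length, then the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return b"\x06" + struct.pack(">I", len(raw)) + raw

def encode_bytes(b):
    """Type code 7: Uint32 length, then the octets unchanged."""
    return b"\x07" + struct.pack(">I", len(b)) + b

def decode_string_payload(buf, pos):
    """Decode the payload of a type code 6 term starting at pos.
    Returns (string, new_pos). Invalid UTF-8 raises UnicodeDecodeError."""
    (length,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    end = start + length
    if end > len(buf):
        raise ValueError("truncated string payload")
    return buf[start:end].decode("utf-8"), end
```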
The type code 8 indicates an array of values.
- Payload: Uint32 indicating the length of the array;
TermList that many terms long.
The type code 9 indicates a mapping.
- Payload: Uint32 indicating the number of pairs in the mapping;
PairList that many pairs long.
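To illustrate the container layouts, this Python sketch builds array and mapping terms from terms that have already been encoded as bytes. The function names and the choice to pass pre-encoded terms are assumptions of the example.

```python
import struct

def encode_array(terms):
    """Type code 8: Uint32 term count, then the terms back to back."""
    return b"\x08" + struct.pack(">I", len(terms)) + b"".join(terms)

def encode_mapping(pairs):
    """Type code 9: Uint32 pair count, then key term, value term, ...
    `pairs` is a list of (encoded_key, encoded_value) tuples."""
    body = b"".join(key + value for key, value in pairs)
    return b"\x09" + struct.pack(">I", len(pairs)) + body
```

For instance, `encode_array([b"\x00", b"\x01"])` yields the array [null, true], and the count in a mapping is the number of pairs rather than the number of terms.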
As of now, no meaning is assigned to type codes in the range 10 through 14. If a parser encounters one, it should return an error to the user. Future versions of this specification may add types, though this is unlikely.
The type code 15 indicates a tagged value.
- Payload: Uint16 for the length of the tag;
ByteArray consisting of the tag in UTF-8 encoding; Term for the value being tagged.
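A tagged value could be assembled as in the Python sketch below. The function name is hypothetical, the wrapped term is passed in already encoded, and writing the tag as the plain text "timestamp" (without the leading colon used in the earlier example) is an assumption.

```python
import struct

def encode_tagged(tag, term):
    """Type code 15: Uint16 tag length in bytes, the UTF-8 tag,
    then the term being tagged (already encoded)."""
    raw = tag.encode("utf-8")
    if len(raw) > 0xFFFF:
        raise ValueError("tag longer than a Uint16 can describe")
    return b"\x0f" + struct.pack(">H", len(raw)) + raw + term
```

Because the length field is a Uint16, a tag is limited to 65535 bytes of UTF-8.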
Type codes in the range 16 through 31 are used to encode small mappings. The number of pairs in the mapping is equal to the type code minus 16. This can represent mappings with up to fifteen keys.
Binary range: 0001xxxx
Bitwise test: (tc & 240) == 16
Payload: PairList (type code minus 16) pairs long.
Type codes in the range 32 through 63 are used to encode short arrays. The length of the array is equal to the type code minus 32. This can represent arrays with up to 31 items.
Binary range: 001xxxxx
Bitwise test: (tc & 224) == 32
Payload: TermList (type code minus 32) terms long.
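The compact container forms fold the length into the type code, so no Uint32 is written. A Python sketch, with hypothetical function names and pre-encoded terms as input:

```python
def encode_small_mapping(pairs):
    """Type codes 16 through 31: the pair count is added to 16."""
    if len(pairs) > 15:
        raise ValueError("too many pairs for the small mapping form")
    body = b"".join(key + value for key, value in pairs)
    return bytes([16 + len(pairs)]) + body

def encode_short_array(terms):
    """Type codes 32 through 63: the term count is added to 32."""
    if len(terms) > 31:
        raise ValueError("too many terms for the short array form")
    return bytes([32 + len(terms)]) + b"".join(terms)
```

Compared with the general forms (type codes 8 and 9), each of these saves the four bytes of the length field.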
Type codes in the range 64 through 127 are used to encode short UTF-8 strings. The length of the string is equal to the type code minus 64. Improperly formatted UTF-8 should be treated as an error. This can represent strings up to 63 bytes long.
Binary range: 01xxxxxx
Bitwise test: (tc & 192) == 64
Payload: ByteArray (type code minus 64) bytes long.
Type codes in the range 128 through 159 are used to encode short bytestrings. The length of the bytestring is equal to the type code minus 128. This can represent bytestrings up to 31 bytes long.
Binary range: 100xxxxx
Bitwise test: (tc & 224) == 128
Payload: ByteArray (type code minus 128) bytes long.
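The short string and short bytestring forms can be sketched in Python as follows. The function names are hypothetical. The limits apply to the encoded byte length, so a string of fewer than 64 characters may still be too long once encoded as UTF-8.

```python
def encode_short_string(s):
    """Type codes 64 through 127: the UTF-8 byte length is added to 64."""
    raw = s.encode("utf-8")
    if len(raw) > 63:
        raise ValueError("string too long for the short form")
    return bytes([64 + len(raw)]) + raw

def encode_short_bytes(b):
    """Type codes 128 through 159: the length is added to 128."""
    if len(b) > 31:
        raise ValueError("bytestring too long for the short form")
    return bytes([128 + len(b)]) + b
```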
Type codes in the range 160 through 191 are used to encode negative integers. The value is equal to zero minus (the type code minus 159). This can represent integers from -1 to -32.
Binary range: 101xxxxx
Bitwise test: (tc & 224) == 160
Payload: None.
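As a worked check of the arithmetic, type code 160 gives -(160 - 159) = -1 and type code 191 gives -(191 - 159) = -32. A minimal Python sketch, with hypothetical function names:

```python
def encode_small_negative(n):
    """Encode an integer from -32 to -1 as a single type code byte."""
    if not -32 <= n <= -1:
        raise ValueError("value outside -32..-1")
    return bytes([159 - n])

def decode_small_negative(type_code):
    """Invert the encoding: value = -(type code - 159)."""
    if (type_code & 224) != 160:
        raise ValueError("not a small negative integer type code")
    return -(type_code - 159)
```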
Type codes in the range 192 through 255 are used to encode low-value positive integers. The value is equal to the type code minus 192. This can represent integers from 0 to 63.
Binary range: 11xxxxxx
Bitwise test: (tc & 192) == 192
Payload: None.
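Putting the bitwise tests together, a decoder's first step might look like the following Python sketch. The function name, the kind labels, and the (kind, embedded value) return shape are all assumptions of this example. The second element is the length or integer value carried in the type code, or None when the type code carries none.

```python
FIXED_KINDS = {
    0: "null", 1: "true", 2: "false", 3: "int32", 4: "int64",
    5: "float", 6: "string", 7: "bytestring", 8: "array",
    9: "mapping", 15: "tagged value",
}

def classify(tc):
    """Classify a type code byte using the bitwise tests from the text."""
    if not 0 <= tc <= 255:
        raise ValueError("type code must be a single byte")
    if (tc & 192) == 192:
        return ("small positive integer", tc - 192)
    if (tc & 224) == 160:
        return ("small negative integer", -(tc - 159))
    if (tc & 224) == 128:
        return ("short bytestring", tc - 128)
    if (tc & 192) == 64:
        return ("short string", tc - 64)
    if (tc & 224) == 32:
        return ("short array", tc - 32)
    if (tc & 240) == 16:
        return ("small mapping", tc - 16)
    if tc in FIXED_KINDS:
        return (FIXED_KINDS[tc], None)
    # Only type codes 10 through 14 reach this point.
    raise ValueError("unassigned type code")
```

The masks partition the byte range without overlap, so the order of the tests does not affect the result.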