MessagePack is an object serialization specification like JSON.
MessagePack has two concepts: the type system and the formats. Serialization converts application types into MessagePack types, and then converts those types into MessagePack formats. Deserialization converts MessagePack formats into MessagePack types, and then converts those types into application types.
Serialization:
Application types (objects in memory)
--> MessagePack types
--> MessagePack formats (byte array)
Deserialization:
MessagePack formats (byte array)
--> MessagePack types
--> Application types (objects in memory)
This document describes the MessagePack type system, the MessagePack formats, and the conversion between them.
- Types
  - Integer represents an integer
  - Nil represents nil
  - Boolean represents true or false
  - Float represents a floating-point number
  - Raw
    - String extends the Raw type and represents a UTF-8 string
    - Binary extends the Raw type, implements the Extension interface, and represents a byte array
  - Array represents a sequence of objects
  - Map represents key-value pairs of objects
  - Extended implements the Extension interface and represents a tuple of type information and a byte array, where the type information is an integer whose meaning is defined by applications
- Interfaces
  - Extension represents a tuple of an integer and a byte array, where the integer represents type information and the byte array represents data. The format of the data is defined by concrete types
- a value of an Integer object ranges from -(2^63) up to (2^64)-1
- a value of a Float object is an IEEE 754 single or double precision floating-point number
- the maximum length of a Binary object is (2^32)-1 bytes
- the maximum byte size of a String object is (2^32)-1
  - String objects may contain byte sequences that are invalid as UTF-8; how a deserializer behaves when it receives an invalid byte sequence depends on the implementation
  - Deserializers should provide a mechanism to get the original byte array so that applications can decide how to handle the object
- the maximum number of elements of an Array object is (2^32)-1
- the maximum number of key-value associations of a Map object is (2^32)-1
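The limits above translate directly into range checks. The following is a minimal sketch in Python; the constant and function names are illustrative, not part of any MessagePack library, and mapping Python's int, bytes, str, list and dict onto Integer, Binary, String, Array and Map is an assumption.

```python
# Limits from the list above (names are illustrative only).
INT_MIN = -(2 ** 63)        # smallest Integer value
INT_MAX = (2 ** 64) - 1     # largest Integer value
MAX_SIZE = (2 ** 32) - 1    # Binary/String byte size, Array elements, Map pairs


def is_valid_integer(value: int) -> bool:
    """Return True if value fits the Integer range [-(2^63), (2^64)-1]."""
    return INT_MIN <= value <= INT_MAX


def is_valid_size(size: int) -> bool:
    """Return True if a byte size or element count fits in (2^32)-1."""
    return 0 <= size <= MAX_SIZE


def check_object(obj) -> bool:
    """Recursively check an object tree against the limits (assumed type mapping)."""
    if obj is None or isinstance(obj, (bool, float)):
        return True
    if isinstance(obj, int):
        return is_valid_integer(obj)
    if isinstance(obj, (bytes, bytearray)):
        return is_valid_size(len(obj))
    if isinstance(obj, str):
        # The String limit is a byte size, so measure the UTF-8 encoding.
        return is_valid_size(len(obj.encode("utf-8")))
    if isinstance(obj, list):
        return is_valid_size(len(obj)) and all(check_object(x) for x in obj)
    if isinstance(obj, dict):
        return is_valid_size(len(obj)) and all(
            check_object(k) and check_object(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")
```

Measuring strings in encoded bytes rather than characters matters because the limit is stated as a byte size.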
MessagePack allows applications to define types. These type definitions are built on top of the MessagePack type system, and MessagePack itself uses the Extended type to represent them.
Applications assign 0 to 127 to store the type information of application-specific types.
On the other hand, MessagePack expects future extensions to add types that will be described in other documents. MessagePack uses -1 to -128 to store the type information of these predefined types.
[0, 127]: application-specific types
[-1, -128]: predefined types
The Binary type is one of the predefined extension types. Its type number is -1.
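The two reserved ranges can be expressed as a small classifier. This Python sketch is illustrative; the function names are hypothetical. It also shows the byte value a type number takes in an 8-bit signed field, which is why -1 appears on the wire as 0xff.

```python
BINARY_TYPE_NUMBER = -1  # predefined type number of Binary, per the text above


def ext_type_category(type_number: int) -> str:
    """Classify an Extension type number by the reserved ranges."""
    if 0 <= type_number <= 127:
        return "application-specific"
    if -128 <= type_number <= -1:
        return "predefined"
    raise ValueError(f"type number {type_number} does not fit in a signed byte")


def type_byte(type_number: int) -> int:
    """Return the unsigned byte value of type_number in an 8-bit signed field."""
    ext_type_category(type_number)  # validates the range
    return type_number & 0xFF       # two's complement, e.g. -1 -> 0xff
```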
format name | first byte (in binary) | first byte (in hex) |
---|---|---|
positive fixint | 0xxxxxxx | 0x00 - 0x7f |
fixmap | 1000xxxx | 0x80 - 0x8f |
fixarray | 1001xxxx | 0x90 - 0x9f |
fixraw | 101xxxxx | 0xa0 - 0xbf |
nil | 11000000 | 0xc0 |
(never used) | 11000001 | 0xc1 |
false | 11000010 | 0xc2 |
true | 11000011 | 0xc3 |
fixext 0 | 11000100 | 0xc4 |
fixext 1 | 11000101 | 0xc5 |
fixext 2 | 11000110 | 0xc6 |
fixext 3 | 11000111 | 0xc7 |
fixext 4 | 11001000 | 0xc8 |
fixext 5 | 11001001 | 0xc9 |
float 32 | 11001010 | 0xca |
float 64 | 11001011 | 0xcb |
uint 8 | 11001100 | 0xcc |
uint 16 | 11001101 | 0xcd |
uint 32 | 11001110 | 0xce |
uint 64 | 11001111 | 0xcf |
int 8 | 11010000 | 0xd0 |
int 16 | 11010001 | 0xd1 |
int 32 | 11010010 | 0xd2 |
int 64 | 11010011 | 0xd3 |
ext 8 type -1 | 11010100 | 0xd4 |
ext 16 type -1 | 11010101 | 0xd5 |
ext 32 type -1 | 11010110 | 0xd6 |
ext 8 | 11010111 | 0xd7 |
ext 16 | 11011000 | 0xd8 |
ext 32 | 11011001 | 0xd9 |
raw 16 | 11011010 | 0xda |
raw 32 | 11011011 | 0xdb |
array 16 | 11011100 | 0xdc |
array 32 | 11011101 | 0xdd |
map 16 | 11011110 | 0xde |
map 32 | 11011111 | 0xdf |
negative fixint | 111xxxxx | 0xe0 - 0xff |
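A deserializer dispatches on the first byte, so the table above can be encoded as a lookup. This is a minimal Python sketch with hypothetical names. It follows the byte assignments in this table only; released MessagePack implementations give some bytes in the 0xc4-0xd9 range different meanings, so the sketch is not meant to interoperate with them.

```python
# Single-byte formats in 0xc0 - 0xdf, copied from the table above.
_SINGLE_BYTE_FORMATS = {
    0xC0: "nil", 0xC1: "(never used)", 0xC2: "false", 0xC3: "true",
    **{0xC4 + n: f"fixext {n}" for n in range(6)},  # 0xc4 - 0xc9
    0xCA: "float 32", 0xCB: "float 64",
    0xCC: "uint 8", 0xCD: "uint 16", 0xCE: "uint 32", 0xCF: "uint 64",
    0xD0: "int 8", 0xD1: "int 16", 0xD2: "int 32", 0xD3: "int 64",
    0xD4: "ext 8 type -1", 0xD5: "ext 16 type -1", 0xD6: "ext 32 type -1",
    0xD7: "ext 8", 0xD8: "ext 16", 0xD9: "ext 32",
    0xDA: "raw 16", 0xDB: "raw 32",
    0xDC: "array 16", 0xDD: "array 32",
    0xDE: "map 16", 0xDF: "map 32",
}


def format_name(first_byte: int) -> str:
    """Return the format name for the first byte of a serialized object."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("not a byte")
    if first_byte <= 0x7F:
        return "positive fixint"   # 0xxxxxxx
    if first_byte <= 0x8F:
        return "fixmap"            # 1000xxxx
    if first_byte <= 0x9F:
        return "fixarray"          # 1001xxxx
    if first_byte <= 0xBF:
        return "fixraw"            # 101xxxxx
    if first_byte >= 0xE0:
        return "negative fixint"   # 111xxxxx
    return _SINGLE_BYTE_FORMATS[first_byte]
```

Because the fix* ranges and the 32 single-byte formats together cover all 256 values, every first byte has exactly one format name.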
one byte:
+--------+
| |
+--------+
a variable number of bytes:
+========+
| |
+========+
variable number of objects stored in MessagePack format:
+~~~~~~~~~~~~~~~~+
| |
+~~~~~~~~~~~~~~~~+
X, Y, Z, G, and H are symbols that are replaced by actual bits.
Nil format stores nil in 1 byte:
nil:
+--------+
| 0xc0 |
+--------+
Bool format family stores false or true in 1 byte:
false:
+--------+
| 0xc2 |
+--------+
true:
+--------+
| 0xc3 |
+--------+
Int format family stores an integer in 1, 2, 3, 5, or 9 bytes.
positive fixint stores a 7-bit positive integer
+--------+
|0XXXXXXX|
+--------+
where
* 0XXXXXXX is an 8-bit unsigned integer
negative fixint stores a 5-bit negative integer
+--------+
|111YYYYY|
+--------+
where
* 111YYYYY is an 8-bit signed integer
uint 8 stores an 8-bit unsigned integer
+--------+--------+
| 0xcc |ZZZZZZZZ|
+--------+--------+
uint 16 stores a 16-bit big-endian unsigned integer
+--------+--------+--------+
| 0xcd |ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+
uint 32 stores a 32-bit big-endian unsigned integer
+--------+--------+--------+--------+--------+
| 0xce |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+
uint 64 stores a 64-bit big-endian unsigned integer
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xcf |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
int 8 stores an 8-bit signed integer
+--------+--------+
| 0xd0 |ZZZZZZZZ|
+--------+--------+
int 16 stores a 16-bit big-endian signed integer
+--------+--------+--------+
| 0xd1 |ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+
int 32 stores a 32-bit big-endian signed integer
+--------+--------+--------+--------+--------+
| 0xd2 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+
int 64 stores a 64-bit big-endian signed integer
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xd3 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
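The int formats above, combined with the rule that serializers should pick the smallest format, give the following encoder. This is a minimal Python sketch (the function name is hypothetical). Non-negative values always use positive fixint or the uint formats, and int 8/16/32/64 are used only for negative values, since for a non-negative value the uint form is never larger.

```python
import struct


def pack_int(value: int) -> bytes:
    """Encode an Integer using the smallest member of the int format family."""
    if 0 <= value <= 0x7F:
        return struct.pack(">B", value)               # positive fixint: 0XXXXXXX
    if -32 <= value < 0:
        return struct.pack(">b", value)               # negative fixint: 111YYYYY
    if value > 0:
        if value <= 0xFF:
            return b"\xcc" + struct.pack(">B", value)     # uint 8
        if value <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", value)     # uint 16
        if value <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", value)     # uint 32
        if value <= 0xFFFFFFFFFFFFFFFF:
            return b"\xcf" + struct.pack(">Q", value)     # uint 64
    else:
        if value >= -(2 ** 7):
            return b"\xd0" + struct.pack(">b", value)     # int 8
        if value >= -(2 ** 15):
            return b"\xd1" + struct.pack(">h", value)     # int 16
        if value >= -(2 ** 31):
            return b"\xd2" + struct.pack(">i", value)     # int 32
        if value >= -(2 ** 63):
            return b"\xd3" + struct.pack(">q", value)     # int 64
    raise ValueError(f"{value} is outside [-(2^63), (2^64)-1]")
```

For example, 128 does not fit a positive fixint, so it becomes the two bytes 0xcc 0x80.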
Float format family stores a floating-point number in 5 bytes or 9 bytes:
float 32 stores a floating point number in IEEE 754 single precision floating point number format:
+--------+--------+--------+--------+--------+
| 0xca |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+
float 64 stores a floating point number in IEEE 754 double precision floating point number format:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xcb |YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
where
* XXXXXXXX_XXXXXXXX_XXXXXXXX_XXXXXXXX is a big-endian IEEE 754 single precision
floating point number
* YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY is a big-endian
IEEE 754 double precision floating point number
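Both float formats are a one-byte header followed by the big-endian IEEE 754 bytes, which Python's struct module produces with the ">f" and ">d" format characters. A minimal sketch with hypothetical function names; deciding when float 32 is acceptable (it can lose precision) is left to the caller here.

```python
import struct


def pack_float32(value: float) -> bytes:
    """float 32: 0xca + big-endian IEEE 754 single (may round; overflows raise)."""
    return b"\xca" + struct.pack(">f", value)


def pack_float64(value: float) -> bytes:
    """float 64: 0xcb + big-endian IEEE 754 double."""
    return b"\xcb" + struct.pack(">d", value)


def unpack_float(data: bytes) -> float:
    """Decode a float 32 or float 64 object from the start of data."""
    if data[0] == 0xCA:
        return struct.unpack(">f", data[1:5])[0]
    if data[0] == 0xCB:
        return struct.unpack(">d", data[1:9])[0]
    raise ValueError(f"0x{data[0]:02x} is not a float format")
```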
Raw format family stores a byte array in 1, 3, or 5 extra bytes in addition to the byte array itself.
fixraw stores a byte array whose length is up to 31 bytes:
+--------+========+
|101XXXXX| data |
+--------+========+
raw 16 stores a byte array whose length is up to (2^16)-1 bytes:
+--------+--------+--------+========+
| 0xda |YYYYYYYY|YYYYYYYY| data |
+--------+--------+--------+========+
raw 32 stores a byte array whose length is up to (2^32)-1 bytes:
+--------+--------+--------+--------+--------+========+
| 0xdb |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| data |
+--------+--------+--------+--------+--------+========+
where:
* XXXXX is a 5-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the length of data
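Choosing among fixraw, raw 16 and raw 32 depends only on N. This Python sketch (hypothetical names) picks the smallest header and shows a String stored as raw data holding its UTF-8 bytes, so N counts bytes, not characters.

```python
import struct


def pack_raw(payload: bytes) -> bytes:
    """Prefix payload with fixraw, raw 16, or raw 32, whichever is smallest."""
    n = len(payload)
    if n <= 31:
        header = struct.pack(">B", 0xA0 | n)          # 101XXXXX
    elif n <= 0xFFFF:
        header = b"\xda" + struct.pack(">H", n)       # raw 16
    elif n <= 0xFFFFFFFF:
        header = b"\xdb" + struct.pack(">I", n)       # raw 32
    else:
        raise ValueError("raw data longer than (2^32)-1 bytes")
    return header + payload


def pack_string(text: str) -> bytes:
    """A String is stored as raw data holding its UTF-8 encoding."""
    return pack_raw(text.encode("utf-8"))
```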
Array format family stores a sequence of elements in 1, 3, or 5 extra bytes in addition to the elements.
fixarray stores an array whose length is up to 15 elements:
+--------+~~~~~~~~~~~~~~~~+
|1001XXXX| N objects |
+--------+~~~~~~~~~~~~~~~~+
array 16 stores an array whose length is up to (2^16)-1 elements:
+--------+--------+--------+~~~~~~~~~~~~~~~~+
| 0xdc |YYYYYYYY|YYYYYYYY| N objects |
+--------+--------+--------+~~~~~~~~~~~~~~~~+
array 32 stores an array whose length is up to (2^32)-1 elements:
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~+
| 0xdd |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| N objects |
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~+
where:
* XXXX is a 4-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the size of an array
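An array is a length header followed by its elements, each already in MessagePack format. A minimal Python sketch with hypothetical names; it assumes the elements were encoded separately (for instance by an integer or raw encoder).

```python
import struct
from typing import List


def array_header(n: int) -> bytes:
    """Choose fixarray, array 16, or array 32 for an array of n elements."""
    if n <= 15:
        return struct.pack(">B", 0x90 | n)        # 1001XXXX
    if n <= 0xFFFF:
        return b"\xdc" + struct.pack(">H", n)     # array 16
    if n <= 0xFFFFFFFF:
        return b"\xdd" + struct.pack(">I", n)     # array 32
    raise ValueError("array has more than (2^32)-1 elements")


def pack_array(encoded_elements: List[bytes]) -> bytes:
    """Prefix already-encoded elements with the array header."""
    return array_header(len(encoded_elements)) + b"".join(encoded_elements)
```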
Map format family stores a sequence of key-value pairs in 1, 3, or 5 extra bytes in addition to the key-value pairs.
fixmap stores a map whose size is up to 15 key-value pairs:
+--------+~~~~~~~~~~~~~~~~+
|1000XXXX| N*2 objects |
+--------+~~~~~~~~~~~~~~~~+
map 16 stores a map whose size is up to (2^16)-1 key-value pairs:
+--------+--------+--------+~~~~~~~~~~~~~~~~+
| 0xde |YYYYYYYY|YYYYYYYY| N*2 objects |
+--------+--------+--------+~~~~~~~~~~~~~~~~+
map 32 stores a map whose size is up to (2^32)-1 key-value pairs:
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~+
| 0xdf |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| N*2 objects |
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~+
where:
* XXXX is a 4-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the size of a map
* odd elements in objects are keys of a map
* the next element of a key is its associated value
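A map header stores the number of pairs N, and the body holds N*2 objects alternating key, value. This Python sketch (hypothetical names, keys and values assumed to be encoded already) writes that layout and shows the reverse regrouping a deserializer performs on the N*2 decoded objects.

```python
import struct
from typing import Any, List, Tuple


def map_header(n: int) -> bytes:
    """Choose fixmap, map 16, or map 32 for a map of n key-value pairs."""
    if n <= 15:
        return struct.pack(">B", 0x80 | n)        # 1000XXXX
    if n <= 0xFFFF:
        return b"\xde" + struct.pack(">H", n)     # map 16
    if n <= 0xFFFFFFFF:
        return b"\xdf" + struct.pack(">I", n)     # map 32
    raise ValueError("map has more than (2^32)-1 pairs")


def pack_map(encoded_pairs: List[Tuple[bytes, bytes]]) -> bytes:
    """Write N, then key 1, value 1, key 2, value 2, ... (N*2 objects)."""
    body = b"".join(key + value for key, value in encoded_pairs)
    return map_header(len(encoded_pairs)) + body


def pairs_from_objects(objects: List[Any]) -> List[Tuple[Any, Any]]:
    """Regroup N*2 decoded objects: the 1st, 3rd, ... are keys, each followed by its value."""
    if len(objects) % 2:
        raise ValueError("a map needs an even number of objects")
    return list(zip(objects[0::2], objects[1::2]))
```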
Ext format family stores a tuple of an integer and a byte array.
fixext 0 stores an integer and a byte array whose length is 0 bytes
+--------+--------+
| 0xc4 | type |
+--------+--------+
fixext 1 stores an integer and a byte array whose length is 1 byte
+--------+--------+--------+
| 0xc5 | type | data |
+--------+--------+--------+
fixext 2 stores an integer and a byte array whose length is 2 bytes
+--------+--------+--------+--------+
| 0xc6 | type | data |
+--------+--------+--------+--------+
fixext 3 stores an integer and a byte array whose length is 3 bytes
+--------+--------+--------+--------+--------+
| 0xc7 | type | data |
+--------+--------+--------+--------+--------+
fixext 4 stores an integer and a byte array whose length is 4 bytes
+--------+--------+--------+--------+--------+--------+
| 0xc8 | type | data |
+--------+--------+--------+--------+--------+--------+
fixext 5 stores an integer and a byte array whose length is 5 bytes
+--------+--------+--------+--------+--------+--------+--------+
| 0xc9 | type | data |
+--------+--------+--------+--------+--------+--------+--------+
ext 8 type -1 stores a byte array whose length is up to (2^8)-1 bytes:
+--------+--------+========+
| 0xd4 |XXXXXXXX| data |
+--------+--------+========+
ext 16 type -1 stores a byte array whose length is up to (2^16)-1 bytes:
+--------+--------+--------+========+
| 0xd5 |YYYYYYYY|YYYYYYYY| data |
+--------+--------+--------+========+
ext 32 type -1 stores a byte array whose length is up to (2^32)-1 bytes:
+--------+--------+--------+--------+--------+========+
| 0xd6 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| data |
+--------+--------+--------+--------+--------+========+
ext 8 stores an integer and a byte array whose length is up to (2^8)-1 bytes:
+--------+--------+--------+========+
| 0xd7 |XXXXXXXX| type | data |
+--------+--------+--------+========+
ext 16 stores an integer and a byte array whose length is up to (2^16)-1 bytes:
+--------+--------+--------+--------+========+
| 0xd8 |YYYYYYYY|YYYYYYYY| type | data |
+--------+--------+--------+--------+========+
ext 32 stores an integer and a byte array whose length is up to (2^32)-1 bytes:
+--------+--------+--------+--------+--------+--------+========+
| 0xd9 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| type | data |
+--------+--------+--------+--------+--------+--------+========+
where
* XXXXXXXX is an 8-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the length of data
* type is an 8-bit signed integer; 0xff (-1) is reserved for storing the type in 2 bytes in the future, and it causes a format error by default
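The ext layouts above can be sketched as two encoders: one for Extended objects (fixext 0-5, or ext 8/16/32 with the length before the type byte) and one for Binary, which uses the dedicated ext N type -1 formats and carries no type byte. This Python sketch uses hypothetical names; rejecting type -1 in the type field is an assumption based on the note that 0xff causes a format error by default.

```python
import struct


def _type_byte(type_number: int) -> bytes:
    """Encode the 8-bit signed type; 0xff (-1) is reserved in this field."""
    if not -128 <= type_number <= 127:
        raise ValueError("type must fit in a signed byte")
    if type_number == -1:
        raise ValueError("type -1 (0xff) is reserved in the type field")
    return struct.pack(">b", type_number)


def pack_ext(type_number: int, data: bytes) -> bytes:
    """Encode an Extended object with fixext 0-5 or ext 8/16/32."""
    n = len(data)
    t = _type_byte(type_number)
    if n <= 5:
        return struct.pack(">B", 0xC4 + n) + t + data       # fixext 0 .. fixext 5
    if n <= 0xFF:
        return b"\xd7" + struct.pack(">B", n) + t + data    # ext 8: length, then type
    if n <= 0xFFFF:
        return b"\xd8" + struct.pack(">H", n) + t + data    # ext 16
    if n <= 0xFFFFFFFF:
        return b"\xd9" + struct.pack(">I", n) + t + data    # ext 32
    raise ValueError("data longer than (2^32)-1 bytes")


def pack_binary(data: bytes) -> bytes:
    """Encode a Binary object with ext 8/16/32 type -1 (no type byte written)."""
    n = len(data)
    if n <= 0xFF:
        return b"\xd4" + struct.pack(">B", n) + data
    if n <= 0xFFFF:
        return b"\xd5" + struct.pack(">H", n) + data
    if n <= 0xFFFFFFFF:
        return b"\xd6" + struct.pack(">I", n) + data
    raise ValueError("data longer than (2^32)-1 bytes")
```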
MessagePack serializers convert MessagePack types into formats as follows:
source types | output format |
---|---|
Integer | int format family (positive fixint, negative fixint, int 8/16/32/64, or uint 8/16/32/64) |
Nil | nil |
Boolean | bool format family (false or true) |
Float | float format family (float 32 or float 64) |
String | raw format family (fixraw, raw 16, or raw 32) |
Binary | ext format family where type is -1 (ext 8/16/32 type -1) |
Array | array format family (fixarray, array 16, or array 32) |
Map | map format family (fixmap, map 16, or map 32) |
Extended | ext format family (fixext, or ext 8/16/32) |
If an object can be represented in multiple possible output formats, serializers SHOULD use the format which represents the data in the smallest number of bytes.
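In a language like Python, a serializer first decides which MessagePack type an application value corresponds to, and the table then fixes the format family. This sketch assumes one possible mapping of Python types (None, bool, int, float, str, bytes, list, dict) and a hypothetical Extended holder class; neither comes from a real library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extended:
    """Hypothetical application-side holder for an Extended object."""
    type_number: int
    data: bytes


def format_family(obj) -> str:
    """Name the format family a serializer would use for obj, per the table above."""
    if obj is None:
        return "nil"
    if isinstance(obj, bool):  # test before int: Python's bool subclasses int
        return "bool"
    if isinstance(obj, int):
        return "int"
    if isinstance(obj, float):
        return "float"
    if isinstance(obj, str):                  # String
        return "raw"
    if isinstance(obj, (bytes, bytearray)):   # Binary
        return "ext type -1"
    if isinstance(obj, (list, tuple)):
        return "array"
    if isinstance(obj, dict):
        return "map"
    if isinstance(obj, Extended):
        return "ext"
    raise TypeError(f"no MessagePack type for {type(obj).__name__}")
```

Within each family, the smallest-format rule then picks the concrete format, for example uint 8 rather than uint 16 for the integer 200.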
MessagePack deserializers convert MessagePack formats into types as follows:
source formats | output type |
---|---|
positive fixint, negative fixint, int 8/16/32/64, or uint 8/16/32/64 | Integer |
nil | Nil |
false and true | Boolean |
float 32 and float 64 | Float |
fixraw, raw 16, and raw 32 | String |
fixext, and ext 8/16/32 type -1 | Binary |
fixarray, array 16, and array 32 | Array |
fixmap, map 16, and map 32 | Map |
fixext, and ext 8/16/32 | Extended |
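Following the deserializer table, here is a minimal recursive decoder in Python for Nil, Boolean, Integer, Float, String (raw formats), Binary (ext 8/16/32 type -1), Array and Map. It is a sketch with hypothetical names: it leaves 0xc1 and the Extended formats (fixext, ext 8/16/32) out and raises NotImplementedError for them, and decoding raw data with strict UTF-8 is an assumption, since the spec leaves invalid byte sequences to the implementation.

```python
import struct
from typing import Any, Tuple

# first byte -> (struct format, byte count) for fixed-size int/float formats
_FIXED = {
    0xCA: (">f", 4), 0xCB: (">d", 8),
    0xCC: (">B", 1), 0xCD: (">H", 2), 0xCE: (">I", 4), 0xCF: (">Q", 8),
    0xD0: (">b", 1), 0xD1: (">h", 2), 0xD2: (">i", 4), 0xD3: (">q", 8),
}
# first byte -> (struct format of N, byte count, kind) for length-prefixed formats
_SIZED = {
    0xD4: (">B", 1, "binary"), 0xD5: (">H", 2, "binary"), 0xD6: (">I", 4, "binary"),
    0xDA: (">H", 2, "raw"), 0xDB: (">I", 4, "raw"),
    0xDC: (">H", 2, "array"), 0xDD: (">I", 4, "array"),
    0xDE: (">H", 2, "map"), 0xDF: (">I", 4, "map"),
}


def _read(data: bytes, pos: int, n: int) -> Tuple[bytes, int]:
    chunk = data[pos:pos + n]
    if len(chunk) != n:
        raise ValueError("truncated input")
    return chunk, pos + n


def _unpack_at(data: bytes, pos: int) -> Tuple[Any, int]:
    head, pos = _read(data, pos, 1)
    b = head[0]
    if b <= 0x7F:                        # positive fixint -> Integer
        return b, pos
    if b >= 0xE0:                        # negative fixint -> Integer
        return b - 0x100, pos
    if b == 0xC0:                        # nil -> Nil
        return None, pos
    if b in (0xC2, 0xC3):                # false / true -> Boolean
        return b == 0xC3, pos
    if b in _FIXED:                      # int, uint, float formats
        fmt, size = _FIXED[b]
        raw, pos = _read(data, pos, size)
        return struct.unpack(fmt, raw)[0], pos
    if 0xA0 <= b <= 0xBF:                # fix* formats keep N in the first byte
        kind, n = "raw", b & 0x1F
    elif 0x90 <= b <= 0x9F:
        kind, n = "array", b & 0x0F
    elif 0x80 <= b <= 0x8F:
        kind, n = "map", b & 0x0F
    elif b in _SIZED:
        fmt, size, kind = _SIZED[b]
        raw, pos = _read(data, pos, size)
        n = struct.unpack(fmt, raw)[0]
    else:                                # 0xc1 and the Extended formats
        raise NotImplementedError(f"first byte 0x{b:02x} is not handled here")
    if kind in ("raw", "binary"):
        payload, pos = _read(data, pos, n)
        # raw -> String (strict UTF-8 here); ext N type -1 -> Binary (bytes)
        return (payload.decode("utf-8") if kind == "raw" else payload), pos
    items = []
    for _ in range(2 * n if kind == "map" else n):
        item, pos = _unpack_at(data, pos)
        items.append(item)
    if kind == "map":                    # 1st, 3rd, ... objects are keys
        return dict(zip(items[0::2], items[1::2])), pos
    return items, pos


def unpack(data: bytes) -> Any:
    """Decode exactly one object from data."""
    obj, end = _unpack_at(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after the object")
    return obj
```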
Applications may restrict the semantics of MessagePack, while keeping the same syntax, to adapt it to certain use cases. MessagePack calls such a set of restrictions a profile.
The default profile restricts nothing.
The primitive profile removes the String type, the Binary type, the Extended type, and the Extension interface. This is useful if applications use a schema.
- Integer: represents an integer
- Nil: represents nil
- Boolean: represents true or false
- Float: represents a floating point number
- Raw: represents a UTF-8 string or byte array
- Array: represents a sequence of objects
- Map: represents key-value pairs of objects
Serializers in the primitive profile convert MessagePack types into formats as follows:
source types | output format |
---|---|
Integer | int format family (positive fixint, negative fixint, int 8/16/32/64, or uint 8/16/32/64) |
Nil | nil |
Boolean | bool format family (false or true) |
Float | float format family (float 32 or float 64) |
Raw | raw format family (fixraw, raw 16, or raw 32) |
Array | array format family (fixarray, array 16, or array 32) |
Map | map format family (fixmap, map 16, or map 32) |
Deserializers in the primitive profile convert MessagePack formats into types as follows:
source formats | output type |
---|---|
positive fixint, negative fixint, int 8/16/32/64, or uint 8/16/32/64 | Integer |
nil | Nil |
false and true | Boolean |
float 32 and float 64 | Float |
fixraw, raw 16, and raw 32 | Raw |
fixext, ext 8/16/32 type -1 | Raw |
fixarray, array 16, and array 32 | Array |
fixmap, map 16, and map 32 | Map |
This is useful where the identity of two objects is important, such as:
- identifiers of data in a database
- authentication or digital signature
TODO
This document describes the problems that MessagePack focuses on.
TODO
TODO
TODO
TODO
The purpose of this document is to provide guidelines for implementing the MessagePack specification.
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
Implementations of serializers and deserializers should offer applications an option named "binary_extension" so that applications can choose ambiguity-tolerant behavior.
- Serializers:
  - if binary_extension=true, serializers store byte arrays using the Binary type (the Extension type with type number -1)
  - if binary_extension=false, serializers store byte arrays using the String type
  - for languages that don't have separate types for strings and byte arrays, msgpack implementations provide users with a way to mark byte arrays (such as a wrapper class)
    - in such weakly typed languages, serializers may use the String type to store byte arrays if users don't mark them
- Deserializers:
  - if binary_extension=true, deserializers restore the String type into a string object (ambiguity-strict behavior)
  - if binary_extension=true, deserializers may validate UTF-8 strings when restoring the String type. How deserializers handle strings that contain byte sequences invalid as UTF-8 depends on the implementation; here are some examples:
    - it returns an instance of a special class that has a field holding the original byte sequence
    - it calls a registered callback function and returns the value returned by the function
  - if binary_extension=false, deserializers don't validate UTF-8 at all when restoring the String type. If the language can't hold invalid byte sequences within a string object, deserializers don't restore the String type into the language's string type (ambiguity-tolerant behavior)
  - if binary_extension=false, deserializers may restore the Binary type and the String type into the same type
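The guideline above can be sketched in Python as follows. The function names and the InvalidString class are hypothetical; InvalidString illustrates the "special class which holds the original byte sequence" option. Because Python's str holds code points rather than bytes, the ambiguity-tolerant branch returns the payload as bytes.

```python
import struct
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class InvalidString:
    """Hypothetical holder for a String whose bytes are not valid UTF-8."""
    original: bytes


def _raw_header(n: int) -> bytes:
    """String: fixraw / raw 16 / raw 32."""
    if n <= 31:
        return struct.pack(">B", 0xA0 | n)
    if n <= 0xFFFF:
        return b"\xda" + struct.pack(">H", n)
    return b"\xdb" + struct.pack(">I", n)


def _binary_header(n: int) -> bytes:
    """Binary: ext 8/16/32 type -1."""
    if n <= 0xFF:
        return b"\xd4" + struct.pack(">B", n)
    if n <= 0xFFFF:
        return b"\xd5" + struct.pack(">H", n)
    return b"\xd6" + struct.pack(">I", n)


def pack_byte_array(data: bytes, binary_extension: bool) -> bytes:
    """Serializer side: Binary when binary_extension is True, String otherwise."""
    header = _binary_header(len(data)) if binary_extension else _raw_header(len(data))
    return header + data


def restore_string(payload: bytes, binary_extension: bool) -> Union[str, bytes, InvalidString]:
    """Deserializer side: turn the payload of a String object into an application value."""
    if not binary_extension:
        return payload                 # ambiguity-tolerant: no UTF-8 validation
    try:
        return payload.decode("utf-8") # ambiguity-strict
    except UnicodeDecodeError:
        return InvalidString(payload)  # keep the original bytes for the application
```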
TODO
TODO
TODO