Consider the following protocol spec:
Request format:
PACKET := HEADER | BODY
HEADER := LENGTH | TYPE
LENGTH := 8-BYTE-INTEGER
TYPE := VARIABLE_INTEGER
BODY := JSON-COMMAND | SIGNATURE
LENGTH is the total length the PACKET.
A VARIABLE_INTEGER is encoded as 7 bits of value, with the upper bit 1
indicating that another byte follows, upper bit 0 being terminal.
TYPE is currently always zero.
SIGNATURE is the digital signature of the PACKET.
Reply format:
The same, but TYPE is 1.
-
What is missing in this specification?
-
What would you change about this specification?
-
How would you test an implementation?
- 8 bytes for
LENGTH
seems excessive, will we really see messages this long -- 1.8*10^19 bytes?- Would probably be better to use a
VARIABLE_INTEGER
for this
- Would probably be better to use a
- Using a
VARIABLE_INTEGER
forTYPE
also seems excessive, how many types of message do we expect to use? - We are missing the endianness of the
VARIABLE_INTEGER
's for bothLENGTH
andTYPE
- We are missing whether
8-BYTE-INTEGER
andVARIABLE_INTEGER
's are signed or unsigned, although one would presume unsigned as negative length messages seem unlikely - This implementation of
VARIABLE_INTEGER
can represent message lengths up to 128 bytes in a single byte. BitcoinsVARINT
can represent messages of up to length 255 in a single byte. However this encoding becomes more efficient for longer message lengths. - There are no details on whether this is a synchronous or an asynchronous protocol
- There is no error specification:
- what action to take in the case of receiving a message you don't understand
- what action to take in the case of receiving a message with insufficient length vs. expected
- There is no transport layer specification. Possibly the protocol is transport agnostic
- Seems wasteful to include a signature with every message, couldn't this be handled at the transport layer more efficiently?
- Messages currently include a signature but it's not clear how keys are initially exchanged so that the signature can be verified
- There is no specification on the signature algorithm to be used
- As specified, it appears that
SIGNATURE
should be including itself in the data to be signed (PACKET
), which will be tricky to implement! - There is no apparent message versioning which would allow for backwards-compatible upgrades. Whilst the
TYPE
could be incremented, old clients/servers would not recognise new message types and it's unspecified what action they should take in the case of receiving a message of unknownTYPE
(presumably terminate the connection) - Headers are variable length, not fixed, this can be useful to squeeze out some more efficiency and not waste unnecessary bits
- Missing maximum permitted size of
VARIABLE_INTEGER
. We don't want to blindly read 2**64 bytes from the socket for example - Using JSON as part of a wire protocol is not ideal because:
- It wastes a lot of bandwidth on syntax and field names
- It can't support many types of data e.g. bytes, which require conversion to Base64 and result in 33%
- Would prefer using CBOR or else specified message types for
BODY
-
We are missing the endianness of the
VARIABLE_INTEGER
's for bothLENGTH
andTYPE
-
We are missing whether
8-BYTE-INTEGER
andVARIABLE_INTEGER
's are signed or unsigned, although one would presume unsigned as negative length messages seem unlikely -
There are no details on whether this is a synchronous or an asynchronous protocol
-
There is no error specification:
- what action to take in the case of receiving a message you don't understand
- what action to take in the case of receiving a message with insufficient length vs. expected
-
There is no transport layer specification. Possibly the protocol is transport agnostic
-
There is no specification on the signature algorithm to be used
-
Missing maximum permitted size of
VARIABLE_INTEGER
. We don't want to blindly read 2**64 bytes from the socket for example
- I would change
LENGTH
to aVARIABLE_INTEGER
instead of8-BYTE-INTEGER
to get a more minimal encoding on this field. - I would add a requirement to always minimally encode numbers stored as
VARIABLE_INTEGER
- I would replace the JSON in
BODY
with an encoding which can natively handle binary data, e.g. CBOR or Protobuf for the general case, or otherwise a custom encoding specific to the application - Using the
TYPE
field to indicate whether this is a request or response seems wasteful. This field could be used to properly designate message type from a list of specified messages
- As specified, it appears that
SIGNATURE
should be including itself in the data to be signed (PACKET
), which will be tricky to implement! This should be changed
- (optional) I would remove the signature from the
BODY
and instead specify that we are delegating message authentication to the (encrypted) transport layer
Testing the implementation as originally described will be difficult because there is no specified action to take in the case of receiving unknown or otherwise malformed PACKET
s.
Assuming that we have details on how errors should be handled, the first step should be to produce a set of test vectors which should exercise the various aspects of the protocol that are actually specified:
- Known/unknown
TYPE
- Unexpected
TYPE
(i.e. sendingTYPE = 1
in a request) - Known/unknown
JSON-COMMAND
- Malformed JSON in
JSON-COMMAND
- Good/bad
SIGNATURE
- Message too long
- Message too short
- Malformed
VARIABLE_INTEGER
If we are need to truly test "the spec as advertised" (above), then we would have no option except to omit error-generating tests and simply ensure that all valid messages pass. We would then program our client/server to avoid any potential attack vectors related to handling malformed messages, and hope for the best.