Skip to content

Instantly share code, notes, and snippets.

@willcl-ark
Created October 14, 2021 13:33
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save willcl-ark/4b5f67d916fa8fdde750ab2d95b5a1a9 to your computer and use it in GitHub Desktop.
Save willcl-ark/4b5f67d916fa8fdde750ab2d95b5a1a9 to your computer and use it in GitHub Desktop.
will-clark-LPE response

Will Clark - Protocol Engineer Take Home Question

Consider the following protocol spec:

Request format:

PACKET := HEADER | BODY

HEADER := LENGTH | TYPE

LENGTH := 8-BYTE-INTEGER

TYPE := VARIABLE_INTEGER

BODY := JSON-COMMAND | SIGNATURE

LENGTH is the total length the PACKET.

A VARIABLE_INTEGER is encoded as 7 bits of value, with the upper bit 1

indicating that another byte follows, upper bit 0 being terminal.

TYPE is currently always zero.

SIGNATURE is the digital signature of the PACKET.


Reply format:

The same, but TYPE is 1.

  • What is missing in this specification?

  • What would you change about this specification?

  • How would you test an implementation?

Unordered initial thoughts

  • 8 bytes for LENGTH seems excessive, will we really see messages this long -- 1.8*10^19 bytes?
    • Would probably be better to use a VARIABLE_INTEGER for this
  • Using a VARIABLE_INTEGER for TYPE also seems excessive, how many types of message do we expect to use?
  • We are missing the endianness of the VARIABLE_INTEGER's for both LENGTH and TYPE
  • We are missing whether 8-BYTE-INTEGER and VARIABLE_INTEGER's are signed or unsigned, although one would presume unsigned as negative length messages seem unlikely
  • This implementation of VARIABLE_INTEGER can represent message lengths up to 128 bytes in a single byte. Bitcoins VARINT can represent messages of up to length 255 in a single byte. However this encoding becomes more efficient for longer message lengths.
  • There are no details on whether this is a synchronous or an asynchronous protocol
  • There is no error specification:
    • what action to take in the case of receiving a message you don't understand
    • what action to take in the case of receiving a message with insufficient length vs. expected
  • There is no transport layer specification. Possibly the protocol is transport agnostic
  • Seems wasteful to include a signature with every message, couldn't this be handled at the transport layer more efficiently?
  • Messages currently include a signature but it's not clear how keys are initially exchanged so that the signature can be verified
  • There is no specification on the signature algorithm to be used
  • As specified, it appears that SIGNATURE should be including itself in the data to be signed (PACKET), which will be tricky to implement!
  • There is no apparent message versioning which would allow for backwards-compatible upgrades. Whilst the TYPE could be incremented, old clients/servers would not recognise new message types and it's unspecified what action they should take in the case of receiving a message of unknown TYPE (presumably terminate the connection)
  • Headers are variable length, not fixed, this can be useful to squeeze out some more efficiency and not waste unnecessary bits
  • Missing maximum permitted size of VARIABLE_INTEGER. We don't want to blindly read 2**64 bytes from the socket for example
  • Using JSON as part of a wire protocol is not ideal because:
    • It wastes a lot of bandwidth on syntax and field names
    • It can't support many types of data e.g. bytes, which require conversion to Base64 and result in 33%
  • Would prefer using CBOR or else specified message types for BODY

What is missing in this specification?

  • We are missing the endianness of the VARIABLE_INTEGER's for both LENGTH and TYPE

  • We are missing whether 8-BYTE-INTEGER and VARIABLE_INTEGER's are signed or unsigned, although one would presume unsigned as negative length messages seem unlikely

  • There are no details on whether this is a synchronous or an asynchronous protocol

  • There is no error specification:

    • what action to take in the case of receiving a message you don't understand
    • what action to take in the case of receiving a message with insufficient length vs. expected
  • There is no transport layer specification. Possibly the protocol is transport agnostic

  • There is no specification on the signature algorithm to be used

  • Missing maximum permitted size of VARIABLE_INTEGER. We don't want to blindly read 2**64 bytes from the socket for example

What would you change about this specification?

  1. I would change LENGTH to a VARIABLE_INTEGER instead of 8-BYTE-INTEGER to get a more minimal encoding on this field.
  2. I would add a requirement to always minimally encode numbers stored as VARIABLE_INTEGER
  3. I would replace the JSON in BODY with an encoding which can natively handle binary data, e.g. CBOR or Protobuf for the general case, or otherwise a custom encoding specific to the application
  4. Using the TYPE field to indicate whether this is a request or response seems wasteful. This field could be used to properly designate message type from a list of specified messages
  • As specified, it appears that SIGNATURE should be including itself in the data to be signed (PACKET), which will be tricky to implement! This should be changed
  1. (optional) I would remove the signature from the BODY and instead specify that we are delegating message authentication to the (encrypted) transport layer

How would you test an implementation?

Testing the implementation as originally described will be difficult because there is no specified action to take in the case of receiving unknown or otherwise malformed PACKETs. Assuming that we have details on how errors should be handled, the first step should be to produce a set of test vectors which should exercise the various aspects of the protocol that are actually specified:

  1. Known/unknown TYPE
  2. Unexpected TYPE (i.e. sending TYPE = 1 in a request)
  3. Known/unknown JSON-COMMAND
  4. Malformed JSON in JSON-COMMAND
  5. Good/bad SIGNATURE
  6. Message too long
  7. Message too short
  8. Malformed VARIABLE_INTEGER

If we are need to truly test "the spec as advertised" (above), then we would have no option except to omit error-generating tests and simply ensure that all valid messages pass. We would then program our client/server to avoid any potential attack vectors related to handling malformed messages, and hope for the best.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment