In this article, we want to describe some lessons we learned while implementing stateful network and security protocols using a purely functional approach. This is by no means complete or correct (it is an active research area!). Feedback, comments, experiences, and counterexamples are appreciated (mail to hannes@mehnert.org).
The overall goal is to get robust and efficient implementations of network protocols, which are also readable: the protocol logic should be contained and understandable, and follow the names and definitions of the RFC. To achieve this, the code should avoid side effects in the core protocol logic.
A fundamental feature of functional programming is that communication between functions is done via arguments and result values solely. There is no other channel, such as a shared mutable heap (assignments are avoided). Side effects and errors are explicit in the type system. This makes reading and understanding of code easy, since it can be done in a modular way, since there is no possibility that a remote code part influences the current function. Abstraction should be used where needed, keeping in mind that every abstraction is a burden on the reader of the program (e.g. yourself in 5 years).
The stated goal is to use the same purely functional implementation as a specification (test oracle which validates recorded sessions, test generator), and a robust, efficient, understandable implementation.
OCaml is not a purely functional programming language, but by avoiding certain constructs we can use the purely functional core:
-
Mutable state - OCaml has builtin support for shared mutable state (reference, observable by the
ref
keyword, andmutable
fields in records). -
Side effects - OCaml gives direct access to side effecting operations (
Unix.time ()
returns the current time, file system and network operations are available in the Unix module as well). -
Exceptions - OCaml has exceptions built in. Some parts of the compiler support library (such as
List.find
) raise exceptions. -
Objects and classes - We avoid the object system of OCaml.
For a protocol implementation, we don't really need a concrete asynchonous library - we defer the decision which one to use to later, and first cope with the protocol itself independent of any effectful computation.
Once the algebraic effect system becomes mature and merged upstream, it will for sure ease several obstacles.
Most protocols can be divided into its message format, either binary or text based (or in some structured form, json or s-expression). The message format might be encrypted, or compressed, thus some transformation has to be applied before its content can be retrieved and processed.
A protocol likely includes a state machine, consisting of a set of states, and transitions. A transition can fire by an internal event (a user wants to establish a connection) or an external event (a message is received).
Keep in mind to be liberal what you accept from others, and be conservative in what you send (Postel's robustness principle). But implementing workarounds for every broken implementation is not worth it, there is no need to being able to communicate with all other stacks (leave the broken and insecure aside).
Sometimes complexity can be avoided by layering: often the protocol logic needs to deal only with an individual connection, the fact that multiple connections are processed at the same time can be handled outside of the core logic, using some collection from connection to context.
For each message type, a parsing and unparsing function needs to be developed. This translates from the binary or text message to a suitable high-level data structure. The parser should be as independent as possible from the current state the connection is in (e.g. no protocol version specific behaviour), and may do initial validation.
A parser should either return a valid high-level structure, or an invalid structure error, and underflow and overflow errors in case some length fields do not match the actual fragment. The high-level data structure should only retain the useful and used information of a message, reserved fields and static strings can be dropped.
NB: OCaml has both variant types and polymorphic variants, which is to use is best explained on Stack Overflow here.
A sum type of all possible messages (as high-level structures) is very handy:
type message = Hello of AA | Response of BB
Parsers have roughly the signature:
val parse_message : buffer -> [ `Ok of message | `Error of error ]
An unparser might have the signature:
val write_message : message -> buffer
A state machine can be encoded using a sum type of all possible states, and a function which matches on the current state and a given action or incoming packet, and results in the new state, calling some handler which does the actual work (such as preparing a response, ...).
type state = WaitingForHello | WaitingForResponse of CC
It is likely, that certain pieces of information, gathered during the exchange
of messages, need to be preserved (think of the ciphersuite and key material in
TLS, or the window size in TCP) for the session. This should be stored in a
record (context
), which is passed to the state machine handler and returned
from it, so that it can be extended. The context contains at least the state
described earlier.
The signature for the handler might be:
type context = {
state : state ;
request_number : int ;
}
val handle_message : context -> buffer -> [ `Ok of (context * buffer list) | `Error of error ]
And the implementation is pretty straightforward, where each handle function
should do the desired validation, and return either `Ok
or `Error
:
let handle_message ctx buf =
match parse_message buf with
| `Error e -> `Error e
| `Ok msg -> match ctx.state, msg with
| WaitingForHello , Hello a -> handle_hello ctx a buf
| WaitingForResponse c, Response b -> handle_response ctx c b buf
| _ -> `Error UnexpectedInput
Depending on the protocol, the state machine handlers might vary a lot (and not share much code), or they might be very similar. The handler can either be combined into a single function, or separated into client and server handler function.
let handle_hello ctx hello buf =
let ctx = { ctx with request_number = succ ctx.request_number } in
`Ok (ctx, [])
There are different kinds of errors: configurations which make no sense (best to
prevent at the type level) should error out at load time (using an
Invalid_argument
exception), the system does not have enough resources (out of
memory), or it receives unexpected data.
Network services need to validate all incoming data, since it might be invalid data by individuals trying to circumvent the safety and security of a protocol implementation. This includes malformed individual messages, or sending invalid or unexpected messages at certain states of the protocol. It might be malicious intent, bit flips on the wire, or broken implementations of the protocol of the communication partner.
To handle structurally invalid messages the parser can return an error. Each
handle function also can bail out with an error. Here, using an error
monad results
in quite readable code (see for example OTR
parser or TLS
parser (which
contains functions to parse with exceptions, and a catch
which captures the
exceptions, only the guarded functions are exposed)):
let parse_hello buffer =
parse_header buffer >>= fun (header, buffer) ->
parse_data_item buffer >>= fun (data, buffer) ->
parse_more_data buffer >>= fun (more_data, buffer) ->
check_empty buffer >>= fun () ->
return (header, data, more_data)
In this approach the function is explicitly marked (at the type level) to return either some successfully parsed data or an error, unlike if exceptions are thrown. This ensures that a caller has to handle both possible return values.
In case you need access to persistent storage (such as session information for
HTTP cookies, or TLS session resumption, handed out DHCP leases), instead of
implementing this inside of the protocol logic, use callbacks which are exposed
via the configuration. Thus, the configuration could expose a session_store : information -> context option
, which is used for loading from the cache (or
checking whether the information
is still free (such as an IP address in the
case of a DHCP lease).
For storing the data, maybe the main handle
could expose a `Reserve of information * context
action, or the information is available from a context
once a session has been established (in TLS, we provide Engine.epoch
which can
be used to retrieve the session ID and fill the cache
here).
If your protocol should read and serve data which should not be preserved in
memory (e.g. the ftp repository is better kept on disk, rather than in memory),
a possible interface is to provide the information needed for the protocol in
the configuration (such as a map from filename and its size), and then have
actions ReadSent of (header * (filename * offset * length))
. If the
persistent storage is supposed to change, either from external or internal (file
uploading), pass the map to the handler and return it, next to the context.
Somehow a context
need to be constructed from thin air, in most protocols
there is a slight difference between the two sides, the initiator of the
communication (client), and the responder (server). Thus, it is useful to
provide two initialisation functions:
val client : client_config -> context
val server : server_config -> context
Depending on protocol, client and server configuration parameters might be the same, or radically different. The context should contain the information which communication side it represents.
To summarise, there are various data inputs for a protocol: static and global configuration, dynamic per-connection context, and sometimes persistent storage (which might be either global or per-connection, maybe depending on credentials).
The high-level API for a side-effecting client could be similar to. Depending
on the protocol, handle
might always return an optional buffer as response,
action
is also protocol-dependent:
type context (* abstract context *)
val client : client_config parameters -> context
val server : server_config parameters -> context
type error = [ ... ]
(* the action: present a warning to the user, respond with a buffer to the
communication partner, process a buffer of payload *)
type action = [ `Warning of string | `Respond of buffer | `Payload of buffer | ... ]
(* handles a received message with a given context, returns (depending on
protocol) a new context and a list of actions; or an error *)
val handle : context -> buffer -> [ `Ok of (context * action list) | `Error of error ]
(* initiates the protocol by producing a message to be send to the communication
partner *)
val initiate : context -> [ `Ok of (context * buffer option) | `Error of error ]
(* sends some payload *)
val send_data : context -> buffer -> [ `Ok of (context * buffer) | `Error of errpr ]
Accessors to data in context
can be useful as well, Concrete protocols might
omit parts of the interface: e.g. send_data
is only needed for transport
protocols which carry some payload.
This API is sufficient for implementing effectful shims on top of it, being it Unix or Lwt.
Using a purely functional approach, unit testing can be used since no effects are needed (e.g. using alcotest and coverage analysis). In addition, several virtual network interfaces are available in MirageOS which can be used to test a more complete stack (or read and process data from a packet capture (pcap) file).
Look into mirage-net-pcap and mirage-vnetif, and its usage in mirage-tcpip.
By implementing protocols in a purely functional style, this core can be more easily understood and tested (unit tests for parsers and unparsers, state machine transitions should be obvious by a single handle function), since no effects are embedded. Effectful operations, such as storage, reads from and writes to the network interface, console output, ..., can be layered on top of the functional core.
We want to have static guarantees of a protocol implementation, and we help by using OCaml's type and exhaustiveness checker. The exchaustiveness checker reports missing transitions (if we do not provide a catchall case), because we provide sum types for messages and state machine states.
- cstruct buffer (offset and length pointer into a bigarray)
- ptime (unreleased) POSIX time (RFC 3339)
- astring (unreleased) String handling
- rresult result and combinators (for error handling)
- ipaddr IPv4, IPv6, and MAC address handling
- nocrypto, cryptographic primitives
- logs, some logging infrastructure
- alcotest, a test framework
- coverage analysis, using bisect
- travisci, continuous integration on GitHub for OCaml projects
- OCaml-TLS API (implementation)
- OCaml-OTR API (implementation)
- not quite so broken