Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Basic considerations when creating a module in OCaml

Basic Module Design

First-In-First-Out Stack

module FIFO : sig
  exception Empty

  type 'a t
  val empty: 'a t
  val size : 'a t -> int
  val peek : 'a t -> 'a option
  val push : 'a -> 'a t -> 'a t
  val pop  : 'a t -> 'a t (* Empty *)
end

Signature

  • Keep representation abstract
  • Functions that introduce t values
  • Functions that consume or observe t values
  • Type design should exclude most illegal states
  • Think about client code using this module

Mutability

Abstractions can be either

  • immmutable (like FIFO.t above), or
  • mutable (like Hashtbl.t)

Errors

The type system prevents most errors, we still have:

  • Checked runtime errors - leading to exceptions
  • Unchecked runtime errors
  • Choice how to report errors: exceptions or optional values

Invariants

Immutable abstractions are easier to describe than mutable abstractions using invariants and algebraic equations:

size(empty) = 0
size(push(x,s)) = size(s) + 1
size(pop(s)) = size(s) - 1
pop(empty)= undefined
pop(push(x,empty)) = empty
pop(push(x,s)) = push(x,pop(s)), s != empty
pop(push(y,push(x,empty))) = push(y,empty)
peek(empty) = None
peek(push(x,empty)) = Some(x)
peek(push(x,s)) = peek(s), s != empty
peek(push(y,push(x,empty))) = Some(x)

This is an informal notation which is still very useful and concise to capture the interaction between functions. These equations can be translated into unit tests. (Some people might prefer to write tests rather than invariants.)

Equality

Equality in OCaml as defined by = and <> is structural and might not be the right choice to compare values. For example:

  • A set abstraction can have many internal representations for the same set. Hence, structural equality would be wrong to use for set equality.

  • The FIFO stack above may or may not have a canonical internal representation - it could depend on the order of operations. As it stands, probably only empty would be safe to use in =. It would be even safer to provide an is_empty predicate in the signature.

  • Using = on a FIFO.t value could be an unchecked runtime error.

Polymorphism

If an abstraction consumes other values - what assumptions do we have to make over them?

  • If our implementation is agnostic over the values it consumes, we can use parametric polymorphism: easy and efficient.
  • If we have to make assumptions over values, we need access to their properties. This leads to an implementation as a functor such that the functor argument provides access to all functions that we need.

In the case of the FIFO stack we simply store values but never inspect them. This makes it impossible to print them, for example, in the implementation. We also should not compare or order them in the implementation because we can't assume that eqality and order are well defined.

Other Considerations:

  • Complexity
  • Type Variance (in combination with polymorphic variants)

Implementation

The idea of an efficient, functional implementation is to use two lists:

  • type 'a t = 'a list * 'a list

  • We push to the front of the second list.

    # push 1 empty |> push 2 |> push 3;;
    - : int list * int list = ([1], [3; 2])
    
  • We pop from the front of the first list

  • If the first list becomes empty, the reversed second list becomes the first list

    # push 1 empty |> push 2 |> push 3 |> pop
    - : int list * int list = ([2; 3], [])
    
    # push 1 empty |> push 2 |> push 3 |> pop |> push 4 |> push 5;;
    - : int list * int list = ([2; 3], [5; 4])
    
  • Reversing the second list has complexity O(n) but this effort is amortized over all push/pop operations, which makes them O(1) on average.

Code

exception Empty
type 'a t = 'a list * 'a list

let empty = ([], [])

let size (xs, ys) = List.length xs + List.length ys

(* invariant: the left list is never empty unless the 
   stack is empty *)

let peek = function
  | x::_  , _     -> Some x
  | []    , []    -> None
  | []    , _     -> assert false

let push y = function
  | []    , []    -> [y], []
  | xs    , ys    -> xs, y::ys
  | []    , _ys   -> assert false

let pop = function
  | []    , []    -> raise Empty
  | [_]   , ys    -> List.rev ys, []
  | _::xs , ys    -> xs         , ys
  | []    , ys    -> assert false

Note: the stack containing 1..5 could have two representations:

  • ([1; 2], [5; 4; 3])
  • ([1] , [5; 4; 3; 2])
@mseri
Copy link

mseri commented Apr 25, 2019

@lindig nice summary:

  • there is a small typo: euqality -> equality

  • why not module type FIFO on the first block, and then module Fifo: FIFO ... in the last? So you can also show where abstraction and implementation details come into play?

@mseri
Copy link

mseri commented Apr 25, 2019

How much are you going to say about type variance?

How do you deal with undefined behaviour (pop(empty)) across multiple implementations?

How do you test algebraic properties like the ones you presented?

@lindig
Copy link
Author

lindig commented Apr 26, 2019

Thanks for the feedback.

  • I think type variants demands its own session.
  • You are right that I should have used the signature to constrain the implementation to make the connection more obvious.
  • I'm not sure the algebraic properties are correct. I want to show that it makes sense to concisely document the interaction between functions in a module. This is not always easy and it is ad hoc because making this precise you end up writing theorems. If the abstraction is mutable, you can't use these but still probably want something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment