polytypic/article.md

## article.md

      
    Raw
  

              article.md
            
          
    Fear the Obj.magic


Let me first quote a paragraph from the
Memory Representation of Values
(with my emphasis):

Obj is an undocumented module that
exposes the internals of the OCaml compiler and runtime. It is very useful for
examining and understanding how your code will behave at runtime but should
never be used for production code unless you understand the
implications. The module bypasses the OCaml type system, making memory
corruption and segmentation faults possible.

If you ask about Obj.magic on public OCaml forums, you'll likely get a warning
of some sort. Here is mine: If you found this note, because you are just
learning OCaml and have no idea of what you are doing, you should move on. Avoid
Obj.magic at all cost.
If, on the other hand, you are actually implementing a supposedly
high-performance data structure and care about the cost, and you want to
understand the implications, then this note might be for you. Knowledge and
understanding are far more useful than fear of the unknown. Understanding the
memory representation of values can significantly help one when trying to
improve the performance of data structures in OCaml.
In this note I'm going to first talk a bit about the
memory representation of values
and
about unboxed float arrays.
This is not meant to be a comprehensive treatment on the subject, however. I
only want to point out a number of specific features. Then I'll go through a
number of inefficiencies one tends to encounter when trying to implement
high-performance data structures and I'll briefly show how one might use
Obj.magic to avoid them.
On the memory representation of values

As explained in numerous
other documents,
the OCaml runtime uses a uniform memory representation of values. What this
means is that an OCaml value is stored as a single, 32-bit or 64-bit, word and
is either an immediate integer or a pointer. A bit in each value indicates
whether the value is an immediate integer or a pointer. If the value is a
pointer, it will point to a block of memory that begins with a control word
containing information on the block. The information includes, among other
things, the size of the block and a "tag" that tells what kind of block it is.
In the rest of this note we are mostly interested in the representation of these
blocks and, in particular, we need to distinguish between a few different kinds
of blocks:

constructor blocks, having one of the constructor_tags,
double blocks, having the double_tag, and
double array blocks, having the double_array_tag.

A constructor block contains entries stored as 32-bit or 64-bit words, just like
any other OCaml values, a double block contains a single double precision
floating point number stored as a 64-bit entry, taking either two 32-bit words
or one 64-bit word, and a double array block contains double precision floating
point numbers stored as 64-bit entries.
Entries in those different types of blocks need to be accessed in different
ways. Using Obj.magic we can trick OCaml into performing accesses on values
that it would not otherwise allow. If we do that carelessly, it is possible to
trick OCaml into performing invalid accesses, that, if we are lucky, fail at
runtime and crash the program.
Examining the representation of values using the Obj module

Let's create a little helper named tag_name_of that tells us the tag of an
OCaml value:
let tag_names = [|
  Obj.forcing_tag, "forcing_tag";
  Obj.cont_tag, "cont_tag";
  Obj.lazy_tag, "lazy_tag";
  Obj.closure_tag, "closure_tag";
  Obj.object_tag, "object_tag";
  Obj.infix_tag, "infix_tag";
  Obj.forward_tag, "forward_tag";
  Obj.no_scan_tag, "no_scan_tag";
  Obj.abstract_tag, "abstract_tag";
  Obj.string_tag, "string_tag";
  Obj.double_tag, "double_tag";
  Obj.double_array_tag, "double_array_tag";
  Obj.custom_tag, "custom_tag";
  Obj.int_tag, "int_tag";
  Obj.out_of_heap_tag, "out_of_heap_tag";
  Obj.unaligned_tag, "unaligned_tag";
|]

let tag_name_of x =
  let tag = Obj.tag (Obj.repr x) in
  if Obj.first_non_constant_constructor_tag <= tag &&
     tag <= Obj.last_non_constant_constructor_tag then
    Printf.sprintf "constructor_tag %d" tag
  else
    match Array.find_opt (fun (tag', _) -> tag == tag') tag_names with
    | Some (_, name) -> name
    | None -> Printf.sprintf "unknown tag %d" tag
Let's then examine some OCaml values. An int value is stored as an immediate
integer value as are nullary or constant constructors:
# tag_name_of 101
- : string = "int_tag"

# tag_name_of None
- : string = "int_tag"

# tag_name_of `Nullary
- : string = "int_tag"
As one might expect a float is stored as block with the double_tag:
# tag_name_of 1.01
- : string = "double_tag"
Pretty much everything else, i.e. non-constant constructors, most records and
tuples, ref-cells (which are actually records), and most arrays, are stored as
constructor blocks:
# tag_name_of (Ok "value")
- : string = "constructor_tag 0"

# tag_name_of (Error "value")
- : string = "constructor_tag 1"

# tag_name_of (4.2, 1.01)
- : string = "constructor_tag 0"

# tag_name_of (ref 1.01)
- : string = "constructor_tag 0"

# tag_name_of (Atomic.make 1.01)
- : string = "constructor_tag 0"

# tag_name_of [|101; 42|]
- : string = "constructor_tag 0"
The notable exceptions are records that only contain float fields and
float arrays:
# let open struct type vec2 = { x: float; y: float } end in
  tag_name_of { x = 4.2; y = 1.01 }
- : string = "double_array_tag"

# tag_name_of (Array.make 101 4.1)
- : string = "double_array_tag"
As explained in the post
About unboxed float arrays,
the special treatment of float arrays has both pros and cons. In this note we
are mostly interested in the cons, how to deal with the possibility of
float arrays, and how to avoid the costs when they are unnecessary.
Array accesses

Let's examine the code that OCaml generates for array accesses using the
Compiler Explorer.
OCaml generates different code for an array access depending on what it knows
about the type of the array elements.
Array of immediate type

For an array of immediate elements:
type immediate [@@immediate]

let immediate_get (xs: immediate array) i =
  Array.unsafe_get xs i

let immediate_set (xs: immediate array) i v =
  Array.unsafe_set xs i v
OCaml generates what one might expect:
camlExample__immediate_get_269:
        movq    -4(%rax,%rbx,4), %rax
        ret
camlExample__immediate_set_317:
        movq    %rdi, -4(%rax,%rbx,4)
        movl    $1, %eax
        ret
This is the most efficient case — essentially a single instruction for an
array access.
Array of non-float type

Things get more interesting for an array of non-float elements:
type non_float = [ `Non_float of non_float ]

let non_float_get (xs: non_float array) i =
  Array.unsafe_get xs i

let non_float_set (xs: non_float array) i v =
  Array.unsafe_set xs i v
OCaml has no way to say to the compiler that an abstract type is not a float.
So, above we define a concrete type non_float that the compiler can analyze
and see that it is not an immediate type and also that it isn't a float. The
compiler generates array accesses that assume that the values may be pointers:
camlExample__non_float_get_323:
        movq    -4(%rax,%rbx,4), %rax
        ret
camlExample__non_float_set_327:
        subq    $8, %rsp
        movq    %rdi, %rsi
        leaq    -4(%rax,%rbx,4), %rdi
        movq    %rsp, %rbp
        movq    56(%r14), %rsp
        call    caml_modify@PLT
        movq    %rbp, %rsp
        movl    $1, %eax
        addq    $8, %rsp
        ret
A get is still optimal, but now a set calls caml_modify. The caml_modify
function, which is part of the OCaml runtime, will perform the necessary GC
write barrier and fences mandated by
the OCaml memory model.
Array of float

Things get perhaps even more interesting for an array of float elements:
let float_get (xs: float array) i =
  Array.unsafe_get xs i

let float_set (xs: float array) i v =
  Array.unsafe_set xs i v
While a set is still fairly efficient, a get actually allocates memory:
camlExample__float_get_332:
        subq    $8, %rsp
        subq    $16, %r15               ; Allocate 16 bytes for double block
        cmpq    (%r14), %r15            ; Check whether a GC call is needed
        jb      .L105
.L107:
        leaq    8(%r15), %rdi
        movq    $1277, -8(%rdi)         ; Write double block header word
        movsd   -4(%rax,%rbx,4), %xmm0  ; Read from array
        movsd   %xmm0, (%rdi)           ; Write double block value
        movq    %rdi, %rax
        addq    $8, %rsp
        ret
.L105:
        call    caml_call_gc@PLT
.L106:
        jmp     .L107
camlExample__float_set_336:
        movsd   (%rdi), %xmm0           ; Read double block value (*)
        movsd   %xmm0, -4(%rax,%rbx,4)  ; Write double array element
        movl    $1, %eax
        ret
⚠️ There is one particularly important thing to note in the float_set code. A
float value is generally represented as a double block. So, before writing the
value to the double array block, the value is read from the double block. Now,
consider the following function:
let will_this_crash_or_not xs =
  float_set xs 0 (Obj.magic ())
If you call will_this_crash_or_not [| 1.01 |] the program will most likely
crash. Notice the instruction marked with a (*). It attempts to read the value
from a double block using the value in register rdi as a pointer.
Obj.magic () is an immediate value. When that immediate value is interpreted
as a pointer, it will most likely point to an address in memory that the program
is not allowed to access and so the call will most likely crash.
Array of unknown type

Based on what we saw previously, one might already be able to guess what happens
when one accesses arrays of unknown (abstract or polymorphic) type:
let polymorphic_get xs i =
  Array.unsafe_get xs i

let polymorphic_set xs i v =
  Array.unsafe_set xs i v
The compiler has no choice but to generate code that can handle any one of the
previous cases:
camlExample__polymorphic_get_341:
        subq    $8, %rsp
        movzbq  -8(%rax), %rdi          ; Fetch tag from array header word
        cmpq    $254, %rdi              ; Is it a double array?
        je      .L109
        movq    -4(%rax,%rbx,4), %rax   ; (*)
        addq    $8, %rsp
        ret
.L109:
        subq    $16, %r15               ; Allocate 16 bytes for double block
        cmpq    (%r14), %r15            ; Check whether a GC call is needed
        jb      .L111
.L113:
        leaq    8(%r15), %rdi
        movq    $1277, -8(%rdi)         ; Write double block header word
        movsd   -4(%rax,%rbx,4), %xmm0  ; Read from array
        movsd   %xmm0, (%rdi)           ; Write double block value
        movq    %rdi, %rax
        addq    $8, %rsp
        ret
.L111:
        call    caml_call_gc@PLT
.L112:
        jmp     .L113
camlExample__polymorphic_set_345:
        subq    $8, %rsp
        movq    %rdi, %rsi
        movzbq  -8(%rax), %rdi          ; Fetch tag from array header word
        cmpq    $254, %rdi              ; Is it a double array?
        je      .L115
        leaq    -4(%rax,%rbx,4), %rdi
        movq    %rsp, %rbp
        movq    56(%r14), %rsp
        call    caml_modify@PLT         ; (*)
        movq    %rbp, %rsp
        jmp     .L114
.L115:
        movsd   (%rsi), %xmm0           ; Read double block value
        movsd   %xmm0, -4(%rax,%rbx,4)  ; Write double array element
.L114:
        movl    $1, %eax
        addq    $8, %rsp
        ret
That is quite an explosion from the best case of a single instruction for both a
get and a set. The possibility of dealing with a float array significantly
increases the amount of code generated for and slows down array accesses. I
refer to this effect as float array pessimization — the compiler
pessimistically generates expensive code.
⚠️ There is one thing that I'd like to point out at this point. Notice the two
lines marked with (*). When the compiler does not know the representation of
the elements, a single instruction is sufficient for the get and a call to
caml_modify is sufficient for the set of the non-float array case. The
caml_modify function can handle both immediate and pointer like values,
including float values, i.e. double blocks, but it does not handle
float arrays as targets.
Summary

The OCaml runtime uses a uniform representation values, where a value is either
an immediate integer value or a pointer to a block, which can, for our purposes,
be a constructor block, a double block, or a double array block. When using
Obj.magic it is important to distinguish between these representations.
Avoiding float array pessimization can both significantly reduce code size and
improve performance.
Using Obj.magic

In the following subsections I will go through a few examples or cases where
Obj.magic can be used to avoid some inefficiency.
A unique zero value

Consider the following data structure:
module type Mvar = sig
  type 'a t
  val create : 'a option -> 'a t
  val take_opt : 'a t -> 'a option
  val try_give : 'a t -> 'a -> bool
end

module Mvar_opt : Mvar = struct
  type 'a t = 'a option ref

  let create = ref

  let[@poll error][@inline never] take_opt t =
    match !t with
    | None -> None
    | Some _ as some ->
      t := None;
      some

  let[@poll error][@inline never] try_give t opt =
    !t == None && begin
      t := opt;
      true
    end

  let try_give t v =
    (* A [@poll error] function must not allocate, which is why we do the
       allocation outside of the critical section. *)
    try_give t (Some v)
end
The above could be the beginnings of a wait-free
systhread-safe
implementation of a so called
M-var.
A performance issue with the above implementation is that a try_give operation
requires an allocation and that there is an extra indirection due to the use of
an option.
If there was a way to have a zero value that is distinct from all other values
then we wouldn't need the option.
Well, there is a way. Recall that the OCaml runtime uses a uniform
representation of values. Each value is either an immediate integer value or a
pointer to a block. A pointer to a unique block is distinct from all other
values. Allocating a fresh mutable block, such as a ref cell, gives us a
unique value. We can then use Obj.magic to cast that unique value to any type.
We then need to make sure that we don't let the zero value escape outside of
the M-var implementation:
module Mvar_zero : Mvar = struct
  type 'a t = 'a ref

  let zero =
    (* [zero] is a unique value distinct from all other OCaml values. *)
    Obj.magic (ref 101)

  let create opt =
    (* The [match] below is in an inlinable function and there is hope that an
       optimizing compiler for OCaml could eliminate the implied allocation of
       [Some value] in some cases. *)
    ref (match opt with None -> zero | Some value -> value)

  let[@poll error][@inline never] take_zero t =
    let value_or_zero = !t in
    (* We compare using [!=] or [==]. *)
    if value_or_zero != zero then
      t := zero;
    value_or_zero

  let take_opt t =
    let value_or_zero = take_zero t in
    if value_or_zero == zero then
      None
    else
      (* The allocation below is in an inlinable function and there is hope that
         an optimizing compiler for OCaml could eliminate this allocation. *)
      Some value_or_zero

  let[@poll error][@inline never] try_give t value =
    !t == zero && begin
      t := value;
      true
    end
end
Let's give it a spin:
# let mv = Mvar_zero.create (Some 4.2)
val mv : float Mvar_zero.t = <abstr>

# Mvar_zero.take_opt mv
- : float option = Some 4.2

# Mvar_zero.take_opt mv
- : float option = None

# Mvar_zero.try_give mv 1.01
- : bool = true

# Mvar_zero.take_opt mv
- : float option = Some 1.01
Is this safe?
A ref cell is allocated as a constructor block, because the representation of
the value it holds is unknown. In other words, a ref cell is not subject to
the float array pessimization. Because the representation of the element stored
by M-var is unknown to the compiler, the compiler will generate calls to
caml_modify to update the value and that is safe for both immediate values and
pointer values. When used with a float value, the float value will be
treated as a uniformly represented value like any other pointer value.
Here is
a Compiler Explorer link
for this example.
Avoiding space leaks

Consider the following node based and wait-free systhread-safe queue using a
GADT to avoid some unnecessary match cases:
module type Queue = sig
  type 'a t
  val create : unit -> 'a t
  val enqueue : 'a t -> 'a -> unit
  val dequeue_opt : 'a t -> 'a option
end

module Queue_opt : Queue = struct
  type ('a, _) node =
    | Nil : ('a, [> `Nil]) node
    | Next : {
        mutable next: ('a, [`Nil | `Next]) node;
        mutable value: 'a option; (* 1 *)
      } -> ('a, [> `Next]) node

  type 'a t = {
      mutable head: ('a, [`Next]) node;
      mutable tail: ('a, [`Next]) node;
    }

  let create () =
    let dummy = Next {
        next = Nil;
        value = None; (* 2 *)
      }
    in
    { head = dummy; tail = dummy }

  let[@poll error][@inline never]
    enqueue t (Next new_tail_r as new_tail : (_, [< `Next]) node) =
    let Next old_tail_r = t.tail in
    t.tail <- new_tail;
    old_tail_r.next <- new_tail

  let enqueue t v =
    let (Next _ as new_tail : (_, [< `Next]) node) = Next {
        next = Nil;
        value = Some v (* 3 *)
      }
    in
    enqueue t new_tail

  let[@poll error][@inline never] dequeue_opt t =
    let Next old_head_r = t.head in
    match old_head_r.next with
    | Nil -> None
    | Next new_head_r as new_head ->
      let result = new_head_r.value in
      new_head_r.value <- None; (* 4 *)
      t.head <- new_head;
      result
end
Notice the lines marked with (* ? *). The queue makes it so that both head
and tail always point to a non-Nil node. In other words, the queue uses a so
called sentinel node. The value of the node pointed to by the head is not
logically part of the queue and the next field of the node pointed to by the
tail is Nil. When the queue is empty, head and tail point to the same
node.
When we remove a value from the queue, we need to make sure that the value is
not retained by the queue. This is the reason why we wrap the value as an
option (1), which requires allocating extra space (3), and populate (2) or
overwrite (3) the value field with None when the node is empty.
We could use the same trick as we used in the previous section to create a
unique zero value to avoid allocating extra space for the option and avoid
the associated indirection, but in this case we can do even better. We can
simply use Obj.magic () as the "no leak" value, because we don't need to be
able to tell it apart from valid values:
module Queue_unit : Queue = struct
  type ('a, _) node =
    | Nil : ('a, [> `Nil]) node
    | Next : {
        mutable next: ('a, [`Nil | `Next]) node;
        mutable value: 'a; (* 1 *)
      } -> ('a, [> `Next]) node

  type 'a t = {
      mutable head: ('a, [`Next]) node;
      mutable tail: ('a, [`Next]) node;
    }

  let create () =
    let dummy = Next {
        next = Nil;
        value = Obj.magic (); (* 2 *)
      }
    in
    { head = dummy; tail = dummy }

  let[@poll error][@inline never]
    enqueue t (Next new_tail_r as new_tail : (_, [< `Next]) node) =
    let Next old_tail_r = t.tail in
    t.tail <- new_tail;
    old_tail_r.next <- new_tail

  let enqueue t v =
    let (Next _ as new_tail : (_, [< `Next]) node) = Next {
        next = Nil;
        value = v (* 3 *)
      }
    in
    enqueue t new_tail

  let[@poll error][@inline never] dequeue_node t =
    let Next old_head_r = t.head in
    match old_head_r.next with
    | Nil -> Nil
    | Next _ as new_head ->
      t.head <- new_head;
      new_head

  let dequeue_opt t =
    match dequeue_node t with
    | Nil -> None
    | Next new_head_r ->
      let value = new_head_r.value in
      new_head_r.value <- Obj.magic (); (* 4 *)
      (* The allocation below is in an inlinable function and there is hope that
         an optimizing compiler for OCaml could eliminate the allocation. *)
      Some value
end
Let's give it a spin:
# let q : float Queue_unit.t = Queue_unit.create ()
val q : float Queue_unit.t = <abstr>

# Queue_unit.dequeue_opt q
- : float option = None

# Queue_unit.enqueue q 1.01
- : unit = ()

# Queue_unit.enqueue q 4.2
- : unit = ()

# Queue_unit.dequeue_opt q
- : float option = Some 1.01

# Queue_unit.dequeue_opt q
- : float option = Some 4.2

# Queue_unit.dequeue_opt q
- : float option = None
Is this safe?
A Next _ node will be allocated as a constructor block, because the next
field is not a float. The representation of 'a is also unknown to the
compiler. The compiler will treat the value field like elements of a non-float
array and Obj.magic (), which is an immediate value, can safely be stored in
such a field.
Here is
a Compiler Explorer link
for this example.
Avoiding float array pessimization

Arrays are a very common building block of high-performance data structures.
Consider the following array based wait-free systhread-safe bounded queue
implementation:
module type Bounded_queue = sig
  type 'a t
  val create : min_capacity:int -> 'a t
  val try_enqueue : 'a t -> 'a -> bool
  val dequeue_opt : 'a t -> 'a option
end

let ceil_pow2 x =
  if x <= 0 then 1
  else
    let x = x - 1 in
    let x = x lor (x lsr 1) in
    let x = x lor (x lsr 2) in
    let x = x lor (x lsr 4) in
    let x = x lor (x lsr 8) in
    let x = x lor (x lsr 16) in
    let x = if Sys.int_size <= 32 then x else x lor (x lsr 32) in
    x + 1

module Bounded_queue_opt : Bounded_queue = struct
  type 'a t = {
      mutable head : int;
      mutable tail : int;
      array : 'a option array; (* 1 *)
    }

  let create ~min_capacity =
    let capacity = ceil_pow2 min_capacity in {
      head = 0;
      tail = 0;
      array = Array.make capacity None; (* 2 *)
    }

  let[@poll error][@inline never] try_enqueue t opt =
    let capacity = Array.length t.array in
    t.tail - t.head < capacity && begin
      Array.unsafe_set t.array (t.tail land (capacity - 1)) opt;
      t.tail <- t.tail + 1;
      true
    end

  let try_enqueue t v = try_enqueue t (Some v) (* 3 *)

  let[@poll error][@inline never] dequeue_opt t =
    let size = t.tail - t.head in
    if size <= 0 then
      None
    else
      let capacity = Array.length t.array in
      let index = t.head land (capacity - 1) in
      let result = Array.unsafe_get t.array index in
      Array.unsafe_set t.array index None; (* 4 *)
      t.head <- t.head + 1;
      result
end
Notice the lines with comments. Once again, to avoid space leaks, we wrap values
as options (1, 3) and populate (2) or overwrite (4) with None to avoid
leaks.
To avoid the extra allocations and indirections one could use an array of plain
values 'a array and use Obj.magic () as the elements. Unfortunately, doing
so would trigger the float array pessimization. Aside from the performance cost,
it would also make it somewhat more tricky to implement a wait-free
systhread-safe version of the queue, because getting an element from the array
would potentially require an allocation.
We can avoid the float array pessimization by making OCaml think it is accessing
an array of non_float elements and we can also use the trick of creating a
zero value to avoid allocations:
module Bounded_queue_nfa : Bounded_queue = struct
  type non_float = [ `Non_float of non_float ]

  type 'a t = {
      mutable head : int;
      mutable tail : int;
      array : non_float array;
    }

  let create ~min_capacity =
    let capacity = ceil_pow2 min_capacity in {
      head = 0;
      tail = 0;
      array = Array.make capacity (Obj.magic ()) (* 1 *)
    }

  let[@poll error][@inline never] try_enqueue t v =
    let capacity = Array.length t.array in
    t.tail - t.head < capacity && begin
      Array.unsafe_set t.array (t.tail land (capacity - 1)) (Obj.magic v); (* 2 *)
      t.tail <- t.tail + 1;
      true
    end

  let zero = Obj.magic (ref 42) (* 3 *)

  let[@poll error][@inline never] dequeue_zero t =
    let size = t.tail - t.head in
    if size <= 0 then
      zero (* 4 *)
    else
      let capacity = Array.length t.array in
      let index = t.head land (capacity - 1) in
      let result = Array.unsafe_get t.array index in (* 5 *)
      Array.unsafe_set t.array index (Obj.magic ()); (* 6 *)
      t.head <- t.head + 1;
      result

  let dequeue_opt t =
    let value_or_zero = dequeue_zero t in
    if value_or_zero == zero then (* 7 *)
      None
    else
      Some (Obj.magic value_or_zero) (* 8 *)
end
Let's give it a spin:
# let bq : float Bounded_queue_nfa.t = Bounded_queue_nfa.create ~min_capacity:2
val bq : float Bounded_queue_nfa.t = <abstr>

# Bounded_queue_nfa.dequeue_opt bq
- : float option = None

# Bounded_queue_nfa.try_enqueue bq 1.01
- : bool = true

# Bounded_queue_nfa.try_enqueue bq 4.2
- : bool = true

# Bounded_queue_nfa.try_enqueue bq 7.6
- : bool = false

# Bounded_queue_nfa.dequeue_opt bq
- : float option = Some 1.01

# Bounded_queue_nfa.try_enqueue bq 7.6
- : bool = true

# Bounded_queue_nfa.dequeue_opt bq
- : float option = Some 4.2

# Bounded_queue_nfa.dequeue_opt bq
- : float option = Some 7.6

# Bounded_queue_nfa.dequeue_opt bq
- : float option = None
Is this safe?
To access an element of a non_float array, OCaml will generate code that works
with both immediate and pointer values, including float values, i.e. double
blocks, but will not with a float array target. OCaml decides the array
representation when allocating the array (1) based on the initial value. We
populate the array with Obj.magic () values. This makes OCaml allocate the
array as a constructor block and not as a double array block. The reads (5) and
writes (2, 6) are therefore safe as is the cast (8) to the result type after a
successful dequeue. The use of a unique value (3) used to indicate an empty
queue (4, 7) is also safe.
⚠️ But, wait, there is a gotcha! There is now a discrepancy between the type of
the array and the type of elements we actually store into the array. The basic
OCaml compilers do not notice the problem. However, in some cases the
Flambda compiler does notice such
discrepancies. In this particular case, we used the [@poll error] and
[@inline never] annotations on some of the functions, because we want those
functions to not contain any poll points. If we remove those annotations Flambda
will complain:
>> Fatal error: Assignment of a float to a specialised non-float array:
[...]

In this case, disallowing inlining prevents Flambda from exposing the sleight of
hand.
Here is
a Compiler Explorer
link for this example.
What about abstract types?

In the previous example, we used the non_float type to trick OCaml to generate
the array accesses we wanted. This trick was used, because we essentially wanted
to use an array of any given type. If the array elements need not be of
arbitrary type, but are rather known to be of some type whose representation is
known to the compiler, then the non_float trick may be unnecessary.
Unfortunately, there is a catch. The OCaml compiler does not propagate
representation information through abstract types.
As an example, at the time of writing this, Atomic.t is an abstract type
constructor. This means that, to access an array of atomics, _ Atomic.t array,
OCaml will invoke the float array pessimization. Funnily enough, it does that
even though it is smart enough to specialize Atomic.get to a single
instruction on the x86.
Consider the following example:
let atomic_array_get xs i =
  let x = Array.unsafe_get xs i in (* 1 *)
  Atomic.get x (* 2 *)
As one can confirm with
Compiler Explorer,
the compiler is smart enough to generate a single instruction corresponding to
the Atomic.get:
camlExample__atomic_array_get_268:
        subq    $8, %rsp
        movzbq  -8(%rax), %rdi
        cmpq    $254, %rdi               ; Is it a double array?
        je      .L101
        movq    -4(%rax,%rbx,4), %rdi
        jmp     .L100
.L101:
        subq    $16, %r15
        cmpq    (%r14), %r15
        jb      .L103
.L105:
        leaq    8(%r15), %rdi
        movq    $1277, -8(%rdi)
        movsd   -4(%rax,%rbx,4), %xmm0
        movsd   %xmm0, (%rdi)
.L100:
        movq    (%rdi), %rax             ; Atomic.get
        addq    $8, %rsp
        ret
.L103:
        call    caml_call_gc@PLT
.L104:
        jmp     .L105
However, at the same time, the compiler generates code for a potential float
array.

Indeed.
Avoiding Atomic.t allocation and indirection

OCaml recently acquired multicore support and part of that support is the
Atomic module and the ref-like Atomic.t type constructor. Consider the
following linked list based lock-free set implementation using physical
equality:
module type Physical_set = sig
  type 'a t
  val create : unit -> 'a t
  val try_add : 'a t -> 'a -> bool
  val try_remove : 'a t -> 'a -> bool
end

module Physical_set_external_atomic : Physical_set = struct
  type ('a, _) node =
    | Null : ('a, [> `Null]) node
    | Node : {
        next: 'a link Atomic.t; (* 1 *)
        value: 'a
      } -> ('a, [> `Node]) node
    | Mark : ('a, [< `Null | `Node]) node -> ('a, [> `Mark]) node

  and 'a link =
    | Link : ('a, [< `Null | `Node | `Mark]) node -> 'a link [@@unboxed]

  type 'a t = 'a link Atomic.t

  let create () = Atomic.make (Link Null)

  let rec find_node t prev value : _ -> (_, [< `Null | `Node ]) node = function
    | Link (Mark _) -> find_node t t value (Atomic.get t)
    | Link Null -> Null
    | Link (Node r as node) as before ->
        match Atomic.get r.next with
        | Link (Mark node) ->
            if Atomic.compare_and_set prev before (Link node) then
              find_node t prev value (Link node)
            else
              find_node t prev value (Atomic.get prev)
        | Link (Null | Node _) as next ->
            if r.value == value then
              node
            else
              find_node t r.next value next

  let rec try_add t value =
    let before = Atomic.get t in
    match find_node t t value before with
    | Node _ -> false
    | Null ->
        let after =
          Node {
            next = Atomic.make before; (* 2 *)
            value
          }
        in
        Atomic.compare_and_set t before (Link after)
        || try_add t value

  let rec try_remove t value =
    match find_node t t value (Atomic.get t) with
    | Null -> false
    | Node r ->
        match Atomic.get r.next with
        | Link (Mark _) ->
            false
        | Link ((Null | Node _) as next) as before ->
            let after = Mark next in
            if Atomic.compare_and_set r.next before (Link after)
            then begin
              find_node t t value (Atomic.get t) |> ignore;
              true
            end
            else try_remove t value
end
Notice the lines marked with (* ? *). The links within the list go through
Atomic.t blocks. Internally a _ Atomic.t has essentially the same
representation as a _ ref. Unfortunately, there is nothing like the mutable
specifier for record fields to allow one to "inline" the atomic to the record
and avoid the indirection. The use of Atomic.t effectively doubles the number
of indirections in the list. Doubling the indirections is effectively the same
as making the list twice as long and
can actually double the amount of time operations on the data structure take.
We can, however, use Obj.magic to trick the OCaml compiler to treat the first
field of a record as an Atomic.t:
module Physical_set_inlined : Physical_set = struct
  type ('a, _) node =
    | Null : ('a, [> `Null]) node
    | Node : {
        mutable next: 'a link; (* 1 *)
        value: 'a
      } -> ('a, [> `Node]) node
    | Mark : ('a, [< `Null | `Node]) node -> ('a, [> `Mark]) node

  and 'a link =
    | Link : ('a, [< `Null | `Node | `Mark]) node -> 'a link [@@unboxed]

  type 'a t = 'a link Atomic.t

  let create () = Atomic.make (Link Null)

  let next_as_atomic (Node _ as node : ('a, [< `Node]) node) =
    (Obj.magic node : 'a link Atomic.t) (* 2 *)

  let rec find_node t prev value : _ -> (_, [< `Null | `Node ]) node = function
    | Link (Mark _) -> find_node t t value (Atomic.get t)
    | Link Null -> Null
    | Link (Node r as node) as before ->
        match Atomic.get (next_as_atomic node) with
        | Link (Mark node) ->
            if Atomic.compare_and_set prev before (Link node) then
              find_node t prev value (Link node)
            else
              find_node t prev value (Atomic.get prev)
        | Link (Null | Node _) as next ->
            if r.value == value then
              node
            else
              find_node t (next_as_atomic node) value next

  let rec try_add t value =
    let before = Atomic.get t in
    match find_node t t value before with
    | Node _ -> false
    | Null ->
        let after =
          Node {
            next = before; (* 3 *)
            value
          }
        in
        Atomic.compare_and_set t before (Link after)
        || try_add t value

  let rec try_remove t value =
    match find_node t t value (Atomic.get t) with
    | Null -> false
    | Node _ as node ->
        match Atomic.get (next_as_atomic node) with
        | Link (Mark _) ->
            false
        | Link ((Null | Node _) as next) as before ->
            let after = Mark next in
            if Atomic.compare_and_set (next_as_atomic node) before (Link after)
            then begin
              find_node t t value (Atomic.get t) |> ignore;
              true
            end
            else try_remove t value
end
Let's give it a spin:
# let s : int Physical_set_inlined.t = Physical_set_inlined.create ()
val s : int Physical_set_inlined.t = <abstr>

# Physical_set_inlined.try_add s 101
- : bool = true

# Physical_set_inlined.try_add s 42
- : bool = true

# Physical_set_inlined.try_remove s 101
- : bool = true

# Physical_set_inlined.try_add s 42
- : bool = false

# Physical_set_inlined.try_remove s 42
- : bool = true

# Physical_set_inlined.try_remove s 101
- : bool = false
Is this safe?
An 'a Atomic.t has the same representation as a 'a ref, which is a record
with a single mutable field. No special optimizations are made for floats
nor for any other type of values and a Node _ record is also allocated as a
constructor block rather than as a float array. OCaml compilers do not perform
record layout optimizations, which means that the layouts of 'a Atomic.t and
of a record whose first element is of type 'a are compatible and the accesses
are safe.
Here is
a Compiler Explorer link
for this example.
Well...

There is technically no guarantee of any kind when using Obj.magic. The reason
for that is that, in the future, the OCaml runtime might be changed in some way,
or a new compiler for OCaml might emerge, that makes the reasoning based on the
current implementation(s) and uniform representation of values invalid.
Does that make me worried?
No, it does not. Primarily for two reasons:


Obj.magic is fairly commonly used. Try the
Sherlocode tool, for example, and don't
forget to try clicking for more. This is a roundabout way to say that changes
to the representation of values are likely to break a lot of existing code.
While that does not entirely prevent changes from being made, it is unlikely
that such changes would be just suddenly introduced. If significant changes
are made, it will take a long time.

Perhaps, or rather, in our hopes and dreams, one of the most likely future
changes is dropping the float array pessimization. That change will not break
any of the examples in this note, but would make the uses of Obj.magic to
avoid the pessimization redundant. In all likelyhood, it will take many years
before that happens.


None of the uses of Obj.magic are apparent from the signatures of the data
structures. Part of the reason why the motivating examples make no use of
Obj.magic is to show that it can be done. What this means is that, should
some use of Obj.magic become broken or unsafe, one could always adjust the
data structure implementation and release a new version of it.


In other words, it should be easy to fix unlikely future issues and there should
be plenty of time to fix them.
Do I recommend using Obj.magic?
No, I don't really recommend it. All other things being equal, I'd rather not
have the Obj module at all and would rather have an optimizing compiler that
would not suffer from the float array pessimization and would be able to perform
data representation optimizations. Unfortunately that is not where we are now
and there are cases where Obj.magic can significantly reduce code size and
improve performance.

  
## dune
(mdx
 (files article.md))

## dune-project
(lang dune 3.8)

(using mdx 0.4)