Let me first quote a paragraph from the Memory Representation of Values (with my emphasis):
Obj is an undocumented module that exposes the internals of the OCaml compiler and runtime. It is very useful for examining and understanding how your code will behave at runtime but should never be used for production code unless you understand the implications. The module bypasses the OCaml type system, making memory corruption and segmentation faults possible.
If you ask about Obj.magic
on public OCaml forums, you'll likely get a warning
of some sort. Here is mine: If you found this note, because you are just
learning OCaml and have no idea of what you are doing, you should move on. Avoid
Obj.magic
at all cost.
If, on the other hand, you are actually implementing a supposedly high-performance data structure and care about the cost, and you want to understand the implications, then this note might be for you. Knowledge and understanding are far more useful than fear of the unknown. Understanding the memory representation of values can significantly help one when trying to improve the performance of data structures in OCaml.
In this note I'm going to first talk a bit about the
memory representation of values
and
about unboxed float arrays.
This is not meant to be a comprehensive treatment on the subject, however. I
only want to point out a number of specific features. Then I'll go through a
number of inefficiencies one tends to encounter when trying to implement
high-performance data structures and I'll briefly show how one might use
Obj.magic
to avoid them.
As explained in numerous other documents, the OCaml runtime uses a uniform memory representation of values. What this means is that an OCaml value is stored as a single, 32-bit or 64-bit, word and is either an immediate integer or a pointer. A bit in each value indicates whether the value is an immediate integer or a pointer. If the value is a pointer, it will point to a block of memory that begins with a control word containing information on the block. The information includes, among other things, the size of the block and a "tag" that tells what kind of block it is.
In the rest of this note we are mostly interested in the representation of these blocks and, in particular, we need to distinguish between a few different kinds of blocks:
- constructor blocks, having one of the
constructor_tag
s, - double blocks, having the
double_tag
, and - double array blocks, having the
double_array_tag
.
A constructor block contains entries stored as 32-bit or 64-bit words, just like any other OCaml values, a double block contains a single double precision floating point number stored as a 64-bit entry, taking either two 32-bit words or one 64-bit word, and a double array block contains double precision floating point numbers stored as 64-bit entries.
Entries in those different types of blocks need to be accessed in different
ways. Using Obj.magic
we can trick OCaml into performing accesses on values
that it would not otherwise allow. If we do that carelessly, it is possible to
trick OCaml into performing invalid accesses, that, if we are lucky, fail at
runtime and crash the program.
Let's create a little helper named tag_name_of
that tells us the tag of an
OCaml value:
let tag_names = [|
Obj.forcing_tag, "forcing_tag";
Obj.cont_tag, "cont_tag";
Obj.lazy_tag, "lazy_tag";
Obj.closure_tag, "closure_tag";
Obj.object_tag, "object_tag";
Obj.infix_tag, "infix_tag";
Obj.forward_tag, "forward_tag";
Obj.no_scan_tag, "no_scan_tag";
Obj.abstract_tag, "abstract_tag";
Obj.string_tag, "string_tag";
Obj.double_tag, "double_tag";
Obj.double_array_tag, "double_array_tag";
Obj.custom_tag, "custom_tag";
Obj.int_tag, "int_tag";
Obj.out_of_heap_tag, "out_of_heap_tag";
Obj.unaligned_tag, "unaligned_tag";
|]
let tag_name_of x =
let tag = Obj.tag (Obj.repr x) in
if Obj.first_non_constant_constructor_tag <= tag &&
tag <= Obj.last_non_constant_constructor_tag then
Printf.sprintf "constructor_tag %d" tag
else
match Array.find_opt (fun (tag', _) -> tag == tag') tag_names with
| Some (_, name) -> name
| None -> Printf.sprintf "unknown tag %d" tag
Let's then examine some OCaml values. An int
value is stored as an immediate
integer value as are nullary or constant constructors:
# tag_name_of 101
- : string = "int_tag"
# tag_name_of None
- : string = "int_tag"
# tag_name_of `Nullary
- : string = "int_tag"
As one might expect a float
is stored as block with the double_tag
:
# tag_name_of 1.01
- : string = "double_tag"
Pretty much everything else, i.e. non-constant constructors, most records and tuples, ref-cells (which are actually records), and most arrays, are stored as constructor blocks:
# tag_name_of (Ok "value")
- : string = "constructor_tag 0"
# tag_name_of (Error "value")
- : string = "constructor_tag 1"
# tag_name_of (4.2, 1.01)
- : string = "constructor_tag 0"
# tag_name_of (ref 1.01)
- : string = "constructor_tag 0"
# tag_name_of (Atomic.make 1.01)
- : string = "constructor_tag 0"
# tag_name_of [|101; 42|]
- : string = "constructor_tag 0"
The notable exceptions are records that only contain float
fields and
float array
s:
# let open struct type vec2 = { x: float; y: float } end in
tag_name_of { x = 4.2; y = 1.01 }
- : string = "double_array_tag"
# tag_name_of (Array.make 101 4.1)
- : string = "double_array_tag"
As explained in the post
About unboxed float arrays,
the special treatment of float array
s has both pros and cons. In this note we
are mostly interested in the cons, how to deal with the possibility of
float array
s, and how to avoid the costs when they are unnecessary.
Let's examine the code that OCaml generates for array accesses using the Compiler Explorer. OCaml generates different code for an array access depending on what it knows about the type of the array elements.
For an array of immediate elements:
type immediate [@@immediate]
let immediate_get (xs: immediate array) i =
Array.unsafe_get xs i
let immediate_set (xs: immediate array) i v =
Array.unsafe_set xs i v
OCaml generates what one might expect:
camlExample__immediate_get_269:
movq -4(%rax,%rbx,4), %rax
ret
camlExample__immediate_set_317:
movq %rdi, -4(%rax,%rbx,4)
movl $1, %eax
ret
This is the most efficient case — essentially a single instruction for an array access.
Things get more interesting for an array of non-float elements:
type non_float = [ `Non_float of non_float ]
let non_float_get (xs: non_float array) i =
Array.unsafe_get xs i
let non_float_set (xs: non_float array) i v =
Array.unsafe_set xs i v
OCaml has no way to say to the compiler that an abstract type is not a float
.
So, above we define a concrete type non_float
that the compiler can analyze
and see that it is not an immediate
type and also that it isn't a float
. The
compiler generates array accesses that assume that the values may be pointers:
camlExample__non_float_get_323:
movq -4(%rax,%rbx,4), %rax
ret
camlExample__non_float_set_327:
subq $8, %rsp
movq %rdi, %rsi
leaq -4(%rax,%rbx,4), %rdi
movq %rsp, %rbp
movq 56(%r14), %rsp
call caml_modify@PLT
movq %rbp, %rsp
movl $1, %eax
addq $8, %rsp
ret
A get
is still optimal, but now a set
calls caml_modify
. The caml_modify
function, which is part of the OCaml runtime, will perform the necessary GC
write barrier and fences mandated by
the OCaml memory model.
Things get perhaps even more interesting for an array of float
elements:
let float_get (xs: float array) i =
Array.unsafe_get xs i
let float_set (xs: float array) i v =
Array.unsafe_set xs i v
While a set
is still fairly efficient, a get
actually allocates memory:
camlExample__float_get_332:
subq $8, %rsp
subq $16, %r15 ; Allocate 16 bytes for double block
cmpq (%r14), %r15 ; Check whether a GC call is needed
jb .L105
.L107:
leaq 8(%r15), %rdi
movq $1277, -8(%rdi) ; Write double block header word
movsd -4(%rax,%rbx,4), %xmm0 ; Read from array
movsd %xmm0, (%rdi) ; Write double block value
movq %rdi, %rax
addq $8, %rsp
ret
.L105:
call caml_call_gc@PLT
.L106:
jmp .L107
camlExample__float_set_336:
movsd (%rdi), %xmm0 ; Read double block value (*)
movsd %xmm0, -4(%rax,%rbx,4) ; Write double array element
movl $1, %eax
ret
float_set
code. A
float
value is generally represented as a double block. So, before writing the
value to the double array block, the value is read from the double block. Now,
consider the following function:
let will_this_crash_or_not xs =
float_set xs 0 (Obj.magic ())
If you call will_this_crash_or_not [| 1.01 |]
the program will most likely
crash. Notice the instruction marked with a (*)
. It attempts to read the value
from a double block using the value in register rdi
as a pointer.
Obj.magic ()
is an immediate value. When that immediate value is interpreted
as a pointer, it will most likely point to an address in memory that the program
is not allowed to access and so the call will most likely crash.
Based on what we saw previously, one might already be able to guess what happens when one accesses arrays of unknown (abstract or polymorphic) type:
let polymorphic_get xs i =
Array.unsafe_get xs i
let polymorphic_set xs i v =
Array.unsafe_set xs i v
The compiler has no choice but to generate code that can handle any one of the previous cases:
camlExample__polymorphic_get_341:
subq $8, %rsp
movzbq -8(%rax), %rdi ; Fetch tag from array header word
cmpq $254, %rdi ; Is it a double array?
je .L109
movq -4(%rax,%rbx,4), %rax ; (*)
addq $8, %rsp
ret
.L109:
subq $16, %r15 ; Allocate 16 bytes for double block
cmpq (%r14), %r15 ; Check whether a GC call is needed
jb .L111
.L113:
leaq 8(%r15), %rdi
movq $1277, -8(%rdi) ; Write double block header word
movsd -4(%rax,%rbx,4), %xmm0 ; Read from array
movsd %xmm0, (%rdi) ; Write double block value
movq %rdi, %rax
addq $8, %rsp
ret
.L111:
call caml_call_gc@PLT
.L112:
jmp .L113
camlExample__polymorphic_set_345:
subq $8, %rsp
movq %rdi, %rsi
movzbq -8(%rax), %rdi ; Fetch tag from array header word
cmpq $254, %rdi ; Is it a double array?
je .L115
leaq -4(%rax,%rbx,4), %rdi
movq %rsp, %rbp
movq 56(%r14), %rsp
call caml_modify@PLT ; (*)
movq %rbp, %rsp
jmp .L114
.L115:
movsd (%rsi), %xmm0 ; Read double block value
movsd %xmm0, -4(%rax,%rbx,4) ; Write double array element
.L114:
movl $1, %eax
addq $8, %rsp
ret
That is quite an explosion from the best case of a single instruction for both a
get
and a set
. The possibility of dealing with a float array
significantly
increases the amount of code generated for and slows down array accesses. I
refer to this effect as float array pessimization — the compiler
pessimistically generates expensive code.
(*)
. When the compiler does not know the representation of
the elements, a single instruction is sufficient for the get
and a call to
caml_modify
is sufficient for the set
of the non-float array
case. The
caml_modify
function can handle both immediate and pointer like values,
including float
values, i.e. double blocks, but it does not handle
float array
s as targets.
The OCaml runtime uses a uniform representation values, where a value is either
an immediate integer value or a pointer to a block, which can, for our purposes,
be a constructor block, a double block, or a double array block. When using
Obj.magic
it is important to distinguish between these representations.
Avoiding float array pessimization can both significantly reduce code size and
improve performance.
In the following subsections I will go through a few examples or cases where
Obj.magic
can be used to avoid some inefficiency.
Consider the following data structure:
module type Mvar = sig
type 'a t
val create : 'a option -> 'a t
val take_opt : 'a t -> 'a option
val try_give : 'a t -> 'a -> bool
end
module Mvar_opt : Mvar = struct
type 'a t = 'a option ref
let create = ref
let[@poll error][@inline never] take_opt t =
match !t with
| None -> None
| Some _ as some ->
t := None;
some
let[@poll error][@inline never] try_give t opt =
!t == None && begin
t := opt;
true
end
let try_give t v =
(* A [@poll error] function must not allocate, which is why we do the
allocation outside of the critical section. *)
try_give t (Some v)
end
The above could be the beginnings of a wait-free systhread-safe implementation of a so called M-var.
A performance issue with the above implementation is that a try_give
operation
requires an allocation and that there is an extra indirection due to the use of
an option
.
If there was a way to have a zero
value that is distinct from all other values
then we wouldn't need the option
.
Well, there is a way. Recall that the OCaml runtime uses a uniform
representation of values. Each value is either an immediate integer value or a
pointer to a block. A pointer to a unique block is distinct from all other
values. Allocating a fresh mutable block, such as a ref
cell, gives us a
unique value. We can then use Obj.magic
to cast that unique value to any type.
We then need to make sure that we don't let the zero
value escape outside of
the M-var implementation:
module Mvar_zero : Mvar = struct
type 'a t = 'a ref
let zero =
(* [zero] is a unique value distinct from all other OCaml values. *)
Obj.magic (ref 101)
let create opt =
(* The [match] below is in an inlinable function and there is hope that an
optimizing compiler for OCaml could eliminate the implied allocation of
[Some value] in some cases. *)
ref (match opt with None -> zero | Some value -> value)
let[@poll error][@inline never] take_zero t =
let value_or_zero = !t in
(* We compare using [!=] or [==]. *)
if value_or_zero != zero then
t := zero;
value_or_zero
let take_opt t =
let value_or_zero = take_zero t in
if value_or_zero == zero then
None
else
(* The allocation below is in an inlinable function and there is hope that
an optimizing compiler for OCaml could eliminate this allocation. *)
Some value_or_zero
let[@poll error][@inline never] try_give t value =
!t == zero && begin
t := value;
true
end
end
Let's give it a spin:
# let mv = Mvar_zero.create (Some 4.2)
val mv : float Mvar_zero.t = <abstr>
# Mvar_zero.take_opt mv
- : float option = Some 4.2
# Mvar_zero.take_opt mv
- : float option = None
# Mvar_zero.try_give mv 1.01
- : bool = true
# Mvar_zero.take_opt mv
- : float option = Some 1.01
Is this safe?
A ref
cell is allocated as a constructor block, because the representation of
the value it holds is unknown. In other words, a ref
cell is not subject to
the float array pessimization. Because the representation of the element stored
by M-var is unknown to the compiler, the compiler will generate calls to
caml_modify
to update the value and that is safe for both immediate values and
pointer values. When used with a float
value, the float
value will be
treated as a uniformly represented value like any other pointer value.
Here is a Compiler Explorer link for this example.
Consider the following node based and wait-free systhread-safe queue using a GADT to avoid some unnecessary match cases:
module type Queue = sig
type 'a t
val create : unit -> 'a t
val enqueue : 'a t -> 'a -> unit
val dequeue_opt : 'a t -> 'a option
end
module Queue_opt : Queue = struct
type ('a, _) node =
| Nil : ('a, [> `Nil]) node
| Next : {
mutable next: ('a, [`Nil | `Next]) node;
mutable value: 'a option; (* 1 *)
} -> ('a, [> `Next]) node
type 'a t = {
mutable head: ('a, [`Next]) node;
mutable tail: ('a, [`Next]) node;
}
let create () =
let dummy = Next {
next = Nil;
value = None; (* 2 *)
}
in
{ head = dummy; tail = dummy }
let[@poll error][@inline never]
enqueue t (Next new_tail_r as new_tail : (_, [< `Next]) node) =
let Next old_tail_r = t.tail in
t.tail <- new_tail;
old_tail_r.next <- new_tail
let enqueue t v =
let (Next _ as new_tail : (_, [< `Next]) node) = Next {
next = Nil;
value = Some v (* 3 *)
}
in
enqueue t new_tail
let[@poll error][@inline never] dequeue_opt t =
let Next old_head_r = t.head in
match old_head_r.next with
| Nil -> None
| Next new_head_r as new_head ->
let result = new_head_r.value in
new_head_r.value <- None; (* 4 *)
t.head <- new_head;
result
end
Notice the lines marked with (* ? *)
. The queue makes it so that both head
and tail
always point to a non-Nil
node. In other words, the queue uses a so
called sentinel node. The value
of the node pointed to by the head
is not
logically part of the queue and the next
field of the node pointed to by the
tail
is Nil
. When the queue is empty, head
and tail
point to the same
node.
When we remove a value from the queue, we need to make sure that the value is
not retained by the queue. This is the reason why we wrap the value as an
option
(1), which requires allocating extra space (3), and populate (2) or
overwrite (3) the value
field with None
when the node is empty.
We could use the same trick as we used in the previous section to create a
unique zero
value to avoid allocating extra space for the option
and avoid
the associated indirection, but in this case we can do even better. We can
simply use Obj.magic ()
as the "no leak" value, because we don't need to be
able to tell it apart from valid values:
module Queue_unit : Queue = struct
type ('a, _) node =
| Nil : ('a, [> `Nil]) node
| Next : {
mutable next: ('a, [`Nil | `Next]) node;
mutable value: 'a; (* 1 *)
} -> ('a, [> `Next]) node
type 'a t = {
mutable head: ('a, [`Next]) node;
mutable tail: ('a, [`Next]) node;
}
let create () =
let dummy = Next {
next = Nil;
value = Obj.magic (); (* 2 *)
}
in
{ head = dummy; tail = dummy }
let[@poll error][@inline never]
enqueue t (Next new_tail_r as new_tail : (_, [< `Next]) node) =
let Next old_tail_r = t.tail in
t.tail <- new_tail;
old_tail_r.next <- new_tail
let enqueue t v =
let (Next _ as new_tail : (_, [< `Next]) node) = Next {
next = Nil;
value = v (* 3 *)
}
in
enqueue t new_tail
let[@poll error][@inline never] dequeue_node t =
let Next old_head_r = t.head in
match old_head_r.next with
| Nil -> Nil
| Next _ as new_head ->
t.head <- new_head;
new_head
let dequeue_opt t =
match dequeue_node t with
| Nil -> None
| Next new_head_r ->
let value = new_head_r.value in
new_head_r.value <- Obj.magic (); (* 4 *)
(* The allocation below is in an inlinable function and there is hope that
an optimizing compiler for OCaml could eliminate the allocation. *)
Some value
end
Let's give it a spin:
# let q : float Queue_unit.t = Queue_unit.create ()
val q : float Queue_unit.t = <abstr>
# Queue_unit.dequeue_opt q
- : float option = None
# Queue_unit.enqueue q 1.01
- : unit = ()
# Queue_unit.enqueue q 4.2
- : unit = ()
# Queue_unit.dequeue_opt q
- : float option = Some 1.01
# Queue_unit.dequeue_opt q
- : float option = Some 4.2
# Queue_unit.dequeue_opt q
- : float option = None
Is this safe?
A Next _
node will be allocated as a constructor block, because the next
field is not a float
. The representation of 'a
is also unknown to the
compiler. The compiler will treat the value
field like elements of a non-float
array and Obj.magic ()
, which is an immediate value, can safely be stored in
such a field.
Here is a Compiler Explorer link for this example.
Arrays are a very common building block of high-performance data structures.
Consider the following array based wait-free systhread-safe bounded queue implementation:
module type Bounded_queue = sig
type 'a t
val create : min_capacity:int -> 'a t
val try_enqueue : 'a t -> 'a -> bool
val dequeue_opt : 'a t -> 'a option
end
let ceil_pow2 x =
if x <= 0 then 1
else
let x = x - 1 in
let x = x lor (x lsr 1) in
let x = x lor (x lsr 2) in
let x = x lor (x lsr 4) in
let x = x lor (x lsr 8) in
let x = x lor (x lsr 16) in
let x = if Sys.int_size <= 32 then x else x lor (x lsr 32) in
x + 1
module Bounded_queue_opt : Bounded_queue = struct
type 'a t = {
mutable head : int;
mutable tail : int;
array : 'a option array; (* 1 *)
}
let create ~min_capacity =
let capacity = ceil_pow2 min_capacity in {
head = 0;
tail = 0;
array = Array.make capacity None; (* 2 *)
}
let[@poll error][@inline never] try_enqueue t opt =
let capacity = Array.length t.array in
t.tail - t.head < capacity && begin
Array.unsafe_set t.array (t.tail land (capacity - 1)) opt;
t.tail <- t.tail + 1;
true
end
let try_enqueue t v = try_enqueue t (Some v) (* 3 *)
let[@poll error][@inline never] dequeue_opt t =
let size = t.tail - t.head in
if size <= 0 then
None
else
let capacity = Array.length t.array in
let index = t.head land (capacity - 1) in
let result = Array.unsafe_get t.array index in
Array.unsafe_set t.array index None; (* 4 *)
t.head <- t.head + 1;
result
end
Notice the lines with comments. Once again, to avoid space leaks, we wrap values
as option
s (1, 3) and populate (2) or overwrite (4) with None
to avoid
leaks.
To avoid the extra allocations and indirections one could use an array of plain
values 'a array
and use Obj.magic ()
as the elements. Unfortunately, doing
so would trigger the float array pessimization. Aside from the performance cost,
it would also make it somewhat more tricky to implement a wait-free
systhread-safe version of the queue, because getting an element from the array
would potentially require an allocation.
We can avoid the float array pessimization by making OCaml think it is accessing
an array of non_float
elements and we can also use the trick of creating a
zero
value to avoid allocations:
module Bounded_queue_nfa : Bounded_queue = struct
type non_float = [ `Non_float of non_float ]
type 'a t = {
mutable head : int;
mutable tail : int;
array : non_float array;
}
let create ~min_capacity =
let capacity = ceil_pow2 min_capacity in {
head = 0;
tail = 0;
array = Array.make capacity (Obj.magic ()) (* 1 *)
}
let[@poll error][@inline never] try_enqueue t v =
let capacity = Array.length t.array in
t.tail - t.head < capacity && begin
Array.unsafe_set t.array (t.tail land (capacity - 1)) (Obj.magic v); (* 2 *)
t.tail <- t.tail + 1;
true
end
let zero = Obj.magic (ref 42) (* 3 *)
let[@poll error][@inline never] dequeue_zero t =
let size = t.tail - t.head in
if size <= 0 then
zero (* 4 *)
else
let capacity = Array.length t.array in
let index = t.head land (capacity - 1) in
let result = Array.unsafe_get t.array index in (* 5 *)
Array.unsafe_set t.array index (Obj.magic ()); (* 6 *)
t.head <- t.head + 1;
result
let dequeue_opt t =
let value_or_zero = dequeue_zero t in
if value_or_zero == zero then (* 7 *)
None
else
Some (Obj.magic value_or_zero) (* 8 *)
end
Let's give it a spin:
# let bq : float Bounded_queue_nfa.t = Bounded_queue_nfa.create ~min_capacity:2
val bq : float Bounded_queue_nfa.t = <abstr>
# Bounded_queue_nfa.dequeue_opt bq
- : float option = None
# Bounded_queue_nfa.try_enqueue bq 1.01
- : bool = true
# Bounded_queue_nfa.try_enqueue bq 4.2
- : bool = true
# Bounded_queue_nfa.try_enqueue bq 7.6
- : bool = false
# Bounded_queue_nfa.dequeue_opt bq
- : float option = Some 1.01
# Bounded_queue_nfa.try_enqueue bq 7.6
- : bool = true
# Bounded_queue_nfa.dequeue_opt bq
- : float option = Some 4.2
# Bounded_queue_nfa.dequeue_opt bq
- : float option = Some 7.6
# Bounded_queue_nfa.dequeue_opt bq
- : float option = None
Is this safe?
To access an element of a non_float array
, OCaml will generate code that works
with both immediate and pointer values, including float
values, i.e. double
blocks, but will not with a float array
target. OCaml decides the array
representation when allocating the array (1) based on the initial value. We
populate the array with Obj.magic ()
values. This makes OCaml allocate the
array as a constructor block and not as a double array block. The reads (5) and
writes (2, 6) are therefore safe as is the cast (8) to the result type after a
successful dequeue. The use of a unique value (3) used to indicate an empty
queue (4, 7) is also safe.
[@poll error]
and
[@inline never]
annotations on some of the functions, because we want those
functions to not contain any poll points. If we remove those annotations Flambda
will complain:
>> Fatal error: Assignment of a float to a specialised non-float array:
[...]
In this case, disallowing inlining prevents Flambda from exposing the sleight of hand.
Here is a Compiler Explorer link for this example.
In the previous example, we used the non_float
type to trick OCaml to generate
the array accesses we wanted. This trick was used, because we essentially wanted
to use an array of any given type. If the array elements need not be of
arbitrary type, but are rather known to be of some type whose representation is
known to the compiler, then the non_float
trick may be unnecessary.
Unfortunately, there is a catch. The OCaml compiler does not propagate representation information through abstract types.
As an example, at the time of writing this, Atomic.t
is an abstract type
constructor. This means that, to access an array of atomics, _ Atomic.t array
,
OCaml will invoke the float array pessimization. Funnily enough, it does that
even though it is smart enough to specialize Atomic.get
to a single
instruction on the x86.
Consider the following example:
let atomic_array_get xs i =
let x = Array.unsafe_get xs i in (* 1 *)
Atomic.get x (* 2 *)
As one can confirm with
Compiler Explorer,
the compiler is smart enough to generate a single instruction corresponding to
the Atomic.get
:
camlExample__atomic_array_get_268:
subq $8, %rsp
movzbq -8(%rax), %rdi
cmpq $254, %rdi ; Is it a double array?
je .L101
movq -4(%rax,%rbx,4), %rdi
jmp .L100
.L101:
subq $16, %r15
cmpq (%r14), %r15
jb .L103
.L105:
leaq 8(%r15), %rdi
movq $1277, -8(%rdi)
movsd -4(%rax,%rbx,4), %xmm0
movsd %xmm0, (%rdi)
.L100:
movq (%rdi), %rax ; Atomic.get
addq $8, %rsp
ret
.L103:
call caml_call_gc@PLT
.L104:
jmp .L105
However, at the same time, the compiler generates code for a potential float array.
Indeed.
OCaml recently acquired multicore support and part of that support is the
Atomic
module and the ref-like Atomic.t
type constructor. Consider the
following linked list based lock-free set implementation using physical
equality:
module type Physical_set = sig
type 'a t
val create : unit -> 'a t
val try_add : 'a t -> 'a -> bool
val try_remove : 'a t -> 'a -> bool
end
module Physical_set_external_atomic : Physical_set = struct
type ('a, _) node =
| Null : ('a, [> `Null]) node
| Node : {
next: 'a link Atomic.t; (* 1 *)
value: 'a
} -> ('a, [> `Node]) node
| Mark : ('a, [< `Null | `Node]) node -> ('a, [> `Mark]) node
and 'a link =
| Link : ('a, [< `Null | `Node | `Mark]) node -> 'a link [@@unboxed]
type 'a t = 'a link Atomic.t
let create () = Atomic.make (Link Null)
let rec find_node t prev value : _ -> (_, [< `Null | `Node ]) node = function
| Link (Mark _) -> find_node t t value (Atomic.get t)
| Link Null -> Null
| Link (Node r as node) as before ->
match Atomic.get r.next with
| Link (Mark node) ->
if Atomic.compare_and_set prev before (Link node) then
find_node t prev value (Link node)
else
find_node t prev value (Atomic.get prev)
| Link (Null | Node _) as next ->
if r.value == value then
node
else
find_node t r.next value next
let rec try_add t value =
let before = Atomic.get t in
match find_node t t value before with
| Node _ -> false
| Null ->
let after =
Node {
next = Atomic.make before; (* 2 *)
value
}
in
Atomic.compare_and_set t before (Link after)
|| try_add t value
let rec try_remove t value =
match find_node t t value (Atomic.get t) with
| Null -> false
| Node r ->
match Atomic.get r.next with
| Link (Mark _) ->
false
| Link ((Null | Node _) as next) as before ->
let after = Mark next in
if Atomic.compare_and_set r.next before (Link after)
then begin
find_node t t value (Atomic.get t) |> ignore;
true
end
else try_remove t value
end
Notice the lines marked with (* ? *)
. The links within the list go through
Atomic.t
blocks. Internally a _ Atomic.t
has essentially the same
representation as a _ ref
. Unfortunately, there is nothing like the mutable
specifier for record fields to allow one to "inline" the atomic to the record
and avoid the indirection. The use of Atomic.t
effectively doubles the number
of indirections in the list. Doubling the indirections is effectively the same
as making the list twice as long and
can actually double the amount of time operations on the data structure take.
We can, however, use Obj.magic
to trick the OCaml compiler to treat the first
field of a record as an Atomic.t
:
module Physical_set_inlined : Physical_set = struct
type ('a, _) node =
| Null : ('a, [> `Null]) node
| Node : {
mutable next: 'a link; (* 1 *)
value: 'a
} -> ('a, [> `Node]) node
| Mark : ('a, [< `Null | `Node]) node -> ('a, [> `Mark]) node
and 'a link =
| Link : ('a, [< `Null | `Node | `Mark]) node -> 'a link [@@unboxed]
type 'a t = 'a link Atomic.t
let create () = Atomic.make (Link Null)
let next_as_atomic (Node _ as node : ('a, [< `Node]) node) =
(Obj.magic node : 'a link Atomic.t) (* 2 *)
let rec find_node t prev value : _ -> (_, [< `Null | `Node ]) node = function
| Link (Mark _) -> find_node t t value (Atomic.get t)
| Link Null -> Null
| Link (Node r as node) as before ->
match Atomic.get (next_as_atomic node) with
| Link (Mark node) ->
if Atomic.compare_and_set prev before (Link node) then
find_node t prev value (Link node)
else
find_node t prev value (Atomic.get prev)
| Link (Null | Node _) as next ->
if r.value == value then
node
else
find_node t (next_as_atomic node) value next
let rec try_add t value =
let before = Atomic.get t in
match find_node t t value before with
| Node _ -> false
| Null ->
let after =
Node {
next = before; (* 3 *)
value
}
in
Atomic.compare_and_set t before (Link after)
|| try_add t value
let rec try_remove t value =
match find_node t t value (Atomic.get t) with
| Null -> false
| Node _ as node ->
match Atomic.get (next_as_atomic node) with
| Link (Mark _) ->
false
| Link ((Null | Node _) as next) as before ->
let after = Mark next in
if Atomic.compare_and_set (next_as_atomic node) before (Link after)
then begin
find_node t t value (Atomic.get t) |> ignore;
true
end
else try_remove t value
end
Let's give it a spin:
# let s : int Physical_set_inlined.t = Physical_set_inlined.create ()
val s : int Physical_set_inlined.t = <abstr>
# Physical_set_inlined.try_add s 101
- : bool = true
# Physical_set_inlined.try_add s 42
- : bool = true
# Physical_set_inlined.try_remove s 101
- : bool = true
# Physical_set_inlined.try_add s 42
- : bool = false
# Physical_set_inlined.try_remove s 42
- : bool = true
# Physical_set_inlined.try_remove s 101
- : bool = false
Is this safe?
An 'a Atomic.t
has the same representation as a 'a ref
, which is a record
with a single mutable
field. No special optimizations are made for float
s
nor for any other type of values and a Node _
record is also allocated as a
constructor block rather than as a float
array. OCaml compilers do not perform
record layout optimizations, which means that the layouts of 'a Atomic.t
and
of a record whose first element is of type 'a
are compatible and the accesses
are safe.
Here is a Compiler Explorer link for this example.
There is technically no guarantee of any kind when using Obj.magic
. The reason
for that is that, in the future, the OCaml runtime might be changed in some way,
or a new compiler for OCaml might emerge, that makes the reasoning based on the
current implementation(s) and uniform representation of values invalid.
Does that make me worried?
No, it does not. Primarily for two reasons:
-
Obj.magic
is fairly commonly used. Try the Sherlocode tool, for example, and don't forget to try clicking for more. This is a roundabout way to say that changes to the representation of values are likely to break a lot of existing code. While that does not entirely prevent changes from being made, it is unlikely that such changes would be just suddenly introduced. If significant changes are made, it will take a long time.Perhaps, or rather, in our hopes and dreams, one of the most likely future changes is dropping the float array pessimization. That change will not break any of the examples in this note, but would make the uses of
Obj.magic
to avoid the pessimization redundant. In all likelyhood, it will take many years before that happens. -
None of the uses of
Obj.magic
are apparent from the signatures of the data structures. Part of the reason why the motivating examples make no use ofObj.magic
is to show that it can be done. What this means is that, should some use ofObj.magic
become broken or unsafe, one could always adjust the data structure implementation and release a new version of it.
In other words, it should be easy to fix unlikely future issues and there should be plenty of time to fix them.
Do I recommend using Obj.magic
?
No, I don't really recommend it. All other things being equal, I'd rather not
have the Obj
module at all and would rather have an optimizing compiler that
would not suffer from the float array pessimization and would be able to perform
data representation optimizations. Unfortunately that is not where we are now
and there are cases where Obj.magic
can significantly reduce code size and
improve performance.