Skip to content

Instantly share code, notes, and snippets.

@thecppzoo
Last active May 11, 2019 03:16
Show Gist options
  • Save thecppzoo/9e990c4be993cc1f1d7232e17dcfe9a8 to your computer and use it in GitHub Desktop.
Save thecppzoo/9e990c4be993cc1f1d7232e17dcfe9a8 to your computer and use it in GitHub Desktop.
Description of variant

This is a description about a recent small project:

I've been researching and working on how to implement the best mechanisms for event handlers. Frequently the events are received by a dispatcher component that classifies the event and activates the correct processing function. For example, if the event is a market data message which could be a trade, a bid or an ask:

void dispatch(int eventType, void *data) {
    switch(eventType) {
        case 0: processTrade(data); break;
        case 1: processBid(data); break;
        case 2: processAsk(data); break;
    }
}

However, this is effort intensive, brittle work: The set of messages may change (for example, with the incorporation of things like "short sale trade", "top of the book ask"), which further classifies the existing set of events. Then the developer has to make sure all of the places where dispatching occurs are fully accounted for.

Furthermore, sometimes the user may want to keep a collection of events in memory to process them later, for example, to save them to some persistance mechanism, for that, there is the need to store the event indistinctly from their type, and a mechanism to dispatch to the right processing given the specific type of the element. This is what the C++ 17 variant and visit attempt to accomplish:

// example using an array for clarity
// "Persistance" has overloads for operator() for all types of messages
void persist(Persistance &p, std::variant<Trade, Bid, Ask> (&a)[1024]) {
    for(int i = 0; i < 1024; ++i) {
        visit(p, a[i]);
    }
}

From the point of view of the interface, I found a few things that do not convince me, I thought variant and visitation might have an slightly simpler and more useful interface, and also, the improvement should lead to better performance. With regards to the implementations I have seen, GCC's stdlib, Clang's libc++ and even Google's Abseil they all look way more complicated than they need to be. Abseil looks better than the others but then they do things like a manual expansion of the dispatch into a switch see the code here of dubious benefit in terms of performance or compilation time...

These observations led to my implementing my ideas, here, currently on the test folder. Before considering the performance in detail (which is superior), the most important thing for me is that the code looks particularly clean, easy to understand; as you know I take to heart these two quotations:

"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult." C.A.R. Hoare

and

"Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better." E. Dijkstra.

When you make something clean, that you nearly obviously "got it right", the performance should also be there. Please check the code and tests.

Performance

Let us start with a very simple but important use case, destruction, since every variant instance built in principle needs to some time be destroyed. To isolate the destruction itself, as opposed to also the deallocation, let us call the pseudo destructor. And let us make the example interesting by using mostly variant alternatives that have trivial destructors, except one alternative, HasDestructor, which has a non-trivial destructor. For a baseline, stdlib's variant:

struct HasDestructor {
    ~HasDestructor();
};

#include <variant>

using STDLibVariant = std::variant<int, char, HasDestructor, double>;

template<typename T>
void pseudo_destructor(T *p) {
    p->~T();
}

void useStl(STDLibVariant &sv) { pseudo_destructor(&sv); }

Gets compiled, as proved in the compiler explorer (with -O3):

useStl(std::variant<int, char, HasDestructor, double>&): # @useStl(std::variant<int, char, HasDestructor, double>&)
  push rbx
  mov rbx, rdi
  movzx eax, byte ptr [rdi + 8]
  cmp rax, 255
  je .LBB0_2
  mov rdi, rbx
  call qword ptr [8*rax + std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double>::_S_vtable<0ul, 1ul, 2ul, 3ul>]
.LBB0_2:
  mov byte ptr [rbx + 8], -1
  pop rbx
  ret
  mov rdi, rax
  call __clang_call_terminate
__clang_call_terminate: # @__clang_call_terminate
  push rax
  call __cxa_begin_catch
  call std::terminate()
void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 0ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&): # @void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 0ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&)
  ret
void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 1ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&): # @void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 1ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&)
  ret
void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 2ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&): # @void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 2ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&)
  jmp HasDestructor::~HasDestructor() [complete object destructor] # TAILCALL
void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 3ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&): # @void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 3ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&)
  ret
std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double>::_S_vtable<0ul, 1ul, 2ul, 3ul>:
  .quad void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 0ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&)
  .quad void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 1ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&)
  .quad void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 2ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&)
  .quad void std::__detail::__variant::__erased_dtor<std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&, 3ul>(std::__detail::__variant::_Variant_storage<false, int, char, HasDestructor, double> const&)

That is, the compiler does not have the control flow analysis powers to determine the only case that matters is when the variant contains the alternative HasDestructor, or that the type switch is 2, there is the impenetrable barrier of calling into a table of function pointers.

However, my implementation gets converted to this, which I got in the compiler explorer:

useZoo(zoo::Variant<int, char, HasDestructor, double>&): # @useZoo(zoo::Variant<int, char, HasDestructor, double>&)
  cmp dword ptr [rdi + 8], 2
  jne .LBB0_1
  jmp HasDestructor::~HasDestructor() [complete object destructor] # TAILCALL
.LBB0_1:
  ret

And that is it!.

You can see in my implementation that it itself is the first user of visitation: All of the internal mechanisms in Variant, including destruction, are implemented as visits.

Let us compare a normal visitation, first, a hand-crafted switch which should be the absolute maximum of performance, my visitation and visitation using stdlib:

struct HasDestructor {
    ~HasDestructor();
};

using ZooVariant = zoo::Variant<int, char, HasDestructor, double>;

#include <variant>
using STDLibVariant = std::variant<int, char, HasDestructor, double>;

template<typename T> void process(T &&t);

void manualDispatch(ZooVariant &v) {
    switch(v.typeSwitch_) {
        case 0: process(zoo::get<0>(v)); break;
        case 1: process(zoo::get<1>(v)); break;
        case 2: process(zoo::get<2>(v)); break;
        case 3: process(zoo::get<3>(v)); break;
        default: process(zoo::get<4>(v));
    }
}

void visit(ZooVariant &zv) {
    visit([](auto &&arg) { process(arg); }, zv);
}


void visit(STDLibVariant &sv) {
    visit([](auto &&arg) { process(arg); }, sv);
}

And the corresponding assemblers:

Hand-crafted:

manualDispatch(zoo::Variant<int, char, HasDestructor, double>&): # @manualDispatch(zoo::Variant<int, char, HasDestructor, double>&)
  mov eax, dword ptr [rdi + 8]
  cmp rax, 3
  ja .LBB0_6
  jmp qword ptr [8*rax + .LJTI0_0]
.LBB0_2:
  jmp void process<int&>(int&) # TAILCALL
.LBB0_6:
  jmp void process<zoo::BadVariant&>(zoo::BadVariant&) # TAILCALL
.LBB0_3:
  jmp void process<char&>(char&) # TAILCALL
.LBB0_4:
  jmp void process<HasDestructor&>(HasDestructor&) # TAILCALL
.LBB0_5:
  jmp void process<double&>(double&) # TAILCALL
.LJTI0_0:
  .quad .LBB0_2
  .quad .LBB0_3
  .quad .LBB0_4
  .quad .LBB0_5

That is, just to jump into a table using the type switch as the index. My implementation of visitation:

visit(zoo::Variant<int, char, HasDestructor, double>&): # @visit(zoo::Variant<int, char, HasDestructor, double>&)
  mov eax, dword ptr [rdi + 8]
  cmp rax, 3
  ja .LBB1_6
  jmp qword ptr [8*rax + .LJTI1_0]
.LBB1_2:
  jmp void process<int&>(int&) # TAILCALL
.LBB1_6:
  jmp void process<zoo::BadVariant&>(zoo::BadVariant&) # TAILCALL
.LBB1_3:
  jmp void process<char&>(char&) # TAILCALL
.LBB1_4:
  jmp void process<HasDestructor&>(HasDestructor&) # TAILCALL
.LBB1_5:
  jmp void process<double&>(double&) # TAILCALL
.LJTI1_0:
  .quad .LBB1_2
  .quad .LBB1_3
  .quad .LBB1_4
  .quad .LBB1_5

That is, identical.

Now, stdlib's:

visit(std::variant<int, char, HasDestructor, double>&): # @visit(std::variant<int, char, HasDestructor, double>&)
  push rax
  movzx eax, byte ptr [rdi + 8]
  cmp rax, 255
  je .LBB2_2
  mov rsi, rdi
  mov rdi, rsp
  call qword ptr [8*rax + std::__detail::__variant::__gen_vtable<void, visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&>::_S_vtable]
  pop rax
  ret
.LBB2_2:
  mov edi, 16
  call __cxa_allocate_exception
  mov qword ptr [rax], offset vtable for std::bad_variant_access+16
  mov qword ptr [rax + 8], offset .L.str
  mov esi, offset typeinfo for std::bad_variant_access
  mov edx, offset std::exception::~exception() [base object destructor]
  mov rdi, rax
  call __cxa_throw
std::bad_variant_access::~bad_variant_access() [deleting destructor]: # @std::bad_variant_access::~bad_variant_access() [deleting destructor]
  push rbx
  mov rbx, rdi
  call std::exception::~exception() [base object destructor]
  mov rdi, rbx
  pop rbx
  jmp operator delete(void*) # TAILCALL
std::bad_variant_access::what() const: # @std::bad_variant_access::what() const
  mov rax, qword ptr [rdi + 8]
  ret
std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 0ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&): # @"std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 0ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)"
  push rax
  cmp byte ptr [rsi + 8], 0
  jne .LBB5_1
  mov rdi, rsi
  pop rax
  jmp void process<int&>(int&) # TAILCALL
.LBB5_1:
  mov edi, 16
  call __cxa_allocate_exception
  mov qword ptr [rax], offset vtable for std::bad_variant_access+16
  mov qword ptr [rax + 8], offset .L.str
  mov esi, offset typeinfo for std::bad_variant_access
  mov edx, offset std::exception::~exception() [base object destructor]
  mov rdi, rax
  call __cxa_throw
std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 1ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&): # @"std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 1ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)"
  push rax
  cmp byte ptr [rsi + 8], 1
  jne .LBB6_1
  mov rdi, rsi
  pop rax
  jmp void process<char&>(char&) # TAILCALL
.LBB6_1:
  mov edi, 16
  call __cxa_allocate_exception
  mov qword ptr [rax], offset vtable for std::bad_variant_access+16
  mov qword ptr [rax + 8], offset .L.str
  mov esi, offset typeinfo for std::bad_variant_access
  mov edx, offset std::exception::~exception() [base object destructor]
  mov rdi, rax
  call __cxa_throw
std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 2ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&): # @"std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 2ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)"
  push rax
  cmp byte ptr [rsi + 8], 2
  jne .LBB7_1
  mov rdi, rsi
  pop rax
  jmp void process<HasDestructor&>(HasDestructor&) # TAILCALL
.LBB7_1:
  mov edi, 16
  call __cxa_allocate_exception
  mov qword ptr [rax], offset vtable for std::bad_variant_access+16
  mov qword ptr [rax + 8], offset .L.str
  mov esi, offset typeinfo for std::bad_variant_access
  mov edx, offset std::exception::~exception() [base object destructor]
  mov rdi, rax
  call __cxa_throw
std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 3ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&): # @"std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 3ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)"
  push rax
  cmp byte ptr [rsi + 8], 3
  jne .LBB8_1
  mov rdi, rsi
  pop rax
  jmp void process<double&>(double&) # TAILCALL
.LBB8_1:
  mov edi, 16
  call __cxa_allocate_exception
  mov qword ptr [rax], offset vtable for std::bad_variant_access+16
  mov qword ptr [rax + 8], offset .L.str
  mov esi, offset typeinfo for std::bad_variant_access
  mov edx, offset std::exception::~exception() [base object destructor]
  mov rdi, rax
  call __cxa_throw
.L.str:
  .asciz "Unexpected index"

std::__detail::__variant::__gen_vtable<void, visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&>::_S_vtable:
  .quad std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 0ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)
  .quad std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 1ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)
  .quad std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 2ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)
  .quad std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<void (*)(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)>, std::tuple<std::variant<int, char, HasDestructor, double>&>, std::integer_sequence<unsigned long, 3ul> >::__visit_invoke(visit(std::variant<int, char, HasDestructor, double>&)::$_1&&, std::variant<int, char, HasDestructor, double>&)

typeinfo name for std::bad_variant_access:
  .asciz "St18bad_variant_access"

typeinfo for std::bad_variant_access:
  .quad vtable for __cxxabiv1::__si_class_type_info+16
  .quad typeinfo name for std::bad_variant_access
  .quad typeinfo for std::exception

vtable for std::bad_variant_access:
  .quad 0
  .quad typeinfo for std::bad_variant_access
  .quad std::exception::~exception() [base object destructor]
  .quad std::bad_variant_access::~bad_variant_access() [deleting destructor]
  .quad std::bad_variant_access::what() const

Very clearly slower and fatter assembler code.

Why?

Well, most of the components in the STL have been very pioneering work, which means that chances are that being among the first to capture some ideas, they did not get them 100% correctly, and also, they might have been defined before newer features could have improved them but it was not possible because the specification had already ossified. And then, those sub-optimal idioms may keep reappearing in the interest of consistency or familiarity. For example, when I reimplemented std::function, I knew that the check against null pointer was worse than non-null default initialization, I also knew that the calling-uninitialized-target exception is not useful. And then, that in practical cases, for trampoline invocation, value-semantics function argument types must be privileged over referential ones; putting things together you get cleaner code that also performs much better.

Here, something similar happens, the dubiously beneficial bad_variant_access exception is more complicated and performance hostile to instead letting visitors decide how to handle this error. Especially since in the vast majority of uses it is a serious user mistake to have their code go to where a visitor consumes an uninitialized variant... This is similar to how when I re-implemented std::any I began by providing an "unsafe" casting... It helps that I discovered Clang "merges" nested switches, that way this code can get away without having to create a table of function pointers and the attendant control flow analysis barrier.

Also, unfortunately the colleagues who implement these things fall prey to what Kevlin Henney calls "ineffective habits", particularly what he calls "corporate coding style". The antidote to these things, of course, is the epistemological doctrine of software engineering, that is, expressing your knowledge through source code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment