@GuillaumeDua
Last active July 19, 2025 13:00
C++ legacy inheritance vs CRTP + std::variant

C++ : Polymorphic inheritance without vtable

In this article, we will see how to use CRTP, std::variant and std::visit to improve our code's performance.

Table of contents

  1. Introduction
  2. What's wrong with vTables ?
  3. A way out ?
    1. CRTP
    2. std::variant + std::visit
  4. Benchmark
  5. Pros & cons

Motivation :

Inheritance and vtables have been used for many years to define interfaces for polymorphic classes in C++.
What if ... there were another way to do this ?
One that is easier, cleaner, faster and more reliable.

Introduction

Inheritance is a mechanism that allows developers to create a hierarchy between classes, using "is-a" relationships. The class being inherited from is called the parent class (or base class), and the class inheriting is called the child class (or derived class).

This is useful for many purposes, such as :

  • Code reusability.
    A child class inherits data and functions, so there's no need to duplicate code. Thus, the code base is faster to create, and easier to read & maintain.

  • Make the code more "human friendly"
    An "is-a" relationship is meaningful for anyone who is familiar with OOP. It lets us create a class hierarchy that reflects the way we mentally organize information. A cat is a feline, a feline is an animal, an animal is a life-form, etc.

class life_form{};
class animal : public life_form{};
class feline : public animal{};
class cat    : public feline{};

void func()
{
    cat my_cat;
}
  • Static polymorphism

Static polymorphism is a polymorphism resolved at compile time.


In the previous code snippet, my_cat is a variable of type cat.
my_cat is also a feline, an animal, and a life_form.

void feed(animal & any_animal)
{
    // feed `any_animal`
}

void func()
{
    cat my_cat;

    feed(my_cat);
}
  • Runtime polymorphism & vTable
    Ah, here's the big thing.

Runtime polymorphism is polymorphism resolved at runtime. How ? Using vTables. A virtual table (vTable) is a lookup table of function pointers used to resolve function calls with dynamic (late) binding. For each class that declares or inherits virtual functions, the compiler typically creates (at compile time) a static array that contains one entry per virtual function that the class can call.

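#include <memory>

struct animal
{
    virtual ~animal() = default; // required to delete a derived object through an `animal` pointer
    virtual void move_forward() = 0;
};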
struct fish : public animal
{
    void move_forward() override
    {
        // swim using fins
    }
};
struct snake : public animal
{
    void move_forward() override
    {
        // use serpentine method
    }
};
struct feline : public animal
{
    void move_forward() override
    {
        // walk using paws
    }
};

void func()
{
    using pointer_type = std::unique_ptr<animal>;

    pointer_type my_animal = std::make_unique<snake>();

    my_animal->move_forward(); // use serpentine method
}

What's wrong with vTables ?

As mentioned in the previous section, a virtual function call typically means a call via a function pointer stored in a vTable.

  • Pros : Allows runtime polymorphism
  • Cons : Bad performance impact

Why ?

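A virtual call must first read the address of the function : the object's vTable pointer is loaded, then the function pointer is read from the vTable, and only then can the call be made. If those loads miss the cache, or if the indirect branch is mispredicted, the CPU stalls while waiting. The indirection also usually prevents the compiler from inlining the call.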
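To make that cost visible, here is a minimal hand-written sketch of what a compiler typically generates for a virtual call. It is a conceptual model only: the names (animal_vtable, animal_like, call_move_forward) are hypothetical, and real ABIs differ in layout and details.

```cpp
#include <cassert>

struct animal_like;

// One slot per "virtual" function.
struct animal_vtable
{
    int (*move_forward)(animal_like &);
};

struct animal_like
{
    animal_vtable const * vptr; // stands in for the hidden pointer stored in each polymorphic object
    int position;
};

int snake_move_forward(animal_like & self) { return self.position += 1; }
int fish_move_forward (animal_like & self) { return self.position += 2; }

// One static table per concrete type, built at compile time.
constexpr animal_vtable snake_vtable{ &snake_move_forward };
constexpr animal_vtable fish_vtable { &fish_move_forward  };

int call_move_forward(animal_like & a)
{
    // 1. load `a.vptr`
    // 2. load the function pointer stored in the table
    // 3. indirect call : the target is only known at run time
    return a.vptr->move_forward(a);
}

void vtable_sketch_demo()
{
    animal_like snake{ &snake_vtable, 0 };
    animal_like fish { &fish_vtable,  0 };
    assert(call_move_forward(snake) == 1);
    assert(call_move_forward(fish)  == 2);
}
```

Each call goes through two dependent loads before the jump, which is the indirection described above.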

A way out ?

Great ! We can now summarize our list of requirements, and dive into code experiments !

Requirements :

  • Interface
  • Polymorphism
  • Minimal amount of easy-to-read code
  • Better performances than vtables

Let's use the following use-case :

  • An actor that can receive, queue, and react to messages, and update itself as well.

Actor scenario :

  1. Receives and queues 3 messages, one after another
  2. Updates itself
  3. Handles queued messages

Additionally, we want actors to be polymorphic, in order to handle a collection of them.
And we want to be able to add or remove them dynamically from the collection.

For the purpose of this example, we will use integers (int) as the message type :

using message_type = int;

Using inheritance, our abstract class may look like this :

struct actor
{
    virtual ~actor() = default;

    virtual void update() = 0;

    void handle_all_messages() { /* handle queued messages one after another */ }
    void receive_message(message_type && msg)
    {   // queue `msg` into `pending_messages`
        pending_messages.emplace(std::forward<message_type>(msg));
    }

private:
    std::queue<message_type> pending_messages;
    virtual void handle_one_message(message_type && msg) = 0;
};
struct A : actor
{
    void update() override { /* impl ... */ }
    void handle_one_message(message_type && msg) override { /* impl ... */ }
};
struct B : actor
{
    void update() override { /* impl ... */ }
    void handle_one_message(message_type && msg) override { /* impl ... */ }
};
using container_type = std::vector<std::unique_ptr<actor>>;

container_type actors;
// fill `actors` with A-s and B-s

for (auto & active_actor : actors)
{   // broadcast messages ...
    active_actor->receive_message(41);
    active_actor->receive_message(42);
    active_actor->receive_message(43);
}

for (auto & active_actor : actors)
{
    active_actor->update();
    active_actor->handle_all_messages();
}

In the code snippet above, we can see two issues :

  • Pointer indirection when dereferencing active_actor for each member function call.
  • vTable usage for the pure virtual function calls : update() and handle_one_message(msg).

CRTP to the rescue ?

The curiously recurring template pattern (CRTP) is a C++ idiom in which a class derives from an instantiation of a class template that takes the derived class itself as template argument.
It allows safe, static downcasting from the base class to the derived one.

If you want more information about CRTP, please consider reading this blog series from fluentcpp.com.

template <typename T>
class base{};

class derived : public base<derived>
{};

The main advantage of doing so is that, from the base class perspective, *this is always the base part of a derived object whose type is known at compile time.
Thus, by design, because the base is always inherited from by its template parameter, we can use static_cast instead of dynamic_cast.


In summary, using static_cast, the base class can access the derived class by downcasting itself into the derived class.

template <typename T>
struct base
{
    void do_stuff()
    {
        T & as_derived = static_cast<T&>(*this);
        // do stuff with `as_derived`
    }
};
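As a small illustration of that downcast, the sketch below lets a base class template call a function that only the derived classes define. The names (greeter, english, french) are hypothetical and only serve the example.

```cpp
#include <cassert>
#include <string>

template <typename T>
struct greeter // CRTP base
{
    std::string greet()
    {
        // static downcast : valid because T derives from greeter<T>
        T & as_derived = static_cast<T&>(*this);
        return "Hello from " + as_derived.name();
    }
};

struct english : greeter<english>
{
    std::string name() { return "english"; }
};

struct french : greeter<french>
{
    std::string name() { return "french"; }
};

void crtp_sketch_demo()
{
    english e;
    french  f;
    // Both calls are resolved at compile time, no vTable involved.
    assert(e.greet() == "Hello from english");
    assert(f.greet() == "Hello from french");
}
```

Note that english and french do not share a common base type here, since greeter<english> and greeter<french> are distinct classes.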

Also, we need to use two CRTP best-practices :

  • Base class has private constructor
  • Base class is friend with its derived class

This way, we ensure that the base class template can only be constructed by the class passed as its template argument.

This is legal (compiling) code. Check it on godbolt.

template <typename T>
struct base
{};

struct impl_1 : public base<impl_1>
{};

struct impl_2 : public base<impl_1> // oops ! Should be base<impl_2>
{};

But if we use the tricks mentioned above, an impl_2 object can no longer be default-constructed : the constructor of base<impl_1> is private, and only impl_1 is its friend.

template <typename T>
class base
{
    base() = default;
    friend T;
};
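The effect of these two practices can be checked at compile time. The sketch below reuses the impl_1 and impl_2 names from above and assumes that testing default construction through std::is_default_constructible_v is a good enough proxy for "cannot create an object". Note that in C++17 an empty-brace initialization such as impl_2{} may still be accepted as aggregate initialization, so the guard is not absolute.

```cpp
#include <type_traits>

template <typename T>
class base
{
    base() = default; // private : only T, declared friend below, may construct a base<T>
    friend T;
};

struct impl_1 : public base<impl_1> {};
struct impl_2 : public base<impl_1> {}; // wrong template argument, on purpose

// impl_1 is a friend of base<impl_1>, so its implicit constructor is usable.
static_assert( std::is_default_constructible_v<impl_1>,
              "impl_1 should be default-constructible");

// impl_2 is not a friend of base<impl_1>, so its implicit constructor is deleted.
static_assert(!std::is_default_constructible_v<impl_2>,
              "impl_2 should not be default-constructible");
```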

Looks good so far. Let's see our current implementation progress :

Interface

template <typename T>
struct actor
{
    void update()
    {
        as_underlying().update();
    }

    void handle_all_messages()
    {
        while (!pending_messages.empty())
        {
            auto message = std::move(pending_messages.front());
            pending_messages.pop();
            handle_one_message(std::move(message));
        }
    }

    void receive_message(message_type && msg)
    {
        pending_messages.emplace(std::forward<message_type>(msg));
    }

private:
    friend T;
    actor() = default;

    std::queue<message_type> pending_messages;

    inline T & as_underlying()
    {
        return static_cast<T&>(*this);
    }

    void handle_one_message(message_type && msg)
    {
        as_underlying().handle_one_message(std::forward<message_type>(msg));
    }
};

Derived classes

struct A : actor<A>
{
    using actor::actor;

    void update(){ /* impl ...*/ }

private:
    friend struct actor<A>;

    void handle_one_message(message_type && msg){ /* impl ...*/ }
};

struct B : actor<B>
{
    using actor::actor;

    void update(){ /* impl ...*/ }

private:
    friend struct actor<B>;

    void handle_one_message(message_type && msg){ /* impl ...*/ }
};

Let's have a look at our checklist of requirements :

  • Interface
  • Polymorphism
  • Minimal amount of easy-to-read code
  • Better performances than vtables

What about polymorphism ?

CRTP looks great, but unlike inheritance, we lost polymorphism : actor<A> and actor<B> are unrelated types.
Remember, we need to store multiple implementations of actor in a single container.

using container_type = std::vector<std::unique_ptr<actor</* ? */>>>;

Also, what if we could avoid the usage of pointers as container's value_type ?

std::variant to the rescue

Well, we know our implementation types at compile time.
So, an obvious solution might be to use std::variant. Let's try this :

template <typename ... Ts>
using poly_T = std::variant<Ts...>;

using container_type = std::vector
<
    poly_T<A, B>
>;

So we can get the following usage : (Test it on Godbolt)

container_type actors
{
    A{},
    B{},
    A{}
    /* etc ... */
};

actors.emplace_back(A{});
actors.emplace_back(B{});

Before we can check "polymorphism" off our list of requirements,
we need to find a syntax to call member functions.

Once again, the standard library provides an obvious solution : std::visit, which is designed to work with std::variant.

template <class R, class Visitor, class... Variants>
constexpr R visit(Visitor&& my_visitor, Variants&&... my_var);

std::visit applies the visitor my_visitor to the std::variant(s) my_var. The overload shown above, with an explicit return type R, is available since C++20 ; the C++17 overload deduces the return type.
The Visitor is any callable that accepts every possible alternative of Variants.

Thus, in order to interact with our std::variant, we can define visitors in many ways.
In our specific case, we just need generic lambdas, which were introduced in C++14.
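Before applying this to actors, here is a minimal sketch of std::visit with a generic lambda. The types (square, rectangle) and the area_of function are hypothetical; the only requirement is that every alternative provides the member function the lambda calls.

```cpp
#include <cassert>
#include <variant>

struct square    { int side;          int area() const { return side * side;     } };
struct rectangle { int width, height; int area() const { return width * height;  } };

using shape = std::variant<square, rectangle>;

int area_of(shape const & s)
{
    // The generic lambda is instantiated once per alternative ;
    // std::visit picks the right instantiation from the variant's current index.
    return std::visit([](auto const & value) { return value.area(); }, s);
}

void visit_sketch_demo()
{
    assert(area_of(square{3})       == 9);
    assert(area_of(rectangle{2, 5}) == 10);
}
```

If one alternative lacked an area() member, the lambda would fail to compile for it, which is how the "covers every alternative" rule is enforced.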

Implementing our use case, we can write the following code :

container_type actors; /* contains many A-s and B-s */

for (auto & active_actor : actors)
{   // broadcast messages ...
    std::visit([](auto & act)
    {
        act.receive_message(41);
        act.receive_message(42);
        act.receive_message(43);
    }, active_actor);
}

for (auto & active_actor : actors)
{   // update, then handle pending messages
    std::visit([](auto & act)
    {
        act.update();
        act.handle_all_messages();
    }, active_actor);
}

Another great advantage of this design is that we can handle std::variant's template parameters differently (here, A and B),
without using dynamic_cast.

What about specific cases ?

Sometimes, you may want to have additional code for a specific type.

In legacy code (using inheritance), we would do the following ugly thing :

struct base
{
    virtual ~base() = default;
};
struct A : public base{};
struct B : public base{};

void handle_base_value(base * value)
{
    if (A * value_as_A_ptr = dynamic_cast<A*>(value))
        // deal with value_as_A_ptr
        ;
    else if (B * value_as_B_ptr = dynamic_cast<B*>(value))
        // deal with value_as_B_ptr
        ;
    else
        // other types
        ;
}
void func()
{
    base * my_value = new A();

    handle_base_value(my_value);
}

Using std::variant and std::visit we don't need dynamic_cast and pointers anymore, because types are known at compile time.

// helper : the static_assert below only fires if the last `else` branch is instantiated
template <typename> struct always_false : std::false_type {};

for (auto & active_actor : actors)
{
    std::visit([](auto & act)
    {
        using T = std::decay_t<decltype(act)>;

        if constexpr (std::is_same_v<T, A>)
            // `act` is an `A`
            ;
        else if constexpr (std::is_same_v<T, B>)
            // `act` is a `B`
            ;
        else
        {   // other types
            static_assert(always_false<T>::value, "non-exhaustive visitor!");
            // or deal with `act` in another way
        }

    }, active_actor);
}

Alternatively, we still can define visitor types the old way, defining all operator() overloads so it is exhaustive.

using visitor_type = struct
{
    void operator()(using_CRTP_and_variants::A &){}
    void operator()(using_CRTP_and_variants::B &){}
};

for (auto & active_actor : actors)
{
    std::visit(visitor_type{}, active_actor);
}

In order to avoid if-constexpr and reduce the amount of code, another alternative is to use the convenient overloaded lambdas trick :

template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;
for (auto & active_actor : actors)
{
    std::visit(overloaded
    {
        [](A & arg){ /* `arg` is an A */ },
        [](B & arg){ /* `arg` is a  B */ },
        [](auto & ){ /* other types   */ }
    }, active_actor);
}

The shorter the better, right ?

Let's have a look at our checklist :

  • Interface
  • Polymorphism
  • Minimal amount of easy-to-read code
  • Better performances than inheritance

Performances

Let's try the snippet (see below) on quick-bench.com, using the C++20 standard and the O3 optimization level.

| Compiler  | STL             | CRTP + variants | vtable | how much faster ? |
|-----------|-----------------|-----------------|--------|-------------------|
| Clang 7.0 | libc++ (LLVM)   | 345             | 502    | 1.5 times faster  |
| GCC 8.2   | libstdc++ (GNU) | 281             | 667    | 2.4 times faster  |

From 1.5 to 2.4 times faster. Now we can check our "Better performances" checkbox.

  • Interface
  • Polymorphism
  • Minimal amount of easy-to-read code
  • Better performances than inheritance
See the complete benchmark snippet

// gcc   : CRTP_and_variant is 2.5 times faster (libstdc++, GNU)
// clang : CRTP_and_variant is 1.4 times faster (libc++, LLVM)

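#include <benchmark/benchmark.h> // Google Benchmark ; assumed to be needed when building outside quick-bench.com
#include <queue>
#include <variant>
#include <vector>
#include <utility>
#include <iostream>
#include <memory>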

using message_type = int;

namespace using_CRTP_and_variants
{
    template <typename T>
    struct actor
    {
        void update()
        {
            as_underlying().update();
        }

        void handle_all_messages()
        {	// internal states only
            while (!pending_messages.empty())
            {
                auto message = std::move(pending_messages.front());
                pending_messages.pop();
                handle_one_message(std::move(message));
            }
        }

        void receive_message(message_type && msg)
        {
            pending_messages.emplace(std::forward<message_type>(msg));
        }

    private:
        friend T;
        actor() = default;

        std::queue<message_type> pending_messages;

        inline T & as_underlying()
        {
            return static_cast<T&>(*this);
        }
        inline T const & as_underlying() const
        {
            return static_cast<T const &>(*this);
        }

        void handle_one_message(message_type && msg)
        {
            as_underlying().handle_one_message(std::forward<message_type>(msg));
        }
    };

    struct A : actor<A>
    {
        using actor::actor;

        void update()
        {
            //std::cout << "A : update()\n";
        }

    private:
        friend struct actor<A>;

        void handle_one_message(message_type && msg)
        {
            //std::cout << "A : handle_one_message : " << msg << '\n';
        }
    };
    struct B : actor<B>
    {
        using actor::actor;

        void update()
        {
            //std::cout << "B : update()\n";
        }

    private:
        friend struct actor<B>;

        void handle_one_message(message_type && msg)
        {
            //std::cout << "B : handle_one_message : " << msg << '\n';
        }
    };
}

namespace using_inheritance
{
    struct actor
    {
        virtual ~actor() = default;
        virtual void update() = 0;

        void handle_all_messages()
        {	// internal states only
            while (!pending_messages.empty())
            {
                auto message = std::move(pending_messages.front());
                pending_messages.pop();
                handle_one_message(std::move(message));
            }
        }

        void receive_message(message_type && msg)
        {
            pending_messages.emplace(std::forward<message_type>(msg));
        }

    private:

        std::queue<message_type> pending_messages;

        virtual void handle_one_message(message_type && msg) = 0;
    };

    struct A : actor
    {
        void update() override
        {
            //std::cout << "A : update()\n";
        }
        void handle_one_message(message_type && msg) override
        {
            //std::cout << "A : handle_one_message : " << msg << '\n';
        }
    };
    struct B : actor
    {
        void update() override
        {
            //std::cout << "B : update()\n";
        }
        void handle_one_message(message_type && msg) override
        {
            //std::cout << "B : handle_one_message : " << msg << '\n';
        }
    };
}


template <typename ... Ts>
using poly_T = std::variant<Ts...>;

static void test_CRTP_and_variants(benchmark::State& state) {

    using container_type = std::vector<poly_T
    <
        using_CRTP_and_variants::A,
        using_CRTP_and_variants::B>
    >;

    container_type actors
    {
        using_CRTP_and_variants::A{},
        using_CRTP_and_variants::B{},
        using_CRTP_and_variants::A{},
        using_CRTP_and_variants::B{},
        using_CRTP_and_variants::A{},
        using_CRTP_and_variants::B{},
        using_CRTP_and_variants::A{},
        using_CRTP_and_variants::B{},
        using_CRTP_and_variants::A{},
        using_CRTP_and_variants::B{}
    };

    for (auto _ : state) {
        for (auto & active_actor : actors)
        {	// broadcast messages ...
            std::visit([](auto & act)
            {
                act.receive_message(41);
                act.receive_message(42);
                act.receive_message(43);
            }, active_actor);
        }

        for (auto & active_actor : actors)
        {
            std::visit([](auto & act)
            {
                act.update();
                act.handle_all_messages();
            }, active_actor);
        }
        benchmark::DoNotOptimize(actors);
    }
}
// Register the function as a benchmark
BENCHMARK(test_CRTP_and_variants);

static void test_inheritance(benchmark::State& state) {

    using container_type = std::vector<std::unique_ptr<using_inheritance::actor>>;

    container_type actors;
    {
        actors.emplace_back(std::make_unique<using_inheritance::A>());
        actors.emplace_back(std::make_unique<using_inheritance::B>());
        actors.emplace_back(std::make_unique<using_inheritance::A>());
        actors.emplace_back(std::make_unique<using_inheritance::B>());
        actors.emplace_back(std::make_unique<using_inheritance::A>());
        actors.emplace_back(std::make_unique<using_inheritance::B>());
        actors.emplace_back(std::make_unique<using_inheritance::A>());
        actors.emplace_back(std::make_unique<using_inheritance::B>());
        actors.emplace_back(std::make_unique<using_inheritance::A>());
        actors.emplace_back(std::make_unique<using_inheritance::B>());
    }

    for (auto _ : state) {
        for (auto & active_actor : actors)
        {	// broadcast messages ...
            active_actor->receive_message(41);
            active_actor->receive_message(42);
            active_actor->receive_message(43);
        }

        for (auto & active_actor : actors)
        {
            active_actor->update();
            active_actor->handle_all_messages();
        }
    }
}
BENCHMARK(test_inheritance);

Pros & cons

Let's do a quick recap.
With this design, we can define an interface and create many polymorphic implementations with a minimal amount of readable code.
As a result, we end up with better performance than the old-fashioned way, "inheritance with vtable".

However, is it worth it ?

Pros :

  • no more vtable
  • std::visit flexibility
  • Better performances

Cons :

  • sizeof(std::variant<...>)

We can already see an issue : the size of std::variant<...>. Indeed, a std::variant is at least as large as its largest alternative, plus room to store which alternative it currently holds (and any padding). Every element of the container therefore takes that much space, even when it holds the smallest alternative.
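This overhead can be observed with a few compile-time checks. The sketch below uses hypothetical alternatives (a char and a 64-byte array); the exact value of sizeof depends on the standard library implementation, so only the inequalities are asserted.

```cpp
#include <array>
#include <cstddef>
#include <variant>

using small_type = char;
using big_type   = std::array<char, 64>; // hypothetical large alternative
using either     = std::variant<small_type, big_type>;

// The variant must store its largest alternative plus the index of the active one.
static_assert(sizeof(either) > sizeof(big_type),
              "a variant is larger than its largest alternative");

// Bytes left unused by every element that currently holds a `small_type`.
constexpr std::size_t unused_when_small = sizeof(either) - sizeof(small_type);
static_assert(unused_when_small >= sizeof(big_type),
              "small alternatives still pay for the largest one");
```

In a std::vector of such variants, this cost is paid per element, so alternatives with very different sizes deserve a second look.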

  • CRTP can hide/shadow functions

CRTP can hide/shadow member functions, and thus lead to unexpected behaviors.
In order to limit this, as seen previously, we used a private base constructor and friendship.

However, according to Herb Sutter's talk "Thoughts on a more powerful and simpler C++", we may find a way to solve this in the future using metaclasses. Maybe.

We can still do some static checks using SFINAE with std::void_t and std::experimental::is_detected.
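One possible shape for such a check, sketched with std::void_t only : the trait below (defines_update, a hypothetical name) is true when T itself declares a non-overloaded void update(), and false when the name resolves to the CRTP base, which is the case where the base would end up calling itself. The actor template here is a reduced stand-in for the one in the article.

```cpp
#include <type_traits>

template <typename T>
struct actor // reduced CRTP base, for illustration only
{
    void update() { static_cast<T&>(*this).update(); }
};

// Primary template : T has no usable `update` member at all.
template <typename T, typename = void>
struct defines_update : std::false_type {};

// Chosen when `&T::update` is well-formed ; true only if the member belongs to T itself.
template <typename T>
struct defines_update<T, std::void_t<decltype(&T::update)>>
    : std::is_same<decltype(&T::update), void (T::*)()> {};

struct good_actor      : actor<good_actor>      { void update() {} };
struct forgetful_actor : actor<forgetful_actor> {}; // forgot to implement update()

static_assert( defines_update<good_actor>::value,
              "good_actor implements update()");
static_assert(!defines_update<forgetful_actor>::value,
              "forgetful_actor only inherits actor<T>::update()");
```

Such assertions have to be placed where T is a complete type, for example after the derived class definition or inside a member function of the base.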

Thank you for reading

Written by Guillaume Dua.
Special thanks to O. Libre for reviewing this paper.

Want to read more ?
New articles incoming soon on gist.github.

@GuillaumeDua
Author

@Hochheilige

Reading code you produced, you also can use concepts for static polymorphism, if your compiler supports C++20.
https://en.cppreference.com/w/cpp/language/constraints

@Hochheilige

@GuillaumeDua Thank you for your explanation and for all the links. They are really useful, and I feel like I take a new look at C++ after each of your comments!

I'm not sure that I'm ready for concepts and C++20 because I don't even know most of the C++17 features, but I found your advice about concepts interesting and I'm going to try it.

Thank you again!

@GuillaumeDua
Author

@Hochheilige Concepts are way easier than the previous SFINAE + detection idiom boilerplate.
I'll write a paper about this very topic when I have enough time :)

Keep in mind that, generally speaking, C++ tends to become easier, so I'd advise you to use the latest standards as often as possible.
Also, I keep seeing students and professionals in my training sessions who are currently learning things that have already been deprecated - if not removed - in the newest standards ! Like std::auto_ptr for instance.

@Hochheilige

@GuillaumeDua Thanks again, really waiting for your next article :)

@X-Ryl669

@xNWDD : I'm not sure I agree with your vtable point. In a usual polymorphism scheme, you'll have this:

struct Base { virtual void foo() = 0; };

implemented, in memory, as: 
Base * pointer
 |
\/
[ VtablePtr, MembersOfBase ]
    ^
     \------------> [ RTTI, pointer2foo ]

So when you do (Child*)->foo(), the actual operations are: 1. Load Child pointer, 2. Load VtablePtr, 3. Load foo, 4. Jump foo

Let's say you have 4 different children here, you'll get 4 virtual table in code section (one for each Child), containing RTTI information + 4 foo functions pointer, plus 4 foo functions (in reality, you'll have 5 of them, since you'll also have the Base virtual table + pure function handler).

In a std::visit case (or any static virtual table) you'll have this memory pattern:

std::visit(vtable, variant):

vtable is:
[ pointer2foo, pointer2bar, ...]

So when you do std::visit(vtable, variant), considering that std::visit will be inlined by the compiler (and it usually should be) and the virtual table too (again, that's very common), the actual operations are: 1. Load variant pointer, 2. Load foo, 3. Jump foo

This is because the compiler implements the vtable directly in the caller code, so it's already loaded in memory when your code is inlined, so point 1 above is actually a jump table (that can also be inlined if the compiler can deduce the variant type at compile time, but that's not the point here).

This scheme also (theoretically) reduces the binary code size, since only a single vtable needs to be stored in the binary here (with only 4 actual function pointers in the 4 children example above, not 5, since it's not possible to call the base class pointer).

So you save the RTTI code space and one indirection.

There's also no need for the functions to have the same signature (which is another benefit of this scheme, IMHO), and it also allows static "virtual" methods, which the former doesn't support either.

The drawback, as you said, is that if your code is spread across many std::visit calls, each of them will implement a vtable, so any gains would be lost here.

With C++23, with deduced this, you can have the base declared as struct Base (with no template either), so it can be stored in a vector without the variant (but you'll need to save the underlying type somehow when it's time to call the "polymorphic" function).

@xNWDD

xNWDD commented Jul 18, 2025

@X-Ryl669 It took me a while to understand what you were disagreeing with (because It's been five years and you didn't explicitly mention it) and you're right: std::visit and virtual are not the same as std::visit is a lower level construct and can be inlined for small visitors which can save one indirection and theoretically could provide more performance. I agree in that this conceptual gain is bound to trigger way more common for std::visit, but this theoretical gain also exists for virtual when the polymorphic base has internal linkage.

Also, while these kinds of optimization (devirtualization and inlining) have not been triggering consistently in my experience (game development), they are theoretically sound and definitely work well in small/medium programs.

I want to mention too that, as far as I can see, when compiling the previous benchmark (crtp+virtual vs crtp+variant/visit) with more modern compilers, no toy benchmark gains show up (even in situations where everything fits into cache), which suggests that whatever gains std::visit/std::variant provide are not measurable within the overall cost of a dynamic dispatch. The difference in the x2.4 bench still seems to come from crtp (which gives more fine-grained control over where the dynamic dispatching occurs and provides the compiler with more information).

Regarding your other points:

  • "There's also no need for the functions to have the same signature" yes, It's a great tool, very useful and powerful when you want one visitor to operate over multiple different variants. However, I consider this use orthogonal to that of the article, there are very cool things you can do to it. Those would be possible with virtual (since the dynamic dispatch implicitly provides each override of the virtual function a different parameter which is a struct that contains members that would be the different args).

  • It is also true that for cold std::visit calls, the vtable is more likely than not within cache (unlike virtual) because of inlining the branching near the std::visit. But the implications of having these cold paths within a hot path making a significant, measurable impact implies "almost as many different virtual dispatches as there is code" (when you use polymorphism in a way in which dynamic dispatch could be a bottleneck you often have big polymorphic collections, and, at that point the vtables/dispatches are hot; your runtime is not often bottlenecked by "I have so many classes that all my vtables are constantly spilling out of cache").

  • About "Let's say you have 4 different children here, you'll get 4 virtual table in code section (one for each Child)": I understand what you meant, but I'm going to clarify for any future readers (since it can easily mislead less experienced programmers) that the amount of virtual tables in the code section is 100% unrelated to how many children you have. The amount of virtual tables scales with the amount of types inheriting or declaring virtual methods, with each virtual table being a fixed size (of size depending on compiler options such as rtti and cpu architecture). This doesn't take away the fact that a codebase containing a lot of virtual methods will use more binary size or that a codebase using few virtual methods but rtti will use more binary size.
    However, it boils down to the same thing as with performance: If you use many std::visits you might increase binary size way more than a few vtables, this is because std::visit scales with the amount of dynamic dispatches and virtual scales with the amount of classes/methods; so a dogmatic OOP programmer will waste much more than someone using std::visit without care, but someone using std::visit without care will probably waste more than a person using OOP when it's useful and not in a dogmatic way.

In general both are very useful tools, I can see how some messages could be seen as me hating one paradigm or the other but my main gripe has been at all times that the benchmark doesn't measure what the article claims to measure.
On a personal level I have always preferred options that are lower level and give more information to the compiler (like crtp or std::visit) because they bring in less baggage. But the reasons to pick one or the other should be based in their advantages and disadvantages.

Picking variant/visit (with crtp) over inheritance/virtual (without crtp) because an artificial benchmark online shows it to be x2.4 faster is just not a good idea, because the moment you make it variant/visit+crtp vs inheritance/virtual+crtp it becomes x1.
Instead you should know and understand both tools and the trade-offs and why one benchmark was faster or slower (in the example of this article CRTP is what makes the big difference). Then with that information you decide based on your use case: early-binding/sealed codebase vs late-binding/plugins, scaling (are dynamic dispatches batchable?), memory usage (variants can allow for tighter structures that are not 8-byte aligned or ordering the fields as to avoid the 56-bits of the vtable address), binary size (given your dispatch patterns and system constraints for embedded devices)...

@X-Ryl669

X-Ryl669 commented Jul 19, 2025

I agree with you, and the main point is: don't trust the benchmark when making an architectural decision. You were talking about the identical cost of inheritance vtable and static vtable but you didn't explain why, so I've tried to show (with my poor ascii art skills) what each implies.

In short, if your code is doing a lot of std::visit to implement "compile-time" dynamic dispatch, you'd better use the variant/std::visit scheme since it'll be inlined and it'll disappear in most cases. Compilers often miss such dispatching with inheritance (I don't know why, but I observed that compilers have difficulty tracking the most derived class as soon as a function is called with a reference to the base class, even if this function has internal linkage).

If you need "run-time" dynamic dispatch, then using inheritance vs a variant/std::visit choice should be made by:

  1. Do you run-time dispatch in multiple places in the code? => Use inheritance, it'll give smaller code size, more code locality, same performance
  2. Do you have only a few changes in your virtual method hierarchy (like Child only overrides one of the 8 virtual methods of the base class)? => Prefer using variant & visit, since in that case the overhead of the virtual table will be huge.
  3. Do you have multi level inheritance ? => While it's possible to deal with it with CRTP, it's a PITA (until C++23 and deducing this) to implement, keep inheritance
  4. Is your code readable and maintainable ? => Using inheritance is well known and understood. Using CRTP + variant is not obvious and implies knowing C++20 well. It's ok to use the latter in internal parts, but you can't expose this interface outside (and it's a pain to maintain if you need to add more children to your hierarchy, since you have to change both the variant declaration and all the visit code)

There are also mixed (or in-between) schemes, such as LLVM's dyn_cast, which takes advantage of compile-time information to perform a job that usually relies on RTTI. This allows disabling RTTI in the build, reducing the vtable impact too.
