@thecppzoo
thecppzoo / strlen_avx2.md
Last active February 22, 2024 20:28
Disassembly of __strlen_avx2 from GLIBC

Since the assembly sources of strlen for AVX2 in GLIBC are practically obfuscated by macros, here is the actual assembly code (comments are mine):

Dump of assembler code for function __strlen_avx2: (obtained with GDB's disassemble command)

   0x00007ffff7bf27e0 <+0>:	endbr64 
   0x00007ffff7bf27e4 <+4>:	mov    %edi,%eax
   0x00007ffff7bf27e6 <+6>:	mov    %rdi,%rdx
   0x00007ffff7bf27e9 <+9>:	vpxor  %xmm0,%xmm0,%xmm0
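The listing is cut off here. The core idea of an AVX2 strlen is to compare 32 bytes at a time against zero (vpcmpeqb against the register zeroed by vpxor above), compress the per-byte results into a bit mask (vpmovmskb), and locate the first set bit (tzcnt). The sketch below shows the same idea portably with SWAR on a single 64-bit word instead of AVX2 intrinsics. It assumes a little-endian target and the GCC/Clang builtin __builtin_ctzll; the names zeroBytes, firstZeroByte and load8 are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// SWAR analogue of "vpcmpeqb against zero" + "vpmovmskb": sets the high bit
// of each result byte whose corresponding byte in `word` is zero. The lowest
// flagged byte is always exact; bytes above a real zero may be false
// positives (caused by the borrow), which is harmless when only the first
// zero byte matters.
constexpr std::uint64_t zeroBytes(std::uint64_t word) {
    constexpr std::uint64_t lowBits = 0x0101010101010101ull;
    constexpr std::uint64_t highBits = 0x8080808080808080ull;
    return (word - lowBits) & ~word & highBits;
}

// Analogue of the final tzcnt: index (0..7) of the first zero byte, or 8 if
// there is none. Assumes little-endian order (lowest address = least
// significant byte).
constexpr int firstZeroByte(std::uint64_t word) {
    std::uint64_t mask = zeroBytes(word);
    return mask ? __builtin_ctzll(mask) / 8 : 8;
}

// Loads 8 bytes from p without alignment or aliasing problems.
inline std::uint64_t load8(const char *p) {
    std::uint64_t word;
    std::memcpy(&word, p, sizeof word);
    return word;
}
```

A complete strlen built on this would read whole words past the terminator, which is only safe when those reads cannot cross into an unmapped page; vectorized implementations handle that by aligning their loads or checking for page crossings.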
@thecppzoo
thecppzoo / cppcon2022.md
Created June 24, 2022 20:15
CppCon 2022 Outline
  • Hash Tables
  • Hash Table invariants
  • Robin Hood
  • SWAR
  • Benchmarks & Compiler Explorer disassemblies
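The outline only names the topics. As an illustration of Robin Hood hashing (not code from the talk): during insertion, an entry that is farther from its home slot takes the place of a resident that is closer to its own home, and the displaced resident continues probing. This keeps probe distances even and lets a lookup stop as soon as it meets an entry closer to its home than the current probe distance. RobinHoodSet is a hypothetical fixed-capacity set of non-negative ints with linear probing, key % Capacity as the hash, and no deletion:

```cpp
#include <array>
#include <cstddef>
#include <utility>

template<std::size_t Capacity>
class RobinHoodSet {
    struct Slot {
        int key = 0;
        int distance = -1; // probe distance from the home slot; -1 means empty
    };
    std::array<Slot, Capacity> slots_{};
    std::size_t size_ = 0;

    static std::size_t home(int key) {
        return static_cast<std::size_t>(key) % Capacity;
    }

public:
    bool contains(int key) const {
        std::size_t i = home(key);
        for (int distance = 0; distance < static_cast<int>(Capacity); ++distance) {
            const Slot &s = slots_[i];
            // Invariant-based early exit: an empty slot, or a resident closer
            // to its home than we are, means insertion would have put the key
            // here, so the key is absent.
            if (s.distance < distance) return false;
            if (s.key == key) return true;
            i = (i + 1) % Capacity;
        }
        return false;
    }

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        if (size_ == Capacity || contains(key)) return false;
        Slot incoming{key, 0};
        std::size_t i = home(key);
        // Terminates: the table is not full, so an empty slot exists.
        for (;;) {
            Slot &s = slots_[i];
            if (s.distance < 0) {
                s = incoming;
                ++size_;
                return true;
            }
            // "Take from the rich": the entry closer to its home yields the
            // slot and keeps probing in place of the incoming one.
            if (s.distance < incoming.distance) std::swap(s, incoming);
            ++incoming.distance;
            i = (i + 1) % Capacity;
        }
    }
};
```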
@thecppzoo
thecppzoo / dataOrientationMacros.cpp
Created October 29, 2021 18:45
Data Orientation Macros
// This example requires the Boost.Preprocessor library headers
#include <boost/preprocessor/seq/for_each_i.hpp>
#include <boost/preprocessor/seq/for_each.hpp>
#include <boost/preprocessor/punctuation/comma_if.hpp>
#include <boost/preprocessor/cat.hpp>
#include <boost/preprocessor/seq/transform.hpp>
#include <boost/preprocessor/seq/enum.hpp>
// Macro naming convention:

Goals

The main aim of AnyContainer is to provide subtyping without the problems of subclassing, i.e., without forcing user types to inherit from base classes, while allowing value semantics and keeping the option of the performance of normal C++ code, rather than being forced into inefficient dynamic dispatch through a virtual table.

We think the goals have been achieved: the performance and object code size are optimal as far as we know, and it is possible to have general solutions for runtime polymorphism, including infinite refinement.
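For comparison, the classic way to get subtyping without subclassing is type erasure: a wrapper accepts any type that has the required members and exposes them through a uniform interface, while user types inherit from nothing and are copied by value. The sketch below (AnyShape, and the Square/Rect used with it, are made-up names) still uses a private virtual interface internally, which is exactly the cost AnyContainer aims to make optional, so it only illustrates the "no base class, value semantics" half of the goals, not AnyContainer's design.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical value-semantic wrapper for any type with "double area() const".
class AnyShape {
    // Internal interface; user types never see or inherit from it.
    struct Concept {
        virtual ~Concept() = default;
        virtual double area() const = 0;
        virtual std::unique_ptr<Concept> clone() const = 0;
    };

    // Adapts an arbitrary T to the internal interface.
    template<typename T>
    struct Model final: Concept {
        T value;
        explicit Model(T v): value(std::move(v)) {}
        double area() const override { return value.area(); }
        std::unique_ptr<Concept> clone() const override {
            return std::make_unique<Model<T>>(value);
        }
    };

    std::unique_ptr<Concept> self_;

public:
    // Accepts any type other than AnyShape itself; the object is copied in.
    template<typename T, typename = std::enable_if_t<!std::is_same_v<T, AnyShape>>>
    AnyShape(T value): self_(std::make_unique<Model<T>>(std::move(value))) {}

    // Copying deep-copies the stored object: value semantics.
    AnyShape(const AnyShape &other): self_(other.self_->clone()) {}
    AnyShape(AnyShape &&) noexcept = default; // leaves the source empty
    AnyShape &operator=(AnyShape other) noexcept {
        self_ = std::move(other.self_);
        return *this;
    }

    // Must not be called on a moved-from AnyShape.
    double area() const { return self_->area(); }
};
```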

Comparison to folly::Poly

First let us see a comparison to another type-erasure framework.

@thecppzoo
thecppzoo / variant_description.md
Last active May 11, 2019 03:16
Description of variant

This is a description of a recent small project:

I've been researching how best to implement event handlers. Frequently the events are received by a dispatcher component that classifies each event and invokes the correct processing function. For example, if the event is a market data message, which could be a trade, a bid or an ask:

void dispatch(int eventType, void *data) {
    switch(eventType) {
        case 0: processTrade(data); break;
        case 1: processBid(data); break;
        case 2: processAsk(data); break;
    }
}
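With a variant, the kind of event and its payload travel together, so the integer switch and the void* cast disappear. A sketch using the standard std::variant and std::visit (C++17), not the author's own variant; the message structs and the Processor visitor are hypothetical stand-ins, and returning a name stands in for real processing:

```cpp
#include <string>
#include <variant>

// Hypothetical market data messages.
struct Trade { double price; int quantity; };
struct Bid { double price; int quantity; };
struct Ask { double price; int quantity; };

// The variant stores which alternative is active together with its value.
using MarketEvent = std::variant<Trade, Bid, Ask>;

// One overload per alternative; std::visit selects it from the stored index.
struct Processor {
    std::string operator()(const Trade &) const { return "trade"; }
    std::string operator()(const Bid &) const { return "bid"; }
    std::string operator()(const Ask &) const { return "ask"; }
};

std::string dispatch(const MarketEvent &event) {
    return std::visit(Processor{}, event);
}
```

Adding a new message type then means adding an alternative and an overload; a visitor that lacks an overload for some alternative fails to compile instead of silently falling through a switch.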
@thecppzoo
thecppzoo / variant_draft.md
Last active May 6, 2019 09:55
Draft of implementation of variant

To do:

  1. Correct the noexcept specification of swap
  2. Confirm the move operations preserve the value category
  3. Deduce the return type of visit
  4. Deduce the noexceptness of visit
  5. Complete the set of constructors (not just by index, but also by type)
  6. Refactor components into the rest of the library -- Maxer, Largest, ...
  7. Rename BadVariant
  8. Refactor implementations into their own namespace
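Items 3 and 4 could be approached with standard type traits. This is a sketch under the assumption that the draft's visit calls the visitor with an lvalue reference to the active alternative; VisitResult and VisitCallsNoexcept are hypothetical names. Note that std::visit itself requires every call to return exactly the same type, so taking the common type is a design choice, and every per-alternative call being noexcept is only a necessary condition (a visit that can throw on a valueless variant, as std::visit does with std::bad_variant_access, needs more).

```cpp
#include <type_traits>

// Return type: the common type of the visitor's result for every alternative.
template<typename Visitor, typename... Alternatives>
using VisitResult =
    std::common_type_t<std::invoke_result_t<Visitor, Alternatives &>...>;

// True only if calling the visitor on every alternative is noexcept.
template<typename Visitor, typename... Alternatives>
constexpr bool VisitCallsNoexcept =
    (std::is_nothrow_invocable_v<Visitor, Alternatives &> && ...);
```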
@thecppzoo
thecppzoo / EgytianAlgorithmIdeas.md
Last active May 1, 2019 22:35
Egyptian algorithm ideas

Imagine there is a template that, given an operation type, provides a static member function "operate" and a type called "Inverse" which gives you the inverse operation:

template<typename Operation>
struct OperatorTraits {
    template<typename Argument>
    static Argument operate(Argument x, Argument y);

    using Inverse = // ...
};
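As an illustration of how such traits could be used: the classic Egyptian (Russian peasant) method combines n copies of x with an associative operation using O(log n) applications, by repeated doubling. The sketch below specializes OperatorTraits for hypothetical operation tags Plus and Times; with Plus it multiplies, with Times it raises to a power. The Inverse members are placeholders that this sketch does not use, and the function name egyptian is made up.

```cpp
// Hypothetical operation tags.
struct Plus {};
struct Minus {};
struct Times {};
struct Divides {};

template<typename Operation>
struct OperatorTraits;

template<>
struct OperatorTraits<Plus> {
    template<typename Argument>
    static constexpr Argument operate(Argument x, Argument y) { return x + y; }
    using Inverse = Minus;
};

template<>
struct OperatorTraits<Times> {
    template<typename Argument>
    static constexpr Argument operate(Argument x, Argument y) { return x * y; }
    using Inverse = Divides;
};

// Combines n >= 1 copies of x: x is repeatedly doubled (x op x) and folded
// into the result for every set bit of n.
template<typename Operation, typename Argument>
constexpr Argument egyptian(Argument x, unsigned n) {
    using Traits = OperatorTraits<Operation>;
    Argument result = x; // placeholder until the first set bit is seen
    bool started = false;
    while (n != 0) {
        if (n & 1u) {
            result = started ? Traits::operate(result, x) : x;
            started = true;
        }
        n >>= 1;
        if (n != 0) x = Traits::operate(x, x);
    }
    return result;
}
```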
@thecppzoo
thecppzoo / no_restrict.md
Created October 12, 2018 05:42
An idiom (not practical) to emulate __restrict

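The code below emulates "restrict", the pointer qualifier (standard in C99, available in C++ only as the __restrict extension), typically applied to function parameters, that indicates the memory ranges accessed through the pointers do not overlap. It explains its limitations: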

struct Aligned {
    alignas(1024) double d;
};

using u = unsigned;
constexpr u count = 1 << 10;
using ul = unsigned long;
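For contrast, this is the qualifier the idiom emulates, in the __restrict form that GCC, Clang and MSVC accept in C++. The function addInto is a made-up example: the qualifiers promise that dst and src do not overlap, so the compiler may vectorize the loop without a runtime overlap check; calling it with overlapping ranges is undefined behavior.

```cpp
#include <cstddef>

// Adds src[i] into dst[i]. __restrict promises that, within this function,
// the elements reached through dst are not accessed through src.
void addInto(double *__restrict dst, const double *__restrict src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}
```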
@thecppzoo
thecppzoo / Multiple-precision-integer-operations-in-Clang-and-GCC.md
Last active September 19, 2017 20:28
Multiple precision operations in Clang and GCC

I have been working on a simple precision-doubling template.

The largest integer types supported by GCC and Clang are __int128 and unsigned __int128.

Addition

In standard C and C++ there is no way to access the carry flag needed to implement multiple-precision addition. Fortunately, Clang detects what one is doing and generates good assembly, at least on AMD64 (x86-64):

#include <stdint.h>
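The example above is cut off after the include. A minimal sketch of the usual portable carry detection: unsigned addition wraps modulo 2^64, so the low halves overflowed exactly when their sum is smaller than either operand. U128 and add are hypothetical names, and whether a particular compiler version turns this into an add/adc pair is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical double-width unsigned integer made of two 64-bit halves.
struct U128 {
    std::uint64_t low;
    std::uint64_t high;
};

constexpr U128 add(U128 a, U128 b) {
    U128 result{a.low + b.low, a.high + b.high};
    // The low sum wrapped around (a carry out occurred) exactly when it is
    // smaller than an operand; propagate that carry into the high half.
    result.high += result.low < a.low;
    return result;
}
```

GCC and Clang also provide __builtin_add_overflow, which reports the overflow (the carry, for unsigned operands) directly.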
@thecppzoo
thecppzoo / recursive-factorial.md
Last active October 22, 2018 20:03
Recursive factorial in C++ and Scala

It turns out that both GCC and Clang compile recursive code for factorial as a simple loop:

template<typename T>
constexpr
T factorial_impl(T value) {
    return value <= 1 ? 1 : value * factorial_impl(value - 1);
}

long factorial(long arg) { return factorial_impl(arg); }
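The multiplication happens after the recursive call returns, so the call is not in tail position; the optimizers still produce a loop by keeping the pending product in an accumulator, a transformation that works because multiplication is associative. Written by hand, the result corresponds roughly to this sketch (factorial_loop is a made-up name, not compiler output):

```cpp
// Accumulator loop equivalent to the recursive factorial_impl: the pending
// multiplications are folded into `accumulator` as the argument counts down.
constexpr long factorial_loop(long value) {
    long accumulator = 1;
    while (value > 1) {
        accumulator *= value;
        --value;
    }
    return accumulator;
}
```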