Carlo Alberto Ferraris CAFxX

CAFxX / golang_minimize_allocations.md
Last active May 6, 2025 18:18
Minimize allocations in Go

📂 Minimize allocations in Go

A collection of tips for when you need to minimize the number of allocations in your Go programs.

Use the Go profiler (pprof) to find which parts of your program are responsible for the most allocations.

⚠️ Never apply these tricks blindly (i.e. without measuring the actual performance benefit/impact). ⚠️

Most of these tricks trade fewer allocations for costs elsewhere (e.g. higher peak memory usage, higher CPU usage, lower maintainability, or a higher risk of introducing subtle bugs). Only apply a trick if, in each specific case, the tradeoff is positive overall.
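The sketch below is a minimal illustration of measuring before and after, and is not one of the tips collected in this gist. It uses `testing.AllocsPerRun` to compare two ways of building a slice. One lets `append` grow the slice. The other preallocates its capacity with `make`. The function names and the `sink` variable are hypothetical. In a real program, `go test -bench . -benchmem` reports allocs/op for each benchmark. A profile written with `-memprofile mem.out` can then be inspected with `go tool pprof -sample_index=alloc_objects mem.out`.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the compiler cannot optimize the
// allocations away (hypothetical helper for this sketch).
var sink []int

// buildNaive appends n values to a nil slice, letting append reallocate
// the backing array every time it runs out of capacity.
func buildNaive(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// buildPrealloc reserves capacity for all n values up front, so the loop
// never needs to grow the backing array.
func buildPrealloc(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func(int) []int, n int) float64 {
	return testing.AllocsPerRun(100, func() { sink = f(n) })
}

func main() {
	const n = 1000
	fmt.Printf("naive:    %.0f allocs/op\n", allocsPerCall(buildNaive, n))
	fmt.Printf("prealloc: %.0f allocs/op\n", allocsPerCall(buildPrealloc, n))
}
```

Preallocation is itself subject to the tradeoffs above. If `n` overestimates the final length, peak memory usage goes up for no benefit.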

CAFxX / pwgen.go
Last active May 1, 2025 12:58
pwgen
```go
// https://go.dev/play/p/374e3zUuZmE
package main

import (
	"crypto/rand"
	"flag"
	"fmt"
	"maps"
	"math/big"
```
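The preview shows only the imports. As a minimal sketch of the core idea that `crypto/rand` and `math/big` suggest, each character can be drawn uniformly from an alphabet with `rand.Int`. The `alphabet` constant and the `generate` function are hypothetical and are not taken from the gist.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// alphabet is an assumed character set for this sketch (62 characters).
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// generate returns a password of length n whose characters are drawn
// uniformly and independently from alphabet. rand.Int returns a uniform
// value in [0, limit), so there is no modulo bias.
func generate(n int) (string, error) {
	limit := big.NewInt(int64(len(alphabet)))
	out := make([]byte, n)
	for i := range out {
		k, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = alphabet[k.Int64()]
	}
	return string(out), nil
}

func main() {
	pw, err := generate(16)
	if err != nil {
		panic(err)
	}
	fmt.Println(pw)
}
```

With the 62-character alphabet assumed here, each character carries log2(62) ≈ 5.95 bits of entropy, so 16 characters give about 95 bits.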
CAFxX / 00_cpu_wishlist.md
Last active April 25, 2025 06:20
CPU ISA and implementation wishlists
@CAFxX
CAFxX / prompt.txt
Last active April 19, 2025 00:22
Possible partial Gemini 2.5 system prompt
As part of a reasoning step, Gemini 2.5 Pro Preview 03-25 randomly blurted out:
---
If multiple possible answers are available in the sources, present all possible answers.
If the question has multiple parts or covers various aspects, ensure that you answer them all to the best of your ability.
When answering questions, aim to give a thorough and informative answer, even if doing so requires expanding beyond the specific inquiry from the user.
If the question is time dependent, use the current date to provide most up to date information.
If you are asked a question in a language other than English, try to answer the question in that language.
Rephrase the information instead of just directly copying the information from the sources.
CAFxX / sorter.md
Last active March 6, 2025 09:02
Minimal latency N-way sorter

Minimal Latency N-way Sorter (MLNS)

An MLNS is an $N$-sorter: a circuit that takes $N$ input values and returns them sorted. It is functionally equivalent to a [sorting network][1] with $N$ inputs.

For small values of $N$ (3, 4, and possibly 5), it can make sense to minimize latency, at the cost of slightly larger die area and gate count, by not using a traditional [sorting network][1]. Instead, all $N(N-1)/2$ comparisons are performed in parallel in a single stage, and the correct ordering is then selected combinatorially from their outputs.

An MLNS can also be used as the basic building block of an $M$-input sorting network (with $M>N$). This reduces the number of stages, and therefore the latency, of the network.
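The following is a minimal software model in Go of a 4-input MLNS, meant to make the single-stage idea concrete. It assumes that the combinational selection is done by ranking. Each input's output position is the number of other inputs it must be placed after, with ties broken by input index. This is one possible selection scheme, not necessarily the one the author has in mind, and `mlnsSort4` is a hypothetical name.

```go
package main

import "fmt"

// mlnsSort4 is a software model of a 4-input MLNS. Loops are used for
// brevity; in hardware each comparison would be its own comparator, all
// evaluated in parallel, followed by combinational selection logic.
func mlnsSort4(a [4]int) [4]int {
	// Stage 1: all pairwise comparisons. after[i][j] reports whether a[i]
	// must be placed after a[j]. Ties are broken by input index so the order
	// is strict. after[j][i] is always !after[i][j], so only the
	// 4*3/2 = 6 comparisons with i > j are independent.
	var after [4][4]bool
	for i := 0; i < 4; i++ {
		for j := 0; j < 4; j++ {
			if i != j {
				after[i][j] = a[i] > a[j] || (a[i] == a[j] && i > j)
			}
		}
	}
	// Stage 2: the output position of a[i] is the number of inputs it must
	// follow. The ranks form a permutation of 0..3, so each output slot is
	// written exactly once.
	var out [4]int
	for i := 0; i < 4; i++ {
		rank := 0
		for j := 0; j < 4; j++ {
			if after[i][j] {
				rank++
			}
		}
		out[rank] = a[i]
	}
	return out
}

func main() {
	fmt.Println(mlnsSort4([4]int{3, 1, 2, 1})) // [1 1 2 3]
}
```

An optimal 4-input sorting network needs 5 comparators arranged in 3 layers. This model uses 6 comparators in a single layer, and moves the remaining work into the selection logic.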

CAFxX / sort.cpp
Last active March 6, 2025 08:58
Sorting network for short arrays, branchless (CMOVxx, VMINSS/VMAXSS) - https://godbolt.org/z/v4G4xPofc
```cpp
template <typename T>
static void sort(T* a, int l) {
  #define S(i, j) { \
    T t1 = a[i], t2 = a[j]; \
    if (t1 > t2) { T t = t1; t1 = t2; t2 = t; } \
    a[i] = t1, a[j] = t2; \
  }
  // Sorting networks from https://bertdobbelaere.github.io/sorting_networks.html
  // Using the ones with lower CEs because every S(...) requires two CMOVxx or a
  // VMINSS+VMAXSS pair, and it seems that's the limit per cycle on current
```
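As a rough Go counterpart (not a port of the gist), the sketch below applies the same compare-exchange idea to four elements. It uses a standard 5-comparator, depth-3 network, and `ce` and `sort4` are hypothetical names. It relies on the `min`/`max` builtins from Go 1.21. Whether the compiler lowers them to CMOV-style or MINSS/MAXSS-style instructions depends on the Go version, target, and element type, so check the generated assembly rather than assume it.

```go
package main

import "fmt"

// ce is a compare-exchange: afterwards a[i] <= a[j]. It uses the min and
// max builtins (Go 1.21+) instead of an explicit branch in the source.
func ce(a *[4]int, i, j int) {
	x, y := a[i], a[j]
	a[i], a[j] = min(x, y), max(x, y)
}

// sort4 sorts four values with a 5-comparator, 3-layer sorting network.
// Comparators in the same layer touch disjoint elements, so they are
// independent of each other.
func sort4(a *[4]int) {
	// layer 1: sort each pair
	ce(a, 0, 1)
	ce(a, 2, 3)
	// layer 2: a[0] becomes the minimum and a[3] the maximum
	ce(a, 0, 2)
	ce(a, 1, 3)
	// layer 3: order the two middle elements
	ce(a, 1, 2)
}

func main() {
	a := [4]int{4, 2, 3, 1}
	sort4(&a)
	fmt.Println(a) // [1 2 3 4]
}
```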
CAFxX / aligned.go
Created January 23, 2020 00:52
aligned
```go
package aligned

import (
	"fmt"
	"unsafe"

	"golang.org/x/sys/cpu"
)

const (
	cacheLineSize = unsafe.Sizeof(cpu.CacheLinePad{})
```
CAFxX / count_digits.c
Last active January 10, 2025 07:26
Fast count decimal digits (branchless)
```c
/*
Fast, branchless count of decimal digits in a uint64
(C) 2025 Carlo Alberto Ferraris (CAFxX)
This compiles down on x86-64 to something like
    lzcnt rcx, rdi
    lea rax, [rip + countDigits.lut1]
    movzx eax, byte ptr [rcx + rax]
    lea rdx, [rip + countDigits.lut2]
```
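The preview cuts off before the implementation, which appears to use two lookup tables indexed via `lzcnt`. As a hedged alternative, the Go sketch below implements the well-known variant that multiplies the bit length by 1233/4096 ≈ log10(2) and then corrects the estimate with a single power-of-ten table. `countDigits` and `pow10` are illustrative names, not the gist's code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// pow10[k] = 10^k for k = 0..19; 10^19 is the largest power of ten that
// fits in a uint64.
var pow10 = func() (p [20]uint64) {
	p[0] = 1
	for k := 1; k < len(p); k++ {
		p[k] = p[k-1] * 10
	}
	return p
}()

// countDigits returns the number of decimal digits of x (1 for x == 0).
func countDigits(x uint64) int {
	// Setting the lowest bit maps 0 to 1 and never changes the digit count
	// of any other value: it only alters even numbers, and 10^k - 1 is odd.
	x |= 1
	// t = floor(bitlen * log10(2)) using 1233/4096 ≈ log10(2); the digit
	// count is then either t or t+1.
	t := (bits.Len64(x) * 1233) >> 12
	// borrow is 1 exactly when x < 10^t. bits.Sub64 is a compiler intrinsic
	// on common targets, so the comparison needs no data-dependent branch
	// (the compiler still keeps a bounds check on pow10[t]).
	_, borrow := bits.Sub64(x, pow10[t], 0)
	return t + 1 - int(borrow)
}

func main() {
	fmt.Println(countDigits(0), countDigits(9), countDigits(10), countDigits(^uint64(0))) // 1 1 2 20
}
```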
CAFxX / gomaxprocs.go
Last active October 29, 2024 03:15
Lock-free, fast GOMAXPROCS(0)
```go
package xruntime

import (
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

var gmp atomic.Int32
```
CAFxX / memchrs.c
Last active October 24, 2024 08:25
memchrs
```c
// https://godbolt.org/z/63Ebd37vz
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

void* memchrs(const void* haystack, int len, const char* needles, int n) {
  if (len <= 0 || n <= 0) {
    return NULL;
  }
```
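The preview shows only the signature and the empty-input check. Judging from the name and signature (an assumption), `memchrs` returns a pointer to the first byte of `haystack` that matches any of the `n` bytes in `needles`, or `NULL` if none matches. A scalar Go reference model of that behavior can use a 256-entry membership table, which is handy for cross-checking a SIMD version on random inputs. `indexAnyByte` is a hypothetical name, and it returns an index rather than a pointer. The standard library's `bytes.IndexAny` is not a drop-in substitute, because it interprets its arguments as UTF-8.

```go
package main

import "fmt"

// indexAnyByte returns the index of the first byte of haystack that equals
// any byte in needles, or -1 if there is none (including when either slice
// is empty).
func indexAnyByte(haystack, needles []byte) int {
	// set[c] is true when byte value c is one of the needles.
	var set [256]bool
	for _, c := range needles {
		set[c] = true
	}
	for i, c := range haystack {
		if set[c] {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(indexAnyByte([]byte("hello, world"), []byte(",w"))) // 5
}
```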