Skip to content

Instantly share code, notes, and snippets.

View puzzle.c
// For N = 11, a call to perms(p, N, 0) only takes 0.36 seconds.
// Why would a debugger seemingly hang forever when you try to
// step over the outermost recursive call to perms in the loop?
// Hint: How is "step over" ("next" in gdb/lldb) implemented?
u64 perms(u64 p[N], int n, u64 s) {
if (n == 1) {
s += signature(p);
} else {
for (int i = 0; i < n; i++) {
View tagged_prefetched_eval.c
INLINE
u32 index(Ref r) {
return r >> 4;
}
INLINE
u32 tag(Ref r) {
return r & 15;
}
View value_speculation.c
// Estimating CPU frequency...
// CPU frequency: 4.52 GHz
// sum1: value = 15182118497126522709, 0.31 secs, 5.14 cycles/elem
// sum2: value = 15182118497126522709, 0.17 secs, 2.93 cycles/elem
#define RW(x) asm("" : "+r"(x))
typedef struct Node {
u64 value;
struct Node *next;
View mispredict_penalty.c
#define NOP ({ int x; asm("nop" : "=r"(x)); asm("" : : "r"(x)); })
NOINLINE
double calc_mispredict_penalty(void) {
i64 n = 1ull << 28;
u64 rnglen = (n + 63) / 64;
u64 *rng = malloc(rnglen * sizeof(u64));
for (u64 i = 0; i < rnglen; i++)
rng[i] = random_u64();
View vm_bench.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#define BREAK asm("int3")
#else
#error Implement macros for your CPU.
#endif
View prefetch_and_hoist.c
// Example: Opcode dispatch in a bytecode VM. Assume the opcode case dispatching is mispredict heavy,
// and that pc, ins, next_ins, next_opcase are always in registers.
#define a ((ins >> 8) & 0xFF)
#define b ((ins >> 16) & 0xFF)
#define c ((ins >> 24) & 0xFF)
// Version 1: Synchronous instruction fetch and opcode dispatch. The big bottleneck is that given how light
// the essential work is for each opcode case (e.g. something like ADD is typical), you're dominated
// by the cost of the opcode dispatch branch mispredicts. When there's a mispredict, the pipeline restarts
@pervognsen
pervognsen / shift_dfa.md
Last active Jul 23, 2021
Shift-based DFAs
View shift_dfa.md

A traditional table-based DFA implementation looks like this:

uint8_t table[NUM_STATES][256]

uint8_t run(const uint8_t *start, const uint8_t *end, uint8_t state) {
    for (const uint8_t *s = start; s != end; s++)
        state = table[state][*s];
    return state;
}
View pratt_cps.c
// Direct style
int paren_prefix(Ctx *c) {
next(c); // LPAREN
int x = binary(c, -1);
expect(c, RPAREN);
return x;
}
int binary(Ctx *c, int min_prec) {
View remotery_code.c
#pragma section(".remotery_code", read, execute)
__declspec(allocate(".remotery_code"))
static const uint8_t code[] =
{
0x50, // push rax
0x9C, // pushfq
0x53, // push rbx
0x51, // push rcx
0x52, // push rdx
View algebraic_data_type_numeral_encoding.md

(Originally written as a reply to an HN submission of this article: https://www.cs.virginia.edu/~lat7h/blog/posts/434.html)

There's a simple recipe for arithmetically encoding recursive algebraic data types (in the functional programming sense) which is related to this.

What you might have seen is Goedel numbering where a finite sequence of natural numbers a_0, a_1, ..., a_n (where n isn't fixed but can vary per sequence) is mapped bijectively onto p_0^a_0 a_1^p_1 ... a_n^p_n where p_0, p_1, ... is an enumeration of the primes.

However, if you want to represent trees instead of sequences, you have a better, simpler option. The key is the existence of a bijective pairing function between N^2 and N, which you can write as <m, n> for m, n in N.

You have a lot of choices for how to construct the pairing function. But a curious fact is that there is essentially one polynomial pairing function and it's the one you saw in class when you learned that the rationals are countable: https://en.wikipedia.org/wiki/Fuet