Skip to content

Instantly share code, notes, and snippets.

@pervognsen
pervognsen / mu.cpp
Last active Aug 10, 2022
Mu as of the second stream
View mu.cpp
#include "mu.h"
#define _CRT_SECURE_NO_WARNINGS
#include <malloc.h>
#define _USE_MATH_DEFINES
#include <math.h>
#define _NO_CRT_STDIO_INLINE
#include <stdio.h>
#include <stdarg.h>
#define NO_STRICT
View rad.py
# Reverse-mode automatic differentiation
import math
# d(-x) = -dx
def func_neg(x):
return -x, [-1]
# d(x + y) = dx + dy
def func_add(x, y):
@pervognsen
pervognsen / shift_dfa.md
Last active Jul 29, 2022
Shift-based DFAs
View shift_dfa.md

A traditional table-based DFA implementation looks like this:

uint8_t table[NUM_STATES][256]

uint8_t run(const uint8_t *start, const uint8_t *end, uint8_t state) {
    for (const uint8_t *s = start; s != end; s++)
        state = table[state][*s];
    return state;
}
View rh_grow.c
// This can grow a Robin Hood linear probing hash table near word-at-a-time memcpy speeds. If you're confused why I use 'keys'
// to describe the hash values, it's because my favorite perspective on Robin Hood (which I learned from Paul Khuong)
// is that it's just a sorted gap array which is MSB bucketed and insertion sorted per chain:
// https://pvk.ca/Blog/2019/09/29/a-couple-of-probabilistic-worst-case-bounds-for-robin-hood-linear-probing/
// The more widely known "max displacement" picture of Robin Hood hashing also has strengths since the max displacement
// can be stored very compactly. You can see a micro-optimized example of that here for small tables where the max displacement
// can fit in 4 bits: Sub-nanosecond Searches Using Vector Instructions, https://www.youtube.com/watch?v=paxIkKBzqBU
void grow(Table *table) {
u64 exp = 64 - table->shift;
// We grow the table downward in place by a factor of 2 (not counting the overflow area at table->end).
View intern.c
typedef struct intern_t {
intern_t *next;
uint32_t length; // can be narrower if you want to limit internable string length, which is a good idea.
char str[1];
} intern_t;
hashtable_t string_table;
const char *string_intern(const char *str, uint32_t length) {
uint64_t key = string_hash(str, length);
View clifford.md

A quadratic space is a real vector space V with a quadratic form Q(x), e.g. V = R^n with Q as the squared length. The Clifford algebra Cl(V) of a quadratic space is the associative algebra that contains V and satisfies x^2 = Q(x) for all x in V. We're imposing by fiat that the square of a vector should be the quadratic form's value and seeing where it takes us. Treat x^2 = Q(x) as a symbolic rewriting rule that lets you replace x^2 or x x with Q(x) and vice versa whenever x is a vector. Beyond that Cl(V) satisfies the standard axioms of an algebra: it lets you multiply by scalars, it's associative and distributive, but not necessarily commutative.

Remarkably, this is all you need to derive everything about Clifford algebras.

Let me show you how easy it is to bootstrap the theory from nothing.

We know Cl(V) contains a copy of V. Since x^2 = Q(x) for all x, it must also contain a copy of some nonnegative reals.

View be_x64.cpp
#define _CRT_SECURE_NO_WARNINGS
#include <string.h>
#include <stdio.h>
#include <windows.h>
#pragma warning (disable: 4146)
#include <stdint.h>
#ifdef _DEBUG
#define Assert(x) \
if (!(x)) { MessageBoxA(0, #x, "Assertion Failure", MB_OK); __debugbreak(); }
View ast_tagged_index.c
// We have five kinds of nodes: literal, negate, not, add, xor.
// In this case we only need 3 bits for the tag but you can use as many as you need.
enum Tag {
LIT,
NEG,
NOT,
ADD,
XOR,
};
View segregated_tables.c
// Length-segregated string tables for length < 16. You use a separate overflow table for length >= 16.
// By segregating like this you can pack the string data in the table itself tightly without any padding. The datapath
// is uniform and efficient for all lengths < 16 by using unaligned 16-byte SIMD loads/compares and masking off the length prefix.
// One of the benefits of packing string data tightly for each length table is that you can afford to reduce the load factor
// on shorter length tables without hurting space utilization too much. This can push hole-in-one rates into the 95% range without
// too much of a negative impact on cache utilization.
// Since get() takes a vector register as an argument with the key, you want to shape the upstream code so the string to be queried
// is naturally in a vector. For example, in an optimized identifier lexer you should already have a SIMD fast path for length < 16
View multidimensional_array_views.md

Multi-dimensional array views for systems programmers

As C programmers, most of us think of pointer arithmetic for multi-dimensional arrays in a nested way:

The address for a 1-dimensional array is base + x. The address for a 2-dimensional array is base + x + y*x_size for row-major layout and base + y + x*y_size for column-major layout. The address for a 3-dimensional array is base + x + (y + z*y_size)*x_size for row-column-major layout. And so on.