Skip to content

Instantly share code, notes, and snippets.

@pervognsen
pervognsen / shift_dfa.md
Last active Sep 14, 2021
Shift-based DFAs
View shift_dfa.md

A traditional table-based DFA implementation looks like this:

uint8_t table[NUM_STATES][256]

uint8_t run(const uint8_t *start, const uint8_t *end, uint8_t state) {
    for (const uint8_t *s = start; s != end; s++)
        state = table[state][*s];
    return state;
}
View imlayout.cpp
#pragma comment(lib, "SDL2")
#pragma comment(lib, "SDL2main")
#include "SDL.h"
#include <vector>
#include <unordered_map>
#include <assert.h>
#include <algorithm>
View asan_clang_cl.md

I was told by @mmozeiko that Address Sanitizer (ASAN) works on Windows now. I'd tried it a few years ago with no luck, so this was exciting news to hear.

It was a pretty smooth experience, but with a few gotchas I wanted to document.

First, download and run the LLVM installer for Windows: https://llvm.org/builds/

Then download and install the VS extension if you're a Visual Studio 2017 user like I am.

It's now very easy to use Clang to build your existing MSVC projects since there's a cl compatible frontend:

View rh_grow.c
// This can grow a Robin Hood linear probing hash table near word-at-a-time memcpy speeds. If you're confused why I use 'keys'
// to describe the hash values, it's because my favorite perspective on Robin Hood (which I learned from Paul Khuong)
// is that it's just a sorted gap array which is MSB bucketed and insertion sorted per chain:
// https://pvk.ca/Blog/2019/09/29/a-couple-of-probabilistic-worst-case-bounds-for-robin-hood-linear-probing/
// The more widely known "max displacement" picture of Robin Hood hashing also has strengths since the max displacement
// can be stored very compactly. You can see a micro-optimized example of that here for small tables where the max displacement
// can fit in 4 bits: Sub-nanosecond Searches Using Vector Instructions, https://www.youtube.com/watch?v=paxIkKBzqBU
void grow(Table *table) {
u64 exp = 64 - table->shift;
// We grow the table downward in place by a factor of 2 (not counting the overflow area at table->end).
View value_speculation.c
// Estimating CPU frequency...
// CPU frequency: 4.52 GHz
// sum1: value = 15182118497126522709, 0.31 secs, 5.14 cycles/elem
// sum2: value = 15182118497126522709, 0.17 secs, 2.93 cycles/elem
#define RW(x) asm("" : "+r"(x))
typedef struct Node {
u64 value;
struct Node *next;
View puzzle.c
// For N = 11, a call to perms(p, N, 0) only takes 0.36 seconds.
// Why would a debugger seemingly hang forever when you try to
// step over the outermost recursive call to perms in the loop?
// Hint: How is "step over" ("next" in gdb/lldb) implemented?
u64 perms(u64 p[N], int n, u64 s) {
if (n == 1) {
s += signature(p);
} else {
for (int i = 0; i < n; i++) {
View multistep.asm
multistep: # @multistep
push r15
push r14
push r13
push r12
push rbx
mov rcx, qword ptr [rsi]
mov r10, qword ptr [rsi + 8]
mov rdx, qword ptr [rsi + 24]
mov r11, qword ptr [rsi + 32]
View fp_aliasing_semantics.md

x64 addss:

VADDSS (EVEX encoded versions)
    IF (EVEX.b = 1) AND SRC2 *is a register*
        THEN
            SET_RM(EVEX.RC);
        ELSE
            SET_RM(MXCSR.RM);
    FI;
    IF k1[0] or *no writemask*
View algebraic_data_type_numeral_encoding.md

(Originally written as a reply to an HN submission of this article: https://www.cs.virginia.edu/~lat7h/blog/posts/434.html)

There's a simple recipe for arithmetically encoding recursive algebraic data types (in the functional programming sense) which is related to this.

What you might have seen is Goedel numbering where a finite sequence of natural numbers a_0, a_1, ..., a_n (where n isn't fixed but can vary per sequence) is mapped bijectively onto p_0^a_0 a_1^p_1 ... a_n^p_n where p_0, p_1, ... is an enumeration of the primes.

However, if you want to represent trees instead of sequences, you have a better, simpler option. The key is the existence of a bijective pairing function between N^2 and N, which you can write as <m, n> for m, n in N.

You have a lot of choices for how to construct the pairing function. But a curious fact is that there is essentially one polynomial pairing function and it's the one you saw in class when you learned that the rationals are countable: https://en.wikipedia.org/wiki/Fuet

View fft.txt
Suppose we want to compute the frequency spectrum of an n-point sampled signal. That is,
we want to compute the signal's discrete Fourier transform. Taking a cue from multi-rate
signal processing, let's try a divide-and-conquer approach where we downsample the signal
by 2:1 and recursively compute the spectrum of that. There are two possible downsamplings,
corresponding to the even and odd phases.
By the Nyquist sampling theorem, assuming the signal has no upper half-band frequencies,
i.e. its top n/2 frequency bins are zero, the spectrum can be perfectly reconstructed
from the spectrum of _either_ the even subsignal or the odd subsignal, without any aliasing.