Skip to content

Instantly share code, notes, and snippets.

; https://web.archive.org/web/20190701203222/https://www.pcengines.ch/tp3.htm
; The disassembler was applied to a copy of TP 3.01A downloaded from WinWorld.
; I postprocessed the disassembly with a script to clean up spacing and column alignment.
; *** TURBO PASCAL version 3.01 A source code
; ***
; *** commented by Pascal Dornier
; *** all rights reserved
; "***
cseg $100 ; "COM file...
@pervognsen
pervognsen / vec.c
Last active October 18, 2022 09:11
// A sketch of partially persistent (linear, non-branching history) vectors with the in-place v2 trick.
typedef struct Leaf Leaf;
typedef struct Node Node;
typedef struct Vec Vec;
typedef uint32_t Ver;
typedef uint64_t Val;
typedef int32_t RelPtr;

Let's say you have to perform a 4-way case analysis and are given a choice between cascaded 2-way branches or a single 4-way branch.

What's the difference in branch misprediction penalties, assuming the 4 cases are random?

Denote the branch penalty by x (which is 15-20 cycles on Sky Lake according to Agner Fog).

With the 4-way branch, the worst-case branch penalty is x. The expected branch penalty is 3/4 x: there's a 3/4 chance of x penalty.

// Lexer
#define TOK_CHAR(ch1, tok1) \
case ch1: \
next_ch(); \
tok = TOK_##tok1; \
tok_prec = 0; \
break;
#define TOK_EXPR(ch1, tok1, prec1, lexpr1, rexpr1) \
void gen_mul(Value *dest, Value *src) {
if (isconst(dest)) {
gen_swap(dest, src); // this doesn't generate instructions and just swaps descriptors ("value renaming")
}
val_to_reg(dest); // the post-condition is that dest is allocated to a register
if (isconst(src) && isimm32(src->ival)) {
if (src->ival == 0) {
int_to_val(dest, 0);
} else if (src->ival == 1) {
// do nothing
// semantic analysis, called by parser
void do_mul(Value *dest, Value *src) {
promote_arith(dest, src);
if (isint(dest)) {
if (isconst2(dest, src)) {
dest->ival *= src->ival;
} else {
gen_mul(dest, src);
}
@pervognsen
pervognsen / expr.c
Last active February 5, 2023 17:27
void parse_expr(Value *dest);
Sym *parse_ident(void) {
if (tok != TOK_IDENT) {
error("Expected identifier");
}
Sym *ident = tok_sym;
next();
return ident;
}
// x64 encoding
enum Reg {
RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
R8, R9, R10, R11, R12, R13, R14, R15,
};
enum XmmReg {
XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15,

The trick to designing transpose algorithms for both small and large problems is to recognize their simple recursive structure.

For a matrix A, let's denote its transpose by T(A) as a shorthand. First, suppose A is a 2x2 matrix:

    [A00 A01]
A = [A10 A11]

Then we have:

def bits(x, start, stop):
assert 0 <= start <= stop
return (x >> start) & (1 << (stop - start) - 1)
# Instruction set definition
sse1 = make_instruction_set("SSE1")
m128 = sse1.make_vector_type("m128", float32, 4)
instruction = sse1.instruction