Matt MattPD

## README.md

      
bjacob / README.md · 1 file · 0 forks · 0 comments · 1 star · Last active November 22, 2023 00:24

Relative performance of matmul element types on x86 and Arm

Context

Recent efforts to run LLMs send us searching for element types into which to quantize
weights and activations: wide enough to preserve accuracy, yet narrow enough to deliver
the needed performance and/or memory compression.
This document is about the "performance" dimension, specifically on x86 and Arm
architectures.

  
## ZeroCostGC.md

      
AndrasKovacs / ZeroCostGC.md · 1 file · 1 fork · 3 comments · 34 stars · Last active April 6, 2024 17:07

Garbage collection with zero cost at non-GC time

Every once in a while I investigate low-level backend options for PLs, although
so far I haven't actually written any such backend for my projects. Recently
I've been looking at precise garbage collection in popular backends, and I've
been (like on previous occasions) annoyed by limitations and compromises.
I was compelled to think about a system which accommodates precise relocating GC
as much as possible. In one extreme configuration, described in this note, there

  
## README.md

      
robrich / README.md · 1 file · 2 forks · 0 comments · 11 stars · Last active July 24, 2023 12:06

the definitive deep dive into the .git folder

Thanks for joining us for "the definitive deep dive into the .git folder". It's an incredible live demo where we open every file in the .git folder and show what it does.
Links

Here are the links we saw:

  
## papers.md

      
pdarragh / papers.md · 2 files · 5 forks · 0 comments · 83 stars · Last active April 17, 2024 19:29

Approachable PL Papers for Undergrads

On September 28, 2021, I asked on Twitter:

> PL Twitter:
>
> you get to recommend one published PL paper for an undergrad to read with oversight by someone experienced. the paper should be interesting, approachable, and (mostly) self-contained.
>
> what paper do you recommend?


## shift_dfa.md

      
pervognsen / shift_dfa.md · 1 file · 4 forks · 6 comments · 93 stars · Last active January 27, 2024 19:54

Shift-based DFAs

A traditional table-based DFA implementation looks like this:
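```c
#include <stdint.h>

#define NUM_STATES 16  /* example size; set to the number of DFA states */

uint8_t table[NUM_STATES][256];

uint8_t run(const uint8_t *start, const uint8_t *end, uint8_t state) {
    for (const uint8_t *s = start; s != end; s++)
        state = table[state][*s];
    return state;
}
```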
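To make the table concrete, the sketch below fills it in for a toy three-state DFA that reports whether the input contains "ab". The example automaton and all helper names (`next_state`, `build_tables`, `table_run`, `shift_run`, `contains_ab`) are our own illustration, not the gist's code. The sketch also builds one possible shift-based layout. This is our assumption about what the title refers to, not a reproduction of the gist. Each input byte gets a single 64-bit word, and state i's successor is stored as a bit offset in bits [6i, 6i + 6). A transition then costs one load and one shift. With 6-bit fields, this layout holds at most 10 states.

```c
#include <stdint.h>
#include <string.h>

/* Toy DFA (our own example): does the input contain "ab"? */
enum { S_START, S_SEEN_A, S_MATCH, NUM_STATES };

/* Traditional layout: one byte-indexed row per state. */
static uint8_t table[NUM_STATES][256];

/* Shift-based layout (sketch): one 64-bit word per input byte. State i is
   represented by its bit offset 6*i; bits [6*i, 6*i + 6) of shift_table[c]
   hold the offset of the state reached from state i on byte c. */
static uint64_t shift_table[256];

static uint8_t next_state(uint8_t state, uint8_t c) {
    if (state == S_MATCH) return S_MATCH;              /* absorbing */
    if (c == 'a') return S_SEEN_A;
    if (c == 'b' && state == S_SEEN_A) return S_MATCH;
    return S_START;
}

static void build_tables(void) {
    memset(shift_table, 0, sizeof shift_table);
    for (int c = 0; c < 256; c++)
        for (uint8_t s = 0; s < NUM_STATES; s++) {
            uint8_t t = next_state(s, (uint8_t)c);
            table[s][c] = t;
            shift_table[c] |= (uint64_t)(6u * t) << (6u * s);
        }
}

static uint8_t table_run(const uint8_t *start, const uint8_t *end, uint8_t state) {
    for (const uint8_t *p = start; p != end; p++)
        state = table[state][*p];
    return state;
}

/* Only the low 6 bits of `s` hold the current state's offset; the higher
   bits are leftover fields of other states, so they are masked before use. */
static uint8_t shift_run(const uint8_t *start, const uint8_t *end, uint8_t state) {
    uint64_t s = 6u * (uint64_t)state;
    for (const uint8_t *p = start; p != end; p++)
        s = shift_table[*p] >> (s & 63);
    return (uint8_t)((s & 63) / 6);
}

static int contains_ab(const char *text, int use_shift) {
    const uint8_t *p = (const uint8_t *)text;
    const uint8_t *e = p + strlen(text);
    uint8_t end_state = use_shift ? shift_run(p, e, S_START) : table_run(p, e, S_START);
    return end_state == S_MATCH;
}
```

Call `build_tables()` once. After that, `contains_ab("xxaby", 0)` and `contains_ab("xxaby", 1)` both report a match, and both report no match for "ba".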


## elf_format_cheatsheet.md

      
x0nu11byt3 / elf_format_cheatsheet.md · 1 file · 42 forks · 0 comments · 117 stars · Created February 27, 2021 05:26

ELF Format Cheatsheet

Introduction

Executable and Linkable Format (ELF) is the default binary format on Linux-based systems.
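Every ELF file starts with the same 16 identification bytes. These are the magic `0x7f 'E' 'L' 'F'`, followed by a class byte at index 4 (1 = 32-bit, 2 = 64-bit) and a data-encoding byte at index 5 (1 = little-endian, 2 = big-endian). The sketch below checks those bytes. It assumes the caller has already read the start of the file into a buffer. `elf_class_bits` and the enum names are hypothetical; they are deliberately distinct from the `EI_*` macros in `<elf.h>`.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets within e_ident, as laid out by the ELF specification. */
enum {
    ELF_IDX_CLASS = 4,   /* 1 = ELFCLASS32, 2 = ELFCLASS64 */
    ELF_IDX_DATA  = 5,   /* 1 = ELFDATA2LSB (little-endian), 2 = ELFDATA2MSB */
    ELF_IDENT_LEN = 16   /* size of e_ident */
};

/* Returns 32 or 64 for a valid ELF identification, or 0 otherwise. */
static int elf_class_bits(const uint8_t *buf, size_t len) {
    if (len < ELF_IDENT_LEN)
        return 0;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return 0;
    switch (buf[ELF_IDX_CLASS]) {
    case 1:  return 32;
    case 2:  return 64;
    default: return 0;
    }
}
```

A 64-bit little-endian x86-64 executable begins with `7f 45 4c 46 02 01`, so `elf_class_bits` reports 64 for it.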

Compilation


## gf2p8affineqb-articles.md

      
animetosho / gf2p8affineqb-articles.md · 1 file · 1 fork · 0 comments · 22 stars · Last active February 2, 2024 11:53

A list of articles documenting uses of the GF2P8AFFINE instruction

Unexpected Uses for the Galois Field Affine Transformation Instruction

Intel added the Galois Field instruction set (GFNI) extensions to their Sunny Cove and Tremont cores. What’s particularly interesting is that GFNI is the only new SIMD extension that came with SSE and VEX/AVX encodings (in addition to EVEX/AVX512), to allow it to be supported on all future Intel cores, including those which don’t support AVX512 (such as the Atom line, as well as Celeron/Pentium branded “big” cores).
I suspect GFNI was aimed at accelerating SM4 encryption; however, one of the instructions can be used for many other purposes. The extension includes three instructions, but of particular interest here is the Affine Transformation (GF2P8AFFINEQB), aka bit-matrix multiply, instruction.
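To make "bit-matrix multiply" concrete, here is a scalar model of one byte lane of GF2P8AFFINEQB. It is written from our reading of the pseudocode in Intel's SDM. The helper names are ours, so check the details against the SDM before relying on them. Each 64-bit lane of the second operand is treated as an 8x8 bit matrix. Bit i of each result byte is the parity of (matrix byte 7 - i AND input byte), XORed with bit i of the immediate.

```c
#include <stdint.h>

/* Parity of the set bits in v: 1 if odd, 0 if even. */
static unsigned parity8(uint8_t v) {
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Scalar model of one byte lane of GF2P8AFFINEQB.
   A    : the 64-bit matrix for this lane (byte 7 - i produces result bit i)
   x    : the input byte
   imm8 : the constant XORed into the result */
static uint8_t gf2p8affine_byte(uint64_t A, uint8_t x, uint8_t imm8) {
    uint8_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t row = (uint8_t)(A >> (8 * (7 - i)));   /* matrix byte 7 - i */
        unsigned bit = parity8((uint8_t)(row & x)) ^ ((imm8 >> i) & 1u);
        result |= (uint8_t)(bit << i);
    }
    return result;
}
```

Under this model, `0x0102040810204080` acts as the identity matrix and `0x8040201008040201` reverses the bits of each byte. Both make handy sanity checks.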
There have been various articles which discuss out-of-band

  
## galois-field-affine-uses.md

      
animetosho / galois-field-affine-uses.md · 1 file · 0 forks · 6 comments · 10 stars · Last active February 6, 2024 00:42

A list of “out-of-band” uses for the GF2P8AFFINEQB instruction I haven’t seen documented elsewhere

Count Leading/Trailing Zero Bits (Byte-wise)

The trailing zero bit count (TZCNT) can be computed by isolating the lowest set bit, then depositing it into the appropriate bit positions of the count. The leading zero bit count (LZCNT) can be computed by reversing the bits, then computing the TZCNT.
```c
__m128i _mm_tzcnt_epi8(__m128i a) {
	// isolate lowest bit
	a = _mm_andnot_si128(_mm_add_epi8(a, _mm_set1_epi8(0xff)), a);
	// convert lowest bit to index
```
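The preview stops before the "convert lowest bit to index" step. The sketch below shows one way that step can be done with a single affine transform. The matrix is our own derivation and may not match the gist's constant. For an input with exactly one bit set at position k, matrix rows `0xaa`, `0xcc` and `0xf0` extract bits 0, 1 and 2 of k. A row of `0xff` computes the input's parity; combined with imm8 bit 3, it makes an all-zero input map to 8. The sketch checks this with a portable scalar model. When compiled with GFNI support (`-mgfni`), it also defines a 16-byte SIMD version. `tzcnt8`, `lzcnt8` and `tzcnt_epi8_sketch` are hypothetical names.

```c
#include <stdint.h>
#ifdef __GFNI__
#include <immintrin.h>
#endif

/* Scalar model of one GF2P8AFFINEQB byte lane:
   result bit i = parity(A.byte[7 - i] & x) ^ imm8.bit[i]. */
static uint8_t affine_byte(uint64_t A, uint8_t x, uint8_t imm8) {
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t v = (uint8_t)((uint8_t)(A >> (8 * (7 - i))) & x);
        v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;   /* parity ends up in bit 0 */
        r |= (uint8_t)(((v ^ (imm8 >> i)) & 1u) << i);
    }
    return r;
}

#define TZCNT_MATRIX  0xaaccf0ff00000000ULL   /* our derived bit-to-index matrix */
#define TZCNT_IMM     0x08                    /* flips bit 3 so that 0 maps to 8 */
#define BITREV_MATRIX 0x8040201008040201ULL   /* reverses the bits of a byte */

static uint8_t tzcnt8(uint8_t x) {
    x &= (uint8_t)~(x - 1);                   /* isolate the lowest set bit; 0 stays 0 */
    return affine_byte(TZCNT_MATRIX, x, TZCNT_IMM);
}

static uint8_t lzcnt8(uint8_t x) {
    return tzcnt8(affine_byte(BITREV_MATRIX, x, 0));   /* reverse bits, then TZCNT */
}

#ifdef __GFNI__
/* SIMD form of tzcnt8 over 16 bytes (requires GFNI, e.g. -mgfni). */
static __m128i tzcnt_epi8_sketch(__m128i a) {
    a = _mm_andnot_si128(_mm_add_epi8(a, _mm_set1_epi8((char)0xff)), a);
    return _mm_gf2p8affine_epi64_epi8(
        a, _mm_set_epi64x((long long)TZCNT_MATRIX, (long long)TZCNT_MATRIX), TZCNT_IMM);
}
#endif
```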

  
## quantize-tikz.cc
```cpp
#if 0
(g++-9 $0 || g++ $0) && \
	./a.out > output.tex && \
	pdflatex output && \
	exec convert -density 400 -flatten output.pdf -resize 25% output.png
exit 1
#endif

#include <cmath>
#include <cstdio>
```

## asm_x64.c
```c
// x64 encoding

enum Reg {
    RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8,  R9,  R10, R11, R12, R13, R14, R15,
};

enum XmmReg {
    XMM0, XMM1, XMM2,  XMM3,  XMM4,  XMM5,  XMM6,  XMM7,
    XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15,
};
```
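The order of the `Reg` enum matters: RAX through R15 are numbered 0 to 15, which matches how x64 encodes general-purpose registers. The low three bits go into the ModRM byte, and the fourth bit goes into a REX prefix bit. The sketch below shows how such an enum might be used. It assumes a hypothetical `emit_add_r64_r64` helper that is not from the gist, and it encodes `add dst, src` (ADD r/m64, r64, i.e. REX.W + 01 /r).

```c
#include <stddef.h>
#include <stdint.h>

enum Reg {
    RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8,  R9,  R10, R11, R12, R13, R14, R15,
};

/* Hypothetical helper: write `add dst, src` (64-bit registers) into buf
   and return the number of bytes emitted. */
static size_t emit_add_r64_r64(uint8_t *buf, enum Reg dst, enum Reg src) {
    size_t n = 0;
    buf[n++] = (uint8_t)(0x48                  /* REX.W: 64-bit operand size */
                         | ((src >> 3) << 2)   /* REX.R: high bit of ModRM.reg (src) */
                         | (dst >> 3));        /* REX.B: high bit of ModRM.rm (dst) */
    buf[n++] = 0x01;                           /* opcode: ADD r/m64, r64 */
    buf[n++] = (uint8_t)(0xC0                  /* mod = 11: register-direct */
                         | ((src & 7) << 3)    /* reg field: src */
                         | (dst & 7));         /* rm field: dst */
    return n;
}
```

For example, `add rax, rcx` encodes as `48 01 C8` and `add r8, r9` as `4D 01 C8`. Only the REX byte changes when the extended registers are involved.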