Skip to content

Instantly share code, notes, and snippets.

View paniq's full-sized avatar

Leonard Ritter paniq

View GitHub Profile
@paniq
paniq / arcmem.md
Last active May 4, 2024 10:08
Arcmem: Fixing The C Memory Model

Arcmem: Fixing The C Memory Model

by Leonard Ritter, May 3rd 2024

Introduction

As part of bootstrapping the next iteration of the Scopes compiler, which, until now, oriented itself heavily along LLVM and C semantics, the author tackled the general issue of operating on memory, specifically solving longstanding usability issues that C programmers should be intimately familiar with, which increase cognitive load and make programming in C a never-ending nightmare. These issues are:

  • There are at least three kinds of pointers which all use the same type but need to be treated differently: global, stack and heap.
  • Rules of memory ownership and sharing are all informally specified, and difficult to enforce in a heterogeneous environment.
  • Stack allocations may lead to stack overflow.
sdl wgpu := do
using import backport
use-backport-globals;
import sdl wgpu
using import print String glm slice
using import compiler.target.SPIR-V
# optimally encoding a set of two integers
N := 32
N/2 := N // 2
NUMVALS := N * (N + 1) // 2
fn trienc (x y)
assert (x < N)
assert (y < N)
assert (x >= y)

64-bit Pointer Basing in O(1)

A problem that occasionally comes up when walking a dynamically typed heap is to find the base address for a given pointer i.e. the start of an allocation. With this ability, the following features are possible:

  • Reference-counting slices of dynamic arrays
  • Faster lookup during conservative garbage collection
  • A copy-on-write that changes the pointer written to
  • Bidirectional edges within objects (not just only at the boundary)
@paniq
paniq / miller_101.py
Last active April 12, 2024 16:02
The BCC Coordinate System
"""
a demo of how to convert from cubic to BCC lattice, analog to the tetrahedral
or FCC lattice, but without the headache of having to manage an interlaced grid
structure as with the usual approach, at the expense of a diagonally skewed
non-cubic bounding box in cartesian coordinates.
an integer conversion from bcc to cartesian coordinates is as easy as
x = -tx+ty+tz
# note that with a custom heap implementation, these features can be supported
# without tagging.
# for more info, see
# https://gist.github.com/paniq/429c96446e3e7e01f0c860987049d1cf
using import struct print format C.stdio
malloc := extern 'malloc (function voidstar usize)
realloc := extern 'realloc (function voidstar voidstar usize)
free := extern 'free (function void voidstar)

Idea for temporal bitnet optimization: one could perhaps avoid needing float accumulators during training; say we alter a random number of weights each pass (for a 1-bit bitnet that's a xor, always). the xor bits can be reconstructed from a global random seed value. if we memorize the resulting loss delta and the seeds of the past N steps, then we can compute a moving average for each xor bit with 1/N sub-bit precision.

with this, one could try doing N prospective measurements, and then accumulate the results (which cost very little in terms of space - so N could be high!) to get a really good and smooth gradient; that approach might be able to replace Adam completely.

Come to think of it, that approach might even work for float weights and Forward Gradient Optimization, since the seed can also be used to create random floats with gaussian distribution.

# perfect hash configuration:
hash all values, then find substring of bits where keys don't
collide within K attempts.
using import hash Array String format
# if false, use u64 xxhash, otherwise 128-bit scrambled polynomial rolling hash
USE_U128 := true
using import struct enum print itertools
enum Token plain
None
Word
Open
Close
@@ finite-state-machine
struct Tokenizer
@paniq
paniq / tagged_heap.md
Last active February 19, 2024 09:39
The Knowable Heap

Following conditions need to be met to make the C heap serializable:

  1. get start of allocation from offset pointer
  2. get size of allocation
  3. addresses appearing in allocation must be tagged as pointers (1:64 metadata)
  4. associate foreign pointers/allocations with plugins; the simplest is dlsym, but there's also file handles, graphics resources, JIT functions, stack vars etc.

(4) is what makes the whole affair very interesting.

above conditions also make it possible to run a GC. it is also possible to perfectly deduplicate the heap in subquadratic time, to validate pointers, guard against bad load/stores & to merge different heaps.