Leonard Ritter (paniq)
# optimally encoding a set of two integers
N := 32
N/2 := N // 2
NUMVALS := N * (N + 1) // 2
fn trienc (x y)
    assert (x < N)
    assert (y < N)
    assert (x >= y)
    # triangular-number index: maps each pair with x >= y to a unique
    # value in 0..NUMVALS-1
    y + x * (x + 1) // 2
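
Not part of the gist, but a quick Python sanity check of why NUMVALS values are exactly enough: a decoder via the triangular root (my addition) inverts the encoding, so the map is a bijection onto range(NUMVALS).

import math

def trienc(x, y):
    # the same triangular encoding, transcribed to Python
    assert 0 <= y <= x < 32
    return x * (x + 1) // 2 + y

def tridec(k):
    # triangular root recovers x; the remainder is y
    x = (math.isqrt(8 * k + 1) - 1) // 2
    return x, k - x * (x + 1) // 2

pairs = [(x, y) for x in range(32) for y in range(x + 1)]
assert sorted(trienc(x, y) for x, y in pairs) == list(range(32 * 33 // 2))
assert all(tridec(trienc(x, y)) == (x, y) for x, y in pairs)
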
# note that with a custom heap implementation, these features can be supported
# without tagging.
# for more info, see
# https://gist.github.com/paniq/429c96446e3e7e01f0c860987049d1cf
using import struct print format C.stdio
malloc := extern 'malloc (function voidstar usize)
realloc := extern 'realloc (function voidstar voidstar usize)
free := extern 'free (function void voidstar)

Idea for temporal bitnet optimization: one could perhaps avoid needing float accumulators during training. Say we alter a random subset of weights each pass (for a 1-bit bitnet, that is always a xor). The xor bits can be reconstructed from a global random seed value, so if we memorize the resulting loss delta and the seeds of the past N steps, we can compute a moving average for each xor bit with 1/N sub-bit precision.

With this, one could do N prospective measurements and then accumulate the results (which cost very little space, so N could be high!) to get a smooth, high-quality gradient estimate; that approach might be able to replace Adam completely.

Come to think of it, that approach might even work for float weights and Forward Gradient Optimization, since the seed can also be used to generate random floats with a Gaussian distribution.
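
A rough sketch of the bookkeeping in Python (my illustration of the idea above, not tested training code; the ~1% flip rate is an arbitrary assumption): each step is remembered only as a (seed, loss delta) pair, the xor masks are reconstructed on demand, and averaging the deltas per bit yields the sub-bit signal.

import random

NUM_WEIGHTS = 1024

def mask_from_seed(seed):
    # deterministically reconstruct the xor bits of one training step
    rng = random.Random(seed)
    return [rng.random() < 0.01 for _ in range(NUM_WEIGHTS)]

def bit_scores(history):
    # history: list of (seed, loss_delta) pairs for the past N steps.
    # For each weight, average the loss deltas of the steps that flipped
    # it; a negative average suggests flipping that bit helps.
    scores = [0.0] * NUM_WEIGHTS
    counts = [0] * NUM_WEIGHTS
    for seed, delta in history:
        for i, flipped in enumerate(mask_from_seed(seed)):
            if flipped:
                scores[i] += delta
                counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(scores, counts)]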

# perfect hash configuration:
# hash all values, then find a substring of bits where keys don't
# collide within K attempts.
using import hash Array String format
# if false, use u64 xxhash, otherwise 128-bit scrambled polynomial rolling hash
USE_U128 := true
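
To illustrate the search the comments describe (a sketch, not the gist's code; mix is a stand-in seeded hash and the integer-key assumption is mine): rehash with up to K seeds and scan each hash for a bit substring under which no two keys collide.

import random

def mix(x, seed):
    # stand-in 64-bit mixer; any decent seeded hash would do here
    x = ((x ^ seed) * 0x9E3779B97F4A7C15) & (2**64 - 1)
    return x ^ (x >> 32)

def find_window(keys, width, K=32):
    # try up to K hash seeds; for each, scan all bit positions for a
    # width-bit substring under which no two keys collide
    mask = (1 << width) - 1
    for _ in range(K):
        seed = random.getrandbits(64)
        hashes = [mix(k, seed) for k in keys]
        for shift in range(64 - width + 1):
            if len({(h >> shift) & mask for h in hashes}) == len(keys):
                return seed, shift  # perfect hash: (mix(k, seed) >> shift) & mask
    return None
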
using import struct enum print itertools
enum Token plain
    None
    Word
    Open
    Close
@@ finite-state-machine
struct Tokenizer

64-bit Pointer Basing in O(1)

A problem that occasionally comes up when walking a dynamically typed heap is finding the base address for a given pointer, i.e. the start of its allocation. With this ability, the following features are possible (one well-known scheme is sketched after the list):

  • Reference-counting slices of dynamic arrays
  • Faster lookup during conservative garbage collection
  • A copy-on-write that changes the pointer written to
  • Bidirectional edges within objects (not just at the boundary)
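
The gist's own scheme is cut off above, so for comparison, here is one well-known way to get O(1) basing, sketched in Python with illustrative names: carve the address space into fixed-size pages and record, per page, the start of the allocation covering it; basing a pointer is then a shift and a table lookup.

PAGE_BITS = 16  # page size is an assumption, not from the gist
page_base = {}  # page index -> allocation start address

def register_allocation(start, size):
    # mark every page the allocation touches (cost proportional to its size)
    first = start >> PAGE_BITS
    last = (start + size - 1) >> PAGE_BITS
    for page in range(first, last + 1):
        page_base[page] = start

def base_of(ptr):
    # O(1): one shift and one lookup recover the start of the allocation
    return page_base.get(ptr >> PAGE_BITS)
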
paniq / squarify.py
Last active February 16, 2024 08:46
Squarify
# Squarify
# subdivide a rectangle into squares whose areas sum to the area of the
# rectangle, using the identity:
#
# w * h = min(w, h) ** 2 + (max(w, h) - min(w, h)) * min(w, h)
W, H = 1920, 1080
w, h = W, H
lengths = []
while w > 0 and h > 0:
    # cut off the largest possible square and keep the remaining strip
    side = min(w, h)
    lengths.append(side)
    if w > h:
        w -= side
    else:
        h -= side
paniq / prime_scheduler.py
Last active February 16, 2024 09:07
Prime Scheduler
# Prime Number Scheduler
# factoring number N has complexity < O(N log N)
# factorizes numbers sequentially with amortized complexity O(1) per number
# written by Leonard Ritter, 2024
schedule = {}
x = 2
while x < 1000000:
    # get all factors scheduled for this number; none means x is prime
    factors = schedule.pop(x, None)
    if factors is None:
        schedule.setdefault(x + x, []).append(x)
    else:
        # reschedule each prime factor at its next multiple
        for p in factors:
            schedule.setdefault(x + p, []).append(p)
    x += 1
paniq / tagged_heap.md
Last active February 19, 2024 09:39
The Knowable Heap

The following conditions need to be met to make the C heap serializable:

  1. get start of allocation from offset pointer
  2. get size of allocation
  3. addresses appearing in an allocation must be tagged as pointers (1:64 metadata)
  4. foreign pointers/allocations must be associated with plugins; the simplest case is dlsym, but there are also file handles, graphics resources, JIT functions, stack vars etc.

(4) is what makes the whole affair very interesting.

The above conditions also make it possible to run a GC, to perfectly deduplicate the heap in subquadratic time, to validate pointers, to guard against bad loads/stores, and to merge different heaps.
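
As a toy Python model of conditions 1–3 (my sketch, not the actual heap): a sorted side table of allocation starts answers the base and size queries, and a per-allocation set of tagged offsets stands in for the 1:64 pointer metadata.

import bisect

starts = []  # sorted allocation start addresses
allocs = {}  # start -> (size, offsets tagged as pointers)

def register(start, size, pointer_offsets=()):
    bisect.insort(starts, start)
    allocs[start] = (size, frozenset(pointer_offsets))

def base_of(ptr):
    # condition 1: start of the allocation containing an interior pointer
    i = bisect.bisect_right(starts, ptr) - 1
    if i >= 0:
        start = starts[i]
        if ptr < start + allocs[start][0]:
            return start
    return None

def size_of(ptr):
    # condition 2: size of the allocation
    base = base_of(ptr)
    return None if base is None else allocs[base][0]

def pointer_offsets_of(ptr):
    # condition 3: which offsets hold pointers (the 1:64 metadata)
    base = base_of(ptr)
    return None if base is None else allocs[base][1]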

paniq / loop_integration.md
Last active November 12, 2023 14:46
Loop Integration

Loop Integration

by Leonard Ritter

This article proposes loop integration, a generalization of loop unrolling to an arithmetic loop summation operator that permits static loop analysis and expansion with logarithmic complexity.

Rationale

In program optimization, we often encounter loops whose complexity we aim to reduce. Some loops can even be removed completely by folding constant expressions. The classical technique for this is loop unrolling: apply the initial arguments to the loop body, evaluate, and if the result is constant, reapply it, and so forth, until the only path left leads out of the loop.
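
To make the logarithmic-complexity claim concrete, a small Python illustration (mine, not from the article; the article's summation operator is more general): if the loop body is an affine update x -> a*x + b, then n iterations compose into a single affine map, and exponentiation by squaring builds that map in O(log n) compositions instead of n evaluations.

def affine_compose(f, g):
    # (f . g)(x) = f(g(x)) for maps represented as (a, b): x -> a*x + b
    (fa, fb), (ga, gb) = f, g
    return (fa * ga, fa * gb + fb)

def loop_power(body, n):
    # compose the loop body with itself n times in O(log n) compositions
    result = (1, 0)  # identity map x -> x
    while n > 0:
        if n & 1:
            result = affine_compose(body, result)
        body = affine_compose(body, body)
        n >>= 1
    return result

# running `x = 3*x + 2` a million times touches the body only ~40 times:
a, b = loop_power((3, 2), 1000000)
# the final x is a*x0 + b for any starting value x0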