@zeux
zeux / stbench.py
Created January 18, 2024 20:14
Safetensors load/save benchmark (assumes input model is fp16 and converts to bf16)
import argparse
import json
import os
import safetensors
import safetensors.torch
import sys
import time
import torch
def fast_save_file(tensors, filename, metadata=None):
zeux / matbench.c
Last active January 10, 2024 02:46
Matrix-vector multiplication benchmark targeting Apple M1/M2/M3. It expects clang to vectorize the loop in the dotprod_fpN functions using half-precision multiply-adds. Requires OpenMP (from Homebrew).
// brew install libomp
// cc -o matbench matbench.c -O3 -ffast-math -Xclang -fopenmp -I/opt/homebrew/opt/libomp/include -L/opt/homebrew/opt/libomp/lib -lomp
// ./matbench
#include <assert.h>
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <time.h>
zeux / qt.js
Created October 9, 2023 03:01
Quaternion transformation precision
// This code looks at precision impact of transforming a vector repeatedly by a slightly-non-unit quaternion
// Slightly-non-unit quaternions are important: they arise naturally in the process of quaternion computations
// Repeated transformations are important: they may occur during simulation or complex long chains of computation
// Note that because this code runs in JS in double precision, this doesn't model floating-point roundoff.
function applyQuaternion1( q, v ) {
const x = v.x, y = v.y, z = v.z;
const qx = q.x, qy = q.y, qz = q.z, qw = q.w;
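The effect described in the comments above can be sketched in Python. This is a minimal illustration, not the gist's `applyQuaternion1`: it uses the plain sandwich product q * v * conj(q), and the helper names (`quat_mul`, `rotate`, `length_after`) and the 0.1% norm error are assumptions chosen for the example. With this formula a quaternion of norm |q| scales the vector by |q|^2 on every application, so the error compounds over repeated transformations.

```python
import math

def quat_mul(a, b):
    # Hamilton product of two quaternions stored as (x, y, z, w).
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )

def rotate(q, v):
    # Sandwich product q * (v, 0) * conj(q); no normalization is applied.
    x, y, z, w = q
    p = quat_mul(quat_mul(q, (v[0], v[1], v[2], 0.0)), (-x, -y, -z, w))
    return p[:3]

def length_after(q, v, n):
    # Apply the same quaternion n times and return the final vector length.
    for _ in range(n):
        v = rotate(q, v)
    return math.sqrt(sum(c * c for c in v))

# 90-degree rotation about Z, scaled so its norm is off by about 0.1% (assumed value).
s = math.sqrt(0.5)
q = (0.0, 0.0, s * 1.001, s * 1.001)
norm = math.sqrt(sum(c * c for c in q))
n = 1000
measured = length_after(q, (1.0, 0.0, 0.0), n)
predicted = norm ** (2 * n)
print(measured, predicted)
```

Other ways of writing the transformation react differently to a non-unit input, which is presumably why the gist compares several variants.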
zeux / bounds-frag.glsl
Last active July 30, 2023 04:51
Shader code used in "Approximate projected bounds" article, used for profiling with offline cycle estimation tools.
#version 450
// 2D Polyhedral Bounds of a Clipped, Perspective-Projected 3D Sphere. Michael Mara, Morgan McGuire. 2013
bool projectSphereView(vec3 c, float r, float znear, float P00, float P11, out vec4 aabb)
{
if (c.z < r + znear) return false;
vec3 cr = c * r;
float czr2 = c.z * c.z - r * r;
zeux / alphasort.cpp
Last active June 9, 2023 07:38
On the Nature paper about sorting algorithms. Thread for context: https://mastodon.gamedev.place/@zeux/110510029570470184.
/*
The Nature paper about sorting algorithms has an "improvement" for sort3 that saves a mov.
Thread for context: https://mastodon.gamedev.place/@zeux/110510029570470184
This code experimentally verifies that the proposed optimization is perf-neutral
(i.e. it does not improve performance). You'll need to remove the mov from all 3 versions
and retest; feel free to test one version at a time.
Cycle count established by using 'perf stat' on Ryzen 7 5900X - it does not depend on
whether the mov is there.
zeux / meshlets.py
Last active December 20, 2022 02:19
Gather best meshlet configurations (from the topology perspective) for each meshlet size limit
tl = 512
for vl in [32, 64, 96, 128, 256]:
bestx = 0
besty = 0
bests = vl
for x in range(1, vl):
for y in range(1, vl):
v = (x+1)*(y+1)
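The preview above is cut off before the scoring step, so the following is only one possible reading of the search, not the gist's actual metric. It assumes a meshlet shaped as an x-by-y grid of quads, which has (x+1)*(y+1) vertices and 2*x*y triangles, and it assumes "best" means the most triangles that fit within both limits. The function name `best_grid` is hypothetical.

```python
def best_grid(vertex_limit, triangle_limit):
    # Returns (x, y, triangles) for the quad grid with the most triangles
    # that respects both limits. The criterion is an assumption.
    best = (0, 0, 0)
    for x in range(1, vertex_limit):
        # Start y at x: an x-by-y grid and a y-by-x grid are equivalent.
        for y in range(x, vertex_limit):
            vertices = (x + 1) * (y + 1)
            triangles = 2 * x * y
            if vertices > vertex_limit or triangles > triangle_limit:
                continue
            if triangles > best[2]:
                best = (x, y, triangles)
    return best

for vl in [32, 64, 96, 128, 256]:
    print(vl, best_grid(vl, 512))
```

Under these assumptions the 64-vertex limit is filled exactly by a 7-by-7 grid (64 vertices, 98 triangles), and the 512-triangle limit is not reached for any of the listed vertex limits.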
zeux / gctracker.lua
Last active December 2, 2023 03:56
GC tracker for Luau that provides more predictable (compared to `__gc`...) destructor invocation for dead objects. Supports ~constant time update cost by limiting the iteration count such that update can be called every frame with a small n for negligible performance cost.
--!strict
--[[
BSD Zero Clause License
Copyright (c) 2022 Arseny Kapoulkine
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted.
zeux / gcpacer.md
Last active February 23, 2022 22:53
Luau GC exploration: doing some math on the relationship between S / G / P

GC pacing

This document tries to establish a mathematical formulation for GC pacing in the Luau GC, assuming a uniform rate of allocation in an application with a steady live set.

GC algorithm assumptions

  • GC proceeds in three phases: mark, atomic, sweep
  • During mark, the heap size only grows as we don't deallocate memory short of table resize
  • During sweep, the heap size grows due to new allocations and shrinks due to swept objects
  • Live set is fixed at atomic time (between mark & sweep)
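The assumptions above can be turned into a small simulation of one GC cycle. This is a minimal sketch, not the document's own formulation: the function name, the parameters, and the uniform sweep rate are assumptions made for illustration, and the table-resize exception during mark is ignored.

```python
def simulate_gc_cycle(start_heap, live_set, alloc_rate, mark_steps, sweep_steps):
    """Return (peak_heap, end_heap) for one mark -> atomic -> sweep cycle."""
    heap = float(start_heap)
    peak = heap
    # Mark: nothing is freed, so the heap only grows with new allocations.
    for _ in range(mark_steps):
        heap += alloc_rate
        peak = max(peak, heap)
    # Atomic: the live set is fixed here; everything else is garbage to sweep.
    garbage = heap - live_set
    # Sweep: allocations continue while garbage is released at a uniform rate
    # (the uniform release rate is an assumption of this sketch).
    for _ in range(sweep_steps):
        heap += alloc_rate - garbage / sweep_steps
        peak = max(peak, heap)
    return peak, heap

peak, end = simulate_gc_cycle(
    start_heap=150, live_set=100, alloc_rate=1, mark_steps=50, sweep_steps=25
)
print(peak, end)
```

In this model the heap at the end of the cycle equals the live set plus whatever was allocated during sweep, and the peak falls at the atomic step whenever garbage is swept faster than new memory is allocated.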
zeux / usleep.cpp
Last active November 26, 2021 21:24
Run with intervals 100, 1000, 10000 as a command line input
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <stdint.h>
#ifdef __APPLE__
#include <mach/mach_time.h>
#else
zeux / luaubind.hpp
Last active November 24, 2021 20:28
A simple proof of concept for Luau function binding.
// This file is part of the Luau programming language and is licensed under MIT License; see LICENSE.txt for details
#pragma once
// Use this with luaL_Reg + luaL_register:
//
// static const luaL_Reg funcs[] = {
// {"test123", LUAU_BIND(test123)},
// {NULL, NULL},
// };