@zeux
zeux / gctracker.lua
Last active April 23, 2024 12:52
GC tracker for Luau that provides more predictable (compared to `__gc`...) destructor invocation for dead objects. Supports ~constant time update cost by limiting the iteration count, so that update can be called every frame with a small n for negligible performance cost.
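The bounded-iteration idea in this description can be sketched in Python as an analogy. This is not the gist's Luau code: the `GCTracker` class, its `track`/`update` methods, and the use of `weakref` with a round-robin cursor are all assumptions made for illustration. Each `update(n)` call inspects at most `n` tracked entries, so the per-call cost stays roughly constant regardless of how many objects are tracked.

```python
import weakref


class GCTracker:
    """Hypothetical sketch: calls a destructor once a tracked object has died."""

    def __init__(self):
        self._entries = []  # list of (weak reference, destructor) pairs
        self._cursor = 0    # round-robin position, kept across update() calls

    def track(self, obj, destructor):
        # The destructor must not capture obj, or obj would never die.
        self._entries.append((weakref.ref(obj), destructor))

    def update(self, n):
        """Inspect at most n entries; return how many destructors were called."""
        fired = 0
        for _ in range(n):
            if not self._entries:
                break
            if self._cursor >= len(self._entries):
                self._cursor = 0
            ref, destructor = self._entries[self._cursor]
            if ref() is None:
                # Swap-remove the dead entry so removal is O(1).
                last = self._entries.pop()
                if self._cursor < len(self._entries):
                    self._entries[self._cursor] = last
                destructor()
                fired += 1
            else:
                self._cursor += 1
        return fired
```

Calling `update` with a small `n` once per frame spreads the scan across frames; the trade-off is that a dead object may wait several frames before its destructor runs.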
--!strict
--[[
BSD Zero Clause License
Copyright (c) 2022 Arseny Kapoulkine
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted.

State of Roblox graphics API across all platforms, with percentage deltas since EOY 2020. Updated December 31 2021.

Windows

| API | Share |
| --- | --- |
| Direct3D 11+ | 92% (+3%) |
| Direct3D 10.1 | 5% (-2%) |
| Direct3D 10.0 | 3% (-0.5%) |
@zeux
zeux / cone-culling-experiments.log
Last active February 19, 2024 08:38
Comparison of backface culling efficiency for cluster cone culling with 64-triangle clusters and triangle mask culling (6 64-bit masks per cluster).
Algorithms used for Cone* preprocess the mesh in some way, then split sequentially into 64-triangle clusters:
ConeBase: optimize mesh for transform cache
ConeSort: split mesh into large planar connected clusters, bin clusters into 6 buckets by cardinal axes, optimize each bucket for transform cache
ConeAcmr: optimize mesh for transform cache, split sequentially into variable length clusters that are relatively planar, sort clusters by avg normal
ConeCash: optimize mesh for transform cache, picking triangles that reduce ACMR but prioritizing those that keep current cluster planar
MaskBase: split sequentially into 64-triangle clusters, store a 64-bit conservative triangle mask for 6 frustums (cube faces)
ManyConeN: split sequentially into 64-triangle clusters, store N (up to 4) cones for each cluster and a cone id per triangle (2 bit)
Note that all Cone* solutions get significantly worse results with 128 or 256 triangle clusters; it doesn't matter much for Mask.
The biggest challenge with Cone* algorithms is t
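For readers unfamiliar with cluster cone culling, here is a minimal Python sketch of the basic test. It is not taken from the gist: the function names are hypothetical, the cone is built from the average normal (one simple choice among several), and a single view direction is assumed for the whole cluster (as in an orthographic view), which ignores the per-position effects that a perspective camera adds.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def cluster_cone(normals):
    """Return (axis, cos_half_angle) of a cone containing all unit normals."""
    total = tuple(sum(n[i] for n in normals) for i in range(3))
    if dot(total, total) < 1e-12:
        # Normals cancel out: no useful cone, so mark it as never cullable.
        return (0.0, 0.0, 1.0), -1.0
    axis = normalize(total)
    cos_half = min(dot(axis, n) for n in normals)
    return axis, cos_half


def cluster_backfacing(axis, cos_half, view_dir):
    """True if every triangle in the cluster faces away from the viewer.

    view_dir is a unit vector pointing from the camera into the scene.
    A triangle with normal n is backfacing when dot(n, view_dir) > 0, so the
    whole cone is backfacing when the angle between axis and view_dir is
    below 90 degrees minus the cone half-angle.
    """
    if cos_half <= 0.0:
        return False  # cone spans 90 degrees or more; cannot be culled
    sin_half = math.sqrt(max(0.0, 1.0 - cos_half * cos_half))
    return dot(axis, view_dir) > sin_half
```

The wider the spread of normals in a cluster, the larger the half-angle and the less often the test succeeds, which is consistent with the note above that larger clusters give worse results for the Cone* variants.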
@zeux
zeux / minid3d9.h
Created February 12, 2016 08:32
Minimal set of headers for D3D9
// This file is designed to be included in D3D9-dependent code instead of d3d9.h, while adding a minimal amount of junk
#pragma once
#include <BaseTyps.h>
#include <BaseTsd.h>
// stdlib.h
#ifndef _INC_STDLIB
#define _INC_STDLIB
#endif
@zeux
zeux / clang27.md
Last active January 27, 2024 11:45
How does clang 2.7 hold up in 2021?

A friend recently learned about Proebsting's law and mentioned it to me off hand. I knew about the law's existence but I never really asked myself - do I believe in it?

For people who aren't aware, Proebsting's law states:

Compiler Advances Double Computing Power Every 18 Years

Which is to say, if you upgrade your compiler every 18 years, you would expect on average your code to double in performance on the same hardware.
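To put the law in more familiar terms, the arithmetic can be sketched in a few lines of Python. The helper names are hypothetical, and the 11-year gap used in the example is an assumption (roughly the distance between a 2010-era compiler and 2021), not a figure from the text.

```python
def annual_rate(doubling_years):
    """Compound yearly improvement implied by doubling every doubling_years."""
    return 2 ** (1 / doubling_years) - 1


def expected_speedup(years, doubling_years=18):
    """Speedup the law predicts after the given number of years."""
    return 2 ** (years / doubling_years)


# Doubling every 18 years works out to a bit under 4% per year.
print(f"{annual_rate(18):.2%} per year")

# Assumed 11-year gap between compiler versions: roughly a 1.5x speedup.
print(f"{expected_speedup(11):.2f}x over 11 years")
```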

Let's C about this

@zeux
zeux / stbench.py
Created January 18, 2024 20:14
Safetensors load/save benchmark (assumes input model is fp16 and converts to bf16)
import argparse
import json
import os
import safetensors
import safetensors.torch
import sys
import time
import torch
def fast_save_file(tensors, filename, metadata=None):
@zeux
zeux / matbench.c
Last active January 10, 2024 02:46
Matrix-vector multiplication benchmark, targeting Apple M1/M2/M3 -- expecting clang to vectorize the loop in dotprod_fpN functions using half-precision multiply adds. Requires OpenMP (from homebrew)
// brew install libomp
// cc -o matbench matbench.c -O3 -ffast-math -Xclang -fopenmp -I/opt/homebrew/opt/libomp/include -L/opt/homebrew/opt/libomp/lib -lomp
// ./matbench
#include <assert.h>
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <time.h>
@zeux
zeux / qt.js
Created October 9, 2023 03:01
Quaternion transformation precision
// This code looks at precision impact of transforming a vector repeatedly by a slightly-non-unit quaternion
// Slightly-non-unit quaternions are important: they arise naturally in the process of quaternion computations
// Repeated transformations are important: they may occur during simulation or complex long chains of computation
// Note that because this code runs in JS in double precision, this doesn't model floating-point roundoff.
function applyQuaternion1( q, v ) {
const x = v.x, y = v.y, z = v.z;
const qx = q.x, qy = q.y, qz = q.z, qw = q.w;
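Why a slightly-non-unit quaternion matters under repetition can be illustrated with a small Python sketch. It is not a port of the gist's functions: it uses the plain sandwich product q * v * conj(q), which is one common formulation, and all helper names are hypothetical. With that formulation each application scales the vector's length by |q|², so the error compounds geometrically over repeated transformations.

```python
import math


def quat_mul(a, b):
    """Hamilton product of quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )


def apply_sandwich(q, v):
    """Transform v by q using q * v * conj(q), without normalizing q."""
    x, y, z, w = q
    p = (v[0], v[1], v[2], 0.0)
    r = quat_mul(quat_mul(q, p), (-x, -y, -z, w))
    return r[:3]


def length(v):
    return math.sqrt(sum(c * c for c in v))


# Rotation about the z axis, deliberately scaled so that |q| = 1.001.
scale = 1.001
half_angle = 0.1
q = (0.0, 0.0, math.sin(half_angle) * scale, math.cos(half_angle) * scale)

v = (1.0, 0.0, 0.0)
for _ in range(100):
    v = apply_sandwich(q, v)

# After k applications the length has grown by a factor of |q| ** (2 * k).
print(length(v), scale ** 200)
```

Other formulations of the same rotation respond differently to a non-unit input, which is the kind of difference a comparison like this one is meant to expose.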

State of Roblox graphics API across all platforms, with percentage deltas since EOY 2018. Updated December 29 2019.

Windows

| API | Share |
| --- | --- |
| Direct3D 11+ | 85% (+5%) |
| Direct3D 10.1 | 8.5% (-1.5%) |
| Direct3D 10.0 | 5.5% (-2.5%) |
| Direct3D 9 | 1% (-1%) |
@zeux
zeux / bounds-frag.glsl
Last active July 30, 2023 04:51
Shader code used in "Approximate projected bounds" article, used for profiling with offline cycle estimation tools.
#version 450
// 2D Polyhedral Bounds of a Clipped, Perspective-Projected 3D Sphere. Michael Mara, Morgan McGuire. 2013
bool projectSphereView(vec3 c, float r, float znear, float P00, float P11, out vec4 aabb)
{
if (c.z < r + znear) return false;
vec3 cr = c * r;
float czr2 = c.z * c.z - r * r;