Arseny Kapoulkine zeux

## gctracker.lua
--!strict

--[[
BSD Zero Clause License

Copyright (c) 2022 Arseny Kapoulkine

Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted.

## roblox-graphics-apis-2021.md

      
Last active: March 17, 2024 05:37

State of Roblox graphics API across all platforms, with percentage deltas since EOY 2020. Updated December 31 2021.

### Windows

| API | Share |
|---|---|
| Direct3D 11+ | 92% (+3%) |
| Direct3D 10.1 | 5% (-2%) |
| Direct3D 10.0 | 3% (-0.5%) |

## cone-culling-experiments.log
Algorithms used for Cone* preprocess the mesh in some way, then split sequentially into 64-triangle clusters:
ConeBase: optimize mesh for transform cache
ConeSort: split mesh into large planar connected clusters, bin clusters into 6 buckets by cardinal axes, optimize each bucket for transform cache
ConeAcmr: optimize mesh for transform cache, split sequentially into variable length clusters that are relatively planar, sort clusters by avg normal
ConeCash: optimize mesh for transform cache, picking triangles that reduce ACMR but prioritizing those that keep current cluster planar
MaskBase: split sequentially into 64-triangle clusters, store a 64-bit conservative triangle mask for 6 frustums (cube faces)
ManyConeN: split sequentially into 64-triangle clusters, store N (up to 4) cones for each cluster and a cone id per triangle (2 bit)
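
To make the cone idea concrete, here is a minimal Python sketch of a per-cluster normal cone and the matching backface test. It is an illustration under stated assumptions, not the code behind these experiments: the helper names (`build_normal_cone`, `cluster_is_backfacing`) are hypothetical, the axis is simply the normalized average normal, and the view is treated as a single direction (a distant or orthographic camera) rather than a perspective eye position.

```python
import math

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def build_normal_cone(normals):
    """Return (axis, min_dot) for a cluster of unit triangle normals.

    axis is the normalized average normal (an assumption; other axis
    choices are possible), min_dot is the cosine of the cone half-angle.
    """
    total = (
        sum(n[0] for n in normals),
        sum(n[1] for n in normals),
        sum(n[2] for n in normals),
    )
    length = math.sqrt(_dot(total, total))
    if length == 0.0:
        # Normals cancel out: no usable axis, so the cluster is never culled.
        return (0.0, 0.0, 0.0), 0.0
    axis = (total[0] / length, total[1] / length, total[2] / length)
    min_dot = min(_dot(axis, n) for n in normals)
    return axis, min_dot

def cluster_is_backfacing(axis, min_dot, view_dir):
    """True if every triangle in the cone faces away from the viewer.

    view_dir is the unit direction the camera looks along. A triangle with
    normal n is backfacing when dot(n, view_dir) >= 0. All normals lie within
    angle alpha of axis (cos(alpha) = min_dot), so the whole cluster is
    backfacing when dot(axis, view_dir) >= sin(alpha).
    """
    if min_dot <= 0.0:
        return False  # cone spans 90 degrees or more: cannot be culled
    sin_alpha = math.sqrt(max(0.0, 1.0 - min_dot * min_dot))
    return _dot(axis, view_dir) >= sin_alpha
```

The test is conservative: a cluster whose normals spread widely gets a wide cone and is rarely rejected, which is one reason to keep clusters relatively planar.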

Note that all Cone* solutions get significantly worse results with 128 or 256 triangle clusters; it doesn't matter much for Mask.
The biggest challenge with Cone* algorithms is t

## minid3d9.h
// This file is designed to be included in D3D9-dependent code instead of d3d9.h, while adding a minimal amount of junk
#pragma once

#include <BaseTyps.h>
#include <BaseTsd.h>

// stdlib.h
#ifndef _INC_STDLIB
#define _INC_STDLIB
#endif

## clang27.md

      
Last active: January 27, 2024 11:45

How does clang 2.7 hold up in 2021?

A friend recently learned about Proebsting's law and mentioned it to me offhand.
I knew about the law's existence, but I never really asked myself: do I believe in it?
For people who aren't aware, Proebsting's law states:

> Compiler Advances Double Computing Power Every 18 Years

Which is to say, if you upgrade your compiler every 18 years, you would expect on average your code to double in performance on the same hardware.
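
To put that rate in perspective, here is a small Python sketch of the arithmetic. It assumes smooth exponential growth with an 18-year doubling period; the function names are hypothetical and exist only for illustration.

```python
def expected_speedup(years, doubling_period=18.0):
    """Speedup factor implied by doubling every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)

def annual_improvement(doubling_period=18.0):
    """Fractional per-year gain implied by the same doubling period."""
    return expected_speedup(1.0, doubling_period) - 1.0
```

Under these assumptions, `annual_improvement()` comes out to roughly 0.039, so the law amounts to a gain of about 3.9% per year from compiler improvements alone.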
Let's C about this


## stbench.py
import argparse
import json
import os
import safetensors
import safetensors.torch
import sys
import time
import torch

def fast_save_file(tensors, filename, metadata=None):

## matbench.c
// brew install libomp
// cc -o matbench matbench.c -O3 -ffast-math -Xclang -fopenmp -I/opt/homebrew/opt/libomp/include -L/opt/homebrew/opt/libomp/lib -lomp
// ./matbench

#include <assert.h>
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <time.h>

## qt.js
// This code looks at precision impact of transforming a vector repeatedly by a slightly-non-unit quaternion
// Slightly-non-unit quaternions are important: they arise naturally in the process of quaternion computations
// Repeated transformations are important: they may occur during simulation or complex long chains of computation

// Note that because this code runs in JS in double precision, this doesn't model floating-point roundoff.

function applyQuaternion1( q, v ) {

	const x = v.x, y = v.y, z = v.z;
	const qx = q.x, qy = q.y, qz = q.z, qw = q.w;
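
The effect described in the comments above can be sketched in Python. This is not the gist's `applyQuaternion1`, whose body is cut off here; it is a plain `q * v * conj(q)` sandwich without normalization, and the helper names are hypothetical. With this formula each application scales the vector length by `|q|^2`, so a slightly-non-unit quaternion makes the length drift geometrically.

```python
import math

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )

def rotate_sandwich(q, v):
    """Apply q * (v, 0) * conj(q) without normalizing q."""
    conj = (-q[0], -q[1], -q[2], q[3])
    x, y, z, _ = quat_mul(quat_mul(q, (v[0], v[1], v[2], 0.0)), conj)
    return (x, y, z)

def length_after(q, v, steps):
    """Length of v after applying the same quaternion `steps` times."""
    for _ in range(steps):
        v = rotate_sandwich(q, v)
    return math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
```

For example, a rotation about the z axis whose quaternion is scaled by `1.0001` grows a unit vector by a factor of `1.0001 ** 2000` after 1000 applications, which is why repeated transformations are where the error becomes visible.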

## roblox-graphics-apis-2019.md

      
Last active: October 3, 2023 08:56

State of Roblox graphics API across all platforms, with percentage deltas since EOY 2018. Updated December 29 2019.

### Windows

| API | Share |
|---|---|
| Direct3D 11+ | 85% (+5%) |
| Direct3D 10.1 | 8.5% (-1.5%) |
| Direct3D 10.0 | 5.5% (-2.5%) |
| Direct3D 9 | 1% (-1%) |

## bounds-frag.glsl
#version 450

// 2D Polyhedral Bounds of a Clipped, Perspective-Projected 3D Sphere. Michael Mara, Morgan McGuire. 2013
bool projectSphereView(vec3 c, float r, float znear, float P00, float P11, out vec4 aabb)
{
    if (c.z < r + znear) return false;

    vec3 cr = c * r;
    float czr2 = c.z * c.z - r * r;
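
The shader above is cut off, so the following Python sketch does not reproduce it. It illustrates the same kind of result with a different, angle-based derivation: in each of the xz and yz planes, the tangent lines from the eye to the sphere's cross-section give the extreme values of `x/z` and `y/z`. The function names are hypothetical, view space is assumed to have `z` pointing forward, and the projection scale factors (`P00`, `P11`) are left out.

```python
import math

def tangent_bounds_1d(cx, cz, r):
    """(min, max) of x/z over a circle of radius r centered at (cx, cz).

    Assumes cz > r, so the circle lies entirely in front of the eye.
    """
    d = math.hypot(cx, cz)      # distance from the eye to the center
    theta = math.atan2(cx, cz)  # angle of the center off the view axis
    alpha = math.asin(r / d)    # half-angle subtended by the circle
    return math.tan(theta - alpha), math.tan(theta + alpha)

def sphere_view_bounds(c, r, znear):
    """Bounds (minx, miny, maxx, maxy) in x/z and y/z units, or None.

    Mirrors the early-out in the shader: spheres that cross the near
    plane are rejected instead of being clipped.
    """
    if c[2] < r + znear or c[2] <= r:
        return None
    minx, maxx = tangent_bounds_1d(c[0], c[2], r)
    miny, maxy = tangent_bounds_1d(c[1], c[2], r)
    return minx, miny, maxx, maxy
```

For a sphere of radius 1 centered two units straight ahead, the half-angle is 30 degrees, so the bounds are symmetric at plus and minus `tan(30°)`. Multiplying the x and y bounds by the projection's scale terms would map them to clip space.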