State of Roblox graphics API across all platforms, with percentage deltas since EOY 2018. Updated December 29, 2019.

Windows

API              Share
Direct3D 11+     85% (+5%)
Direct3D 10.1    8.5% (-1.5%)
Direct3D 10.0    5.5% (-2.5%)
Direct3D 9       1% (-1%)
@zeux
zeux / bounds-frag.glsl
Last active July 30, 2023 04:51
Shader code used in "Approximate projected bounds" article, used for profiling with offline cycle estimation tools.
#version 450
// 2D Polyhedral Bounds of a Clipped, Perspective-Projected 3D Sphere. Michael Mara, Morgan McGuire. 2013
bool projectSphereView(vec3 c, float r, float znear, float P00, float P11, out vec4 aabb)
{
    if (c.z < r + znear) return false;
    vec3 cr = c * r;
    float czr2 = c.z * c.z - r * r;
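The preview above stops after the near-plane test. As a rough illustration of what a projected sphere bound computes, here is a minimal Python sketch that uses plain tangent angles instead of the closed form from the cited paper. It assumes view space with +z pointing forward and `znear > 0`; the helper name `project_sphere_axis` and the tuple layout of the result are hypothetical and are not taken from the gist.

```python
import math

def project_sphere_axis(c_axis, c_z, r, znear):
    """Return (min, max) of axis/z over a sphere, or None if it reaches the near plane.

    Assumption: +z is forward in view space and znear > 0.
    """
    if c_z < r + znear:
        return None  # same early-out as the shader preview
    dist = math.hypot(c_axis, c_z)     # distance to the center in the axis/z plane
    center = math.atan2(c_axis, c_z)   # angle of the center direction
    half = math.asin(r / dist)         # half-angle subtended by the sphere
    return math.tan(center - half), math.tan(center + half)

def project_sphere_view(c, r, znear, p00, p11):
    """Return (minx, miny, maxx, maxy) scaled by the projection terms, or None."""
    xs = project_sphere_axis(c[0], c[2], r, znear)
    ys = project_sphere_axis(c[1], c[2], r, znear)
    if xs is None or ys is None:
        return None
    return (xs[0] * p00, ys[0] * p11, xs[1] * p00, ys[1] * p11)
```

Each axis can be handled independently because the ratio x/z depends only on x and z, and the sphere's footprint in the xz plane is a full disc of radius `r`.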
zeux / nlerpsimd.cpp
Created May 6, 2016 03:40
A very fast version of nlerp with precision tweaks to make it match slerp, with SSE2 and AVX2 optimizations.
#include <stdio.h>
#include <math.h>
#include <immintrin.h>
#include <vector>
#include <type_traits>
#ifdef IACA
#include <iacaMarks.h>
#else
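For readers unfamiliar with the terms in the description, the following is a minimal scalar sketch in Python of plain nlerp next to slerp. It does not reproduce the gist's precision tweaks or its SSE2/AVX2 code; the quaternion layout `(x, y, z, w)` and all function names are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nlerp(a, b, t):
    """Normalized linear interpolation between unit quaternions a and b."""
    if dot(a, b) < 0.0:
        b = tuple(-x for x in b)  # take the shorter arc
    q = tuple((1.0 - t) * x + t * y for x, y in zip(a, b))
    n = math.sqrt(dot(q, q))
    return tuple(x / n for x in q)

def slerp(a, b, t):
    """Spherical linear interpolation between unit quaternions a and b."""
    d = dot(a, b)
    if d < 0.0:
        b = tuple(-x for x in b)
        d = -d
    theta = math.acos(min(d, 1.0))
    if theta < 1e-6:
        return nlerp(a, b, t)  # nearly identical inputs: avoid dividing by ~0
    s = math.sin(theta)
    wa = math.sin((1.0 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return tuple(wa * x + wb * y for x, y in zip(a, b))
```

The two agree at `t = 0`, `t = 0.5`, and `t = 1` and drift apart in between, which is the gap that a corrected nlerp would aim to close.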
zeux / builtin.lua
Created May 20, 2020 15:43
Roblox Luau type surface as of May 20, 2020
export type any=any
export type nil=nil
export type string=string
export type number=number
export type boolean=boolean
zeux / alphasort.cpp
Last active June 9, 2023 07:38
On Nature paper about sorting algorithms. Thread for context: https://mastodon.gamedev.place/@zeux/110510029570470184.
/*
The Nature paper about sorting algorithms has an "improvement" for sort3 that saves a mov.
Thread for context: https://mastodon.gamedev.place/@zeux/110510029570470184
This code is experimentally verifying that the proposed optimization is perf neutral
(aka is not improving performance). You'll need to remove the mov from all 3 versions
and retest; feel free to test one version at a time.
Cycle count established by using 'perf stat' on Ryzen 7 5900X - it does not depend on
whether the mov is there.
zeux / crtheap.cpp
Created February 12, 2016 08:26
Overriding CRT heap functions to use a custom allocator.
// User-defined global heap prototypes
extern void mem_global_init();
extern void mem_global_term();
extern void* mem_global_allocate(size_t size, size_t align);
extern void mem_global_deallocate(void* ptr);
extern size_t mem_global_get_size(void* ptr);
// Actual code
#define BREAK() __debugbreak()
#include <windows.h>
#include <GL/gl.h>
#include <stdio.h>
#pragma comment(lib, "opengl32.lib")
void glview()
{
    HWND hWnd = CreateWindow(L"ListBox", L"Title", WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, NULL, NULL, NULL, NULL);
    HDC hDC = GetDC(hWnd);
zeux / meshlets.py
Last active December 20, 2022 02:19
Gather best meshlet configurations (from the topology perspective) for each meshlet size limit
tl = 512
for vl in [32, 64, 96, 128, 256]:
    bestx = 0
    besty = 0
    bests = vl
    for x in range(1, vl):
        for y in range(1, vl):
            v = (x+1)*(y+1)
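The preview is cut off before the scoring step, so the gist's actual selection criterion is not visible here. As a self-contained sketch of the same kind of search, the function below treats a meshlet as an `x` by `y` grid of quads, which has `(x+1)*(y+1)` vertices and `2*x*y` triangles, and picks the grid with the most triangles that fits both limits. The objective, the function name `best_grid`, and the returned tuple are assumptions for illustration.

```python
def best_grid(vertex_limit, triangle_limit):
    """Return (x, y, vertices, triangles) for the quad grid with the most triangles.

    Assumption: the goal is to maximize triangle count under both limits.
    Only x <= y is searched because the grid is symmetric.
    """
    best = None
    for x in range(1, vertex_limit):
        for y in range(x, vertex_limit):
            vertices = (x + 1) * (y + 1)
            triangles = 2 * x * y
            if vertices > vertex_limit or triangles > triangle_limit:
                break  # both counts only grow as y increases
            if best is None or triangles > best[3]:
                best = (x, y, vertices, triangles)
    return best

if __name__ == "__main__":
    for vl in [32, 64, 96, 128, 256]:
        print(vl, best_grid(vl, 512))
```

With a 64-vertex limit and a 512-triangle limit this returns a 7 by 7 grid: 64 vertices and 98 triangles.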
zeux / generic-permute.hpp
Created February 6, 2015 08:21
Compile-time translation of Altivec-style (constant) shuffle into SSE instructions
#pragma once
#define V_SWIZZLE_X 0
#define V_SWIZZLE_Y 1
#define V_SWIZZLE_Z 2
#define V_SWIZZLE_W 3
#define V_PERMUTE_LX 0
#define V_PERMUTE_LY 1
#define V_PERMUTE_LZ 2
1.
a. Avoiding Catastrophic Performance Loss: Detecting CPU-GPU Sync Points
Speaker: John McDonald (NVIDIA)
b. Creating FPS Open Worlds Using Procedural Techniques
Speaker: Tom Betts (Big Robot)
c. Mantle - Introducing a New API for Graphics (Presented by AMD)
Speakers: Stephan Hodes (AMD), John Larkin (AMD), Guennadi Riguer (AMD), Gordon Selley (AMD)
d. OpenGL ES 3.0 and Beyond: How To Deliver Desktop Graphics on Mobile Platforms (Presented by Intel Corp)
Speakers: Jon Kennedy (Intel Corp), Chris Kirkpatrick (Intel Corp)
e. Optimizing Mobile Games with Gameloft and ARM (Presented by ARM)