Skip to content

Instantly share code, notes, and snippets.

raysan5 /
Last active Aug 6, 2020
A small state-of-the-art study on custom engines



A couple of weeks ago I played (and finished) A Plague Tale, a game by Asobo Studio. I was really captivated by the game, not only by the beautiful graphics but also by the story and the locations in the game. I decided to investigate a bit about the game tech and I was surprised to see it was developed with a custom engine by a relatively small studio. I know there are some companies using custom engines but it's very difficult to find a detailed market study with that kind of information curated and updated. So this article.

Nowadays lots of companies choose engines like Unreal or Unity for their games (or that's what lot of people think) because d

View perfect-quantization-dxt-endpoints.txt
Perfect Quantization of DXT endpoints
One of the issues that affect the quality of most DXT compressors is the way floating point colors are rounded.
For example, stb_dxt does:
max16 = (unsigned short)(stb__sclamp((At1_r*yy - At2_r*xy)*frb+0.5f,0,31) << 11);
max16 |= (unsigned short)(stb__sclamp((At1_g*yy - At2_g*xy)*fg +0.5f,0,63) << 5);
max16 |= (unsigned short)(stb__sclamp((At1_b*yy - At2_b*xy)*frb+0.5f,0,31) << 0);

I was told by @mmozeiko that Address Sanitizer (ASAN) works on Windows now. I'd tried it a few years ago with no luck, so this was exciting news to hear.

It was a pretty smooth experience, but with a few gotchas I wanted to document.

First, download and run the LLVM installer for Windows:

Then download and install the VS extension if you're a Visual Studio 2017 user like I am.

It's now very easy to use Clang to build your existing MSVC projects since there's a cl compatible frontend:

sebbbi / FramentShaderWaveCoherency.txt
Last active Nov 28, 2018
FramentShaderWaveCoherency test shader (Vulkan 1.1)
View FramentShaderWaveCoherency.txt
#version 450
#extension GL_ARB_separate_shader_objects : enable
#extension GL_KHR_shader_subgroup_basic : enable
#extension GL_KHR_shader_subgroup_ballot : enable
#extension GL_KHR_shader_subgroup_vote : enable
#extension GL_KHR_shader_subgroup_arithmetic : enable
layout(location = 0) out vec4 outColor;
sebbbi / FastUniformLoadWithWaveOps.txt
Last active Apr 5, 2020
Fast uniform load with wave ops (up to 64x speedup)
View FastUniformLoadWithWaveOps.txt
In shader programming, you often run into a problem where you want to iterate an array in memory over all pixels in a compute shader
group (tile). Tiled deferred lighting is the most common case. 8x8 tile loops over a light list culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;
[numthreads(8, 8, 1)]
View microsoft_craziness.h
// Author: Jonathan Blow
// Version: 1
// Date: 31 August, 2018
// This code is released under the MIT license, which you can find at

why doesn't radfft support AVX on PC?

So there's two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.

Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang) that make all SSE code emitted also use VEX encoding, and at the time radfft was written there was no way in CDep to set compiler flags for just one file, just for the overall build.

[There's the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with this for several compiler versions, and VC++ has no equivalent, so we're not currently using that and just sticking with different compilation units.]

The other issue is to do with CPU power management.

View TinyCRT.h
// TinyCRT, revamp and TinyWin support by Don Williamson, 2011
// Based on and LIBCTINY by Matt Pietrek
#pragma once
twoscomplement / TransientFunction.h
Last active Apr 15, 2018
TransientFunction: A light-weight alternative to std::function [C++11]
View TransientFunction.h
// TransientFuction: A light-weight alternative to std::function [C++11]
// Pass any callback - including capturing lambdas - cheaply and quickly as a
// function argument
// Based on:
// - No instantiation of called function at each call site
// - Simple to use - use TransientFunction<> as the function argument
// - Low cost: cheap setup, one indirect function call to invoke
You can’t perform that action at this time.