Daniel Ridner danielrid

## GPUOptimizationForGameDev.md

      
silvesthu / GPUOptimizationForGameDev.md
Last active November 5, 2025 06:05

GPU Optimization for GameDev

Graphics Pipeline / GPU Architecture Overview


2011 - A trip through the Graphics Pipeline 2011
2013 - Performance Optimization Guidelines and the GPU Architecture behind them
2015 - Life of a triangle - NVIDIA's logical pipeline
2015 - Render Hell 2.0
2016 - How bad are small triangles on GPU and why?
2017 - GPU Performance for Game Artists
2019 - Understanding the anatomy of GPUs using Pokémon


## avx_sigh.md

      
rygorous / avx_sigh.md
Last active June 1, 2024 02:05
          
why doesn't radfft support AVX on PC?

So there are two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.
Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang) that make all SSE code emitted also use VEX encoding, and at the time radfft was written there was no way in CDep to set compiler flags for just one file, only for the overall build.
[There are the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with them for several compiler versions, and VC++ has no equivalent, so we're not currently using them and are just sticking with different compilation units.]
The other issue is to do with CPU power management.
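
To make the per-function "target" annotation mentioned above concrete, here is a minimal sketch for GCC/Clang on x86. It is not how radfft is built (the text says radfft sticks with separate compilation units, and that the annotation caused problems on several compiler versions). The names `scale`, `scale_avx`, `scale_scalar`, and `SKETCH_HAS_AVX_PATH` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAS_AVX_PATH 1

// Hypothetical kernel: only this function is compiled with AVX enabled,
// so the rest of the file keeps the default (non-VEX) code generation.
__attribute__((target("avx")))
static void scale_avx(float* data, std::size_t n, float s)
{
    const __m256 vs = _mm256_set1_ps(s);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(data + i, _mm256_mul_ps(_mm256_loadu_ps(data + i), vs));
    for (; i < n; ++i)   // scalar tail for the last n % 8 elements
        data[i] *= s;
}
#else
#define SKETCH_HAS_AVX_PATH 0   // other compilers/architectures: scalar path only
#endif

static void scale_scalar(float* data, std::size_t n, float s)
{
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= s;
}

// Runtime dispatch: take the AVX path only if the CPU reports AVX support.
void scale(float* data, std::size_t n, float s)
{
#if SKETCH_HAS_AVX_PATH
    if (__builtin_cpu_supports("avx")) {
        scale_avx(data, n, s);
        return;
    }
#endif
    scale_scalar(data, n, s);
}

// Small self-check: doubling is exact in floating point for these values.
void scale_selfcheck()
{
    float v[11];
    for (int i = 0; i < 11; ++i) v[i] = static_cast<float>(i);
    scale(v, 11, 2.0f);
    for (int i = 0; i < 11; ++i) assert(v[i] == 2.0f * static_cast<float>(i));
}
```

With separate compilation units, the alternative the text describes, `scale_avx` would instead live in its own file built with `-mavx` (or `/arch:AVX`), and only the dispatch function would be compiled with the default flags.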

  
## Reflection.h

#pragma once


#include <typestring/typestring.hh>


namespace rfl
{
	// Compile-time data member description

## TransientFunction.h
// TransientFunction: A light-weight alternative to std::function [C++11]
// Pass any callback - including capturing lambdas - cheaply and quickly as a
// function argument
//
// Based on:
// https://deplinenoise.wordpress.com/2014/02/23/using-c11-capturing-lambdas-w-vanilla-c-api-functions/
//
//  - No instantiation of called function at each call site
//  - Simple to use - use TransientFunction<> as the function argument
//  - Low cost: cheap setup, one indirect function call to invoke
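
The preview above stops at the header comment, so the actual implementation is not shown. As a rough illustration of the general idea (a non-owning callable wrapper that costs two pointers to set up and one indirect call to invoke), a minimal sketch might look like the following. `FunctionRef` and `apply_twice` are hypothetical names, and this is an assumption about the technique, not the gist's code.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

template <typename Signature>
class FunctionRef;  // hypothetical name for this sketch

template <typename R, typename... Args>
class FunctionRef<R(Args...)>
{
    void* target_;                   // address of the caller's callable (not owned)
    R (*thunk_)(void*, Args...);     // restores the callable's type and calls it

    template <typename F>
    static R invoke(void* target, Args... args)
    {
        return (*static_cast<F*>(target))(std::forward<Args>(args)...);
    }

public:
    // Accepts lambdas and functors (not plain function lvalues in this sketch).
    // The enable_if keeps this constructor from hijacking copy construction.
    template <typename F,
              typename = typename std::enable_if<
                  !std::is_same<typename std::decay<F>::type, FunctionRef>::value>::type>
    FunctionRef(F&& f)
        : target_(const_cast<void*>(static_cast<const void*>(std::addressof(f))))
        , thunk_(&invoke<typename std::remove_reference<F>::type>)
    {
    }

    R operator()(Args... args) const
    {
        return thunk_(target_, std::forward<Args>(args)...);
    }
};

// Hypothetical consumer: takes the wrapper by value as a function argument.
int apply_twice(FunctionRef<int(int)> f, int x)
{
    return f(f(x));
}
```

Because the wrapper only stores a pointer to the callable, it must not outlive it. Passing it as a function argument, as in `apply_twice(add, 1)` with `auto add = [k](int v) { return v + k; };`, is safe; storing it in a member for later use is not.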

## Tex2DCatmullRom.hlsl
// The following code is licensed under the MIT license: https://gist.github.com/TheRealMJP/bc503b0b87b643d3505d41eab8b332ae

// Samples a texture with Catmull-Rom filtering, using 9 texture fetches instead of 16.
// See http://vec3.ca/bicubic-filtering-in-fewer-taps/ for more details
float4 SampleTextureCatmullRom(in Texture2D<float4> tex, in SamplerState linearSampler, in float2 uv, in float2 texSize)
{
    // We're going to sample a 4x4 grid of texels surrounding the target UV coordinate. We'll do this by rounding
    // down the sample location to get the exact center of our "starting" texel. The starting texel will be at
    // location [1, 1] in the grid, where [0, 0] is the top left corner.
    float2 samplePos = uv * texSize;
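
The preview is truncated at this point. As a hedged illustration of the two steps the comments describe (snapping to the center of the starting texel, then weighting four neighboring texels per axis), here is a one-dimensional C++ sketch using the standard Catmull-Rom weights. The names `startingTexelCenter`, `catmullRomWeights`, and `catmullRom1D` are hypothetical, and the 9-fetch optimization, which merges taps using bilinear filtering, is not covered.

```cpp
#include <cassert>
#include <cmath>

struct CatmullRomWeights { float w0, w1, w2, w3; };

// Center of the "starting" texel (grid index 1) for a position in texel units.
// Texel centers sit at half-integer coordinates, hence the 0.5 offsets.
float startingTexelCenter(float samplePos)
{
    return std::floor(samplePos - 0.5f) + 0.5f;
}

// Standard Catmull-Rom weights for a fractional offset f in [0, 1],
// measured from the starting texel center. The four weights sum to 1.
CatmullRomWeights catmullRomWeights(float f)
{
    CatmullRomWeights w;
    w.w0 = f * (-0.5f + f * (1.0f - 0.5f * f));
    w.w1 = 1.0f + f * f * (-2.5f + 1.5f * f);
    w.w2 = f * (0.5f + f * (2.0f - 1.5f * f));
    w.w3 = f * f * (-0.5f + 0.5f * f);
    return w;
}

// 1D reconstruction from four neighboring samples p0..p3, where the
// interpolated position lies between p1 (f = 0) and p2 (f = 1).
float catmullRom1D(float p0, float p1, float p2, float p3, float f)
{
    const CatmullRomWeights w = catmullRomWeights(f);
    return w.w0 * p0 + w.w1 * p1 + w.w2 * p2 + w.w3 * p3;
}
```

In 2D the same weights are computed independently for x and y, and each of the 16 texels in the 4x4 grid is weighted by the product of its row and column weights.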

## orthodoxc++.md

      
bkaradzic / orthodoxc++.md
Last active November 3, 2025 04:11

Orthodox C++

This article has been updated and is available here.


## aes-ni.c
#include <stdint.h>     //for int8_t
#include <string.h>     //for memcmp
#include <wmmintrin.h>  //for intrinsics for AES-NI
//compile using gcc and following arguments: -g;-O0;-Wall;-msse2;-msse;-march=native;-maes

//internal stuff

//macros
#define DO_ENC_BLOCK(m,k) \
	do{\

## tinder-api-documentation.md

      
rtt / tinder-api-documentation.md
Last active October 6, 2025 20:20

Tinder API documentation

Note: this was written in April/May 2014 and the API has definitely changed since. I have nothing to do with Tinder, nor its API, and I do not offer any support for anything you may build on top of this. Proceed with caution.

http://rsty.org/
I've sniffed most of the Tinder API to see how it works. You can use this to create bots (etc) very trivially. Some example python bot code is here -> https://gist.github.com/rtt/5a2e0cfa638c938cca59 (horribly quick and dirty, you've been warned!)
