John Calsbeek Nexuapex

## TransientFunction.h
// TransientFunction: A light-weight alternative to std::function [C++11]
// Pass any callback - including capturing lambdas - cheaply and quickly as a
// function argument
//
// Based on:
// https://deplinenoise.wordpress.com/2014/02/23/using-c11-capturing-lambdas-w-vanilla-c-api-functions/
//
//  - No instantiation of called function at each call site
//  - Simple to use - use TransientFunction<> as the function argument
//  - Low cost: cheap setup, one indirect function call to invoke
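
The idea in the header comment above can be sketched as a non-owning callable reference: store a type-erased pointer to the caller's callable plus one thunk that knows how to invoke it. The name `FunctionRef` and the helper `SumMapped` are hypothetical and used only for illustration; this is not the gist's actual `TransientFunction` code. The sketch handles lambdas and function objects only, not plain function names.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical sketch of a non-owning callback wrapper (not the gist's code).
template <typename Signature> class FunctionRef;

template <typename R, typename... Args>
class FunctionRef<R(Args...)> {
    void* target_ = nullptr;                // points at the caller's callable; never owned
    R (*thunk_)(void*, Args...) = nullptr;  // the single indirect call used to invoke it

    // One thunk is instantiated per callable type, not per call site of the consumer.
    template <typename F>
    static R Invoke(void* target, Args... args) {
        return (*static_cast<F*>(target))(std::forward<Args>(args)...);
    }

public:
    // Accept any lambda or function object; exclude FunctionRef itself so copies stay copies.
    template <typename F,
              typename = typename std::enable_if<
                  !std::is_same<typename std::decay<F>::type, FunctionRef>::value>::type>
    FunctionRef(F&& f)
        : target_(const_cast<void*>(static_cast<const void*>(&f))),
          thunk_(&Invoke<typename std::remove_reference<F>::type>) {}

    R operator()(Args... args) const {
        assert(thunk_ != nullptr);
        return thunk_(target_, std::forward<Args>(args)...);
    }
};

// Hypothetical consumer: an ordinary (non-template) function taking a callback.
inline int SumMapped(const int* values, int count, FunctionRef<int(int)> map) {
    int total = 0;
    for (int i = 0; i < count; ++i) total += map(values[i]);
    return total;
}
```

Because the wrapper only borrows the callable, it is safe as a function argument (the lambda outlives the call) but should not be stored beyond the call, which is presumably what "transient" refers to.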

## effective_modern_cmake.md

      
mbinna / effective_modern_cmake.md (last active April 18, 2024 19:26; 1 file, 271 forks, 59 comments, 2538 stars)

Effective Modern CMake

Getting Started

For a brief user-level introduction to CMake, watch C++ Weekly, Episode 78, Intro to CMake by Jason Turner. LLVM’s CMake Primer provides a good high-level introduction to the CMake syntax. Go read it now.
After that, watch Mathieu Ropert’s CppCon 2017 talk Using Modern CMake Patterns to Enforce a Good Modular Design (slides). It provides a thorough explanation of what modern CMake is and why it is so much better than “old school” CMake. The modular design ideas in this talk are based on the book [Large-Scale C++ Software Design](https://www.amazon.de/Large-Scale-Soft

  
## mergesort_kit.cpp
#include <stdint.h>    // for int16_t
#include <emmintrin.h> // SSE2 integer intrinsics
#include <tmmintrin.h> // for PSHUFB; this isn't strictly necessary (see comments in reverse_s16)

typedef int16_t S16;
typedef __m128i Vec;

static inline Vec  load8_s16(const S16 *x)      { return _mm_loadu_si128((const __m128i *) x); }
static inline void store8_s16(S16 *x, Vec v)    { _mm_storeu_si128((__m128i *) x, v); }

static inline void sort_two(Vec &a, Vec &b)     { Vec t = a; a = _mm_min_epi16(a, b); b = _mm_max_epi16(b, t); }
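
As a usage sketch for the helpers above: `sort_two` is a lane-wise compare-exchange, so after the call each lane of `a` holds the smaller and each lane of `b` the larger of the two inputs. The wrapper `compare_exchange8` below is a hypothetical name added for illustration; the three helpers are repeated from the snippet so the block compiles on its own (x86 with SSE2 assumed).

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h> // SSE2

typedef int16_t S16;
typedef __m128i Vec;

static inline Vec  load8_s16(const S16 *x)   { return _mm_loadu_si128((const __m128i *) x); }
static inline void store8_s16(S16 *x, Vec v) { _mm_storeu_si128((__m128i *) x, v); }
static inline void sort_two(Vec &a, Vec &b)  { Vec t = a; a = _mm_min_epi16(a, b); b = _mm_max_epi16(b, t); }

// Hypothetical helper: compare-exchange two 8-element arrays lane by lane.
// Afterwards lo[i] <= hi[i] for every i; nothing is sorted *within* either array.
inline void compare_exchange8(S16 *lo, S16 *hi) {
    assert(lo != nullptr && hi != nullptr);
    Vec a = load8_s16(lo);
    Vec b = load8_s16(hi);
    sort_two(a, b);        // eight independent min/max pairs in two instructions
    store8_s16(lo, a);
    store8_s16(hi, b);
}
```

This is the building block of a sorting network: each call performs eight comparators at once, with no branches.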

## dram_latency_then_and_now.md

      
pervognsen / dram_latency_then_and_now.md (last active September 21, 2023 07:35; 1 file, 3 forks, 5 comments, 27 stars)

One thing that surprises newer programmers is that the older 8-bit microcomputers from the 70s and 80s were designed
to run at the speed of random memory access to DRAM and ROM. The C64 was released in 1982 when I was born and its
6502 CPU ran at 1 MHz (give or take depending on NTSC vs PAL). It had a 2-stage pipelined design that overlapped
execution and instruction fetch for the current and next instruction. Cycle counting was simple to understand
and master since it was based almost entirely on the number of memory accesses (1 cycle each), with a 1-cycle penalty
for taken branches because of the pipelined instruction fetch for the next sequential instruction. So, the entire
architecture was based on keeping the memory subsystem busy 100% of the time by issuing a read or write every cycle.
One-byte instructions with no memory operands like INX still take the minimum 2 cycles per instruction and end up
redundantly issuing the same memory request two cycles in a row.
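
The counting rule described above can be written down as a small model. This is a deliberately simplified sketch, not a full 6502 timing table: it ignores page-crossing penalties and other addressing-mode details, and the names `Instr`, `EstimateCycles` and `CyclesToNanoseconds` are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Simplified cycle model following the paragraph above (an approximation):
// one cycle per memory access, a minimum of two cycles per instruction,
// and one extra cycle when a branch is taken.
struct Instr {
    int opcode_bytes;   // bytes fetched for the instruction itself
    int data_accesses;  // additional operand reads/writes in memory
    bool taken_branch;  // true only for a branch that is taken
};

inline int EstimateCycles(const Instr& in) {
    assert(in.opcode_bytes >= 1);
    int cycles = std::max(2, in.opcode_bytes + in.data_accesses);
    return cycles + (in.taken_branch ? 1 : 0);
}

// At 1 MHz one cycle lasts 1000 ns, so cycle counts map directly to time.
inline long long CyclesToNanoseconds(int cycles, long long clock_hz = 1000000) {
    return cycles * (1000000000LL / clock_hz);
}
```

For example, one-byte INX is `{1, 0, false}` and comes out at the 2-cycle minimum, while LDA with a 3-byte absolute address and one data read is `{3, 1, false}` and takes 4 cycles.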

  
## bluenoise.md

      
pixelmager / bluenoise.md (last active October 11, 2023 07:05; 1 file, 10 forks, 1 comment, 107 stars)

Blue Noise links

Use cases

- Bluenoise in the game INSIDE (dithering, raymarching, reflections)
- Dithering, Ray marching, shadows etc
- A Survey of Blue Noise and Its Applications

Textures/Matrices for direct use (data!)

- Moments In Graphics (void-and-cluster)
  - 2D
  - 3D and 4D
- Bart Wronski Implementation of Solid Angle algorithm


## Tex2DCatmullRom.hlsl
// The following code is licensed under the MIT license: https://gist.github.com/TheRealMJP/bc503b0b87b643d3505d41eab8b332ae

// Samples a texture with Catmull-Rom filtering, using 9 texture fetches instead of 16.
// See http://vec3.ca/bicubic-filtering-in-fewer-taps/ for more details
float4 SampleTextureCatmullRom(in Texture2D<float4> tex, in SamplerState linearSampler, in float2 uv, in float2 texSize)
{
    // We're going to sample a 4x4 grid of texels surrounding the target UV coordinate. We'll do this by rounding
    // down the sample location to get the exact center of our "starting" texel. The starting texel will be at
    // location [1, 1] in the grid, where [0, 0] is the top left corner.
    float2 samplePos = uv * texSize;
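
The snippet above is cut off right after `samplePos`. As a one-dimensional C++ sketch of the step the comment describes, the code below rounds down to the center of the "starting" texel and evaluates the standard Catmull-Rom weights for the four texels around it. The struct and function names are hypothetical, the weight formulas are the textbook Catmull-Rom basis rather than text taken from the truncated source, and the 9-fetch optimization itself is not shown.

```cpp
#include <cassert>
#include <cmath>

struct CatmullRomTaps {
    float start;  // center of the "starting" texel (grid location 1), in texel units
    float w[4];   // weights for the texels at start-1, start, start+1, start+2
};

// One axis only; a 2D filter applies the same computation to x and y independently.
inline CatmullRomTaps ComputeTaps(float u, float texSize) {
    assert(texSize > 0.0f);
    float samplePos = u * texSize;

    CatmullRomTaps t;
    // Texel centers sit at half-integers, so shift by 0.5 before and after flooring.
    t.start = std::floor(samplePos - 0.5f) + 0.5f;

    // Fractional offset from the starting texel center, in [0, 1).
    float f = samplePos - t.start;

    // Standard Catmull-Rom basis; the four weights always sum to 1.
    t.w[0] = f * (-0.5f + f * (1.0f - 0.5f * f));
    t.w[1] = 1.0f + f * f * (-2.5f + 1.5f * f);
    t.w[2] = f * (0.5f + f * (2.0f - 1.5f * f));
    t.w[3] = f * f * (-0.5f + 0.5f * f);
    return t;
}
```

When the sample lands exactly on a texel center, `f` is 0 and the weights collapse to {0, 1, 0, 0}, so the filter reproduces that texel unchanged.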

## gist:e0f055bfb74e3d5f0af20690759de5a7
Why do compilers even bother with exploiting the undefinedness of signed overflow? And what are those
mysterious cases where it helps?

A lot of people (myself included) are against transforms that aggressively exploit undefined behavior, but
I think it's useful to know what compiler writers are accomplishing by this.

TL;DR: C doesn't work very well if int!=register width, but (for backwards compat) int is 32-bit on all
major 64-bit targets, and this causes quite hairy problems for code generation and optimization in some
fairly common cases. The signed overflow UB exploitation is an attempt to work around this.
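
A typical shape of the problem, as an illustrative sketch (the functions below are assumed examples, not code from the post): a 32-bit index used to address memory on a 64-bit target. Both functions return the same result for in-range inputs; the difference is in what the compiler is allowed to assume about `offset + i`.

```cpp
#include <cassert>
#include <cstdint>

// Signed 32-bit index: overflow of `offset + i` is undefined, so the compiler
// may assume it never happens and carry the index in a 64-bit register,
// stepping a pointer or 64-bit counter without re-extending each iteration.
inline std::int64_t SumSigned(const std::int32_t* data, int offset, int count) {
    assert(data != nullptr || count <= 0);
    std::int64_t sum = 0;
    for (int i = 0; i < count; ++i) sum += data[offset + i];
    return sum;
}

// Unsigned 32-bit index: `offset + i` is defined to wrap modulo 2^32, so the
// compiler has to preserve that wraparound (or prove it cannot occur) before
// it can widen the index to 64 bits.
inline std::int64_t SumUnsigned(const std::int32_t* data, unsigned offset, unsigned count) {
    assert(data != nullptr || count == 0);
    std::int64_t sum = 0;
    for (unsigned i = 0; i < count; ++i) sum += data[offset + i];
    return sum;
}
```

Whether a given compiler actually emits different code for the two loops depends on the target and optimization level; the sketch only shows which assumption each version permits.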

## cool-game-programming-blogs.opml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
    <head>
        <title>Graphics, Games, Programming, and Physics Blogs</title>
    </head>
    <body>
        <outline text="Tech News" title="Tech News">
            <outline type="rss" text="Ars Technica" title="Ars Technica" xmlUrl="http://feeds.arstechnica.com/arstechnica/index/" htmlUrl="https://arstechnica.com"/>
            <outline type="rss" text="Polygon - Full" title="Polygon - Full" xmlUrl="http://www.polygon.com/rss/index.xml" htmlUrl="https://www.polygon.com/"/>
            <outline type="rss" text="Road to VR" title="Road to VR" xmlUrl="http://www.roadtovr.com/feed" htmlUrl="https://www.roadtovr.com"/>

## BTree.cpp
enum { BMAX = 32, BCUT = BMAX / 2, BHEIGHT = 6 };

typedef uint8_t BIndex;

struct BNode {
    BIndex length;
    Key keys[BMAX];
    union {
        BNode *children[BMAX];
        Value values[BMAX];

## D3D9 reflection data reading
// NOTE: We parse the constant table by hand since shadergen has to link against the
// Xenon (Xbox 360) d3dx lib statically if Xbox builds are to be supported. That means
// we can't easily use non-Xenon D3DX from here.

// **** This part copy & pasted from D3DX headers (but it's a file format so it's consistent
// across versions)

//----------------------------------------------------------------------------
// D3DXSHADER_CONSTANTTABLE:
// -------------------------