Skip to content

Instantly share code, notes, and snippets.

View cdyk's full-sized avatar

Christopher Dyken cdyk

View GitHub Profile
@sebbbi
sebbbi / FastUniformLoadWithWaveOps.txt
Last active May 11, 2024 07:37
Fast uniform load with wave ops (up to 64x speedup)
In shader programming, you often run into a problem where you want to iterate an array in memory over all pixels in a compute shader
group (tile). Tiled deferred lighting is the most common case. 8x8 tile loops over a light list culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;
[numthreads(8, 8, 1)]
@rygorous
rygorous / conj_split_radix_fft.cpp
Created October 24, 2017 04:11
Reference conjugate-pair split-radix FFT impl
#include <complex>
static size_t const kMaxN = 2048; // This is the largest straight FFT we support
typedef std::complex<float> complexf; // or use whatever other complex number type you prefer
// This is a "mip chain" of precomputed twiddle factors
// Twiddles for length-1 start at index 1,
// twiddles for length-2 start at index 2,
// ...,
@rygorous
rygorous / mergesort_kit.cpp
Last active August 28, 2023 10:13
int16 mergesort construction kit
#include <emmintrin.h>
#include <tmmintrin.h> // for PSHUFB; this isn't strictly necessary (see comments in reverse_s16)
typedef int16_t S16;
typedef __m128i Vec;
static inline Vec load8_s16(const S16 *x) { return _mm_loadu_si128((const __m128i *) x); }
static inline void store8_s16(S16 *x, Vec v) { _mm_storeu_si128((__m128i *) x, v); }
static inline void sort_two(Vec &a, Vec &b) { Vec t = a; a = _mm_min_epi16(a, b); b = _mm_max_epi16(b, t); }
@NocturnDragon
NocturnDragon / Swizzles.h
Last active October 15, 2023 01:20
Swizzles in Clang, GCC, and Visual c++
#include <stdio.h>
// #define CLANG_EXTENSION
// Clang compile with -O3
#define VS_EXTENSION
// https://godbolt.org/z/sVWrF4
// Clang compile with -O3 -fms-compatibility
// VS2017 compile with /O3
@v3n
v3n / ogl_osx.md
Last active November 21, 2023 06:09
GLFW on OS X starting guide

OpenGL Development on OS X

While it's possible to download packages and install them manually, it's such a hassle. Fortunately for us, OS X has an unofficial package manager called http://brew.sh Let's install it. Open you Terminal and paste the following code:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Great. Homebrew will automatically install packages to /usr/local. Conveniently, that directory is already in your include and link paths.

@jboner
jboner / latency.txt
Last active May 25, 2024 01:28
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 3,000 ns 3 us
Send 1K bytes over 1 Gbps network 10,000 ns 10 us
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD
@rygorous
rygorous / gist:2156668
Last active April 16, 2024 11:18
float->half variants
// float->half variants.
// by Fabian "ryg" Giesen.
//
// I hereby place this code in the public domain, as per the terms of the
// CC0 license:
//
// https://creativecommons.org/publicdomain/zero/1.0/
//
// float_to_half_full: This is basically the ISPC stdlib code, except
// I preserve the sign of NaNs (any good reason not to?)