Skip to content

Instantly share code, notes, and snippets.

@Reedbeta
Reedbeta / cool-game-programming-blogs.opml
Last active December 29, 2023 10:02
List of cool blogs on game programming, graphics, theoretical physics, and other random stuff
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
<head>
<title>Graphics, Games, Programming, and Physics Blogs</title>
</head>
<body>
<outline text="Tech News" title="Tech News">
<outline type="rss" text="Ars Technica" title="Ars Technica" xmlUrl="http://feeds.arstechnica.com/arstechnica/index/" htmlUrl="https://arstechnica.com"/>
<outline type="rss" text="Polygon - Full" title="Polygon - Full" xmlUrl="http://www.polygon.com/rss/index.xml" htmlUrl="https://www.polygon.com/"/>
<outline type="rss" text="Road to VR" title="Road to VR" xmlUrl="http://www.roadtovr.com/feed" htmlUrl="https://www.roadtovr.com"/>
@cdwfs
cdwfs / cube-strip
Last active December 22, 2023 15:39
Rendering a cube as a 14-index triangle strip
; How to render a cube using a single 14-index triangle strip
; (Reposted from http://www.asmcommunity.net/forums/topic/?id=6284#post-45209)
; Drawbacks: no support for per-vertex normals, UVs, etc.
;
; 1-------3-------4-------2 Cube = 8 vertices
; | E __/|\__ A | H __/| =================
; | __/ | \__ | __/ | Single Strip: 4 3 7 8 5 3 1 4 2 7 6 5 2 1
; |/ D | B \|/ I | 12 triangles: A B C D E F G H I J K L
; 5-------8-------7-------6
; | C __/|
//
// Author: Jonathan Blow
// Version: 1
// Date: 31 August, 2018
//
// This code is released under the MIT license, which you can find at
//
// https://opensource.org/licenses/MIT
//
//

One thing that surprises newer programmers is that the older 8-bit microcomputers from the 70s and 80s were designed to run at the speed of random memory access to DRAM and ROM. The C64 was released in 1982 when I was born and its 6502 CPU ran at 1 MHz (give or take depending on NTSC vs PAL). It had a 2-stage pipelined design that was designed to overlap execution and instruction fetch for the current and next instruction. Cycle counting was simple to understand and master since it was based almost entirely on the number of memory accesses (1 cycle each), with a 1-cycle penalty for taken branches because of the pipelined instruction fetch for the next sequential instruction. So, the entire architecture was based on keeping the memory subsystem busy 100% of the time by issuing a read or write every cycle. One-byte instructions with no memory operands like INX still take the minimum 2 cycles per instruction and end up redundantly issuing the same memory request two cycles in a row.

why doesn't radfft support AVX on PC?

So there's two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.

Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang) that make all SSE code emitted also use VEX encoding, and at the time radfft was written there was no way in CDep to set compiler flags for just one file, just for the overall build.

[There's the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with this for several compiler versions, and VC++ has no equivalent, so we're not currently using that and just sticking with different compilation units.]

The other issue is to do with CPU power management.

@rygorous
rygorous / mergesort_kit.cpp
Last active August 28, 2023 10:13
int16 mergesort construction kit
#include <emmintrin.h>
#include <tmmintrin.h> // for PSHUFB; this isn't strictly necessary (see comments in reverse_s16)
typedef int16_t S16;
typedef __m128i Vec;
static inline Vec load8_s16(const S16 *x) { return _mm_loadu_si128((const __m128i *) x); }
static inline void store8_s16(S16 *x, Vec v) { _mm_storeu_si128((__m128i *) x, v); }
static inline void sort_two(Vec &a, Vec &b) { Vec t = a; a = _mm_min_epi16(a, b); b = _mm_max_epi16(b, t); }
@twoscomplement
twoscomplement / TransientFunction.h
Last active August 19, 2023 08:32
TransientFunction: A light-weight alternative to std::function [C++11]
// TransientFuction: A light-weight alternative to std::function [C++11]
// Pass any callback - including capturing lambdas - cheaply and quickly as a
// function argument
//
// Based on:
// https://deplinenoise.wordpress.com/2014/02/23/using-c11-capturing-lambdas-w-vanilla-c-api-functions/
//
// - No instantiation of called function at each call site
// - Simple to use - use TransientFunction<> as the function argument
// - Low cost: cheap setup, one indirect function call to invoke

I was told by @mmozeiko that Address Sanitizer (ASAN) works on Windows now. I'd tried it a few years ago with no luck, so this was exciting news to hear.

It was a pretty smooth experience, but with a few gotchas I wanted to document.

First, download and run the LLVM installer for Windows: https://llvm.org/builds/

Then download and install the VS extension if you're a Visual Studio 2017 user like I am.

It's now very easy to use Clang to build your existing MSVC projects since there's a cl compatible frontend:

//
// TinyCRT, revamp and TinyWin support by Don Williamson, 2011
// Based on http://www.codeproject.com/KB/library/tlibc.aspx and LIBCTINY by Matt Pietrek
//
#pragma once
#ifdef USE_DEFAULT_CRT