Adrien lamarqua

  • Montréal, Québec

Let's say you have to perform a 4-way case analysis and are given a choice between cascaded 2-way branches or a single 4-way branch.

What's the difference in branch misprediction penalties, assuming the 4 cases are random?

Denote the branch penalty by x (which is 15-20 cycles on Skylake according to Agner Fog).

With the 4-way branch, the worst-case branch penalty is x. The expected branch penalty is 3/4 x: with 4 equally likely targets, any single prediction is wrong 3/4 of the time, and each miss costs x.
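The arithmetic above can be checked with a minimal Python sketch. It assumes an idealized model that is not spelled out in the text: each case is uniformly random and independent, so an n-way branch is predicted correctly with probability 1/n. The function name `expected_penalty` and the choice x = 20 cycles are illustrative only. The cascaded figure is simply what this model gives for two independent 2-way branches, not a result quoted from the text.

```python
from fractions import Fraction


def expected_penalty(n_ways: int, penalty_cycles: Fraction) -> Fraction:
    """Expected misprediction cost of one n-way branch with uniformly random targets.

    Assumption: the predictor can do no better than guessing one of the
    n equally likely targets, so it misses with probability (n - 1) / n.
    """
    miss_probability = Fraction(n_ways - 1, n_ways)
    return miss_probability * penalty_cycles


# Illustrative value for x, taken from the upper end of the 15-20 cycle range.
x = Fraction(20)

# Single 4-way branch: one prediction, missed 3/4 of the time.
four_way = expected_penalty(4, x)

# Two cascaded 2-way branches (assumed independent): each is missed 1/2 of the time.
cascaded = 2 * expected_penalty(2, x)

print(f"4-way: {four_way} cycles, cascaded 2-way: {cascaded} cycles")
```

Using `Fraction` keeps the results exact, so the 3/4 factor shows up without floating-point noise.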

// x64 encoding
enum Reg {
RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
R8, R9, R10, R11, R12, R13, R14, R15,
};
enum XmmReg {
XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15,
@sereprz
sereprz / delete_from_calendar.py
Created April 7, 2020 20:31
A script to delete a bunch of events that were overlaid on my google calendar by mistake. See https://developers.google.com/calendar/quickstart/python for the initial setup
import datetime
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
SCOPES = 'https://www.googleapis.com/auth/calendar.events'
def main():
#if _WIN32
// measure the time in seconds to run f(). doubles will retain nanosecond precision.
template<typename F>
double timeit(F f) {
LARGE_INTEGER freq;
QueryPerformanceFrequency(&freq);
LARGE_INTEGER start;
QueryPerformanceCounter(&start);
f();
LARGE_INTEGER end;
def blocks(A, block_size=(1, 1)):
i, j = block_size
while len(A):
yield A[:i, :j], A[:i, j:], A[i:, :j], A[i:, j:]
A = A[i:, j:]
# 2400 ms for 1000x1000 matrix (~0.4 GFLOPS). Equivalent C code is only twice as fast (~0.8 GFLOPS).
# The reason the C code isn't much faster is that it's an O(n^3) algorithm and most of the time is spent in
# the O(n^2) kernel routine for the outer product in A11[:] -= A10 @ A10.T. Even if we posit that Python
# is 1000x slower than C for the outer loop, that's still 1000n + n^3 vs n^3, which is negligible for n = 1000.
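# A sketch of how the `blocks` generator above might drive an in-place factorization.
# Assumption: the preview only shows the generator and the update `A11[:] -= A10 @ A10.T`,
# which is the trailing update of a right-looking Cholesky factorization. The function
# below (`cholesky_inplace`, a hypothetical name) is a reconstruction under that
# assumption, not the gist's actual code. It requires NumPy and a float array.

```python
import numpy as np


def blocks(A, block_size=(1, 1)):
    """Yield the 2x2 partition (A00, A01, A10, A11) of A, then recurse into A11."""
    i, j = block_size
    while len(A):
        yield A[:i, :j], A[:i, j:], A[i:, :j], A[i:, j:]
        A = A[i:, j:]


def cholesky_inplace(A):
    """Overwrite a symmetric positive definite float array A with its lower Cholesky factor."""
    for A00, A01, A10, A11 in blocks(A):
        A00[:] = np.sqrt(A00)    # pivot: L00 = sqrt(A00), valid for 1x1 blocks
        A10[:] /= A00            # column below the pivot: L10 = A10 / L00
        A01[:] = 0               # clear the upper triangle
        A11[:] -= A10 @ A10.T    # O(n^2) rank-1 update of the trailing submatrix
    return A


# Small usage example on a random symmetric positive definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
S = M @ M.T + 6.0 * np.eye(6)
L = cholesky_inplace(S.copy())
```

# The slices yielded by `blocks` are views, so every assignment through `[:]` writes
# straight into the original array; that is why the loop body needs no index arithmetic.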
@rygorous
rygorous / rast.c
Created March 2, 2020 01:56
Simple watertight triangle rasterizer
// ---- triangle rasterizer
#define SUBPIXEL_SHIFT 8
#define SUBPIXEL_SCALE (1 << SUBPIXEL_SHIFT)
static RADINLINE S64 det2x2(S32 a, S32 b, S32 c, S32 d)
{
S64 r = (S64) a*d - (S64) b*c;
return r >> SUBPIXEL_SHIFT;
}
@mmalex
mmalex / allpass.md
Last active October 8, 2025 20:44
optimising allpass reverbs by using a single shared buffer

TLDR: if you've got a bunch of delays in series, for example all-pass filters in a reverb, put them all in a single big buffer and let them crawl over each other for a perf win!
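To make the TLDR concrete, here is a minimal Python sketch of the idea. It is not the author's code, which this preview does not show. Everything in it is an assumption made for illustration: the class name `SharedBufferAllpassChain`, the Schroeder allpass form, float samples, the single spare slot in the buffer, and the moving `base` index that makes the delays crawl through memory. A reference chain with one private buffer per allpass is included only to check that both produce the same output.

```python
from collections import deque


class SharedBufferAllpassChain:
    """Series of Schroeder allpass filters whose delays share one circular buffer."""

    def __init__(self, lengths, gain=0.5):
        self.lengths = list(lengths)
        self.gain = gain
        # Each delay owns a contiguous range of offsets relative to a moving base.
        self.offsets = []
        offset = 0
        for n in self.lengths:
            self.offsets.append(offset)
            offset += n
        # One spare slot so the last delay's read never lands on the first delay's write.
        self.size = offset + 1
        self.buf = [0.0] * self.size
        self.base = 0

    def process(self, x):
        # Moving the base back by one slot ages every stored sample by one step.
        self.base = (self.base - 1) % self.size
        for offset, n in zip(self.offsets, self.lengths):
            # The slot at offset + n holds what this delay wrote n samples ago.
            delayed = self.buf[(self.base + offset + n) % self.size]
            w = x + self.gain * delayed
            self.buf[(self.base + offset) % self.size] = w
            x = delayed - self.gain * w  # allpass output feeds the next stage
        return x


def reference_chain(lengths, gain, samples):
    """Same filters, but each allpass has its own little delay buffer."""
    lines = [deque([0.0] * n, maxlen=n) for n in lengths]
    out = []
    for x in samples:
        for line in lines:
            delayed = line[0]      # oldest sample, written len(line) steps ago
            w = x + gain * delayed
            line.append(w)         # maxlen drops the oldest sample
            x = delayed - gain * w
        out.append(x)
    return out


lengths = [3, 5, 7]
impulse = [1.0] + [0.0] * 63
chain = SharedBufferAllpassChain(lengths, gain=0.5)
shared_out = [chain.process(s) for s in impulse]
separate_out = reference_chain(lengths, 0.5, impulse)
```

In this sketch the only per-sample bookkeeping is the single `base` decrement, where the per-filter version has to advance one buffer position per allpass.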

recently I was fiddling around with my hobby reverb code, in preparation for porting it onto a smaller/slower CPU. I'd implemented a loop-of-allpass-filters type reverb, just like everybody else, and indeed, I basically had the classic 'OOP'ish abstraction of an 'allpass' struct that was, say, 313 samples long, and... did an allpass, on its own little float buffer[313]. (well, short integer, not float, but that's not relevant.) I'll write out the code in a moment.

but then I was browsing the internet one night, as you do, and stumbled on this old post by Sean Costello of Valhalla DSP fame - noting the sad passing of Alesis founder and general all-round DSP legend, Keith Barr. https://valhalladsp.com/2010/08/25/rip-keith-barr/

It's worth a read just for his wonderful anecdote about the birth of the midiverb - which spawned the thou

@pixelsnafu
pixelsnafu / CloudsResources.md
Last active October 9, 2025 16:27
Useful Resources for Rendering Volumetric Clouds

Volumetric Clouds Resources List

  1. A. Schneider, "Real-Time Volumetric Cloudscapes," in GPU Pro 7: Advanced Rendering Techniques, 2016, pp. 97-127. (Follow up presentations here, and here.)

  2. S. Hillaire, "Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite" in Physically Based Shading in Theory and Practice course, SIGGRAPH 2016. [video] [course notes] [scatter integral shadertoy]

  3. [R. Högfeldt, "Convincing Cloud Rendering – An Implementation of Real-Time Dynamic Volumetric Clouds in Frostbite"](https://odr.chalmers.se/hand

@Jolg42
Jolg42 / readme.md
Created January 25, 2020 14:13
Code Signing on macOS and Windows + Apple Notarization

macOS

Steps

  • What we want is to get a Developer Id https://developer.apple.com/developer-id/ to be able to sign the binaries for distribution.
  • The company needs to get an Apple Developer Account Membership for macOS for $99/y https://developer.apple.com/programs/enroll/
  • Apple needs a D-U-N-S® Number to register the account; the person doing the registration will need to get in touch with somebody who knows the legal side.
  • The registration could take a couple of days.
  • When done, a certificate can be created for signing; you'll need to sync it with Xcode.
  • Now the binary can be signed, and the signature can be verified.
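
The last step above (sign, then verify) can be sketched as a small Python wrapper around Apple's `codesign` tool. This is a hedged illustration, not the commands from the original notes: the helper names, the binary path and the identity string are hypothetical placeholders. `--options runtime` (hardened runtime) and `--timestamp` are included on the assumption that the binary will later be notarized.

```python
import subprocess


def sign_command(binary_path, identity):
    """Build the codesign invocation that signs binary_path with a Developer ID identity."""
    return [
        "codesign",
        "--sign", identity,       # certificate name as it appears in the keychain
        "--options", "runtime",   # enable the hardened runtime
        "--timestamp",            # request a secure timestamp
        binary_path,
    ]


def verify_command(binary_path):
    """Build the codesign invocation that checks the signature on binary_path."""
    return ["codesign", "--verify", "--strict", "--verbose=2", binary_path]


def sign_and_verify(binary_path, identity):
    """Sign, then verify; raises CalledProcessError if either step fails (macOS only)."""
    subprocess.run(sign_command(binary_path, identity), check=True)
    subprocess.run(verify_command(binary_path), check=True)


# Hypothetical placeholders; replace with your own certificate name and binary.
EXAMPLE_IDENTITY = "Developer ID Application: Example Corp (TEAMID1234)"
EXAMPLE_BINARY = "dist/MyApp.app"
```

Keeping the command construction separate from execution makes it easy to log or review the exact arguments before running them on a build machine.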
@wtaysom
wtaysom / bel-eve-vr.md
Last active January 24, 2024 03:25
A Review of Paul Graham's Bel, Chris Granger's Eve, and a Silly VR Rant

Hello Friends,

This elf begging to climb onto the web for Christmas began as a personal email, a review of Paul Graham's little Lisp Bel. He sprouted arms, legs, and in gingerstyle ran away. Arms for symbols, legs for conses: these primitives are the mark of a Lisp — even more so than the parentheses. What do we get when we remove these foundation stones: naming and pairing?

No pairs. No cons. No structure. Unordered. Chaos. Eve, a beautifully incomplete aspect oriented triple store. No need for legs when you can effortlessly transport to your destination. Lazy. Pure. Here and now, a retrospective.

No symbols. No names. No variables. Combinators. Forth. No need for arms when you can effortlessly push and pop your stack. No words. A world without words. Virtual worlds. Virtual reality. Space. Time. Motion. Action. Kinetic Programming, a proposal.

I apologize in advance. Checking my pocketwatch, I see I haven't t