why doesn't radfft support AVX on PC?

So there are two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.

Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang). Those settings make all SSE code emitted in that file use VEX encoding as well. At the time radfft was written, CDep had no way to set compiler flags for just one file, only for the overall build.

[There are the GCC "target" annotations on individual functions, which in principle fix this, but I ran into nasty problems with them across several compiler versions, and VC++ has no equivalent, so we're not currently using them and are just sticking with separate compilation units.]
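For reference, here is a minimal sketch of what the per-function "target" annotation looks like with GCC or Clang on x86. This is not radfft code and not the approach we use: `scale`, `scale_sse` and `scale_avx` are hypothetical names made up for illustration, and the sketch assumes a compiler new enough to accept AVX intrinsics inside a function marked `target("avx")` while the rest of the file is built without -mavx.

```c
#include <stddef.h>
#include <immintrin.h>

/* Baseline path: built with the file's default flags (SSE, no VEX). */
static void scale_sse(float *data, size_t n, float s)
{
    __m128 vs = _mm_set1_ps(s);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(data + i, _mm_mul_ps(_mm_loadu_ps(data + i), vs));
    for (; i < n; i++)          /* scalar tail */
        data[i] *= s;
}

/* AVX path: the attribute enables AVX code generation for this one
   function only, so no separate file or -mavx flag is needed. */
__attribute__((target("avx")))
static void scale_avx(float *data, size_t n, float s)
{
    __m256 vs = _mm256_set1_ps(s);
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(data + i, _mm256_mul_ps(_mm256_loadu_ps(data + i), vs));
    for (; i < n; i++)          /* scalar tail */
        data[i] *= s;
}

/* Hypothetical entry point: pick a path at run time. */
void scale(float *data, size_t n, float s)
{
    if (__builtin_cpu_supports("avx"))
        scale_avx(data, n, s);
    else
        scale_sse(data, n, s);
}
```

With separate compilation units the dispatch in `scale` looks the same; the only difference is that `scale_avx` lives in its own file built with -mavx (or /arch:AVX), which is the variant that also works with VC++.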

The other issue has to do with CPU power management.
