Fabian 'ryg' Giesen rygorous

rygorous / gist:a549832e23b913ac70237d23c1600f8a
Created Aug 16, 2019
pseudo-ucode expansion for LOOP <dest>
View gist:a549832e23b913ac70237d23c1600f8a
lea rcx, [rcx-1] ; decrement rcx w/o flag update
mov temp0, rax ; save rax that we're about to trash
lahf ; save original flags
test rcx, rcx ; check whether updated rcx is zero
setz temp1 ; temp1 = 1 if rcx=0, 0 otherwise
sahf ; restore flags
mov rax, temp0 ; restore rax
jecxz temp1, dest ; jump if temp1 is zero, not rcx (doesn't exist in regular ISA but rcx is renamed anyway so the internal uop can do any source)
NOTE the actual ucode expansion probably doesn't have the MOVs since I would expect the internal LAHF/SAHF uops
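To sanity-check the control flow, here is a small Python model of the architectural behavior the expansion above has to reproduce: LOOP decrements rcx without touching the flags and branches while the result is nonzero. The function name `loop_step` and the dict-based machine state are invented for illustration; this is a sketch of the visible semantics, not a claim about what the real microcode does.

```python
MASK64 = (1 << 64) - 1

def loop_step(state):
    """Model one execution of LOOP in 64-bit mode.

    `state` is a hypothetical dict with 'rcx' and 'flags' entries.
    Returns True when the branch to dest is taken.
    """
    flags_before = state["flags"]
    # lea rcx, [rcx-1]: decrement with 64-bit wraparound, no flag update
    state["rcx"] = (state["rcx"] - 1) & MASK64
    # test + setz: temp1 = 1 if the updated rcx is zero, 0 otherwise
    temp1 = 1 if state["rcx"] == 0 else 0
    # sahf: later instructions see the flags as they were before LOOP
    state["flags"] = flags_before
    # final uop: jump when temp1 is zero, i.e. when rcx is still nonzero
    return temp1 == 0
```

Starting from rcx = 3, three calls give taken, taken, not taken, and the flags value is unchanged throughout.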
rygorous / b.bat
Created Aug 9, 2019
Histogram code with all the tricks :) Needs NASM + VC++
View b.bat
@echo off
setlocal
cd %~dp0
call vcvars amd64
..\..\bin\win32\nasm -f win64 -g -o histo_asm.obj histo_asm.nas || exit /b 1
cl /Zi /O2 /nologo histotest.cpp histo_asm.obj || exit /b 1
rygorous / rcpss.cpp
Created Jul 8, 2019
Tabled rcpss (should match HW version for stated value range on Skylake, anyway)
View rcpss.cpp
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <emmintrin.h>
static uint32_t recip(uint32_t bits)
{
    uint32_t u;
    float f;
    memcpy(&f, &bits, sizeof(bits));
rygorous / convergents.py
Created Jun 6, 2019
Approximate rational fractions using convergents of the continued fraction expansion
View convergents.py
# Returns (exact, p, q) where p/q is an approximation to numer/denom and exact is true if
# the approximation is exact.
def convergent(numer, denom, limit):
    """Find an approximation to numer/denom with neither numerator nor denominator above limit"""
    prev_p, cur_p = 0, 1
    prev_q, cur_q = 1, 0
    rest_p = numer
    rest_q = denom
    while rest_q != 0:
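The preview stops at the loop header. As a standalone illustration of the technique named in the description, here is a minimal sketch of the textbook convergent recurrence with the same `(exact, p, q)` return convention. The name `convergent_sketch` is made up to keep it separate from the original, and the loop body is the standard recurrence under the assumption of positive integer inputs, not a reconstruction of the gist's code.

```python
def convergent_sketch(numer, denom, limit):
    """Return (exact, p, q) with p/q a convergent of numer/denom, p and q <= limit.

    Assumes numer and denom are positive integers. If even the integer
    part of numer/denom exceeds limit, the initial placeholder 1/0 is returned.
    """
    prev_p, cur_p = 0, 1
    prev_q, cur_q = 1, 0
    rest_p, rest_q = numer, denom
    while rest_q != 0:
        # Next continued-fraction term and the convergent it produces
        a = rest_p // rest_q
        next_p = a * cur_p + prev_p
        next_q = a * cur_q + prev_q
        if next_p > limit or next_q > limit:
            # The next convergent is too large: stop at the current one
            return (False, cur_p, cur_q)
        prev_p, cur_p = cur_p, next_p
        prev_q, cur_q = cur_q, next_q
        # Euclidean step on the remainder
        rest_p, rest_q = rest_q, rest_p - a * rest_q
    return (True, cur_p, cur_q)
```

For example, `convergent_sketch(314159, 100000, 1000)` walks through 3/1, 22/7, 333/106 and stops at 355/113, reporting it as inexact.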
rygorous / conformance_basic_table.xml
Created May 29, 2019
gstpeaq self-compiled conformance test results
View conformance_basic_table.xml
<table frame="none" id="conformance_basic_table">
<title>Conformance test results for the basic version.</title>
<tgroup cols='4' align='right' colsep='1' rowsep='1'>
<colspec align='left' />
<thead>
<row>
<entry>Item</entry>
<entry>Reference DI</entry>
<entry>Actual DI</entry>
<entry>Difference</entry>
View gist:a9876b67bef4decb781ab4976a0f6197
496238dd984784c52f3a5dd6762fb711 acodsna.wav
378fbbd218516a8a17ba7ef1ab0170a9 arefsna.wav
498b16c46d9a9f2812a6b4393c26d3ba bcodtri.wav
c02748a8c6665c4e24832a1fa9144765 breftri.wav
32018f15be1272bf70e4bf9c6dbbe585 ccodsax.wav
81c650d658f9354f75cf32bceea10788 crefsax.wav
dc3acf34ae6a8b5087a9916bd687ce7a ecodsmg.wav
96538db2ef9c73c49eab483786109336 erefsmg.wav
6ad9c08fa978cbe2688880451b84d291 fcodsb1.wav
b300060ccda896138d2decbae5310f9f fcodtr1.wav
rygorous / fifth_root.c
Created May 23, 2019
Fifth root for doubles.
View fifth_root.c
static double
fifth_root_pos_finite(double x)
{
    static const double fifth_roots_pow2[9] = { // 2^((i-4)/5)
        0.57434917749851754908974044155911542475223541259766,
        0.65975395538644709958475687017198652029037475585937,
        0.75785828325519899451023775327485054731369018554687,
        0.87055056329612412469032278750091791152954101562500,
        1.00000000000000000000000000000000000000000000000000,
        1.14869835499703509817948088311823084950447082519530,
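The listing is cut off inside the table, so the rest of the function is not visible here. One plausible way to use a table of 2^((i-4)/5) is range reduction: split x into a mantissa and a binary exponent, divide the exponent by 5 with truncation so the remainder lands in -4..4, and fix up the result with a table entry. The sketch below does that in Python, with a plain Newton iteration standing in for whatever the original does on the reduced argument. `fifth_root_sketch` and `FIFTH_ROOTS_POW2` are illustrative names, and the result is accurate to a few ulps at best, not correctly rounded.

```python
import math

# 2**((i - 4) / 5) for i = 0..8, same layout as the table in the listing
FIFTH_ROOTS_POW2 = [2.0 ** ((i - 4) / 5.0) for i in range(9)]

def fifth_root_sketch(x):
    """Approximate x**(1/5) for positive finite x via range reduction."""
    if not (x > 0.0 and math.isfinite(x)):
        raise ValueError("expects a positive finite value")
    # x = m * 2**e with 0.5 <= m < 1
    m, e = math.frexp(x)
    # e = 5*q + r with truncating division, so r is in -4..4
    q = abs(e) // 5
    if e < 0:
        q = -q
    r = e - 5 * q
    # Newton iteration for y**5 = m, started above the root (assumed
    # stand-in for the original's method on the reduced argument)
    y = 1.0
    for _ in range(6):
        y -= (y ** 5 - m) / (5.0 * y ** 4)
    # x**(1/5) = m**(1/5) * 2**(r/5) * 2**q
    return math.ldexp(y * FIFTH_ROOTS_POW2[r + 4], q)
```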
View gist:422f8d8382eb61a55c087714c92f0d4e
// Baseline version without prefetch
static const LRMEntry * lrm_search_one_basic(const LRM * lrm, const U8 * ptr)
{
    LRM_hash_t hash = lrm_hash8(ptr);
    // Jump-in: narrow down the search interval using the jump table
    LRM_hash_t ji = hash >> lrm->jumpInShift;
    S32 jump1 = lrm->jumpIn[ji];
    S32 jump2 = lrm->jumpIn[ji + 1];
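The jump-in step can be tried out in isolation. The sketch below assumes a sorted list of integer hashes and a table where entry k holds the index of the first hash whose top bits are at least k, so two adjacent entries bracket the candidates. What the original does inside that interval is not visible in the preview; a binary search is used here as a stand-in. All names (`build_jump_table`, `find_hash`, `hash_bits`, `table_bits`) are hypothetical.

```python
import bisect

def build_jump_table(sorted_hashes, hash_bits, table_bits):
    """Return (jump, shift) for a sorted list of hash_bits-wide hashes."""
    shift = hash_bits - table_bits
    # jump[k] = index of the first hash h with (h >> shift) >= k;
    # the extra final entry equals len(sorted_hashes)
    jump = [bisect.bisect_left(sorted_hashes, k << shift)
            for k in range((1 << table_bits) + 1)]
    return jump, shift

def find_hash(sorted_hashes, jump, shift, h):
    """Return the index of h in sorted_hashes, or -1 if it is absent."""
    # Jump-in: the top bits of the hash select a narrow interval
    ji = h >> shift
    lo, hi = jump[ji], jump[ji + 1]
    # Assumed stand-in: binary search restricted to [lo, hi)
    pos = bisect.bisect_left(sorted_hashes, h, lo, hi)
    if pos < hi and sorted_hashes[pos] == h:
        return pos
    return -1
```

With a table of 2^table_bits buckets, the search interval shrinks by roughly that factor for well-distributed hashes before any comparison is made.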
View gist:9387cbe6f33708adf91ed95e413bddc0
; input in Q0..Q7
; swap antidiagonal elements within 2x2 blocks
vtrn.16 q0, q1
vtrn.16 q2, q3
vtrn.16 q4, q5
vtrn.16 q6, q7
; swap antidiagonal 2x2 blocks within 4x4 blocks
vtrn.32 q0, q2
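The comments describe a transpose built from block swaps: first within 2x2 blocks, then 2x2 blocks within 4x4 blocks, and (presumably, since the listing is cut off) 4x4 blocks within the full 8x8. The Python sketch below models that scheme on a list of rows so the data movement can be checked without NEON hardware. The function name is invented, and the loop treats every stage uniformly rather than mapping one-to-one onto specific instructions.

```python
def transpose_by_block_swaps(rows):
    """Transpose a square matrix whose size is a power of two.

    At each stage with block size `step`, the top-right and bottom-left
    step x step sub-blocks of every 2*step x 2*step block are exchanged.
    """
    n = len(rows)
    m = [list(r) for r in rows]
    step = 1
    while step < n:
        for i in range(n):
            for j in range(n):
                # (i, j) lies in a top-right sub-block: swap it with its
                # counterpart in the bottom-left sub-block
                if (i & step) == 0 and (j & step) != 0:
                    i2, j2 = i + step, j - step
                    m[i][j], m[i2][j2] = m[i2][j2], m[i][j]
        step *= 2
    return m
```

Each stage swaps one bit of the row index with the same bit of the column index, so after log2(n) stages every element has moved from (i, j) to (j, i).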
rygorous / gist:f5e11f6589088f42dddd2711582afb1e
Last active Mar 21, 2019
Single-precision FMA using double-precision mul/add ops
View gist:f5e11f6589088f42dddd2711582afb1e
We agreed beforehand that we don't care about tininess-before-or-after-rounding shenanigans. :)
With that out of the way, the idea is to do something like this:
fma(af, bf, cf):
    # All arithmetic done with RN
    ad = float_to_double(af)
    bd = float_to_double(bf)
    cd = float_to_double(cf)
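The rest of the write-up is not visible in this preview, but one fact behind the setup is easy to verify: a float has a 24-bit significand, so the product of two floats needs at most 48 bits, which fits in a double's 53, and the doubled exponent range of float stays well inside double's. That makes ad * bd exact. The sketch below checks this with exact rational arithmetic. The helper names are made up, and this only illustrates the exact-product property, not the remainder of the algorithm.

```python
import struct
from fractions import Fraction

def to_float32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def product_is_exact(a, b):
    """True if the double-precision product a * b incurs no rounding."""
    return Fraction(a) * Fraction(b) == Fraction(a * b)

# Values that are exactly representable as float32
af = to_float32(1.0 / 3.0)
bf = to_float32(0.1)
exact_for_floats = product_is_exact(af, bf)

# Full-precision doubles generally do not have this property
exact_for_doubles = product_is_exact(1.0 / 3.0, 0.1)
```

Here `exact_for_floats` is true while `exact_for_doubles` is false, which is why the conversion to double up front buys an error-free multiply.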