GCN hacks for classify op...
// GCN hack variant 1 (assumes shader in flush-denorms mode; vA, vB, vC are floats, vOut is int)
v_med3_i32 vOut, vA, #0, #2
v_med3_i32 vTmp, vB, #0, #4
v_sad_u32 vOut, vOut, #0, vTmp
v_med3_i32 vTmp, vC, #0, #8
v_sad_u32 vOut, vOut, #0, vTmp
// GCN hack variant 1b (assumes shader in flush-denorms mode; vA, vB, vNC are floats (NC = negative C), vOut is int)
v_med3_i32 vOut, vA, #0, #2
v_med3_i32 vTmp, vB, #0, #4
v_med3_i32 vTmp2, vNC, #0, #-8
v_sad_u32 vOut, vOut, vTmp2, vTmp
// GCN hack variant 2
v_cmpgt_f32 vcc, vC, #0
v_cndmask_b32 vOut, #1, #0, vcc
v_cmpgt_f32 vcc, vB, #0
v_addc_u32 vOut, vOut, vOut, vcc
v_cmpgt_f32 vcc, vA, #0
v_addc_u32 vOut, vOut, vOut, vcc
v_lshlrev_b32 vOut, #1, vOut
// GCN hack variant 3 (should be strictly better than 2)
v_cmpgt_f32 vcc, vA, #0
v_cndmask_b32 vOut, #2, #0, vcc
v_cmpgt_f32 vcc, vB, #0
v_cndmask_b32 vTmp1, #4, #0, vcc
v_cmpgt_f32 vcc, vC, #0
v_cndmask_b32 vTmp2, #-8, #0, vcc
v_sad_u32 vOut, vTmp1, vTmp2, vOut
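For readers without the ISA manual at hand, here is a rough C model of variant 1. It is only a sketch: the helper names (`float_bits`, `med3_i32`, `sad_u32`, `classify_v1`) are made up for illustration, and the helpers model the instructions as "median of three signed ints" and "|s0 - s1| + s2 on unsigned operands". The point is that a positive, non-denormal float has a large positive bit pattern when read as an int, so the median against 0 and a small constant k yields k, while zero and negative inputs yield 0. A denormal input would land somewhere between 0 and k, which is presumably why the flush-denorms assumption is stated.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bit pattern as a signed 32-bit integer. */
static int32_t float_bits(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Median of three signed integers (models v_med3_i32). */
static int32_t med3_i32(int32_t a, int32_t b, int32_t c) {
    int32_t lo = a < b ? a : b;
    int32_t hi = a < b ? b : a;
    if (c < lo) return lo;
    if (c > hi) return hi;
    return c;
}

/* |s0 - s1| + s2 on unsigned operands (models v_sad_u32). */
static uint32_t sad_u32(uint32_t s0, uint32_t s1, uint32_t s2) {
    uint32_t diff = s0 > s1 ? s0 - s1 : s1 - s0;
    return diff + s2;
}

/* Variant 1: (a > 0 ? 2 : 0) + (b > 0 ? 4 : 0) + (c > 0 ? 8 : 0),
   assuming none of the inputs is denormal. */
static uint32_t classify_v1(float a, float b, float c) {
    uint32_t out = (uint32_t)med3_i32(float_bits(a), 0, 2);
    uint32_t tmp = (uint32_t)med3_i32(float_bits(b), 0, 4);
    out = sad_u32(out, 0, tmp);   /* with s1 = 0 this is a plain add */
    tmp = (uint32_t)med3_i32(float_bits(c), 0, 8);
    out = sad_u32(out, 0, tmp);
    return out;
}
```

With this model, `classify_v1(1.0f, -1.0f, 2.0f)` sets the A and C contributions (2 and 8) and leaves B at 0.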
@EricLengyel

I ended up reformulating the LUT (easy) to take values given by ((a<0)?1:0) + ((b<0)?2:0) + ((c<0)?4:0) and using the following code, where A, B, and C are floats being reinterpreted as uint, int, and int, respectively.

v_lshrrev_b32   vOut, 31, vA
v_ashrrev_i32   vTmp, 31, vB
v_bfi_b32       vOut, 2, vTmp, vOut
v_ashrrev_i32   vTmp, 31, vC
v_bfi_b32       vOut, 4, vTmp, vOut
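A rough C equivalent of the sequence above, as a sketch only: the helper names (`float_bits_u32`, `bfi_b32`, `classify_signbits`) are hypothetical, and `bfi_b32` models the bitfield insert as `(mask & s1) | (~mask & s2)`. The arithmetic shift by 31 is written as a negation of the sign bit so the C code does not depend on implementation-defined signed shifts.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bit pattern as an unsigned 32-bit integer. */
static uint32_t float_bits_u32(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Bitfield insert (models v_bfi_b32): bits of s1 where mask is set,
   bits of s2 elsewhere. */
static uint32_t bfi_b32(uint32_t mask, uint32_t s1, uint32_t s2) {
    return (mask & s1) | (~mask & s2);
}

/* Builds signbit(a) + 2 * signbit(b) + 4 * signbit(c). */
static uint32_t classify_signbits(float a, float b, float c) {
    /* Logical shift: 1 if the sign bit of a is set, else 0. */
    uint32_t out = float_bits_u32(a) >> 31;

    /* Same result as an arithmetic shift right by 31:
       all ones if the sign bit is set, else zero. */
    uint32_t tmp = 0u - (float_bits_u32(b) >> 31);
    out = bfi_b32(2u, tmp, out);   /* copy sign of b into bit 1 */

    tmp = 0u - (float_bits_u32(c) >> 31);
    out = bfi_b32(4u, tmp, out);   /* copy sign of c into bit 2 */
    return out;
}
```

One difference from a literal `a < 0` comparison: this looks only at the sign bit, so `-0.0f` is classified as negative.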
