GCN hacks for classify op...
// GCN hack variant 1 (assumes shader in flush-denorms mode; vA, vB, vC are floats, vOut is int)
v_med3_i32 vOut, vA, #0, #2
v_med3_i32 vTmp, vB, #0, #4
v_sad_u32 vOut, vOut, #0, vTmp
v_med3_i32 vTmp, vC, #0, #8
v_sad_u32 vOut, vOut, #0, vTmp

// GCN hack variant 1b (assumes shader in flush-denorms mode; vA, vB, vNC are floats (NC = negative C), vOut is int)
v_med3_i32 vOut, vA, #0, #2
v_med3_i32 vTmp, vB, #0, #4
v_med3_i32 vTmp2, vNC, #0, #-8
v_sad_u32 vOut, vOut, vTmp2, vTmp

// GCN hack variant 2
v_cmpgt_f32 vcc, vC, #0
v_cndmask_b32 vOut, #1, #0, vcc
v_cmpgt_f32 vcc, vB, #0
v_addc_u32 vOut, vOut, vOut, vcc
v_cmpgt_f32 vcc, vA, #0
v_addc_u32 vOut, vOut, vOut, vcc
v_lshlrev_b32 vOut, #1, vOut

// GCN hack variant 3 (should be strictly better than 2)
v_cmpgt_f32 vcc, vA, #0
v_cndmask_b32 vOut, #2, #0, vcc
v_cmpgt_f32 vcc, vB, #0
v_cndmask_b32 vTmp1, #4, #0, vcc
v_cmpgt_f32 vcc, vC, #0
v_cndmask_b32 vTmp2, #-8, #0, vcc
v_sad_u32 vOut, vTmp1, vTmp2, vOut
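
For readers who want to check variant 1 on a CPU, here is a minimal Python sketch of what it appears to compute. The helper names (`float_bits_as_i32`, `med3_i32`, `sad_u32`, `classify_variant1`, `classify_reference`) are hypothetical, and the instruction semantics are assumptions: `v_med3_i32` is modelled as the median of three signed integers and `v_sad_u32` as `|s0 - s1| + s2` on unsigned 32-bit values. The reference formula is one reading of the listing, not something the gist states.

```python
import struct


def float_bits_as_i32(x: float) -> int:
    """Reinterpret the bits of a float32 as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]


def med3_i32(a: int, b: int, c: int) -> int:
    """Median of three signed integers (assumed model of v_med3_i32)."""
    return sorted((a, b, c))[1]


def sad_u32(s0: int, s1: int, s2: int) -> int:
    """|s0 - s1| + s2 on unsigned 32-bit values (assumed model of v_sad_u32)."""
    s0 &= 0xFFFFFFFF
    s1 &= 0xFFFFFFFF
    s2 &= 0xFFFFFFFF
    return (abs(s0 - s1) + s2) & 0xFFFFFFFF


def classify_variant1(a: float, b: float, c: float) -> int:
    """Step-by-step emulation of the variant 1 listing."""
    out = med3_i32(float_bits_as_i32(a), 0, 2)   # 2 if a > 0 else 0
    tmp = med3_i32(float_bits_as_i32(b), 0, 4)   # 4 if b > 0 else 0
    out = sad_u32(out, 0, tmp)                   # out + tmp
    tmp = med3_i32(float_bits_as_i32(c), 0, 8)   # 8 if c > 0 else 0
    out = sad_u32(out, 0, tmp)                   # out + tmp
    return out


def classify_reference(a: float, b: float, c: float) -> int:
    """Plain formulation of the index variant 1 seems to target (an assumption)."""
    return (2 if a > 0 else 0) + (4 if b > 0 else 0) + (8 if c > 0 else 0)
```

The median trick works because a positive normal float, viewed as a signed integer, is far larger than the small clamp constant, while zero and negative floats are less than or equal to 0. A positive denormal has a small positive bit pattern that can fall between the clamp bounds, which is presumably why the listing assumes flush-denorms mode.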
I ended up reformulating the LUT (easy) to take index values given by ((a<0)?1:0) + ((b<0)?2:0) + ((c<0)?4:0), and using the following code, where A, B, and C are floats reinterpreted as uint, int, and int, respectively.