animetosho/gf2p8affineqb-articles.md

## gf2p8affineqb-articles.md

      
## Unexpected Uses for the Galois Field Affine Transformation Instruction

Intel added the Galois Field New Instructions (GFNI) extension to their Sunny Cove and Tremont cores. What’s particularly interesting is that GFNI is the only new SIMD extension that came with SSE and VEX/AVX encodings (in addition to EVEX/AVX512), which allows it to be supported on all future Intel cores, including those which don’t support AVX512 (such as the Atom line, as well as Celeron/Pentium branded “big” cores).
I suspect GFNI was aimed at accelerating SM4 encryption; however, one of the instructions can be used for many other purposes. The extension includes three instructions (GF2P8AFFINEQB, GF2P8AFFINEINVQB and GF2P8MULB), but of particular interest here is the affine transformation instruction, GF2P8AFFINEQB, which is effectively an 8x8 bit-matrix multiply.
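To make “bit-matrix multiply” concrete, here is a minimal scalar sketch in C of what the instruction computes for a single byte, based on my reading of the pseudocode in Intel’s instruction reference. The names `parity8`, `affine_byte`, `GF_IDENTITY` and `GF_BITREV` are made up for this illustration and are not part of any Intel API. The real instruction applies the same operation to every byte of a vector, taking the 64-bit matrix from the corresponding qword of the second source operand.

```c
#include <stdint.h>

/* Parity (XOR of all bits) of a byte. */
static unsigned parity8(uint8_t v) {
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Scalar model of GF2P8AFFINEQB for one byte (illustrative, not an intrinsic).
 * Result bit i = parity(matrix byte [7 - i] AND x) XOR bit i of imm8,
 * where matrix byte 0 is the least significant byte of the qword. */
static uint8_t affine_byte(uint64_t matrix, uint8_t x, uint8_t imm8) {
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t row = (uint8_t)(matrix >> (8 * (7 - i)));
        out |= (uint8_t)(parity8(row & x) << i);
    }
    return out ^ imm8;
}

/* Row for output bit i selects input bit i: leaves the byte unchanged. */
#define GF_IDENTITY 0x0102040810204080ULL
/* Row for output bit i selects input bit 7 - i: reverses the bits. */
#define GF_BITREV   0x8040201008040201ULL
```

With these definitions, `affine_byte(GF_IDENTITY, 0xA5, 0x00)` returns `0xA5`, `affine_byte(GF_IDENTITY, 0xA5, 0xFF)` returns `0x5A` (the immediate is XORed in after the matrix product), and `affine_byte(GF_BITREV, 0x1D, 0x00)` returns `0xB8`.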
Various articles discuss out-of-band use cases for the instruction; however, they’re somewhat spread around, so rather than re-explain it all, this is just a listing of them.
### Articles


- Why Ice Lake is Important (a bit-basher’s perspective)
  - Bit permutation within bytes (8-bit shift/rotate), 8x8 bit matrix transpose
- Use AVX512 Galois field affine transformation for bit shuffling
  - Provides more of an explanation of the first article, plus examples
  - Additional samples: bit replication (or bit test), bit interleave, bit shuffle macro
- InstLatX64’s Twitter series:
  - Bit reversal, rotate, shift
  - 8x8 bit transpose, left-shift + add
  - Prefix-xor, 8x8 binary matrix multiply, Rijndael xtime
  - Replicate MSB/LSB, mirror on diagonal
  - 512-bit prefix xor
  - (more) 512-bit prefix xor
  - Broadcast imm8 byte
  - Parallel byte-histogramming
  - pospopcnt (plus link to implementations of the above)
- Bit Matrix Multiplication in Commodity Processors
  - Bit-permutation, bit gather/scatter
- A list of “out-of-band” uses for the GF2P8AFFINEQB instruction I haven’t seen documented elsewhere
  - Count leading/trailing zero bits, arbitrary modular GF(2^w) multiplication, fixed 2-bit packed arithmetic, bit-wise variable shift
- Wunk’s Yuzu emulator acceleration explanation
  - Bit reversal
  - int8 shifting
- Intel’s GFNI Technology Guide
  - Calculating a Toeplitz Hash Using GFNI
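
Several of the per-byte tricks listed above (8-bit shift, rotate, prefix-xor) come down to choosing the right 8x8 bit matrix. The following is a rough sketch in C of how such matrices could be built. It is not taken from any of the linked articles, and all names in it (`affine_byte`, `pack_matrix`, `matrix_shl`, `matrix_rotl`, `matrix_prefix_xor`) are hypothetical. A compact scalar model of the instruction is included so the snippet compiles on its own; it assumes that result bit i is the parity of matrix byte 7 - i ANDed with the input byte.

```c
#include <stdint.h>

/* Scalar model of one byte lane of GF2P8AFFINEQB (illustrative only):
 * result bit i = parity(matrix byte [7 - i] & x) ^ bit i of imm8. */
static uint8_t affine_byte(uint64_t matrix, uint8_t x, uint8_t imm8) {
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t v = (uint8_t)(matrix >> (8 * (7 - i))) & x;
        v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;  /* fold parity into bit 0 */
        out |= (uint8_t)((v & 1u) << i);
    }
    return out ^ imm8;
}

/* rows[i] is the mask of input bits XORed together to form output bit i.
 * The row for output bit i is stored in byte 7 - i of the qword. */
static uint64_t pack_matrix(const uint8_t rows[8]) {
    uint64_t m = 0;
    for (int i = 0; i < 8; i++)
        m |= (uint64_t)rows[i] << (8 * (7 - i));
    return m;
}

/* Logical left shift of each byte by n (0 <= n <= 7). */
static uint64_t matrix_shl(int n) {
    uint8_t rows[8] = {0};
    for (int i = n; i < 8; i++)
        rows[i] = (uint8_t)(1u << (i - n));
    return pack_matrix(rows);
}

/* Rotate each byte left by n (0 <= n <= 7). */
static uint64_t matrix_rotl(int n) {
    uint8_t rows[8];
    for (int i = 0; i < 8; i++)
        rows[i] = (uint8_t)(1u << ((i + 8 - n) & 7));
    return pack_matrix(rows);
}

/* Prefix-xor within a byte: output bit i = XOR of input bits 0..i. */
static uint64_t matrix_prefix_xor(void) {
    uint8_t rows[8];
    for (int i = 0; i < 8; i++)
        rows[i] = (uint8_t)((2u << i) - 1u);
    return pack_matrix(rows);
}
```

For example, `matrix_shl(1)` produces `0x0001020408102040`, which is the identity matrix `0x0102040810204080` shifted right by one byte, and `affine_byte(matrix_rotl(1), 0x81, 0)` returns `0x03`. On real hardware the resulting constant would be broadcast to every qword of the matrix operand rather than applied byte by byte in a loop.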