Skip to content

Instantly share code, notes, and snippets.

@cyanreg
Last active January 27, 2021 02:01
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cyanreg/bbf25c8a8dfb910ed3b9ae7663983ca6 to your computer and use it in GitHub Desktop.
Save cyanreg/bbf25c8a8dfb910ed3b9ae7663983ca6 to your computer and use it in GitHub Desktop.
8-point FFT (permuted input)
static void fft8(FFTComplex *z)
{
/* _1_ sub */
FFTSample r1 = z[0].re - z[4].re;
FFTSample r2 = z[0].im - z[4].im; // IN LOW m1
FFTSample r3 = z[1].re - z[5].re;
FFTSample r4 = z[1].im - z[5].im;
FFTSample r5 = z[2].re - z[6].re;
FFTSample r6 = z[2].im - z[6].im; // IN HIGH m1
FFTSample r7 = z[3].re - z[7].re;
FFTSample r8 = z[3].im - z[7].im;
/* _1_ add */
FFTSample q1 = z[0].re + z[4].re;
FFTSample q2 = z[0].im + z[4].im; // IN LOW m2
FFTSample q3 = z[1].re + z[5].re;
FFTSample q4 = z[1].im + z[5].im;
FFTSample q5 = z[2].re + z[6].re;
FFTSample q6 = z[2].im + z[6].im; // IN HIGH m2
FFTSample q7 = z[3].re + z[7].re;
FFTSample q8 = z[3].im + z[7].im;
/* 2 shufps, 1 xor MEM, 1 add = _4_ */
FFTSample s1 = q1 + q3;
FFTSample s2 = q2 + q4;
FFTSample s3 = q1 - q3;
FFTSample s4 = q2 - q4;
FFTSample s5 = q5 + q7;
FFTSample s6 = q6 + q8;
FFTSample s7 = q5 - q7;
FFTSample s8 = q6 - q8;
/* (1 extract, 1 shuffle = _2_) */
/* additionally 1 insert, 1 xor, 1 add = _3_ to actually perform it */
q1 = s1 + s5;
q2 = s2 + s6;
q3 = s3 + s8;
q4 = s4 - s7;
q5 = s1 - s5;
q6 = s2 - s6;
q7 = s3 - s8;
q8 = s4 + s7;
z[0].re = q1;
z[0].im = q2;
z[2].re = q3;
z[2].im = q4;
z[4].re = q5;
z[4].im = q6;
z[6].re = q7;
z[6].im = q8;
/* 11 instructions */
/* 2 shufps + 1 xor + 1 addsub = 4 instr */
FFTSample z1 = r1 + r4;
FFTSample z2 = r3 - r2;
FFTSample z3 = r3 + r2;
FFTSample z4 = r1 - r4;
FFTSample z5 = r5 + r6;
FFTSample z6 = r7 - r8;
FFTSample z7 = r7 + r8;
FFTSample z8 = r5 - r6;
/* 1 mov, 2 shuffles, 1 addsub = 3 */
FFTSample t5 = z6 + z5;
FFTSample t6 = z7 - z8;
FFTSample t7 = z7 + z8;
FFTSample t8 = z6 - z5;
/* 2 extract, 1 shuffle, 1 fma, 1 xors = 5 */
FFTSample u1 = z1 + t5*( M_SQRT1_2);
FFTSample u2 = -z2 + t6*( M_SQRT1_2);
FFTSample u3 = z4 + t7*(-M_SQRT1_2);
FFTSample u4 = z3 + t8*( M_SQRT1_2);
FFTSample u5 = z1 + t5*(-M_SQRT1_2);
FFTSample u6 = -z2 + t6*(-M_SQRT1_2);
FFTSample u7 = z4 + t7*( M_SQRT1_2);
FFTSample u8 = z3 + t8*(-M_SQRT1_2);
z[1].re = u1;
z[1].im = u2;
z[3].re = u3;
z[3].im = u4;
z[5].re = u5;
z[5].im = u6;
z[7].re = u7;
z[7].im = u8;
/* 12 instructions = 23 instructions total */
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment