pervognsen/fft.txt

## fft.txt
Suppose we want to compute the frequency spectrum of an n-point sampled signal. That is,
we want to compute the signal's discrete Fourier transform. Taking a cue from multi-rate
signal processing, let's try a divide-and-conquer approach where we downsample the signal
by 2:1 and recursively compute the spectrum of that. There are two possible downsamplings,
corresponding to the even and odd phases.

By the Nyquist sampling theorem, assuming the signal has no upper half-band frequencies,
i.e. its top n/2 frequency bins are zero, the spectrum can be perfectly reconstructed
from the spectrum of _either_ the even subsignal or the odd subsignal, without any aliasing.

The only difference between the even and odd spectra is a phase shift by a sample's worth of time.
Constant time delay corresponds to a linear phase shift, and in this case the phase delay at
the kth frequency is k times 2pi/n. This is equivalent to multiplication by exp(i k 2pi/n).
If we negate the sign in the exponent, we are "undelaying" rather than delaying, which produces
the infamous "twiddle factors" w[k] = exp(-i k 2pi/n). We don't need the definition as a complex
exponential in our derivation, only that w[k] is a phase shift operator.

Some quick notation:

x[k] is the original signal in the time domain (0 <= k < n).
e[k] = x[2k] and o[k] = x[2k + 1] are the even and odd subsignals in the time domain (0 <= k < n/2).
Upper-cased signals like X[k], E[k], O[k] are the Fourier transforms of the lower-cased signals.

So, if there are no upper half-band frequencies, then

X[k] = E[k] = w[k] O[k] for 0 <= k < n/2
X[k] = 0                for n/2 <= k < n.

But what happens if the downsampling of x to e and o has aliasing, i.e. X[k] is nonzero for k >= n/2?
Then the upper half-band components wrap around into the lower half-band. That's the definition of aliasing.
But critically, they alias in the same way in both E and O except for the one-sample delay we saw before:

E[k] = X[k] + X[k + n/2]
O[k] = 1/w[k] X[k] + 1/w[k + n/2] X[k + n/2])

The multiplication here is by 1/w[k] rather than w[k] since it's the delay rather than the "undelay".

This is a linear system with two independent equations and two unknowns X[k] and X[k + n/2], so we're done!

To put it into the conventional form, first note that 1/w[k + n/2] = 1/w[k] 1/w[n/2] = -1/w[k] because phase shifting
is additive and phase shifting by a half-period is equivalent to negation ("inverting the phase"). We therefore have
O[k] = 1/w[k] (X[k] - X[k + n/2]), and we can take sums and differences of E[k] and O[k] to isolate X[k] and X[k + n/2]:

X[k] = (E[k] + w[k] O[k])/2
X[k + n/2] = (E[k] - w[k] O[k])/2

And that's the FFT in its more conventional form! Except for the division of 2, which is just a matter of normalization.
There are log2(n) factors of 1/2 when you use this recursively, so the overall division is by n. If you want the FFT
to be energy preserving (Parseval's theorem), there needs to be a division by n, but some implementations don't do that.

To confirm your understanding of these ideas, work out what happens with a signal like [0, 1, 0, 1, ...].
	Suppose we want to compute the frequency spectrum of an n-point sampled signal. That is,
	we want to compute the signal's discrete Fourier transform. Taking a cue from multi-rate
	signal processing, let's try a divide-and-conquer approach where we downsample the signal
	by 2:1 and recursively compute the spectrum of that. There are two possible downsamplings,
	corresponding to the even and odd phases.

	By the Nyquist sampling theorem, assuming the signal has no upper half-band frequencies,
	i.e. its top n/2 frequency bins are zero, the spectrum can be perfectly reconstructed
	from the spectrum of _either_ the even subsignal or the odd subsignal, without any aliasing.

	The only difference between the even and odd spectra is a phase shift by a sample's worth of time.
	Constant time delay corresponds to a linear phase shift, and in this case the phase delay at
	the kth frequency is k times 2pi/n. This is equivalent to multiplication by exp(i k 2pi/n).
	If we negate the sign in the exponent, we are "undelaying" rather than delaying, which produces
	the infamous "twiddle factors" w[k] = exp(-i k 2pi/n). We don't need the definition as a complex
	exponential in our derivation, only that w[k] is a phase shift operator.

	Some quick notation:

	x[k] is the original signal in the time domain (0 <= k < n).
	e[k] = x[2k] and o[k] = x[2k + 1] are the even and odd subsignals in the time domain (0 <= k < n/2).
	Upper-cased signals like X[k], E[k], O[k] are the Fourier transforms of the lower-cased signals.

	So, if there are no upper half-band frequencies, then

	X[k] = E[k] = w[k] O[k] for 0 <= k < n/2
	X[k] = 0 for n/2 <= k < n.

	But what happens if the downsampling of x to e and o has aliasing, i.e. X[k] is nonzero for k >= n/2?
	Then the upper half-band components wrap around into the lower half-band. That's the definition of aliasing.
	But critically, they alias in the same way in both E and O except for the one-sample delay we saw before:

	E[k] = X[k] + X[k + n/2]
	O[k] = 1/w[k] X[k] + 1/w[k + n/2] X[k + n/2])

	The multiplication here is by 1/w[k] rather than w[k] since it's the delay rather than the "undelay".

	This is a linear system with two independent equations and two unknowns X[k] and X[k + n/2], so we're done!

	To put it into the conventional form, first note that 1/w[k + n/2] = 1/w[k] 1/w[n/2] = -1/w[k] because phase shifting
	is additive and phase shifting by a half-period is equivalent to negation ("inverting the phase"). We therefore have
	O[k] = 1/w[k] (X[k] - X[k + n/2]), and we can take sums and differences of E[k] and O[k] to isolate X[k] and X[k + n/2]:

	X[k] = (E[k] + w[k] O[k])/2
	X[k + n/2] = (E[k] - w[k] O[k])/2

	And that's the FFT in its more conventional form! Except for the division of 2, which is just a matter of normalization.
	There are log2(n) factors of 1/2 when you use this recursively, so the overall division is by n. If you want the FFT
	to be energy preserving (Parseval's theorem), there needs to be a division by n, but some implementations don't do that.

	To confirm your understanding of these ideas, work out what happens with a signal like [0, 1, 0, 1, ...].