soulthreads/ShadertoyFFT.md

Shadertoy exposes audio through a 512x2 texture, where the first row is the spectrum and the second row is the wave data.
The pixel format is `GL_RED` / `GL_UNSIGNED_BYTE`, meaning that each pixel contains only a single 8-bit channel.

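
To make the layout concrete, here is a minimal Python sketch of how a shader sees this texture. Because the format is `GL_UNSIGNED_BYTE`, a texture read returns the byte normalized to 0..1 (`byte / 255`), and the centers of the two rows sit at v = 0.25 (spectrum) and v = 0.75 (wave). The helper names are hypothetical, and modelling the texture as a flat `bytes` buffer with row 0 stored first is an assumption made for illustration.

```python
TEX_WIDTH = 512   # texels per row
TEX_HEIGHT = 2    # row 0: spectrum, row 1: wave data


def texel_center_uv(x: int, row: int) -> tuple[float, float]:
    """Normalized texture coordinate of the center of texel (x, row)."""
    return ((x + 0.5) / TEX_WIDTH, (row + 0.5) / TEX_HEIGHT)


def fetch(texture: bytes, x: int, row: int) -> float:
    """Value a shader would read: the 8-bit red channel normalized to 0..1.

    Assumption (hypothetical layout): `texture` is a flat 512x2 buffer,
    row 0 first, one byte per texel.
    """
    return texture[row * TEX_WIDTH + x] / 255.0
```

For example, `texel_center_uv(0, 0)` gives `(1/1024, 0.25)`, which is why spectrum lookups in shaders typically use a y coordinate of 0.25 and wave lookups use 0.75.
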
According to this shader from iq, the audio in the browser is supposed to have a sample rate of 48 kHz, but as it turns out, that's not necessarily the case: most likely it will be 44.1 kHz.

NOTE: If you wrote a shader assuming a sample rate of 44.1 kHz and then played it on a system with 48 kHz output, you will see the spectrum "squished" frequency-wise by a factor of 48000 / 44100 ≈ 1.09, and the spectrum values will also be about 1.09x smaller on average.

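
The frequency-axis part of this effect follows directly from the bin width, `sample_rate / fftSize`. The Python sketch below (hypothetical helper names, `fftSize = 2048` as used for the spectrum) shows where a tone lands on the spectrum texture at each sample rate; it does not model the change in magnitudes.

```python
FFT_SIZE = 2048  # fftSize used for the spectrum


def bin_width_hz(sample_rate: float, fft_size: int = FFT_SIZE) -> float:
    """Frequency spacing between adjacent FFT bins, in Hz."""
    return sample_rate / fft_size


def freq_to_pixel(freq_hz: float, sample_rate: float,
                  fft_size: int = FFT_SIZE) -> float:
    """Fractional spectrum-texture x coordinate (= bin index) of a frequency."""
    return freq_hz / bin_width_hz(sample_rate, fft_size)


def squish_factor(assumed_rate: float, actual_rate: float) -> float:
    """How much the spectrum shrinks horizontally when actual_rate > assumed_rate."""
    return actual_rate / assumed_rate
```

For instance, a 1 kHz tone lands at about pixel 46.4 at 44.1 kHz but at about pixel 42.7 at 48 kHz, and `squish_factor(44100, 48000)` is roughly 1.088.
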
First, we load the audio as floating-point data. The audio is then downmixed from stereo to mono as follows:

`v = 0.5 * (left_v + right_v)`

## Wave data

The floating-point audio data is scaled into the 0..255 range as follows:

`a = clamp(128 * (1 + v), 0, 255)`
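
Together with the downmix step, the wave row can be sketched in Python as below. This is a minimal sketch, not Shadertoy's code: the function names are hypothetical, and writing the *first* 512 mono samples into the row is an assumption, since the text doesn't say which 512 of the analysed samples end up in the texture. Casting the clamped (hence non-negative) value with `int()` truncates it, i.e. rounds it down.

```python
def downmix(left: list[float], right: list[float]) -> list[float]:
    """Stereo to mono: v = 0.5 * (left_v + right_v)."""
    return [0.5 * (lv + rv) for lv, rv in zip(left, right)]


def wave_bytes(mono: list[float], width: int = 512) -> bytes:
    """Second texture row: 8-bit wave data, a = clamp(128 * (1 + v), 0, 255).

    Assumption (hypothetical): the row holds the first `width` samples.
    """
    out = bytearray()
    for v in mono[:width]:
        a = 128.0 * (1.0 + v)
        a = min(max(a, 0.0), 255.0)  # clamp to the byte range
        out.append(int(a))           # truncate; a is non-negative here
    return bytes(out)
```

Silence (`v = 0`) maps to 128, so a flat line in the middle of the wave row means no signal.
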

## Spectrum

The spectrum is calculated according to the Web Audio API specification:

1. Take 2048 samples of audio data as an array of floating-point values.
2. Multiply them by a Blackman window.
3. Convert the samples into complex numbers (all imaginary parts are zero).
4. Apply the Fourier transform with `fftSize = 2048`. Because the input is real, the upper half of the output mirrors the lower half, so as a result we get 1024 usable FFT bins.
5. Convert the complex results into real values using the `cabs()` function (the complex magnitude).
6. Divide each value by `fftSize`.
7. Apply smoothing using the previously calculated spectrum values:

   `v = k * v_prev + (1 - k) * v`

   where `k` is the smoothing constant, equal to 0.8 (the Web Audio default `smoothingTimeConstant`). When calculating the spectrum for the first time, the previous values are assumed to be 0.
8. Convert the resulting values to dB: `dB = 20 * log10(v)`
9. Convert the floating-point dB spectrum into 8-bit values:
   - Clamp the value between `dB_min = -100` and `dB_max = -30` (the Web Audio defaults for `minDecibels` and `maxDecibels`).
   - Scale the `dB_min..dB_max` range into the `0..255` range:

     `t = clamp(255 / (dB_max - dB_min) * (dB - dB_min), 0, 255)`
10. Write the 8-bit values into the texture.
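
The steps above can be sketched in Python as follows. This is a minimal, unoptimized sketch assuming a mono input of at least `fftSize` samples; the class and function names are hypothetical, the Blackman window uses the Web Audio spec's coefficients (alpha = 0.16), and the hand-written radix-2 FFT stands in for whatever FFT the browser actually uses. Smoothing state is kept for all 1024 bins, but only the first 512 are emitted, matching the texture width.

```python
import cmath
import math

FFT_SIZE = 2048
SMOOTHING = 0.8                 # k in the smoothing formula
DB_MIN, DB_MAX = -100.0, -30.0


def blackman(n: int, alpha: float = 0.16) -> list[float]:
    """Blackman window with the coefficients given in the Web Audio spec."""
    a0, a1, a2 = (1 - alpha) / 2, 0.5, alpha / 2
    return [a0 - a1 * math.cos(2 * math.pi * i / n)
            + a2 * math.cos(4 * math.pi * i / n) for i in range(n)]


def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


class SpectrumAnalyser:
    """Hypothetical helper that remembers previous magnitudes for smoothing."""

    def __init__(self, fft_size: int = FFT_SIZE):
        self.fft_size = fft_size
        self.window = blackman(fft_size)
        self.prev = [0.0] * (fft_size // 2)  # first frame: previous values are 0

    def spectrum_bytes(self, mono: list[float], width: int = 512) -> bytes:
        n = self.fft_size
        if len(mono) < n:
            raise ValueError("need at least fft_size samples")
        # Steps 1-3: take n samples, apply the window, make them complex
        x = [complex(s * w, 0.0) for s, w in zip(mono[:n], self.window)]
        # Step 4: FFT, keeping only the first n/2 bins
        bins = fft(x)[: n // 2]
        out = bytearray()
        for i, c in enumerate(bins):
            mag = abs(c) / n                                       # steps 5-6
            v = SMOOTHING * self.prev[i] + (1 - SMOOTHING) * mag   # step 7
            self.prev[i] = v
            db = 20 * math.log10(v) if v > 0 else -math.inf        # step 8
            t = 255 / (DB_MAX - DB_MIN) * (db - DB_MIN)            # step 9
            if i < width:                                          # step 10
                out.append(int(min(max(t, 0.0), 255.0)))
        return bytes(out)
```

Because of the smoothing, a steady tone needs several frames to reach its full level: on the very first frame each bin holds only `(1 - k) = 0.2` of its unsmoothed magnitude.
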

## Important!

We can see that even though we perform the FFT on 2048 samples and get 1024 bins (for 44.1 kHz audio, bin 0 covers frequencies from 0 to 21.5 Hz and bin 1023 covers frequencies from 22028 to 22050 Hz), the texture is only 512 pixels wide, meaning that only the lower half of the spectrum (from 0 to 11025 Hz) can be drawn!
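
The visible range can be checked with a short Python sketch. The helper names are hypothetical, and it assumes, as the text does, that texture pixel `x` holds FFT bin `x` and that bin `x` covers the interval from `x * fs / 2048` to `(x + 1) * fs / 2048`.

```python
TEX_WIDTH = 512    # spectrum texels per row
FFT_SIZE = 2048


def pixel_freq_range(x: int, sample_rate: float,
                     fft_size: int = FFT_SIZE) -> tuple[float, float]:
    """Frequency interval [lo, hi) covered by spectrum pixel x, i.e. FFT bin x."""
    width = sample_rate / fft_size
    return (x * width, (x + 1) * width)


def visible_range(sample_rate: float) -> tuple[float, float]:
    """Lowest and highest frequency that fit into the 512-pixel texture."""
    return (0.0, pixel_freq_range(TEX_WIDTH - 1, sample_rate)[1])
```

Since `512 * fs / 2048 = fs / 4`, the texture always shows a quarter of the sample rate: 11025 Hz at 44.1 kHz, or 12000 Hz at 48 kHz.
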