Some details on Shadertoy FFT

Shadertoy exposes audio to shaders as a 512x2 texture: the first row holds the spectrum and the second row holds the wave data.

The pixel format is GL_RED with type GL_UNSIGNED_BYTE, meaning that each pixel contains a single 8-bit channel.

According to this shader from iq, audio in the browser is supposed to have a sample rate of 48 kHz. As it turns out, that is not the case: most likely it will be 44.1 kHz.

NOTE: If you write a shader assuming a 44.1 kHz sample rate and then run it on a system with 48 kHz output, the spectrum will appear "squished" along the frequency axis by a factor of about 1.09 (48 / 44.1), and the spectrum values will also be about 1.09x smaller on average.

First, we load the audio as floating point data. The audio is then downmixed from stereo to mono as follows:

v = 0.5 * (left_v + right_v)

Wave data

The floating-point audio data is scaled into the 0..255 range as follows:

a = clamp(128 * (1 + v), 0, 255)
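The downmix and the byte conversion above can be sketched in a few lines of Python. This is only an illustration of the two formulas, not Shadertoy's actual code: the function names `downmix` and `wave_to_bytes` are hypothetical, and truncating the clamped value to an integer is an assumption, since the formula itself does not say how the value is rounded.

```python
def downmix(left, right):
    """Average the two channels sample by sample: v = 0.5 * (left_v + right_v)."""
    return [0.5 * (l + r) for l, r in zip(left, right)]


def wave_to_bytes(samples):
    """Map float samples to 0..255: a = clamp(128 * (1 + v), 0, 255)."""
    out = []
    for v in samples:
        a = 128.0 * (1.0 + v)
        a = max(0.0, min(255.0, a))  # clamp into the 8-bit range
        out.append(int(a))           # assumption: truncate to an integer byte
    return bytes(out)


left = [0.0, 1.0, -1.0, 0.5]
right = [0.0, 1.0, -1.0, -0.5]
wave_row = wave_to_bytes(downmix(left, right))
print(list(wave_row))  # [128, 255, 0, 128]
```

Silence maps to 128, and a full-scale positive sample would map to 256, which the clamp brings down to 255.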

Spectrum

The spectrum is calculated according to the Web Audio API specification:

  1. Take 2048 samples of audio data as an array of floating point data
  2. Multiply it by a Blackman window
  3. Convert the samples into complex numbers (imaginary parts are all zeros)
  4. Apply the Fourier transform with fftSize = 2048; this yields 1024 FFT bins
  5. Convert the complex result into real values using the cabs() function (complex magnitude)
  6. Divide each value by fftSize
  7. Apply smoothing by using previously calculated spectrum values:
    v = k * v_prev + (1 - k) * v
    
    Where k is the smoothing constant, equal to 0.8. When the spectrum is calculated for the first time, the previous value is assumed to be 0.
  8. Convert resulting values to dB: dB = 20 * log10(v)
  9. Convert floating point dB spectrum into 8-bit values:
    1. Clamp the value between dB_min = -100 and dB_max = -30
    2. Scale the dB_min..dB_max range into 0..255 range:
    t = clamp(255 / (dB_max - dB_min) * (dB - dB_min), 0, 255)
    
  10. Write 8-bit values into texture
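The steps above can be sketched in pure Python as follows. This is a minimal sketch under several assumptions, not the browser's implementation: the names `blackman`, `fft` and `spectrum_bytes` are hypothetical, the window uses the alpha = 0.16 Blackman coefficients (0.42, 0.5, 0.08), a zero magnitude is treated as negative infinity dB, and the final value is truncated to an integer. A small recursive radix-2 FFT is included so that the block has no dependencies.

```python
import cmath
import math

FFT_SIZE = 2048
SMOOTHING = 0.8   # k in the smoothing formula
DB_MIN = -100.0
DB_MAX = -30.0


def blackman(n_samples):
    # Blackman window, assuming alpha = 0.16 (a0 = 0.42, a1 = 0.5, a2 = 0.08).
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / n_samples)
            + 0.08 * math.cos(4 * math.pi * n / n_samples)
            for n in range(n_samples)]


def fft(x):
    # Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrum_bytes(samples, prev=None):
    """Return (8-bit values for all bins, smoothed magnitudes for the next call)."""
    n = len(samples)
    if prev is None:
        prev = [0.0] * (n // 2)  # first call: previous spectrum is all zeros
    window = blackman(n)
    bins = fft([s * w for s, w in zip(samples, window)])[: n // 2]
    out, smoothed = [], []
    for k, c in enumerate(bins):
        v = abs(c) / n                                   # magnitude divided by fftSize
        v = SMOOTHING * prev[k] + (1 - SMOOTHING) * v    # smoothing over time
        smoothed.append(v)
        db = 20 * math.log10(v) if v > 0 else -math.inf  # assumption: 0 maps to -inf dB
        t = 255 / (DB_MAX - DB_MIN) * (db - DB_MIN)
        out.append(int(max(0.0, min(255.0, t))))         # clamp, then truncate
    return out, smoothed


# A full-scale sine that sits exactly on bin 64.
tone = [math.sin(2 * math.pi * 64 * n / FFT_SIZE) for n in range(FFT_SIZE)]
all_bins, state = spectrum_bytes(tone)
texture_row = all_bins[:512]  # only the first 512 bins fit into the texture
peak_bin = max(range(len(texture_row)), key=lambda k: texture_row[k])
print(peak_bin, texture_row[peak_bin])
```

For this test tone the peak lands in bin 64 and saturates at 255, while bins far from the tone stay at 0. Passing `state` back in as `prev` on the next call reproduces the frame-to-frame smoothing.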

Important!

We can see that, even though we perform the FFT on 2048 samples and get 1024 bins, the texture is only 512 pixels wide. For 44.1 kHz audio, bin 0 corresponds to frequencies from 0 to about 21.5 Hz and bin 1023 corresponds to frequencies from about 22028 to 22050 Hz, so the texture can only show the lower half of the spectrum (from 0 to 11025 Hz)!
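To make the bin arithmetic concrete, here is a small Python sketch that maps between bins, frequencies and texture columns. The helper names `bin_range_hz` and `pixel_for_frequency` are hypothetical, and the code follows the convention used above, where bin i covers the span from i * sampleRate / fftSize up to the start of the next bin.

```python
def bin_range_hz(bin_index, sample_rate, fft_size=2048):
    """Frequency span (low, high) in Hz covered by one FFT bin."""
    width = sample_rate / fft_size
    return bin_index * width, (bin_index + 1) * width


def pixel_for_frequency(freq_hz, sample_rate, fft_size=2048, texture_width=512):
    """Texture column containing freq_hz, or None if it lies beyond the texture."""
    index = int(freq_hz * fft_size / sample_rate)
    return index if 0 <= index < texture_width else None


print(bin_range_hz(0, 44100))       # first bin
print(bin_range_hz(511, 44100))     # last bin that fits into the texture
print(bin_range_hz(1023, 44100))    # last bin of the full FFT
print(pixel_for_frequency(440, 44100), pixel_for_frequency(440, 48000))
```

At 44.1 kHz each bin is about 21.53 Hz wide, so column 511 ends at exactly 11025 Hz. The last line shows the sample-rate dependence: a 440 Hz tone falls in column 20 at 44.1 kHz but in column 18 at 48 kHz, so a shader that hard-codes one rate will look in the wrong place on the other.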
