@soulthreads
Last active November 23, 2023 09:57
Some details on Shadertoy FFT

Shadertoy exposes audio through a 512x2 texture, where the first row is the spectrum and the second row is the wave data.

Pixel format is GL_RED GL_UNSIGNED_BYTE, meaning that each pixel contains only one 8-bit channel.

According to this shader from iq, the audio in the browser is supposed to have a sample rate of 48 kHz, but as it turns out, that's not the case: most likely it will be 44.1 kHz.

NOTE: If you write a shader assuming a sample rate of 44.1 kHz and then run it on a system with 48 kHz output, the spectrum will appear "squished" along the frequency axis by a factor of about 1.09 (48 / 44.1), and the spectrum values will also be about 1.09x smaller on average.
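To make the "squish" concrete, here is a minimal sketch that computes which FFT bin (and therefore which texel column) a given frequency falls on for each sample rate. The function name `bin_for_frequency` is made up for this illustration, and the fftSize of 2048 is the value used in the spectrum steps further down.

```python
def bin_for_frequency(freq_hz, sample_rate, fft_size=2048):
    """Return the (fractional) FFT bin index that freq_hz falls on.

    Hypothetical helper: each bin is sample_rate / fft_size Hz wide.
    """
    bin_width = sample_rate / fft_size
    return freq_hz / bin_width


# The same 1000 Hz tone lands on a lower bin when the output runs at 48 kHz.
bin_44k = bin_for_frequency(1000.0, 44100.0)
bin_48k = bin_for_frequency(1000.0, 48000.0)

# The ratio of the two positions is the squish factor, 48 / 44.1.
squish = bin_44k / bin_48k
```

A shader that hard-codes texel positions for 44.1 kHz would therefore read slightly higher frequencies than intended on a 48 kHz system.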

First, we load the audio as floating point data. The audio is then downmixed from stereo to mono as follows:

v = 0.5 * (left_v + right_v)

Wave data

The floating-point audio data is scaled into the 0..255 range as follows:

a = clamp(128 * (1 + v), 0, 255)
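The downmix and the wave scaling can be sketched together as below. The function names are made up for this illustration, and truncating to an integer (rather than rounding) is an assumption, since the text above only gives the clamp formula.

```python
def downmix(left, right):
    """Average the two channels into one mono sample: v = 0.5 * (left + right)."""
    return 0.5 * (left + right)


def wave_byte(v):
    """Map a mono sample (nominally in -1..1) onto a 0..255 texel value."""
    scaled = 128.0 * (1.0 + v)
    # Clamp first: v = 1.0 gives 256, which would not fit into one byte.
    clamped = min(max(scaled, 0.0), 255.0)
    # Assumption: the fractional part is truncated, not rounded.
    return int(clamped)
```

With this mapping silence (v = 0) ends up at 128, so a shader reading the wave row would typically subtract 0.5 from the normalized texel value to re-center it.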

Spectrum

The spectrum is calculated according to the Web Audio API specification:

  1. Take 2048 samples of audio data as an array of floating-point values
  2. Multiply it by a Blackman window
  3. Convert the samples into complex numbers (imaginary parts are all zeros)
  4. Apply the Fourier transform with fftSize = 2048, which yields 1024 FFT bins
  5. Convert the complex result into real values using the cabs() function (the complex magnitude)
  6. Divide each value by fftSize
  7. Apply smoothing using the previously calculated spectrum values:
    v = k * v_prev + (1 - k) * v
    
    where k is the smoothing constant, equal to 0.8. When calculating the spectrum for the first time, the previous value is assumed to be 0.
  8. Convert resulting values to dB: dB = 20 * log10(v)
  9. Convert floating point dB spectrum into 8-bit values:
    1. Clamp the value between dB_min = -100 and dB_max = -30
    2. Scale the dB_min..dB_max range into 0..255 range:
    t = clamp(255 / (dB_max - dB_min) * (dB - dB_min), 0, 255)
    
  10. Write 8-bit values into texture
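The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Shadertoy's or any browser's actual code: the Blackman coefficients (0.42, 0.5, 0.08) are the ones given in the Web Audio API specification, the radix-2 FFT is a textbook stand-in for whatever FFT the browser uses, the final truncation to an integer is assumed, and all function names are made up.

```python
import cmath
import math

FFT_SIZE = 2048
SMOOTHING = 0.8                  # k, the smoothing constant
DB_MIN, DB_MAX = -100.0, -30.0


def blackman(n, size):
    # Assumed coefficients: Blackman window with alpha = 0.16.
    x = 2.0 * math.pi * n / size
    return 0.42 - 0.5 * math.cos(x) + 0.08 * math.cos(2.0 * x)


def fft(x):
    # Textbook recursive radix-2 FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrum_bytes(samples, previous=None):
    """Return (texels, smoothed) for FFT_SIZE mono samples.

    `previous` is the `smoothed` list from the preceding call, or None
    on the first call (previous values are then taken to be 0).
    """
    if len(samples) != FFT_SIZE:
        raise ValueError("expected FFT_SIZE samples")
    # Steps 1-3: window the samples and make them complex.
    windowed = [complex(s * blackman(n, FFT_SIZE), 0.0)
                for n, s in enumerate(samples)]
    # Step 4: FFT, keeping the first FFT_SIZE / 2 = 1024 bins.
    bins = fft(windowed)[:FFT_SIZE // 2]
    # Steps 5-6: magnitude, divided by fftSize.
    magnitudes = [abs(c) / FFT_SIZE for c in bins]
    # Step 7: smooth over time with the previous frame.
    if previous is None:
        previous = [0.0] * len(magnitudes)
    smoothed = [SMOOTHING * p + (1.0 - SMOOTHING) * m
                for p, m in zip(previous, magnitudes)]
    # Steps 8-9: convert to dB, then scale DB_MIN..DB_MAX into 0..255.
    texels = []
    for v in smoothed:
        db = 20.0 * math.log10(v) if v > 0.0 else float("-inf")
        t = 255.0 / (DB_MAX - DB_MIN) * (db - DB_MIN)
        texels.append(int(min(max(t, 0.0), 255.0)))  # assumed truncation
    return texels, smoothed
```

To process a stream, pass the `smoothed` list returned by one call as `previous` to the next, so that the smoothing in step 7 carries over between frames.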

Important!

Even though we perform the FFT on 2048 samples and get 1024 bins (for 44.1 kHz audio, bin 0 corresponds to frequencies from 0 to 21.5 Hz and bin 1023 to frequencies from 22028 to 22050 Hz), the texture is only 512 pixels wide, so it can hold only the lower half of the spectrum (from 0 to 11025 Hz)!
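The figures quoted here can be reproduced with a few lines of arithmetic. The sketch follows the convention used in this note, where bin i covers the range from i * width to (i + 1) * width; the helper name `bin_range_hz` is made up for the illustration.

```python
def bin_range_hz(index, sample_rate=44100.0, fft_size=2048):
    """Return the (low, high) frequency range in Hz covered by one FFT bin."""
    width = sample_rate / fft_size   # about 21.53 Hz at 44.1 kHz
    return index * width, (index + 1) * width


first_bin = bin_range_hz(0)        # lowest bin of the spectrum
last_bin = bin_range_hz(1023)      # highest of the 1024 FFT bins
last_texel = bin_range_hz(511)     # highest bin that fits in the 512-wide texture
```

At 48 kHz the same call, `bin_range_hz(511, 48000.0)`, shows the texture reaching up to 12000 Hz instead of 11025 Hz.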

@GerrieWell

Thanks for sharing! It helped me a lot!
But how do you know the parameter values, for example that the smoothing constant k is equal to 0.8?

@soulthreads
Author

@GerrieWell I'm glad you have found it useful!
The values are taken from the Web Audio API specification: k is the smoothingTimeConstant, whose default value is specified as 0.8.

@seifane

seifane commented Oct 20, 2023

Great info. Thanks for sharing!

However, when implementing this on my end I found that the dB max/min were set way too low. I was not getting anything near that range while playing audio. In quiet parts the max would reach 0 dB, and in loud parts I would see values somewhere in the 20 dB range.

Could it be that your audio was very quiet while you were testing this?

Never mind, I did not apply the divide-by-fftSize step. I get results that are a lot closer now, but still not quite 100% the same as what the same shader renders on the website itself.

@soulthreads
Author

@seifane Thanks for checking it out!

> Never mind, I did not apply the divide-by-fftSize step. I get results that are a lot closer now, but still not quite 100% the same as what the same shader renders on the website itself.

Have you checked that the sampling rate is the same in the browser and in your implementation? That's usually one of the main reasons for this happening.

@seifane

seifane commented Oct 21, 2023

@soulthreads You're right, I think this might be the last thing missing here. It really does seem like 48 kHz is not what's used in Shadertoy. It turned out there are also some other inaccuracies with colors in the viewer I use that might cause the difference, but that's not relevant here :).
