Last active Mar 22, 2022

Shadertoy exposes audio through a 512x2 texture, where the first row is the spectrum and the second row is the wave data.

Pixel format is `GL_RED` `GL_UNSIGNED_BYTE`, meaning that each pixel contains only one 8-bit channel.

According to this shader from iq, audio in the browser is supposed to have a sample rate of 48 kHz. In practice that is often not the case: most likely it will be 44.1 kHz.

NOTE: If you write a shader assuming a sample rate of 44.1 kHz and then run it on a system with 48 kHz output, the spectrum will appear "squished" frequency-wise by a factor of about 1.09 (48 / 44.1), and the spectrum values will also be about 1.09x smaller on average.

First, we load the audio as floating point data. The audio is then downmixed from stereo to mono as follows:

```
v = 0.5 * (left_v + right_v)
```

### Wave data

The floating-point audio data is scaled into the `0..255` range as follows:

```
a = clamp(128 * (1 + v), 0, 255)
```
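As a rough illustration of the downmix and scaling formulas above, here is a minimal Python sketch. The function names `downmix` and `wave_to_bytes` are made up for this example, and rounding down to an integer before clamping is an assumption (the formula above only specifies the scale and the clamp).

```python
import math

def downmix(left, right):
    """Average two equal-length channels into one mono channel."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

def wave_to_bytes(samples):
    """Map float samples (nominally -1.0..1.0) to integers in 0..255."""
    out = []
    for v in samples:
        a = math.floor(128 * (1 + v))    # rounding down is an assumption
        out.append(max(0, min(255, a)))  # clamp to the 8-bit range
    return out

# Tiny usage example with hand-picked sample values
left = [0.0, 1.0, -1.0, 0.5]
right = [0.0, 1.0, -1.0, -0.5]
row = wave_to_bytes(downmix(left, right))
print(row)
```

Silence maps to the middle of the range (128), which is why the wave row of the texture hovers around 0.5 when sampled in a shader.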

### Spectrum

The spectrum is calculated according to the Web Audio API specification:

1. Take `2048` samples of audio data as an array of floating-point values
2. Multiply the samples by a Blackman window
3. Convert the samples into complex numbers (imaginary parts are all zeros)
4. Apply the Fourier transform with `fftSize = 2048`, which yields 1024 FFT bins
5. Convert the complex results into real magnitudes using the `cabs()` function
6. Divide each value by `fftSize`
7. Apply smoothing using the previously calculated spectrum values:
   ```
   v = k * v_prev + (1 - k) * v
   ```
   where `k` is the smoothing constant, equal to `0.8`. When the spectrum is calculated for the first time, the previous value is assumed to be `0`.
8. Convert the resulting values to dB: `dB = 20 * log10(v)`
9. Convert the floating-point dB spectrum into 8-bit values:
   1. Clamp the value between `dB_min = -100` and `dB_max = -30`
   2. Scale the `dB_min..dB_max` range into the `0..255` range:
      ```
      t = clamp(255 / (dB_max - dB_min) * (dB - dB_min), 0, 255)
      ```
10. Write the 8-bit values into the texture
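The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code Shadertoy or any browser actually runs: the names `blackman`, `fft` and `spectrum_row` are hypothetical, the Blackman coefficients (0.42, 0.5, 0.08) are the common alpha = 0.16 ones and are assumed here, and truncating to an integer in the last step is also an assumption. Only the first 512 bins are kept for the texture row.

```python
import cmath
import math

FFT_SIZE = 2048
SMOOTHING = 0.8                   # k in the smoothing formula
DB_MIN, DB_MAX = -100.0, -30.0

def blackman(n_total):
    """Blackman window; coefficients assume alpha = 0.16."""
    a0, a1, a2 = 0.42, 0.5, 0.08
    return [a0 - a1 * math.cos(2 * math.pi * n / n_total)
               + a2 * math.cos(4 * math.pi * n / n_total)
            for n in range(n_total)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def spectrum_row(samples, prev=None, width=512):
    """Return (8-bit row, smoothed magnitudes to pass in as prev next time)."""
    if len(samples) != FFT_SIZE:
        raise ValueError("expected exactly FFT_SIZE samples")
    if prev is None:
        prev = [0.0] * (FFT_SIZE // 2)     # first call: previous values are 0
    window = blackman(FFT_SIZE)
    bins = fft([complex(s * w) for s, w in zip(samples, window)])
    smoothed = []
    for k in range(FFT_SIZE // 2):
        mag = abs(bins[k]) / FFT_SIZE      # steps 5 and 6
        smoothed.append(SMOOTHING * prev[k] + (1 - SMOOTHING) * mag)
    row = []
    for v in smoothed[:width]:
        db = 20 * math.log10(v) if v > 0 else -math.inf
        t = 255 / (DB_MAX - DB_MIN) * (db - DB_MIN)
        row.append(int(max(0, min(255, t))))  # truncation is an assumption
    return row, smoothed

# Usage: a pure tone whose frequency falls exactly on bin 100
tone = [math.sin(2 * math.pi * 100 * n / FFT_SIZE) for n in range(FFT_SIZE)]
row, state = spectrum_row(tone)
peak = row.index(max(row))
```

The second return value carries the smoothed magnitudes between frames; passing it back as `prev` on the next call reproduces the smoothing in step 7.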

#### Important!

Even though we perform the FFT on 2048 samples and get 1024 bins, the texture is only 512 pixels wide, so only the lower half of the spectrum can be drawn. For 44.1 kHz audio, bin 0 covers 0 to 21.5 Hz and bin 1023 covers 22028 to 22050 Hz, which means the texture spans only 0 to 11025 Hz!
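The bin ranges quoted above follow from the bin width `sample_rate / fftSize`. A small Python sketch of that arithmetic, where `bin_range_hz` is a hypothetical helper name and not part of any API:

```python
def bin_range_hz(k, sample_rate=44100, fft_size=2048):
    """Return the (low, high) frequency edges in Hz covered by FFT bin k."""
    width = sample_rate / fft_size   # Hz per bin
    return k * width, (k + 1) * width

first = bin_range_hz(0)              # lowest bin
last_fft = bin_range_hz(1023)        # last of the 1024 FFT bins
last_texel = bin_range_hz(511)       # last bin that fits in the 512-wide texture
print(first, last_fft, last_texel)
```

With a 48 kHz sample rate the same 512 texels would instead cover 0 to 12000 Hz, which is the source of the roughly 1.09x frequency mismatch mentioned earlier.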

### GerrieWell commented Mar 21, 2022

Thanks for sharing! It helped me a lot!
But how do you know the parameter values, like `k` being a smoothing constant equal to `0.8`?

### soulthreads commented Mar 22, 2022

@GerrieWell I'm glad you have found it useful!
The values are taken from the Web Audio API specification: `k` is the `smoothingTimeConstant` there, and its default value is specified as 0.8.