Shadertoy exposes audio through a 512x2 texture, where the first row is the spectrum and the second row is the wave data.
Pixel format is `GL_RED`, `GL_UNSIGNED_BYTE`, meaning that each pixel contains only one 8-bit channel.
According to this shader from iq, the audio in the browser is supposed to have a sample rate of 48 kHz, but as it turns out, that's not the case: most likely it will be 44.1 kHz.
NOTE: If you wrote the shader assuming the sample rate is 44.1 kHz and then played that shader on a system with 48 kHz output, the spectrum will look "squished" along the frequency axis by a factor of about 1.09 (48 / 44.1), and the spectrum values will also be about 1.09x smaller on average.
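To make the "squish" concrete, here is a minimal sketch that computes where a given frequency lands on the spectrum row for both sample rates. The helper name `freq_to_pixel` is hypothetical, and it assumes the 2048-sample FFT described later in these notes, with one texture pixel per FFT bin.

```python
def freq_to_pixel(freq_hz, sample_rate, fft_size=2048):
    """Fractional pixel (= FFT bin) position of a frequency on the spectrum row.

    Assumes one pixel per FFT bin, each bin being sample_rate / fft_size Hz wide.
    """
    bin_width_hz = sample_rate / fft_size
    return freq_hz / bin_width_hz


# The same 1 kHz tone lands on a different pixel depending on the sample rate.
x_441 = freq_to_pixel(1000.0, 44100.0)
x_480 = freq_to_pixel(1000.0, 48000.0)
squish = x_441 / x_480  # equals 48000 / 44100
```

The ratio between the two positions does not depend on the frequency chosen, which is why the whole spectrum appears uniformly compressed.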
First, we load the audio as floating point data. The audio is then downmixed from stereo to mono as follows:
v = 0.5 * (left_v + right_v)
The floating-point audio data is scaled into the `0..255` range as follows:
a = clamp(128 * (1 + v), 0, 255)
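The two formulas above can be sketched in Python as follows. The function names are hypothetical, and truncating to an integer after clamping is an assumption: the notes do not say how the value is rounded.

```python
def downmix(left, right):
    """Average two equal-length channels into mono: v = 0.5 * (left_v + right_v)."""
    return [0.5 * (l + r) for l, r in zip(left, right)]


def to_byte(v):
    """Map a float sample to 0..255: a = clamp(128 * (1 + v), 0, 255).

    Truncation toward zero is an assumption, not something the notes specify.
    """
    a = 128.0 * (1.0 + v)
    return int(max(0.0, min(255.0, a)))


def wave_row(left, right):
    """Build the bytes of the wave-data row from two float channels."""
    return bytes(to_byte(v) for v in downmix(left, right))
```

With this mapping, silence (`v = 0`) becomes 128, and a full-scale positive sample (`v = 1`) would give 256, which the clamp brings back to 255.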
The spectrum is calculated according to the Web Audio API specification:
- Take `2048` samples of audio data as an array of floating point data
- Multiply it with Blackman window
- Convert samples into complex numbers (imaginary parts are all zeros)
- Apply the Fourier transform with `fftSize = 2048`, as a result we get 1024 FFT bins
- Convert complex result into real values using `cabs()` function
- Divide each value by `fftSize`
- Apply smoothing by using previously calculated spectrum values:
  `v = k * v_prev + (1 - k) * v`
  Where `k` is smoothing constant equal to `0.8`. If calculating spectrum the first time, the previous value is assumed to be `0`.
- Convert resulting values to dB: `dB = 20 * log10(v)`
- Convert floating point dB spectrum into 8-bit values:
  - Clamp the value between `dB_min = -100` and `dB_max = -30`
  - Scale the `dB_min..dB_max` range into `0..255` range: `t = clamp(255 / (dB_max - dB_min) * (dB - dB_min), 0, 255)`
- Write 8-bit values into texture
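The steps above can be sketched in pure Python as follows. This is an illustration, not the code Shadertoy or any browser actually runs. All names are hypothetical. The Blackman coefficients (0.42, 0.5, 0.08, with the window defined over `N` rather than `N - 1`) follow my reading of the Web Audio specification and should be treated as an assumption, as should truncating to an integer and mapping a magnitude of exactly zero to byte 0.

```python
import cmath
import math

SMOOTHING = 0.8                 # k in the list above
DB_MIN, DB_MAX = -100.0, -30.0  # dB_min and dB_max in the list above


def blackman(n, size):
    """Blackman window value for sample n (alpha = 0.16; assumed coefficients)."""
    phase = 2.0 * math.pi * n / size
    return 0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2.0 * phase)


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrum_row(samples, prev=None, width=512):
    """Return (row_bytes, smoothed) for one frame of len(samples) mono floats.

    `prev` is the `smoothed` list returned for the previous frame, or None
    on the first call, in which case it is treated as all zeros.
    """
    size = len(samples)
    # Window the samples and promote them to complex numbers.
    windowed = [complex(s * blackman(i, size), 0.0) for i, s in enumerate(samples)]
    # Keep the first size / 2 bins, take magnitudes, divide by fftSize.
    mags = [abs(c) / size for c in fft(windowed)[: size // 2]]
    if prev is None:
        prev = [0.0] * len(mags)
    smoothed = [SMOOTHING * p + (1.0 - SMOOTHING) * m for p, m in zip(prev, mags)]
    row = []
    for v in smoothed[:width]:
        db = 20.0 * math.log10(v) if v > 0.0 else -math.inf
        t = 255.0 / (DB_MAX - DB_MIN) * (db - DB_MIN)
        row.append(int(max(0.0, min(255.0, t))))
    return bytes(row), smoothed
```

A caller would feed 2048 mono samples per frame and pass the returned `smoothed` list back in as `prev` on the next frame, so that the smoothing step sees the previous spectrum.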
We can see that, even though we perform the FFT on 2048 samples and get 1024 bins (where for 44.1 kHz audio each bin is 44100 / 2048 ≈ 21.5 Hz wide, so bin 0 corresponds to frequencies from 0 to 21.5 Hz and bin 1023 corresponds to frequencies from 22028 to 22050 Hz), the texture is only 512 pixels wide, meaning that we can only draw the lower half of the spectrum (from 0 to 11025 Hz)!
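The bin boundaries quoted above follow from the bin width `sample_rate / fft_size`. A minimal sketch, with a hypothetical helper name and the 44.1 kHz rate as the default:

```python
def bin_range_hz(k, sample_rate=44100.0, fft_size=2048):
    """Return the (low, high) frequency edges in Hz covered by FFT bin k."""
    width = sample_rate / fft_size
    return k * width, (k + 1) * width


first_bin = bin_range_hz(0)          # lowest bin
last_bin = bin_range_hz(1023)        # highest of the 1024 bins
last_visible_bin = bin_range_hz(511) # last bin that fits in a 512-pixel row
```

The upper edge of bin 511 is exactly half of the upper edge of bin 1023, which is the "lower half of the spectrum" limit mentioned above.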
Thanks for sharing! It helped me a lot!
But how do you know the parameter values, like `k` being a smoothing constant equal to `0.8`?