@endolith
Last active August 3, 2023 23:45
Interpretation of WAV file sample data and asymmetry

How to handle asymmetry of WAV data?

WAV files can store PCM audio (WAVE_FORMAT_PCM). The WAV file format specification says:

The data format and maximum and minimums values for PCM waveform samples of various sizes are as follows:

| Sample Size | Data Format | Maximum Value | Minimum Value |
| --- | --- | --- | --- |
| One to eight bits | Unsigned integer | 255 (0xFF) | 0 |
| Nine or more bits | Signed integer i | Largest positive value of i | Most negative value of i |

For example, the maximum, minimum, and midpoint values for 8-bit and 16-bit PCM waveform data are as follows:

| Format | Maximum Value | Minimum Value | Midpoint Value |
| --- | --- | --- | --- |
| 8-bit PCM | 255 (0xFF) | 0 | 128 (0x80) |
| 16-bit PCM | 32767 (0x7FFF) | -32768 (-0x8000) | 0 |

Both the signed and unsigned formats are asymmetrical, so how should the asymmetry be handled? The signed format uses two's complement representation, and AES17 defines the meaning of full-scale amplitude in this case:

amplitude of a 997-Hz sine wave whose positive peak value reaches the positive digital full scale, leaving the negative maximum code unused.

NOTE In 2's-complement representation, the negative peak is 1 LSB away from the negative maximum code.

As does IEC 61606-3:

amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused

So, for example, for 16-bit audio, a signal that just reaches +32,767 and −32,767 would be full-scale, while one that reaches −32,768 exceeds full-scale.

The midpoint example for 8-bit clarifies that unsigned data is asymmetrical in the same way as signed data: with the midpoint at 128, there are 127 codes above it and 128 codes below it. So, for 8-bit data, a signal that spans 1 to 255 would be full-scale, and the value 0 exceeds full-scale.

WAVE Audio File Format Specifications says:

For float data, full scale is 1.

So, to correctly convert signed ints to float, divide by 2**(b-1) - 1, where b is the number of bits.

To correctly convert unsigned ints to float, subtract 2**(b-1), then, similarly, divide by 2**(b-1) - 1.

The float representation will then be limited to +1.0 full-scale in the positive direction, but can exceed −1.0 full-scale in the negative direction.
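A minimal sketch of the two conversions described above, under the full-scale convention. The function names are made up for illustration and do not come from any library.

```python
def signed_to_float(sample: int, bits: int) -> float:
    """Scale a two's-complement sample so the largest positive code is +1.0."""
    return sample / (2 ** (bits - 1) - 1)


def unsigned_to_float(sample: int, bits: int) -> float:
    """Subtract the midpoint 2**(bits-1), then scale like a signed sample."""
    return (sample - 2 ** (bits - 1)) / (2 ** (bits - 1) - 1)


# Positive and negative full-scale map to exactly +1.0 and -1.0.
print(signed_to_float(32767, 16), signed_to_float(-32767, 16))
# The most negative code lands slightly below -1.0.
print(signed_to_float(-32768, 16))
# 8-bit unsigned: 255 is +1.0, 128 is 0.0, 1 is -1.0, 0 is below -1.0.
print([unsigned_to_float(s, 8) for s in (255, 128, 1, 0)])
```

Nothing is clipped here, so the extra negative code survives the conversion as a value below −1.0.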

Examples

Unsigned

WAV format actually allows for fewer than 8 bits per sample:

The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero.

So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to follow:

| WAV | Sample | int | float | Comment |
| --- | --- | --- | --- | --- |
| 0xC0 | 0b11 | 3 | +1.0 | full-scale |
| 0x80 | 0b10 | 2 | 0.0 | midpoint |
| 0x40 | 0b01 | 1 | −1.0 | full-scale |
| 0x00 | 0b00 | 0 | −2.0 | |
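The 2-bit rows above can be reproduced with a short sketch. It assumes each sample sits left-justified in an 8-bit container, as the quoted rule describes. `unpack_unsigned` and `unsigned_to_float` are hypothetical helper names.

```python
def unpack_unsigned(container: int, bits: int, container_bits: int = 8) -> int:
    """Recover a left-justified unsigned sample by dropping the zero padding."""
    return container >> (container_bits - bits)


def unsigned_to_float(sample: int, bits: int) -> float:
    """Full-scale convention: remove the midpoint, divide by 2**(bits-1) - 1."""
    return (sample - 2 ** (bits - 1)) / (2 ** (bits - 1) - 1)


for raw in (0xC0, 0x80, 0x40, 0x00):
    code = unpack_unsigned(raw, bits=2)
    print(f"0x{raw:02X} -> {code} -> {unsigned_to_float(code, 2)}")
```

For 2-bit data the divisor is 1, which is why the unused code 0 lands as far out as −2.0.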

For 8-bit audio, as mentioned above, 255 is full-scale, 128 is midpoint, 1 is negative full-scale, and 0 exceeds full-scale:

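| WAV | Sample | int | float | Comment |
| --- | --- | --- | --- | --- |
| 0xFF | 0b1111_1111 | 255 | +1.000 | full-scale |
| 0xFE | 0b1111_1110 | 254 | +0.992 | |
| 0xFD | 0b1111_1101 | 253 | +0.984 | |
| ... | ... | ... | ... | |
| 0x82 | 0b1000_0010 | 130 | +0.016 | |
| 0x81 | 0b1000_0001 | 129 | +0.008 | |
| 0x80 | 0b1000_0000 | 128 | 0.000 | midpoint |
| 0x7F | 0b0111_1111 | 127 | −0.008 | |
| 0x7E | 0b0111_1110 | 126 | −0.016 | |
| ... | ... | ... | ... | |
| 0x03 | 0b0000_0011 | 3 | −0.984 | |
| 0x02 | 0b0000_0010 | 2 | −0.992 | |
| 0x01 | 0b0000_0001 | 1 | −1.000 | full-scale |
| 0x00 | 0b0000_0000 | 0 | −1.008 | |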

Signed

For 16-bit audio, the interpretation is signed:

| WAV | Sample | int | float | Comment |
| --- | --- | --- | --- | --- |
| 0x7FFF | 0b0111_1111_1111_1111 | +32,767 | +1.00000 | full-scale |
| 0x7FFE | 0b0111_1111_1111_1110 | +32,766 | +0.99997 | |
| 0x7FFD | 0b0111_1111_1111_1101 | +32,765 | +0.99994 | |
| ... | ... | ... | ... | |
| 0x0002 | 0b0000_0000_0000_0010 | +2 | +0.00006 | |
| 0x0001 | 0b0000_0000_0000_0001 | +1 | +0.00003 | |
| 0x0000 | 0b0000_0000_0000_0000 | 0 | 0.00000 | midpoint |
| 0xFFFF | 0b1111_1111_1111_1111 | −1 | −0.00003 | |
| 0xFFFE | 0b1111_1111_1111_1110 | −2 | −0.00006 | |
| ... | ... | ... | ... | |
| 0x8003 | 0b1000_0000_0000_0011 | −32,765 | −0.99994 | |
| 0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 | |
| 0x8001 | 0b1000_0000_0000_0001 | −32,767 | −1.00000 | full-scale |
| 0x8000 | 0b1000_0000_0000_0000 | −32,768 | −1.00003 | |

As is 9-bit audio:

| WAV | Sample | int | float | Comment |
| --- | --- | --- | --- | --- |
| 0x7F80 | 0b0111_1111_1 | +255 | +1.000 | full-scale |
| 0x7F00 | 0b0111_1111_0 | +254 | +0.996 | |
| 0x7E80 | 0b0111_1110_1 | +253 | +0.992 | |
| ... | ... | ... | ... | |
| 0x0100 | 0b0000_0001_0 | +2 | +0.008 | |
| 0x0080 | 0b0000_0000_1 | +1 | +0.004 | |
| 0x0000 | 0b0000_0000_0 | 0 | 0.000 | midpoint |
| 0xFF80 | 0b1111_1111_1 | −1 | −0.004 | |
| 0xFF00 | 0b1111_1111_0 | −2 | −0.008 | |
| ... | ... | ... | ... | |
| 0x8180 | 0b1000_0001_1 | −253 | −0.992 | |
| 0x8100 | 0b1000_0001_0 | −254 | −0.996 | |
| 0x8080 | 0b1000_0000_1 | −255 | −1.000 | full-scale |
| 0x8000 | 0b1000_0000_0 | −256 | −1.004 | |
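The 9-bit rows above can be checked with a sketch along these lines. It assumes the 16-bit container word has already been assembled from the file's little-endian bytes. `unpack_signed` is a hypothetical helper, not a library function.

```python
def unpack_signed(container: int, bits: int, container_bits: int = 16) -> int:
    """Recover a left-justified two's-complement sample from a raw container word."""
    # Reinterpret the raw unsigned word as a signed integer of the container width.
    if container >= 1 << (container_bits - 1):
        container -= 1 << container_bits
    # Python's >> on negative ints is an arithmetic shift, so the sign is kept.
    return container >> (container_bits - bits)


def signed_to_float(sample: int, bits: int) -> float:
    """Full-scale convention: divide by the largest positive code."""
    return sample / (2 ** (bits - 1) - 1)


for raw in (0x7F80, 0x0080, 0x0000, 0xFF80, 0x8080, 0x8000):
    code = unpack_signed(raw, bits=9)
    print(f"0x{raw:04X} -> {code:+d} -> {signed_to_float(code, 9):+.3f}")
```

The same helper handles the 16-bit case with `bits=16`, where the shift amount is zero.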
@endolith
Author

endolith commented May 4, 2020

Also, the wBitsPerSample field is 2 bytes, which means you could conceivably store data with 65,535 bits per sample, which is ridiculous. Even 255 bits per sample would be a dynamic range of 1,537 dB!
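The 1,537 dB figure appears consistent with the usual rule of thumb for an ideal quantizer driven by a full-scale sine, roughly 6.02·b + 1.76 dB. That the comment used this formula is an assumption, and the sketch below only checks the arithmetic.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical SQNR of an ideal `bits`-bit quantizer with a full-scale sine."""
    # 20*log10(2) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB.
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)


print(round(dynamic_range_db(16)))   # about 98 dB for 16-bit audio
print(round(dynamic_range_db(255)))  # about 1537 dB for 255 bits
```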

@gavingc

gavingc commented Aug 31, 2020

Hi,
What a great write-up.
Yesterday I came to exactly the same understanding: the WAV file format allows saving a file that is essentially invalid under other audio standards, or even under a sane application of symmetry.
Worse still a lot of texts encourage it.
I'm darn sure that a lot of software implements it.

The 8 kHz 16-bit PCM example on Wikipedia is evidence of this: the file contains the -32768 value!
https://en.wikipedia.org/wiki/WAV
8k16bitpcm.wav
http://www.nch.com.au/acm/8k16bitpcm.wav

I'm completing a signal processing course in my engineering degree and the first assignment requires a 16 bit audio sample to be iteratively quantised down bit by bit, all the way down to 1 bit, by removing the LSB.

Did you ever answer the question: How to handle asymmetry of WAV data?

Is the only fix to simply clip or compress the value that exceeds full-scale?

@endolith
Author

endolith commented Aug 31, 2020

@gavingc Basically there are two different interpretations of PCM data, used when converting to float. I listed two standards above that interpret it such that the largest positive value is considered full-scale, and therefore maps to +1.0, so the most negative number exceeds full-scale and is <−1.0. However, other sources (Android spec, USB Audio spec) interpret PCM as a fixed-point number, with the binary point after the first bit, so that the most negative number is full-scale, and maps to −1.0, so the most positive number is less than full-scale, and is <+1.0.

> essentially invalid in other audio standards,

Do you have examples of other audio standards that this can be compared with?

> Is the only fix to simply clip or compress the value that exceeds full-scale?

In my code, I'm supporting both interpretations, and for the one that allows negative values to exceed full-scale, I'm just keeping the values, assigning a float value more negative than -1.0. scipy/scipy#12507
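A rough sketch of the two interpretations side by side. This is an illustration only, not the API of the scipy change referenced above, and `to_float` with its `convention` argument is a hypothetical name.

```python
def to_float(sample: int, bits: int, convention: str = "full-scale") -> float:
    """Convert a signed PCM sample to float under one of two conventions."""
    if convention == "full-scale":
        # Largest positive code maps to +1.0; the most negative code is < -1.0.
        return sample / (2 ** (bits - 1) - 1)
    if convention == "fixed-point":
        # Most negative code maps to -1.0; the largest positive code is < +1.0.
        return sample / 2 ** (bits - 1)
    raise ValueError(f"unknown convention: {convention!r}")


for name in ("full-scale", "fixed-point"):
    lo, hi = to_float(-32768, 16, name), to_float(32767, 16, name)
    print(f"{name:12s} min={lo:+.9f} max={hi:+.9f}")
```

Neither branch clips, so the full-scale branch returns a value below −1.0 for the most negative code rather than discarding it.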

@gavingc

gavingc commented Sep 13, 2020

Hi,
I just meant from a high level the two audio standards AES17 and IEC 61606-3 both define a symmetrical approach while the WAV file format does not, exactly as you have laid out in detail.

I ended up applying the view demonstrated by MATLAB R2018b audioread(), that scaling is done by multiplying/dividing by 2^(nbits - 1).
This gives the range [-1, +1) i.e. < +1.0 or 1 – 2^( - (16 - 1) ) ≈ 0.999969482 for 16-bit.

@endolith
Author

@gavingc I'm not sure what you mean. Two's complement is inherently asymmetrical, so you have to accept one asymmetry or the other. The WAV file format doesn't specify anything about how to interpret this.

Thanks for the info about MATLAB, I'll document that in the scipy function. The MATLAB write function works the same way presumably?

@gavingc

gavingc commented Sep 13, 2020

Yes indeed, the WAV file format can even be used to store data that is not audio. Ideally though an audio file format would have specified symmetry?

I didn't investigate this closely but it looks like storing floats in WAV does support symmetry.

The clearest example of the asymmetry is perhaps the limit case of 1-bit signed, where formally the int values are -1 and 0.
I ended up representing that in a double data type as [-1.0, 0.0) .
https://stackoverflow.com/questions/12240925/sign-extend-1-bit-2s-complement-number

Yes I didn't confirm by testing the limits of every function in MATLAB but everything I used worked out with this approach, range [-1, +1), without loss of data/bits only loss of scaling resolution if you like.

@endolith
Author

@gavingc
Yes, I wish they had specified the interpretation as well.

Yes, float format WAVs can exceed [-1.0, 1.0] by a huge amount, so there is no symmetry limitation there.

Yeah, 1-bit WAV could be interpreted as {-1.0, 0.0} with the fixed-point convention, while the full-scale convention breaks down at 1-bit and results in dividing by zero.
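The 1-bit limit case can be checked directly. This is a small sketch with hypothetical helper names. The two signed 1-bit codes are −1 and 0.

```python
def one_bit_values(convention: str) -> list:
    """Float values of the two signed 1-bit codes under the given convention."""
    bits = 1
    codes = (-1, 0)
    if convention == "fixed-point":
        divisor = 2 ** (bits - 1)        # 1
    else:
        divisor = 2 ** (bits - 1) - 1    # 0, so the division below fails
    return [code / divisor for code in codes]


def describe(convention: str) -> str:
    """Report the values, or note that the convention is undefined at 1 bit."""
    try:
        return str(one_bit_values(convention))
    except ZeroDivisionError:
        return "undefined (division by zero)"


print("fixed-point:", describe("fixed-point"))
print("full-scale: ", describe("full-scale"))
```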

Yes, this says MATLAB does it that way too: https://www.mathworks.com/matlabcentral/answers/294112-what-does-the-audioread-function-actually-do#comment_377989
