endolith/WAV interpretation.md

## WAV interpretation.md

      
## How to handle asymmetry of WAV data?

WAV files can store PCM audio (WAVE_FORMAT_PCM).  The WAV file format specification says:

> The data format and maximum and minimums values for PCM waveform samples of various sizes are as follows:
>
> | Sample Size | Data Format | Maximum Value | Minimum Value |
> |---|---|---|---|
> | One to eight bits | Unsigned integer | 255 (0xFF) | 0 |
> | Nine or more bits | Signed integer i | Largest positive value of i | Most negative value of i |

> For example, the maximum, minimum, and midpoint values for 8-bit and 16-bit PCM waveform data are as follows:
>
> | Format | Maximum Value | Minimum Value | Midpoint Value |
> |---|---|---|---|
> | 8-bit PCM | 255 (0xFF) | 0 | 128 (0x80) |
> | 16-bit PCM | 32767 (0x7FFF) | -32768 (-0x8000) | 0 |


Both the signed and unsigned formats are asymmetrical: each has one more code below the midpoint than above it. How should this asymmetry be handled? The signed format is two's complement representation, and AES17 defines the meaning of full-scale amplitude in this case:

> amplitude of a 997-Hz sine wave whose positive peak value reaches the positive digital full scale, leaving the negative maximum code unused.
>
> NOTE In 2's-complement representation, the negative peak is 1 LSB away from the negative maximum code.

As does IEC 61606-3:

> amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused

So, for example, for 16-bit audio, a signal that just reaches +32,767 and −32,767 is full-scale, while one that reaches −32,768 exceeds full-scale.

The midpoint example for 8-bit clarifies that unsigned data has the same asymmetry as signed data, only offset by the midpoint. So, for 8-bit data, a signal that swings from 1 to 255 is full-scale, and the value 0 exceeds full-scale.

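
The full-scale definitions above can be checked numerically. The following is a minimal sketch, assuming a 48 kHz sample rate and plain rounding without dither (neither is specified above); `full_scale_sine` is a hypothetical helper name. It quantizes a 997 Hz sine with a peak of `2**(b-1) - 1` and shows that the most negative code is never produced:

```python
import math


def full_scale_sine(bits, num_samples=48000, sample_rate=48000, freq=997.0):
    """Quantize a full-scale sine to `bits`-bit WAV PCM sample values."""
    peak = 2 ** (bits - 1) - 1                    # full-scale amplitude in LSBs
    offset = 2 ** (bits - 1) if bits <= 8 else 0  # unsigned formats sit on a midpoint
    return [
        round(peak * math.sin(2 * math.pi * freq * n / sample_rate)) + offset
        for n in range(num_samples)
    ]


s16 = full_scale_sine(16)
s8 = full_scale_sine(8)
print(min(s16), max(s16))  # -32767 32767 (the code -32768 is unused)
print(min(s8), max(s8))    # 1 255 (the code 0 is unused)
```
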
WAVE Audio File Format Specifications says:

> For float data, full scale is 1.

So, to correctly convert signed ints to float, divide by `2**(b-1) - 1`, where `b` is the number of bits.

To correctly convert unsigned ints to float, first subtract the midpoint `2**(b-1)`, then, similarly, divide by `2**(b-1) - 1`.

The float representation will then be limited to +1.0 full-scale in the positive direction, but can exceed −1.0 full-scale in the negative direction (for 16-bit, −32,768 maps to about −1.00003).

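
The two conversion rules above can be written as one small function. This is only a sketch: `pcm_to_float` is a hypothetical name, the sample value is assumed to have had any padding bits removed already, and rejecting 1-bit samples is my own assumption (the divisor `2**(b-1) - 1` would be zero there).

```python
def pcm_to_float(code, bits):
    """Map a WAV PCM sample value to float so that full scale is 1.0.

    code: sample value with padding bits already removed
          (unsigned for 2 to 8 bits, signed for 9 or more bits).
    """
    if bits < 2:
        raise ValueError("1-bit samples have no nonzero full-scale value")
    full_scale = 2 ** (bits - 1) - 1
    if bits <= 8:
        code -= 2 ** (bits - 1)  # unsigned: remove the midpoint offset first
    return code / full_scale


print(pcm_to_float(32767, 16))   # 1.0
print(pcm_to_float(-32768, 16))  # just below -1.0
print(pcm_to_float(255, 8))      # 1.0
print(pcm_to_float(128, 8))      # 0.0
print(pcm_to_float(0, 8))        # just below -1.0
```
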
### Examples

#### Unsigned

The WAV format actually allows sample sizes of fewer than 8 bits:

> The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero.

So I'll show 2-bit audio first (`wBitsPerSample = 2`), because it's simpler to follow:


| WAV | Sample | int | float | Comment |
|---|---|---|---|---|
| 0xC0 | 0b11 | 3 | +1.0 | full-scale |
| 0x80 | 0b10 | 2 | 0.0 | midpoint |
| 0x40 | 0b01 | 1 | −1.0 | full-scale |
| 0x00 | 0b00 | 0 | −2.0 | |
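
The rows of the 2-bit table can be reproduced from the stored bytes. The sketch below assumes one byte per sample with the sample in the most significant bits, as the quoted passage describes; `decode_unsigned` is a hypothetical helper name.

```python
def decode_unsigned(container_byte, bits):
    """Decode one byte holding a `bits`-bit (2 to 8) unsigned sample in its top bits."""
    code = container_byte >> (8 - bits)  # drop the zero padding bits
    midpoint = 2 ** (bits - 1)
    return code, (code - midpoint) / (midpoint - 1)


for wav in (0xC0, 0x80, 0x40, 0x00):
    code, value = decode_unsigned(wav, 2)
    print(f"0x{wav:02X}  0b{code:02b}  {code}  {value:+.1f}")
```

With `bits=8` the shift is zero, so the same function covers ordinary 8-bit data.
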


For 8-bit audio, as mentioned above, 255 is positive full-scale, 128 is the midpoint, 1 is negative full-scale, and 0 exceeds full-scale:


| WAV | Sample | int | float | Comment |
|---|---|---|---|---|
| 0xFF | 0b1111_1111 | 255 | +1.000 | full-scale |
| 0xFE | 0b1111_1110 | 254 | +0.992 | |
| 0xFD | 0b1111_1101 | 253 | +0.984 | |
| ... | ... | ... | ... | |
| 0x82 | 0b1000_0010 | 130 | +0.016 | |
| 0x81 | 0b1000_0001 | 129 | +0.008 | |
| 0x80 | 0b1000_0000 | 128 | 0.000 | midpoint |
| 0x7F | 0b0111_1111 | 127 | −0.008 | |
| 0x7E | 0b0111_1110 | 126 | −0.016 | |
| ... | ... | ... | ... | |
| 0x03 | 0b0000_0011 | 3 | −0.984 | |
| 0x02 | 0b0000_0010 | 2 | −0.992 | |
| 0x01 | 0b0000_0001 | 1 | −1.000 | full-scale |
| 0x00 | 0b0000_0000 | 0 | −1.008 | |


#### Signed

For 16-bit audio, the interpretation is signed:


| WAV | Sample | int | float | Comment |
|---|---|---|---|---|
| 0x7FFF | 0b0111_1111_1111_1111 | +32,767 | +1.00000 | full-scale |
| 0x7FFE | 0b0111_1111_1111_1110 | +32,766 | +0.99997 | |
| 0x7FFD | 0b0111_1111_1111_1101 | +32,765 | +0.99994 | |
| ... | ... | ... | ... | |
| 0x0002 | 0b0000_0000_0000_0010 | +2 | +0.00006 | |
| 0x0001 | 0b0000_0000_0000_0001 | +1 | +0.00003 | |
| 0x0000 | 0b0000_0000_0000_0000 | 0 | 0.00000 | midpoint |
| 0xFFFF | 0b1111_1111_1111_1111 | −1 | −0.00003 | |
| 0xFFFE | 0b1111_1111_1111_1110 | −2 | −0.00006 | |
| ... | ... | ... | ... | |
| 0x8003 | 0b1000_0000_0000_0011 | −32,765 | −0.99994 | |
| 0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 | |
| 0x8001 | 0b1000_0000_0000_0001 | −32,767 | −1.00000 | full-scale |
| 0x8000 | 0b1000_0000_0000_0000 | −32,768 | −1.00003 | |
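
In practice 16-bit samples arrive as raw bytes. A minimal sketch of decoding them, assuming the bytes have already been extracted from the file's `data` chunk and are little-endian (as WAV PCM data is); `decode_int16` is a hypothetical helper name:

```python
import struct


def decode_int16(data):
    """Convert little-endian 16-bit PCM bytes to floats with full scale at 1.0."""
    count = len(data) // 2
    codes = struct.unpack(f"<{count}h", data[: 2 * count])
    return [code / 32767 for code in codes]


# 0x7FFF, 0x0000, 0x8001, 0x8000 in little-endian byte order
raw = bytes([0xFF, 0x7F, 0x00, 0x00, 0x01, 0x80, 0x00, 0x80])
print([f"{value:+.5f}" for value in decode_int16(raw)])
# ['+1.00000', '+0.00000', '-1.00000', '-1.00003']
```
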


9-bit audio is also signed, with the sample stored in the most significant bits of a 16-bit word:


| WAV | Sample | int | float | Comment |
|---|---|---|---|---|
| 0x7F80 | 0b0111_1111_1 | +255 | +1.000 | full-scale |
| 0x7F00 | 0b0111_1111_0 | +254 | +0.996 | |
| 0x7E80 | 0b0111_1110_1 | +253 | +0.992 | |
| ... | ... | ... | ... | |
| 0x0100 | 0b0000_0001_0 | +2 | +0.008 | |
| 0x0080 | 0b0000_0000_1 | +1 | +0.004 | |
| 0x0000 | 0b0000_0000_0 | 0 | 0.000 | midpoint |
| 0xFF80 | 0b1111_1111_1 | −1 | −0.004 | |
| 0xFF00 | 0b1111_1111_0 | −2 | −0.008 | |
| ... | ... | ... | ... | |
| 0x8180 | 0b1000_0001_1 | −253 | −0.992 | |
| 0x8100 | 0b1000_0001_0 | −254 | −0.996 | |
| 0x8080 | 0b1000_0000_1 | −255 | −1.000 | full-scale |
| 0x8000 | 0b1000_0000_0 | −256 | −1.004 | |
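
The 9-bit rows can be reproduced by undoing the zero padding before scaling. This sketch assumes the sample occupies the top bits of a 16-bit word, as the four-hex-digit WAV values in the table suggest; `decode_padded_int16` is a hypothetical helper name.

```python
def decode_padded_int16(word, bits):
    """Decode a 9- to 16-bit signed sample stored in the top bits of a 16-bit word.

    word: the container as an unsigned 16-bit integer, e.g. 0x8080.
    """
    if word >= 0x8000:
        word -= 0x10000         # reinterpret as two's complement
    code = word >> (16 - bits)  # arithmetic shift drops the zero padding
    return code, code / (2 ** (bits - 1) - 1)


for word in (0x7F80, 0x0080, 0x0000, 0xFF80, 0x8080, 0x8000):
    code, value = decode_padded_int16(word, 9)
    print(f"0x{word:04X}  {code:+4d}  {value:+.3f}")
```
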