pbarfuss/AudioRambles.txt

## AudioRambles.txt
So sometimes I will spend a lot of time randomly thinking about things that just confuse me when they either really should, or really shouldn't. Physics falls into the first category, especially foundations of QM. Audio is a great source for entries in the second category.

Note that when I say "doesn't make sense" I don't mean "is stupid, but seemed like a good idea at the design-by-committee" (see, for instance, SBR envelope coefficient deltas having the ability to be coded as time differentials instead of only as freq differentials - it's not that much more complex and you might shave a tiny bit of extra compression out of the format... and you also drastically decrease its ability to recover from fades on HF, thus making SBR on Digital Radio Mondiale hell until the broadcast engineer finds the "allow deltas in time" option and turns it off). Nor do I mean actual malice (see, for instance, literally any documentation that DVSI releases because they're forced to under the terms of their incredibly lucrative government contract with the P25 group).

Anyway, actually getting to the topic at hand: why the heck would a game store zeros in an LPCM/ADPCM format, in a time when space constraints mattered a lot? I mean heck, why not at that point drop the zeroes and be able to code your coefficients in greater precision, or use less lossy compression on them before storing them on the CD or whatnot? It literally makes no sense at all, and the only option that seems to make sense is...

it's done that way on purpose.

Formats like this one (and most gaming console LPCM/ADPCM/what-have-you formats) are literally fed directly to the chip in question, often blindly. That explains, for instance, why they're block-interleaved instead of sample-interleaved a lot of the time. Anyway, let's take a quick look at that datasheet again: http://www.retrodev.com/RF5C68A.pdf

YM2608 datasheet this is not: it's very terse and doesn't tell you that much about the chip, but you *can* tell that it's internally an oversampling design. Most DACs are - in fact they started out being 4x - 8x oversampling in the early CD days, and nowadays we feed our signed 16-bit LPCM into what is usually actually a 1-bit sigma-delta DAC with a digital interpolation stage in front of it. The following is a decent quick introduction to the general concept:
http://www.analog.com/media/en/training-seminars/tutorials/MT-017.pdf

Anyway, let's hop off on another tangent. I swear this one is relevant. A lot of HDA-Intel card drivers claim to not support mono source audio. In every case where the driver was easily modifiable, this turns out to be false for every DAC except for the Analog Devices ones, which output audio in one channel and a loud high-pitched whine in the other. Most other cards output audio in one channel and silence in the other when fed mono, which I will argue is a form of supporting mono source audio, but w/e. Anyway, one solution is just to double your interleaved audio, adding in zeroes every other sample, and telling the codec it's stereo - this is obviously always guaranteed to work.
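
For the record, the zero-padding version is about as simple as it sounds. A minimal sketch, assuming the samples are already decoded into a list of signed integers (the function name is made up for illustration, it's not from any driver or API):

```python
def mono_to_zero_padded_stereo(mono_samples):
    """Interleave mono samples with zeroes: left = audio, right = silence."""
    stereo = []
    for sample in mono_samples:
        stereo.append(sample)  # left channel carries the original sample
        stereo.append(0)       # right channel is padded with a zero
    return stereo


if __name__ == "__main__":
    print(mono_to_zero_padded_stereo([100, -200, 300]))
```

The output has twice as many samples as the input, which is exactly the storage cost being complained about above.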

What isn't guaranteed to work, but can if the card supports switching samplerates, is the following hack I've often used on Windows: feed it mono audio at Fs, say it's stereo audio at Fs/2.
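
To make the hack concrete: nothing about the sample data changes, only the label on it. Here's a rough sketch using Python's standard `wave` module that wraps raw mono PCM bytes in a WAV header claiming two channels at half the rate. The function name and the WAV container are my own choices for illustration; the original hack was done against the soundcard directly, not via files.

```python
import io
import wave


def relabel_mono_as_stereo(pcm_bytes, fs, sample_width=2):
    """Return WAV bytes that describe mono PCM at fs as stereo at fs // 2."""
    if fs % 2:
        raise ValueError("fs must be even to halve it exactly")
    frame_bytes = 2 * sample_width
    # Drop a trailing odd sample so the data is a whole number of stereo frames.
    usable = len(pcm_bytes) - (len(pcm_bytes) % frame_bytes)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(2)            # claim stereo...
        out.setsampwidth(sample_width)
        out.setframerate(fs // 2)      # ...at half the real samplerate
        out.writeframes(pcm_bytes[:usable])  # sample data is untouched
    return buf.getvalue()
```

Note the byte rate is identical either way (fs * width for mono, (fs/2) * 2 * width for stereo), which is why the device consumes the data at the right speed.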

If soundcard DACs worked like software resamplers this'd be a recipe for getting Nyquist Rollover Aliasing and/or highly overzealous brickwall LPFs up in your audio and in general would sound Bloody Awful. But most of the time it sounds good enough that it's transparent to me on computer speakers for most source audio, because the result isn't LPF'd and zero-stuffed, it's interpolated and then oversampled by a lot. By the time it comes out to the analog portions of the chip, it actually DOES have an effective output rate higher than what the computer thinks the DAC can actually output.

By the way, back in the day DACs were still hard to both design and fab and had much lower sampling/oversampling/interpolation limits. This resulted in such things as the Creative Soundblaster Pro2 being able to support 44.1kHz mono audio, but stereo audio had a max input samplerate of 22.05kHz. And this wasn't due to bus timings, it was entirely due to DAC limitations.

Anyway, finally note that the DAC in question has a stated maximum output samplerate of ~26kHz, and that's only if it's clocked at its maximum rate of 10MHz. That's not that much better than 16kHz and it's also nonstandard, so you'd have to resample your source audio to that rate for playback, which is profoundly annoying. But wait - it's a stereo DAC!
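
Where does ~26kHz come from? A quick back-of-the-envelope sketch, assuming the chip's output rate is its input clock divided by 384. That divider is my assumption (it's the figure emulators commonly use for this chip), not something quoted from the datasheet here, but it does land on the number above:

```python
# Assumption: output samplerate = master clock / 384.
ASSUMED_CLOCK_DIVIDER = 384


def output_rate_hz(clock_hz, divider=ASSUMED_CLOCK_DIVIDER):
    """Output samplerate in Hz for a given master clock, under the assumed divider."""
    return clock_hz / divider


if __name__ == "__main__":
    rate = output_rate_hz(10_000_000)  # the stated 10MHz maximum clock
    print(f"{rate / 1000:.2f} kHz")
```

That works out to roughly 26.04kHz at 10MHz, i.e. not a rate any of your source audio is likely to be in.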

As I said earlier, these audio formats were designed to be handed directly to the audio chips with minimal processing. Probably the most notable example in this category is the SNES: the APU was basically its own computer, in fact in early models it WAS, and you can hook it up to an 8-bit parallel port and use it to play back SPC700 files, which are actually memory dumps of the state of that microcontroller. So: you have source audio at 32kHz, and you say it's stereo at 16kHz instead. Presumably you actually downsample it instead of just keeping it as-is, for any number of reasons: chip filters still not being that great, preventing gamedevs from accidentally treating it as stereo and doing things like hard pans to it, etc.

And thus you get acceptable-sounding audio output at the samplerate of I-Can't-Believe-It's-Not-32kHz, despite the chip not really supporting that, without actually ever violating any of the chip's stated device constraints. Which would actually make storing all those zeroes on disk legitimately worth it.

(By the way, if you think this is unusual for the time, just take a look at anything to do with the Commodore 64 SID6581. That thing is buggy as hell, *and basically every single one of those bugs has been extended to a feature that lets you do something the hardware couldn't originally support*. Adjusting volume on a channel that's switched on but with no audio actively playing causes annoying pops that are roughly proportional to the size of the volume step delta? Maybe for most it's annoying, but for Jeroen Tel it's a way to implement a 4-bit DAC. And that's how we get him yelling "1...2...3... hit it!! OutRun!!!" in audio for a chip that does not officially support sampled PCM whatsoever. And that's not even the craziest example I can think of, not even close.)
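
The data-preparation half of that trick is tiny. A minimal sketch, assuming unsigned 8-bit source samples and a 4-bit (0..15) volume setting; the function name is hypothetical, and the timed loop on the C64 that actually writes each value to the volume register is not shown:

```python
def to_4bit_volume_steps(samples_u8):
    """Reduce unsigned 8-bit PCM samples (0..255) to 4-bit volume values (0..15)."""
    steps = []
    for sample in samples_u8:
        if not 0 <= sample <= 255:
            raise ValueError("expected unsigned 8-bit samples")
        steps.append(sample >> 4)  # keep only the top four bits
    return steps
```

Each step between consecutive values becomes one of those "annoying pops", and a fast enough stream of pops is a (very crunchy) waveform.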