=============================================================================
 Anomie's S-DSP Doc - with updates/fixes/clarifications/etc by jwdonal
 $Revision: 1212 $
 $Date: 2015-09-28 00:41:22 -0700 (Mon, 28 Sep 2015) $
 <anomie@users.sourceforge.net>
=============================================================================

The S-DSP is the actual sound generator for the SNES. It shares 64K of RAM with
the SPC700, and can be poked at via SPC700 registers $00F2 and $00F3. It has an
input clock running nominally at 24576000 Hz, and supplies the SPC700 with a
1024000 Hz clock and the expansion port with an 8192000(?) Hz clock. Note that
this input clock has been indirectly observed to vary, with rates of anywhere
from 24592800 Hz to 24645600 Hz.

All SPC700 RAM access goes through the S-DSP, and the registers and IPL ROM
may well be located there as well. The S-DSP takes two memory accesses in
between each SPC700 memory access; this means none of the S-DSP external
operations described below can occur at the same time as a SPC700 operation.

The S-DSP's internal clock rate is 3.072MHz. Given a 32kHz sample rate this
means that the S-DSP has exactly 96 clock cycles to generate each new stereo
sample.

Credit to libopenspc, via SNEeSe and snes9x, for much of the below
information. I've since re-verified and re-interpreted much of it. Also, many
thanks to blargg for most of the timing data and several more algorithms, and
just incredible amounts of research.

A note on terminology: "Clip" in this document refers to bit truncation, while
"clamp" refers to range restriction. In C, these could be done as (note that
these equations assume that 'v' is a 32-bit signed int):
  v = v & 0x7FFF;                            /* clip to 15 bits unsigned */
  v = (v<0) ? 0 : ((v>0x7FFF) ? 0x7FFF : v); /* clamp to 15 bits unsigned */
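Later sections also need signed versions of these operations, for example the
15-bit clip applied after BRR filtering and the CLAMP15 used in the
interpolation code. A minimal sketch, assuming two's-complement 32-bit ints;
the helper names are hypothetical:

```c
#include <stdint.h>

/* Clip: keep the low 15 bits and reinterpret them as a signed 15-bit value.
   This wraps, e.g. 0x4000 becomes -0x4000. */
static int32_t clip15s(int32_t v)
{
    return ((v & 0x7FFF) ^ 0x4000) - 0x4000;
}

/* Clamp: saturate to the signed 16-bit range -0x8000..0x7FFF. */
static int32_t clamp16s(int32_t v)
{
    return v < -0x8000 ? -0x8000 : (v > 0x7FFF ? 0x7FFF : v);
}

/* Clamp: saturate to the signed 15-bit range -0x4000..0x3FFF. */
static int32_t clamp15s(int32_t v)
{
    return v < -0x4000 ? -0x4000 : (v > 0x3FFF ? 0x3FFF : v);
}
```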


SOUND GENERATION
================

The S-DSP can mix and output up to 8 voices to produce stereo sound (later on,
sound generated by a device on the cart or the expansion port device may also
be mixed in, but the S-DSP has no knowledge of or control over this). Output
is nominally at 32000 Hz; more precisely, it occurs once every 768 input clock
cycles (32 SPC700 cycles), so the real rate follows the input clock.
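As a worked example of the clock arithmetic, here is a sketch assuming the
fixed dividers implied above (input/24 for the SPC700, input/8 for the S-DSP's
internal 3.072MHz clock, and one output sample per 768 input cycles). The
function names are hypothetical:

```c
/* Derived clocks for a given input clock, in Hz. */
static double spc700_hz(double input_hz)       { return input_hz / 24.0; }
static double dsp_internal_hz(double input_hz) { return input_hz / 8.0; }

/* One stereo sample per 768 input cycles (32 SPC700 cycles, or 96
   S-DSP internal cycles). */
static double sample_rate_hz(double input_hz)  { return input_hz / 768.0; }
```

At the observed extremes of 24592800 Hz and 24645600 Hz, sample_rate_hz()
gives 32021.875 Hz and 32090.625 Hz respectively.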

At a high level, each voice generates a stereo sample:
 * BRR data is decoded (15-bit mono sample).
 * Interpolation is performed over 4 BRR samples to determine the output
   sample, or the noise sample is selected (15-bit mono).
 * Apply the volume envelope (15-bit mono sample).
 * Apply the VxVOL registers (16-bit stereo sample).
In addition, the echo buffer supplies a 15-bit stereo sample (one left and one
right value), which is passed through the left and right channel FIR filters,
resulting in a 16-bit stereo sample. These 9 samples (8 voices plus the FIR
output) are used in two ways:
 1. Stereo Main Output
    * Mix all voices in order, clamping to 16 bits after each addition
      (16-bit stereo sample).
    * Adjust the main output by the MVOL registers to get the main sample
      (16-bit stereo sample).
    * Adjust the FIR sample by the EVOL registers (16-bit stereo sample).
    * Mix the EVOL-adjusted FIR sample into the main output, and clamp to
      16 bits (16-bit stereo sample).
    * Output to the DAC (16-bit stereo sample) unless FLG bit 6 (MUTE) applies.
 2. Stereo Echo Output
    * Mix all voices selected in EON in order, clamping to 16 bits after
      each addition (16-bit stereo sample).
    * Adjust the FIR sample by EFB (16-bit stereo sample).
    * Mix the EFB-adjusted FIR sample back into the echo output, and clamp to
      16 bits (16-bit stereo sample).
    * Write to the echo buffer (15-bit stereo sample, left-aligned in
      16 bits) unless FLG bit 5 (ECENx) applies.
In all cases, convert from 15- to 16-bits by adding a 0 bit on the low end,
and from 16- to 15-bits by dropping the low bit.
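The per-channel mixing order above can be sketched in C as follows. This is a
minimal illustration, assuming the per-voice samples have already been scaled
by VxVOL and that the volume registers use the signed 8-bit '>> 7' scaling
described under VOLUME CONTROL & ECHO. The function names and parameter
layout are hypothetical, and the ECENx check is left to the caller:

```c
#include <stdint.h>

static int32_t clamp16(int32_t v)
{
    return v < -0x8000 ? -0x8000 : (v > 0x7FFF ? 0x7FFF : v);
}

/* Main output for one channel. voice[i] is voice i after VxVOL (16-bit),
   fir is this channel's FIR output, mvol/evol are MVOLx/EVOLx. */
static int16_t mix_main(const int16_t voice[8], int16_t fir,
                        int8_t mvol, int8_t evol, int mute)
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum = clamp16(sum + voice[i]);    /* clamp after each addition */
    int32_t out = (sum * mvol) >> 7;      /* apply MVOL */
    out += (fir * evol) >> 7;             /* add EVOL-adjusted FIR sample */
    out = clamp16(out);
    return mute ? 0 : (int16_t)out;       /* FLG bit 6 (MUTE) */
}

/* Echo output for one channel: the value to store in the echo buffer,
   15 bits left-aligned in 16. eon_mask bit i = voice i enabled in EON. */
static int16_t mix_echo(const int16_t voice[8], uint8_t eon_mask,
                        int16_t fir, int8_t efb)
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++)
        if (eon_mask & (1u << i))
            sum = clamp16(sum + voice[i]);
    sum = clamp16(sum + ((fir * efb) >> 7)); /* EFB feedback */
    return (int16_t)(sum & ~1);               /* drop the low bit */
}
```

Clamping after every addition matters: with voices 0x7000, 0x7000 and -0x7000
the running sum saturates at 0x7FFF before the third voice is added, giving
0x0FFF rather than 0x7000.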

More specifically, the registers and memory are accessed as follows. Note that
most register values are read once per sample output and cached internally for
use as needed. Note also that the S-DSP may perform some of the "if necessary"
operations unconditionally but only make use of the result "if necessary". For
example, in voice processing step S2 it may load the sample pointer
unconditionally, but this has no effect unless there was a loop or KON.

Each voice carries out the following operations:
 S1. Load VxSRCN register, if necessary.
 S2. Load the sample pointer (using previously loaded DIR and VxSRCN) if
      necessary.
     Load VxPITCHL register.
     Load VxADSR1 register.
 S3. a. Load VxPITCHH register.
        Apply pitch modulation if applicable.
     b. Load the BRR header byte (every time), and the first of the two BRR
         bytes that will be decoded.
     c. If applicable, replace the current sample with the noise sample.
        Apply the volume envelope.
         - This is the value used for modulating the next voice's pitch, if
           applicable.
        Check FLG bit 7 (NOT previously loaded).
        Check BRR header 'e' and 'l' bits to determine if the voice ends.
        Handle KOFF and KON using previously loaded values. If KON, ENDX.x will
         be cleared in step S7.
        Load VxGAIN or VxADSR2 register depending on ADSR1.7.
        Update the volume envelope, using previously loaded values.
 S4. Load and apply VxVOLL register.
     If a new group of BRR samples is required, load the second BRR byte and
      decode the group of 4 BRR samples. This is definitely not done when not
      necessary. If necessary, adjust the BRR pointer to the next block, or
      flag the loop address for loading next step S2 and set ENDX.x in step S7.
      Note that this setting of ENDX.x will not override the clearing due to KON
      in step S3c, if both occur during the same sample.
     Increment interpolation sample position as specified by pitch values.
     At any point from now until we next get to S3c, the next sample may be
      calculated using the interpolation position and BRR buffer contents.
 S5. Load and apply VxVOLR register.
     The new ENDX.x value is prepared, and can be overwritten. Reads will not
      see it yet.
 S6. The new VxOUTX value is prepared, and can be overwritten. Reads will not
      see it yet.
 S7. The new ENDX.x value may now be read.
     The new VxENVX value is prepared, and can be overwritten. Reads will not
      see it yet.
 S8. The new VxOUTX value may now be read.
 S9. The new VxENVX value may now be read.

The full sample generation loop is as follows. Note how the above voice
process is interleaved for the 8 voices. The choice of which cycle to call
"cycle 0" is semi-arbitrary. I've included the standard timing of the SPC700
timer ticks, but note that frobbing the SPC700 TEST register can change this
synchronization.

  0. Voice steps: V0:S5  V1:S2
     Tick the SPC700 Stage 1 timers, always for T2 and every 4 samples for
      T0 and T1.
  1. Voice steps: V0:S6  V1:S3
  2. Voice steps: V0:S7  V1:S4         V3:S1
  3. Voice steps: V0:S8  V1:S5  V2:S2
  4. Voice steps: V0:S9  V1:S6  V2:S3
  5. Voice steps:        V1:S7  V2:S4         V4:S1
  6. Voice steps:        V1:S8  V2:S5  V3:S2
  7. Voice steps:        V1:S9  V2:S6  V3:S3
  8. Voice steps:               V2:S7  V3:S4         V5:S1
  9. Voice steps:               V2:S8  V3:S5  V4:S2
 10. Voice steps:               V2:S9  V3:S6  V4:S3
 11. Voice steps:                      V3:S7  V4:S4         V6:S1
 12. Voice steps:                      V3:S8  V4:S5  V5:S2
 13. Voice steps:                      V3:S9  V4:S6  V5:S3
 14. Voice steps:                             V4:S7  V5:S4         V7:S1
 15. Voice steps:                             V4:S8  V5:S5  V6:S2
 16. Voice steps:                             V4:S9  V5:S6  V6:S3
     Tick the SPC700 Stage 1 timer for T2.
 17. Voice steps: V0:S1                              V5:S7  V6:S4
 18. Voice steps:                                    V5:S8  V6:S5  V7:S2
 19. Voice steps:                                    V5:S9  V6:S6  V7:S3
 20. Voice steps:        V1:S1                              V6:S7  V7:S4
 21. Voice steps: V0:S2                                     V6:S8  V7:S5
 22. Voice steps: V0:S3a                                    V6:S9  V7:S6
     Apply ESA using the previously loaded value along with the previously
      calculated echo offset to calculate new echo pointer.
     Load left channel sample from the echo buffer.
     Load FFC0.
 23. Voice steps:                                                  V7:S7
     Load right channel sample from the echo buffer.
     Load FFC1 and FFC2.
 24. Voice steps:                                                  V7:S8
     Load FFC3, FFC4, and FFC5.
 25. Voice steps: V0:S3b                                           V7:S9
     Load FFC6 and FFC7.
 26. Load and apply MVOLL.
     Load and apply EVOLL.
     Output the left sample to the DAC.
     Load and apply EFB.
 27. Load and apply MVOLR.
     Load and apply EVOLR.
     Output the right sample to the DAC.
     Load PMON
 28. Load NON, EON, and DIR.
     Load FLG bit 5 (ECENx) for application to the left channel.
 29. Update global counter.
     Write left channel sample to the echo buffer, if allowed by ECENx.
     Load EDL - if the current echo offset is 0, apply EDL.
     Load ESA for future use.
     Load FLG bit 5 (ECENx) again for application to the right channel.
     ** Clear internal KON bits for any channels keyed on in the previous 2 samples.
 30. Voice steps: V0:S3c
     Write right channel sample to the echo buffer, if allowed by ECENx.
     Increment the echo offset by 4 bytes (one stereo sample), and reset it
      to 0 once it reaches or exceeds the buffer length.
     Load FLG bits 0-4 and update noise sample if necessary.
     ** Load KOFF and internal KON.
 31. Voice steps: V0:S4         V2:S1

** These two steps (KON and KOFF related) are performed every other sample.
   Note that the internal KON bits are not cleared until 63 cycles after they
   are loaded. You could also consider the above loop to run from 0-63, with
   everything except these two steps repeated at T+32.

Unless the SPC700 TEST register is frobbed, it is always the case that the
KON/KOFF poll happens either 30 & 94 or 62 & 126 cycles after the SPC700 timer
T0 and T1 tick. On power on, 62 & 126 seems to be chosen more frequently but
30 & 94 can still be chosen sometimes. On reset, either can be chosen.


COUNTERS
========

The S-DSP has a global counter, which is examined by the noise sample
generator and the volume envelope adjustments. The global counter counts from
0x77FF to zero, decrementing by one each sample. Note that the counter is
initialized to zero (not 0x77FF) on reset.

The noise and envelope adjustments use the following tables to determine when
to perform their actions:

    // Number of samples per counter event
    counter_rates[32] = {
             Inf, 2048, 1536,
            1280, 1024,  768,
             640,  512,  384,
             320,  256,  192,
             160,  128,   96,
              80,   64,   48,
              40,   32,   24,
              20,   16,   12,
              10,    8,    6,
               5,    4,    3,
                     2,
                     1
    }

    // Counter offset from zero (i.e. not all counters are aligned at zero for all rates)
    counter_offsets[32] = {
            n/a,    0, 1040,
            536,    0, 1040,
            536,    0, 1040,
            536,    0, 1040,
            536,    0, 1040,
            536,    0, 1040,
            536,    0, 1040,
            536,    0, 1040,
            536,    0, 1040,
            536,    0, 1040,
                    0,
                    0
    }

When (Counter + counter_offsets[R]) % counter_rates[R] is zero (where R is the
current rate) the action is performed. This approach covers not just the
overall rate but also the relative synchronization when switching between
different rates (i.e. the first cycle will be shorter than the rest depending
on when the rate change occurs).
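A minimal sketch of the modulus-based counter, transcribing the two tables
above ('Inf' and 'n/a' are replaced by a special case for R=0). The function
names are hypothetical:

```c
#include <stdint.h>

static const uint16_t counter_rates[32] = {
       0, 2048, 1536,   /* index 0 stands in for "Inf" */
    1280, 1024,  768,
     640,  512,  384,
     320,  256,  192,
     160,  128,   96,
      80,   64,   48,
      40,   32,   24,
      20,   16,   12,
      10,    8,    6,
       5,    4,    3,
       2,    1
};

static const uint16_t counter_offsets[32] = {
       0,    0, 1040,   /* index 0 stands in for "n/a" */
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
       0,    0
};

/* One step per sample: 0x77FF down to 0, then wrap. The counter is 0 after
   reset, so the first step wraps it to 0x77FF. */
static uint16_t counter_step(uint16_t counter)
{
    return counter == 0 ? 0x77FF : (uint16_t)(counter - 1);
}

/* Nonzero if an action at rate r (0..31) happens for this counter value. */
static int counter_fires(uint16_t counter, unsigned r)
{
    if (r == 0)
        return 0;                      /* rate 0 never fires */
    return (counter + counter_offsets[r]) % counter_rates[r] == 0;
}
```

Stepping through one full period of 0x7800 samples, each rate R fires
0x7800 / counter_rates[R] times (e.g. 15 times for R=1, 20 times for R=2).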

It's quite certain that Nintendo did not implement divide-with-remainder logic
in the S-DSP given both the era the chip was designed and the space limitations.
With that said, the above equation does work for all cases but is more geared
towards a software emulator. For an HDL implementation, however, it is not very
practical (although it can be done). It has been demonstrated in FPGA hardware
that exactly identical behavior (including offsets and relative synchronizations
when switching rates) can be generated using only a series of inter-dependent
clocks whose logic resource utilization is >95% smaller than the equivalent
modulus-based implementation.

Another option (created by Mednafen) which does not require the use of
modulus/division and is also more FPGA/HDL friendly is described below.

In this implementation the counter is still initialized to zero on reset, but is
not controlled by a simple decrement once per sample. Instead the following
function is used:

  void run_glbl_cntr (void)
  {
   if(!(Counter & 0x7))
    Counter ^= 0x5;

   if(!(Counter & 0x18))
    Counter ^= 0x18;

   Counter -= 0x29;
  }

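Unlike the counter in the modulus-based method described earlier, Counter
does not step through a consecutive series of numbers. It behaves as three
down-counters packed into one 16-bit value: bits 0-2 cycle through 4,3,2,1,0
(the /5 divider), bits 3-4 cycle through 2,1,0 (the /3 divider), and bits 5-15
form an ordinary binary down-counter (the /1 counter).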

The noise and envelope adjustments use the following tables to determine when
to perform their actions:

 // Selects how many bits of the /1 counter to use (to give /1, /2, /4, /8,
 // etc.), and to optionally select the bits of the /5 or /3 divider (to
 // optionally give rates like /5 or /10 or /20 etc., or /3 or /6 or /12 etc.).
 uint16 counter_masks[32] = {
            0x0000, 0xFFE0, 0x3FF8,
            0x1FE7, 0x7FE0, 0x1FF8,
            0x0FE7, 0x3FE0, 0x0FF8,
            0x07E7, 0x1FE0, 0x07F8,
            0x03E7, 0x0FE0, 0x03F8,
            0x01E7, 0x07E0, 0x01F8,
            0x00E7, 0x03E0, 0x00F8,
            0x0067, 0x01E0, 0x0078,
            0x0027, 0x00E0, 0x0038,
            0x0007, 0x0060, 0x0018,
                    0x0020,
                    0x0000
 };

 // Adjusts for relative timing offsets and handles R=0 case. Could also be
 // thought of as counter_compare.
 uint16 counter_xors[32] = {
            0xFFFF, 0x0000, 0x3E08,
            0x1D04, 0x0000, 0x1E08,
            0x0D04, 0x0000, 0x0E08,
            0x0504, 0x0000, 0x0608,
            0x0104, 0x0000, 0x0208,
            0x0104, 0x0000, 0x0008,
            0x0004, 0x0000, 0x0008,
            0x0004, 0x0000, 0x0008,
            0x0004, 0x0000, 0x0008,
            0x0004, 0x0000, 0x0008,
                    0x0000,
                    0x0000
 };

When (Counter & counter_masks[R]) ^ counter_xors[R] is zero (where R is the
current rate) the action is performed. Just like the modulus-based method,
this approach covers both the overall rate and the relative synchronization
when switching between different rates.
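The mask-based counter and check can be sketched the same way. This is an
illustrative transcription of run_glbl_cntr and the two tables above,
rewritten to take and return the counter value; the function names are
hypothetical:

```c
#include <stdint.h>

static const uint16_t counter_masks[32] = {
    0x0000, 0xFFE0, 0x3FF8,
    0x1FE7, 0x7FE0, 0x1FF8,
    0x0FE7, 0x3FE0, 0x0FF8,
    0x07E7, 0x1FE0, 0x07F8,
    0x03E7, 0x0FE0, 0x03F8,
    0x01E7, 0x07E0, 0x01F8,
    0x00E7, 0x03E0, 0x00F8,
    0x0067, 0x01E0, 0x0078,
    0x0027, 0x00E0, 0x0038,
    0x0007, 0x0060, 0x0018,
    0x0020, 0x0000
};

static const uint16_t counter_xors[32] = {
    0xFFFF, 0x0000, 0x3E08,
    0x1D04, 0x0000, 0x1E08,
    0x0D04, 0x0000, 0x0E08,
    0x0504, 0x0000, 0x0608,
    0x0104, 0x0000, 0x0208,
    0x0104, 0x0000, 0x0008,
    0x0004, 0x0000, 0x0008,
    0x0004, 0x0000, 0x0008,
    0x0004, 0x0000, 0x0008,
    0x0004, 0x0000, 0x0008,
    0x0000, 0x0000
};

static uint16_t run_glbl_cntr(uint16_t counter)
{
    if (!(counter & 0x7))
        counter ^= 0x5;                /* bits 0-2: reload the /5 divider */
    if (!(counter & 0x18))
        counter ^= 0x18;               /* bits 3-4: reload the /3 divider */
    return (uint16_t)(counter - 0x29); /* decrement all three fields */
}

/* Nonzero if an action at rate r (0..31) happens for this counter value.
   R=0 can never match because its mask is 0 and its xor is 0xFFFF. */
static int counter_fires(uint16_t counter, unsigned r)
{
    return ((counter & counter_masks[r]) ^ counter_xors[r]) == 0;
}
```

Over one full period of 0x7800 steps each rate fires 0x7800 / counter_rates[R]
times, the same totals the modulus-based method gives.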

VOLUME CONTROL & ECHO
=====================

In all cases, volume samples are adjusted in a simple linear fashion:
  Sout = (Sin * vol) >> vol_shift.

"vol_shift" is chosen to give vol an effective range of -1<=vol<1. Thus, if
vol is unsigned then vol_shift is the number of bits in vol, while if vol is
signed then vol_shift is one less (e.g. 8-bit signed has a vol_shift of 7).
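A one-line sketch of the rule above. apply_volume and its parameters are
hypothetical names, and it assumes '>>' is an arithmetic shift on negative
values, as it is on mainstream compilers:

```c
#include <stdint.h>

/* Sout = (Sin * vol) >> vol_shift, where vol_shift is the width of vol in
   bits, minus one if vol is signed (so -1 <= vol < 1 in effect). */
static int32_t apply_volume(int32_t s, int32_t vol, int vol_bits, int vol_signed)
{
    int vol_shift = vol_signed ? vol_bits - 1 : vol_bits;
    return (s * vol) >> vol_shift;
}
```

For example, the envelope is 11 bits unsigned (vol_shift 11) and VxVOLL is
8-bit signed (vol_shift 7).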

In all cases, mixed values are clamped to 16 bits.

There are several layers to S-DSP volume control. First, the sample is
adjusted by the volume envelope (11 bits unsigned). Then each sample is
adjusted by the per-voice volume (8-bit two's complement) separately for the
left and right channels (which may invert the phase of the signal). After all
voices are mixed the volume is adjusted by the master volume (8-bit two's
complement) separately for the left and right channels. And finally, the whole
thing can be muted by the FLG register.

Echo splits off the main audio path after the per-voice volume, before all
enabled voices are mixed together. The sample at the current echo offset in
the echo buffer (whose location and size are specified by ESA and EDL) is fed
into the FIR filter, and the FIR output is adjusted by the echo volume (8-bit
two's complement) and mixed back into the main output (after master volume
adjustment).

Then (if echo write is enabled in FLG) the FIR output is adjusted by the echo
feedback volume (8-bit two's complement) and mixed with all voices enabled in
EON, and output into the end of the echo ring buffer.

So note that if echo write is disabled, the "echo ring buffer" becomes a
static sample buffer up to 0.24 seconds long (EDL=15 gives 30720 bytes, or
7680 stereo samples at 32000 Hz).
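A sketch of how the echo offset might be stepped, assuming the usual ESA/EDL
semantics (ESA selects a 256-byte page, each EDL step adds 2048 bytes, and
each stereo sample occupies 4 bytes). It follows the order of cycles 29 and 30
in the loop above. The struct and function names are hypothetical, and the
16-bit address wrap is an assumption:

```c
#include <stdint.h>

/* Hypothetical echo-buffer state. */
struct echo_state {
    uint16_t offset;  /* byte offset of the current stereo sample */
    uint16_t length;  /* buffer length in bytes, latched from EDL */
};

/* Address of the current stereo sample (left at addr, right at addr+2).
   Assumed to wrap within the 64K address space. */
static uint16_t echo_addr(const struct echo_state *e, uint8_t esa)
{
    return (uint16_t)((esa << 8) + e->offset);
}

/* Advance one sample. EDL only takes effect when the offset is 0; with
   EDL=0 the length is 0, so the buffer degenerates to one 4-byte sample. */
static void echo_advance(struct echo_state *e, uint8_t edl)
{
    if (e->offset == 0)
        e->length = (uint16_t)((edl & 0x0F) * 0x800);
    e->offset += 4;
    if (e->offset >= e->length)
        e->offset = 0;
}
```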


BRR DECODING
============

The input samples to the S-DSP are compressed via a method known as "bit rate
reduction", compressing 16 16-bit samples into 9-byte blocks. The block format
is:
    ssssffle 00001111 22223333 44445555 .... EEEEFFFF

    ssss = shift
    ff   = filter
    l    = loop (really "don't end")
    e    = end (really "loop")
    0000 = (D) data for sample #0 in this block, signed 2's complement
    ...
    FFFF = (D) data for sample #15 in this block, signed 2's complement

While the pre-BRR samples were supposedly 16-bit, the BRR decoder seems to
lose the low bit. This can be seen below, in that the input RD loses a bit at
the low end. The bit is 'recovered' after the VxVOLL/VxVOLR volume adjustment.

The 'shift' value scales the sample data D. Values 0-12 work normally, giving
the 16-bit RD=(D<<shift)>>1. Values 13-15 force RD to either 0x0000 or 0xF800
depending on the sign of the input D (i.e. they give the same values as a D of
0 or F (-1) does with shift=12).

Each voice has a 12-sample ring buffer for decoding BRR data, divided into 3
groups of 4 samples. BRR data is always decoded in a group of 4 samples. There
are two 'active' groups, and one reserve group. When the interpolation index
passes 0x4000, the ring is turned and a new group of BRR data is decoded into
the new reserve group.

There are 4 possible 'filters' to use in decoding the blocks. Some filters use
previous samples in decoding; this history carries over between groups and
blocks and is kept separately for each voice.
  Filter 0 (Direct):       S(x) = RD
  Filter 1 (15/16):        S(x) = RD + S(x-1) + ((-S(x-1))>>4)
  Filter 2 (61/32-15/16):  S(x) = RD + (S(x-1)<<1) + ((-((S(x-1)<<1)+S(x-1)))>>5)
                                     - S(x-2) + (S(x-2)>>4)
  Filter 3 (115/64-13/16): S(x) = RD + (S(x-1)<<1) + ((-(S(x-1)+(S(x-1)<<2)+(S(x-1)<<3)))>>6)
                                     - S(x-2) + (((S(x-2)<<1) + S(x-2))>>4)

The calculations above are performed in some higher number of bits, clamped to
16 bits at the end and then clipped to 15 bits. This 15-bit value is the value
output and the value used as S(x-1) or S(x-2) as needed for future filter
iterations.
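Putting the header, shift and filter rules together, a 9-byte block can be
decoded as below. This is a sketch, not hardware-exact: it decodes all 16
samples at once rather than in groups of 4 (harmless, since the filter history
carries over either way) and ignores the 'e' and 'l' bits. The function names
are hypothetical:

```c
#include <stdint.h>

static int32_t clamp16(int32_t v)
{
    return v < -0x8000 ? -0x8000 : (v > 0x7FFF ? 0x7FFF : v);
}

/* hist[0] = S(x-1), hist[1] = S(x-2): per-voice filter history, updated in
   place so it carries over into the next block. out[] receives 15-bit values. */
static void brr_decode_block(const uint8_t blk[9], int32_t hist[2], int16_t out[16])
{
    int shift  = blk[0] >> 4;         /* ssss */
    int filter = (blk[0] >> 2) & 3;   /* ff */

    for (int i = 0; i < 16; i++) {
        /* Even samples are in the high nibble, odd ones in the low nibble. */
        int nib = (blk[1 + i / 2] >> ((i & 1) ? 0 : 4)) & 0xF;
        int32_t d = (nib ^ 8) - 8;    /* sign-extend 4 bits */
        int32_t rd = (shift <= 12) ? (d * (1 << shift)) >> 1
                                   : (d < 0 ? -0x800 : 0); /* shifts 13-15 */

        int32_t p1 = hist[0], p2 = hist[1], s = rd;
        switch (filter) {
        case 1: s += p1 + ((-p1) >> 4); break;
        case 2: s += p1 * 2 + ((-(p1 * 3)) >> 5) - p2 + (p2 >> 4); break;
        case 3: s += p1 * 2 + ((-(p1 * 13)) >> 6) - p2 + ((p2 * 3) >> 4); break;
        }
        s = clamp16(s);                        /* clamp to 16 bits */
        s = ((s & 0x7FFF) ^ 0x4000) - 0x4000;  /* then clip to 15 bits */

        hist[1] = hist[0];
        hist[0] = s;
        out[i] = (int16_t)s;
    }
}
```

Note how the final clip wraps rather than saturates: a filter-2 result that
clamps to 0x7FFF comes out as -1.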

Certain games do seem to depend on these exact formulas; trying to simplify
them will break some sound effects. If the very first block in a sample uses a
filter other than Direct, the previous samples are taken from the *physical*
end of the BRR ring buffer since the buffer index is reset to 0 on KON.

Note that BRR decoding never stops for a voice: KOFF and FLG bit 7 don't
affect it at all (they just set the envelope), and it always loops after
reaching a block with 'e' set ('l' clear again just sets the envelope). KON is
the only thing that actually affects a BRR decode in progress, and that simply
restarts it from the beginning.


Now, as for the remaining two bits. If 'e' is set for the block, the bit in
ENDX is set when the block is complete and the next block will be that pointed
to by the loop pointer for this sample (see DIR and VxSRCN). Also, as soon as
a header is loaded with 'e' set and 'l' clear, the voice goes into the Release
state and the envelope goes to 0 immediately. Due to the 12-sample buffer, the
'e' and 'l' bits of the final block can be seen before the final few samples
of the penultimate block are output if the pitch rate is slow enough. The
samples in the final block will never be output.


When a voice is keyed on, there are 5 '0x0000' samples output before the first
sample encoded by the BRR data. These are used to preload the BRR ring buffer:

 #0 = After the final pre-KON sample is prepared, the envelope is set to 0 and
      enters the Attack state, and is not updated for the next several
      samples. The interpolation index is reset to 0, and is not updated for
      the next several samples. The final pre-KON BRR decode also occurs here
      (which can matter if the first block of the new BRR data uses a
      non-Direct filter).
 #1 = The first '0x0000' sample. At step S2, the start address is read. No BRR
      decoding or header checks, envelope updating, or interpolation index
      updating is performed.
 #2 = At step S4, first BRR group is decoded. No envelope or interpolation
      index updating.
 #3 = At step S4, second BRR group is decoded. No envelope or interpolation
      index updating.
 #4 = At step S4, third BRR group is decoded. No envelope or interpolation
      index updating.
 #5 = Envelope updating begins. The sample output is still '0x0000', because of
      the order in which voice operations are performed. The interpolation
      position is still 0.
 #6 = Finally, we see the first data sample. The first interpolation position
      update is done during step S4.


PITCH ADJUSTMENTS
=================

The S-DSP has two methods to adjust the 'pitch' of the input sound. Each voice
has a 14-bit pitch control, and for voices 1-7 this can be further tweaked by
the output sample of the previous voice.

The pitch adjustment is fairly simple:
  pitch = voice[x].PITCH;
  if(PMON&~NON&~1&(1<<x))
      pitch += ((voice[x-1].outbuffer >> 5) * voice[x].PITCH) >> 10;
  voice[x].interpolation_index += pitch;
  if(voice[x].interpolation_index>0x7FFF)
      voice[x].interpolation_index = 0x7FFF;
In the above, remember that voice[x].PITCH is only 14 bits while the 'pitch'
variable is large enough to never wrap. Additionally, note that the pitch
calculation is performed as a SIGNED operation while the interpolation_index
calculation is performed as an UNSIGNED operation.
When determining whether a new BRR group is needed:
  if(voice[x].interpolation_index>=0x4000){
      NextBRRGroup(x);
      voice[x].interpolation_index -= 0x4000;
  }
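The two fragments above can be combined into a single per-sample step. A
sketch under the stated rules; the struct, function and parameter names are
hypothetical, and prev_out stands for voice[x-1].outbuffer:

```c
#include <stdint.h>

/* Hypothetical per-voice state. */
struct pitch_voice {
    uint16_t pitch_reg;     /* VxPITCHH:VxPITCHL as written */
    uint16_t interp_index;
};

/* Advances voice x by one sample. Returns 1 if a new BRR group must be
   decoded, else 0. */
static int pitch_step(struct pitch_voice *v, int x, uint8_t pmon, uint8_t non,
                      int32_t prev_out)
{
    int32_t p = v->pitch_reg & 0x3FFF;          /* only 14 bits are used */
    int32_t pitch = p;
    if (pmon & ~non & ~1 & (1 << x))            /* never voice 0 or noise */
        pitch += ((prev_out >> 5) * p) >> 10;   /* signed arithmetic */

    uint32_t idx = (uint32_t)v->interp_index + (uint32_t)pitch;
    if (idx > 0x7FFF)
        idx = 0x7FFF;

    int need_group = 0;
    if (idx >= 0x4000) {
        need_group = 1;
        idx -= 0x4000;
    }
    v->interp_index = (uint16_t)idx;
    return need_group;
}
```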

The samples in the BRR buffer are then interpolated using a 4-point Gaussian
interpolation.

Note that pitch adjustment does not function on noise voices (see NON) or on
voice 0.

The exact interpolation table from libopenspc is:
    // Gaussian table by libopenspc
    // Take note of the 'int32' datatype. These 11-bit hex values are all
    // positive, but they are multiplied by signed samples and so must be
    // held in a signed type.
    static const int32 gauss_coeffs[512] = {
  0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000,
  0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000,
  0x001, 0x001, 0x001, 0x001, 0x001, 0x001, 0x001, 0x001,
  0x001, 0x001, 0x001, 0x002, 0x002, 0x002, 0x002, 0x002,
  0x002, 0x002, 0x003, 0x003, 0x003, 0x003, 0x003, 0x004,
  0x004, 0x004, 0x004, 0x004, 0x005, 0x005, 0x005, 0x005,
  0x006, 0x006, 0x006, 0x006, 0x007, 0x007, 0x007, 0x008,
  0x008, 0x008, 0x009, 0x009, 0x009, 0x00A, 0x00A, 0x00A,
  0x00B, 0x00B, 0x00B, 0x00C, 0x00C, 0x00D, 0x00D, 0x00E,
  0x00E, 0x00F, 0x00F, 0x00F, 0x010, 0x010, 0x011, 0x011,
  0x012, 0x013, 0x013, 0x014, 0x014, 0x015, 0x015, 0x016,
  0x017, 0x017, 0x018, 0x018, 0x019, 0x01A, 0x01B, 0x01B,
  0x01C, 0x01D, 0x01D, 0x01E, 0x01F, 0x020, 0x020, 0x021,
  0x022, 0x023, 0x024, 0x024, 0x025, 0x026, 0x027, 0x028,
  0x029, 0x02A, 0x02B, 0x02C, 0x02D, 0x02E, 0x02F, 0x030,
  0x031, 0x032, 0x033, 0x034, 0x035, 0x036, 0x037, 0x038,
  0x03A, 0x03B, 0x03C, 0x03D, 0x03E, 0x040, 0x041, 0x042,
  0x043, 0x045, 0x046, 0x047, 0x049, 0x04A, 0x04C, 0x04D,
  0x04E, 0x050, 0x051, 0x053, 0x054, 0x056, 0x057, 0x059,
  0x05A, 0x05C, 0x05E, 0x05F, 0x061, 0x063, 0x064, 0x066,
  0x068, 0x06A, 0x06B, 0x06D, 0x06F, 0x071, 0x073, 0x075,
  0x076, 0x078, 0x07A, 0x07C, 0x07E, 0x080, 0x082, 0x084,
  0x086, 0x089, 0x08B, 0x08D, 0x08F, 0x091, 0x093, 0x096,
  0x098, 0x09A, 0x09C, 0x09F, 0x0A1, 0x0A3, 0x0A6, 0x0A8,
  0x0AB, 0x0AD, 0x0AF, 0x0B2, 0x0B4, 0x0B7, 0x0BA, 0x0BC,
  0x0BF, 0x0C1, 0x0C4, 0x0C7, 0x0C9, 0x0CC, 0x0CF, 0x0D2,
  0x0D4, 0x0D7, 0x0DA, 0x0DD, 0x0E0, 0x0E3, 0x0E6, 0x0E9,
  0x0EC, 0x0EF, 0x0F2, 0x0F5, 0x0F8, 0x0FB, 0x0FE, 0x101,
  0x104, 0x107, 0x10B, 0x10E, 0x111, 0x114, 0x118, 0x11B,
  0x11E, 0x122, 0x125, 0x129, 0x12C, 0x130, 0x133, 0x137,
  0x13A, 0x13E, 0x141, 0x145, 0x148, 0x14C, 0x150, 0x153,
  0x157, 0x15B, 0x15F, 0x162, 0x166, 0x16A, 0x16E, 0x172,
  0x176, 0x17A, 0x17D, 0x181, 0x185, 0x189, 0x18D, 0x191,
  0x195, 0x19A, 0x19E, 0x1A2, 0x1A6, 0x1AA, 0x1AE, 0x1B2,
  0x1B7, 0x1BB, 0x1BF, 0x1C3, 0x1C8, 0x1CC, 0x1D0, 0x1D5,
  0x1D9, 0x1DD, 0x1E2, 0x1E6, 0x1EB, 0x1EF, 0x1F3, 0x1F8,
  0x1FC, 0x201, 0x205, 0x20A, 0x20F, 0x213, 0x218, 0x21C,
  0x221, 0x226, 0x22A, 0x22F, 0x233, 0x238, 0x23D, 0x241,
  0x246, 0x24B, 0x250, 0x254, 0x259, 0x25E, 0x263, 0x267,
  0x26C, 0x271, 0x276, 0x27B, 0x280, 0x284, 0x289, 0x28E,
  0x293, 0x298, 0x29D, 0x2A2, 0x2A6, 0x2AB, 0x2B0, 0x2B5,
  0x2BA, 0x2BF, 0x2C4, 0x2C9, 0x2CE, 0x2D3, 0x2D8, 0x2DC,
  0x2E1, 0x2E6, 0x2EB, 0x2F0, 0x2F5, 0x2FA, 0x2FF, 0x304,
  0x309, 0x30E, 0x313, 0x318, 0x31D, 0x322, 0x326, 0x32B,
  0x330, 0x335, 0x33A, 0x33F, 0x344, 0x349, 0x34E, 0x353,
  0x357, 0x35C, 0x361, 0x366, 0x36B, 0x370, 0x374, 0x379,
  0x37E, 0x383, 0x388, 0x38C, 0x391, 0x396, 0x39B, 0x39F,
  0x3A4, 0x3A9, 0x3AD, 0x3B2, 0x3B7, 0x3BB, 0x3C0, 0x3C5,
  0x3C9, 0x3CE, 0x3D2, 0x3D7, 0x3DC, 0x3E0, 0x3E5, 0x3E9,
  0x3ED, 0x3F2, 0x3F6, 0x3FB, 0x3FF, 0x403, 0x408, 0x40C,
  0x410, 0x415, 0x419, 0x41D, 0x421, 0x425, 0x42A, 0x42E,
  0x432, 0x436, 0x43A, 0x43E, 0x442, 0x446, 0x44A, 0x44E,
  0x452, 0x455, 0x459, 0x45D, 0x461, 0x465, 0x468, 0x46C,
  0x470, 0x473, 0x477, 0x47A, 0x47E, 0x481, 0x485, 0x488,
  0x48C, 0x48F, 0x492, 0x496, 0x499, 0x49C, 0x49F, 0x4A2,
  0x4A6, 0x4A9, 0x4AC, 0x4AF, 0x4B2, 0x4B5, 0x4B7, 0x4BA,
  0x4BD, 0x4C0, 0x4C3, 0x4C5, 0x4C8, 0x4CB, 0x4CD, 0x4D0,
  0x4D2, 0x4D5, 0x4D7, 0x4D9, 0x4DC, 0x4DE, 0x4E0, 0x4E3,
  0x4E5, 0x4E7, 0x4E9, 0x4EB, 0x4ED, 0x4EF, 0x4F1, 0x4F3,
  0x4F5, 0x4F6, 0x4F8, 0x4FA, 0x4FB, 0x4FD, 0x4FF, 0x500,
  0x502, 0x503, 0x504, 0x506, 0x507, 0x508, 0x50A, 0x50B,
  0x50C, 0x50D, 0x50E, 0x50F, 0x510, 0x511, 0x511, 0x512,
  0x513, 0x514, 0x514, 0x515, 0x516, 0x516, 0x517, 0x517,
  0x517, 0x518, 0x518, 0x518, 0x518, 0x518, 0x519, 0x519
    };

    // 4-point gaussian interpolation
    i = voice[x].interpolation_index >> 12;          // 0 <= i <= 7
    d = (voice[x].interpolation_index >> 4) & 0xff;  // 0 <= d <= 255
    outx  = ((gauss_coeffs[255-d] * voice[x].BRRdata[i+0]) >> 11);
    outx += ((gauss_coeffs[511-d] * voice[x].BRRdata[i+1]) >> 11);
    outx += ((gauss_coeffs[256+d] * voice[x].BRRdata[i+2]) >> 11);
    // The above 3 wrap at 15 bits signed. The last is added to that, and is
    // clamped rather than wrapped.
    outx = ((outx & 0x7FFF) ^ 0x4000) - 0x4000;
    outx += ((gauss_coeffs[  0+d] * voice[x].BRRdata[i+3]) >> 11);
    CLAMP15(outx);  // clamp to the signed 15-bit range -0x4000..0x3FFF


S-DSP REGISTERS
===============

The S-DSP contains a number of registers, which are internally polled at
various points during the 32-cycle sample generation loop and often stored
internally for later use. Thus, most writes do not take effect immediately.

All registers are accessed by the SPC700 setting the address in $00F2, then
reading/writing $00F3. Note that the register addresses use only 7 bits:
$80-$ff are read-only mirrors of $00-$7f. Any unspecified registers/bits are
read/write with no known effect.

On power on, most registers are uninitialized. There does seem to be something
of a pattern, but it's nothing specific and seems to differ between chips. On
reset, most registers retain their previous values.

Some notable exceptions: FLG will always act as if set to 0xE0 after power on
or reset, even if the value read back indicates otherwise. VxENVX and VxOUTX
are of course 0, since all channels are in the Release state due to FLG. And
ENDX will be 0 on power on or reset, but recall that the voices are still
running even when keyed off so the various bits may have been set by BRR
decoding by the time you get to read it.


First, the 10 per-voice registers. These occupy $00-$09, $10-$19, and so on up
to $70-$79.
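For illustration, a sketch of how the per-voice register addresses are formed
and how the 7-bit mirroring described above behaves. The enum, struct and
function names are hypothetical, and the model stores raw register bytes only
(it ignores that some registers, such as VxENVX and VxOUTX, are produced by
the S-DSP itself):

```c
#include <stdint.h>

/* Per-voice register numbers $x0-$x9, in the order listed below. */
enum { VOLL, VOLR, PITCHL, PITCHH, SRCN, ADSR1, ADSR2, GAIN, ENVX, OUTX };

/* Value to write to $00F2 to select register reg of voice v (0-7). */
static uint8_t voice_reg_addr(unsigned v, unsigned reg)
{
    return (uint8_t)((v & 7) << 4 | reg);
}

/* Hypothetical model of the $00F2/$00F3 port pair. */
struct dsp_port {
    uint8_t addr;        /* last value written to $00F2 */
    uint8_t regs[128];
};

/* $80-$FF read as mirrors of $00-$7F... */
static uint8_t port_read_f3(const struct dsp_port *p)
{
    return p->regs[p->addr & 0x7F];
}

/* ...and are read-only, so writes through them are ignored. */
static void port_write_f3(struct dsp_port *p, uint8_t value)
{
    if (p->addr < 0x80)
        p->regs[p->addr] = value;
}
```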

$x0 rw VxVOLL - Left volume for Voice x
$x1 rw VxVOLR - Right volume for Voice x
        vvvvvvvv

        These are the volumes of the voice in the left/right stereo channel.
        The value is 2's-complement; negative values invert the phase of the
        signal in the channel.

        Volume adjustment is:
          SL = (int16_t)((S * VL)>>7)
          SR = (int16_t)((S * VR)>>7)

        VxVOLL is accessed during voice processing step S4, cycles:
         V0:31  V1:2   V2:5   V3:8   V4:11  V5:14  V6:17  V7:20
        VxVOLR is accessed during voice processing step S5, cycles:
         V0:0   V1:3   V2:6   V3:9   V4:12  V5:15  V6:18  V7:21

$x2 rw VxPITCHL - Pitch scaler for Voice x low byte
$x3 rw VxPITCHH - Pitch scaler for Voice x high byte
        --pppppp pppppppp

        This 14-bit number adjusts the pitch of the sounds output for this
        voice, as the function:
           Fout = Fin * P / 0x1000

        Considering things on the normal 12-note scale, P=0x2000 will increase
        the pitch by one octave, P=0x3FFF will increase by (just about) two
        octaves, P=0x0800 will reduce by one octave, P=0x0400 will reduce by
        two octaves, and so on.

        Note that even though the high bits of $x3 are not significant, they
        are still read back as written.

        VxPITCHL is accessed during voice processing step S2, cycles:
         V0:21  V1:0   V2:3   V3:6   V4:9   V5:12  V6:15  V7:18
        VxPITCHH is accessed during voice processing step S3a, cycles:
         V0:22  V1:1   V2:4   V3:7   V4:10  V5:13  V6:16  V7:19
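        A sketch of the formula in integer arithmetic. pitch_for_recording is
        a hypothetical helper that assumes the nominal 32000 Hz output rate and
        picks P so that a sample recorded at rec_hz plays back at its original
        pitch:

```c
#include <stdint.h>

/* Fout = Fin * P / 0x1000, using only the 14 significant bits of P. */
static uint32_t pitch_out_hz(uint32_t in_hz, uint16_t pitch_reg)
{
    uint32_t p = pitch_reg & 0x3FFF;
    return (uint32_t)(((uint64_t)in_hz * p) / 0x1000);
}

/* P = 0x1000 * rec_hz / 32000, limited to the 14-bit maximum. */
static uint16_t pitch_for_recording(uint32_t rec_hz)
{
    uint64_t p = ((uint64_t)rec_hz * 0x1000) / 32000;
    return (uint16_t)(p > 0x3FFF ? 0x3FFF : p);
}
```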

$x4 rw VxSRCN - Source number for Voice x
        nnnnnnnn

        This selects the "instrument" this voice is to play. The number set
        here is used as an offset into the table pointed to by DIR.

        Changing this while the voice is playing will have no immediate
        effect, but when the voice afterwards loops or is keyed on it will use
        the new value.

        VxSRCN is accessed during voice processing step S1, cycles:
         V0:17  V1:20  V2:31  V3:2   V4:5   V5:8   V6:11  V7:14
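        A sketch of the lookup, assuming the standard directory layout
        described with DIR: the table starts at DIR * 0x100 and each 4-byte
        entry holds the start address and then the loop address, both
        little-endian. The function name is hypothetical, and wrapping at 16
        bits is an assumption:

```c
#include <stdint.h>

/* Reads the start and loop addresses for source number srcn. */
static void dir_lookup(const uint8_t ram[0x10000], uint8_t dir, uint8_t srcn,
                       uint16_t *start, uint16_t *loop)
{
    uint16_t e = (uint16_t)(dir * 0x100 + srcn * 4);
    *start = (uint16_t)(ram[e] | ram[(uint16_t)(e + 1)] << 8);
    *loop  = (uint16_t)(ram[(uint16_t)(e + 2)] | ram[(uint16_t)(e + 3)] << 8);
}
```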

$x5 rw VxADSR1 - Attack-Decay-Sustain-Release settings for Voice x (part 1)
        edddaaaa
$x6 rw VxADSR2 - Attack-Decay-Sustain-Release settings for Voice x (part 2)
        lllrrrrr
$x7 rw VxGAIN - Gain settings for Voice x
        EGGGGGGG or Emmggggg

        e/E     = Envelope adjustment method bits.
        ddd     = Decay rate: R=d*2+16
        aaaa    = Attack rate: R=a*2+1
        lll     = Sustain level (see note)
        rrrrr   = Sustain rate: R=r
        mm      = Gain mode
        ggggg   = Gain rate: R=g
        GGGGGGG = Direct Gain mode gain setting: E=G*16

        Note: the "lll" bits are the Sustain Level only when bit 'e' is set.
        If 'e' is clear, the top 3 bits of VxGAIN are used instead.

        These three registers give control over the volume envelope. The
        volume envelope is 11 bits unsigned: volume adjustment is S = (S *
        E)>>11 where S is the current sample. Various settings of these registers
        will automatically adjust the envelope after a certain number of samples,
        based on a counter as described above (see COUNTERS).

        The volume envelope adjustment has 4 states: Attack, Decay, Sustain,
        and Release. When the voice is keyed off or a BRR end-without-loop
        block is reached, the state is set to Release. When the voice is keyed
        on, the state is set to Attack.

        When the envelope is in the Release state, this overrides all settings
        of these registers. In this case, the counter rate R=31 (i.e. adjust
        every sample), and the adjustment is E-=8.

        The simplest method of envelope control ("Direct Gain") is available
        when VxADSR1 bit 7 and VxGAIN bit 7 are both clear. In this case, the
        volume envelope is simply E=%GGGGGGG0000, and R does not matter.

        The second method ("Gain", usually with one of the 4 names below) is
        available when VxADSR1 bit 7 is clear, but VxGAIN bit 7 is set. In
        this case, we have 4 options, chosen based on the 'm' bits.
          00 = Linear Decrease. R=g, E-=32
          01 = Exp Decrease.    R=g, E-=((E-1)>>8)+1
          10 = Linear Increase. R=g, E+=32
          11 = Bent Increase.   R=g, E+=(E<0x600)?32:8
        In all cases, clamp E to 0 or 0x7FF rather than wrapping.

        The most complex method ("ADSR") is used when VxADSR1 bit 7 is 1. You
        can think of this method as loading VxGAIN with different values at
        different times based on the value of the volume envelope. VxGAIN is
        not actually altered, however.
          Attack: If aaaa == %1111, R=31 and E+=1024. Otherwise, pretend
              VxGAIN = %110aaaa1. In either case, when E exceeds 0x7FF
              (before clamping) enter the Decay state.
          Decay: Pretend VxGAIN = %1011ddd0. When the upper 3 bits of E equal
              the Sustain Level (see above), enter the Sustain state.
          Sustain: Pretend VxGAIN = %101rrrrr.
        CRITICAL NOTE: These state transitions happen even when ADSR mode is
        not selected.

        These registers are actually used to update the envelope every sample.
        The calculated value is used as follows:
         1. If the counter specifies the envelope is to be updated, the
            envelope is set to the new value, clamped to 11 bits.
         2. If the mode is Decay and the Sustain Level is matched, change to
            the Sustain state.
         3. If the mode is Attack and the new value is greater than 0x7FF,
            change to the Decay state. CRITICAL NOTE: Negative values also
            trigger this.
         4. Save the new value, *pre-clamp*, to determine the
            increment for GAIN Bent Increase mode's next sample.
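        The rules above can be combined into one per-sample envelope step.
        The following is a minimal C sketch, not a verified hardware model.
        The names (env_step, struct voice_env, due_rates) are hypothetical.
        The global-counter check is assumed to happen elsewhere and is passed
        in as a bitmask of the rates that fire this sample. Direct Gain is
        modeled as R=31 so that it applies every sample.

```c
#include <stdint.h>

/* Hypothetical names throughout; a sketch of the rules described above. */
enum env_state { ENV_ATTACK, ENV_DECAY, ENV_SUSTAIN, ENV_RELEASE };

struct voice_env {
    enum env_state state;
    int env;     /* current 11-bit envelope E */
    int hidden;  /* pre-clamp value saved in step 4 (Bent Increase input) */
};

/* Exp Decrease: E -= ((E-1)>>8)+1 (assumes >> is arithmetic when E=0). */
static int exp_dec(int e) { return e - (((e - 1) >> 8) + 1); }

/* due_rates: bit R is set if the global counter fires for rate R this
   sample (see COUNTERS). Bit 31 should always be set and bit 0 never. */
static void env_step(struct voice_env *v, uint8_t adsr1, uint8_t adsr2,
                     uint8_t gain, uint32_t due_rates)
{
    int e = v->env, rate;
    int sl = ((adsr1 & 0x80) ? adsr2 : gain) >> 5;   /* Sustain Level */

    if (v->state == ENV_RELEASE) {        /* overrides every register */
        e -= 8;
        v->env = (e < 0) ? 0 : e;
        return;
    }

    if (adsr1 & 0x80) {                   /* ADSR: "pretend" VxGAIN values */
        if (v->state == ENV_ATTACK) {
            rate = (adsr1 & 0x0F) * 2 + 1;            /* %110aaaa1 */
            e += (rate == 31) ? 1024 : 32;
        } else if (v->state == ENV_DECAY) {
            rate = ((adsr1 >> 4) & 7) * 2 + 16;       /* %1011ddd0 */
            e = exp_dec(e);
        } else {
            rate = adsr2 & 0x1F;                      /* %101rrrrr */
            e = exp_dec(e);
        }
    } else if (!(gain & 0x80)) {          /* Direct Gain, modeled as R=31 */
        rate = 31;
        e = (gain & 0x7F) << 4;
    } else {                              /* Gain modes, chosen by 'mm' */
        rate = gain & 0x1F;
        switch ((gain >> 5) & 3) {
        case 0:  e -= 32; break;                           /* Linear Decrease */
        case 1:  e = exp_dec(e); break;                    /* Exp Decrease */
        case 2:  e += 32; break;                           /* Linear Increase */
        default: e += (v->hidden < 0x600) ? 32 : 8; break; /* Bent Increase */
        }
    }

    /* Step 1: store, clamped to 11 bits, only when the counter fires. */
    if ((due_rates >> rate) & 1u)
        v->env = (e < 0) ? 0 : (e > 0x7FF ? 0x7FF : e);
    /* Steps 2-4 run every sample. */
    if (v->state == ENV_DECAY && (e >> 8) == sl)
        v->state = ENV_SUSTAIN;
    if (v->state == ENV_ATTACK && (e > 0x7FF || e < 0))
        v->state = ENV_DECAY;
    v->hidden = e;
}
```

        For example, with VxADSR1=$8F (ADSR, fastest attack) and every rate
        firing, the envelope goes 0 -> 1024, then clamps to 0x7FF and enters
        Decay on the second step.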

        VxADSR1 is accessed during voice processing step S2, cycles:
         V0:21  V1:0   V2:3   V3:6   V4:9   V5:12  V6:15  V7:18
        VxADSR2 and VxGAIN are accessed during voice processing step S3c,
         cycles:
         V0:30  V1:1   V2:4   V3:7   V4:10  V5:13  V6:16  V7:19

$x8 r- VxENVX - Current envelope value for Voice X
        0eeeeeee

        This returns the high 7 bits of the current volume envelope value
        (IOW, E>>4) for this voice. Note that the high bit will always be 0.
        Also note that (obviously) there is no way to directly determine the
        low 4 bits unless you're using Direct Gain.

        Technically, this register IS writable. But whatever value you write
        will be overwritten at 32000 Hz.

        VxENVX is updated during voice processing step S9, cycles:
         V0:4   V1:7   V2:10  V3:13  V4:16  V5:19  V6:22  V7:25
        However, a write by the SMP to this register up to 2 cycles earlier will
        overwrite the DSP's updated value.

$x9 r- VxOUTX - Current sample value for Voice X
        oooooooo

        This returns the high byte of the current sample for this voice, after
        envelope volume adjustment but before VxVOL[LR] is applied.

        Technically, this register IS writable. But whatever value you write
        will be overwritten at 32000 Hz.

        VxOUTX is updated during voice processing step S8, cycles:
         V0:3   V1:6   V2:9   V3:12  V4:15  V5:18  V6:21  V7:24
        However, a write by the SMP to this register up to 2 cycles earlier will
        overwrite the DSP's updated value.


Now, the general-purpose registers:

$0c rw MVOLL - Left channel master volume
$1c rw MVOLR - Right channel master volume
        vvvvvvvv

        These are the master volumes of the left/right stereo channel. The
        value is 2's-complement; negative values invert the phase of the
        channel. This is the adjustment applied to the mixed 16-bit stereo
        sample output of all 8 voices.

        Volume adjustment is:
          ML = (int16_t)((SL * VL)>>7)
          MR = (int16_t)((SR * VR)>>7)

        MVOLL is accessed during cycle 26.
        MVOLR is accessed during cycle 27.

$2c rw EVOLL - Left channel echo volume
$3c rw EVOLR - Right channel echo volume
        vvvvvvvv

        These are the echo volumes of the left/right stereo channel. The value
        is 2's-complement; negative values invert the phase of the channel.
        This is the adjustment applied to the FIR filter 16-bit output before
        mixing with the main signal (after master volume adjustment).

        Volume adjustment is:
          EL = (int16_t)((SL * VL)>>7)
          ER = (int16_t)((SR * VR)>>7)

        EVOLL is accessed during cycle 26.
        EVOLR is accessed during cycle 27.

$4c rw KON  - Key on for all voices
$5c rw KOFF - Key off for all voices
        76543210

        Each bit of KON/KOFF corresponds to one voice.

        Writing 1 to the KOFF bit will transition the voice to the Release
        state. Thus, the envelope will decrease by 8 every sample (regardless
        of the VxADSR and VxGAIN settings) until it reaches 0, where it will
        stay until the next KON.

        Writing 1 to the KON bit will set the envelope to 0, the state to
        Attack, and will start the channel from the beginning (see DIR and
        VxSRCN). Note that this happens even if the channel is already playing
        (which may cause a click/pop), and that there are 5 'empty' samples
        before envelope updates and BRR decoding actually begin.

        These registers seem to be polled only at 16000 Hz, when every other
        sample is due to be output. Thus, if you write two values in close
        succession, usually but not always only the second value will have an
        effect:
          ; assume KOFF = 0, but no voices playing
          mov $f2, #$4c  ; KON = 1 then KON = 2
          mov $f3, #$01  ; -> *usually* only voice 1 is keyed on. If both are,
          mov $f3, #$02  ; voice 0 will be *2* samples ahead rather than one.
        and
          ; assume various voices playing
          mov $f2, #$5c  ; KOFF = $ff then KOFF = 0
          mov $f3, #$ff
          mov $f3, #$00  ; -> *usually* all voices remain playing
        FLG bit 7, however, is polled every sample, separately for each voice.

        These registers and FLG bit 7 interact as follows:
          1. If FLG bit 7 or the KOFF bit for the channel is set, transition
             to the Release state. If FLG bit 7 is set, also set the envelope
             to 0.
          2. If the 'internal' value of KON has the channel's bit set, perform
             the KON actions described above.
          3. Set the 'internal' value of KON to 0.

        This has a number of consequences:
          * KON effectively takes effect 'on write', even though a non-zero
            value can be read back much later. KOFF and FLG.7, on the other
            hand, exert their influence constantly until a new value is
            written.
          * Writing KON while KOFF or FLG.7 is set will not result in any
            samples being output by the channel. The channel is keyed on, but
            it is turned off again 2 samples later. Since there is a 5 sample
            delay after KON before the channel actually begins processing, the
            net effect is no output.
          * However, if KOFF is cleared within 63 SPC700 cycles of the
            KON write above, the channel WILL be keyed on as normal. If KOFF
            is cleared between 64 and 127 SPC700 cycles later, the channel
            MIGHT be keyed on with decreasing probability depending on how
            many cycles before the KON/KOFF poll the KON write occurred.
          * Setting both KOFF and KON for a channel will turn the channel
            off much faster than just KOFF alone, since the KON will set the
            envelope to 0. This can cause a click/pop, though.
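        To make the ordering of steps 1-3 concrete, here is a minimal C
        sketch for a single voice. It is a simplification: the struct and
        function names are hypothetical, the 'internal' KON is cleared one
        voice bit at a time, and it ignores that FLG.7 is polled every sample
        while KON/KOFF are polled every other sample, as well as the ENDX and
        BRR restart side effects of KON.

```c
#include <stdint.h>

/* Hypothetical per-voice fields touched by the key poll. */
struct kvoice {
    int release;    /* nonzero while in the Release state */
    int env;        /* 11-bit envelope */
    int kon_delay;  /* 'empty' samples left before BRR/envelope start */
};

/* Steps 1-3 above for voice x. internal_kon is the latched KON value;
   koff and flg are the current register values. */
static void poll_keys(struct kvoice *v, int x, uint8_t *internal_kon,
                      uint8_t koff, uint8_t flg)
{
    uint8_t bit = (uint8_t)(1u << x);

    if ((flg & 0x80) || (koff & bit)) {   /* 1. FLG.7 or KOFF: Release */
        v->release = 1;
        if (flg & 0x80)
            v->env = 0;                   /*    FLG.7 also zeroes E */
    }
    if (*internal_kon & bit) {            /* 2. KON actions */
        v->env = 0;
        v->release = 0;                   /*    back to Attack */
        v->kon_delay = 5;
    }
    *internal_kon &= (uint8_t)~bit;       /* 3. clear the internal KON bit */
}
```

        Polling twice with both the KON and KOFF bits set shows the "keyed on,
        then turned off again" behavior: the first poll keys the voice on and
        the second puts it back in Release.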

        KOFF and internal KON are accessed during cycle 30 every other sample.
        Internal KON bits are cleared during cycle 29, just before KON is
        accessed.

$6c rw FLG - Reset, Mute, Echo-Write flags and Noise Clock
        rmennnnn

        r = When set, the S-DSP "soft-resets" itself. Mostly, this seems to
            mean the S-DSP acts as if KOFF=$ff and forces all envelopes to 0;
            echo processing still continues, and any remaining echo data will
            continue to echo and generate samples. You must clear the bit to
            resume normal operation. See KON/KOFF for some details.

            Note though that this bit is checked much more frequently than
            KOFF.

        m = When set, no sound will be output. Samples will still be decoded,
            echoes processed, and such; just no sound will be output.

        e = When set, the echo ring buffer (see ESA and EDL) will not be
            written. Echo processing on the buffer will continue as normal,
            just the buffer itself will not be updated and so the echo samples
            will loop forever. In other words, the echo pointer keeps moving
            either way; the only thing this bit changes is whether or not the
            writes themselves occur.

        nnnnn = Noise frequency. This is used with the global counter to
            determine when to generate a new noise sample. Note that there is
            only one noise source shared by all voices for which noise is
            enabled (see NON).

        On reset, this register seems to have a value resembling $E0, even
        though this may not be read back. At least, 'r' is 'set' so we can't
        key on any samples, 'e' is 'set' so the echo buffer is not being
        updated, and 'm' is 'set' because even whatever static data is in the
        echo buffer gives no sound. 'n' is '0', since the noise sample is
        constant until this is set non-zero.

        FLG bit 'r' is accessed during voice processing step S3c, cycles:
         V0:30  V1:1   V2:4   V3:7   V4:10  V5:13  V6:16  V7:19
        FLG bit 'e' is accessed during cycles 28 and 29.
        FLG bits 'n' are accessed during cycle 30.

$7c r* ENDX - Voice end flags
        76543210

        When a BRR block with the end flag set is decoded in a voice, the
        corresponding bit is set in this register. When the voice is keyed on
        (successfully or not), the corresponding bit is cleared. Any write to
        this register will clear ALL bits, no matter what value is written.

        Note that the bit is set at the START of decoding the BRR block, not
        at the end. Recall that BRR processing, and therefore the setting of
        bits in this register, continues even for voices in the Release state.

        On power on or reset, all bits are cleared.

        ENDX is updated during voice processing step S7, cycles:
         V0:2   V1:5   V2:8   V3:11  V4:14  V5:17  V6:20  V7:23
        However, a write by the SMP to this register up to 2 cycles earlier will
        overwrite the DSP's updated value.

$0d rw EFB - Echo feedback volume
        vvvvvvvv

        When echo buffer write is enabled, the FIR output will be adjusted by
        this volume and mixed into the buffer. The value is 2's-complement;
        negative values invert the phase of the signal.

        Volume adjustment is:
          E = (int16_t)((E * V)>>7)

        EFB is accessed during cycle 26.

$2d rw PMON - Pitch modulation enable
        7654321-

        Each bit corresponds to the corresponding voice. When the bit is set,
        the VxPITCH value will be adjusted by the output of voice x-1. The
        exact formula seems to be:
          P = VxPITCH + (((OutX[x-1] >> 5) * VxPITCH) >> 10)

        For the purposes of pitch adjustment, a voice not playing is all zeros
        and thus has no effect on the pitch.
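        The PMON formula in C, as a sketch. The function name is
        hypothetical, and OutX[x-1] is assumed to be voice x-1's enveloped
        output as a signed 16-bit value.

```c
#include <stdint.h>

/* Effective pitch for voice x when its PMON bit is set.
   pitch: VxPITCH; prev_out: assumed signed 16-bit output of voice x-1. */
static int pmon_pitch(int pitch, int16_t prev_out)
{
    return pitch + (((prev_out >> 5) * pitch) >> 10);
}
```

        With VxPITCH=$1000, an OutX[x-1] of $4000 raises the effective pitch
        to $1800, -$4000 lowers it to $0800, and a silent voice (0) leaves it
        unchanged.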

        PMON is accessed during cycle 27.

$3d rw NON - Noise enable
        76543210

        Each bit corresponds to the corresponding voice. When the bit is set,
        the samples produced by BRR decoding will not be used. Instead, the
        output sample will be the current value of the noise generator (see
        FLG).

        The noise generator outputs a 15-bit noise sample.

        The noise generator operation is as follows: On reset, N=0x4000. Each
        update (see FLG), N=(N>>1)|(((N<<14)^(N<<13))&0x4000). The output
        noise sample at any point is N, which then goes through volume
        adjustment and the left shift that 'restores' the low bit.

        Note that the noise sample is not affected by VxPITCH or PMON, but
        VxPITCH and PMON still control the speed of BRR decoding, and an
        end-without-loop BRR block will still transition the voice to Release
        (and update ENDX).
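        The noise update is a 15-bit linear-feedback shift register: the new
        bit 14 is the XOR of the old bits 0 and 1. A minimal C version of the
        formula above (the function name is hypothetical):

```c
#include <stdint.h>

/* One update of the 15-bit noise generator; reset value is 0x4000. */
static uint16_t noise_step(uint16_t n)
{
    /* Shift right, feeding bit0 ^ bit1 back into bit 14. */
    return (uint16_t)((n >> 1) | (((n << 14) ^ (n << 13)) & 0x4000));
}
```

        Starting from the reset value 0x4000, the first update gives 0x2000.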

        NON is accessed during cycle 28.

$4d rw EON - Echo enable
        76543210

        Each bit corresponds to the corresponding voice. When the bit is set
        and echo buffer write is enabled, this voice will be mixed into the
        sample to be written to the echo buffer for later echo processing.

        EON is accessed during cycle 28.

$5d rw DIR - Sample table address
        aaaaaaaa

        This forms the high byte of the start address of the sample pointer
        table (the low byte is always 0). The sample pointer table is indexed
        for each voice by VxSRCN to determine which BRR data to decode and
        play.

        Each entry is 4 bytes. The first word points to the start of the BRR
        data, and the second word points to the 'restart' point for when the
        BRR end block is reached. These are referred to as the Source Start
        Address (SA) and the Source Loop Start Address (LSA), respectively.

        Changing this while voices are playing will have no immediate effect,
        but when any voice afterwards loops or is keyed on it will use the new
        table.
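        A sketch of how a voice's SA and LSA could be fetched from the table.
        Assumptions: aram is a hypothetical 64 KB array standing in for the
        shared RAM, words are little-endian as on the SPC700, and the entry
        address is wrapped to 16 bits. The function name is also
        hypothetical.

```c
#include <stdint.h>

/* Read the SA/LSA pair for sample number srcn from the table at DIR<<8. */
static void dir_lookup(const uint8_t aram[0x10000], uint8_t dir,
                       uint8_t srcn, uint16_t *sa, uint16_t *lsa)
{
    /* Each entry is 4 bytes; assumed to wrap within 16 bits. */
    uint16_t entry = (uint16_t)((dir << 8) + srcn * 4);

    *sa  = (uint16_t)(aram[entry] |
                      aram[(uint16_t)(entry + 1)] << 8);
    *lsa = (uint16_t)(aram[(uint16_t)(entry + 2)] |
                      aram[(uint16_t)(entry + 3)] << 8);
}
```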

        DIR is accessed during cycle 28.

$6d rw ESA - Echo ring buffer address
        aaaaaaaa

        This forms the high byte of the start address of the echo ring buffer
        (the low byte is always 0). When echo buffer write is enabled in FLG,
        all voices marked in EON will be mixed together, mixed with the FIR
        output (adjusted by the echo feedback volume), and output into the
        ring buffer (4 bytes, 2 per stereo channel). And every sample, one
        entry (4 bytes) will be removed from the ring buffer and passed into
        the FIR filter.

        The size of the buffer is controlled by EDL. The echo buffer will wrap
        within 16 bits, if the ESA and EDL values combine to specify a buffer
        that would go beyond address $FFFF.

        Note that the register is accessed 32 cycles before the value is used
        for a write; at a sample level, this causes writes to appear to be
        delayed by at least a full sample before taking effect.

        ESA is accessed during cycle 29.

$7d rw EDL - Echo delay (ring buffer size)
        ----dddd

        This controls the size of the echo ring buffer, and therefore the
        delay between when a sample is first output and when it enters the
        echo FIR filter. The size of the buffer is simply D<<11 bytes (D<<9
        16-bit stereo samples); however, when D=0 the buffer is 4 bytes (1
        16-bit stereo sample) rather than 0.

        Note that only the low 4 bits are used to determine the buffer length.
        The register value is only used under certain conditions:
         * Write the echo buffer at sample 'idx' (cycles 29 and 30)
         * If idx==0, set idx_max = EDL<<9       (cycle 30-ish)
         * Increment idx. If idx>=idx_max, idx=0 (cycle 30-ish)
        This means that it can take up to .24s for a newly written value to
        actually take effect, if the old value was 0x0f and the new value is
        written just after the cycle 30 in which buffer index 0 was written.
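        The ESA/EDL behavior above can be sketched in C as follows. This is
        an illustration under assumptions: the struct and function names are
        hypothetical, idx counts 4-byte stereo samples, and the 32-cycle
        latency of ESA is ignored.

```c
#include <stdint.h>

/* Hypothetical echo-pointer state following the EDL steps above. */
struct echo_ptr {
    unsigned idx;      /* current stereo-sample index in the ring */
    unsigned idx_max;  /* EDL<<9, latched only when idx == 0 */
};

/* Address of the current echo sample, wrapping within 16 bits. */
static uint16_t echo_addr(const struct echo_ptr *p, uint8_t esa)
{
    return (uint16_t)((esa << 8) + p->idx * 4);
}

/* Advance after the sample at idx has been written (cycles 29-30). */
static void echo_advance(struct echo_ptr *p, uint8_t edl)
{
    if (p->idx == 0)
        p->idx_max = (unsigned)(edl & 0x0F) << 9;  /* only low 4 bits */
    if (++p->idx >= p->idx_max)
        p->idx = 0;
}
```

        With EDL=0 the index never leaves 0 (a one-sample buffer). With
        EDL=$0F, idx_max is 7680 samples, which at 32000 Hz is the .24s worst
        case mentioned above.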

        EDL is accessed during cycle 29.

$xf rw FFCx - Echo FIR Filter Coefficient (FFC) X
        cccccccc

        These 8 registers specify the 2's-complement coefficients of the
        8-tap FIR filter used to calculate the echo signal. Each time a
        sample is generated by
        the voices, one sample is taken from the echo ring buffer and input to
        the FIR filter (this is S(x)). The FIR filter output is then mixed
        with the outputs of the voices to generate the output sound, and mixed
        with the sample being input into the echo buffer for echo feedback.

        Note that the echo buffer contains 15-bit samples left-aligned within
        the 16-bit word, so the 16-bit value read must be right-shifted by one
        bit to get the proper 15-bit S(x). The internal calculations, however,
        are done in 16 bits with the final output of the FIR being a 16-bit value.

        The FIR formula is:

          // The sum of the taps for samples x-7 to x-1 is clipped (wrapped)
          // to 16 bits:
          FIR = (int16_t)(((S(x-7) * FFC0) >> 6)   // oldest sample
                        + ((S(x-6) * FFC1) >> 6)
                        + ((S(x-5) * FFC2) >> 6)
                        + ((S(x-4) * FFC3) >> 6)
                        + ((S(x-3) * FFC4) >> 6)
                        + ((S(x-2) * FFC5) >> 6)
                        + ((S(x-1) * FFC6) >> 6));
          // Overflow is clamped only when adding the most recent sample:
          FIR = clamp16(FIR + ((S(x-0) * FFC7) >> 6)); // newest sample
          // Finally, mask off the LSbit to get the final 16-bit result:
          FIR = FIR & ~1;

        Note that the left and right stereo channels are filtered separately
        (no crosstalk), but with identical coefficients.
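        A runnable C sketch of one channel of the filter, following the
        formula above. The names are hypothetical. It assumes the raw 16-bit
        echo word is shifted right by one to get S(x), keeps the last 8
        values in a small history array, and relies on the usual
        two's-complement behavior of signed right shifts and narrowing casts.

```c
#include <stdint.h>

/* Hypothetical single-channel history: hist[0] is S(x-7), hist[7] is S(x). */
struct fir_chan {
    int hist[8];
};

static int clamp16(int v)
{
    return v < -32768 ? -32768 : (v > 32767 ? 32767 : v);
}

/* Push one raw word from the echo buffer and return the FIR output.
   ffc[0..7] hold the FFC0..FFC7 register values. */
static int16_t fir_step(struct fir_chan *f, int16_t raw, const int8_t ffc[8])
{
    int i, sum = 0;

    for (i = 0; i < 7; i++)             /* drop the oldest sample */
        f->hist[i] = f->hist[i + 1];
    f->hist[7] = raw >> 1;              /* 15-bit S(x) from the aligned word */

    for (i = 0; i < 7; i++)             /* taps 0-6: wrapped to 16 bits */
        sum += (f->hist[i] * ffc[i]) >> 6;
    sum = (int16_t)sum;
    sum = clamp16(sum + ((f->hist[7] * ffc[7]) >> 6));  /* tap 7: clamped */
    return (int16_t)(sum & ~1);         /* mask off the LSbit */
}
```

        With only FFC7 = $7F, pushing $4000 returns $3F80 immediately; with
        only FFC0 = $7F, the same output appears eight samples later.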

        FFC0 is accessed during cycle 22.
        FFC1 and FFC2 are accessed during cycle 23.
        FFC3, FFC4, and FFC5 are accessed during cycle 24.
        FFC6 and FFC7 are accessed during cycle 25.
        The echo buffer left channel is read during cycle 22, and written
         during cycle 29.
        The echo buffer right channel is read during cycle 23, and written
         during cycle 30.

## zzBAD_302997615-Anomie-s-S-DSP-Doc.txt
SOUND GENERATION
================

The S-DSP can mix and output up to 8 voices to produce stereo sound (later on,
sound generated by a device on the cart or the expansion port device may also
be mixed in, but the S-DSP has no knowledge of or control over this). Output
is nominally at 32000 Hz, realistically once every 768 clock cycles (32 SPC700
cycles).

At a high level, each voice generates a stereo sample:
  * BRR data is decoded (15-bit mono sample).
  * Interpolation is performed over 4 BRR samples to determine the output
    sample, or the noise sample is selected (15-bit mono).
  * Apply the volume envelope (15-bit mono sample).
  * Apply the VxVOL registers (16-bit stereo sample).

In addition, the left and right echo buffers together generate a 15-bit
stereo sample which is passed through the left and right channel FIR filters,
resulting in a 16-bit stereo sample. These 9 samples are used in two ways:
  1. Stereo Main Output
     * Mix all voices in order, clamping to 16 bits after each addition
       (16-bit stereo sample).
     * Adjust the main output by the MVOL registers to get the main sample
       (16-bit stereo sample).
     * Adjust the FIR sample by the EVOL registers (16-bit stereo sample).
     * Mix the EVOL-adjusted FIR sample into the main output, and clamp to 16
       bits (16-bit stereo sample).
     * Output to the DAC (16-bit stereo sample) unless FLG bit 6 (MUTE)
       applies.
  2. Stereo Echo Output
     * Mix all voices selected in EON in order, clamping to 16 bits after
       each addition (16-bit stereo sample).
     * Adjust the FIR sample by EFB (16-bit stereo sample).
     * Mix the EFB-adjusted FIR sample back into the echo output, and clamp
       to 16 bits (16-bit stereo sample).
     * Write to the echo buffer (15-bit stereo sample, left-aligned in 16
       bits) unless FLG bit 5 (ECENx) applies.

In all cases, convert from 15- to 16-bits by adding a 0 bit on the low end,
and from 16- to 15-bits by dropping the low bit.
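The Stereo Main Output path can be sketched in C for one channel. The
function name and parameters are hypothetical; the volume steps assume the
(int16_t)((S * V) >> 7) form used for the MVOL and EVOL registers, and the
mute bit simply zeroes the result.

```c
#include <stdint.h>

static int clamp16(int v)
{
    return v < -32768 ? -32768 : (v > 32767 ? 32767 : v);
}

/* One channel of the main output. voice_out[] are the per-voice samples
   after VxVOL, fir is this channel's FIR output, mvol/evol are the raw
   8-bit registers and mute is FLG bit 6. */
static int16_t main_output(const int16_t voice_out[8], int16_t fir,
                           uint8_t mvol, uint8_t evol, int mute)
{
    int i, sum = 0;

    for (i = 0; i < 8; i++)                       /* clamp after each add */
        sum = clamp16(sum + voice_out[i]);
    sum = (int16_t)((sum * (int8_t)mvol) >> 7);   /* master volume */
    sum = clamp16(sum + (int16_t)((fir * (int8_t)evol) >> 7)); /* + echo */
    return mute ? 0 : (int16_t)sum;
}
```

Because clamping happens after every addition, the order matters: voices of
30000, 30000 and -30000 mix to 2767, not 30000.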
More specifically, the registers and memory are accessed as follows. Note that
most register values are read once per sample output and cached internally for
use as needed. Note also that the S-DSP may perform some of the "if necessary"
operations unconditionally but only make use of the result "if necessary". For
example, in voice processing step S2 it may load the sample pointer
unconditionally, but this has no effect unless there was a loop or KON.

Each voice carries out the following operations:
  S1. Load VxSRCN register, if necessary.
  S2. Load the sample pointer (using previously loaded DIR and VxSRCN) if
      necessary.
      Load VxPITCHL register.
      Load VxADSR1 register.
  S3. a. Load VxPITCHH register.
         Apply pitch modulation if applicable.
      b. Load the BRR header byte (every time), and the first of the two BRR
         bytes that will be decoded.
      c. If applicable, replace the current sample with the noise sample.
         Apply the volume envelope.
           - This is the value used for modulating the next voice's pitch,
             if applicable.
         Check FLG bit 7 (NOT previously loaded).
         Check BRR header 'e' and 'l' bits to determine if the voice ends.
         Handle KOFF and KON using previously loaded values. If KON, ENDX.x
         will be cleared in step S7.
         Load VxGAIN or VxADSR2 register depending on ADSR1.7.
         Update the volume envelope, using previously loaded values.
  S4. Load and apply VxVOLL register.
      If a new group of BRR samples is required, load the second BRR byte and
      decode the group of 4 BRR samples. This is definitely not done when not
      necessary. If necessary, adjust the BRR pointer to the next block, or
      flag the loop address for loading next step S2 and set ENDX.x in step
      S7. Note that this setting of ENDX.x will not override the clearing due
      to KON in step S3c, if both occur during the same sample.
      Increment interpolation sample position as specified by pitch values.
      At any point from now until we next get to S3c, the next sample may be
      calculated using the interpolation position and BRR buffer contents.
  S5. Load and apply VxVOLR register.
      The new ENDX.x value is prepared, and can be overwritten. Reads will not
      see it yet.
  S6. The new VxOUTX value is prepared, and can be overwritten. Reads will
      not see it yet.
  S7. The new ENDX.x value may now be read.
      The new VxENVX value is prepared, and can be overwritten. Reads will not
      see it yet.
  S8. The new VxOUTX value may now be read.
  S9. The new VxENVX value may now be read.
The full sample generation loop is as follows. Note how the above voice
process is interleaved for the 8 voices. The choice of which cycle to call
"cycle 0" is semi-arbitrary. I've included the standard timing of the SPC700
timer ticks, but note that frobbing the SPC700 TEST register can change this
synchronization.

   0. Voice steps:  V0:S5  V1:S2
      Tick the SPC700 Stage 1 timers, always for T2 and every 4 samples for
      T0 and T1.
   1. Voice steps:  V0:S6  V1:S3
   2. Voice steps:  V0:S7  V1:S4         V3:S1
   3. Voice steps:  V0:S8  V1:S5  V2:S2
   4. Voice steps:  V0:S9  V1:S6  V2:S3
   5. Voice steps:         V1:S7  V2:S4         V4:S1
   6. Voice steps:         V1:S8  V2:S5  V3:S2
   7. Voice steps:         V1:S9  V2:S6  V3:S3
   8. Voice steps:                V2:S7  V3:S4         V5:S1
   9. Voice steps:                V2:S8  V3:S5  V4:S2
  10. Voice steps:                V2:S9  V3:S6  V4:S3
  11. Voice steps:                       V3:S7  V4:S4         V6:S1
  12. Voice steps:                       V3:S8  V4:S5  V5:S2
  13. Voice steps:                       V3:S9  V4:S6  V5:S3
  14. Voice steps:                              V4:S7  V5:S4         V7:S1
  15. Voice steps:                              V4:S8  V5:S5  V6:S2
  16. Voice steps:                              V4:S9  V5:S6  V6:S3
      Tick the SPC700 Stage 1 timer for T2.
  17. Voice steps:  V0:S1                              V5:S7  V6:S4
  18. Voice steps:                                     V5:S8  V6:S5  V7:S2
  19. Voice steps:                                     V5:S9  V6:S6  V7:S3
  20. Voice steps:         V1:S1                              V6:S7  V7:S4
  21. Voice steps:  V0:S2                                     V6:S8  V7:S5
  22. Voice steps:  V0:S3a                                    V6:S9  V7:S6
      Apply ESA using the previously loaded value along with the previously
      calculated echo offset to calculate the new echo pointer.
      Load left channel sample from the echo buffer.
      Load FFC0.
  23. Voice steps:                                                   V7:S7
      Load right channel sample from the echo buffer.
      Load FFC1 and FFC2.
  24. Voice steps:                                                   V7:S8
      Load FFC3, FFC4, and FFC5.
  25. Voice steps:  V0:S3b                                           V7:S9
      Load FFC6 and FFC7.
  26. Load and apply MVOLL.
      Load and apply EVOLL.
      Output the left sample to the DAC.
      Load and apply EFB.
  27. Load and apply MVOLR.
      Load and apply EVOLR.
      Output the right sample to the DAC.
      Load PMON.
  28. Load NON, EON, and DIR.
      Load FLG bit 5 (ECENx) for application to the left channel.
  29. Update global counter.
      Write left channel sample to the echo buffer, if allowed by ECENx.
      Load EDL - if the current echo offset is 0, apply EDL.
      Load ESA for future use.
      Load FLG bit 5 (ECENx) again for application to the right channel.
      ** Clear internal KON bits for any channels keyed on in the previous 2
         samples.
  30. Voice steps:  V0:S3c
      Write right channel sample to the echo buffer, if allowed by ECENx.
      Increment the echo offset, and set it to 0 if it reaches the buffer
      length.
      Load FLG bits 0-4 and update noise sample if necessary.
      ** Load KOFF and internal KON.
  31. Voice steps:  V0:S4         V2:S1

** These two steps (KON and KOFF related) are performed every other sample.
Note that the internal KON bits are not cleared until 63 cycles after they
are loaded. You could also consider the above loop to run from 0-63, with
everything except these two steps repeated at T+32.

Unless the SPC700 TEST register is frobbed, it is always the case that the
KON/KOFF poll happens either 30 & 94 or 62 & 126 cycles after the SPC700 timer
T0 and T1 tick. On power on, 62 & 126 seems to be chosen more frequently but
30 & 94 can still be chosen sometimes. On reset, either can be chosen.
COUNTERS
========

The S-DSP has a global counter, which is examined by the noise sample
generator and the volume envelope adjustments. The global counter counts from
0x77FF to zero, decrementing by one each sample. Note that the counter is
initialized to zero (not 0x77FF) on reset.

The noise and envelope adjustments use the following tables to determine when
to perform their actions:

  // Number of samples per counter event
  counter_rates[32] = {
     Inf, 2048, 1536,
    1280, 1024,  768,
     640,  512,  384,
     320,  256,  192,
     160,  128,   96,
      80,   64,   48,
      40,   32,   24,
      20,   16,   12,
      10,    8,    6,
       5,    4,    3,
             2,
             1
  }

  // Counter offset from zero (i.e. not all counters are aligned at zero for
  // all rates)
  counter_offsets[32] = {
     n/a,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
             0,
             0
  }

When (Counter + counter_offsets[R]) % counter_rates[R] is zero (where R is the
current rate) the action is performed. This approach covers not just the
overall rate but also the relative synchronization when switching between
different rates (i.e. the first cycle will be shorter than the rest depending
on when the rate change occurs).
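The modulus-based check translates directly into C. In this sketch the
tables are copied from above, with rate 0 ('Inf', n/a) given placeholder
entries and special-cased as never firing; the function names are
hypothetical.

```c
#include <stdint.h>

static const uint16_t counter_rates[32] = {
       0, 2048, 1536,   /* entry 0 unused: rate 0 never fires */
    1280, 1024,  768,
     640,  512,  384,
     320,  256,  192,
     160,  128,   96,
      80,   64,   48,
      40,   32,   24,
      20,   16,   12,
      10,    8,    6,
       5,    4,    3,
             2,
             1
};

static const uint16_t counter_offsets[32] = {
       0,    0, 1040,   /* entry 0 unused */
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
     536,    0, 1040,
             0,
             0
};

/* Global counter: 0x77FF down to 0; reset value is 0. */
static unsigned counter_tick(unsigned counter)
{
    return counter == 0 ? 0x77FF : counter - 1;
}

/* Nonzero if rate R (0-31) fires for this counter value. */
static int counter_fires(unsigned counter, int rate)
{
    if (rate == 0)
        return 0;
    return (counter + counter_offsets[rate]) % counter_rates[rate] == 0;
}
```

The counter's period of 0x7800 (30720) samples is a multiple of every nonzero
rate in the table, so each rate's pattern repeats cleanly.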
It's quite certain that Nintendo did not implement divide-with-remainder logic
in the S-DSP, given both the era in which the chip was designed and its space
limitations. That said, the above equation does work for all cases, but it is
geared towards a software emulator. For an HDL implementation it is not very
practical (although it can be done). It has been demonstrated in FPGA hardware
that exactly identical behavior (including offsets and relative
synchronizations when switching rates) can be generated using only a series of
inter-dependent clocks whose logic resource utilization is >95% smaller than
the equivalent modulus-based implementation.

Another option (created by Mednafen), which does not require the use of
modulus/division and is also more FPGA/HDL friendly, is described below. In
this implementation the counter is still initialized to zero on reset, but is
not controlled by a simple decrement once per sample. Instead the following
function is used:
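  uint16 Counter = 0;            // 16-bit; zero on reset

  void run_glbl_cntr (void)
  {
      if (!(Counter & 0x7))      // bits 0-2: the /5 divider
          Counter ^= 0x5;
      if (!(Counter & 0x18))     // bits 3-4: the /3 divider
          Counter ^= 0x18;
      Counter -= 0x29;           // step bits 0-2, 3-4 and 5-15 by one each
  }

Unlike the modulus-based method described earlier, the Counter variable does
not step through a consecutive series of numbers: the /5 field (bits 0-2), the
/3 field (bits 3-4) and the remaining bits 5-15 each count down on their own,
wrapping within 16 bits.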
The noise and envelope adjustments use the following tables to determine when
to perform their actions:

  // Selects how many bits of the /1 counter to use (to give /1, /2, /4, /8,
  // etc.), and to optionally select the bits of the /5 or /3 divider (to
  // optionally give rates like /5 or /10 or /20 etc., or /3 or /6 or /12 etc.).
  uint16 counter_masks[32] = {
    0x0000, 0xFFE0, 0x3FF8,
    0x1FE7, 0x7FE0, 0x1FF8,
    0x0FE7, 0x3FE0, 0x0FF8,
    0x07E7, 0x1FE0, 0x07F8,
    0x03E7, 0x0FE0, 0x03F8,
    0x01E7, 0x07E0, 0x01F8,
    0x00E7, 0x03E0, 0x00F8,
    0x0067, 0x01E0, 0x0078,
    0x0027, 0x00E0, 0x0038,
    0x0007, 0x0060, 0x0018,
            0x0020,
            0x0000
  };

  // Adjusts for relative timing offsets and handles R=0 case. Could also be
  // thought of as counter_compare.
  uint16 counter_xors[32] = {
    0xFFFF, 0x0000, 0x3E08,
    0x1D04, 0x0000, 0x1E08,
    0x0D04, 0x0000, 0x0E08,
    0x0504, 0x0000, 0x0608,
    0x0104, 0x0000, 0x0208,
    0x0104, 0x0000, 0x0008,
    0x0004, 0x0000, 0x0008,
    0x0004, 0x0000, 0x0008,
    0x0004, 0x0000, 0x0008,
    0x0004, 0x0000, 0x0008,
            0x0000,
            0x0000
  };

When (Counter & counter_masks[R]) ^ counter_xors[R] is zero (where R is the
current rate) the action is performed. Just like the modulus-based method,
this approach covers both the overall rate and the relative synchronization
when switching between different rates.
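To check the mask/XOR tables against the modulus-based rates, the counter
update and the firing test can be run over one full period. This C sketch is
an adaptation: the counter is passed as an explicit uint16_t instead of a
global, and the function names are hypothetical. One full period is 30720
samples (5 x 3 x 2048, the lengths of the /5 field, the /3 field and the 11
remaining bits).

```c
#include <stdint.h>

/* The counter update above, with Counter passed explicitly. */
static uint16_t glbl_cntr_next(uint16_t counter)
{
    if (!(counter & 0x7))
        counter ^= 0x5;                 /* bits 0-2: /5 divider */
    if (!(counter & 0x18))
        counter ^= 0x18;                /* bits 3-4: /3 divider */
    return (uint16_t)(counter - 0x29); /* step all three fields */
}

/* Rate fires when (Counter & counter_masks[R]) ^ counter_xors[R] == 0. */
static int rate_fires(uint16_t counter, uint16_t mask, uint16_t xor_val)
{
    return ((counter & mask) ^ xor_val) == 0;
}

/* Count how often one (mask, xor) pair fires over n samples from reset. */
static unsigned count_fires(uint16_t mask, uint16_t xor_val, unsigned n)
{
    uint16_t c = 0;
    unsigned hits = 0;

    while (n--) {
        c = glbl_cntr_next(c);
        hits += (unsigned)rate_fires(c, mask, xor_val);
    }
    return hits;
}
```

For example, over 30720 samples the pair for R=2 (0x3FF8, 0x3E08) fires 20
times (30720 / 1536) and the pair for R=3 (0x1FE7, 0x1D04) fires 24 times
(30720 / 1280), matching counter_rates.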
VOLUME CONTROL & ECHO
=====================
In all cases, volume samples are adjusted in a simple linear fashion:
Sout = (Sin * vol) >> vol_shift.
"vol_shift" is chosen to give vol an effective range of -1<=vol<1. Thus, if
vol is unsigned then vol_shift is the number of bits in vol, while if vol is
signed then vol_shift is one less (e.g. 8-bit signed has a vol_shift of 7).
In all cases, mixed values are clamped to 16 bits.
There are several layers to S-DSP volume control. First, the sample is
adjusted by the volume envelope (11 bits unsigned). Then each sample is
adjusted by the per-voice volume (8-bit two's complement) separately for the
left and right channels (which may invert the phase of the signal). After all
voices are mixed the volume is adjusted by the master volume (8-bit two's
complement) separately for the left and right channels. And finally, the whole
thing can be muted by the FLG register.
Echo splits off the main audio path after the per-voice volume, before all
enabled voices are mixed together. The echo buffer sample (see ESA and EDL)
pointed to by the current echo offset is fed into the FIR filter, and that
output is adjusted by the echo volume (8-bit two's complement) and mixed back
into the main output (after master volume adjustment).
Then (if echo write is enabled in FLG) the FIR output is adjusted by the echo
feedback volume (8-bit two's complement) and mixed with all voices enabled in
EON, and output into the end of the echo ring buffer.
So note that if echo write is disabled, the "echo ring buffer" becomes a
static sample buffer up to 0.24 seconds long (EDL=15 gives 15<<11 = 30720
bytes, or 7680 stereo samples at 32 kHz).

BRR DECODING
============
The input samples to the S-DSP are compressed via a method known as "bit rate
reduction", compressing 16 16-bit samples into 9-byte blocks. The block format
is:
ssssffle 00001111 22223333 44445555 .... EEEEFFFF
  ssss = shift
  ff   = filter
  l    = loop (really "don't end")
  e    = end (really "loop")
  0000 = (D) data for sample #0 in this block, signed 2's complement
  ...
  FFFF = (D) data for sample #15 in this block, signed 2's complement
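A minimal C sketch of pulling the header fields and the 4-bit samples out of a 9-byte block, following the layout above (the high nybble of each data byte holds the earlier sample). `BrrHeader`, `brr_parse_header` and `brr_nybble` are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Fields of a BRR header byte laid out as ssssffle. */
typedef struct {
    unsigned shift;   /* ssss: 0-15 */
    unsigned filter;  /* ff:   0-3  */
    bool loop;        /* l */
    bool end;         /* e */
} BrrHeader;

static BrrHeader brr_parse_header(uint8_t h)
{
    BrrHeader hdr;
    hdr.shift  = h >> 4;
    hdr.filter = (h >> 2) & 3;
    hdr.loop   = (h >> 1) & 1;
    hdr.end    = h & 1;
    return hdr;
}

/* Sample n (0-15) of a block as the signed 4-bit value D.
   Byte 0 is the header; each data byte holds two samples, high nybble first. */
static int brr_nybble(const uint8_t block[9], unsigned n)
{
    assert(n < 16);
    uint8_t byte = block[1 + n / 2];
    int d = (n & 1) ? (byte & 0x0F) : (byte >> 4);
    return (d & 8) ? d - 16 : d;   /* sign-extend 4 bits */
}
```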

While the pre-BRR samples were supposedly 16-bit, the BRR decoder seems to
lose the low bit. This can be seen below, in that the input RD loses a bit at
the low end. The bit is 'recovered' after the VxVOLL/VxVOLR volume adjustment.
The 'shift' value scales the sample data D. Values 0-12 work normally, 16-bit
RD=(D<<shift)>>1. Values 13-15 force RD to either 0x0000 or 0xF800 depending on
the sign of the input D (i.e. they give the same values as 0 or F do with
shift=12).
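The shift rule can be written as a small C function. `brr_scale` is a hypothetical name; the sketch assumes `>>` on a negative int is an arithmetic shift, and it multiplies instead of left-shifting so that a negative D is well defined in C. 0xF800 appears here as its signed value, -2048.

```c
#include <assert.h>

/* Scale the signed 4-bit sample D by the header's shift value, giving RD. */
static int brr_scale(int d, unsigned shift)
{
    assert(d >= -8 && d <= 7 && shift <= 15);
    if (shift <= 12)
        return (d * (1 << shift)) >> 1;   /* RD = (D<<shift)>>1 */
    return d < 0 ? -2048 : 0;             /* 13-15: 0xF800 or 0x0000 */
}
```

Note that with shift=0 a D of 1 scales to 0: this is the lost low bit mentioned above.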
Each voice has a 12-sample ring buffer for decoding BRR data, divided into 3
groups of 4 samples. BRR data is always decoded in a group of 4 samples. There
are two 'active' groups, and one reserve group. When the interpolation index
passes 0x4000, the ring is turned and a new group of BRR data is decoded into
the new reserve group.
There are 4 possible 'filters' to use in decoding the blocks. Some filters use
previous samples in decoding; this history carries over between groups and
blocks and is separate for each voice.
Filter 0 (Direct):       S(x) = RD
Filter 1 (15/16):        S(x) = RD + S(x-1) + ((-S(x-1))>>4)
Filter 2 (61/32-15/16):  S(x) = RD + (S(x-1)<<1) + ((-((S(x-1)<<1)+S(x-1)))>>5)
                                - S(x-2) + (S(x-2)>>4)
Filter 3 (115/64-13/16): S(x) = RD + (S(x-1)<<1) + ((-(S(x-1)+(S(x-1)<<2)+(S(x-1)<<3)))>>6)
                                - S(x-2) + (((S(x-2)<<1) + S(x-2))>>4)
The calculations above are performed in some higher number of bits, clamped to
16 bits at the end and then clipped to 15 bits. This 15-bit value is the value
output and the value used as S(x-1) or S(x-2) as needed for future filter
iterations.
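Putting the filter formulas and the clamp/clip rule together gives the C sketch below. `brr_filter`, `clamp16` and `clip15` are hypothetical helpers; p1 and p2 are the voice's previous two 15-bit outputs, S(x-1) and S(x-2). The left shifts are rewritten as multiplications (for example (S(x-1)<<1)+S(x-1) becomes 3*p1) because left-shifting a negative int is undefined in C, and `>>` on negative values is assumed to be arithmetic.

```c
#include <stdint.h>
#include <assert.h>

/* Range-restrict to signed 16 bits. */
static int32_t clamp16(int32_t v)
{
    return v < -32768 ? -32768 : (v > 32767 ? 32767 : v);
}

/* Keep the low 15 bits and sign-extend them (bit truncation, may wrap). */
static int32_t clip15(int32_t v)
{
    v &= 0x7FFF;
    return (v & 0x4000) ? v - 0x8000 : v;
}

/* Apply BRR filter 0-3 to RD and return the new 15-bit S(x). */
static int32_t brr_filter(int32_t rd, unsigned filter, int32_t p1, int32_t p2)
{
    int32_t s = rd;
    switch (filter) {
    case 0: /* Direct */
        break;
    case 1: /* 15/16 */
        s += p1 + ((-p1) >> 4);
        break;
    case 2: /* 61/32 - 15/16 */
        s += 2 * p1 + ((-3 * p1) >> 5) - p2 + (p2 >> 4);
        break;
    case 3: /* 115/64 - 13/16 */
        s += 2 * p1 + ((-13 * p1) >> 6) - p2 + ((3 * p2) >> 4);
        break;
    default:
        assert(0);
        break;
    }
    return clip15(clamp16(s));
}
```

As an example of the wrap, RD=14336 with filter 1 and S(x-1)=16000 sums to 29336. That survives the 16-bit clamp, but the 15-bit clip turns it into -3432.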
Certain games do seem to depend on these exact formulas; trying to simplify
them will break some sound effects. If the very first block in a sample uses a
filter other than Direct, the previous samples are taken from the *physical*
end of the BRR ring buffer, since the buffer index is reset to 0 on KON.
Note that BRR decoding never stops for a voice: KOFF and FLG bit 7 don't
affect it at all (they just set the envelope), and it always loops after
reaching a block with 'e' set ('l' clear again just sets the envelope). KON is
the only thing that actually affects a BRR decode in progress, and that simply
restarts it from the beginning.

Now, as for the remaining two bits. If 'e' is set for the block, the bit in
ENDX is set when the block is complete and the next block will be that pointed
to by the loop pointer for this sample (see DIR and VxSRCN). Also, as soon as
a header is loaded with 'e' set and 'l' clear, the voice goes into the Release
state and the envelope goes to 0 immediately. Due to the 12-sample buffer, the
'e' and 'l' bits of the final block can be seen before the final few samples
of the penultimate block are output if the pitch rate is slow enough. The
samples in the final block will never be output.
When a voice is keyed on, there are 5 '0x0000' samples output before the first
sample encoded by the BRR data. These are used to preload the BRR ring buffer:
#0 = After the final pre-KON sample is prepared, the envelope is set to 0 and
enters the Attack state, and is not updated for the next several
samples. The interpolation index is reset to 0, and is not updated for
the next several samples. The final pre-KON BRR decode also occurs here
(which can matter if the first block of the new BRR data uses a
non-Direct filter).
#1 = The first '0x0000' sample. At step S2, the start address is read. No BRR
decoding or header checks, envelope updating, or interpolation index
updating is performed.
#2 = At step S4, first BRR group is decoded. No envelope or interpolation
index updating.
#3 = At step S4, second BRR group is decoded. No envelope or interpolation
index updating.
#4 = At step S4, third BRR group is decoded. No envelope or interpolation
index updating.
#5 = Envelope updating begins. The sample output is still '0x0000', because of
the order in which voice operations are performed. The interpolation
position is still 0.
#6 = Finally, we see the first data sample. The first interpolation position
update is done during step S4.

PITCH ADJUSTMENTS
=================
The S-DSP has two methods to adjust the 'pitch' of the input sound. Each voice
has a 14-bit pitch control, and for voices 1-7 this can be further tweaked by
the output sample of the previous voice.
The pitch adjustment is fairly simple:
pitch = voice[x].PITCH;
if(PMON&~NON&~1&(1<<x))
pitch += ((voice[x-1].outbuffer >> 5) * voice[x].PITCH) >> 10;
voice[x].interpolation_index += pitch;
if(voice[x].interpolation_index>0x7FFF)
voice[x].interpolation_index = 0x7FFF;
In the above, remember that voice[x].PITCH is only 14 bits while the 'pitch'
variable is large enough to never wrap. Additionally, note that the pitch
calculation is performed as a SIGNED operation while the interpolation_index
calculation is performed as an UNSIGNED operation.
When determining whether a new BRR group is needed:
if(voice[x].interpolation_index>=0x4000){
NextBRRGroup(x);
voice[x].interpolation_index -= 0x4000;
}
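The two steps above, sketched in C. `pitch_step` and `needs_new_group` are hypothetical names, `prev_out` stands for voice[x-1].outbuffer (taken here as a signed 16-bit value), and the caller is assumed to have already folded the PMON, NON and voice-0 conditions into `pmon`. Because (prev_out>>5) is never below -1024, the modulated pitch never goes negative, so adding it to the unsigned index is safe. `>>` on negative values is assumed to be arithmetic.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Advance the interpolation index by one output sample. */
static uint32_t pitch_step(uint32_t index, uint16_t pitch_reg,
                           int16_t prev_out, bool pmon)
{
    assert(pitch_reg <= 0x3FFF && index <= 0x7FFF);
    int32_t pitch = pitch_reg;                      /* signed calculation */
    if (pmon)
        pitch += ((prev_out >> 5) * (int32_t)pitch_reg) >> 10;
    index += (uint32_t)pitch;                       /* unsigned calculation */
    if (index > 0x7FFF)
        index = 0x7FFF;
    return index;
}

/* True (and the index is rewound) when a new BRR group must be decoded. */
static bool needs_new_group(uint32_t *index)
{
    if (*index >= 0x4000) {
        *index -= 0x4000;
        return true;
    }
    return false;
}
```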

The samples in the BRR buffer are then interpolated using a 4-point Gaussian
interpolation.
Note that pitch adjustment does not function on noise voices (see NON) or on
voice 0.
The exact interpolation table from libopenspc is:
// Gaussian table by libopenspc
// Note the signed 'int32' datatype: these 11-bit hex values are all
// positive, but they must be held in a signed type so that multiplying
// them by signed sample values gives a signed result.
static const int32 gauss_coeffs[512] = {
0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000,
0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000, 0x000,
0x001, 0x001, 0x001, 0x001, 0x001, 0x001, 0x001, 0x001,
0x001, 0x001, 0x001, 0x002, 0x002, 0x002, 0x002, 0x002,
0x002, 0x002, 0x003, 0x003, 0x003, 0x003, 0x003, 0x004,
0x004, 0x004, 0x004, 0x004, 0x005, 0x005, 0x005, 0x005,
0x006, 0x006, 0x006, 0x006, 0x007, 0x007, 0x007, 0x008,
0x008, 0x008, 0x009, 0x009, 0x009, 0x00A, 0x00A, 0x00A,
0x00B, 0x00B, 0x00B, 0x00C, 0x00C, 0x00D, 0x00D, 0x00E,
0x00E, 0x00F, 0x00F, 0x00F, 0x010, 0x010, 0x011, 0x011,
0x012, 0x013, 0x013, 0x014, 0x014, 0x015, 0x015, 0x016,
0x017, 0x017, 0x018, 0x018, 0x019, 0x01A, 0x01B, 0x01B,
0x01C, 0x01D, 0x01D, 0x01E, 0x01F, 0x020, 0x020, 0x021,
0x022, 0x023, 0x024, 0x024, 0x025, 0x026, 0x027, 0x028,
0x029, 0x02A, 0x02B, 0x02C, 0x02D, 0x02E, 0x02F, 0x030,
0x031, 0x032, 0x033, 0x034, 0x035, 0x036, 0x037, 0x038,
0x03A, 0x03B, 0x03C, 0x03D, 0x03E, 0x040, 0x041, 0x042,
0x043, 0x045, 0x046, 0x047, 0x049, 0x04A, 0x04C, 0x04D,
0x04E, 0x050, 0x051, 0x053, 0x054, 0x056, 0x057, 0x059,
0x05A, 0x05C, 0x05E, 0x05F, 0x061, 0x063, 0x064, 0x066,
0x068, 0x06A, 0x06B, 0x06D, 0x06F, 0x071, 0x073, 0x075,
0x076, 0x078, 0x07A, 0x07C, 0x07E, 0x080, 0x082, 0x084,
0x086, 0x089, 0x08B, 0x08D, 0x08F, 0x091, 0x093, 0x096,
0x098, 0x09A, 0x09C, 0x09F, 0x0A1, 0x0A3, 0x0A6, 0x0A8,
0x0AB, 0x0AD, 0x0AF, 0x0B2, 0x0B4, 0x0B7, 0x0BA, 0x0BC,
0x0BF, 0x0C1, 0x0C4, 0x0C7, 0x0C9, 0x0CC, 0x0CF, 0x0D2,
0x0D4, 0x0D7, 0x0DA, 0x0DD, 0x0E0, 0x0E3, 0x0E6, 0x0E9,
0x0EC, 0x0EF, 0x0F2, 0x0F5, 0x0F8, 0x0FB, 0x0FE, 0x101,
0x104, 0x107, 0x10B, 0x10E, 0x111, 0x114, 0x118, 0x11B,
0x11E, 0x122, 0x125, 0x129, 0x12C, 0x130, 0x133, 0x137,
0x13A, 0x13E, 0x141, 0x145, 0x148, 0x14C, 0x150, 0x153,
0x157, 0x15B, 0x15F, 0x162, 0x166, 0x16A, 0x16E, 0x172,
0x176, 0x17A, 0x17D, 0x181, 0x185, 0x189, 0x18D, 0x191,
0x195, 0x19A, 0x19E, 0x1A2, 0x1A6, 0x1AA, 0x1AE, 0x1B2,
0x1B7, 0x1BB, 0x1BF, 0x1C3, 0x1C8, 0x1CC, 0x1D0, 0x1D5,
0x1D9, 0x1DD, 0x1E2, 0x1E6, 0x1EB, 0x1EF, 0x1F3, 0x1F8,
0x1FC, 0x201, 0x205, 0x20A, 0x20F, 0x213, 0x218, 0x21C,
0x221, 0x226, 0x22A, 0x22F, 0x233, 0x238, 0x23D, 0x241,
0x246, 0x24B, 0x250, 0x254, 0x259, 0x25E, 0x263, 0x267,
0x26C, 0x271, 0x276, 0x27B, 0x280, 0x284, 0x289, 0x28E,
0x293, 0x298, 0x29D, 0x2A2, 0x2A6, 0x2AB, 0x2B0, 0x2B5,
0x2BA, 0x2BF, 0x2C4, 0x2C9, 0x2CE, 0x2D3, 0x2D8, 0x2DC,
0x2E1, 0x2E6, 0x2EB, 0x2F0, 0x2F5, 0x2FA, 0x2FF, 0x304,
0x309, 0x30E, 0x313, 0x318, 0x31D, 0x322, 0x326, 0x32B,
0x330, 0x335, 0x33A, 0x33F, 0x344, 0x349, 0x34E, 0x353,
0x357, 0x35C, 0x361, 0x366, 0x36B, 0x370, 0x374, 0x379,
0x37E, 0x383, 0x388, 0x38C, 0x391, 0x396, 0x39B, 0x39F,
0x3A4, 0x3A9, 0x3AD, 0x3B2, 0x3B7, 0x3BB, 0x3C0, 0x3C5,
0x3C9, 0x3CE, 0x3D2, 0x3D7, 0x3DC, 0x3E0, 0x3E5, 0x3E9,
0x3ED, 0x3F2, 0x3F6, 0x3FB, 0x3FF, 0x403, 0x408, 0x40C,
0x410, 0x415, 0x419, 0x41D, 0x421, 0x425, 0x42A, 0x42E,
0x432, 0x436, 0x43A, 0x43E, 0x442, 0x446, 0x44A, 0x44E,
0x452, 0x455, 0x459, 0x45D, 0x461, 0x465, 0x468, 0x46C,
0x470, 0x473, 0x477, 0x47A, 0x47E, 0x481, 0x485, 0x488,
0x48C, 0x48F, 0x492, 0x496, 0x499, 0x49C, 0x49F, 0x4A2,
0x4A6, 0x4A9, 0x4AC, 0x4AF, 0x4B2, 0x4B5, 0x4B7, 0x4BA,
0x4BD, 0x4C0, 0x4C3, 0x4C5, 0x4C8, 0x4CB, 0x4CD, 0x4D0,
0x4D2, 0x4D5, 0x4D7, 0x4D9, 0x4DC, 0x4DE, 0x4E0, 0x4E3,
0x4E5, 0x4E7, 0x4E9, 0x4EB, 0x4ED, 0x4EF, 0x4F1, 0x4F3,
0x4F5, 0x4F6, 0x4F8, 0x4FA, 0x4FB, 0x4FD, 0x4FF, 0x500,
0x502, 0x503, 0x504, 0x506, 0x507, 0x508, 0x50A, 0x50B,
0x50C, 0x50D, 0x50E, 0x50F, 0x510, 0x511, 0x511, 0x512,
0x513, 0x514, 0x514, 0x515, 0x516, 0x516, 0x517, 0x517,
0x517, 0x518, 0x518, 0x518, 0x518, 0x518, 0x519, 0x519
};

// 4-point gaussian interpolation
i = voice[x].interpolation_index >> 12;
// 0 <= i <= 7
d = (voice[x].interpolation_index >> 4) & 0xff; // 0 <= d <= 255
outx = ((gauss_coeffs[255-d] * voice[x].BRRdata[i+0]) >> 11);
outx += ((gauss_coeffs[511-d] * voice[x].BRRdata[i+1]) >> 11);
outx += ((gauss_coeffs[256+d] * voice[x].BRRdata[i+2]) >> 11);
// The above 3 wrap at 15 bits signed. The last is added to that, and is
// clamped rather than wrapped.
outx = ((outx & 0x7FFF) ^ 0x4000) - 0x4000;
outx += ((gauss_coeffs[ 0+d] * voice[x].BRRdata[i+3]) >> 11);
CLAMP15(outx);

S-DSP REGISTERS
===============
The S-DSP contains a number of registers, which are internally polled at
various points during the 32-cycle sample generation loop and often stored
internally for later use. Thus, most writes do not take effect immediately.
All registers are accessed by the SPC700 setting the address in $00F2, then
reading/writing $00F3. Note that the register addresses use only 7 bits:
$80-$ff are read-only mirrors of $00-$7f. Any unspecified registers/bits are
read/write with no known effect.
On power on, most registers are uninitialized. There does seem to be something
of a pattern, but it's nothing specific and seems to differ between
chips. On reset, most registers retain their previous values.
Some notable exceptions: FLG will always act as if set to 0xE0 after power on
or reset, even if the value read back indicates otherwise. VxENVX and VxOUTX
are of course 0, since all channels are in the Release state due to FLG. And
ENDX will be 0 on power on or reset, but recall that the voices are still
running even when keyed off so the various bits may have been set by BRR
decoding by the time you get to read it.
First, the 10 per-voice registers. These occupy $00-$09, $10-$19, and so on up
to $70-$79.
$x0 rw VxVOLL - Left volume for Voice x
$x1 rw VxVOLR - Right volume for Voice x
vvvvvvvv

These are the volumes of the voice in the left/right stereo channel.
The value is 2's-complement, negative values invert the phase of the
signal in the channel.
Volume adjustment is:
SL = (int16_t)((S * VL)>>7)
SR = (int16_t)((S * VR)>>7)
VxVOLL is accessed during voice processing step S4, cycles:
V0:31 V1:2 V2:5 V3:8 V4:11 V5:14 V6:17 V7:20
VxVOLR is accessed during voice processing step S5, cycles:
V0:0 V1:3 V2:6 V3:9 V4:12 V5:15 V6:18 V7:21
$x2 rw VxPITCHL - Pitch scaler for Voice x low byte
$x3 rw VxPITCHH - Pitch scaler for Voice x high byte
--pppppp pppppppp
This 14-bit number adjusts the pitch of the sounds output for this
voice, as the function:
Fout = Fin * P / 0x1000
Considering things on the normal 12-note scale, P=0x2000 will increase
the pitch by one octave, P=0x3FFF will increase by (just about) two
octaves, P=0x0800 will reduce by one octave, P=0x0400 will reduce by
two octaves, and so on.
Note that even though the high bits of $x3 are not significant, they
are still read back as written.
VxPITCHL is accessed during voice processing step S2, cycles:
V0:21 V1:0 V2:3 V3:6 V4:9 V5:12 V6:15 V7:18
VxPITCHH is accessed during voice processing step S3a, cycles:
V0:22 V1:1 V2:4 V3:7 V4:10 V5:13 V6:16 V7:19
$x4 rw VxSRCN - Source number for Voice x
nnnnnnnn
This selects the "instrument" this voice is to play. The number set
here is used as an offset into the table pointed to by DIR.
Changing this while the voice is playing will have no immediate
effect, but when the voice afterwards loops or is keyed on it will use
the new value.
VxSRCN is accessed during voice processing step S1, cycles:
V0:17 V1:20 V2:31 V3:2 V4:5 V5:8 V6:11 V7:14
$x5 rw VxADSR1 - Attack-Decay-Sustain-Release settings for Voice x (part 1)
edddaaaa
$x6 rw VxADSR2 - Attack-Decay-Sustain-Release settings for Voice x (part 2)
lllrrrrr
$x7 rw VxGAIN - Gain settings for Voice x
EGGGGGGG or Emmggggg
  e/E     = Envelope adjustment method bits.
  ddd     = Decay rate: R=d*2+16
  aaaa    = Attack rate: R=a*2+1
  lll     = Sustain level (see note)
  rrrrr   = Sustain rate: R=r
  mm      = Gain mode
  ggggg   = Gain rate: R=g
  GGGGGGG = Direct Gain mode gain setting: E=g*16
Note: the "lll" bits are the Sustain Level only when bit 'e' is set.
If 'e' is clear, the top 3 bits of VxGAIN are used instead.
These three registers give control over the volume envelope. The
volume envelope is 11 bits unsigned: volume adjustment is
S = (S * E)>>11 where S is the current sample. Various settings of these
registers will automatically adjust the envelope after a certain number
of samples, based on a counter as described above (see COUNTERS).
The volume envelope adjustment has 4 states: Attack, Decay, Sustain,
and Release. When the voice is keyed off or a BRR end-without-loop
block is reached, the state is set to Release. When the voice is keyed
on, the state is set to Attack.
When the envelope is in the Release state, this overrides all settings
of these registers. In this case, the counter rate R=31 (i.e. adjust
every sample), and the adjustment is E-=8.
The simplest method of envelope control ("Direct Gain") is available
when VxADSR1 bit 7 and VxGAIN bit 7 are both clear. In this case, the
volume envelope is simply E=%GGGGGGG0000, and R does not matter.
The second method ("Gain", usually with one of the 4 names below) is
available when VxADSR1 bit 7 is clear, but VxGAIN bit 7 is set. In
this case, we have 4 options, chosen based on the 'm' bits.
00 = Linear Decrease. R=g, E-=32
01 = Exp Decrease.    R=g, E-=((E-1)>>8)+1
10 = Linear Increase. R=g, E+=32
11 = Bent Increase.   R=g, E+=(E<0x600)?32:8
In all cases, clamp E to the range 0..0x7FF rather than wrapping.
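The four Gain modes can be sketched as one C function. `gain_step` is a hypothetical name; it is meant to run only when the counter for rate R=g fires, it tests the current (clamped) E for Bent Increase and so ignores the pre-clamp detail in step 4 of the update procedure, and it assumes `>>` on negative values is arithmetic so that Exp Decrease leaves E=0 alone.

```c
#include <stdint.h>
#include <assert.h>

/* One envelope update in Gain mode; mode is the two 'mm' bits. */
static int32_t gain_step(int32_t env, unsigned mode)
{
    switch (mode) {
    case 0: env -= 32; break;                        /* Linear Decrease */
    case 1: env -= ((env - 1) >> 8) + 1; break;      /* Exp Decrease */
    case 2: env += 32; break;                        /* Linear Increase */
    case 3: env += (env < 0x600) ? 32 : 8; break;    /* Bent Increase */
    default: assert(0); break;
    }
    /* Clamp (range restriction) to 11 bits unsigned. */
    return env < 0 ? 0 : (env > 0x7FF ? 0x7FF : env);
}
```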
The most complex method ("ADSR") is used when VxADSR1 bit 7 is 1. You
can think of this method as loading VxGAIN with different values at
different times based on the value of the volume envelope. VxGAIN is
not actually altered, however.
Attack: If aaaa == %1111, R=31 and E+=1024. Otherwise, pretend
VxGAIN = %110aaaa1. In either case, when E exceeds 0x7FF
(before clamping) enter the Decay state.
Decay: Pretend VxGAIN = %1011ddd0. When the upper 3 bits of E equal
the Sustain Level (see above), enter the Sustain state.
Sustain: Pretend VxGAIN = %101rrrrr.
CRITICAL NOTE: These updates happen even when ADSR mode is not selected.
These registers are actually used to update the envelope every sample.
The calculated value is used as follows:
1. If the counter specifies the envelope is to be updated, the
envelope is set to the new value, clamped to 11 bits.
2. If the mode is Decay and the Sustain Level is matched, change to
the Sustain state.
3. If the mode is Attack and the new value is greater than 0x7FF,
change to the Decay state. CRITICAL NOTE: Negative values also
trigger this.
4. Save the new value, *pre-clamp*, to determine the
increment for GAIN Bent Increase mode's next sample.
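The "pretend VxGAIN" rules map directly onto bit patterns, as in this C sketch. `EnvState` and `adsr_effective_gain` are hypothetical names; -1 marks the special fast attack, and the Release state and the state transitions are left out. Decoded as Emmggggg, %110aaaa1 is Linear Increase with g=a*2+1, %1011ddd0 is Exp Decrease with g=d*2+16, and %101rrrrr is Exp Decrease with g=r, which matches the rates listed for VxADSR1/VxADSR2.

```c
#include <stdint.h>
#include <assert.h>

typedef enum { ATTACK, DECAY, SUSTAIN, RELEASE } EnvState;

/* VxGAIN value that ADSR mode behaves as if it had loaded, or -1 for the
   fast attack (aaaa == %1111: R=31, E+=1024). */
static int adsr_effective_gain(EnvState state, uint8_t adsr1, uint8_t adsr2)
{
    unsigned a = adsr1 & 0x0F;          /* aaaa */
    unsigned d = (adsr1 >> 4) & 0x07;   /* ddd */
    unsigned r = adsr2 & 0x1F;          /* rrrrr */
    switch (state) {
    case ATTACK:  return a == 0x0F ? -1 : (int)(0xC0 | (a << 1) | 1); /* %110aaaa1 */
    case DECAY:   return (int)(0xB0 | (d << 1));                      /* %1011ddd0 */
    case SUSTAIN: return (int)(0xA0 | r);                             /* %101rrrrr */
    default:      assert(0); return -1;   /* Release is not covered here */
    }
}
```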

VxADSR1 is accessed during voice processing step S2, cycles:
V0:21 V1:0 V2:3 V3:6 V4:9 V5:12 V6:15 V7:18
VxADSR2 and VxGAIN are accessed during voice processing step S3c,
cycles:
V0:30 V1:1 V2:4 V3:7 V4:10 V5:13 V6:16 V7:19
$x8 r- VxENVX - Current envelope value for Voice X
0eeeeeee
This returns the high 7 bits of the current volume envelope value
(IOW, E>>4) for this voice. Note that the high bit will always be 0.
Also note that (obviously) there is no way to directly determine the
low 4 bits unless you're using Direct Gain.
Technically, this register IS writable. But whatever value you write
will be overwritten at 32000 Hz.
VxENVX is updated during voice processing step S9, cycles:
V0:4 V1:7 V2:10 V3:13 V4:16 V5:19 V6:22 V7:25
However, a write by the SMP to this register up to 2 cycles earlier will
overwrite the DSP's updated value.
$x9 r- VxOUTX - Current sample value for Voice X
oooooooo
This returns the high byte of the current sample for this voice, after
envelope volume adjustment but before VxVOL[LR] is applied.
Technically, this register IS writable. But whatever value you write
will be overwritten at 32000 Hz.
VxOUTX is updated during voice processing step S8, cycles:
V0:3 V1:6 V2:9 V3:12 V4:15 V5:18 V6:21 V7:24
However, a write by the SMP to this register up to 2 cycles earlier will
overwrite the DSP's updated value.
Now, the general-purpose registers:
$0c rw MVOLL - Left channel master volume
$1c rw MVOLR - Right channel master volume
vvvvvvvv
These are the master volumes of the left/right stereo channel. The
value is 2's-complement, negative values invert the phase of the
channel. This is the adjustment applied to the mixed 16-bit stereo
sample output of all 8 voices.
Volume adjustment is:
ML = (int16_t)((SL * VL)>>7)
MR = (int16_t)((SR * VR)>>7)
MVOLL is accessed during cycle 26.
MVOLR is accessed during cycle 27.
$2c rw EVOLL - Left channel echo volume
$3c rw EVOLR - Right channel echo volume
vvvvvvvv

These are the echo volumes of the left/right stereo channel. The value
is 2's-complement, negative values invert the phase of the channel.
This is the adjustment applied to the FIR filter 16-bit output before
mixing with the main signal (after master volume adjustment).
Volume adjustment is:
EL = (int16_t)((SL * VL)>>7)
ER = (int16_t)((SR * VR)>>7)
EVOLL is accessed during cycle 26.
EVOLR is accessed during cycle 27.
$4c rw KON - Key on for all voices
$5c rw KOFF - Key off for all voices
76543210
Each bit of KON/KOFF corresponds to one voice.
Writing 1 to the KOFF bit will transition the voice to the Release
state. Thus, the envelope will decrease by 8 every sample (regardless
of the VxADSR and VxGAIN settings) until it reaches 0, where it will
stay until the next KON.
Writing 1 to the KON bit will set the envelope to 0, the state to
Attack, and will start the channel from the beginning (see DIR and
VxSRCN). Note that this happens even if the channel is already playing
(which may cause a click/pop), and that there are 5 'empty' samples
before envelope updates and BRR decoding actually begin.
These registers seem to be polled only at 16000 Hz, when every other
sample is due to be output. Thus, if you write two values in close
succession, usually but not always only the second value will have an
effect:
; assume KOFF = 0, but no voices playing
mov $f2, #$4c ; KON = 1 then KON = 2
mov $f3, #$01 ; -> *usually* only voice 1 is keyed on. If both are,
mov $f3, #$02 ; voice 0 will be *2* samples ahead rather than one.
and
; assume various voices playing
mov $f2, #$5c ; KOFF = $ff then KOFF = 0
mov $f3, #$ff
mov $f3, #$00 ; -> *usually* all voices remain playing
FLG bit 7, however, is polled every sample and polled for each voice.
These registers and FLG bit 7 interact as follows:
1. If FLG bit 7 or the KOFF bit for the channel is set, transition
to the Release state. If FLG bit 7 is set, also set the envelope
to 0.
2. If the 'internal' value of KON has the channel's bit set, perform
the KON actions described above.
3. Set the 'internal' value of KON to 0.
This has a number of consequences:
* KON effectively takes effect 'on write', even though a non-zero
value can be read back much later. KOFF and FLG.7, on the other
hand, exert their influence constantly until a new value is
written.
* Writing KON while KOFF or FLG.7 is set will not result in any
samples being output by the channel. The channel is keyed on, but it
is turned off again 2 samples later. Since there is a 5 sample delay
after KON before the channel actually begins processing, the net
effect is no output.
* However, if KOFF is cleared within 63 SPC700 cycles of the
KON write above, the channel WILL be keyed on as normal. If KOFF
is cleared between 64 and 127 SPC700 cycles later, the channel
MIGHT be keyed on with decreasing probability depending on how
many cycles before the KON/KOFF poll the KON write occurred.
* Setting both KOFF and KON for a channel will turn the channel
off much faster than just KOFF alone, since the KON will set the
envelope to 0. This can cause a click/pop, though.
KOFF and internal KON are accessed during cycle 30 every other sample.
Internal KON bits are cleared during cycle 29, just before KON is
accessed.
$6c rw FLG - Reset, Mute, Echo-Write flags and Noise Clock
rmennnnn
r = When set, the S-DSP "soft-resets" itself. Mostly, this seems to
mean the S-DSP acts as if KOFF=$ff and forces all envelopes to 0;
echo processing still continues, and any remaining echo data will
continue to echo and generate samples. You must clear the bit to
resume normal operation. See KON/KOFF for some details.
Note though that this bit is checked much more frequently than
KOFF.
m = When set, no sound will be output. Samples will still be decoded,
echoes processed, and such; just no sounds will be output.
e = When set, the echo ring buffer (see ESA and EDL) will not be
written. Echo processing on the buffer will continue as normal;
just the buffer itself will not be updated, and so the echo samples
will loop forever. In other words, the echo pointer is always moving.
The only thing that changes is whether or not the writes themselves
occur.
nnnnn = Noise frequency. This is used with the global counter to
determine when to generate a new noise sample. Note that there is
only one noise source shared by all voices for which noise is
enabled (see NON).
On reset, this register seems to have a value resembling $E0, even
though this may not be read back. At least, 'r' is 'set' so we can't
key on any samples, 'e' is 'set' so the echo buffer is not being
updated, and 'm' is 'set' because even whatever static data is in the
echo buffer gives no sound. 'n' is '0', since the noise sample is
constant until this is set non-zero.
FLG bit 'r' is accessed during voice processing step S3c, cycles:
V0:30 V1:1 V2:4 V3:7 V4:10 V5:13 V6:16 V7:19
FLG bit 'e' is accessed during cycles 28 and 29.
FLG bits 'n' are accessed during cycle 30.
$7c r* ENDX - Voice end flags
76543210
When a BRR block with the end flag set is decoded in a voice, the
corresponding bit is set in this register. When the voice is keyed on
(successfully or not), the corresponding bit is cleared. Any write to
this register will clear ALL bits, no matter what value is written.
Note that the bit is set at the START of decoding the BRR block, not
at the end. Recall that BRR processing, and therefore the setting of
bits in this register, continues even for voices in the Release state.
On power on or reset, all bits are cleared.
ENDX is updated during voice processing step S7, cycles:
V0:2 V1:5 V2:8 V3:11 V4:14 V5:17 V6:20 V7:23
However, a write by the SMP to this register up to 2 cycles earlier will
overwrite the DSP's updated value.
$0d rw EFB - Echo feedback volume
vvvvvvvv
When echo buffer write is enabled, the FIR output will be adjusted by
this volume and mixed into the buffer. The value is 2's-complement,
negative values invert the phase of the signal.
Volume adjustment is:
E = (int16_t)((E * V)>>7)
EFB is accessed during cycle 26.
$2d rw PMON - Pitch modulation enable
7654321-
Each bit corresponds to the corresponding voice. When the bit is set,
the VxPITCH value will be adjusted by the output of voice x-1. The
exact formula seems to be:
P = VxPITCH + (((OutX[x-1] >> 5) * VxPITCH) >> 10)
For the purposes of pitch adjustment, a voice not playing is all zeros
and thus has no effect on the pitch.
PMON is accessed during cycle 27.
$3d rw NON - Noise enable
76543210
Each bit corresponds to the corresponding voice. When the bit is set,
the samples produced by BRR decoding will not be used. Instead, the
output sample will be the current value of the noise generator (see
FLG).
The noise generator outputs a 15-bit noise sample. It operates as
follows: on reset, N=0x4000. On each update (see FLG),
N=(N>>1)|(((N<<14)^(N<<13))&0x4000). The output noise sample at any
point is N; it then goes through volume adjustment, followed by the
left shift that 'restores' the low bit.
Note that the noise sample is not affected by VxPITCH or PMON, but
VxPITCH and PMON still control the speed of BRR decoding and the
end-without-loop of BRR decoding will still transition to Release (and
update ENDX).
NON is accessed during cycle 28.
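The noise update written as a C function. `noise_step` is a hypothetical name; the expression is the formula above, so bit 14 of the new value is bit 0 XOR bit 1 of the old one. Starting from 0x4000, this is a 15-bit LFSR that visits all 32767 non-zero states before repeating.

```c
#include <stdint.h>
#include <assert.h>

/* One update of the 15-bit noise generator (see FLG for the rate). */
static uint16_t noise_step(uint16_t n)
{
    assert(n != 0 && n <= 0x7FFF);   /* an all-zero state would lock up */
    return (uint16_t)((n >> 1) | (((n << 14) ^ (n << 13)) & 0x4000));
}
```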

$4d rw EON - Echo enable
76543210
Each bit corresponds to the corresponding voice. When the bit is set
and echo buffer write is enabled, this voice will be mixed into the
sample to be written to the echo buffer for later echo processing.
EON is accessed during cycle 28.
$5d rw DIR - Sample table address
aaaaaaaa
This forms the high byte of the start address of the sample pointer
table (the low byte is always 0). The sample pointer table is indexed
for each voice by VxSRCN to determine which BRR data to decode and
play.
Each entry is 4 bytes. The first word points to the start of the BRR
data, and the second word points to the 'restart' point for when the
BRR end block is reached. These are referred to as the Source Start
Address (SA) and the Source Loop Start Address (LSA), respectively.
Changing this while voices are playing will have no immediate effect,
but when any voice afterwards loops or is keyed on it will use the new
table.
DIR is accessed during cycle 28.
$6d rw ESA - Echo ring buffer address
aaaaaaaa
This forms the high byte of the start address of the echo ring buffer
(the low byte is always 0). When echo buffer write is enabled in FLG,
all voices marked in EON will be mixed together, mixed with the FIR
output (adjusted by the echo feedback volume), and output into the
ring buffer (4 bytes, 2 per stereo channel). And every sample, one
entry (4 bytes) will be removed from the ring buffer and passed into
the FIR filter.
The size of the buffer is controlled by EDL. The echo buffer will wrap
within 16 bits, if the ESA and EDL values combine to specify a buffer
that would go beyond address $FFFF.
Note that the register is accessed 32 cycles before the value is used
for a write; at a sample level, this causes writes to appear to be
delayed by at least a full sample before taking effect.
ESA is accessed during cycle 29.
$7d rw EDL - Echo delay (ring buffer size)
----dddd
This controls the size of the echo ring buffer, and therefore the
delay between when a sample is first output and when it enters the
echo FIR filter. The size of the buffer is simply D<<11 bytes (D<<9
16-bit stereo samples), however when D=0 the buffer is 4 bytes (1
16-bit stereo sample) rather than 0.
Note that only the low 4 bits are used to determine the buffer length.

The register value is only used under certain conditions:
* Write the echo buffer at sample 'idx' (cycles 29 and 30)
* If idx==0, set idx_max = EDL<<9 (cycle 30-ish)
* Increment idx. If idx>=idx_max, idx=0 (cycle 30-ish)
This means that it can take up to .24s for a newly written value to
actually take effect, if the old value was 0x0f and the new value is
written just after the cycle 30 in which buffer index 0 was written.
EDL is accessed during cycle 29.
$xf rw FFCx - Echo FIR Filter Coefficient (FFC) X
cccccccc
These 8 registers specify the 8 2's-complement coefficients of the 8-tap
FIR filter used to calculate the echo signal. Each time a sample is
generated by the voices, one sample is taken from the echo ring buffer
and input to the FIR filter (this is S(x)). The FIR filter output is
then mixed with the outputs of the voices to generate the output sound,
and mixed with the sample being input into the echo buffer for echo
feedback.
Note that the echo buffer contains 15-bit samples left-aligned within
the 16-bit word, so the 16-bit value read must be right-shifted by one
bit to get the proper 15-bit S(x). The internal calculations, however,
are done in 16 bits, with the final output of the FIR being a 16-bit
value.
The FIR formula is:
// The value is clipped when mixing samples x-1 to x-7.
// Each term is parenthesized: in C, '+' binds tighter than '>>'.
FIR = (int16)(((S(x-7) * FFC0) >> 6)   // oldest sample
            + ((S(x-6) * FFC1) >> 6)
            + ((S(x-5) * FFC2) >> 6)
            + ((S(x-4) * FFC3) >> 6)
            + ((S(x-3) * FFC4) >> 6)
            + ((S(x-2) * FFC5) >> 6)
            + ((S(x-1) * FFC6) >> 6));
// We have overflow detection when adding the most recent sample
// only:
FIR = clamp16(FIR + ((S(x-0) * FFC7) >> 6)); // newest sample
// Finally, mask off the LSbit to get the final 16-bit result:
FIR = FIR & ~1;
Note that the left and right stereo channels are filtered separately
(no crosstalk), but with identical coefficients.
FFC0 is accessed during cycle 22.
FFC1 and FFC2 are accessed during cycle 23.
FFC3, FFC4, and FFC5 are accessed during cycle 24.
FFC6 and FFC7 are accessed during cycle 25.
The echo buffer left channel is read during cycle 22, and written
during cycle 29.
The echo buffer right channel is read during cycle 23, and written
during cycle 30.