martindevans/Dissonance Latency.md

## Dissonance Latency.md

Note: This was written as a response to a specific Dissonance customer who had applied some tweaks:

This is written assuming Tiny Frames (i.e. 10ms), which can only be used in a LAN setting.
The jitter buffer delay in Dissonance normally has a minimum of 50ms; that is not taken into account here.

All delays here are worst case; the average case will be about half that!


Unity Mic buffer. Unity buffers up mic data and we read all buffered data every frame, so there's up to 1 Unity frame of latency here (~16ms).
Dissonance Mic buffer. We copy the mic data into our own buffer and dispatch a frame to the preprocessor as soon as there's 1 frame of audio. That adds up to 1 audio frame or 1 Unity frame of latency. The two buffers work closely together, and overall I would guess at about 10ms of latency here (~10ms).
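As a rough illustration of the two mic buffers described above, here is a minimal Python sketch of a buffer that is fed once per Unity frame and hands out fixed 10ms frames. This is not Dissonance's actual code: the class name `MicFrameBuffer`, its methods, and the 48kHz capture rate are all assumptions made for the example.

```python
from collections import deque

SAMPLE_RATE = 48_000                              # assumed capture rate in Hz
FRAME_MS = 10                                     # Tiny Frames, as assumed in this note
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000    # 480 samples per 10ms frame


class MicFrameBuffer:
    """Hypothetical stand-in for the mic buffer (not the real Dissonance API)."""

    def __init__(self, frame_samples=FRAME_SAMPLES):
        self.frame_samples = frame_samples
        self.pending = []           # samples that do not yet fill a whole frame
        self.dispatched = deque()   # complete frames, ready for the preprocessor

    def push(self, samples):
        """Call once per Unity frame with everything the mic buffered since the last call."""
        self.pending.extend(samples)
        # Dispatch as many whole frames as are available; keep the remainder.
        while len(self.pending) >= self.frame_samples:
            self.dispatched.append(self.pending[:self.frame_samples])
            del self.pending[:self.frame_samples]

    def buffered_ms(self, sample_rate=SAMPLE_RATE):
        """Audio still waiting for the rest of its frame, in milliseconds."""
        return 1000 * len(self.pending) / sample_rate
```

At 60fps and 48kHz each Unity frame delivers roughly 800 samples, so one `push` dispatches a full 10ms frame and leaves under 10ms of audio waiting for the next call.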


Preprocessor input buffer. Frames get put into a preprocessor input queue; the preprocessor thread grabs them and processes them. This should introduce a very small amount of latency (~1ms).
Preprocessor intermediate buffer. The preprocessor has to resample frames of input data to 48kHz. This can introduce a very small amount of latency if your mic is operating at a weird rate/frame size. This is probably not affecting you.
Encoder buffer. Resizes preprocessor frames (always 10ms) to encoder frames. Since encoder frames are also 10ms in your case, this won't introduce any latency.


Network send queue. Encoded audio is put into a queue in the network system and is sent on the Unity main thread next frame. Up to 1 frame (~16ms) of latency here.
Server relay delay. The server operates entirely on the main thread and sends and receives packets as soon as they are delivered, so there should be no latency here except that due to the underlying network integration buffers.
Receiver delay. Packets are delivered by your network system on the main thread, so there's potentially 1 frame of latency in the network system (~16ms).
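The "up to 1 frame" figures above come straight from the frame rate. A small Python sketch of that arithmetic follows. The function names are made up for illustration, and the average figure assumes data arrives at a uniformly random point within a frame.

```python
def frame_poll_latency_ms(fps):
    """Return (worst_case_ms, average_ms) for work that waits for the next main-thread frame."""
    frame_ms = 1000.0 / fps
    # Worst case: the data arrives just after a frame started, so it waits a whole frame.
    # Average: half a frame, assuming arrival times are uniform within the frame.
    return frame_ms, frame_ms / 2.0


def network_queue_latency_ms(fps, hops=2):
    """Total for the per-frame waits in the network path (send queue + receiver by default)."""
    worst, average = frame_poll_latency_ms(fps)
    return hops * worst, hops * average


worst, average = network_queue_latency_ms(fps=60)
print(f"worst ~{worst:.1f}ms, average ~{average:.1f}ms")
```

At 60fps this gives roughly 33ms worst case for the two per-frame waits, in line with the ~16ms per hop quoted above, and about half that on average.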


Playback buffer. Packets are put into the jitter buffer, which is automatically sized based on latency and jitter (delay is 2.5 × estimated jitter). Even assuming you have really dreadful jitter with 20ms ping, you're unlikely to get more than 20ms of delay here; more likely it sits at the minimum of 10ms all the time.
Playback system. Reads buffered audio and plays it back. Playback speed is adaptive (if the jitter buffer is over/undersized it will speed up/slow down playback). This doesn't really introduce any extra latency, but it's worth knowing about if you try to measure total delay from recording->playback and get results which don't make sense!
Hardware playback. Unity has default DSP buffers of 20ms. That's tweakable but you probably don't want to fiddle with it (it's incredibly expensive in terms of CPU time to make DSP buffers smaller).
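The jitter buffer sizing rule mentioned for the playback buffer can be sketched in a couple of lines of Python. The 2.5 multiplier and the 10ms minimum come from the text above. The function name, and the idea that the minimum is applied as a simple floor, are assumptions for illustration. How Dissonance actually estimates jitter is not covered here.

```python
def jitter_buffer_delay_ms(estimated_jitter_ms, min_delay_ms=10.0, multiplier=2.5):
    """Target playback delay: a multiple of estimated jitter, floored at a minimum.

    Assumption: the minimum acts as a plain floor. Pass min_delay_ms=50.0 to
    model the usual (untweaked) minimum mentioned at the top of this note.
    """
    return max(min_delay_ms, multiplier * estimated_jitter_ms)


for jitter in (1.0, 4.0, 8.0):
    print(f"jitter {jitter}ms -> delay {jitter_buffer_delay_ms(jitter)}ms")
```

Under this rule the delay only rises above the 10ms floor once estimated jitter exceeds 4ms, and reaching 20ms of delay takes 8ms of jitter.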


Summary:

- Mic ~24ms*
- Preprocessor ~3ms
- Network ~32ms + latency
- Playback adaptive, most likely ~10ms**

*Unity buffers recorded audio before it puts it into a buffer that Dissonance can access (hardware buffer, audio driver buffer, Unity audio clip internals etc). We have no easy way of measuring that latency but it appears to be fairly significant, perhaps around 100ms?

**Unity has a 20ms DSP buffer which comes after Dissonance has played audio.
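To put the summary figures together, here is a rough Python tally. The per-stage numbers are the ones from the summary and footnotes. The stage names, the function name, and its parameters are made up for illustration, and the ~100ms capture figure is only the guess from the first footnote, so it is left as an optional input.

```python
# Per-stage figures from the summary above, in milliseconds.
STAGES_MS = {
    "mic": 24,
    "preprocessor": 3,
    "network_queues": 32,
    "playback_jitter_buffer": 10,   # "most likely" value, adaptive in practice
    "unity_dsp_buffer": 20,         # from the second footnote
}


def total_latency_ms(one_way_network_ms, capture_guess_ms=0):
    """Sum the stage figures plus network transit time and an optional capture guess."""
    return sum(STAGES_MS.values()) + one_way_network_ms + capture_guess_ms


print(total_latency_ms(one_way_network_ms=1))                        # LAN, measured stages only
print(total_latency_ms(one_way_network_ms=1, capture_guess_ms=100))  # including the ~100ms guess
```

With 1ms of LAN transit this comes to about 90ms from the listed stages, or about 190ms if the unmeasured Unity capture delay really is near 100ms.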