pervognsen/mosh_ssp_vs_quake3.txt

## mosh_ssp_vs_quake3.txt
The Mosh and Quake 3 Networking Models and State Synchronization Algebra
========================================================================

Mosh is a new remote shell program and protocol: https://mosh.org/

You may read technical details about its internals here:

https://www.usenix.org/system/files/login/articles/winstein.pdf
https://mosh.org/mosh-paper.pdf

This is a real break with the history of remote terminals, all the way from ARPANET's TELNET through SSH.
The basic premise of these traditional protocols is to emulate a serial terminal link over a packet-switched
network. Before the ARPANET, leased telephone lines were used to facilitate remote terminals, and indeed one
of the motivations for the ARPANET was to enable the remote use of large timesharing systems without such
dedicated point-to-point leased lines (leased lines were still used for IMP-to-IMP connectivity in the ARPANET,
but they were treated as a carrier for time-multiplexed packets rather than being circuit switched, so a
pair of hosts that wanted to communicate didn't need a dedicated leased line between them).

What Mosh does differently (among other things) is that it doesn't try to emulate an end-to-end serial link.
The virtual serial link only exists between the server-side Mosh host software and the server-side operating
system, not between the client-side endpoint and the server-side operating system. That is, Mosh runs its
own server-side terminator emulator, maintaining the terminal screen state, and so on, on behalf of the client.
It then synchronizes this screen state (and other relevant terminal state like the echo mode) to the client,
and the client renders the synchronized screen state locally.

You can imagine a simple version of this general idea where the client-side software is entirely terminal agnostic
like VNC or other remote desktop software and just sits there receiving and rendering pixels. And indeed, nothing
prevents you from running xterm over VNC, or running a local X server with a remote xterm as the X client.

But obviously Mosh is a lot smarter than that. As a first observation, a terminal is a text-mode display,
not a bitmap display, and synchronizing the display state pixel by pixel is not the right level of abstraction.
In addition, there is non-graphical state like the echo mode which is important for the client to be able to do
latency-tolerant client-side prediction (e.g. telnet's linemode option, which SSH does not support). And if the
order of business is synchronizing structured data rather than linear data streams, you can do better than TCP.

The UDP-based state synchronization protocol that was invented for Mosh is called SSP. As far as I can tell,
it's not defined as a separate entity from Mosh, and it's not currently used by any other protocols or software.

SSP's design will look uncannily familiar to game programmers. It's almost identical to the Quake 3 Arena
networking model. You may read more about it here:

http://trac.bookofhook.com/bookofhook/trac.cgi/wiki/Quake3Networking
http://fabiensanglard.net/quake3/network.php

In Mosh, state updates are tagged with a monotonic sequence number. States are transmitted as diffs
relative to states previously acknowledged by the receiver. This means the sender needs to maintain a window
of past states. Once the received has acknowledged a state, the sender throws away any older states in its window.
When the sender has seen an acknowledgement from the receiver, it will send its own kind of ack in an
upcoming packet with the so-called throwaway sequence number, which the receiver uses to shrink its window of
reference states.

Client-side input is sent to the server using SSP as well. Here the abstract state object to be synchronized
is the full session input history, which grows in linear append-only fashion. The diff between any two such
input states is just the suffix of input keystrokes that are in the newer state which were not in the older
state. While game coders might not think of client input transmission as a state synchronization problem,
they do tend to take the exact same approach to the problem as I just described, going back to Tribes 1
and probably earlier: http://www.pingz.com/wordpress/wp-content/uploads/2009/11/tribes_networking_model.pdf

One interesting thing is how Mosh/SSP uses both sender and receiver acks to truncate reference states. In
the input history synchronization example, there's obviously no point in maintaining a copy of the full
input history once the receiver has acknowledged reception up to a certain point. At that point you just
want to truncate the acknowledged prefix. While this is a simple and practical notion, it's interesting
to consider what it means at the state object level. This is not something that the Mosh paper talks about,
but I believe it is implicit in the Mosh implementation, and being a math nerd I couldn't help but notice it:

State synchronization implicitly relies on an abstract algebra of states!

States have an ordering: A <= B, where A and B are partial states. In Mosh and in most other games I can think of,
the ordering is linear via the tagging of states with update sequence numbers or timestamps. But partial orderings are
actually more natural for a lot of structured data, where you may think of it as an information ordering where A <= B
asserts that B contains as much or more information than A, so B strictly subsumes A. The ordering is partial
when neither A or B contains strictly more information than the other. For example, with a BitTorrent-like protocol,
if packet A contains fragments 1 and 2 and packet B contains fragments 2 and 3, then neither is subsumed by the other.

Along with this ordering on states, we have operations on states. If D represents a diff from A to B, we may
write A + D = B. We can also turn this around and compute D given A and B as D = B - A. Idempotency of state
updates is represented by the absorption law (A + D) + D = A + D.

There is also a 0 element. Like the 0 you know and love, this has the property that A + 0 = A for all A.
And its interpretation here is very simple: 0 is an empty diff.

The ordering and operations are tied together in a simple way: A <= A + B for all A and B.

Rather than thinking of addition as applying a diff to a state, it's better to think of it as merging information
in a symmetric way that does not distinguish the state object from the diff object.

Going back to our discussion of truncating input histories, the important observation with respect to this
algebraic framework is that truncating is just subtracting the older, acknowledged state from the newer state.
And this is just what D = B - A does. So, D can not only be regarded as a diff, i.e. a recipe for transforming
A into B, but as a partial, truncated state. And this is useful because we can use D to generate diffs relative
to A without keeping a full copy of A around. For example, if we want to compute D' = C - A, we may do as follows:
D' = C - A = (C - B) + (B - A) = (C - B) + D, where C - B is the diff from B to C. This is a sum of diffs! But
these confusing distinctions disappear when you just think of all these things, whether A, B, C or any of their
differences and sums, as objects of the same kind. Then B - A is no conceptually different than D' - D even though
the latter might seem like a diff of diffs from our original vantage point.

In the case of input histories, this is very simple since both input histories and diffs between input histories
are sequences of keystrokes (however, if we subtract sequences without a common base, we get disjoint intervals).
For entity states in a game, it's not quite as simple but it's pretty close: our objects will be partial entity
states, which may or may not contain a defined value for an entity field. Then a complete entity state is just
a partial entity state with all its fields fully defined. An entity diff is the partial entity state that specifies
the values of only those fields that are different between the entities. A diff between diffs specifies the fields
that are different between the diffs. So you can see that diffing is done uniformly whether we're diffing entities
or diffs (in fact, the distinction is erased at this level).

This looks a lot like the algebraic structure known as a lattice. In general, you can't unconditionally subtract
elements in a lattice like we're doing here. That relies on the ordering being linear rather than just partial.
I haven't worked out all the details, but at least the basic algebraic picture provides some useful insights,
and I thought it was worth sharing them even in a preliminary form.

At a practical implementation level, you can benefit from this framework by implementing these abstract things
like addition, subtraction and ordering in your code, making sure they obey the right algebraic laws. You can
see what Mosh does here with its subtract, init_diff, diff_from and apply_string operations:

https://github.com/mobile-shell/mosh/blob/master/src/statesync/completeterminal.cc
https://github.com/mobile-shell/mosh/blob/master/src/statesync/user.cc

This is not quite as clean as the mathematical model I presented, particularly because diff_from and
apply_string operate with partial states serialized as strings rather than with state objects per se, but
seeing this code is what led to the above realizations, so all credit is due to them.
	The Mosh and Quake 3 Networking Models and State Synchronization Algebra
	========================================================================

	Mosh is a new remote shell program and protocol: https://mosh.org/

	You may read technical details about its internals here:

	https://www.usenix.org/system/files/login/articles/winstein.pdf
	https://mosh.org/mosh-paper.pdf

	This is a real break with the history of remote terminals, all the way from ARPANET's TELNET through SSH.
	The basic premise of these traditional protocols is to emulate a serial terminal link over a packet-switched
	network. Before the ARPANET, leased telephone lines were used to facilitate remote terminals, and indeed one
	of the motivations for the ARPANET was to enable the remote use of large timesharing systems without such
	dedicated point-to-point leased lines (leased lines were still used for IMP-to-IMP connectivity in the ARPANET,
	but they were treated as a carrier for time-multiplexed packets rather than being circuit switched, so a
	pair of hosts that wanted to communicate didn't need a dedicated leased line between them).

	What Mosh does differently (among other things) is that it doesn't try to emulate an end-to-end serial link.
	The virtual serial link only exists between the server-side Mosh host software and the server-side operating
	system, not between the client-side endpoint and the server-side operating system. That is, Mosh runs its
	own server-side terminator emulator, maintaining the terminal screen state, and so on, on behalf of the client.
	It then synchronizes this screen state (and other relevant terminal state like the echo mode) to the client,
	and the client renders the synchronized screen state locally.

	You can imagine a simple version of this general idea where the client-side software is entirely terminal agnostic
	like VNC or other remote desktop software and just sits there receiving and rendering pixels. And indeed, nothing
	prevents you from running xterm over VNC, or running a local X server with a remote xterm as the X client.

	But obviously Mosh is a lot smarter than that. As a first observation, a terminal is a text-mode display,
	not a bitmap display, and synchronizing the display state pixel by pixel is not the right level of abstraction.
	In addition, there is non-graphical state like the echo mode which is important for the client to be able to do
	latency-tolerant client-side prediction (e.g. telnet's linemode option, which SSH does not support). And if the
	order of business is synchronizing structured data rather than linear data streams, you can do better than TCP.

	The UDP-based state synchronization protocol that was invented for Mosh is called SSP. As far as I can tell,
	it's not defined as a separate entity from Mosh, and it's not currently used by any other protocols or software.

	SSP's design will look uncannily familiar to game programmers. It's almost identical to the Quake 3 Arena
	networking model. You may read more about it here:

	http://trac.bookofhook.com/bookofhook/trac.cgi/wiki/Quake3Networking
	http://fabiensanglard.net/quake3/network.php

	In Mosh, state updates are tagged with a monotonic sequence number. States are transmitted as diffs
	relative to states previously acknowledged by the receiver. This means the sender needs to maintain a window
	of past states. Once the received has acknowledged a state, the sender throws away any older states in its window.
	When the sender has seen an acknowledgement from the receiver, it will send its own kind of ack in an
	upcoming packet with the so-called throwaway sequence number, which the receiver uses to shrink its window of
	reference states.

	Client-side input is sent to the server using SSP as well. Here the abstract state object to be synchronized
	is the full session input history, which grows in linear append-only fashion. The diff between any two such
	input states is just the suffix of input keystrokes that are in the newer state which were not in the older
	state. While game coders might not think of client input transmission as a state synchronization problem,
	they do tend to take the exact same approach to the problem as I just described, going back to Tribes 1
	and probably earlier: http://www.pingz.com/wordpress/wp-content/uploads/2009/11/tribes_networking_model.pdf

	One interesting thing is how Mosh/SSP uses both sender and receiver acks to truncate reference states. In
	the input history synchronization example, there's obviously no point in maintaining a copy of the full
	input history once the receiver has acknowledged reception up to a certain point. At that point you just
	want to truncate the acknowledged prefix. While this is a simple and practical notion, it's interesting
	to consider what it means at the state object level. This is not something that the Mosh paper talks about,
	but I believe it is implicit in the Mosh implementation, and being a math nerd I couldn't help but notice it:

	State synchronization implicitly relies on an abstract algebra of states!

	States have an ordering: A <= B, where A and B are partial states. In Mosh and in most other games I can think of,
	the ordering is linear via the tagging of states with update sequence numbers or timestamps. But partial orderings are
	actually more natural for a lot of structured data, where you may think of it as an information ordering where A <= B
	asserts that B contains as much or more information than A, so B strictly subsumes A. The ordering is partial
	when neither A or B contains strictly more information than the other. For example, with a BitTorrent-like protocol,
	if packet A contains fragments 1 and 2 and packet B contains fragments 2 and 3, then neither is subsumed by the other.

	Along with this ordering on states, we have operations on states. If D represents a diff from A to B, we may
	write A + D = B. We can also turn this around and compute D given A and B as D = B - A. Idempotency of state
	updates is represented by the absorption law (A + D) + D = A + D.

	There is also a 0 element. Like the 0 you know and love, this has the property that A + 0 = A for all A.
	And its interpretation here is very simple: 0 is an empty diff.

	The ordering and operations are tied together in a simple way: A <= A + B for all A and B.

	Rather than thinking of addition as applying a diff to a state, it's better to think of it as merging information
	in a symmetric way that does not distinguish the state object from the diff object.

	Going back to our discussion of truncating input histories, the important observation with respect to this
	algebraic framework is that truncating is just subtracting the older, acknowledged state from the newer state.
	And this is just what D = B - A does. So, D can not only be regarded as a diff, i.e. a recipe for transforming
	A into B, but as a partial, truncated state. And this is useful because we can use D to generate diffs relative
	to A without keeping a full copy of A around. For example, if we want to compute D' = C - A, we may do as follows:
	D' = C - A = (C - B) + (B - A) = (C - B) + D, where C - B is the diff from B to C. This is a sum of diffs! But
	these confusing distinctions disappear when you just think of all these things, whether A, B, C or any of their
	differences and sums, as objects of the same kind. Then B - A is no conceptually different than D' - D even though
	the latter might seem like a diff of diffs from our original vantage point.

	In the case of input histories, this is very simple since both input histories and diffs between input histories
	are sequences of keystrokes (however, if we subtract sequences without a common base, we get disjoint intervals).
	For entity states in a game, it's not quite as simple but it's pretty close: our objects will be partial entity
	states, which may or may not contain a defined value for an entity field. Then a complete entity state is just
	a partial entity state with all its fields fully defined. An entity diff is the partial entity state that specifies
	the values of only those fields that are different between the entities. A diff between diffs specifies the fields
	that are different between the diffs. So you can see that diffing is done uniformly whether we're diffing entities
	or diffs (in fact, the distinction is erased at this level).

	This looks a lot like the algebraic structure known as a lattice. In general, you can't unconditionally subtract
	elements in a lattice like we're doing here. That relies on the ordering being linear rather than just partial.
	I haven't worked out all the details, but at least the basic algebraic picture provides some useful insights,
	and I thought it was worth sharing them even in a preliminary form.

	At a practical implementation level, you can benefit from this framework by implementing these abstract things
	like addition, subtraction and ordering in your code, making sure they obey the right algebraic laws. You can
	see what Mosh does here with its subtract, init_diff, diff_from and apply_string operations:

	https://github.com/mobile-shell/mosh/blob/master/src/statesync/completeterminal.cc
	https://github.com/mobile-shell/mosh/blob/master/src/statesync/user.cc

	This is not quite as clean as the mathematical model I presented, particularly because diff_from and
	apply_string operate with partial states serialized as strings rather than with state objects per se, but
	seeing this code is what led to the above realizations, so all credit is due to them.