Last active
February 9, 2021 05:44
-
-
Save pervognsen/91474ed5f6a81fa01566e63d928e1d27 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The Mosh and Quake 3 Networking Models and State Synchronization Algebra | |
======================================================================== | |
Mosh is a new remote shell program and protocol: https://mosh.org/ | |
You may read technical details about its internals here: | |
https://www.usenix.org/system/files/login/articles/winstein.pdf | |
https://mosh.org/mosh-paper.pdf | |
This is a real break with the history of remote terminals, all the way from ARPANET's TELNET through SSH. | |
The basic premise of these traditional protocols is to emulate a serial terminal link over a packet-switched | |
network. Before the ARPANET, leased telephone lines were used to facilitate remote terminals, and indeed one | |
of the motivations for the ARPANET was to enable the remote use of large timesharing systems without such | |
dedicated point-to-point leased lines (leased lines were still used for IMP-to-IMP connectivity in the ARPANET, | |
but they were treated as a carrier for time-multiplexed packets rather than being circuit switched, so a | |
pair of hosts that wanted to communicate didn't need a dedicated leased line between them). | |
What Mosh does differently (among other things) is that it doesn't try to emulate an end-to-end serial link. | |
The virtual serial link only exists between the server-side Mosh host software and the server-side operating | |
system, not between the client-side endpoint and the server-side operating system. That is, Mosh runs its | |
own server-side terminator emulator, maintaining the terminal screen state, and so on, on behalf of the client. | |
It then synchronizes this screen state (and other relevant terminal state like the echo mode) to the client, | |
and the client renders the synchronized screen state locally. | |
You can imagine a simple version of this general idea where the client-side software is entirely terminal agnostic | |
like VNC or other remote desktop software and just sits there receiving and rendering pixels. And indeed, nothing | |
prevents you from running xterm over VNC, or running a local X server with a remote xterm as the X client. | |
But obviously Mosh is a lot smarter than that. As a first observation, a terminal is a text-mode display, | |
not a bitmap display, and synchronizing the display state pixel by pixel is not the right level of abstraction. | |
In addition, there is non-graphical state like the echo mode which is important for the client to be able to do | |
latency-tolerant client-side prediction (e.g. telnet's linemode option, which SSH does not support). And if the | |
order of business is synchronizing structured data rather than linear data streams, you can do better than TCP. | |
The UDP-based state synchronization protocol that was invented for Mosh is called SSP. As far as I can tell, | |
it's not defined as a separate entity from Mosh, and it's not currently used by any other protocols or software. | |
SSP's design will look uncannily familiar to game programmers. It's almost identical to the Quake 3 Arena | |
networking model. You may read more about it here: | |
http://trac.bookofhook.com/bookofhook/trac.cgi/wiki/Quake3Networking | |
http://fabiensanglard.net/quake3/network.php | |
In Mosh, state updates are tagged with a monotonic sequence number. States are transmitted as diffs | |
relative to states previously acknowledged by the receiver. This means the sender needs to maintain a window | |
of past states. Once the received has acknowledged a state, the sender throws away any older states in its window. | |
When the sender has seen an acknowledgement from the receiver, it will send its own kind of ack in an | |
upcoming packet with the so-called throwaway sequence number, which the receiver uses to shrink its window of | |
reference states. | |
Client-side input is sent to the server using SSP as well. Here the abstract state object to be synchronized | |
is the full session input history, which grows in linear append-only fashion. The diff between any two such | |
input states is just the suffix of input keystrokes that are in the newer state which were not in the older | |
state. While game coders might not think of client input transmission as a state synchronization problem, | |
they do tend to take the exact same approach to the problem as I just described, going back to Tribes 1 | |
and probably earlier: http://www.pingz.com/wordpress/wp-content/uploads/2009/11/tribes_networking_model.pdf | |
One interesting thing is how Mosh/SSP uses both sender and receiver acks to truncate reference states. In | |
the input history synchronization example, there's obviously no point in maintaining a copy of the full | |
input history once the receiver has acknowledged reception up to a certain point. At that point you just | |
want to truncate the acknowledged prefix. While this is a simple and practical notion, it's interesting | |
to consider what it means at the state object level. This is not something that the Mosh paper talks about, | |
but I believe it is implicit in the Mosh implementation, and being a math nerd I couldn't help but notice it: | |
State synchronization implicitly relies on an abstract algebra of states! | |
States have an ordering: A <= B, where A and B are partial states. In Mosh and in most other games I can think of, | |
the ordering is linear via the tagging of states with update sequence numbers or timestamps. But partial orderings are | |
actually more natural for a lot of structured data, where you may think of it as an information ordering where A <= B | |
asserts that B contains as much or more information than A, so B strictly subsumes A. The ordering is partial | |
when neither A or B contains strictly more information than the other. For example, with a BitTorrent-like protocol, | |
if packet A contains fragments 1 and 2 and packet B contains fragments 2 and 3, then neither is subsumed by the other. | |
Along with this ordering on states, we have operations on states. If D represents a diff from A to B, we may | |
write A + D = B. We can also turn this around and compute D given A and B as D = B - A. Idempotency of state | |
updates is represented by the absorption law (A + D) + D = A + D. | |
There is also a 0 element. Like the 0 you know and love, this has the property that A + 0 = A for all A. | |
And its interpretation here is very simple: 0 is an empty diff. | |
The ordering and operations are tied together in a simple way: A <= A + B for all A and B. | |
Rather than thinking of addition as applying a diff to a state, it's better to think of it as merging information | |
in a symmetric way that does not distinguish the state object from the diff object. | |
Going back to our discussion of truncating input histories, the important observation with respect to this | |
algebraic framework is that truncating is just subtracting the older, acknowledged state from the newer state. | |
And this is just what D = B - A does. So, D can not only be regarded as a diff, i.e. a recipe for transforming | |
A into B, but as a partial, truncated state. And this is useful because we can use D to generate diffs relative | |
to A without keeping a full copy of A around. For example, if we want to compute D' = C - A, we may do as follows: | |
D' = C - A = (C - B) + (B - A) = (C - B) + D, where C - B is the diff from B to C. This is a sum of diffs! But | |
these confusing distinctions disappear when you just think of all these things, whether A, B, C or any of their | |
differences and sums, as objects of the same kind. Then B - A is no conceptually different than D' - D even though | |
the latter might seem like a diff of diffs from our original vantage point. | |
In the case of input histories, this is very simple since both input histories and diffs between input histories | |
are sequences of keystrokes (however, if we subtract sequences without a common base, we get disjoint intervals). | |
For entity states in a game, it's not quite as simple but it's pretty close: our objects will be partial entity | |
states, which may or may not contain a defined value for an entity field. Then a complete entity state is just | |
a partial entity state with all its fields fully defined. An entity diff is the partial entity state that specifies | |
the values of only those fields that are different between the entities. A diff between diffs specifies the fields | |
that are different between the diffs. So you can see that diffing is done uniformly whether we're diffing entities | |
or diffs (in fact, the distinction is erased at this level). | |
This looks a lot like the algebraic structure known as a lattice. In general, you can't unconditionally subtract | |
elements in a lattice like we're doing here. That relies on the ordering being linear rather than just partial. | |
I haven't worked out all the details, but at least the basic algebraic picture provides some useful insights, | |
and I thought it was worth sharing them even in a preliminary form. | |
At a practical implementation level, you can benefit from this framework by implementing these abstract things | |
like addition, subtraction and ordering in your code, making sure they obey the right algebraic laws. You can | |
see what Mosh does here with its subtract, init_diff, diff_from and apply_string operations: | |
https://github.com/mobile-shell/mosh/blob/master/src/statesync/completeterminal.cc | |
https://github.com/mobile-shell/mosh/blob/master/src/statesync/user.cc | |
This is not quite as clean as the mathematical model I presented, particularly because diff_from and | |
apply_string operate with partial states serialized as strings rather than with state objects per se, but | |
seeing this code is what led to the above realizations, so all credit is due to them. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Yep, there's definitely a lot of domain theory lurking under the covers. And the entity structure I outlined would be a product domain. The value domain (assuming we don't differentiate by type) consists of integers, strings, etc. The partial value domain is the value domain augmented with bottom to represent an unspecified/unknown value. The domain of partial entity states with n fields is the n-fold product of the partial value domain. This picture of partial nested tuples/records also seems like a good foundation for the semantics of a language like GraphQL.
In programming language theory and denotational semantics, domains usually have a somewhat different conceptual meaning, where bottom represents an incomplete computation (either because you haven't taken the fixed-point limit yet or because you have taken the limit and the computation is divergent/non-terminating). Of course, "incomplete computation" and "unknown value" are totally compatible interpretations.