My previous symmetric routing proposal entailed building a whole reverse routing envelope into each packet. The requester would have to specify some information about the whole path before sending the packet. As ~master-morzod pointed out to me, such a design does not make for a good bottom-up, self-assembling network.
In contrast, in this proposal, each relay node only needs to know the previous node and the next, not the entire multi-hop route. The basic idea: when a ship receives a request packet, it figures out the next hop in the forwarding chain for that packet, forwards the packet there, and remembers the IP and port the request came from for thirty seconds, so that if it hears a response packet, it can pass that response back to the previous relay.
This means that every request must be satisfied within thirty seconds, but that's a workable constraint. Note that many HTTP systems operate under this constraint. This arrangement is also similar to NDN's "pending interests" table. An NDN "interest" packet corresponds almost exactly to a Fine scry request packet, and stateful Ames request packets can behave the same with respect to routing.
In order for this to work, each node has to have some idea of the next node to try. If the node has a direct route to the receiver ship, it should send the packet there. Otherwise, if it knows the ship's sponsor, it should try that, falling back up the sponsorship chain potentially all the way to the receiver's galaxy.
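The next-hop rule above can be sketched in Python (illustrative only; the real logic would live in Ames/Vere, and the `routes` and `sponsor_of` structures are hypothetical stand-ins for a relay's route table and the PKI's sponsorship chain):

```python
def next_hop(rcvr, routes, sponsor_of):
    """Return the lane for the next relay toward `rcvr`.

    Prefer a direct route; otherwise walk up the sponsorship chain
    (planet -> star -> galaxy) until some sponsor has a known lane.
    A galaxy's lane is assumed to always be known, so the walk
    terminates there.
    """
    ship = rcvr
    while ship is not None:
        if ship in routes:
            return routes[ship]
        ship = sponsor_of.get(ship)
    raise LookupError("no route, not even to a galaxy")
```

For example, a relay that only knows the lane of galaxy `~syl` would forward a request for `~mister-ritpub-sipsyl` to `~syl`, since every hop up the sponsorship chain is unknown until the galaxy.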
The next question is how routes tighten, which requires information about lanes for ships later in the relay chain to propagate backward to ships earlier in the relay chain. A maximally tight route is a direct route between the sender ship and the receiver ship.
Each ship in the relay chain, when forwarding a response packet backward through the route, can overwrite a "next" field in the response packet to contain the relay just past it. This enables the previous relay to try the "next" relay on the next request packet, skipping the relay in between.
One simple rubric is that a ship should try the tightest known-good route and the next tightest route simultaneously until the tighter route has been demonstrated to be viable. This way, in case the tighter route turns out not to work (usually because it's behind a firewall and not publicly addressable), the looser route will still get the packet through.
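A minimal sketch of that rubric, assuming the sender keeps an ordered list of candidate routes, tightest first, each tagged with whether it has been demonstrated viable (the data shapes are illustrative, not the actual Ames state):

```python
def candidate_lanes(routes):
    """`routes` is a list of (lane, proven_good) pairs, ordered
    tightest-first. Return the lanes to send the next request on."""
    if not routes:
        return []
    tightest, proven = routes[0]
    if proven or len(routes) == 1:
        return [tightest]
    # The tightest lane hasn't been demonstrated viable yet, so also
    # send on the next-tightest lane in case the tight one is behind
    # a firewall and not publicly addressable.
    return [tightest, routes[1][0]]
```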
A ship should only overwrite the "next" field when forwarding a response back to a ship that is not one of its direct sponsors (this precludes fraternization from galaxy directly to planet or moon, but I think that's ok). The following example shows how a route tightens over multiple packet roundtrips.
request: ~rovnys -> ~syl -> ~sipsyl -> ~ritpub-sipsyl -> ~mister-ritpub-sipsyl
response: ~rovnys <- ~syl <- ~sipsyl <- ~ritpub-sipsyl <- ~mister-ritpub-sipsyl
::
:: next: ~sipsyl@123.45.67.89:1234
request: ~rovnys -> ~sipsyl -> ~ritpub-sipsyl -> ~mister-ritpub-sipsyl
response: ~rovnys <- ~sipsyl <- ~ritpub-sipsyl <- ~mister-ritpub-sipsyl
::
:: next: ~ritpub-sipsyl@34.56.78.90:2345
request: ~rovnys -> ~ritpub-sipsyl -> ~mister-ritpub-sipsyl
response: ~rovnys <- ~ritpub-sipsyl <- ~mister-ritpub-sipsyl
::
:: next: ~mister-ritpub-sipsyl@45.67.89.101:3456
request: ~rovnys -> ~mister-ritpub-sipsyl
response: ~rovnys <- ~mister-ritpub-sipsyl
::
:: next: ~
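The tightening trace above can be simulated with a small Python sketch (illustrative only): each relay forwarding a response backward overwrites the `next` hint with the relay just past it, except when the previous hop is a ship it directly sponsors. The hint the requester finally sees is whatever the last relay before it wrote.

```python
def final_hint(chain, sponsor_of):
    """`chain` is the relay chain from requester to receiver.
    Walk the response backward through the relays and return the
    `next` hint the requester ends up with, or None if no relay
    wrote one (i.e. the route is already direct)."""
    hint = None
    # The response travels from chain[-1] back toward chain[0];
    # later writes (closer to the requester) win.
    for i in range(len(chain) - 2, 0, -1):
        relay, prev = chain[i], chain[i - 1]
        # No fraternization: a relay forwarding back to its own
        # direct sponsor does not overwrite the hint.
        if sponsor_of.get(relay) != prev:
            hint = chain[i + 1]
    return hint
```

Running this over each round-trip in the example reproduces the hints shown: first `~sipsyl`, then `~ritpub-sipsyl`, then `~mister-ritpub-sipsyl`, then nothing once the route is direct.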
As an aside, note that while this example was written using full Urbit ships as relays, this routing procedure would also allow any kind of program that speaks the routing protocol and knows the lanes of ships to act as a relay node, even if it does not have an Urbit address (although the system would need a new way to learn about these non-ship relays).
In order for this procedure to work, each relay must maintain a data structure:
+$  state
  $:  pending=(map request [=lane expiry=@da])
      sponsees=(map ship lane)
  ==
+$  request
  $%  [%ames sndr=ship rcvr=ship]
      [%scry rcvr=ship =path]
  ==
The expiry field is set to thirty seconds from now whenever we hear a request. Repeated requests bump the timeout, keeping the request alive.
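A minimal Python sketch of that pending table, assuming a thirty-second expiry that is refreshed on each repeated request (the `Pending` class and request tuples are illustrative stand-ins for the Hoon state above):

```python
EXPIRY = 30.0  # seconds

class Pending:
    def __init__(self):
        # request -> (return lane, expiry timestamp)
        self.table = {}

    def hear_request(self, request, lane, now):
        # Remember (or refresh) where to send the eventual response.
        self.table[request] = (lane, now + EXPIRY)

    def hear_response(self, request, now):
        # Pop the return lane if the entry hasn't expired.
        entry = self.table.pop(request, None)
        if entry is None:
            return None
        lane, expiry = entry
        return lane if now <= expiry else None
```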
OK, so there is a keepalive, but not a very scalable one: each ping to a sponsor is an Ames message. I think this scales well enough for sponsorship, but it would not for userspace publications. We agree that keepalives should be handled by the publisher's king in practice, which means publication keepalives can't touch the publisher's event log.
Unsolicited "push" packets, such as a heartbeat response to a request that was never sent, seem a bit off to me in a symmetric-routing world. The subscriber should be pinging the publisher and getting acks, so that (a) responsibility lies with the subscriber to trigger a response from the publisher, and (b) the subscriber is continually telling the publisher where it is, so the publisher knows where to send responses.
As for your follow-up: the way we've designed the notifications protocol, notifications are optional. A simple runtime might not implement them. So a publisher running such a simple runtime might never push updates to you, in which case you need to keep re-sending scry requests indefinitely.
I suppose we could add an ack packet that says "I heard your request and will store it until Vere restarts, so you will get a response eventually if I know where you are." Then if we (the subscriber) receive one of these, we could almost switch to just sending keepalive packets, whose responses should include a nonce representing the Vere process instance.
The "almost" is because there's no way to know whether a response packet was sent but lost over the wire. We would either need to have the subscriber ack the response, or have the subscriber retry the request on a timer until the data is delivered.
So I'm not convinced there's a better short-term solution than just retrying subscription-related scry requests every thirty seconds or so, with an added repeated keepalive scry request that gets a response each time as long as the publisher is online.
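That short-term strategy is simple enough to sketch (hypothetical names throughout: `send`, `poll`, and `clock` are injected transport and time hooks, not real Vere interfaces):

```python
RETRY_INTERVAL = 30.0  # seconds

def subscribe(request, send, poll, clock, give_up_at):
    """Send `request`, re-sending on a fixed interval, until `poll`
    yields a response or `give_up_at` is reached. Returns the
    response, or None on timeout."""
    next_send = clock()
    while clock() < give_up_at:
        now = clock()
        if now >= next_send:
            send(request)
            next_send = now + RETRY_INTERVAL
        resp = poll(request)
        if resp is not None:
            return resp
    return None
```

Injecting the clock keeps the retry loop deterministic under test; in a real runtime this would hang off a behn-style timer rather than a busy loop.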
Long-term, I think we could implement something like your proposal, where a subscriber asks once for a datum, gets an ack promising that the publisher will eventually send the datum, then stops asking -- but only if we had TCP-style sessions in a dedicated king-to-king protocol, as I described in the Nan Madol proposal. To get that kind of behavior with the kind of performance we want, notifications-related acks need to be stored in transient king state, not the event log.