@belisarius222
Last active February 20, 2023 08:09
Symmetric, Local Routing Proposal

Symmetric Routing with Local State

My previous symmetric routing proposal entailed building a whole reverse routing envelope into each packet. The requester would have to specify some information about the whole path before sending the packet. As ~master-morzod pointed out to me, such a design does not make for a good bottom-up, self-assembling network.

In contrast, in this proposal, each relay node only needs to know the previous node and the next, not the entire multi-hop route. The basic idea is that when a ship receives a request packet, it figures out where the next hop in the forwarding chain should be and forwards the packet there. It also remembers the IP and port the request packet came from for thirty seconds, so that if it hears a response packet, it can pass that response back to the previous relay.
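
A minimal sketch of this relay behavior, in Python for illustration (the names Relay and next_hop, and the exact table shape, are assumptions, not anything that exists today):

import time

PENDING_TTL = 30  # seconds to remember the return lane of a request

class Relay:
    def __init__(self, next_hop, send):
        self.pending = {}         # request id -> (return lane, expiry)
        self.next_hop = next_hop  # callback: request id -> lane of the next relay
        self.send = send          # callback: (lane, packet) -> None

    def hear_request(self, request_id, source_lane, packet):
        # remember (and refresh) where to send the eventual response
        self.pending[request_id] = (source_lane, time.time() + PENDING_TTL)
        self.send(self.next_hop(request_id), packet)

    def hear_response(self, request_id, packet):
        entry = self.pending.get(request_id)
        if entry is None:
            return  # unknown request: drop
        lane, expiry = entry
        if time.time() > expiry:
            del self.pending[request_id]
            return  # stale entry: drop
        self.send(lane, packet)  # pass the response back toward the requester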

This means that every request must be satisfied within thirty seconds, but that's a workable constraint. Note that many HTTP systems operate under this constraint. This arrangement is also similar to NDN's "pending interests" table. An NDN "interest" packet corresponds almost exactly to a Fine scry request packet, and stateful Ames request packets can behave the same with respect to routing.

In order for this to work, each node has to have some idea of the next node to try. If the node has a direct route to the receiver ship, it should send the packet there. Otherwise, if it knows the ship's sponsor, it should try that, falling back up the sponsorship chain potentially all the way to the receiver's galaxy.
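
Sketching that fallback order (direct_lanes and sponsor_of are assumed lookup tables, standing in for whatever peer state the relay actually keeps):

def next_lane(rcvr, direct_lanes, sponsor_of):
    """Pick the lane to forward a request toward rcvr: prefer a direct
    lane, otherwise walk up the sponsorship chain toward the galaxy."""
    ship = rcvr
    while ship is not None:
        if ship in direct_lanes:
            return direct_lanes[ship]
        ship = sponsor_of.get(ship)  # None once we run out of sponsors
    return None  # no route known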

The next question is how routes tighten, which requires information about lanes for ships later in the relay chain to propagate backward to ships earlier in the relay chain. A maximally tight route is a direct route between the sender ship and the receiver ship.

Each ship in the relay chain, when forwarding a response packet backward through the route, can overwrite a "next" field in the response packet to contain the relay just past itself, i.e. the hop it heard the response from. This lets the previous relay try that "next" relay on the next request packet, skipping the relay in between.

One simple rubric is that a ship should try the tightest known-good route and the next tightest route simultaneously until the tighter route has been demonstrated to be viable. This way, in case the tighter route turns out not to work (usually because it's behind a firewall and not publicly addressable), the looser route will still get the packet through.
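
A toy model of that sender-side rubric (purely illustrative; the real logic would live wherever Ames keeps its peer state):

class RouteState:
    def __init__(self, fallback_lane):
        self.known_good = fallback_lane  # e.g. the receiver's galaxy
        self.candidate = None            # tighter lane learned from a "next" field

    def lanes_to_try(self):
        # send each request on both lanes until the candidate proves itself
        return [lane for lane in (self.candidate, self.known_good) if lane is not None]

    def hear_next(self, lane):
        # a response advertised a tighter hop in its "next" field
        self.candidate = lane

    def hear_response_via(self, lane):
        # a response came back over the candidate lane: promote it
        if lane == self.candidate:
            self.known_good, self.candidate = self.candidate, None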

A ship should only overwrite the "next" field when forwarding a response back to a ship that is not one of its direct sponsors (this precludes fraternization from galaxy directly to planet or moon, but I think that's ok). The following example shows how a route tightens over multiple packet roundtrips.

request:  ~rovnys -> ~syl -> ~sipsyl -> ~ritpub-sipsyl -> ~mister-ritpub-sipsyl
response: ~rovnys <- ~syl <- ~sipsyl <- ~ritpub-sipsyl <- ~mister-ritpub-sipsyl
                  ::
                  :: next: ~sipsyl@123.456.789:1234

request:  ~rovnys -> ~sipsyl -> ~ritpub-sipsyl -> ~mister-ritpub-sipsyl
response: ~rovnys <- ~sipsyl <- ~ritpub-sipsyl <- ~mister-ritpub-sipsyl
                  ::
                  :: next: ~ritpub-sipsyl@234.567.890:2345

request:  ~rovnys -> ~ritpub-sipsyl -> ~mister-ritpub-sipsyl
response: ~rovnys <- ~ritpub-sipsyl <- ~mister-ritpub-sipsyl
                  ::
                  :: next: ~mister-ritpub-sipsyl@345.678.901:3456

request:  ~rovnys -> ~mister-ritpub-sipsyl
response: ~rovnys <- ~mister-ritpub-sipsyl
                  ::
                  :: next: ~

As an aside, note that while this example was written using full Urbit ships as relays, this routing procedure would also allow any kind of program that speaks the routing protocol and knows the lanes of ships to act as a relay node, even if it does not have an Urbit address (although the system would need a new way to learn about these non-ship relays).

In order for this procedure to work, each relay must maintain a data structure:

+$  state
  $:  pending=(map request [=lane expiry=@da])  ::  return lane and deadline per in-flight request
      sponsees=(map ship lane)                  ::  last-heard lane of each sponsored ship
  ==
+$  request
  $%  [%ames sndr=ship rcvr=ship]               ::  stateful Ames message traffic
      [%scry rcvr=ship =path]                   ::  Fine remote scry request
  ==

The expiry field is set to 30 seconds from now whenever we hear a request. Repeated requests bump the timeout, keeping the request alive.

@eamsden

eamsden commented Mar 30, 2022

Overall this is the iteration of request/response routing I'm most happy with so far.

How does the thirty-second timeout interact with push-based subscription-type protocols? If a request's "return address" is only remembered for thirty seconds, what if the interval between subscribed occurrences is longer than 30 seconds? Obviously this can be special-cased where the subscription is actually being published, but what about at relay nodes?

A ship should only overwrite the "next" field when forwarding a response back to a ship that is not one of its direct sponsors

Maybe I'm not following closely enough but why this restriction?

As an aside, note that while this example was written using full Urbit ships as relays, this routing procedure would also allow any kind of program that speaks the routing protocol and knows the lanes of ships to act as a relay node, even if it does not have an Urbit address (although the system would need a new way to learn about these non-ship relays).

I think this is the wrong way to think about this. These should simply be moons, as far as network protocol and identity are concerned.
Whether or not we use a nock runtime and arvo to implement them is extensionally immaterial. It's not hard to imagine e.g. a C (or Python or Rust or Haskell) program that has 0 knowledge of Nock but knows ames/the remote scry protocol, loads an azimuth keyfile, and keeps network peer state, and this should be just fine. Of course, a more minimal RouterArvo might also be viable here.

One way to "discover" these would be to have the equivalent of a "default gateway" or "proxy" configuration for a ship, where it presents all of its outgoing requests to such a moon, which then forwards them. One can also envision ARP-type protocols for discovering such peers on the local PHY, etc.

@belisarius222
Author

How does the thirty-second timeout interact with push-based subscription-type protocols?

In this request-response world, there are no push-based subscriptions, at least not formally. Instead, a subscriber emits a remote scry request for the next subscription datum, repeatedly on a timer, until that path has been grown. The scry notifications protocol is a layer on top of this that lets a publisher push out the response immediately when the path is grown, without waiting for the next request ping.
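
A toy version of that subscriber loop (scry_request is a placeholder for however the runtime actually sends Fine requests, and the per-datum sequence number in the path is an assumption):

import time

POLL_INTERVAL = 30  # seconds between re-sent scry requests

def subscribe(publisher, path, seq, scry_request):
    """Pull-based subscription: keep asking for the next datum until the
    path has been grown. A publisher that implements scry notifications
    may answer before the next poll; that push is only an optimization."""
    while True:
        datum = scry_request(publisher, path + [seq])
        if datum is not None:
            yield datum
            seq += 1  # move on to the next datum in the series
        else:
            time.sleep(POLL_INTERVAL)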

So it's formally a pull-based subscription, with pushes as an optimization. The big problem I'm aware of with this is that since requests for future scry paths don't get responses, the subscriber might not be able to tell the difference between the publisher being offline and it just not having a new datum yet, which then leads to abandoning the tight route and falling back to the publisher's sponsoring galaxy. I discussed this problem at some length in this section of my previous symmetric routing proposal.

My general conclusion is that it would be good to somehow subscribe to the publisher's sponsor if the publisher's connectivity is in question, but this is probably the biggest flaw in this system.

Whether or not we use a nock runtime and arvo to implement them is extensionally immaterial. It's not hard to imagine e.g. a C (or Python or Rust or Haskell) program that has 0 knowledge of Nock but knows ames/the remote scry protocol, loads an azimuth keyfile, and keeps network peer state, and this should be just fine. Of course, a more minimal RouterArvo might also be viable here.

This is more or less how I'm thinking about it too, so I don't understand what you consider wrong here. Is it that in this system, you could have a relay without an urbit address, whereas you think they should all at least look like moons on the network? If so, I'm curious what you think is important about that constraint. It's not clear to me if it's good or bad.

@philipcmonk

philipcmonk commented Mar 31, 2022

So it's formally a pull-based subscription, with pushes as an optimization.

Even as an optimization, it's not great that the optimization breaks any time the subscriber is behind a NAT.

@eamsden

eamsden commented Mar 31, 2022

That answers my push-based-subscriptions question.

Whether or not we use a nock runtime and arvo to implement them is extensionally immaterial. It's not hard to imagine e.g. a C (or Python or Rust or Haskell) program that has 0 knowledge of Nock but knows ames/the remote scry protocol, loads an azimuth keyfile, and keeps network peer state, and this should be just fine. Of course, a more minimal RouterArvo might also be viable here.

This is more or less how I'm thinking about it too, so I don't understand what you consider wrong here. Is it that in this system, you could have a relay without an urbit address, whereas you think they should all at least look like moons on the network?

That's more or less exactly it.

If so, I'm curious what you think is important about that constraint. It's not clear to me if it's good or bad.

Routing attacks become trivial if non-authenticated relays are permitted on the network. Azimuth already handles authentication for us.
Why introduce a separate non-authenticated identity type, or a separate identity type with a whole new kind of authentication?
It's trivial to issue a moon ID, interactively or programmatically.
It's not any harder to do Azimuth authentication in C than it is in Arvo (any more than it's harder to do anything in C than in Arvo).

In any case, making Urbit "aware" of non-Urbit entities is contrary to the core Urbit ethos, and we should always seek to excise such awareness rather than introduce it. Even aesthetically, why should routing table keys look like (each @p relay-id) instead of just being @p?

If the argument is that relays should just be dumb repeaters and play no role in the tightening process, then they are just TURN servers. If they get to directly influence the routing behavior of any Urbit ship, they should have to sign their work.

@eamsden

eamsden commented Mar 31, 2022

For any kind of subscription model, there would have to be some kind of "heartbeat" in order to determine if the publisher was still alive between events. You can't know things across the network without network traffic.

Nodes which already expect to relay requests/responses can also relay heartbeats, and, for tightening, can set the next field in heartbeat packets. So a subscriber can attempt to subscribe to heartbeats/push-based subscriptions one relay upstream of where it is currently hearing the heartbeat from, by sending its request directly to the relay specified in the heartbeat packet's next field. If it successfully receives some small n heartbeats from the tighter relay, it can send a request cancellation to the looser relay, resubscribing to the looser relay if it stops hearing heartbeats.
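
A sketch of that subscriber-side rule (the threshold and all names here are made up for illustration):

TIGHTEN_AFTER = 3  # heartbeats to hear via the tighter relay before switching

class HeartbeatTightener:
    def __init__(self, loose_relay, send_subscribe, send_cancel):
        self.current = loose_relay   # relay we currently hear heartbeats from
        self.candidate = None        # tighter relay we're probing
        self.heard = 0
        self.send_subscribe = send_subscribe
        self.send_cancel = send_cancel

    def on_heartbeat(self, from_relay, next_field):
        if from_relay == self.candidate:
            self.heard += 1
            if self.heard >= TIGHTEN_AFTER:
                # the tighter relay works: cancel the looser subscription
                self.send_cancel(self.current)
                self.current, self.candidate = self.candidate, None
        elif next_field is not None and self.candidate is None:
            # try subscribing one relay upstream of where we hear heartbeats
            self.candidate, self.heard = next_field, 0
            self.send_subscribe(self.candidate)

    def on_timeout(self):
        # stopped hearing heartbeats: fall back to the looser relay
        self.candidate, self.heard = None, 0
        self.send_subscribe(self.current)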

This would help alleviate galaxy loading by proactively establishing tightened routes for as many subscribers as possible. In particular, if stars are doing their job and running as high-uptime nodes on public IPs, then they would quickly undertake the load for any planets which aren't directly reachable by the publisher.

This is still strictly better than Ames for subscriptions because the only entity doing state updates / event log entries on every packet is the subscriber. Publishers and relays only have to update state on subscribe requests, which they need to do anyway to update the list of where to push to.

@belisarius222
Author

belisarius222 commented Mar 31, 2022

So it's formally a pull-based subscription, with pushes as an optimization.

Even as an optimization, it's not great that the optimization breaks any time the subscriber is behind a NAT.

That doesn't break it, since the subscriber is sending requests repeatedly, which act as keepalives to keep the firewall pinhole open. Assuming the publisher is not NAT'd and the subscriber doesn't erroneously loosen the route, thinking the publisher is offline, this keeps open a pinhole directly to the publisher ship. Otherwise, the subscriber is still pinging the publisher's galaxy, keeping a pinhole open to that, and since each ship pings its sponsor, pinholes should be open from the publisher all the way up to its galaxy, allowing the subscription push to go up through the galaxy and out to the subscriber.

@belisarius222
Author

belisarius222 commented Mar 31, 2022

@eamsden

Routing attacks become trivial if non-authenticated relays are permitted on the network. Azimuth already handles authentication for us.

True. Our relaying is not currently authenticated, but maybe it should be. One proposal I made a couple years ago, which was not popular but I still think is worth considering, is authenticating each ship-to-ship hop using an HMAC derived from the Diffie-Hellman of the networking keys of the two ships in the hop. I think to authenticate relaying in this proposal, the "next" field would also need to include an Urbit address, so you would know which ship is next, not just its IP and port.

I agree with your assessment of heartbeat packets, abstractly, but -- what does a heartbeat packet look like? It's not a concept that currently exists in Urbit. Would it be a king-to-king protocol, or is Arvo involved? Does it get acked? Is it part of an existing protocol like Ames or Fine, or is it a separate third protocol?

Nodes which already expect to relay requests/responses can also relay heartbeats, and, for tightening, can set the next field in heartbeat packets. So a subscriber can attempt to subscribe to heartbeats/push-based subscriptions one relay upstream of where it is currently hearing the heartbeat from, by sending its request directly to the relay specified in the heartbeat packet's next field. If it successfully receives some small n heartbeats from the tighter relay, it can send a request cancellation to the looser relay, resubscribing to the looser relay if it stops hearing heartbeats.

Yeah this sounds good.

Publishers and relays only have to update state on subscribe requests, which they need to do anyway to update the list of where to push to.

Yes. This hasn't been implemented, but relays could do their own scry notifications, demultiplexing from many subscribers to the single publisher, multiplexing subscription updates out from the publisher to many subscribers, and caching results. I think this would be quite nice for planets that host large groups.

@belisarius222
Author

One possibility for heartbeat acks would be a "scry blocked" response packet, emitted as a courtesy by the publisher each time it receives a request for a scry path that doesn't yet have a value. Right now it just drops the packet, which is why the subscriber doesn't get any information back over the wire and concludes that the publisher is dead. It's not a referentially transparent response, so we'd want some kind of DoS protection on the packet, like a signed publisher date, to prevent eavesdroppers from forging these packets or re-sending old ones to try to confuse subscribers into thinking the data they're requesting isn't available.

But then, how would we interpret that date? Subscribers will often have clock skew, and we don't really want them to only be secure if they've remembered the last heartbeat date. We could enforce that the subscriber remember the date, or use some sort of cryptographic ratchet, but once we've taken the hit of requiring state on the subscriber for authenticating keepalives, I feel like we might as well just adopt WireGuard as an outer layer of our networking, and then we could lean on that for forward secrecy, keepalives, and probably even packet forwarding, without reinventing the wheel. Then we wouldn't need in-band Ames or Fine keepalive packets at all. This thought process is basically how I ended up with the Nan Madol proposal.

Another issue with using "scry blocked" responses as keepalive acks is that you'd generally have one per subscription, not one per ship, which could add up if most subscribers are in twenty chat channels within the publisher's group. This thought led to the Keepalive Requests section of my previous symmetric routing proposal, which still strikes me as the best option for in-band keepalives.

@eamsden

eamsden commented Apr 1, 2022

@eamsden

Routing attacks become trivial if non-authenticated relays are permitted on the network. Azimuth already handles authentication for us.

True. Our relaying is not currently authenticated, but maybe it should be.

maybe

One proposal I made a couple years ago, which was not popular but I still think is worth considering, is authenticating each ship-to-ship hop using an HMAC derived from the Diffie-Hellman of the networking keys of the two ships in the hop.

I'm curious why the proposal was not popular, and with whom. I'd like to get the steelman against this.

I think to authenticate relaying in this proposal, the "next" field would also need to include an Urbit address, so you would know which ship is next, not just its IP and port.

Yes this is correct.

I agree with your assessment of heartbeat packets, abstractly, but -- what does a heartbeat packet look like?

This is something where prior art is actually a good thing for Urbit to look at. A heartbeat packet is exactly what it needs to be, and no more.
It is "header-only" meaning it looks like a scry response but without a body (and with some kind of heartbeat counter field).

It's not a concept that currently exists in Urbit.

Correct me if I'm wrong, but it does. Ames already pings the galaxy to keep a pinhole alive.

Would it be a king-to-king protocol, or is Arvo involved?

My first thought is publisher king to subscriber arvo. Subscribers (including relays-as-subscribers) need to keep state up to date and fall back to less-tight routes. Publishers can have a list of subscribers set by subscription requests/cancellations, and the king can scry that out of Arvo and use it to send out heartbeats.

It's fine to load Arvo and thus the event log more as you subscribe to more things, far less so to load Arvo more per subscription to you.

Does it get acked?

It is an ack, to a request that we pretend arrives every ~30s and we just elide as an optimization. An "ack taking initiative" perhaps. Don't ack an ack.

Is it part of an existing protocol like Ames or Fine, or is it a separate third protocol?

It's part of the scry notification protocol.

Publishers and relays only have to update state on subscribe requests, which they need to do anyway to update the list of where to push to.

Yes. This hasn't been implemented, but relays could do their own scry notifications, demultiplexing from many subscribers to the single publisher, multiplexing subscription updates out from the publisher to many subscribers, and caching results. I think this would be quite nice for planets that host large groups.

Glad we are thinking along the same lines here.

@eamsden

eamsden commented Apr 1, 2022

Followup: it is important that keepalives be done per corresponding pair of ships, and not per subscription. The information you really want is "is the publisher ship still alive" and if so and you haven't gotten an update for a specific subscription from that ship, you can assume they don't have one.

@belisarius222
Author

Ok so there is a keepalive, but not a very scalable one. Each ping to a sponsor is an Ames message. I think this scales well enough for sponsorship, but would not for userspace publications. We agree that keepalives should be handled by the publisher's king in practice, which means publication keepalives can't impact the publisher's event log.

Unsolicited "push" packets, like a heartbeat response to a request that isn't sent, seems a bit off to me in a symmetric routing world. The subscriber should be pinging the publisher and getting acks, so that a) responsibility lies with the subscriber to trigger a response from the publisher, and b) the subscriber is continually telling the publisher where it is, so the publisher knows where to send responses.

As for your followup, the way we've designed the notifications protocol, notifications are optional. A simple runtime might not implement them. So a publisher might not push updates to you if it has such a simple runtime, in which case you need to keep re-sending scry requests indefinitely.

I suppose we could add an ack packet that says "I heard your request and will store it until Vere restarts, so you will get a response eventually if I know where you are." Then if we (the subscriber) receive one of these, we could almost switch to just sending keepalive packets, whose responses should include a nonce representing the Vere process instance.

The almost is because there's no way to know if a response packet was sent but lost over the wire. We would either need to have the subscriber ack the response, or have the subscriber retry the request on a timer until the data is delivered.

So I'm not convinced there's a better short-term solution than just retrying subscription-related scry requests every thirty seconds or so, with an added repeated keepalive scry request that gets a response each time as long as the publisher is online.

Long-term, I think we could implement something like your proposal, where subscribers could ask once for a datum, get an ack that the publisher will eventually send you the datum, then stop asking -- but only if we had TCP-style sessions in a dedicated king-to-king protocol, as I described in the Nan Madol proposal. To get that kind of behavior with the kind of performance we want, you need notifications-related acks to be stored in transient king state, not the event log.

@eamsden

eamsden commented Apr 2, 2022

Unsolicited "push" packets, like a heartbeat response to a request that isn't sent, seems a bit off to me in a symmetric routing world. The subscriber should be pinging the publisher and getting acks, so that a) responsibility lies with the subscriber to trigger a response from the publisher, and b) the subscriber is continually telling the publisher where it is, so the publisher knows where to send responses.

Not necessarily. The subscriber can, of course, behave this way, and may want to in order to e.g. keep NAT pinholes open.
But if the subscriber doesn't want to behave this way, we can just keep optimistically pushing heartbeats from the publisher, and the subscriber can react to timing out by sending a new subscription request.

As for your followup, the way we've designed the notifications protocol, notifications are optional. A simple runtime might not implement them. So a publisher might not push updates to you if it has such a simple runtime, in which case you need to keep re-sending scry requests indefinitely.

Again, the fallback of "resend subscribe if you don't hear the heartbeat" does this.

I suppose we could add an ack packet that says "I heard your request and will store it until Vere restarts, so you will get a response eventually if I know where you are." Then if we (the subscriber) receive one of these, we could almost switch to just sending keepalive packets, whose responses should include a nonce representing the Vere process instance.

I don't think this extra case in the state machine is necessary, see above.

@belisarius222
Author

I'm not sure your proposed system works as intended. I've filled in the blanks a bit for parts of the system that you've only described somewhat briefly, so let me know if I misunderstood. Consider:

Subscriber sends scry request for next datum to publisher. Publisher doesn't have the datum, so it just starts sending heartbeat responses on a timer. When the subscriber gets the first heartbeat, it stops re-sending the scry request, assuming that the publisher will notify it. Then at some point, the publisher grows a value at the requested scry path and fires off a scry response packet to the subscriber, but the packet gets lost over the wire.

I guess there are a couple cases in this situation:

  1. This was the only subscription from the subscriber to the publisher, so when the publisher sends the scry response, it also stops sending heartbeats. This triggers the subscriber to time out and re-send its subscription request, which should trigger a re-send of the scry response packet, so this will eventually succeed correctly.
  2. There were other subscriptions from the subscriber to the publisher, so when the publisher sends the scry response on this subscription, it keeps sending heartbeats. When the packet is lost, the subscriber has no way of knowing it should re-send the scry request packet. This is then stuck.

In fact, if there are other subscriptions, then we'd also need some sort of initial ack to each scry request, in addition to the heartbeats. Otherwise, if one subscription is already online and there are heartbeats flowing, then when the subscriber makes a request for a path in a different subscription (and the path is for the next datum and hasn't been grown yet), there's no observable change in the publisher's behavior, so the subscriber will just keep re-sending the request packet. The publisher was already sending heartbeats, and it just keeps sending heartbeats after receiving the second request. So the subscriber will never learn that it can stop re-sending this scry request.

So I don't think this works without the publisher sending some kind of ack on the initial scry request telling the subscriber that it doesn't need to keep re-sending the request, at least while the heartbeats are going. I think this ack is a little different from what I'd been thinking of as a "scry blocked" response, since it implies that the publisher is taking on the responsibility of delivering a response, as opposed to just a courtesy of informing the subscriber that its scry request failed for now.

The publisher would only send this ack if it's running the scry notifications protocol. Otherwise, it shouldn't claim responsibility for delivering the response later. This could act as a form of protocol negotiation: a subscriber polls for a scry request on a timer until it hears this initial ack; if it never comes, then it keeps pinging forever. If the subscriber isn't using the scry notifications protocol, it can drop incoming acks and keep re-sending the scry request packet as if it never heard them.

The ack needs to be authenticated, and it needs to include the sender ship's address, the scry request path, and a session id number of some sort so that if the publisher Vere restarts or the subscriber ship goes offline for a while, the session id will change, informing the subscriber that it needs to re-send all pending scry requests.

If the ack is lost over the wire, the subscriber will re-send its scry request on a timer. Each time the publisher gets a duplicate request, it sends a duplicate ack, which should include the same session id as the first ack unless the session has died.

An issue with eliding heartbeat requests and only sending heartbeat responses is: how does the publisher know to stop trying to push updates to a subscriber, if the subscriber has moved or gone offline? If heartbeats only go from the publisher to the subscriber, then the publisher doesn't know whether the subscriber is still online. It effectively has to keep its pending request table alive forever if the subscriber goes down. Again, I think it makes more sense for this responsibility to lie with the subscriber. If the publisher stops hearing keepalives from the subscriber, it deletes the request packets from that subscriber and stops trying to notify it. If the subscriber comes back online, it starts over from scratch and re-sends all its pending scry requests.

Here's a counterproposal, based on these thoughts:

The general idea is that the subscriber sends heartbeats while it has pending requests open on the publisher, which keeps those requests pinned in the publisher's pending requests table. A heartbeat can be thought of as a shorthand for a set of re-sent scry requests, one for each pending request.

The publisher acks each heartbeat request so that the subscriber can continue to tighten the route. The publisher also sends a subscription ack each time it hears a request for a scry path that doesn't yet exist. The subscriber will re-send each scry request on a timer until it hears such a subscription ack. This means that even if the first couple acks get lost over the wire, one will eventually go through and the subscriber can stop re-sending the scry request and instead switch to heartbeats -- and it might already have been sending heartbeats because of some other subscription.

A session dies either when the publisher Vere restarts or when a certain amount of time (maybe 1 minute) has gone by without the publisher hearing a heartbeat request packet. If the publisher hears a heartbeat request on an unrecognized session, it sends a heartbeat response with the new session id, creating that id if needed. If the subscriber stops hearing heartbeat responses for a certain amount of time (also a minute, let's say), it will consider the connection dead, but will continue to send heartbeats until it hears a response from the publisher. If the subscriber ever hears a new session id from the publisher, it knows it needs to re-send all pending scry requests and switch its heartbeat requests to the new session id.
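
A rough publisher-side model of those session rules (the timeout value and all names are assumptions, not a spec):

import time

SESSION_TIMEOUT = 60  # seconds without a heartbeat before a session dies

class PublisherSessions:
    def __init__(self):
        self.sessions = {}  # subscriber ship -> {"id", "last", "pending"}

    def _live(self, ship, session_id, now):
        sess = self.sessions.get(ship)
        if sess is None or sess["id"] != session_id or now - sess["last"] > SESSION_TIMEOUT:
            # unknown or dead session: mint a new id, which tells the
            # subscriber to re-send all of its pending scry requests
            new_id = sess["id"] + 1 if sess else 1
            sess = {"id": new_id, "last": now, "pending": set()}
            self.sessions[ship] = sess
        sess["last"] = now
        return sess

    def on_heartbeat(self, ship, session_id):
        sess = self._live(ship, session_id, time.time())
        return sess["id"]  # echoed back in the heartbeat response

    def on_scry_request(self, ship, session_id, path):
        sess = self._live(ship, session_id, time.time())
        sess["pending"].add(path)  # pinned until responded to or the session dies
        return sess["id"]  # included in the subscription ack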

A subscription ack packet contains:

  • a bit pattern saying "I'm a subscription ack", ideally in the header
  • subscriber ship, life, rift
  • scry request path (includes publisher ship, life, rift)
  • session id
  • signature

A heartbeat request contains:

  • a bit pattern saying "I'm a heartbeat request", ideally in the header
  • subscriber ship, life, rift
  • publisher ship, life, rift
  • session id
  • incrementing counter, starting at 1 each session
  • signature

A heartbeat response contains:

  • a bit pattern saying "I'm a heartbeat response", ideally in the header
  • subscriber ship, life, rift
  • publisher ship, life, rift
  • session id
  • counter (same as the one that was just heard)
  • signature
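
For concreteness, those three packet bodies could be modeled as plain records like this (field names and types are only illustrative; the header bit patterns are left out):

from dataclasses import dataclass

@dataclass
class SubscriptionAck:
    subscriber: tuple   # (ship, life, rift)
    scry_path: str      # includes publisher ship, life, rift
    session_id: int
    signature: bytes

@dataclass
class HeartbeatRequest:
    subscriber: tuple   # (ship, life, rift)
    publisher: tuple    # (ship, life, rift)
    session_id: int
    counter: int        # increments from 1 each session
    signature: bytes

@dataclass
class HeartbeatResponse:
    subscriber: tuple
    publisher: tuple
    session_id: int
    counter: int        # same value as the request just heard
    signature: bytes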

@belisarius222
Author

Ok so there is a keepalive, but not a very scalable one. Each ping to a sponsor is an Ames message. I think this scales well enough for sponsorship, but would not for userspace publications. We agree that keepalives should be handled by the publisher's king in practice, which means publication keepalives can't impact the publisher's event log.

Huh, if we had heartbeat packets, we could just use those for sponsor pinging, instead of Ames messages. That would be nifty, since a star could potentially just store the sponsee's IP and port in the Urth process, without incurring Arvo events and disk writes. The way we do it now, a fully laden star would see roughly 65,000 pings every 30 seconds, or about 2,200 disk writes per second, just to track its sponsees.

@belisarius222
Author

As noted in a meeting today, my counterproposal doesn't work. The notification for the new subscription datum would have to be re-sent and explicitly acked in order to make it reliable; otherwise, the notification could be lost permanently.

@belisarius222
Author

... that being said, we could just add acks. Sure they're a bit weird, because the ack is a request from the subscriber, acking a response, so it's arguably acking an ack -- but I think this will at least get us to a complete, workable system, which we can then discuss and tweak.

We'll add a new packet type, a subscription cancellation request. The subscriber will send this request to tell the publisher it no longer wants to hear a notification when data is grown at the specified path. This can happen if the subscriber unsubscribes, or if it receives the scry response from the publisher.

A subscription cancellation request packet contains:

  • a bit pattern saying "I'm a subscription cancellation request", ideally in the header
  • subscriber ship, life, rift
  • scry request path (includes publisher ship, life, rift)
  • session id
  • signature

Once the publisher has a newly grown scry path, it sends notifications to live subscribers until either it hears a cancellation request packet or the session dies.
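
That retry rule might look roughly like this on the publisher side (the resend interval and the callback names are assumptions):

import time

RESEND_INTERVAL = 30  # seconds between re-sent notifications

def notify_until_cancelled(path, subscribers, send_response, cancelled, session_alive):
    """Re-send the grown datum to each subscriber until that subscriber
    acks it with a cancellation request or its session dies."""
    remaining = set(subscribers)
    while remaining:
        for ship in list(remaining):
            if cancelled(ship, path) or not session_alive(ship):
                remaining.discard(ship)
            else:
                send_response(ship, path)
        time.sleep(RESEND_INTERVAL)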

@leowini

leowini commented May 12, 2022

If my understanding is correct, the key here is that pure acks don't really convey any information except that some request was received by the destination. This makes acks great for route tightening, but not much else related to subscriptions. Therefore, things like "acknowledging" a subscription should be thought of as remote scry protocol requests, not acks.
Here's my pubsub walkthrough:

  1. A subscriber repeatedly sends subscription request packets to a publisher, until the publisher sends a request back to the subscriber saying "stop sending subscription requests," indicating that the publisher knows about the subscription. As @belisarius222 points out, the "stop sending subscription requests" request is arguably an ack, but we'll just say only acks are acks. I don't think the "stop spamming me" packet needs to be acked, because if it isn't received the subscriber will just keep spamming subscription requests.
  2. After the subscriber receives the "stop sending subscription requests" request, the subscriber will stop spamming subscription requests and only send requests as needed to keep the route tight. I think these requests could be serf-to-king, unless they should also verify that the publisher still has the subscription. (My understanding is that most of the remote scry pubsub stuff lives in Arvo.)
  3. When the scry path is grown, the publisher pushes to all of the subscribers until it receives a cancellation request from each of them. The subscriber should be receptive to push notifications whether or not it has received the "stop spamming me" packet.

In this system we need:

  • a subscription request packet.
  • a keepalive/route-tightening packet - this could possibly be just more subscription request packets.
  • a "stop spamming me" request packet that the publisher sends in response to subscription request packets.
  • packets to push the requested data - these should be fragmented/acked as necessary.
  • a subscription cancellation packet.

I'm both an Urbit and networking noob, so hopefully I'm not entirely off base.
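
The subscriber's side of steps 1-3 might be sketched like this (all names are placeholders, and the polling intervals are arbitrary):

import time

RESEND_INTERVAL = 30  # seconds between re-sent subscription requests

def run_subscription(path, send_subscribe, heard_stop, heard_datum, send_cancel):
    """Spam subscription requests until the publisher says stop (or the
    datum arrives), then wait for the push and ack it with a cancellation."""
    stopped = False
    while True:
        datum = heard_datum(path)
        if datum is not None:
            send_cancel(path)  # tells the publisher to stop re-sending
            return datum
        if not stopped:
            stopped = heard_stop(path)
            if not stopped:
                send_subscribe(path)
        time.sleep(RESEND_INTERVAL if not stopped else 1)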

@belisarius222
Author

belisarius222 commented May 13, 2022 via email

@leowini

leowini commented May 17, 2022

Okay that makes sense to me. I spent some time looking at ames.c and ames.hoon to get a better understanding.
It sounds like the symmetric version of Ames should:
a. get rid of the origin field - no longer needed.
b. add a bit for request/response - Ames should only look at the 'next' field on response packets.
c. add the next field.
d. add 2 more bits to distinguish subscription acks, heartbeat requests, and heartbeat responses from 'regular' packets.
Initially, how will the galaxies learn to do peer discovery for stars, and how will the stars get the lanes of their sponsee planets? It seems like for every response packet, non-destination relays should check to see if they have the source lane in their sponsees map, and if not somehow request that ship's @p so that it can be added as a sponsee. Does this make sense?
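
One way to picture those header changes (the bit positions and widths here are invented for illustration and are not the real Ames header layout):

# 2-bit packet-kind values (stored in header bits 1-2)
KIND_REGULAR, KIND_SUB_ACK, KIND_HEARTBEAT_REQ, KIND_HEARTBEAT_RESP = range(4)

IS_RESPONSE = 0b001   # bit 0: request/response
KIND_SHIFT  = 1
KIND_MASK   = 0b110   # bits 1-2: packet kind

def encode_flags(is_response, kind):
    return (kind << KIND_SHIFT) | (IS_RESPONSE if is_response else 0)

def decode_flags(byte):
    return {
        "is_response": bool(byte & IS_RESPONSE),  # only responses carry a meaningful "next" field
        "kind": (byte & KIND_MASK) >> KIND_SHIFT,
    }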

@belisarius222
Author

Items a) through d) all look right.

Initially, how will the galaxies learn to do peer discovery for stars, and how will the stars get the lanes of their sponsee planets?

Good question. A sponsee needs to ping the sponsor repeatedly with some kind of request packet. This could be just a heartbeat packet, potentially. Current Ames sends an Ames command to the sponsee for this purpose, but with symmetric routing and remote scry, we should be able to avoid the disk writes on the sponsor by structuring the pings as reads or heartbeats.

When the sponsor hears the packet from the sponsee, it stores the sponsee's lane for some time, just like with any other ship that sent it a request. While it knows the lane, when it receives other packets intended for the sponsee, it knows where to send them. The sponsor should not need to send any requests to the sponsee for this to work.
