@belisarius222
Last active January 26, 2023 19:52
Session-Based Notifications Protocol


We will introduce the concept of a network session, initially only to be used for scry-based publications. The basic idea is for this protocol to act as an optimization layer above basic repeated scry requests for the next datum in a publication stream.

Let's say a subscriber, ~sub, has already downloaded the chat message from the publisher, ~pub, at scry path /~pub/chat/33. Now it wants to download the next one, at /~pub/chat/34, to stay up to date. However, this chat message doesn't exist yet. Naively, ~sub will just keep re-sending the scry request packet for /~pub/chat/34, with exponential backoff, until the response comes through.

Problems with Naive Scry Polling

Polling for scry responses has three problems. First, the latency will in general be quite high. If more than a few seconds pass between chat messages, subscribers back off their polling rate, so each subscriber typically hears a new message ten to thirty seconds after it is published. This is unacceptable from a user experience perspective.
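To make the latency problem concrete, here is a minimal sketch of a polling schedule with exponential backoff. The 1-second initial interval, the doubling factor, and the 32-second cap are assumptions chosen for illustration; the text does not specify the backoff parameters.

```python
def backoff_schedule(initial=1.0, factor=2.0, cap=32.0, attempts=8):
    """Return the send times (seconds since the first request) of repeated polls.

    All parameters are illustrative assumptions, not values from the protocol.
    """
    times, now, interval = [], 0.0, initial
    for _ in range(attempts):
        times.append(now)
        now += interval
        interval = min(interval * factor, cap)
    return times


def latency_for_arrival(arrival, times):
    """Delay between data appearing at `arrival` and the next poll that sees it."""
    for t in times:
        if t >= arrival:
            return t - arrival
    return None  # no poll is scheduled after the arrival within this window
```

Under these assumed parameters the polls go out at 0, 1, 3, 7, 15, 31, 63, and 95 seconds, so a message that appears 32 seconds after the first request is not seen until the poll at 63 seconds.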

The second problem is that if subscribers have many different subscriptions open to a publisher, which is common, each of those packets needs to be re-sent repeatedly. This causes load on the publisher, whose incoming bandwidth will be at least somewhat eaten up by all these repeating packets.

The third problem: since none of these repeating scry requests yield responses until the new data comes into existence, there's no data flowing over the network from the publisher back to the subscriber, so the subscriber can't tell whether the publisher is even online. If it thinks the publisher has gone offline or moved to a different IP and port, it loosens the route to the publisher, relaying packets through the publisher's sponsoring star, and then through its galaxy and star. This concentrates network load onto galaxies, which scales poorly.

Overview of a Solution

Our approach to solving these problems is to overlay a new optional network protocol on top of the basic remote scry protocol to handle sessions and notifications. Note that all packets described in this protocol are authenticated with a signature.

When a subscriber sends a request for a scry path that doesn't yet have a value grown on it, the publisher can respond with a "scry promise" packet, i.e. a packet saying "I don't have data at that path yet, but when I do, as long as this session is still alive, I promise I'll deliver it to you." This packet effectively starts a session, and the session id (a pseudorandom number) is specified in the packet.

The subscriber keeps the session alive by sending heartbeat request packets once every 25 seconds. This holds open firewall pinholes, keeping the connection open so responses reach the subscriber even if the subscriber is behind a home or corporate router. This heartbeat request contains the same session id that was contained in the publisher's scry promise packet that initialized the session.

When the publisher receives a heartbeat request packet whose session id is for a live session, it sends a heartbeat response with the same session id. Since we now have a full network roundtrip from subscriber to publisher and back, over time the subscriber can learn the locations of the relays in between and tighten the route in future roundtrips. Both publisher and subscriber maintain up-to-date information about the location of the other.

If the publisher does not receive a heartbeat request for 60 seconds, it kills the session, deleting all notification state with this subscriber and absolving itself of any responsibility to deliver notifications. This implements a lease: the subscriber has to keep pinging the publisher frequently enough to convince the publisher not to consider the subscriber offline.

The subscriber just keeps trying to send heartbeat requests, even if it hasn't heard a heartbeat response for a while. If the publisher receives a heartbeat request with an unrecognized session id, it responds with a heartbeat nack (negative acknowledgment). When the subscriber receives this nack, it knows the session is dead, so it re-sends all pending scry requests, potentially causing the publisher to start a new session.
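The lease and nack behavior described above can be sketched as a small publisher-side session table. The class and method names are hypothetical, and the caller passes the clock in as `now` to keep the example deterministic; only the 60-second timeout and the nack on an unrecognized session id come from the text.

```python
import secrets

SESSION_TIMEOUT = 60.0  # seconds without a heartbeat before the publisher kills the session


class Publisher:
    """Hypothetical publisher-side session table keyed by session id."""

    def __init__(self):
        self.last_heard = {}  # session id -> time of the last heartbeat request

    def open_session(self, now):
        sid = secrets.randbits(64)  # random 64-bit session id
        self.last_heard[sid] = now
        return sid

    def expire(self, now):
        dead = [s for s, t in self.last_heard.items() if now - t >= SESSION_TIMEOUT]
        for sid in dead:
            del self.last_heard[sid]  # stands in for dropping all notification state

    def on_heartbeat(self, sid, seq, now):
        """Return (session id, sequence number, live?) for the heartbeat response."""
        self.expire(now)
        if sid not in self.last_heard:
            return (sid, seq, False)  # nack: unrecognized or expired session
        self.last_heard[sid] = now
        return (sid, seq, True)
```

A subscriber that receives a response with live? set to false would treat the session as dead and re-send its pending scry requests.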

Within a given live session, the publisher will respond to any newly received scry request packets for not-yet-grown paths with scry promises, all scoped to the current session. Once a scry response is available, the publisher will send it to all subscribers who had asked, repeating the response to each subscriber with exponential backoff until that subscriber sends a "scry cancellation request" packet to indicate that it is no longer interested in hearing that scry response. It's possible for the subscriber to receive multiple notification packets; if it ever hears a notification packet for a path it's not interested in, it sends a scry cancellation request back.
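As a rough illustration of the promise bookkeeping, the sketch below tracks which subscribers are waiting on which paths. The `PromiseTable` class, its method names, and the tuple-shaped replies are all hypothetical; retransmission timing and session scoping are left out.

```python
from collections import defaultdict


class PromiseTable:
    """Hypothetical publisher-side record of which subscribers were promised which paths."""

    def __init__(self):
        self.waiting = defaultdict(set)  # scry path -> subscribers awaiting it

    def on_scry_request(self, subscriber, path, data):
        """Answer a scry request: the value if it exists, otherwise a promise."""
        if path in data:
            return ("scry-back", path, data[path])
        self.waiting[path].add(subscriber)
        return ("bond-back", path)  # scry promise

    def on_publish(self, path):
        """Subscribers to notify now that `path` has a value (resent until cancelled)."""
        return sorted(self.waiting.get(path, ()))

    def on_cancel(self, subscriber, path):
        """A scry cancellation request acts as an ack and ends retransmission."""
        self.waiting[path].discard(subscriber)
        if not self.waiting[path]:
            del self.waiting[path]
```

Calling `on_cancel` for a path nobody is waiting on is harmless, which matches the case where a subscriber cancels a notification it never wanted.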

The scry cancellation request serves as an ack for this part of the protocol, but the subscriber can also send it as a courtesy to the publisher if it was interested in a path and then loses interest before the response arrives.

Heartbeat requests are numbered within each session, starting at 1. The subscriber numbers its first heartbeat packet 1, then increments the number each time it sends another heartbeat, 25 seconds after the previous one. Note that the subscriber sends the next heartbeat and increments its number, irrespective of whether it heard a heartbeat response from the publisher. Each heartbeat response includes the number of the request that generated it.

The numbering scheme is there to prevent malicious actors from replaying old heartbeat packets to try to hijack a connection. Both sides can ignore packets with old numbers.
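The numbering and replay check might look like the sketch below. The class names are hypothetical, and the strict "newer than anything seen so far" acceptance rule is an assumption; the text only says that packets with old numbers can be ignored.

```python
class HeartbeatSender:
    """Subscriber side: numbers heartbeat requests 1, 2, 3, ... within a session."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.next_seq = 1

    def next_request(self):
        # Called every 25 seconds, whether or not a response was heard.
        seq = self.next_seq
        self.next_seq += 1
        return (self.session_id, seq)


class ReplayFilter:
    """Either side: accept a packet only if its number is newer than any seen so far."""

    def __init__(self):
        self.highest_seen = 0

    def accept(self, seq):
        if seq <= self.highest_seen:
            return False  # old or duplicate number: ignore
        self.highest_seen = seq
        return True
```

With this rule a replayed heartbeat is dropped, and a gap in the numbering (a lost packet) does not block later heartbeats.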

Packet Types

Three bits in the 32-bit packet header describe which kind of packet this is. The first bit says whether it's a forward request or a backward response. The next two bits say which protocol the packet is part of.

%fore is short for forward, meaning "request". %back is short for backward, meaning "response".

%scry is the remote scry protocol. %ames is the command protocol. %beat is the heartbeat protocol. %bond is the scry promise protocol.

Packet types:

$%  [%0b0 %scry-fore]
    [%0b1 %scry-back]
    [%0b10 %ames-fore]
    [%0b11 %ames-back]
    [%0b100 %beat-fore]
    [%0b101 %beat-back]
    [%0b110 %bond-fore]
    [%0b111 %bond-back]
==

A %bond-fore packet is a scry cancellation request. A %bond-back packet is a scry promise.

All packets have the format [header prelude body], and only the body differs among packet types, aside from the packet-type field in the header.
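The packet-type table can be read as a two-bit protocol number followed by a low direction bit. The sketch below decodes a three-bit value that has already been extracted from the header; where those bits sit within the 32-bit header is not stated in the text, and the function names are hypothetical.

```python
PROTOCOLS = ("scry", "ames", "beat", "bond")  # values 0b00 through 0b11
DIRECTIONS = ("fore", "back")                 # 0 = request, 1 = response


def decode_packet_type(bits):
    """Split a 3-bit packet-type value into (protocol, direction)."""
    if not 0 <= bits <= 0b111:
        raise ValueError("packet type must fit in three bits")
    return PROTOCOLS[bits >> 1], DIRECTIONS[bits & 1]


def encode_packet_type(protocol, direction):
    """Inverse of decode_packet_type."""
    return (PROTOCOLS.index(protocol) << 1) | DIRECTIONS.index(direction)
```

For example, `decode_packet_type(0b110)` gives `("bond", "fore")`, the scry cancellation request, in line with the table above.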

Heartbeat Request Body

64 bits: session id
64 bits: sequence number (starting at 1)

Heartbeat Response Body

64 bits: session id
64 bits: sequence number (starting at 1)
 8 bits: live? (flag bit plus 7 bits of padding)

If live is no, then this is a nack.
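A minimal sketch of packing the two heartbeat bodies follows. The field widths come from the layouts above; the little-endian byte order, the use of the low bit of the final byte for the flag, and the choice of 1 to mean "live" are assumptions, since the text does not specify them.

```python
import struct

# Assumption: little-endian fields packed in the order listed above.
BEAT_FORE = struct.Struct("<QQ")   # session id, sequence number
BEAT_BACK = struct.Struct("<QQB")  # session id, sequence number, live? byte


def pack_heartbeat_request(session_id, seq):
    return BEAT_FORE.pack(session_id, seq)


def pack_heartbeat_response(session_id, seq, live):
    # Assumption: 1 means live, 0 means nack; the other 7 bits are zero padding.
    return BEAT_BACK.pack(session_id, seq, 1 if live else 0)


def unpack_heartbeat_response(body):
    session_id, seq, flag = BEAT_BACK.unpack(body)
    return session_id, seq, bool(flag & 1)  # live? False means nack
```

Under these assumptions the request body is 16 bytes and the response body is 17 bytes.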

Scry Cancellation Request ([%bond %fore]) Body

512 bits: client signature
 32 bits: fragment number
 16 bits: path string length
variable: path as ASCII

This is the same as a normal scry request, although the signature covers a slightly larger tuple that looks like [%bond ...], to prevent forgery.

Scry Promise ([%bond %back]) Body

 64 bits: session id
512 bits: server signature
 32 bits: fragment number
 16 bits: path string length
variable: path as ASCII

This is also very similar to a scry request body, with two differences: it adds a session id to scope the promise, and the signature is made by the publisher, not the subscriber.
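Because the cancellation and promise bodies share most of their layout, one pair of hypothetical helpers can serialize both. Byte order (little-endian here) is an assumption, and the signature is treated as 64 opaque bytes; computing or verifying it is outside this sketch.

```python
import struct

SIG_LEN = 64  # 512-bit signature


def pack_scry_body(signature, fragment, path, session_id=None):
    """Build a cancellation body, or a promise body when session_id is given."""
    if len(signature) != SIG_LEN:
        raise ValueError("signature must be 512 bits")
    raw_path = path.encode("ascii")
    prefix = b"" if session_id is None else struct.pack("<Q", session_id)
    return prefix + signature + struct.pack("<IH", fragment, len(raw_path)) + raw_path


def unpack_scry_body(body, has_session):
    """Parse a body back into (session_id or None, signature, fragment, path)."""
    offset, session_id = 0, None
    if has_session:
        (session_id,) = struct.unpack_from("<Q", body, offset)
        offset += 8
    signature = body[offset:offset + SIG_LEN]
    offset += SIG_LEN
    fragment, path_len = struct.unpack_from("<IH", body, offset)
    offset += 6
    path = body[offset:offset + path_len].decode("ascii")
    return session_id, signature, fragment, path
```

With these assumptions, a promise for `/~pub/chat/34` is 8 bytes longer than the matching cancellation request, the difference being the session id.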
