@belisarius222
Last active July 13, 2022 13:49
Nan Madol: Next-Generation Urbit Networking

This proposal advocates using the WireGuard protocol as the Terran basis of the next generation of Urbit networking, with QUIC layered on top. This takes care of the following entirely in the Urth process, with no Nock execution or Arvo events required:

  • sponsor keepalives
  • star packet forwarding
  • peer discovery
  • authentication
  • encryption (including forward secrecy)
  • transmission control (packetization and congestion control)
  • scry notifications
  • DDoS protection
  • IP anonymization
  • traffic obfuscation

All that's left is Arvo telling Vere to send a message to some ship, and Vere handles the rest.

Initialization

Only stars tighten routes, and only down to other stars. This makes each star like a VPN relay, and it also simplifies routing to a single level of tightening, which we can expect to remain stable, since stars rarely change IP addresses.

We will use symmetric routing: all packets to and from a planet or moon go through the sponsoring star. Most packets are planet-to-planet, and will go through the sender's star and then the receiver's star.
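As a rough illustration of the routing rule above, here is a minimal Python sketch that computes the hops a packet visits. The `star_of` table and the ship names are hypothetical stand-ins; real sponsorship would come from Azimuth.

```python
def route(sender, receiver, star_of):
    """Return the ships a packet visits, sender first.

    star_of maps each planet or moon to its sponsoring star
    (hypothetical table). Ships absent from star_of are treated
    as stars, which talk to each other directly.
    """
    hops = [sender]
    # Visit the sender's star, then the receiver's star, skipping
    # a star that is already the most recent hop.
    for star in (star_of.get(sender), star_of.get(receiver)):
        if star is not None and star != hops[-1]:
            hops.append(star)
    if receiver != hops[-1]:
        hops.append(receiver)
    return hops


# Hypothetical sponsorship table, for illustration only.
STAR_OF = {"~planet-a": "~star-x", "~planet-b": "~star-y", "~planet-c": "~star-x"}
```

Under these assumptions a planet-to-planet message takes three network hops when the two planets have different stars, and two when they share one.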

Every network hop is between two ships and is its own WireGuard channel. End-to-end encryption is achieved through WireGuard's standard VPN-style forwarding capability.

A ship will ping its sponsor using WireGuard's built-in keepalive configuration.
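A sketch of what the sponsor keepalive could look like in practice: the Python helper below renders a `[Peer]` section in the standard WireGuard configuration format, where `PersistentKeepalive` is the built-in keepalive setting. The helper name, the placeholder key, the endpoint, and the `AllowedIPs` value are all assumptions; this proposal does not specify how ships map to tunnel addresses.

```python
import base64


def sponsor_peer_section(pubkey_b64, endpoint, allowed_ips, keepalive_secs=25):
    """Render a WireGuard [Peer] section for a ship's sponsor.

    keepalive_secs sets PersistentKeepalive, the interval at which
    WireGuard sends keepalive packets to the peer.
    """
    lines = [
        "[Peer]",
        f"PublicKey = {pubkey_b64}",
        f"Endpoint = {endpoint}",
        f"AllowedIPs = {allowed_ips}",
        f"PersistentKeepalive = {keepalive_secs}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder 32-byte key; a real one would be the sponsor's networking key.
PLACEHOLDER_KEY = base64.b64encode(bytes(32)).decode()
SECTION = sponsor_peer_section(PLACEHOLDER_KEY, "192.0.2.1:51820", "0.0.0.0/0")
```

The default of 25 seconds is a commonly used value for keeping NAT mappings alive, not something this proposal prescribes.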

To stave off DDoS attacks, a ship will use WireGuard to accept connections only from specific ships, as authenticated by the Azimuth PKI (augmented by Ames-based key propagation for moons and comets). A ship's Curve25519 networking key will be used as its WireGuard public key. All other packets will be discarded.

A galaxy allows incoming connections from other galaxies and its own subnet*, and it is responsible for knowing and sharing its stars' lanes with other stars. A star allows incoming connections from its sponsoring galaxy, other stars, and its own subnet (planets, registered moons, and comets). A planet or moon allows no incoming connections but maintains an outgoing connection to its sponsoring star.

*Alternatively, a galaxy could support unauthenticated requests for the locations of its stars, reducing the amount of PKI state it needs to store.
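The acceptance rules above can be sketched as a small predicate. This is only an illustration: ship rank is derived from the size of the numeric address, and the `sponsor_of` table is a hypothetical stand-in for sponsorship data from Azimuth. The text does not say what comets accept, so the sketch treats them like planets.

```python
def rank(ship: int) -> str:
    """Classify an Urbit address by its size."""
    if ship < 2**8:
        return "galaxy"
    if ship < 2**16:
        return "star"
    if ship < 2**32:
        return "planet"
    if ship < 2**64:
        return "moon"
    return "comet"


def accepts(receiver: int, sender: int, sponsor_of: dict) -> bool:
    """Would `receiver` allow an incoming connection from `sender`?

    sponsor_of maps a ship to the ship that relays for it
    (hypothetical table; real data would come from the PKI).
    """
    in_subnet = sponsor_of.get(sender) == receiver
    if rank(receiver) == "galaxy":
        return rank(sender) == "galaxy" or in_subnet
    if rank(receiver) == "star":
        from_sponsor = sender == sponsor_of.get(receiver)
        return rank(sender) == "star" or from_sponsor or in_subnet
    # Planets and moons accept nothing; comets are an assumption here.
    return False


# Hypothetical topology: stars 256 and 257, planet 65536 under star 256.
SPONSOR_OF = {256: 0, 257: 1, 65536: 256}
```

In a deployment along these lines, the predicate's result would decide which public keys get installed as WireGuard peers, so rejected senders never get past the handshake.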

A ship initiates a QUIC connection with another ship over its WireGuard channel to implement the Urth-to-Urth protocol described below. This QUIC connection uses the ship's Curve25519 key to create a TLS 1.3 certificate. OpenSSL has an implementation of the certificate procedure. The need for a TLS handshake is a bit unfortunate, since the WireGuard connection is already authenticated and encrypted, but QUIC requires it, and it's a minimal (1-RTT) handshake.

Requests and Responses

The following requests can be sent over a QUIC channel between two ships. This could use the %newt length-prefixed jammed-noun protocol, or a more bespoke encoding.

+$  request
  $%  [%poke =bone seq=@ud command=*]  ::  command
      [%peek =path]                    ::  namespace read
      [%lane sponsee=ship]             ::  request sponsee location
  ==
+$  response
  $%  [%poke =bone seq=@ud ok=?]       ::  command (n)ack
      [%peek seal=@ux =path value=*]   ::  namespace response, signed
      [%bide =path]                    ::  namespace request ack
      [%lane sponsee=ship =lane]       ::  sponsee location
  ==
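To make the framing idea concrete, here is a minimal Python sketch of the three request types with a length-prefixed encoding. It is not the actual %newt wire layout: the 4-byte little-endian length prefix is an assumption, and JSON stands in for a jammed noun purely so the example is runnable.

```python
import json
import struct
from dataclasses import asdict, dataclass


@dataclass
class Poke:
    bone: int
    seq: int
    command: object


@dataclass
class Peek:
    path: str


@dataclass
class Lane:
    sponsee: str


TAGS = {Poke: "poke", Peek: "peek", Lane: "lane"}
CLASSES = {tag: cls for cls, tag in TAGS.items()}


def frame(request) -> bytes:
    """Encode a request as a 4-byte length followed by its body."""
    body = json.dumps([TAGS[type(request)], asdict(request)]).encode()
    return struct.pack("<I", len(body)) + body


def unframe(buf: bytes):
    """Decode one request; return (request, rest), or (None, buf) if incomplete."""
    if len(buf) < 4:
        return None, buf
    (size,) = struct.unpack_from("<I", buf)
    if len(buf) < 4 + size:
        return None, buf
    tag, fields = json.loads(buf[4:4 + size])
    return CLASSES[tag](**fields), buf[4 + size:]
```

Because QUIC streams deliver bytes rather than messages, the reader needs some prefix like this to know where one request ends and the next begins.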

A %poke request represents a cross-ship command, which must be acknowledged or rejected by a %poke response. If the QUIC connection dies without either process restarting, both sides should remember how much of the poke request has been sent, and resume from that point when a new connection is established.

TODO: messages to resume %poke request or %peek response after connection death

A %peek is a remote scry request: an attempt to read a path in Urbit's scry namespace. If the read succeeds, which may occur at a significantly later date, the value grown at that path will be sent back to the requester in a %peek response, along with a signature. If the QUIC connection dies without either process restarting, both sides should remember how much of the peek response has been sent, and resume from that point when a new connection is established.
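The resumption behavior described for %poke requests and %peek responses can be sketched as follows. The resume messages themselves are not yet designed, so the `resume_offset` exchange here is purely hypothetical, and the classes are in-memory stand-ins for the two runtimes.

```python
class Receiver:
    """Remembers every byte received so far, across connections."""

    def __init__(self):
        self.data = bytearray()

    def accept(self, chunk: bytes):
        self.data += chunk

    def resume_offset(self) -> int:
        # Hypothetical: reported to the sender on reconnect.
        return len(self.data)


class Sender:
    """Remembers how much of the payload has been delivered."""

    def __init__(self, payload: bytes):
        self.payload = payload
        self.offset = 0
        self.bytes_sent = 0

    def resume_from(self, offset: int):
        self.offset = offset

    def send(self, receiver, chunk_size, max_chunks=None) -> bool:
        """Send chunks until done; stop early after max_chunks to
        simulate the connection dying. Returns True when complete."""
        sent = 0
        while self.offset < len(self.payload):
            if max_chunks is not None and sent >= max_chunks:
                return False
            chunk = self.payload[self.offset:self.offset + chunk_size]
            receiver.accept(chunk)
            self.offset += len(chunk)
            self.bytes_sent += len(chunk)
            sent += 1
        return True
```

On a new connection the sender would call `resume_from(receiver.resume_offset())` and continue, so bytes delivered before the connection died are not sent again.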

A %bide response indicates that the responder has heard the %peek request and will send a %peek response once it's ready. This happens when the requester asked for a path that doesn't exist yet. Unlike %poke requests and %peek responses, which live until a process restarts, %bide responses are scoped to the QUIC connection. If a connection dies, the server will no longer be responsible for delivering any %peek responses; %peek requests re-sent in a later connection could cause new %bide responses if the requested path still doesn't exist.
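One way to picture the connection-scoped lifetime of a %bide is a per-connection table of waiting paths that is discarded when the connection closes. The Python sketch below is an assumption-laden toy: `Server`, `grow`, and the connection ids are hypothetical names, and the signature carried by a real %peek response is omitted.

```python
class Server:
    def __init__(self):
        self.namespace = {}  # path -> value
        self.waiting = {}    # connection id -> set of awaited paths

    def peek(self, conn_id, path):
        """Answer a %peek request, or promise an answer with %bide."""
        if path in self.namespace:
            return ("peek", path, self.namespace[path])
        self.waiting.setdefault(conn_id, set()).add(path)
        return ("bide", path)

    def grow(self, path, value):
        """Bind a value and return the (conn_id, response) pairs owed."""
        self.namespace[path] = value
        owed = []
        for conn_id, paths in self.waiting.items():
            if path in paths:
                paths.discard(path)
                owed.append((conn_id, ("peek", path, value)))
        return owed

    def close(self, conn_id):
        """Connection died: drop every promise made on it."""
        self.waiting.pop(conn_id, None)
```

After `close`, a requester that still wants the path has to re-send its %peek on a new connection, which yields a fresh %bide if the path is still unbound.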

This means download resumption will not work across process restarts, and the downloads will need to start from scratch. I don't think this is a big problem, since restarts happen rarely. If it is deemed to be a problem, though, the runtime could checkpoint downloads by periodically delivering data chunks to Arvo as events, possibly through Khan, then using that information to resume the download after restart.

During the QUIC handshake, each side should share a nonce identifying its current process. When a new connection is established, each side can then check whether the peer has restarted since the last connection. If so, all pending poke and peek requests will need to be re-sent.
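A minimal sketch of the restart check, assuming each process picks a random nonce at startup and that the nonce arrives with every new connection. The class and function names are hypothetical.

```python
import os


def new_process_nonce() -> bytes:
    """Chosen once when the runtime process starts."""
    return os.urandom(16)


class PeerState:
    """What one side remembers about a peer across connections."""

    def __init__(self):
        self.last_nonce = None
        self.pending = []  # requests still awaiting a response

    def on_connect(self, peer_nonce: bytes):
        """Record the peer's nonce; return the requests to re-send in full.

        A changed nonce means the peer restarted and lost its
        in-memory state, so every pending request starts over.
        """
        restarted = self.last_nonce is not None and peer_nonce != self.last_nonce
        self.last_nonce = peer_nonce
        return list(self.pending) if restarted else []
```

When the nonce is unchanged, nothing is re-sent from scratch and in-flight transfers can instead resume from where they left off.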

TODO: how do comets and moons get onto the network?

@eamsden

eamsden commented Feb 18, 2022

I want to register strong opposition to proposals that keep route tightening at the star level. Star and galaxy forwarding should be in service of peer discovery, and a backup against failure to create a direct connection.

I understand doing remote scry this way initially, but seeing it now move back into proposals for redoing Urbit networking generally is worrisome.

@belisarius222
Author

belisarius222 commented Feb 18, 2022

Ok, well, that route tightening procedure isn't the main point of this proposal, so I'd very much like some feedback on the rest of this proposal too. We don't have to stop tightening at stars in order to use WireGuard and QUIC -- there are a few reasons to do it that way, which I'll get into, but a version of this that tries to tighten all routes all the way by default would work basically the same way.

Also: remote scry doesn't limit route tightening to stars, even in the initial version. If your concern is broadly that the engineering team is failing to support network sovereignty in favor of convenience or just out of neglect, I hope you'll accept my reassurance that that's not what's going on.

The initial version of remote scry routing suffers from the same problem as current Ames routing: the responder ship can't usually tighten the response route, and since we use asymmetric routing, even if the request came on a direct route, the response will often need to go through the requester's galaxy. The responder doesn't get any direct feedback about how to tighten a route back to a requester, which is one of the major disadvantages of asymmetric routing generally, and one of the big reasons I've been leaning more and more toward symmetric routing lately.

I think there's a strategic question here, which we've never really explored in detail or formally resolved, so I'll write up how I see this, and then I'm curious if other people see the situation differently. The big advantage of tightening only to stars is that it increases the default IP anonymity set of the network as a whole. The disadvantage is that it causes more reliance on the star, reducing sovereignty. The big questions in my mind are:

  • How big are both of these effects?
  • Can we get the best of both worlds somehow?

My general perspective at the moment is that the anonymity advantage is fairly strong, and the sovereignty reduction is fairly weak, especially since it should still be doable to share direct lanes manually or based on configuration.

Let's say you and I are planets and I want to send you a message. If I always have to go through your star to get to you, and each hop is a WireGuard channel authenticated by the Azimuth PKI, I never learn your IP and port.

The only ways for me to learn your IP and port would be to co-opt your star into coughing up that information, or to be a "global passive adversary" who can listen to all the packets into and out of your star and collect enough timing statistics to correlate my packets going into your star with packets going from your star to you, along with the responses taking the reverse route. Both of these attacks are plausible, and I don't expect that simply not tightening routes by default will be enough to grant perfect IP anonymity. I think these are pretty much the same concerns that one should have about using any VPN service.

Note that this only affects the planet location -- if your client is on a different machine, then you could use a normal Terran VPN service to anonymize the IP of your client. It's worth trying to gauge the importance of planet IP anonymization given the ubiquity of VPNs. I'd say it depends on the threat model: if your client is not your planet and all you care about is keeping your physical location private, a (trustworthy) VPN should serve your needs just fine. However, if you want to reduce the risk of DoS, physical break-ins, extortion or pressure applied to your hosting service, or other attacks on your planet itself, keeping its location private should increase the amount of resources that an attacker would need to invest to perform any such attack.

I suppose it's also possible to run your planet behind a (Terran) VPN service, in which case you should get the protections I described in the previous paragraph. That suggests another framing for this question: is it better to use Urbit stars as VPNs, or external Terran VPNs? There have certainly been some privacy breaches in Terran VPNs, so I don't think this is an idle question.

My general take on this is that private-by-default is a goal worth striving for. As far as I understand, an anonymous-by-default design is a much better foundation on which to build stronger anonymity -- a star could randomly defer forwarded packets, for example, to make timing attacks that much harder. I'm less certain about ways of mitigating deanonymization by a malicious star, so I'm not sure how much of a strong constraint that is. Even if it's not preventable, though, the normal Urbit approach of "things will be fine as long as at least one star operator out of the 65,000 is reasonable" seems like it should be a more or less viable solution to that in practice.

In contrast, if the normal procedure is that your star tells me your planet's IP and port as soon as I ask for it, then it's easy for anyone to find your planet on the internet. If this is the default, then I think mapping out which planet lives at which IP and port, and which planets are communicating with which other planets, is significantly easier.

This does reduce the default level of sovereignty for the typical planet, making it more dependent on its star. I'm guessing this is the reason for your "strong opposition". I'd like to examine this question, though, since I'm not sure the effect is large enough to outweigh the anonymity concerns.

I think your reasoning goes something like this: if your star is responsible for forwarding all packets to and from you, and it decides it doesn't like you for some reason, it could silently drop some or all of those packets -- sort of like a network-wide shadowban. In contrast, if you're at a public, non-firewalled IP and port, I can send you packets directly, and even if your star drops all packets headed your way, you can still communicate. Thus, you are at the mercy of your star without automatic route tightening, but you would be an autonomous network participant if direct routes are default.

This analysis breaks down when you consider that new peers need to find you somehow -- if they can't, your network sovereignty is quite limited. Your star is responsible for making sure others can find you, so if it decides to shadowban you, even if all routes to you end up direct, no new peers will be able to communicate with you unless you are both at a static public IP and also ping them first. In practice, this generally means new people won't be able to find you. If someone tries to invite you to a group, for instance, unless you've already been DMing, you'll just never see the invite.

So is this really that different? Direct routes or no, you desperately need to find a new star -- or find a way to bypass the normal peer discovery system and use a parallel system.

I wonder if there's a synthesis to be found here, involving such a parallel system. I could mark certain groups and peers as trusted, perhaps, and then my ship would share its IP and port with those peers (encrypted, in-band over Ames). That way we could get direct routes to each other and keep them even if our stars betray us. One could even imagine ships that are sort of like Tor hidden services, only communicable through private relays. These would be sort of like invite-only intranets, and I don't think it would make sense for those to be default, but I do think the feasibility of sharing direct routes through other means should alleviate your concerns at least somewhat.

@eamsden

eamsden commented Jul 9, 2022

  • I don't think IP anonymity/private-by-default is a core part of Urbit's use case. If it is then we are going to have to build something a lot stronger than "stars are relays."
  • I do think that sovereignty and networking are in tension, because sovereignty means you do things yourself, and networking means you do things with other people/ships.
  • I also want to avoid building Urbit in a way that precludes "Urbit maximalism."

The last point is my biggest worry, I think. Even if there's a route-injection mechanism, if too much of the design in practice relies on the routing behavior, it could preclude Urbit from ever outgrowing IP.

Practically what this means is that while right now (and for the near and probably medium term) we use IP to handle routing for us, this shouldn't be made necessary by kernel design choices we make. While some default routing behavior is necessary, I want to make sure that systems designs take into account that routing behavior could be changed. I think the way to do this with a kelvin-frozen kernel is to have interfaces that allow agents (or the king) to inject routing information for peers. Notably, this would allow something like Tor-on-Urbit (Torbit) to be a user agent without userspace actually having to handle packets.

So I think we agree that in the default routing case you're reliant on your star, but I don't want that to yield an assumption in kernel protocols that your sponsor relays all your traffic. A default is fine, but it needs to work if I tell it to do something different.

@belisarius222
Author

The Urbit maximalism point is particularly good.
