@HerbCaudill
Last active May 24, 2024 15:18
automerge-auth strawman

Suppose we wanted to build automerge-auth, a successor to localfirst/auth. What would that look like?

Objectives

Develop a recommended approach to authentication, authorization, and end-to-end encryption in the Automerge ecosystem.

Desiderata

Dependencies

  • No dependencies on central servers (but can take advantage of pre-existing services if desired, like traditional PKI)
  • No infrastructure assumed besides P2P network (e.g. no blockchain dependency)

Trust

  • Would be great if we could treat sync servers as untrusted
  • We're OK with requiring a pre-existing trusted side channel (e.g. for sending invite codes)

Extensibility

  • Supports multiple models for invitation & key management
  • Supports multiple models for group management (e.g. different policies for resolving or avoiding revocation cycles)

DX

  • Works well with automerge-repo, and can piggyback on its networking & storage
  • But core functionality (groups, connections, invitations) not tied to automerge
  • Adaptable to a wide variety of use cases
  • Good defaults; can get up and running without making a lot of decisions
  • Painless upgrade path from lf/auth

UX

  • Not scary or strange; no cryptographic hoo-ha visible to the user
  • Appealing outside our circles of weirdos, hippies, and nerds
  • Supports a flexible sharing model - e.g. parity with Google Docs, where I can:
    • share a doc with one person
    • share a doc with a group (& removed members lose access to new changes)
    • share a set of docs with one group, and a subset of the docs with another group as well
    • share a doc publicly with anyone
    • revoke access to anything I've previously shared

Architecture

localfirst/auth architecture for reference:

image

The overall shape I have in mind would be similar:

  • a Group class (replaces Team in lf/auth) for managing group membership, users, and devices, but with transitive group hierarchies
  • a Connection class implementing a point-to-point network protocol to provide an authenticated secure channel, independently of any particular group
  • an AuthProvider class that interfaces with automerge-repo, with first-class support for multiple possibly-overlapping groups

image

lf/auth also provides an auth-aware sync server; if you use it you have to add the sync server to every team. An E2EE sync server would be a big improvement.

AuthProvider

The AuthProvider in lf/auth works without automerge-repo knowing it exists: It does authentication by wrapping a network adapter before you provide it to automerge-repo. The Repo never sees a peer until they've been authenticated and we have an encrypted session with them.

The auth provider takes an automerge-repo storage adapter, so you can use the same one you give the repo.

I'd stick with this architecture; it's worked pretty well.

Groups

A core capability of this library is managing groups, users, and devices.

In lf/auth you can have a Team, a User, and a Device and these are three totally different constructs. I think it would be very powerful to treat these all as fundamentally the same thing, as follows:

  • a group has:

    • an ID consisting of a GUID assigned on creation
    • a human-facing name
    • a signing keypair
    • an encryption keypair
    • 0 or more members, which are groups
  • a user is a special type of group that contains only devices

  • a device is a special type of group with no members

  • group membership is transitive: all the members of a child group are members of the parent group.
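The model above can be sketched as a single recursive type. This is only an illustration: `Group`, `Keypair`, `allMemberIds`, and the placeholder keys are hypothetical names, not part of lf/auth or Automerge.

```typescript
// Hypothetical types: none of these names come from localfirst/auth or Automerge.
interface Keypair {
  publicKey: string
  secretKey: string
}

interface Group {
  id: string // GUID assigned on creation
  name: string // human-facing name
  signingKeys: Keypair
  encryptionKeys: Keypair
  members: Group[] // 0 or more members, which are themselves groups
}

// A device is a group with no members.
const isDevice = (g: Group): boolean => g.members.length === 0

// A user is a group that contains only devices.
const isUser = (g: Group): boolean =>
  g.members.length > 0 && g.members.every(isDevice)

// Membership is transitive: collect the IDs of every direct and indirect member.
// The `seen` list also guards against accidental cycles.
function allMemberIds(group: Group, seen: string[] = []): string[] {
  for (const member of group.members) {
    if (!seen.includes(member.id)) {
      seen.push(member.id)
      allMemberIds(member, seen)
    }
  }
  return seen
}

// Placeholder keys, for illustration only.
const fakeKeys = (label: string): Keypair => ({
  publicKey: `${label}-public`,
  secretKey: `${label}-secret`,
})

const makeGroup = (id: string, name: string, members: Group[] = []): Group => ({
  id,
  name,
  signingKeys: fakeKeys(`${id}-sign`),
  encryptionKeys: fakeKeys(`${id}-enc`),
  members,
})

const laptop = makeGroup("d1", "Alice's laptop")
const phone = makeGroup("d2", "Alice's phone")
const alice = makeGroup("u1", "Alice", [laptop, phone])
const engineering = makeGroup("g1", "Engineering", [alice])
const company = makeGroup("g2", "Company", [engineering])
```

One thing the sketch makes visible: with a purely structural definition, a freshly created group with no members yet is indistinguishable from a device, so a real implementation would probably need an explicit tag as well.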

image

This approach of groups-within-groups and users/devices as groups simplifies a lot of things relative to the current implementation in lf/auth:

  • there's a lot of special-case code in lf/auth around things like "is this a device invitation or a member invitation"; that all goes away
  • it makes it possible to deal cleanly with multiple groups that might be partially overlapping, in terms of things like device revocation (in lf/auth you'd have to revoke the device separately in every group it appears; this way it's just revoked in the user's group of devices, and takes effect in every group they belong to)
  • lf/auth has a bunch of mostly speculative code around arbitrary roles within a group; instead these kinds of things would just be expressed as subgroups
  • The special admin role probably just becomes a special subgroup as well (have to think about this more)

Extensibility

Group membership & permissions logic for specific use cases could be determined by plugins. This would allow different approaches to the trickier aspects of distributed group membership, like mutual/cyclical revocation.

For example, different group membership policy extensions might enable:

  • A Matrix-style system with power levels, where admins can't remove admins and mutual revocation is impossible
  • A system with a group owner or super-admin role that is unitary and transferable, where a founder can never be removed if they don't want to go
  • A DCGKA-lf/auth-style system where any admin can remove any admin, and cycles are resolved by seniority

There should be a default, and of course I'd be inclined for the default to be lf/auth's seniority system; but I don't feel strongly about that.

Q: Would group membership logic vary per group? or per repo? not sure.

Implementation

This is probably obvious to everyone else, but it's a recent (if half-baked) insight for me: there's an underlying similarity, maybe even an equivalence, between a graph of chained UCANs (or VCs) and the graph of signed team membership operations that forms the data structure for lf/auth's CRDT.

So while it's always been clear to me that we could use UCANs to express permissions, now I see that we could also use UCANs to express group membership operations. So we have two options for the underlying data structure:

Option 1. Using CRDX

My default approach would be to implement groups as CRDX stores; that's how localfirst/auth teams are implemented so it's what I know.

CRDX gives us an authenticated & encrypted hash-chained graph, along with a sync protocol. To create a CRDX store you provide an initial state, a resolver, and a reducer.

  • The resolver linearizes the graph and applies conflict detection & resolution logic. A base resolver is included. Extensions can replace or extend the base resolver.
  • The reducer calculates group state from a linear sequence of operations. A base reducer is included for handling basic group operations (add member, remove member, set group name, etc). Extensions can extend the base reducer and/or override the way it handles certain operations.
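To make the resolver/reducer split concrete, here is a toy sketch. It is not CRDX's actual API: the `Op` shape, the two-branch `resolver` signature, and the seniority rule (the branch from the longer-standing member is ordered first) are simplifying assumptions. A real resolver would linearize an arbitrary hash-chained graph and verify signatures.

```typescript
// Hypothetical operation and state shapes, for illustration only.
type Op =
  | { type: "ADD_MEMBER"; author: string; member: string }
  | { type: "REMOVE_MEMBER"; author: string; member: string }
  | { type: "SET_NAME"; author: string; name: string }

interface GroupState {
  name: string
  members: string[] // in order of joining, so the index doubles as seniority
}

// Reducer: applies one operation to the state. Operations authored by
// someone who is no longer a member are ignored.
function reducer(state: GroupState, op: Op): GroupState {
  if (!state.members.includes(op.author)) return state
  switch (op.type) {
    case "ADD_MEMBER":
      return state.members.includes(op.member)
        ? state
        : { ...state, members: [...state.members, op.member] }
    case "REMOVE_MEMBER":
      return { ...state, members: state.members.filter(m => m !== op.member) }
    case "SET_NAME":
      return { ...state, name: op.name }
  }
}

// Resolver: linearizes two concurrent branches. The branch started by the
// more senior member goes first, so in a mutual removal the senior member wins.
function resolver(base: GroupState, branchA: Op[], branchB: Op[]): Op[] {
  const seniority = (branch: Op[]): number => {
    if (branch.length === 0) return Infinity
    const index = base.members.indexOf(branch[0].author)
    return index === -1 ? Infinity : index
  }
  return seniority(branchA) <= seniority(branchB)
    ? [...branchA, ...branchB]
    : [...branchB, ...branchA]
}

// Alice and Bob concurrently remove each other.
const base: GroupState = { name: "Team", members: ["alice", "bob"] }
const aliceBranch: Op[] = [{ type: "REMOVE_MEMBER", author: "alice", member: "bob" }]
const bobBranch: Op[] = [{ type: "REMOVE_MEMBER", author: "bob", member: "alice" }]

const finalState = resolver(base, bobBranch, aliceBranch).reduce(reducer, base)
```

Because the reducer drops operations from non-members, ordering Alice's branch first is enough to make Bob's concurrent removal of Alice a no-op. An extension with a different policy would swap in a different resolver and leave the reducer alone.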

CRDX is essentially a build-your-own CRDT kit, and we're using it to make a group membership CRDT.

Option 2. Using UCANs

Everything's upside-down and backwards in UCAN-land, and you have to start by thinking about things like "how does the document feel about this". But I'm pretty sure that once you get over the initial disorientation, a graph of chained UCANs is just another kind of signature chain, and if you run it through a similar resolver/reducer mechanism it can function as a group membership CRDT.

The mechanics of replicating UCANs are different: In lf/auth we think of a team's graph as a kind of document that you always want to have the latest version of, so you're always trying to make sure you're in a fully-synced state; whereas UCANs are presented as needed and then cached (except for revocations, which are eagerly gossiped). So you only need to "sync" when you have a decision to make, for example when you've been shown a UCAN authorizing someone to read a document and you're missing some of its predecessors.
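A rough sketch of that "sync only when you have a decision to make" flow. The `Delegation` type is a drastically simplified stand-in for a UCAN (an assumption for illustration): real UCANs carry signatures, capabilities, and expiry, and can cite more than one proof.

```typescript
// Hypothetical, simplified delegation record; not the UCAN wire format.
interface Delegation {
  id: string
  issuer: string
  audience: string
  proof: string | null // id of the delegation that authorizes the issuer; null at the root
}

type Verdict =
  | { status: "valid" }
  | { status: "invalid"; reason: string }
  | { status: "missing"; id: string } // we need to fetch this predecessor first

// Walk from the presented delegation back to the root, using only what is
// cached locally plus the eagerly gossiped revocation list.
function checkChain(
  presented: Delegation,
  resource: string,
  cache: Map<string, Delegation>,
  revoked: Set<string>
): Verdict {
  const visited = new Set<string>()
  let current = presented
  while (true) {
    if (revoked.has(current.id)) {
      return { status: "invalid", reason: `${current.id} was revoked` }
    }
    if (visited.has(current.id)) {
      return { status: "invalid", reason: "cycle in proof chain" }
    }
    visited.add(current.id)

    if (current.proof === null) {
      // The root must be issued by the resource (e.g. the document) itself.
      return current.issuer === resource
        ? { status: "valid" }
        : { status: "invalid", reason: "root not issued by the resource" }
    }
    const parent = cache.get(current.proof)
    if (parent === undefined) return { status: "missing", id: current.proof }
    if (parent.audience !== current.issuer) {
      return { status: "invalid", reason: "issuer was not the audience of its proof" }
    }
    current = parent
  }
}

const root: Delegation = { id: "r", issuer: "doc-xyz", audience: "alice", proof: null }
const toBob: Delegation = { id: "a2b", issuer: "alice", audience: "bob", proof: "r" }

// With an empty cache, the verdict tells us exactly which predecessor to request.
const beforeSync = checkChain(toBob, "doc-xyz", new Map(), new Set())
```

The `missing` verdict is the only point at which a peer has to go back to the network, which is the contrast with keeping a team graph fully synced at all times.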

Aside about backdating

As Florian points out, one disadvantage of this model is that it's harder to detect malicious backdating. The (unimplemented) scheme I describe here relies on an assumption that the normal state of affairs is for all peers to be mostly up-to-date, and we just don't accept changes that are based on an "old" state of the graph.

Of course you don't have to go with the lazy replication model just because you're using UCANs.

While the conceptual model is that Bob shows Alice a UCAN when requesting a document that Alice has, in practice it's more likely that Bob's UCAN is embedded in the document itself (e.g. in a well-known location of an Automerge document) and Bob just asks for a document by ID.

The big advantage of using UCANs to express group membership is that if we use UCANs to express document permissions (see below), it would be simpler to be working with a single kind of graph for everything.

Invitations & key management

To bootstrap decentralized PKI you need a way to associate a new peer's identifier with their public keys.

Encrypted messaging apps like Signal, WhatsApp, and Telegram trust an unfamiliar peer's ID the first time they connect — TOFU (trust on first use). The two devices tell each other their public keys, and from then on they use those keys to authenticate.

The problem is that "on first use" also means any time either party gets a new device, factory resets their device, or reinstalls your application. You can supplement TOFU with a manual verification process, e.g. asking two users to compare "safety numbers". Most users don't do this — I never have — and this all really falls apart when you try to scale it up to all the pairwise connections in a large group.

A better approach is to use an invitation process, in which a secret is shared out of band and then used to secure that initial connection. This process can safely link a device's keys to a user, and a user's keys to a group. This way, users can come and go from a group you're in, and other users can change their devices all they want, without you ever having to re-establish that trust relationship with the group.

Implementation

Approaches we've seen include:

  • In DXOS, Alice sends Bob a long discovery key (embedded in a QR code or URL) that allows them to connect in a private swarm & exchange keys. Once connected, there's a numeric PIN as a second factor; it's shown to Alice, who communicates it to Bob so he can enter it. The invitation and admission process must be completed synchronously, and third parties cannot complete admission.

  • In a PAKE (password-authenticated key exchange) process, Alice gives Bob an arbitrary password, which Bob can then use to generate a zero-knowledge proof when he connects.

  • The Seitan invitation process used by localfirst/auth can be thought of as an asymmetric PAKE (aPAKE), in that a third party can verify Bob's proof without knowing the secret. Alice creates a password, expands it to a signature keypair, and records the public key on the (encrypted) team graph, along with optional settings like an expiration period and limitations on reuse. Bob derives the same keypair and generates a proof. If Charlie is a team member, he can use the information on the graph to admit Bob.

  • OPAQUE is also an asymmetric PAKE protocol. Like Seitan, it supports arbitrary passwords and third-party verification. Unlike Seitan, it's had a lot of eyes on it and it's on its way to becoming a standard.

    (OPAQUE is primarily thought of in server/client terms, where you present the user with a familiar password login UI but you can avoid sending the password itself to the server. But we can take the same material that you would store on the server, and store it on the signature chain that's replicated across all group members. This way any member can validate the invitation without ever knowing the secret.)

  • Brooklyn has pointed out that unless you really think people are going to be spelling these secrets out over the phone, there's no reason to keep them short: Realistically most of the time you're going to provide a QR code, or a link. And if you're doing that, you can just skip a step and send the ephemeral private key itself.
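The Seitan-style flow described in this list (password expanded to a signature keypair, public key recorded on the graph, third-party verification) can be sketched with Node's built-in `crypto` module. The specifics are assumptions, not lf/auth's actual implementation: scrypt as the key-stretching function, Ed25519 as the signature scheme, the signed payload format, and every function and field name here are illustrative.

```typescript
import { createPrivateKey, createPublicKey, scryptSync, sign, verify } from "node:crypto"

// DER header of a PKCS#8-encoded Ed25519 private key; the 32-byte seed follows it.
const ED25519_PKCS8_PREFIX = Buffer.from("302e020100300506032b657004220420", "hex")

// Stretch the invitation secret into an Ed25519 keypair.
// (scrypt with the invitation ID as salt is an assumption of this sketch.)
function deriveKeypair(invitationId: string, secret: string) {
  const seed = scryptSync(secret, invitationId, 32)
  const privateKey = createPrivateKey({
    key: Buffer.concat([ED25519_PKCS8_PREFIX, seed]),
    format: "der",
    type: "pkcs8",
  })
  const publicKey = createPublicKey(privateKey)
    .export({ format: "der", type: "spki" })
    .toString("base64")
  return { privateKey, publicKey }
}

// What Alice records on the group graph. The secret itself is never stored.
interface InvitationRecord {
  id: string
  publicKey: string // base64-encoded SPKI
  expiration: number // timestamp in ms
}

interface Proof {
  invitationId: string
  userName: string
  signature: string // base64
}

// Alice: create the invitation and post the public half.
function createInvitation(id: string, secret: string, expiration: number): InvitationRecord {
  return { id, publicKey: deriveKeypair(id, secret).publicKey, expiration }
}

// Bob: derive the same keypair from the secret and sign his claim.
function generateProof(id: string, secret: string, userName: string): Proof {
  const { privateKey } = deriveKeypair(id, secret)
  const payload = Buffer.from(`${id}:${userName}`)
  return { invitationId: id, userName, signature: sign(null, payload, privateKey).toString("base64") }
}

// Charlie (any member): validate using only what is on the graph.
function validateProof(record: InvitationRecord, proof: Proof, now: number): boolean {
  if (proof.invitationId !== record.id || now > record.expiration) return false
  const publicKey = createPublicKey({
    key: Buffer.from(record.publicKey, "base64"),
    format: "der",
    type: "spki",
  })
  const payload = Buffer.from(`${record.id}:${proof.userName}`)
  return verify(null, payload, publicKey, Buffer.from(proof.signature, "base64"))
}
```

Charlie never learns the secret, which is what lets admission happen asynchronously. Limits on reuse are left out here; they would need admissions to be counted on the graph.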

If I had to choose one approach, I'd go with OPAQUE: It lets each application decide how strong the passwords should be, and the invitation process can be completed asynchronously by other team members. I'd add the same user-facing characteristics as lf/auth's Seitan implementation (expiration, number of uses).

The invitation model should be provided by extensions. Some applications might use server-based PKI and not need an invitation process at all; some might prefer for admittance to happen synchronously with the original inviter.

To support something like OPAQUE or Seitan, an invitation extension would need to be able to contribute:

  1. reducers for storing invitations and recording admittance
  2. a selector for retrieving invitations from the graph
  3. a phase in the connection protocol where we can present and validate invitations
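As a sketch of what those three extension points might look like as a TypeScript interface: every name below is hypothetical, and the signature check is injected so the example stays independent of any particular crypto scheme.

```typescript
// All names here are hypothetical; this is not an existing lf/auth or CRDX API.
interface Invitation {
  id: string
  publicKey: string
  maxUses: number
  uses: number
}

interface GraphState {
  members: string[]
  invitations: Record<string, Invitation>
}

type InvitationOp =
  | { type: "INVITE"; invitation: Invitation }
  | { type: "ADMIT"; invitationId: string; userName: string }

interface InvitationExtension<P> {
  // 1. reducer for storing invitations and recording admittance
  reducer(state: GraphState, op: InvitationOp): GraphState
  // 2. selector for retrieving invitations from the graph
  selectInvitation(state: GraphState, id: string): Invitation | undefined
  // 3. called during the invitation phase of the connection protocol
  validate(state: GraphState, proof: P): boolean
}

interface SignedProof {
  invitationId: string
  userName: string
  signature: string
}

function makeExtension(
  checkSignature: (publicKey: string, proof: SignedProof) => boolean
): InvitationExtension<SignedProof> {
  const selectInvitation = (state: GraphState, id: string): Invitation | undefined =>
    state.invitations[id]

  return {
    reducer(state, op) {
      switch (op.type) {
        case "INVITE":
          return {
            ...state,
            invitations: { ...state.invitations, [op.invitation.id]: op.invitation },
          }
        case "ADMIT": {
          const invitation = selectInvitation(state, op.invitationId)
          if (invitation === undefined || invitation.uses >= invitation.maxUses) return state
          return {
            members: [...state.members, op.userName],
            invitations: {
              ...state.invitations,
              [invitation.id]: { ...invitation, uses: invitation.uses + 1 },
            },
          }
        }
      }
    },
    selectInvitation,
    validate(state, proof) {
      const invitation = selectInvitation(state, proof.invitationId)
      return (
        invitation !== undefined &&
        invitation.uses < invitation.maxUses &&
        checkSignature(invitation.publicKey, proof)
      )
    },
  }
}
```

An application using server-based PKI could skip this extension entirely, and one that wants synchronous admission could supply a different `validate`.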

Aside about key recovery

An aPAKE-based invitation process can be used without modification to provide key recovery in a case where all of a user's devices are compromised/lost/revoked: You can preemptively generate one or more long-lived device invitations for a user. These can be presented to the user as "backup codes" — which is already an established UX pattern in the context of 2FA — and printed out, stored in a password manager, or whatever.

Key management

With some invitation process in place to bootstrap things, the rest seems very straightforward: Groups are the source of truth for associating identities with public keys.

The AuthProvider can consolidate all the ID-pubkey pairs we know across all of our groups into a single rolodex.

Once we know someone's public key from being in at least one group with them, we can add them directly to another group without having to go through the invitation process.
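A minimal sketch of that consolidation step, assuming each group exposes a flat list of member IDs and public keys. `KnownGroup`, `Member`, and `buildRolodex` are hypothetical names. The sketch also flags IDs that appear with different keys in different groups, since silently picking one would undermine the point of the rolodex.

```typescript
// Hypothetical shapes for illustration.
interface Member {
  id: string
  publicKey: string
}

interface KnownGroup {
  id: string
  members: Member[]
}

interface Rolodex {
  keys: Record<string, string> // member ID -> public key
  conflicts: string[] // IDs seen with more than one key
}

function buildRolodex(groups: KnownGroup[]): Rolodex {
  const keys: Record<string, string> = {}
  const conflicts: string[] = []
  for (const group of groups) {
    for (const member of group.members) {
      const known = keys[member.id]
      if (known === undefined) {
        keys[member.id] = member.publicKey
      } else if (known !== member.publicKey && !conflicts.includes(member.id)) {
        conflicts.push(member.id)
      }
    }
  }
  return { keys, conflicts }
}

const rolodex = buildRolodex([
  { id: "team-x", members: [{ id: "alice", publicKey: "ka" }, { id: "bob", publicKey: "kb" }] },
  { id: "team-y", members: [{ id: "bob", publicKey: "kb" }, { id: "carol", publicKey: "kc" }] },
])
```

Here Carol is only in Team Y, but once her key is in the rolodex she could be added directly to Team X. Key rotation and nested groups are ignored in this sketch.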

Read permissions

Read access to a document can be enforced two ways: By deciding who to share the bits with, and by encrypting the bits. I think Brooklyn has persuaded us all that you should do both.

Deciding who to share the bits with

Regardless of how this is done, I think Brooklyn's point that you ultimately can't (so shouldn't try to) prevent downstream delegation is both correct and a useful simplification of the sharing model. All I can do is decide who to share a document with; once they have that document I have no say in whether they share it further.

Option 1. Using CRDX

We could store document permissions directly on the CRDX graph for a group.

  • Alice creates document XYZ
  • Alice states on the group graph that document XYZ can be shared with any group member
  • Removing someone from the group then automatically revokes their access to the document

Option 2. Using UCANs

Using UCANs to express document permissions would be a very natural fit since that's exactly what they're designed to do.

  • Alice creates document XYZ
  • Document XYZ creates a UCAN naming Alice as its owner
  • Alice creates a UCAN that delegates read access to a group
  • Revocations are gossiped to the group

If we were to use CRDX group membership and UCANs for document permissions, it seems like there are likely some awkward problems at the interface between the two systems, along the lines of "did Alice have the right to grant access to this document at that point in the group's history". I haven't thought about this very hard - it's probably just a matter of recording the current heads of the group's graph in the UCAN — but it seems clear that these questions go away if you're using the same graph for both things.

Enforcement

I can think of a couple of ways to enforce whatever read access rules we have:

  • it could be done kind of crudely in the AuthProvider at the network level, by letting Repo think it's syncing with everyone but not letting unauthorized sync messages make it out to the real network adapter
  • or, we could tap into the sharePolicy API
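For the second option, a sketch of a share policy driven by group state. It assumes automerge-repo's `sharePolicy` has the shape `(peerId, documentId?) => Promise<boolean>`, which should be checked against the current automerge-repo docs. `PeerId` and `DocumentId` are declared locally as plain strings so the sketch is self-contained, and `GroupState` is hypothetical. It also assumes peer IDs map directly onto group members, which a real implementation would have to establish during authentication.

```typescript
// Local stand-ins for automerge-repo's branded ID types (assumption).
type PeerId = string
type DocumentId = string

// Hypothetical view of a group: who is in it and which documents it may see.
interface GroupState {
  id: string
  members: PeerId[]
  documents: DocumentId[]
}

// A peer may receive a document if some group contains both the peer and the document.
function canShare(groups: GroupState[], peerId: PeerId, documentId?: DocumentId): boolean {
  if (documentId === undefined) return false // never share by default
  for (const group of groups) {
    if (group.members.includes(peerId) && group.documents.includes(documentId)) {
      return true
    }
  }
  return false
}

// Wrap the check in the async shape a share policy is assumed to have.
// `getGroups` is called on every decision so membership changes take effect immediately.
const makeSharePolicy =
  (getGroups: () => GroupState[]) =>
  async (peerId: PeerId, documentId?: DocumentId): Promise<boolean> =>
    canShare(getGroups(), peerId, documentId)

const groups: GroupState[] = [
  { id: "team-x", members: ["alice", "bob"], documents: ["doc-1", "doc-2"] },
  { id: "team-y", members: ["alice", "carol"], documents: ["doc-2"] },
]
const sharePolicy = makeSharePolicy(() => groups)
```

This also covers the "set of docs with one group, subset with another" case from the UX list: Carol gets `doc-2` but not `doc-1`.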

Encrypting the bits

Here the idea is that in addition to specifying that a given document can only be shared with a group of people, I encrypt the document when sharing it, using a key that is only available to that group.

The problem in Automerge world is that encrypting every change becomes prohibitively expensive.

But it seems like Nik's SecSync could be a solution to this, and it already works with Automerge.

I'd be curious to know how this might fit in with Alex's plans for a generic collection sync server, or Martin's proposed new sync protocol.

Write permissions

Expressing write permissions can be done using the same approach we use for expressing read permissions.

Enforcing write permissions is trickier though. In theory the idea is "attention rather than permission" - we can't control what changes people make, but we can choose whether to pay attention to those changes or not. But in practice we don't have a way to do that:

  • Even if you can authenticate with the peer you're communicating with, you don't have a secure way of knowing who authored which changes in an Automerge document. Adding a cryptographic signature to every keystroke in a document is of course a non-starter, but we have Martin's assurance that all we need is for each actor in a document to sign their most recent change. So then we need a way to record and replicate that information (without carrying around the full history of signatures of older changes). We could come up with a janky userland way to do that, but ideally this would be something Automerge just does for us.

  • Once you have that, you need to be able to ignore changes that you've decided were unauthorized. I don't think there's a way in Automerge to reconstitute a document while omitting some subset of its changes? At any rate that's what we'd need.
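To illustrate what "ignoring unauthorized changes" would involve, here is a sketch of the selection step only. The `Change` shape is loosely modeled on a change with a hash, an actor, and dependency hashes, and it assumes authorship has already been verified somehow. It does not call Automerge, and it does not answer the open question of whether a document can actually be rebuilt from the surviving subset.

```typescript
// Hypothetical change record; authorship is assumed to be verified already.
interface Change {
  hash: string
  actor: string
  deps: string[] // hashes of the changes this one builds on
}

// Keep a change only if its author is authorized AND everything it depends on
// was kept. Assumes `changes` is in causal order (deps before dependents).
function acceptedChanges(changes: Change[], authorized: string[]): Change[] {
  const accepted: Change[] = []
  const acceptedHashes = new Set<string>()
  for (const change of changes) {
    const authorOk = authorized.includes(change.actor)
    const depsOk = change.deps.every(dep => acceptedHashes.has(dep))
    if (authorOk && depsOk) {
      accepted.push(change)
      acceptedHashes.add(change.hash)
    }
  }
  return accepted
}

const history: Change[] = [
  { hash: "c1", actor: "alice", deps: [] },
  { hash: "c2", actor: "mallory", deps: ["c1"] }, // unauthorized author
  { hash: "c3", actor: "bob", deps: ["c2"] }, // authorized, but builds on c2
  { hash: "c4", actor: "bob", deps: ["c1"] },
]

const kept = acceptedChanges(history, ["alice", "bob"])
```

Note the knock-on effect: Bob's `c3` is dropped even though Bob is authorized, because it was made on top of Mallory's change. Any real scheme would have to decide whether that is acceptable.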

Connections

lf/auth has a sprawling Connection class for point-to-point connections. It uses a heinous XState state machine to implement a protocol including:

  1. hooks for presenting and validating invitations
  2. mutual authentication of known team members (using a signature challenge that is probably vulnerable to PITM attacks and isn't necessary anyway because of the next step)
  3. doing a key exchange and creating an encrypted channel (using NaCl public-key authenticated encryption)
  4. syncing the team graph (CRDX provides a sync protocol similar to Automerge's but encrypted)

image

This should be modularized, probably with these various processes implemented as independent state machines and one machine to orchestrate it all.

Here's how I would change these elements:

  1. invitations: would be provided by an extension
  2. mutual authentication via signature challenge: this step can be dropped
  3. key exchange/encrypted channel: I'd probably borrow from the AWAKE approach and do MLS, using our cross-group rolodex as the PKI service.
  4. sync: no change unless the group is expressed using UCANs in which case ask Brooklyn 😛
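One way to picture the modularized version is as a list of independent phases run by a small orchestrator. This is a deliberately synchronous toy: the `Phase` interface, the context fields, and the stubbed phases are all hypothetical, and a real implementation would be asynchronous, event-driven state machines rather than plain function calls.

```typescript
type PhaseResult = { ok: true } | { ok: false; reason: string }

// Hypothetical per-connection context shared by all phases.
interface ConnectionContext {
  peerId: string
  hasInvitation: boolean
  knownPeers: string[] // e.g. from the cross-group rolodex
  log: string[]
}

interface Phase {
  name: string
  run(ctx: ConnectionContext): PhaseResult
}

type Outcome =
  | { state: "connected" }
  | { state: "failed"; phase: string; reason: string }

// Orchestrator: run each phase in order and stop at the first failure.
function runConnection(phases: Phase[], ctx: ConnectionContext): Outcome {
  for (const phase of phases) {
    const result = phase.run(ctx)
    ctx.log.push(`${phase.name}: ${result.ok ? "ok" : result.reason}`)
    if (!result.ok) {
      return { state: "failed", phase: phase.name, reason: result.reason }
    }
  }
  return { state: "connected" }
}

// 1. Invitations: would be supplied by an extension.
const invitationPhase: Phase = {
  name: "invitation",
  run: ctx =>
    ctx.hasInvitation || ctx.knownPeers.includes(ctx.peerId)
      ? { ok: true }
      : { ok: false, reason: "unknown peer without an invitation" },
}

// 3. Key exchange and 4. sync are stubbed out; step 2 (signature challenge) is dropped.
const keyExchangePhase: Phase = { name: "keyExchange", run: () => ({ ok: true }) }
const syncPhase: Phase = { name: "sync", run: () => ({ ok: true }) }

const phases = [invitationPhase, keyExchangePhase, syncPhase]

const known = runConnection(phases, {
  peerId: "bob",
  hasInvitation: false,
  knownPeers: ["bob"],
  log: [],
})
const stranger = runConnection(phases, {
  peerId: "mallory",
  hasInvitation: false,
  knownPeers: ["bob"],
  log: [],
})
```

Keeping each phase behind the same small interface is what would let an invitation extension contribute its own phase without touching the key exchange or sync logic.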

In lf/auth, a Connection always has a single Team as its context. So if Alice and Bob both belong to Team X and Team Y, they need to make two connections.

image

A better approach would be to use the collection of all groups we know about as our connection's context, so we have a single connection per peer:

image
