@nikclayton
Last active February 28, 2024 03:20

Clients should be able to determine the operations the server supports

Version  Date        Changes
3.0      2024-01-22  Re-write; include the operation information in the nodeinfo
2.0      2023-09-25  Re-write; replace the original suggestion to use the OpenAPI definition with a simpler specification
1.0      2023-08-14  Initial draft

Synopsis

This document proposes an extension to the NodeInfo schema that would allow developers of Mastodon and Mastodon-like servers to unambiguously communicate the operations their servers support, and allow developers of clients for those servers to detect these features and update their UX accordingly.

This document is written for:

  • The maintainers of the NodeInfo specification
  • Developers of Mastodon and Mastodon-like servers
  • Developers of clients for those servers

After reading this document you should:

  • Understand the general problem this is intended to solve
  • Understand the proposed solution
  • Understand alternatives to the solution, and why they are not appropriate
  • Be able to provide feedback on the proposal

Overview

For the purposes of this document a "Mastodon or Mastodon-like" server is a server that presents the Mastodon client API, optionally with extensions to that API that provide additional functionality. These servers include, but are not limited to:

  • Mastodon
  • Glitch
  • Hometown
  • Pleroma
  • Akkoma
  • Firefish
  • Iceshrimp
  • Sharkey
  • Friendica
  • GoToSocial

Clients of these servers have an API discovery problem. Since different servers support different (but similar) APIs the client has to determine what API operations the server supports.

Given the wide variety of servers that are available, and their many forks, it's not feasible for clients to maintain an accurate list of all the possible server software names while mapping the names to API features.

Instead the server should have a mechanism for advertising the operations it supports.

The client would use this when determining what features to show the user, without needing to employ complex, error-prone heuristics.

This would also provide a clear mechanism for Mastodon and Mastodon-like servers to incrementally deploy new features and deprecate old ones without inconveniencing clients.

It also provides a clear mechanism to advertise server functionality without continually bolting it on to the "instance info" mechanism in the inconsistent fashion that has been done so far.

The rest of this document sets out the specific problems I'm interested in solving, with motivating examples, and then describes how the new approach would solve these problems.

Goals in writing this down

  1. Get feedback from developers of client and server software for Mastodon and Mastodon-like systems
  2. Get buy-in from client developers that they would support this in their apps if adopted
  3. Get buy-in from one or more server developers to support this in their servers

Problems

The supported API is not easily discoverable

Changes are made to the Mastodon API in a manner that is not easily discoverable by clients.

For example, mastodon/mastodon pull request #25509 ("Add POST /api/v1/conversations/:id/unread", by ClearlyClaire) adds a new API endpoint (/api/v1/conversations/:id/unread).

The only way a client can discover that this API exists is to maintain, per-client, a mapping between Mastodon server version and the API supported at each version.

This is:

  1. A lot of work for each client
  2. Something that every client needs to do
  3. Easy to get wrong
  4. Unable to scale across the multitude of different servers

No standard way for Mastodon servers to advertise that some functionality is disabled

The Instance information contains a configuration block that has some, but not all the information necessary to determine the features a server supports.

Other servers have extended this information in incompatible ways (e.g., the pleroma block).

No standard way for Mastodon-like servers to advertise their functionality to clients

Mastodon-like servers implement some or all of the Mastodon API.

In many cases they also extend the API, providing additional functionality (local-only posting, quoting, markdown formatting, bookmarks, etc.)

In some cases that functionality has already been incorporated in Mastodon (e.g., bookmarks); in other cases there are plans to include that functionality in Mastodon (e.g., quoting, markdown formatting).

This leads to three problems.

  1. There is no simple way for clients to know which parts of the Mastodon API the server supports
  2. There is no simple way for clients to know if the server supports additional operations
  3. If Mastodon decides to implement an API that was first introduced in a Mastodon-like server there is no way for clients to detect this, without recompiling the client with new information about what features a given Mastodon server version implements

Server developers have too much to do

Server developers already have a lot of work to do. Any proposal should therefore be straightforward to implement. Additional complexity, such as changing the contents of existing API responses, or requiring developers of different servers to tightly coordinate when new functionality is introduced, is going to make it less likely that groups adopt any proposed solutions.

Proposed solution

A given Mastodon or Mastodon-like server supports a set of operations.

To expose those to the user a Mastodon client needs to know:

  • Which operations does the server support?
  • What's the overlap between the operations the server supports and the operations the client supports?

Therefore we need:

  1. A unique identifier for each operation that a set of servers supports identically
  2. A mechanism for a server to report the operations it supports

A unique identifier for each operation

Operations are named after the reverse FQDN of the server software that first implemented that operation, then an arbitrary number of dot-separated components determined by the server authors.

This ensures that operation IDs are unique without needing tight coordination between different server developer groups.

For example:

  • org.joinmastodon.api.statuses.post
  • org.joinmastodon.api.statuses.translate
  • io.github.glitch-soc.api.statuses.bookmark
  • dev.iceshrimp.api.notes.reactions.create

[!NOTE] Precise reverse FQDN to use for each server is to be decided

These examples use the reverse FQDNs for the servers' primary websites or documentation sites, but each server group would determine and document the reverse FQDN for their server's operations.

[!NOTE] Dot-separated components do not have to map 1:1 to API endpoint components

In these examples the dotted components after the api correspond to the path components of the API endpoint, but there is no requirement that they do so.
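As a rough illustration of the naming scheme, the sketch below builds an operation ID from a server project's domain and a list of components. The function name `operation_id` and its validation rules are assumptions made for this example; the proposal does not define an API for constructing IDs.

```python
def operation_id(domain: str, *components: str) -> str:
    """Build an operation ID: reversed domain labels, then dot-separated components.

    Hypothetical helper for illustration only; not part of the proposal.
    """
    labels = [label for label in domain.lower().split(".") if label]
    if len(labels) < 2:
        raise ValueError(f"expected a fully qualified domain name, got {domain!r}")
    if not components or any(not c or "." in c for c in components):
        raise ValueError("components must be non-empty and must not contain dots")
    return ".".join(list(reversed(labels)) + list(components))


print(operation_id("joinmastodon.org", "api", "statuses", "post"))
print(operation_id("glitch-soc.github.io", "api", "statuses", "bookmark"))
```

Because the prefix comes from a domain the server group controls, two groups can mint IDs independently without colliding.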

Each operation has one or more versions

Each operation exists at one or more semver-compatible (v2.0.0) versions. Semver is used because it is a widely deployed standard, easily understandable, and client libraries that can parse this format are available across many different programming languages.

For example, in the Mastodon API documentation "Post a new status" describes the API for posting a new status. That API has existed in three forms in the Mastodon server implementation.

  1. Initial implementation
  2. Support for scheduled_at
  3. Support for poll

There are no backwards-incompatible breaking changes across those versions so this is the same operation at three different versions; per semver the major version stays the same and the minor version is incremented.

  • 1.0.0 - initial implementation
  • 1.1.0 - support for scheduled_at
  • 1.2.0 - support for polls

[!IMPORTANT] These version numbers are unrelated to the version number of the software that introduced the operation
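The compatibility rule implied above (same major version, equal or higher minor/patch) can be sketched as follows. This is a minimal illustration that only handles plain MAJOR.MINOR.PATCH strings; the names `Version`, `parse_version`, and `satisfies` are invented for this example, and a real client would more likely use an existing semver library.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build metadata)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def satisfies(offered: Version, required: Version) -> bool:
    """True if a server offering `offered` can serve a client that needs `required`."""
    # Tuples compare field by field, so with equal majors this compares minor, then patch.
    return offered.major == required.major and offered >= required


# The "post a status" history from the text above.
history = {
    "1.0.0": "initial implementation",
    "1.1.0": "support for scheduled_at",
    "1.2.0": "support for polls",
}
client_needs = parse_version("1.1.0")  # a client that relies on scheduled_at
for text in history:
    print(text, satisfies(parse_version(text), client_needs))
```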

Example: Bookmarks

Bookmarking statuses originated in the glitch-soc fork and was incorporated into Mastodon.

Therefore, the IDs for the bookmark operations -- if they are compatible with the glitch-soc implementation -- use the io.github.glitch-soc.api prefix.

  • io.github.glitch-soc.api.statuses.bookmark @ 1.0.0 - bookmark a status
  • io.github.glitch-soc.api.statuses.unbookmark @ 1.0.0 - remove a status from bookmarks
  • io.github.glitch-soc.api.timeline.bookmarks @ 1.0.0 - fetch a timeline of the user's bookmarks
  • io.github.glitch-soc.api.timeline.bookmarks @ 1.1.0 - fetch a timeline of the user's bookmarks, supporting min_id and max_id simultaneously

Client discovery of supported operations and endpoints

Clients must be able to discover which operations the server supports and the endpoints to use for those operations.

To do this the nodeinfo schema (the nodeinfo document is located via /.well-known/nodeinfo) should be extended to support a new clientApis property.

The property's value is a map from a string key -- the operation ID -- to a set of one or more semver versions of the operation that the server supports.

For example:

"clientApis": {
    ...
    "org.joinmastodon.api.some.operation": ["1.0.0", "1.1.0", "1.2.0", "2.0.0"]
    ...
}

[!NOTE] Unordered versions

The list of supported versions is not ordered; client code should treat this as a set, not a list.
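A client might read the property along these lines. This sketch assumes clientApis sits at the top level of the nodeinfo document, which the proposal does not pin down; the sample document and the function name `parse_client_apis` are hypothetical.

```python
import json


def parse_client_apis(nodeinfo_json: str) -> dict[str, frozenset[str]]:
    """Return {operation ID: set of version strings} from a nodeinfo document.

    Assumes `clientApis` is a top-level property (an assumption of this sketch).
    A server that does not report the property yields an empty map.
    """
    document = json.loads(nodeinfo_json)
    raw = document.get("clientApis", {})
    # Versions are stored as a set: order and duplicates carry no meaning.
    return {op_id: frozenset(versions) for op_id, versions in raw.items()}


# Hypothetical nodeinfo response; versions deliberately unordered and duplicated.
sample = """
{
  "version": "2.1",
  "software": {"name": "example", "version": "0.0.1"},
  "clientApis": {
    "org.joinmastodon.api.some.operation": ["2.0.0", "1.2.0", "1.0.0", "1.1.0", "1.2.0"]
  }
}
"""
apis = parse_client_apis(sample)
print(sorted(apis["org.joinmastodon.api.some.operation"]))
```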

[!NOTE] Not limited to Mastodon / Mastodon-like servers

This clientApis map is not limited to operations supported by Mastodon/Mastodon-like servers. This is a general mechanism that can be used by servers to expose information about their supported operations and could be used by other Fediverse software like Lemmy, KBin, etc.

Because of the semver rules for breaking changes, servers may omit earlier versions from the list if they are included in a later version. In the previous example the 1.0.0 and 1.1.0 versions can be omitted, as a server supporting v1.2.0 of an operation implicitly supports all preceding versions with the same major number.

"clientApis": {
    ...
    "org.joinmastodon.api.some.operation": ["1.2.0", "2.0.0"]
    ...
}
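On the server side, the shortened list could be derived mechanically by keeping the highest version for each major number. The helper below is a sketch under that reading of the rule; `abbreviate` is a name invented here, and it only handles plain MAJOR.MINOR.PATCH strings.

```python
def abbreviate(versions: list[str]) -> list[str]:
    """Keep only the highest version for each major number."""
    highest: dict[int, tuple[int, int, int]] = {}
    for text in versions:
        major, minor, patch = (int(part) for part in text.split("."))
        candidate = (major, minor, patch)
        if major not in highest or candidate > highest[major]:
            highest[major] = candidate
    return [".".join(str(n) for n in highest[major]) for major in sorted(highest)]


print(abbreviate(["1.0.0", "1.1.0", "1.2.0", "2.0.0"]))  # ['1.2.0', '2.0.0']
```

The result is sorted by major number only for readable output; clients should still treat it as a set.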

[!NOTE] There is no need to specify the operation semantics

The semantics of each {operation, version} pair are already known by the client (for each operation it supports). These semantics include whether the endpoint is GET, POST, DELETE, or PATCH, the exact names of the URL query parameters, the API endpoint, and so on.

In other words, it is not permissible for a server to advertise an existing operation ID and change anything about how that operation works. The server developers should either register a new operation ID, or implement the operation as a new version (bumping the major version if it is a breaking change).

To deploy this...

Server developers

Servers where the set of supported operations is not user configurable would need to maintain a static map of operations to versions, and return that map as part of the nodeinfo response.

If the set of operations is user configurable (e.g., perhaps the server software supports a translation API but the server operator has not enabled translation support) the nodeinfo response would need to be dynamically generated from the current software configuration.

In both cases developing a new operation or changing an existing operation would require the developers to:

  1. Determine the operation's version number, following semver backwards-compatible rules
  2. Document the behaviour of the new operation / version
  3. Include the new operation / version in the server's response
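The two server cases above (a static map, and a map that depends on operator configuration) might look like this in outline. The configuration flag `translation_enabled`, the specific versions listed, and the function `build_client_apis` are all hypothetical; real servers would use their own configuration mechanisms.

```python
# Operations that are always available in this (hypothetical) server build.
STATIC_OPERATIONS = {
    "org.joinmastodon.api.statuses.post": ["1.2.0"],
    "io.github.glitch-soc.api.statuses.bookmark": ["1.0.0"],
}

# Operations that exist in the software but depend on an operator setting.
# The flag name is an assumption made for this sketch.
OPTIONAL_OPERATIONS = {
    "translation_enabled": {
        "org.joinmastodon.api.statuses.translate": ["1.0.0"],
    },
}


def build_client_apis(config: dict[str, bool]) -> dict[str, list[str]]:
    """Build the clientApis map from the static set plus any enabled optional operations."""
    apis = {op_id: list(versions) for op_id, versions in STATIC_OPERATIONS.items()}
    for flag, operations in OPTIONAL_OPERATIONS.items():
        if config.get(flag, False):
            apis.update({op_id: list(versions) for op_id, versions in operations.items()})
    return apis


print(sorted(build_client_apis({"translation_enabled": False})))
print(sorted(build_client_apis({"translation_enabled": True})))
```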

Client developers

To provide the best user experience client developers would fetch the operations map when the user logs in.

If the client supports a particular operation at a particular version, it can query the map to determine whether the concrete version it needs is listed, or is satisfied by a higher version with the same major number. Semver client libraries are available for Kotlin and Java (Android) and Swift (iOS), as well as many other languages.

If the server does not support the operation the client can fall back to a different operation, or disable the particular operation in the UI.

To use the example from earlier, mastodon/mastodon pull request #25509 ("Add POST /api/v1/conversations/:id/unread", by ClearlyClaire) adds a new API endpoint (/api/v1/conversations/:id/unread).

The server would report this as:

"clientApis": {
    ...
    "org.joinmastodon.api.conversations.id.unread": ["1.0.0"]
    ...
}

A client that wants to support this conditionally would query the operations map for org.joinmastodon.api.conversations.id.unread, looking for any version entry with a major version of 1. If no such operation/version pair is found, the client disables the "Mark a conversation unread" UI affordances wherever they occur.
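A minimal sketch of that client-side check, assuming the operations map has already been fetched and decoded. The function `has_major` and the two example server maps are hypothetical.

```python
MARK_UNREAD = "org.joinmastodon.api.conversations.id.unread"


def has_major(client_apis: dict[str, list[str]], op_id: str, major: int) -> bool:
    """True if the server lists any version of `op_id` with the given major number."""
    return any(
        int(version.split(".")[0]) == major
        for version in client_apis.get(op_id, ())
    )


# Hypothetical maps for a server with the endpoint and one without it.
new_server = {MARK_UNREAD: ["1.0.0"]}
old_server: dict[str, list[str]] = {}

for name, apis in (("new server", new_server), ("old server", old_server)):
    show_button = has_major(apis, MARK_UNREAD, 1)
    print(name, "-> show 'Mark a conversation unread':", show_button)
```

The same lookup works unchanged for a server that does not report clientApis at all, since an empty map simply disables the affordance.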

Is there a proof of concept?

Yes.

I have started implementing the client side of this in Pachli. At the moment Pachli uses server version parsing heuristics to maintain a Pachli-specific map of operations and supported versions (Server.kt), then queries the server's capabilities and adjusts the UI accordingly.

For example, this snippet conditionally enables the "edit filters" UI only if the user's server supports filtering.

Maintaining the server-specific operations map in Pachli is error prone, slow to update, and does not benefit the wider ecosystem of Mastodon clients and servers, hence this proposal.

This solves...

This solves the problems described earlier:

  • "[[#The supported API is not easily discoverable]]"
    • The client can easily discover the specific operations the server supports, and adjust UX accordingly
  • "[[#No standard way for Mastodon servers to advertise that some functionality is disabled]]"
    • The clientApis property must reflect the active configuration of the server.
  • "[[#No standard way for Mastodon-like servers to advertise their functionality to clients]]"
    • If a Mastodon-like server implements a Mastodon-compatible API endpoint it lists that endpoint using the relevant org.joinmastodon... operation identifier.
  • "[[#Server developers have too much to do]]"
    • This proposal doesn't modify any existing API responses
    • For a given server the list of supported operations can be statically configured, and does not change after the server has launched
    • The work of developing a dictionary of supported operations can be sharded amongst different groups
      • Server developers have a vested interest in contributing details of operations specific to their server, so more third party clients support them
      • Client developers have a vested interest in reviewing and contributing details of operations specific to servers their users use, to make their clients more attractive to potential users
      • No coordination is required between different groups of server developers to develop operation IDs
    • Developers are incentivised to re-use existing operations instead of inventing new ones
      • Implementing an existing operation in a compatible manner with another server increases the speed with which your users will be able to use the feature in their preferred clients.

Not in scope

This proposal doesn't address how clients can discover any limits associated with the operations. For example, how many characters are allowed per post, or the number of options that can be included in a poll.

That information is already included in the server's /api/v2/instance response (in the language of this proposal, the org.joinmastodon.api.instance operation).

I did consider extending the clientApis definition so that each operation mapped to an object that contained multiple keys, like this:

"clientApis": {
    "org.joinmastodon.api.statuses.post": {
        "1.0.0": {
            "endpoint": "/api/v1/statuses",
            "limits": {
                "max_characters": 500,
                // ...
            },
            "mimeTypes": ["text/plain"],
            // ...
        },
        "1.1.0": { /* ... */ }
    }
}

That would significantly complicate this proposal, increasing the risk that it's not adopted. There's also no clear value in doing this.

Alternatives considered

Reporting capabilities alongside operations

It's tempting to think that operations could be broken down into smaller parts.

For example, instead of different versions for the "post a status" operation you could include more specific capabilities in the operation description:

"clientApis": {
    ...
    "org.joinmastodon.api.statuses.post": {
        "contentWarning": true,
        "polls": true,
        "media": true,
        ...
    }
    ...
}

This indicates this server supports the "post a new status" operation with statuses that include content warnings, polls, and media.

This proposal doesn't do that because it results in a combinatorial explosion of the different sub-types of operations that clients need to support, without any significant benefit.

Even the example above is incomplete; for example, some servers support including images in content warnings, so a simple boolean for the contentWarning property is insufficient.

So treating the thing-that-has-to-be-versioned as the operation (post a status, translate, reblog, etc) seems to be the better level of granularity.

Reporting capabilities in API responses

A server could include metadata in each response that contains an object that describes the operations that can be performed on that object. For example, the Status object could be modified to include an operations property that looks like this:

{
  "id": "103270115826048975",
  "created_at": "2019-12-08T03:48:33.901Z",
  ...
  "operations": {
      "org.joinmastodon.api.statuses.reply": ["POST", "https://example.com/api/v1/statuses"],
      "org.joinmastodon.api.statuses.view": ["GET", "https://example.com/api/v1/statuses/103270115826048975"],
      "org.joinmastodon.api.statuses.favourite": ["POST", "https://example.com/api/v1/statuses/103270115826048975/favourite"],
      ... etc
   }
}

This is the Hypermedia as the engine of application state (HATEOAS) model.

It's an interesting approach, and a possible future direction. But it would require significant work on the part of server developers to implement as it would affect every response returned by the server.

On the other hand the approach in this proposal is static content in the nodeinfo response. It's significantly easier to implement and iterate on.

Clients keep a hardcoded server version : capabilities map

This could go the other way, and instead require servers to have a consistent name and parseable version number, and expect clients to keep a map of "server A at version V can perform operations X, Y, and Z".

I think this is the wrong approach because:

  1. It requires every client development team to independently maintain a mapping between server versions and capabilities
  2. It requires client updates whenever a server is released that supports a capability the client already supports on another server

On that last point, a worked example might make it clearer.

Suppose there are two server types, A and B. A supports operations X and Y, B supports X, Y, and Z.

A client is released which supports operations X, Y, and Z, and is hardcoded with knowledge about which server type supports a given operation.

A new version of server type A is released which now supports operation Z as well. But users of the client who connect to server type A cannot benefit from this until a new version of the client is released with updated information about the capabilities of server type A.

With the proposal in this document this problem does not occur; if a client supports operation Z (at a given version) and a server advertises that it supports that operation then the client can choose to use it without needing a new release.
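The worked example can be made concrete with a few lines of code. This sketch ignores versions for brevity and uses the abstract server types and operation names from the text; the function names are invented for illustration.

```python
# What the released client can do, and what it was told at build time.
CLIENT_SUPPORTS = {"X", "Y", "Z"}
HARDCODED_CAPABILITIES = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "Z"},
}


def usable_with_hardcoded_map(server_type: str) -> set[str]:
    """Operations the client will use when it trusts its compiled-in map."""
    return CLIENT_SUPPORTS & HARDCODED_CAPABILITIES.get(server_type, set())


def usable_with_advertised_map(advertised: set[str]) -> set[str]:
    """Operations the client will use when the server advertises what it supports."""
    return CLIENT_SUPPORTS & advertised


# A new release of server type A adds operation Z and advertises it.
advertised_by_new_a = {"X", "Y", "Z"}

print(sorted(usable_with_hardcoded_map("A")))                   # ['X', 'Y']
print(sorted(usable_with_advertised_map(advertised_by_new_a)))  # ['X', 'Y', 'Z']
```

With the hardcoded map the client keeps hiding Z on server type A until the client itself is re-released; with the advertised map Z becomes available as soon as the server reports it.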

This is better for our users.

Return an OpenAPI definition for the supported API

OpenAPI is a popular schema for defining an API. The server could just return the OpenAPI schema for the API that it supports.

I did consider this (an earlier version of this proposal was built around it). But it complicates the data the client needs to process, and includes data that the client will ignore.

Consider the /api/v1/timelines/home endpoint, which would have an operation ID something like org.joinmastodon.api.timelines.home under this proposal.

This is the OpenAPI definition for that endpoint, copied from the GoToSocial project's OpenAPI definition (swagger.yaml, the descriptions have been deleted to keep this a reasonable length):

    /api/v1/timelines/home:
        get:
            description: |-
                The statuses [... deleted ...]
            operationId: homeTimeline
            parameters:
                - description: [deleted]
                  in: query
                  name: max_id
                  type: string
                - description: [deleted]
                  in: query
                  name: since_id
                  type: string
                - description: [deleted]
                  in: query
                  name: min_id
                  type: string
                - default: 20
                  description: [deleted]
                  in: query
                  name: limit
                  type: integer
            produces:
                - application/json
            responses:
                "200":
                    description: Array of statuses.
                    headers:
                        Link:
                            description: [deleted]
                            type: string
                    schema:
                        items:
                            $ref: '#/definitions/status'
                        type: array
                "400":
                    description: bad request
                "401":
                    description: unauthorized
            security:
                - OAuth2 Bearer:
                    - read:statuses
            summary: See statuses/posts by accounts you follow.
            tags:
                - timelines

Most of the information in that definition is redundant for the client.

It's absolutely essential information to have for the server developer, and for producing documentation.

But the client should already have this compiled in. The contract between the client and the server, if the server reports that it supports the org.joinmastodon.api.timelines.home operation at v1.0.0 is that:

  • the valid parameters are max_id, since_id, min_id, and limit.
  • the response is JSON
  • there will be pagination details in the Link header

So returning an OpenAPI definition to the client significantly complicates things for no benefit.

OpenAPI is also endpoint-oriented; by which I mean that the definition leads with the endpoint (/api/v1/statuses) and then describes the single operation that is present at that endpoint.

This is backwards to what we need, where the operation comes first, and multiple operations might be supported at the same endpoint.
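To illustrate the operation-first view, a client could compile in a table keyed by operation ID and look up the request details from there. The table below is a sketch: the method and path pairings are assumptions for illustration (the operation IDs are taken from examples elsewhere in this document), and `by_endpoint` is a hypothetical helper that shows two operations sharing one endpoint.

```python
from collections import defaultdict

# Operation-first: the operation ID is the key, the endpoint is a detail.
# Pairings are illustrative assumptions, not a registry.
OPERATIONS = {
    "org.joinmastodon.api.statuses.post": ("POST", "/api/v1/statuses"),
    "org.joinmastodon.api.statuses.reply": ("POST", "/api/v1/statuses"),
    "org.joinmastodon.api.timelines.home": ("GET", "/api/v1/timelines/home"),
}


def by_endpoint(operations: dict[str, tuple[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Invert the table to show which operations live at each (method, path)."""
    grouped: dict[tuple[str, str], list[str]] = defaultdict(list)
    for op_id, endpoint in operations.items():
        grouped[endpoint].append(op_id)
    return {endpoint: sorted(op_ids) for endpoint, op_ids in grouped.items()}


for endpoint, op_ids in by_endpoint(OPERATIONS).items():
    print(endpoint, op_ids)
```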

Related links / prior art

Not an exhaustive list:
