@samba
Last active January 25, 2024 04:20

Principles of API Design

Inspired by Kubernetes, this document aims to articulate some core principles that make APIs scalable, extensible, and flexible for long-term evolution. Hopefully these concepts will be useful to you in designing your next application.

This is a living document. Please feel free to comment with ideas/feedback.

Adopt the Actor Model

The actor model allows for strict decoupling, and reduces the complexity of extending or adapting a system.

At its core, Kubernetes implements an actor model paradigm using a single API server architecture. Controllers observe data types in the API server and act upon the new or modified resource, or on other resources in their domain. Each time a change is introduced on a data resource in the API, the controller reconciles the new (desired) state against the previous state, and makes any necessary changes in its domain. The domain of a controller lies either within Kubernetes, acting on other data resources in the API, or outside of Kubernetes, such as creating a virtual machine or storage bucket.

This model allows a single architecture to grow and evolve within a single application domain, and to be extended into new domains as new data types are created and new controllers are introduced to interact with them.

In Kubernetes, each data type is the responsibility of a dedicated component, and components interact only through a single, centralized data model. This allows new data types and new components to be introduced without directly impacting existing system behaviors.

Each actor should have strict, narrowly defined responsibilities for a limited number of data types; ideally a single data type per actor. In practice, an actor may consume one data type as input and produce another as output.

Exactly one actor (controller) should be responsible for the lifecycle of a given data type, and it should be concerned with reflecting observed conditions in the status of the data resource. Only this actor is responsible for modifying the status of a resource.
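As a sketch of this pattern (all names here are hypothetical, not Kubernetes APIs), a central store can notify registered controllers of changes to their data type, with each controller acting as the sole writer of that type's status:

```python
# Minimal actor-model sketch: controllers interact only through a central
# store, and each data type has a single controller that owns its status.
from collections import defaultdict

class Store:
    """Central API store; the only point of interaction between actors."""
    def __init__(self):
        self.resources = {}                # (kind, name) -> resource dict
        self.watchers = defaultdict(list)  # kind -> [controller callbacks]

    def register(self, kind, controller):
        self.watchers[kind].append(controller)

    def apply(self, kind, name, spec):
        resource = {"kind": kind, "name": name, "spec": spec, "status": {}}
        self.resources[(kind, name)] = resource
        for controller in self.watchers[kind]:  # notify observers of the change
            controller(resource)
        return resource

store = Store()

def bucket_controller(resource):
    # Sole owner of Bucket status: reflect the observed condition.
    resource["status"]["ready"] = True

store.register("Bucket", bucket_controller)
created = store.apply("Bucket", "logs", {"region": "us-east-1"})
```

A new data type and controller can be added by another `register` call, without touching existing controllers.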

All Actors are Reconcilers

Reconciliation should drive the actors' actions upon their domains: observing differences between the desired state and the prior state, and making the changes needed to achieve the desired state. To exert changes correctly within their domains, many actors/controllers should functionally be level-triggered.

Reconciliation occurs after the resource is stored by the API server to its backing data store.

The status of a data resource must present a well-known structure, so that other observers of that data type can behave correctly based on the lifecycle of each resource dependency, and new observers can easily be created to integrate them.

Observers of a data resource should be edge-triggered, so that any modification is correctly handled.
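A level-triggered reconciler can be sketched as a function from desired and observed state to a set of operations; the hypothetical `reconcile` below converges a domain toward the desired state regardless of which event woke it up:

```python
# Level-triggered reconciliation sketch (hypothetical domain of "VMs"):
# compare desired state to actual state and emit only the needed changes.

def reconcile(desired, actual):
    """Return the operations needed to move `actual` toward `desired`."""
    ops = []
    for name, spec in desired.items():
        if actual.get(name) != spec:
            ops.append(("apply", name, spec))   # create or update
    for name in actual:
        if name not in desired:
            ops.append(("delete", name))        # prune what should not exist
    return ops

desired = {"vm-a": {"cpus": 2}, "vm-b": {"cpus": 4}}
actual = {"vm-a": {"cpus": 1}, "vm-c": {"cpus": 8}}
plan = reconcile(desired, actual)
```

Because the function looks at current levels rather than individual events, a missed or duplicated notification still converges to the same plan.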

Validation and Mutation: admission webhooks

When a resource is created in Kubernetes (corresponding to an HTTP POST), the API server passes it through two stages:

  1. Mutation: the resource is passed to zero or more mutation webhooks that are registered for that type. The result of these mutations is regarded as the authoritative resource representation to store.
  2. Validation: the resource is passed to zero or more validation webhooks that are registered for that type. These are responsible for throwing errors if a resource is somehow invalid, or otherwise approving it to be stored.

A defaulting webhook is a special case of a mutation webhook, which provides default values for a resource where its schema alone cannot express them, or where the defaults should be explicitly encoded onto the representation in the API. This is considered a best practice, so that future changes to default values will not implicitly affect earlier uses of an API.

Both stages occur before the resource is stored and admitted to the reconciliation process.
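The two stages can be sketched as a simple pipeline (the handler names are hypothetical): mutators, including a defaulting step, run first and produce the authoritative representation; validators then approve or reject it:

```python
# Admission pipeline sketch: mutation webhooks first (their output is what
# gets stored), then validation webhooks, which raise to reject.

def default_replicas(resource):
    resource.setdefault("replicas", 1)  # encode the default explicitly
    return resource

def validate_replicas(resource):
    if resource["replicas"] < 0:
        raise ValueError("replicas must be non-negative")

def admit(resource, mutators, validators):
    for mutate in mutators:
        resource = mutate(resource)  # result becomes the stored representation
    for validate in validators:
        validate(resource)           # raises to reject admission
    return resource                  # ready to store and reconcile

stored = admit({"name": "web"}, [default_replicas], [validate_replicas])
```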

Access Control

Kubernetes' role-based access control brilliantly corresponds to the actor model. Each actor has a discrete identity in the system (a service account, in Kubernetes). Permissions are declared using roles. Actors are granted permissions by binding their identity to a role.

Notably, Kubernetes' authentication capabilities are oriented primarily toward delegated authentication with other systems via OIDC, SAML, etc. For service accounts, Kubernetes can use token-based and certificate-based authentication; the component identity is encoded into a certificate signed by Kubernetes' CA. In practice, this means Kubernetes maintains no concept of user accounts, and relies only on certificate validation for system components.
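The role/binding model can be sketched in a few lines (the role and identity names here are hypothetical):

```python
# RBAC sketch mirroring the actor model: each actor has an identity, roles
# declare permissions, and bindings grant a role to an identity.

roles = {
    "bucket-editor": {("Bucket", "create"), ("Bucket", "update")},
    "bucket-viewer": {("Bucket", "get")},
}
bindings = {
    "bucket-controller": ["bucket-editor"],  # service-account identity -> roles
    "dashboard": ["bucket-viewer"],
}

def allowed(identity, kind, verb):
    """Check whether any role bound to this identity grants (kind, verb)."""
    return any((kind, verb) in roles[r] for r in bindings.get(identity, []))
```

Because permissions attach to roles rather than identities, a new actor is authorized by adding one binding, not by editing permission lists.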

Namespacing

Namespacing is critical in multi-tenant environments. In Kubernetes, each application is practically a tenant, and therefore should be isolated to ensure that resource names will not conflict. This is also an ease-of-use benefit for end-users, allowing them to name resources at will, without fighting the system for uniqueness.

When designing systems, it is often wise to implement namespace isolation by default, even when only a single namespace is expected initially. This allows general API behaviors to scale easily in the future.

Because Kubernetes aligned its access control model to use namespaces as the scope of permissions, isolating clients to a specific namespace is very simple. This ensures reliable enforcement of security configuration, in a relatively intuitive logical model.
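A minimal sketch of namespace-scoped naming, assuming a flat key-value store: the namespace is part of each resource's key, so two tenants can reuse the same name without conflict:

```python
# Namespacing sketch (hypothetical data): names are unique only within a
# namespace, because the namespace is part of the storage key.

resources = {}

def put(namespace, kind, name, spec):
    resources[(namespace, kind, name)] = spec

# Two tenants both use the name "db" without conflict.
put("team-a", "Deployment", "db", {"image": "postgres:16"})
put("team-b", "Deployment", "db", {"image": "mysql:8"})
```

The same `(namespace, ...)` key structure is what makes namespace-scoped permissions simple to enforce: an access check only needs to compare the client's granted namespace against the key.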

Schema Versioning

Kubernetes resource types are declared using a group-version-kind (GVK) taxonomy. Each data type is named (given a "Kind"), and belongs to a versioned group that coevolves.

For example, in Kubernetes:

apiVersion: apps/v1  # this is the group-version
kind: Deployment     # this is the kind
metadata:
  name:  test        # the resource name; unique within namespace
  namespace:  test   # the namespace in which this resource resides
spec:
  ... # desired state (properties here are indicated below as "fields")
status:
  ... # indicates the current state within the domains of relevant controllers.

An API group is expected to change over time, and therefore declares a version.

Data resources declare their own GVK, which specifies how they will be marshalled into a data structure; it indicates their schema. The stored representation of the resource includes this schema information; the API server is responsible for storing the resource after validating its content against its schema. Additional validation may be provided using a validating webhook, which is performed before the resource is stored.

When controllers load a resource from storage, the resource may first be transparently upgraded (or, as needed, downgraded) to the group-version the controller supports. The controller requests a resource, represented with a given version, and the resource may be restructured to meet the requested schema version. The controller receives a representation in the schema it expects.

The upgrade process of a data model is defined in a separate component, which is called by the API server when loading a resource, to convert it from its stored schema to the expected schema of each controller that reads it. The stored representation does not change until the resource is rewritten.
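Version conversion can be sketched as a lookup of converter functions keyed by (stored version, requested version); the schemas and the field rename below are hypothetical, not real Kubernetes APIs:

```python
# Conversion sketch: the stored representation keeps its own version; a
# converter upgrades it to the version the controller requests, without
# rewriting what is stored.

def convert_v1alpha1_to_v1(resource):
    # Suppose v1 renamed the field `size` to `replicas`.
    spec = dict(resource["spec"])          # copy; leave the stored form intact
    spec["replicas"] = spec.pop("size")
    return {"apiVersion": "apps/v1", "kind": resource["kind"], "spec": spec}

CONVERTERS = {("apps/v1alpha1", "apps/v1"): convert_v1alpha1_to_v1}

def get(stored_resource, requested_version):
    """Return the resource in the schema version the caller expects."""
    if stored_resource["apiVersion"] == requested_version:
        return stored_resource
    key = (stored_resource["apiVersion"], requested_version)
    return CONVERTERS[key](stored_resource)

stored = {"apiVersion": "apps/v1alpha1", "kind": "Widget", "spec": {"size": 3}}
view = get(stored, "apps/v1")
```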

The status of a resource is technically treated as an independent API artifact; this is important, because it allows for mutation events to be isolated, so that controllers can respond only when the desired state has changed, and their updates to the status artifact will not be misinterpreted as desired-state changes.
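One way to sketch this isolation, assuming resources carry separate spec and status fields: a controller resyncs only when the spec differs, so its own status writes are not misread as new desired state:

```python
# Sketch of spec/status isolation (hypothetical event shape): only changes
# to the desired state (spec) should trigger reconciliation.

def spec_changed(old, new):
    return old["spec"] != new["spec"]  # ignore status-only differences

old = {"spec": {"replicas": 2}, "status": {"ready": 1}}
status_update = {"spec": {"replicas": 2}, "status": {"ready": 2}}
spec_update = {"spec": {"replicas": 3}, "status": {"ready": 2}}
```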

Controllers (actors) can be versioned independently from the APIs they support, provided they maintain the behavioral compatibility guarantees associated with the API version.

Upgradability

To facilitate long-term evolution of APIs, strict conventions are defined on API versioning:

  • Alpha
    • Any alpha version may add optional fields at any time.
    • Alpha APIs may remove any field when the version is bumped to the next alpha.
    • An upgrade path is not required, but strongly encouraged, between alpha versions.
    • An alpha version must be supported for at least 1 release period.
    • No compatibility guarantees are made for clients.
  • Beta
    • Beta versions may add optional fields when the version is bumped to the next beta.
    • Beta versions must not remove required fields, and must not change the resulting behavior of any required field.
    • Upgrade paths are required between beta versions.
    • A beta version must be supported for the greater of 9 months or 3 release periods, and requires a 6 month deprecation notice.
    • Multiple beta API versions will be supported simultaneously, to support upgrades.
  • GA
    • GA versions are supported indefinitely. Long-term compatibility is guaranteed, as third-party clients, or other system components, will rely heavily on GA APIs for business-critical operations.
    • Deprecating a GA API requires a 1-year deprecation notice, and comprehensive API migration for all supported use-cases to a replacement API that reaches GA before the start of the 1-year deprecation cycle.

When we say an API version is supported, this means clients can still interact with the system using these APIs. There may be newer APIs doing something better or differently in the same domain, but the API they're built against is reliable for the given period.

Upgradability is achieved by:

  • Overlapping the support periods of two versions of the API; this is required for Beta and GA.
    • Data model upgrade automation is required for Beta and GA versions.
  • A deprecated API version must still be supported for at least 1 release during which its successor is also supported.

Applying Principles outside of Kubernetes

Using these principles outside of Kubernetes will likely feel like reimplementing Kubernetes itself. This is perfectly fine when the needs of the application meaningfully differ from Kubernetes' purpose.

It is worth noting, however, that Kubernetes can be extended very easily, and therefore should be considered as an application server if your needs align with its core behavior.

Suppose your application needs to:

  • Poll data from a network socket, from a number of clients.
  • Map that data into one of several message types.
  • Route the messages to appropriate handling logic.

To leverage these principles, your code would:

  1. Implement a central polling mechanism, responsible for buffering messages.
  2. Read a message data type from the message itself, before marshalling the rest of it (as a data structure).
  3. Find the appropriate schema in a dictionary of types; then marshal the message into the given schema.
  4. Pass the marshaled data type and client identification to an authorization handler, responsible for ensuring that the client has permission to pass that type of message in that namespace.
  5. Pass the marshaled data structure to the admission handlers registered for that type, converting between type-versions as needed for the schema supported by each handler.
  6. Store the resulting representation to persistent storage.
  7. Pass the resource in marshalled form to all controller (actor) handlers registered to observe that type, converting between type-versions as needed for the schema supported by each handler.

This requires:

  1. A versioned type registry, which declares schemata, admission handlers and controller handlers.
    • In some architectures these handlers could simply be function pointers.
    • In Kubernetes they're webhooks, i.e. URL callbacks to other servers.
  2. A control loop governed by a central API manager.
  3. A socket managed solely by that central API manager.
  4. Handlers that concern themselves with marshaled data types, not sockets.
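Putting the steps together, a minimal sketch of such a pipeline might look like the following (all names, message shapes, and handlers are hypothetical):

```python
# End-to-end sketch: a versioned type registry, an authorization check,
# admission handlers, a store, and controller dispatch, driven centrally.

registry = {}  # (kind, version) -> {"admit": [...], "controllers": [...]}
storage = {}   # (namespace, kind, name) -> stored resource
grants = {("client-1", "Note", "team-a")}  # (client, kind, namespace) permitted

def register(kind, version, admit=(), controllers=()):
    registry[(kind, version)] = {"admit": list(admit),
                                 "controllers": list(controllers)}

def handle(client, message):
    kind, version = message["kind"], message["version"]     # step 2: read the type
    handlers = registry[(kind, version)]                    # step 3: look up schema
    if (client, kind, message["namespace"]) not in grants:  # step 4: authorize
        raise PermissionError(f"{client} may not send {kind}")
    resource = message
    for admit in handlers["admit"]:                         # step 5: admission
        resource = admit(resource)
    storage[(resource["namespace"], kind, resource["name"])] = resource  # step 6
    for controller in handlers["controllers"]:              # step 7: dispatch
        controller(resource)
    return resource

seen = []  # records what the hypothetical controller observed

def default_priority(resource):
    # Defaulting-webhook analogue: encode the default explicitly.
    spec = {"priority": "low", **resource["spec"]}
    return {**resource, "spec": spec}

register("Note", "v1", admit=[default_priority], controllers=[seen.append])
handle("client-1", {"kind": "Note", "version": "v1", "namespace": "team-a",
                    "name": "n1", "spec": {}})
```

In-process function references stand in for what Kubernetes implements as webhooks; the structure of the loop is the same either way.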