@deviantony
Last active May 4, 2022 18:14
Swarm aggregation feature discussion

Swarm aggregation specs

Functional requirements

  • Ability to aggregate data from multiple nodes in a Swarm cluster
  • Easy to deploy solution
  • Works on both Windows and Linux platforms (and thus, multi-platform clusters)

Using an agent

Using an agent is an interesting way to go here. We can deploy an agent inside a Swarm overlay network as a global service. Swarm would then schedule the agent on each node of the cluster and automatically create/remove agents when nodes are added to/removed from the cluster.

The agent should act as a proxy to the Docker API (similar to what Portainer currently does via the /endpoints/<ID>/docker endpoint). It should be able to connect to the Docker API of a Docker node using the Unix socket or a TCP URL, with or without TLS (for Windows hosts, named pipe support could be added later on).
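As a minimal sketch of the connection options described above, the following shows how an agent might classify the Docker endpoint it is given. The `DockerEndpoint` type and the `parse_docker_endpoint` helper are hypothetical names introduced for illustration; named pipes are deliberately rejected here since their support is left for later.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class DockerEndpoint:
    """Hypothetical description of how the agent reaches the local Docker API."""
    scheme: str                        # "unix" or "tcp"
    socket_path: Optional[str] = None  # set for unix endpoints
    host: Optional[str] = None         # set for tcp endpoints
    port: Optional[int] = None         # set for tcp endpoints
    use_tls: bool = False              # only meaningful for tcp endpoints


def parse_docker_endpoint(url: str, use_tls: bool = False) -> DockerEndpoint:
    """Turn unix:///path or tcp://host:port into a DockerEndpoint."""
    parsed = urlparse(url)
    if parsed.scheme == "unix":
        if not parsed.path:
            raise ValueError("unix endpoint needs a socket path")
        return DockerEndpoint(scheme="unix", socket_path=parsed.path)
    if parsed.scheme == "tcp":
        if parsed.hostname is None or parsed.port is None:
            raise ValueError("tcp endpoint needs both a host and a port")
        return DockerEndpoint(scheme="tcp", host=parsed.hostname,
                              port=parsed.port, use_tls=use_tls)
    # Named pipes (npipe://) and anything else are not handled in this sketch.
    raise ValueError(f"unsupported Docker endpoint scheme: {parsed.scheme!r}")
```

For example, `parse_docker_endpoint("unix:///var/run/docker.sock")` yields a Unix socket endpoint, while `parse_docker_endpoint("tcp://10.0.0.5:2376", use_tls=True)` yields a TLS-enabled TCP endpoint.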

Communications between Portainer and the agents must be secured/encrypted. An authentication mechanism is also required to ensure that not just anybody can query the Docker API through the agent.

The agent should also be deployable on a standalone Docker host, which would simplify the management of standalone hosts as well (for example, there would be no need to expose the Docker API anymore).

Potential evolutions:

  • The agent could be used to expose node metrics (disk, CPU, MEM usage...)

Possible implementations

Have a look at the following files for possible implementations.

Things to investigate

  • How can the agent communicate with the Docker API on Windows hosts? On Linux, each agent could communicate with the Unix socket, but on Windows there is no equivalent (named pipe support is currently a WIP in Portainer and only available on Windows 1709: portainer/portainer#1186). If the Docker API is exposed on each node via TCP, how can the agent determine the correct IP:PORT to reach it?

Cluster of agents

This implementation relies on the agent being able to auto-discover other agents. Deployed as a global service inside a Swarm cluster, each agent automatically discovers the other agents in the cluster and registers them.
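The discovery mechanism is not fixed by this spec. One option worth considering, sketched below, relies on Swarm's embedded DNS, which resolves `tasks.<service-name>` to the addresses of all running tasks of a service on a shared overlay network. The `discover_peers` function, the service name and the port used in the example are assumptions for illustration; the `resolver` parameter only exists so that the lookup can be replaced in tests.

```python
import socket
from typing import Callable, List


def discover_peers(service_name: str, port: int,
                   resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return sorted, de-duplicated "ip:port" strings, one per agent task."""
    # Each record is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) tuple.
    records = resolver(f"tasks.{service_name}", port,
                       family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses = {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in records}
    return sorted(f"{ip}:{port}" for ip in addresses)
```

An agent started inside the overlay network could call `discover_peers("portainer-agent", 6000)` on startup and then periodically, to pick up nodes joining or leaving the cluster.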

Portainer can then be connected to any of these agents, either through DNS-SRV records (to ensure high availability) or through the URL of a specific agent. To do so, a user would just need to create a new endpoint and enter the IP:PORT of one of the agents in the cluster (or use the Swarm service name to take advantage of DNS-SRV records).

The agent would be responsible for the following:

  • Aggregate the data of multiple nodes (list the containers available in the cluster for example)
  • Redirect requests to specific nodes in the cluster (inspect a container on a specific node or create a new secret via a cluster manager for example)

This has the advantage of being totally transparent from Portainer's point of view. Changes in the Portainer codebase would be limited. For example, when querying the list of containers, Portainer would simply forward the /containers/json request to the agent, which would take care of aggregating the data and returning the response.
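To illustrate the aggregation step, here is a minimal sketch that merges the /containers/json responses collected from several nodes into a single list. Each container object keeps its original Docker API fields and only gains one extra key identifying its node. The `aggregate_containers` name and the `Portainer.Agent.NodeName` key are hypothetical choices, not a defined format.

```python
from typing import Dict, List


def aggregate_containers(per_node: Dict[str, List[dict]]) -> List[dict]:
    """Merge per-node /containers/json payloads into one decorated list."""
    merged: List[dict] = []
    for node_name in sorted(per_node):
        for container in per_node[node_name]:
            decorated = dict(container)  # shallow copy, original fields are kept as-is
            # Hypothetical decoration: record which node the container lives on.
            decorated["Portainer"] = {"Agent": {"NodeName": node_name}}
            merged.append(decorated)
    return merged
```

Because the result is still a JSON array of container objects, a client written against the Docker API should be able to consume it without changes and simply ignore the extra key.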

On startup, the agent should do the following:

  • Retrieve information about the Docker engine where the agent is running: is it a Swarm manager? What version of the API is it using?
  • Auto-discover the other agents inside the network where the agent has been started and register them
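The first startup step can be sketched as follows, assuming the agent has already fetched the JSON bodies of the Docker API `/info` and `/version` endpoints. The `describe_engine` helper and the shape of the dictionary it returns are illustrative assumptions; the `Swarm.NodeID`, `Swarm.LocalNodeState`, `Swarm.ControlAvailable` and `ApiVersion` fields come from the Docker API responses.

```python
def describe_engine(info: dict, version: dict) -> dict:
    """Summarise the local engine from the /info and /version payloads."""
    swarm = info.get("Swarm", {})
    return {
        "node_id": swarm.get("NodeID", ""),
        # "active" means the engine is currently part of a Swarm cluster.
        "in_swarm": swarm.get("LocalNodeState") == "active",
        # ControlAvailable is true only on manager nodes.
        "is_manager": bool(swarm.get("ControlAvailable", False)),
        "api_version": version.get("ApiVersion", ""),
    }
```

This summary is the kind of metadata each agent could advertise to its peers once they have been discovered.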

When querying the Docker API via an agent, some queries should be intercepted and rewritten/redirected. Some examples are:

  • GET /containers/json: The agent should execute that request against all the existing nodes and aggregate the data into a new response object.

IMPORTANT: When aggregating data, the response must be as close as possible to the Docker API response and thus the agent should only decorate the response and not create a different response object. We don't want to create a new Docker API and should stay compatible.

  • POST /services: The agent should redirect the request to a manager node inside the cluster as this query can only be executed on manager nodes.

  • GET /containers/<ID>/json: The agent should redirect the request to the node where the container is located. To do so, a reference to the node where the container is located can be passed inside an HTTP header.
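The three cases above amount to a routing decision made for each incoming request. A minimal sketch of that decision is given below. The `X-Agent-Target` header name, the `route_request` function and the returned labels are all hypothetical; the sketch also assumes that `path` carries no query string and that secrets, like services, are manager-only.

```python
import re
from typing import Mapping, Optional, Tuple

TARGET_HEADER = "X-Agent-Target"                   # hypothetical header naming the target node
MANAGER_ONLY_PREFIXES = ("/services", "/secrets")  # requests that only managers can serve


def route_request(method: str, path: str,
                  headers: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (action, target_node) for a proxied Docker API request."""
    # Drop an optional API version prefix such as /v1.35.
    path = re.sub(r"^/v\d+\.\d+(?=/)", "", path)
    # HTTP header names are case-insensitive.
    target = next((value for name, value in headers.items()
                   if name.lower() == TARGET_HEADER.lower()), None)
    if method == "GET" and path == "/containers/json":
        return ("aggregate", None)   # fan out to every node and merge the results
    if path.startswith(MANAGER_ONLY_PREFIXES):
        return ("manager", None)     # forward to any manager node
    if target:
        return ("node", target)      # forward to the node named in the header
    return ("local", None)           # serve from the local Docker engine
```

Keeping the decision in a pure function like this makes the interception rules easy to test independently of the proxying itself.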

The advantages of this solution are the following:

  • Simple to deploy, just deploy a global agent service inside an overlay network
  • Highly-available, you can connect the Portainer endpoint to any agent located on any node inside your Swarm cluster (even more HA when using DNS-SRV)
  • Portainer does not need to be connected to a Swarm manager anymore: the agent takes care of redirecting the requests to specific nodes in the cluster.
  • Transparent: few changes are required in the Portainer codebase. UAC is still managed inside the Portainer API.

Example usage:

  • Create your Docker Swarm cluster
  • Create a new overlay network inside the cluster: docker network create --driver overlay portainer_agent
  • Deploy the agent as a global service inside the cluster: docker service create --mode global --network portainer_agent portainer/agent (whether a port should be published depends on whether we use the DNS-SRV approach)
  • Within Portainer, create a new endpoint and either enter the IP:PORT of an agent as the endpoint URL or use the service name for the DNS-SRV approach.

Example Compose:

version: "3"

services:
  portainer-agent:
    image: portainer/agent
    # ports:
    #  - "6000:6000"
    deploy:
      mode: global
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    networks:
      - portainer
  portainer:
    deploy:
      replicas: 1
    image: portainer/portainer
    volumes:
      - "/path/to/data:/data"
    networks:
      - portainer
      
networks:
  portainer:

Technical details

  • Communications between Portainer and the agents are one-way only (that is, Portainer -> agents). At no time should an agent need to communicate with the Portainer instance.
  • Memberlist or even Serf can be used on each agent to register/manage the list of agents (with metadata such as role in the Swarm cluster, API version...).
  • No socat-style service needs to be deployed: the agent itself should act as a reverse proxy to the Docker API.
  • The agent should be isolated from the Portainer codebase and have its own Docker image (portainer/agent).
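To make the shape of the member list more concrete, here is a plain in-memory stand-in written for illustration only. It does not use or mirror the Memberlist or Serf APIs; `AgentInfo`, `AgentRegistry` and their fields are hypothetical names showing the kind of metadata each agent might keep about its peers.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AgentInfo:
    """Hypothetical metadata advertised by one agent."""
    address: str      # "ip:port" where the agent listens
    node_name: str    # Swarm node the agent runs on
    is_manager: bool  # role of that node in the Swarm cluster
    api_version: str  # Docker API version of the local engine


class AgentRegistry:
    """In-memory view of the cluster, as held by a single agent."""

    def __init__(self) -> None:
        self._members: Dict[str, AgentInfo] = {}

    def join(self, agent: AgentInfo) -> None:
        self._members[agent.node_name] = agent

    def leave(self, node_name: str) -> None:
        self._members.pop(node_name, None)

    def by_node(self, node_name: str) -> Optional[AgentInfo]:
        return self._members.get(node_name)

    def any_manager(self) -> Optional[AgentInfo]:
        """Pick a manager deterministically (lowest node name), if one exists."""
        managers = sorted((a for a in self._members.values() if a.is_manager),
                          key=lambda a: a.node_name)
        return managers[0] if managers else None
```

With such a view, redirecting a manager-only request comes down to `any_manager()`, and redirecting a request for a specific node comes down to `by_node(...)`.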

Things to investigate

  • How to secure the communications between Portainer and the agents
  • Authentication mechanism to prevent unauthenticated queries
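Both points are still open. As one option that could be explored for the authentication part, the sketch below signs each request with a shared secret using HMAC-SHA256 and rejects stale timestamps to limit replays. The function names, the signed message layout and the 30 second window are assumptions; this only authenticates requests, so encryption would still have to come from something like TLS.

```python
import hashlib
import hmac


def sign_request(secret: bytes, method: str, path: str, timestamp: int) -> str:
    """Compute a hex HMAC-SHA256 signature over method, path and timestamp."""
    message = f"{method}\n{path}\n{timestamp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str, timestamp: int,
                   signature: str, now: int, max_skew: int = 30) -> bool:
    """Accept the request only if it is recent and correctly signed."""
    if abs(now - timestamp) > max_skew:
        return False  # too old or too far in the future
    expected = sign_request(secret, method, path, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

In this scheme Portainer would compute the signature and send it along with the timestamp, for instance in request headers, and the agent would call `verify_request` before proxying anything to the Docker API.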