@coltonbh
Last active May 6, 2024 21:04
Docker Swarm GPU Support

GPU Support For Docker Swarm

Docker compose has nice support for GPUs, and K8s has moved its cluster-wide GPU scheduler from experimental to stable status. Docker swarm does not yet support the device option used in docker compose, so the mechanisms for supporting GPUs on swarm are a bit more open-ended.

Basic documentation

  • NVIDIA container runtime for docker. The runtime is no longer required to run GPU support with the docker cli or compose; however, it appears necessary so that one can set Default Runtime: nvidia for swarm mode.
  • docker compose GPU support
  • Good GitHub Gist Reference for an overview on Swarm with GPUs. It is a bit dated, but has good links and conversation.
  • Miscellaneous Options for docker configuration. Go down to "Node Generic Resources" for an explanation of how this is intended to support NVIDIA GPUs. The main idea is that one has to change the /etc/docker/daemon.json file to advertise the node-generic-resources (NVIDIA GPUs) on each node. GPUs have to be added by hand to the daemon.json file; swarm does not detect and advertise them automatically.
  • How to create a service with generic resources. This shows how to create stacks/services requesting the generic resources advertised in the /etc/docker/daemon.json file.
  • Quick blog overview confirming these basic approaches.
  • Really good overview on Generic Resources in swarm.

Solutions to Enable Swarm GPU Support

Both solutions need to follow these steps first:

  1. Install nvidia-container-runtime. Follow the steps here. Takes <5 minutes.
  2. Update /etc/docker/daemon.json to use nvidia as the default runtime.
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
  3. Restart the docker daemon on each node with sudo service docker restart. Confirm the default runtime is nvidia with docker info.
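
If a node already has other settings in /etc/docker/daemon.json, the edit in step 2 has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge, assuming the file has already been loaded into a Python dict; the function name and the sample "log-driver" setting are hypothetical and only there for illustration.

```python
import json


def with_nvidia_default_runtime(daemon_config: dict) -> dict:
    """Return a copy of a daemon.json mapping with nvidia as the default runtime.

    Existing keys (and any other registered runtimes) are preserved.
    """
    updated = dict(daemon_config)
    runtimes = dict(updated.get("runtimes", {}))
    # Same runtime entry as in the daemon.json snippet from step 2.
    runtimes["nvidia"] = {"path": "nvidia-container-runtime", "runtimeArgs": []}
    updated["runtimes"] = runtimes
    updated["default-runtime"] = "nvidia"
    return updated


# Hypothetical existing configuration, for illustration only.
existing = {"log-driver": "json-file"}
print(json.dumps(with_nvidia_default_runtime(existing), indent=2))
```

The result still has to be written back to /etc/docker/daemon.json and the daemon restarted as in step 3.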

Solution 1

You're done. When you deploy a service to a node, it will by default see all the GPUs on that node. Generally this means you are deploying global services (one per node) or assigning services to specific nodes so that there aren't accidental collisions between services accessing the same GPU resources simultaneously.

If you want to expose only certain GPUs to a given service (e.g., multiple services on one node with each having access only to its own GPU(s)), use the NVIDIA_VISIBLE_DEVICES environment variable for each service. To do this dynamically, so that each replica of a service gets access to its own GPU, use docker service templates like this:


services:
  my-service-node-001:
    image: blah blah
    environment:
      - NVIDIA_VISIBLE_DEVICES={{.Task.Slot}}
    deploy:
      replicas: 15
      placement:
        constraints:
          - node.hostname==some-node-001

Because {{.Task.Slot}} starts counting at 1, you may want to include a global service in the template to make use of GPU 0.
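
To make the off-by-one concrete, here is a small sketch of the slot-to-GPU mapping. It assumes a hypothetical node with 16 GPUs (indices 0 through 15) running the 15 replicas from the example above; the function names are made up for illustration.

```python
from typing import List


def gpu_for_slot(slot: int) -> int:
    """GPU index a replica sees when NVIDIA_VISIBLE_DEVICES={{.Task.Slot}}."""
    if slot < 1:
        raise ValueError("swarm task slots start at 1")
    return slot


def unused_gpus(gpu_count: int, replicas: int) -> List[int]:
    """GPU indices on the node that no replica is pointed at."""
    assigned = {gpu_for_slot(slot) for slot in range(1, replicas + 1)}
    return [index for index in range(gpu_count) if index not in assigned]


# Assumed 16-GPU node with 15 replicas: only GPU 0 is left over.
print(unused_gpus(gpu_count=16, replicas=15))  # prints [0]
```

Note that with as many replicas as GPUs, the highest slot number would point past the last GPU index, so the replica count should stay below the GPU count when using this scheme.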

Solution 2

Advertise NVIDIA GPUs using Node Generic Resources. This is the most general-purpose approach: services simply declare the GPU resources they require and swarm schedules them accordingly.

The /etc/docker/daemon.json file on each node needs to be updated to advertise its GPU resources. You can find the UUID for each GPU by running nvidia-smi -a | grep UUID. It seems you only need to include GPU- plus the first 8 characters of the UUID, e.g., GPU-ba74caf3 (although a commenter below reports needing the complete UUIDs on machines with multiple GPUs). The following needs to be added to the daemon.json file that already declares nvidia as the default runtime.

{
  "node-generic-resources": [
    "NVIDIA-GPU=GPU-ba74caf3",
    "NVIDIA-GPU=GPU-dl23cdb4"
  ]
}
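
Typing UUIDs by hand is error-prone, so the list can be generated instead. This is a rough sketch, assuming nvidia-smi is on the PATH and using its --query-gpu=uuid --format=csv,noheader output (one UUID per line). It emits complete UUIDs, and the function names and the sample UUID strings are hypothetical.

```python
import json
import subprocess
from typing import List


def generic_resources_from_uuids(smi_output: str, kind: str = "NVIDIA-GPU") -> List[str]:
    """Turn one-UUID-per-line nvidia-smi output into node-generic-resources entries."""
    uuids = [line.strip() for line in smi_output.splitlines() if line.strip()]
    return [f"{kind}={uuid}" for uuid in uuids]


def query_gpu_uuids() -> str:
    """Ask nvidia-smi for the UUID of every GPU on this node (requires the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=uuid", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Made-up sample output, so the sketch runs on a machine without GPUs.
sample_output = "GPU-00000000-aaaa-bbbb-cccc-000000000001\nGPU-00000000-aaaa-bbbb-cccc-000000000002\n"
entries = generic_resources_from_uuids(sample_output)
print(json.dumps({"node-generic-resources": entries}, indent=2))
```

On a real node, pass query_gpu_uuids() instead of the sample string and merge the printed key into the existing daemon.json.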

Enable GPU resource advertising by uncommenting the swarm-resource = "DOCKER_RESOURCE_GPU" line (line 2) in /etc/nvidia-container-runtime/config.toml.
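
The uncommenting step can also be scripted. The sketch below operates on the text of config.toml and assumes the line appears as #swarm-resource = "DOCKER_RESOURCE_GPU"; the function name is hypothetical. The resource_kind parameter is there because a commenter in the thread below reports that, with resources named NVIDIA-GPU, the value had to be changed to "DOCKER_RESOURCE_NVIDIA-GPU".

```python
import re


def enable_swarm_resource(config_text: str, resource_kind: str = "GPU") -> str:
    """Uncomment the swarm-resource line and set it to DOCKER_RESOURCE_<resource_kind>."""
    # Match the line whether or not it is still commented out.
    pattern = re.compile(r"^[ \t]*#?[ \t]*swarm-resource[ \t]*=.*$", re.MULTILINE)
    replacement = f'swarm-resource = "DOCKER_RESOURCE_{resource_kind}"'
    new_text, count = pattern.subn(lambda _match: replacement, config_text, count=1)
    if count == 0:
        raise ValueError("no swarm-resource line found in config text")
    return new_text


# Assumed shape of the first lines of config.toml, for illustration only.
sample = 'disable-require = false\n#swarm-resource = "DOCKER_RESOURCE_GPU"\n'
print(enable_swarm_resource(sample))
print(enable_swarm_resource(sample, resource_kind="NVIDIA-GPU"))
```

To apply it, read /etc/nvidia-container-runtime/config.toml, pass its contents through the function, and write the result back with root privileges.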

The docker daemon must be restarted after updating these files by running sudo service docker restart on each node. Services can now request GPUs using the --generic-resource flag.

docker service create \
    --name cuda \
    --generic-resource "NVIDIA-GPU=2" \
    --generic-resource "SSD=1" \
    nvidia/cuda

The names for node-generic-resources in /etc/docker/daemon.json can be anything you want. For example, you could declare NVIDIA-H100 and NVIDIA-4090 resources and then request a specific GPU type with --generic-resource "NVIDIA-H100=1".

To request GPU resources in a docker-compose.yaml file for the stack, use the following under the deploy key.

services:
  my-gpu-service:
    ...
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: "NVIDIA-GPU"
                value: 2
@coltonbh
Author

coltonbh commented Feb 1, 2024

@NiklasWilson I believe this is the expected behavior! Docker swarm does not have support for individual GPU allocation for stacks/services. So you either need each service to pick up an environment variable like NVIDIA_VISIBLE_DEVICES=X, using the {{.Task.Slot}} template so that it uses only one device, or you'll need to explicitly declare the GPUs in your device configuration in /etc/docker/daemon.json. When running docker outside swarm mode (e.g., docker run -it --gpus 1 ...), docker DOES know how to grab only certain GPUs for the container. The key distinction here is running the docker daemon in single-node mode vs. swarm mode.

@NiklasWilson

@coltonbh If this is the case what exactly is the purpose of setting up node-generic-resources with individual gpu uuids if the swarm can't use them to assign an image to a specific gpu?

@NiklasWilson

NiklasWilson commented Feb 1, 2024

After 3 days I have finally found the solution.

  1. Use complete UUIDs
  2. In /etc/nvidia-container-runtime/config.toml change "DOCKER_RESOURCE_GPU" to "DOCKER_RESOURCE_NVIDIA-GPU"

After making those changes the docker swarm will be able to select the correct GPU on machines with multiple GPUs.

@coltonbh
Author

coltonbh commented Feb 1, 2024

@NiklasWilson the purpose of setting up the node-generic-resources is so that swarm mode CAN now assign the specific GPUs to services. But swarm mode cannot do it out-of-the-box without the manual configuration. docker run CAN assign only certain GPUs to a container out-of-the-box without manual configuration. Sounds like you got it working! Congrats :)
