@Hopobcn
Last active September 16, 2020 21:47
gitlab-runner configuration file with docker runner for using NVIDIA GPUs (nvidia-docker)

Use Gitlab-CI with GPU support

Since gitlab-runner cannot be forced to use the nvidia-docker wrapper, follow these steps:

  1. Install all required software: docker, nvidia-docker, gitlab-ci-multi-runner
  2. Execute: curl -s http://localhost:3476/docker/cli
  3. Use that output to fill the devices/volumes/volume_driver fields in /etc/gitlab-runner/config.toml (a sample of that output and the resulting config follow below)
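For reference, the nvidia-docker 1.0 plugin's REST endpoint returns the extra docker CLI arguments needed on the local host. The exact device paths and driver-volume name vary per machine, so the output below is only illustrative (it uses the same values as the example config further down):

    $ curl -s http://localhost:3476/docker/cli
    --volume-driver=nvidia-docker --volume=nvidia_driver_384.81:/usr/local/nvidia:ro --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia-uvm-tools --device=/dev/nvidia0

Those arguments map one-to-one onto the devices, volumes and volume_driver fields of the config.toml below.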
concurrent = 1
check_interval = 0

[[runners]]
  name = "Docker runner <---complete-me--->"
  url = "https://<---complete-me---->"
  token = "28ce17edc8ea7437f3e49969c86341"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "nvidia/cuda"
    privileged = false
    disable_cache = false
    devices = ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia-uvm-tools", "/dev/nvidia3", "/dev/nvidia2", "/dev/nvidia1", "/dev/nvidia0"]
    volumes = ["/cache", "nvidia_driver_384.81:/usr/local/nvidia:ro"]
    volume_driver = "nvidia-docker"
    shm_size = 0
  [runners.cache]
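It can help to sanity-check the GPU setup on the host before pointing CI jobs at it. A quick check, using the example device list and driver-volume name from the config above (adjust them for your machine):

    # simple check via the nvidia-docker wrapper
    nvidia-docker run --rm nvidia/cuda nvidia-smi

    # roughly what the runner will end up doing with the config above
    docker run --rm \
      --volume-driver=nvidia-docker \
      --volume=nvidia_driver_384.81:/usr/local/nvidia:ro \
      --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 \
      nvidia/cuda nvidia-smi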
Hopobcn commented Feb 27, 2018

This method is outdated.


pafelin commented Oct 5, 2018

Is there a newer method?
I have tried installing nvidia-docker, docker and the runner itself. Then I set only the runtime parameter of the runner to "nvidia" and the executor to "docker", but TensorFlow, for example, doesn't detect the GPUs at all.


frtrotta commented Jul 5, 2019

The following config.toml provides GPU support (notice the runtime parameter).

concurrent = 1
check_interval = 0

[[runners]]
  name = "Docker runner <---complete-me--->"
  url = "https://<---complete-me---->"
  token = "28ce17edc8ea7437f3e49969c86341"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "nvidia/cuda"
    privileged = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    runtime = "nvidia"
  [runners.cache]
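A minimal .gitlab-ci.yml to verify that the runtime is actually picked up could look like this (the job name is just a placeholder):

    image: nvidia/cuda

    gpu-check:
      script:
        - nvidia-smi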

Yet it is not clear to me how to restrict the GPUs assigned to the runner on a multi-GPU server. This functionality is named "GPU isolation".

The docker run command for GPU isolation follows (notice the -e NVIDIA_VISIBLE_DEVICES=0). How can this be set for the runner in config.toml?

docker run --runtime=nvidia --rm -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda:9.0-base nvidia-smi


Hopobcn commented Jul 8, 2019

In the [[runners]] section there's an environment keyword to define environment variables. But I guess that it won't work, because that environment variable has to be passed on to docker.

So the only way I see is to specify NVIDIA_VISIBLE_DEVICES directly in the Dockerfile (a minimal sketch follows below):
https://github.com/NVIDIA/nvidia-docker/wiki/Usage#dockerfiles
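A minimal sketch of that approach, following the wiki page above; the base image and the pinned GPU index are just examples:

    FROM nvidia/cuda:9.0-base
    # restrict containers built from this image to GPU 0
    ENV NVIDIA_VISIBLE_DEVICES 0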

@frtrotta

It seems that environment in the [[runners]] section is exactly what we were looking for.

Actually, any environment variable set before the script section of the .gitlab-ci.yml configuration file runs will do. See the following two examples; both of them worked for me.

Example 1: using gitlab-runner configuration only

In /etc/gitlab-runner/config.toml:

[[runners]]
  name = "runner-gpu0-test"
  url = "<url>"
  token = "<token>"
  executor = "docker"
  environment = ["NVIDIA_VISIBLE_DEVICES=0"]   # <== Notice this
  [runners.docker]
    runtime = "nvidia"  # <== Notice this
    tls_verify = false
    image = "nvidia/cuda:9.0-base"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]

[[runners]]
  name = "runner-gpu1-test"
  url = "<url>"
  token = "<token>"
  executor = "docker"
  environment = ["NVIDIA_VISIBLE_DEVICES=1"]  # <== Notice this
  [runners.docker]
    runtime = "nvidia"  # <== Notice this
    tls_verify = false
    image = "nvidia/cuda:9.0-base"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]

The .gitlab-ci.yml file.

image: nvidia/cuda:9.0-base

test:run_on_gpu0:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 10s
  tags:
    - docker
    - gpu0

test:run_on_gpu1:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 7s
  tags:
    - docker
    - gpu1

The two runners have been tagged with docker, gpu0 and docker, gpu1 respectively.
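For completeness, registering two such runners could look roughly like the following; the URL and token are placeholders, and the flag names are worth double-checking against gitlab-runner register --help for your version:

    sudo gitlab-runner register --non-interactive \
      --url "https://<your-gitlab>/" \
      --registration-token "<token>" \
      --description "runner-gpu0-test" \
      --executor "docker" \
      --docker-image "nvidia/cuda:9.0-base" \
      --docker-runtime "nvidia" \
      --env "NVIDIA_VISIBLE_DEVICES=0" \
      --tag-list "docker,gpu0"

The second runner would be registered the same way with NVIDIA_VISIBLE_DEVICES=1 and --tag-list "docker,gpu1".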

Example 2: using Gitlab CI custom environment variables


The /etc/gitlab-runner/config.toml is the same as in Example 1.

The .gitlab-ci.yml file.

image: nvidia/cuda:9.0-base

variables:
   NVIDIA_VISIBLE_DEVICES: "3"  # This is going to override definition(s) in /etc/gitlab-runner/config.toml

test:run_on_gpu0:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 10s
  tags:
    - docker
    - gpu0

test:run_on_gpu1:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 7s
  tags:
    - docker
    - gpu1
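As a variation on Example 2, the variable can also be set per job instead of globally, so each job pins its own GPU; for instance (a job-level variables block overrides the global one for that job only):

    test:run_on_gpu1:
      stage: test
      variables:
        NVIDIA_VISIBLE_DEVICES: "1"  # overrides the global variables block for this job only
      script:
        - nvidia-smi
      tags:
        - docker
        - gpu1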


hyviquel commented Sep 18, 2019

Do you guys know how to make it work with Docker v19.03.2, which integrates native support for NVIDIA GPUs?
runtime = "nvidia" does not work anymore; containers should be executed with the --gpus flag now.

docker run -it --rm --gpus all ubuntu nvidia-smi


frtrotta commented Sep 19, 2019

It is an open issue and, looking at the comments, it does not seem likely to be fixed soon.

I am using Docker 19.03 together with nvidia-docker2. This provides the new --gpus switch while keeping compatibility with the old --runtime switch (refer to https://github.com/NVIDIA/nvidia-docker/tree/master#upgrading-with-nvidia-docker2-deprecated).
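For anyone reading this later: as far as I know, newer GitLab Runner releases (13.9 and later) add a gpus option to the docker executor that maps onto Docker's --gpus flag, which avoids the nvidia runtime entirely. Roughly:

    [runners.docker]
      # requires Docker >= 19.03 and a gitlab-runner version that supports it;
      # passes `--gpus all` to the job containers
      gpus = "all"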
