dreampuf/readme.md

Source: https://gitlab.com/gitlab-org/gitlab-runner/issues/1583#note_93170156
OK, I've experimented a lot to get this going with the docker+machine executor (specifically with the amazonec2 driver, which I suspect is quite common for people looking at this thread!). What I found may also be helpful to others when debugging what's going on for them.
docker+machine is interesting because it has several relevant contexts (i.e. each with its own file system and environment variables), which I shall refer to as:

- "runner": what is running the gitlab-runner binary. In my case this is an ECS-managed docker container for the gitlab/gitlab-runner image on docker hub, but it could be the systemd service configuration if you're running directly on the machine.
- "job host": the docker-machine created machine (e.g. EC2 instance) that runs the docker daemon
- "job container": the docker container for the image specified in the project .gitlab-ci.yml (or the default in config.toml)

Of course, if you're not using the docker+machine (or ssh?) executor, then the runner and job host contexts are on the same physical machine.
With some experimenting, and spelunking through this project, I found out the following:

- The gitlab-runner binary is what calls docker-credential-ecr-login, so make sure that running docker-credential-ecr-login version in the runner context succeeds, and that the runner context is the one with IAM permissions for ECR.
- gitlab-runner uses the docker go client library to talk to the docker daemon, not the docker CLI, so it must re-implement configuration parsing and authentication. In particular, this means that credsStore is implemented (by !501 (merged)), but credHelpers is not.
- DOCKER_AUTH_CONFIG is defined and used by gitlab-runner, not by docker, so don't expect setting it to make the docker CLI work.
- DOCKER_AUTH_CONFIG should still be specified as a job-visible environment variable, e.g. in config.toml environment, or pipeline secret variables etc., even though it's actually read by gitlab-runner in the runner context, not the job container. That one is weird. I suspect using engine-env in MachineOptions to set this would not work because of this?
- gitlab-runner uses the provided credsStore list command for... some reason? Unfortunately, at some point AWS added the requirement to docker-credential-ecr-login list that the AWS region is provided. The simplest way to do this is to set the AWS_REGION environment variable, but unlike DOCKER_AUTH_CONFIG this must be set in the runner context.
- Test the final call that actually gets the token with echo $REGISTRY_NAME | docker-credential-ecr-login get, where $REGISTRY_NAME should look like 123456789012.dkr.ecr.my-region-1.amazonaws.com (the part of the repository name before the first /).
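
The checks from this list could be scripted roughly as follows and run in the runner context. This is only a sketch: registry_of and check_ecr_helper are names made up for illustration, and the image name and region in the usage comment are placeholders. Note that get prints the credentials as JSON, so the sketch discards its output.

```shell
#!/bin/bash
# Sketch only: run this in the *runner* context (where gitlab-runner runs).

# registry_of IMAGE: hypothetical helper. Prints the registry host, i.e. the
# part of the image reference before the first "/".
registry_of() {
    printf '%s\n' "${1%%/*}"
}

# check_ecr_helper IMAGE REGION: hypothetical helper. Runs the three helper
# calls discussed above and stops at the first one that fails.
check_ecr_helper() {
    local image="$1" region="$2" registry
    registry="$(registry_of "$image")"

    if ! command -v docker-credential-ecr-login >/dev/null 2>&1; then
        echo "docker-credential-ecr-login is not on PATH" >&2
        return 1
    fi

    # 1. The binary runs at all.
    docker-credential-ecr-login version || return 1

    # 2. 'list' needs the region, hence AWS_REGION.
    AWS_REGION="$region" docker-credential-ecr-login list >/dev/null || return 1

    # 3. 'get' fetches the actual token; output is discarded because it
    #    contains the secret.
    echo "$registry" | AWS_REGION="$region" docker-credential-ecr-login get >/dev/null || return 1

    echo "ok: $registry"
}

# Example (placeholders):
#   check_ecr_helper 123456789012.dkr.ecr.my-region-1.amazonaws.com/my/app:latest my-region-1
```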

Unrelated to gitlab, but also:

- By default the EC2 instance profile is exposed to docker containers that are run in it. You can test this with curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<iam-role-name>, which will return the access key id and secret key along with other metadata. You can lock this down further with ECS task roles, but I haven't looked into that myself. This applies both to running gitlab-runner as a docker container, and to docker-machine created EC2 instances with the amazonec2-iam-instance-profile machine option.
- The only relevant ECR permission when actually using docker is ecr:GetAuthorizationToken, which doesn't distinguish between read and write, or between individual repositories (it applies only at the registry level), so don't bother trying to lock down permission to push to ECR.

In summary, to pull the job image from ECR:

- ensure the runner context has credentials with ECR permissions - including via IAM profiles if it's on EC2, but the default profile in ~/.aws/config / ~/.aws/credentials should also work?
- put docker-credential-ecr-login on the PATH for gitlab-runner (and don't forget to chmod +x, of course)
- set AWS_REGION to the region of your ECR repository (don't think it's possible to be cross-region yet)
- config.toml should have environment = ["DOCKER_AUTH_CONFIG={\"credsStore\":\"ecr-login\"}"] in [[runners]], or if you have multiple private registries(?), as a runner pipeline variable or in .gitlab-ci.yml variables.
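
The quoting in that environment line is easy to get wrong by hand, so here is a small sketch that generates it from the plain JSON. toml_escape and runner_environment_line are hypothetical helper names, and the escaping only covers backslashes and double quotes, which is all this value needs.

```shell
#!/bin/bash
# JSON that gitlab-runner should see in DOCKER_AUTH_CONFIG (taken from the text above).
auth_json='{"credsStore":"ecr-login"}'

# toml_escape VALUE: hypothetical helper. Escapes backslashes and double
# quotes so VALUE can sit inside a double-quoted TOML string.
toml_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Prints the line to paste under [[runners]] in config.toml.
runner_environment_line() {
    printf 'environment = ["DOCKER_AUTH_CONFIG=%s"]\n' "$(toml_escape "$auth_json")"
}

runner_environment_line
```

Remember that AWS_REGION is not part of this line: it has to be set in the runner context itself.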

This won't get you the ability to use ECR in your CI job scripts, though. For that you have a few options, but it's easy enough to extend the solution:

- grant the docker client in the job container access to the docker daemon on the job host (installed by docker-machine) by sharing /var/run/docker.sock
- make sure /root/.docker/config.json in the job container (remember, DOCKER_AUTH_CONFIG is not read by the docker CLI) has {"credsStore":"ecr-login"}, and that docker-credential-ecr-login is on the PATH
- make sure the job container context has AWS credentials with ECR permissions, so docker-credential-ecr-login can get the token, same as above
- make sure you have the docker client binary, of course! You can use the docker image, or also mount the job host docker binary.
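
A preflight check along these lines could go at the top of a job script to tell you which of the requirements is missing. It is a sketch under assumptions: has_ecr_creds_store and job_preflight are made-up names, and the config check is a plain text match rather than a real JSON parse. It does not check the AWS credentials.

```shell
#!/bin/bash
# Sketch only: run this inside the *job container*.

# has_ecr_creds_store FILE: hypothetical helper. Succeeds if FILE sets
# "credsStore" to "ecr-login" (simple text match, not a JSON parser).
has_ecr_creds_store() {
    grep -Eq '"credsStore"[[:space:]]*:[[:space:]]*"ecr-login"' "$1" 2>/dev/null
}

# job_preflight [CONFIG]: hypothetical helper. Reports every missing piece
# and returns non-zero if any check failed.
job_preflight() {
    local config="${1:-/root/.docker/config.json}" failed=0

    [ -S /var/run/docker.sock ] \
        || { echo "missing: /var/run/docker.sock is not mounted" >&2; failed=1; }
    command -v docker >/dev/null 2>&1 \
        || { echo "missing: docker client binary" >&2; failed=1; }
    command -v docker-credential-ecr-login >/dev/null 2>&1 \
        || { echo "missing: docker-credential-ecr-login on PATH" >&2; failed=1; }
    has_ecr_creds_store "$config" \
        || { echo "missing: credsStore ecr-login in $config" >&2; failed=1; }

    return "$failed"
}

# Example: job_preflight && docker pull "$ECR_IMAGE"   # ECR_IMAGE is a placeholder
```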

Note that docker doesn't require AWS_REGION: it only uses get with the registry that is actually being accessed.
The way I did this was to update config.toml to have:
[[runners]]
  [runners.docker]
    volumes = [
      "/cache",

      # So 'docker' client works in CI
      "/var/run/docker.sock:/var/run/docker.sock",

      # So 'docker push <ECR image>' works in CI
      "/root/.docker:/root/.docker",
      "/usr/local/bin/docker-credential-ecr-login:/usr/local/bin/docker-credential-ecr-login"
    ]
  [runners.machine]
    MachineOptions = [
      "amazonec2-iam-instance-profile=RUNNER_INSTANCE_PROFILE_NAME",
      "amazonec2-userdata=/path/to/userdata"
    ]

where /path/to/userdata contains something like:
#!/bin/bash
set -eu

curl --fail \
    https://MY_BUCKET.s3-MY_REGION.amazonaws.com/SOME_PREFIX/docker-credential-ecr-login \
    -o /usr/local/bin/docker-credential-ecr-login
chmod +x /usr/local/bin/docker-credential-ecr-login

mkdir -p ~/.docker
echo > ~/.docker/config.json '{ "credsStore": "ecr-login" }'

The URL to docker-credential-ecr-login works because the object was uploaded with --acl public-read.
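
For reference, the upload could look roughly like this, assuming the AWS CLI is installed and has write access to the bucket. helper_s3_uri and upload_helper are hypothetical names, and MY_BUCKET and SOME_PREFIX are the same placeholders as in the userdata script.

```shell
#!/bin/bash
# helper_s3_uri BUCKET PREFIX: hypothetical helper. Prints the S3 URI the
# userdata script's download URL points at.
helper_s3_uri() {
    printf 's3://%s/%s/docker-credential-ecr-login\n' "$1" "$2"
}

# upload_helper FILE BUCKET PREFIX: hypothetical helper. Uploads the helper
# binary and makes the object publicly readable so a plain curl can fetch it.
upload_helper() {
    local file="$1" bucket="$2" prefix="$3"
    aws s3 cp "$file" "$(helper_s3_uri "$bucket" "$prefix")" --acl public-read
}

# Example (placeholders):
#   upload_helper ./docker-credential-ecr-login MY_BUCKET SOME_PREFIX
```

Anyone who knows the URL can download a public-read object, which is fine for this binary but not something to do with credentials.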
Thanks to all the above commenters for helping me nail this down!