@DerekV
Created February 14, 2021 14:21

Pulling all the images from a docker registry

Normally, this would be a bad idea. My use case was this: we had a private registry holding all our images going back a while, badly in need of being cleaned up and purged, and I wanted to grab an archive first. We couldn't just SSH to the machine hosting the registry to back it up that way.

To pull an image we need to know the registry, repository, and tag.

Luckily getting a list of repositories in the registry is easy with curl and jq.

curl --user "$REG_USER:$REG_PASS" -H "Accept: application/json"   "https://$REGISTRY/v2/_catalog"  | jq -r '.repositories[]'
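For reference, the catalog endpoint returns a JSON object with a `repositories` array. The repository names below are made up, but the shape matches what the `jq` filter above expects:

```shell
# Hypothetical /v2/_catalog response body (repository names are invented)
CATALOG='{"repositories":["app/frontend","app/backend","tools/ci-runner"]}'
# Same jq filter as above, run against the sample: one repository per line
echo "$CATALOG" | jq -r '.repositories[]'
# prints:
# app/frontend
# app/backend
# tools/ci-runner
```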

Next, we need to get all the tags for a repository, which can be done like this:

curl --user "$REG_USER:$REG_PASS" -H "Accept: application/json"   "https://$REGISTRY/v2/$REP/tags/list"  | jq -r '.tags[]'
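The tags endpoint has a similar shape, a JSON object with the repository name and a `tags` array. Again the values here are made up, just to show what the `jq` filter is pulling out:

```shell
# Hypothetical /v2/<name>/tags/list response body for one repository
TAGS='{"name":"app/frontend","tags":["1.0.0","1.1.0","latest"]}'
# Extract one tag per line, same as the filter above
echo "$TAGS" | jq -r '.tags[]'
# prints:
# 1.0.0
# 1.1.0
# latest
```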

All together in a big ugly nested for loop. This will take a while, and you'll need a big drive. Consider carefully what registry you target.

for REP in $(curl --user "$REG_USER:$REG_PASS" -H "Accept: application/json" "https://$REGISTRY/v2/_catalog" | jq -r '.repositories[]'); do
  for TAG in $(curl --user "$REG_USER:$REG_PASS" -H "Accept: application/json" "https://$REGISTRY/v2/$REP/tags/list" | jq -r '.tags[]'); do
    docker pull "$REGISTRY/$REP:$TAG"
  done
done
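One caveat worth knowing: per the Docker/OCI distribution spec, `/v2/_catalog` is paginated, and many registries cap each response (commonly at 100 entries), returning a `Link: ...; rel="next"` header that points at the next page. If the loop seems to miss repositories, either pass a large `?n=` or follow that header. A sketch of extracting the next page's path from such a header (the header value here is a made-up sample):

```shell
# Made-up example of the Link header a paginated registry returns
LINK='</v2/_catalog?last=tools%2Fci-runner&n=100>; rel="next"'
# Strip the angle brackets and the rel parameter to get the next page's path
NEXT=$(printf '%s' "$LINK" | sed -n 's/^<\(.*\)>; rel="next"$/\1/p')
echo "$NEXT"
# prints: /v2/_catalog?last=tools%2Fci-runner&n=100
```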

Archival

# assumes GNU tar, with zstd installed... however gzip or any other compression will work
sudo tar cavf docker-2021-02-13.tar.zst /var/lib/docker
aws s3 cp --storage-class=GLACIER ./docker-2021-02-13.tar.zst s3://docker-archive.example.com/docker-2021-02-13.tar.zst
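Before shipping an archive off to Glacier, it's worth a quick round-trip sanity check that tar can restore it. A minimal sketch using a scratch directory (paths and file contents here are placeholders, not the real `/var/lib/docker` backup):

```shell
set -e
WORK=$(mktemp -d)                                 # scratch dir, not /var/lib/docker
mkdir -p "$WORK/src"
echo "layer-data" > "$WORK/src/blob"              # stand-in for real data
tar -C "$WORK" -caf "$WORK/demo.tar.gz" src       # -a picks compression from extension
mkdir -p "$WORK/restore"
tar -C "$WORK/restore" -xaf "$WORK/demo.tar.gz"   # extract into a separate dir
diff "$WORK/src/blob" "$WORK/restore/src/blob" && echo "archive OK"
rm -rf "$WORK"
# prints: archive OK
```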

This was a rather brute-force approach, it was time consuming, and restoring will be slow should I ever need to do it. Using the docker save command would have made restoring a specific image much nicer, but it would also un-de-duplicate the layers and make the archive size unmanageable.

A far more convenient method would have been to leave the disk image available, e.g. by running this in the office with an external drive. The for loop above executes quickly when repeated, pulling only new layers and skipping over images that are already present, so this archive would be easy to keep up to date, until the drive fills up anyway.
