
@natefoo
Last active March 8, 2023 18:57
Galaxy/BioContainers mulled container command line utilities

Manually mulling container images

This is occasionally necessary, e.g. when building images for old software or old versions that BioContainers won't support, for packages from custom channels, etc.

Another common use case is running a Galaxy tool that is missing a requirement specification for something it depends on (this is especially common with older tools that assumed Python would be present), or that requires older conda packages that did not fully specify their dependencies.

Prerequisites

You will need Docker on whatever host you plan to run on. Singularity is not required (even if building Singularity images) since the Singularity build occurs in Docker.

You will also need galaxy-tool-util, version 23.0 or later, since that release includes some nice new features such as the ability to use mamba as the resolver. Because the published Galaxy packages aren't regularly updated, install them directly from the release_23.0 branch:

python3 -m venv galaxy-tool-util
. ./galaxy-tool-util/bin/activate
pip install --upgrade pip setuptools wheel
pip install git+https://github.com/galaxyproject/galaxy.git@release_23.0#subdirectory=packages/util
pip install git+https://github.com/galaxyproject/galaxy.git@release_23.0#subdirectory=packages/tool_util
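To confirm that the venv picked up a new enough galaxy-tool-util, a quick check like the following may help. This is a minimal sketch: it assumes the installed distribution can be found under the name `galaxy-tool-util`, and `version_tuple`, `meets_minimum` and `report` are hypothetical helpers that only compare the leading numeric parts of the version string.

```python
"""Check that the installed galaxy-tool-util is at least 23.0 (sketch)."""
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (23, 0)


def version_tuple(text):
    """Return the leading numeric release parts, e.g. '23.0.5' -> (23, 0, 5).

    Suffixes such as '.dev0' or 'rc1' are ignored, which is good enough for a
    minimum-version check (assumption)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            # e.g. '1rc1': keep the leading digits, then stop
            break
    return tuple(parts)


def meets_minimum(text, minimum=MINIMUM):
    return version_tuple(text) >= minimum


def report(distribution="galaxy-tool-util"):
    """Describe whether the installed distribution is new enough."""
    try:
        installed = version(distribution)
    except PackageNotFoundError:
        return f"{distribution} is not installed in this environment"
    status = "OK" if meets_minimum(installed) else "too old, need at least 23.0"
    return f"{distribution} {installed}: {status}"


if __name__ == "__main__":
    print(report())
```

Run it with the venv activated; it only reads package metadata and does not import Galaxy.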

Procedure

Custom Base Image

Mulled containers are built by installing conda packages to /usr/local in a miniconda- or mambaforge-based Docker container. After installation, the env is "wrapped" into a different base container so that the final image is as slim as possible.

The base images are maintained by bioconda. By default, quay.io/bioconda/base-glibc-busybox-bash will be used, but if any of your conda packages specify extra.container=extended-base in their metadata, then quay.io/bioconda/base-glibc-debian-bash will be used instead.

You can force the base container of your image with $DEST_BASE_IMAGE, or build a custom base image and use that. If you don't need a custom base image, skip to the following section.
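The base image selection described above can be summarized as a small Python sketch. It is not Galaxy's implementation: `choose_base_image` and `wants_extended_base` are hypothetical helpers, and each package's metadata is assumed to be available as a plain dict shaped like the `extra` section of a conda recipe. Because the exact shape of the `extended-base` marker is an assumption here, the sketch accepts either a plain string or a `{"extended-base": true}` mapping.

```python
import os

DEFAULT_BASE = "quay.io/bioconda/base-glibc-busybox-bash"
EXTENDED_BASE = "quay.io/bioconda/base-glibc-debian-bash"


def wants_extended_base(meta):
    """True if a package's metadata asks for the extended (Debian) base.

    Assumption: accept either extra.container == "extended-base" or
    extra.container == {"extended-base": True}."""
    container = meta.get("extra", {}).get("container")
    if isinstance(container, dict):
        return bool(container.get("extended-base"))
    return container == "extended-base"


def choose_base_image(package_metadata, environ=None):
    """Pick the destination base image for a mulled build (illustrative only)."""
    environ = os.environ if environ is None else environ
    # An explicit $DEST_BASE_IMAGE always wins.
    forced = environ.get("DEST_BASE_IMAGE")
    if forced:
        return forced
    # Otherwise, any package asking for the extended base switches images.
    if any(wants_extended_base(meta) for meta in package_metadata):
        return EXTENDED_BASE
    return DEFAULT_BASE
```

Passing `environ` explicitly keeps the function easy to test; by default it reads the real environment, mirroring how `DEST_BASE_IMAGE=... mulled-build` works on the command line.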

  1. Create a base image

    A silly example for the sake of demonstration: you need to force the value of $FOO to foo in your container:

    mkdir context
    cat >context/Dockerfile << EOF
    FROM "quay.io/bioconda/base-glibc-debian-bash:latest"
    ENV FOO=foo
    EOF
    docker build -t natefoo/base-glibc-debian-bash:foo context

    Replace natefoo above with the name of a Docker Hub account or org you have access to.

  2. Push the base image to a public Docker registry (only if step 1 was necessary)

    See also galaxyproject/galaxy#15716 for an explanation of why the push is necessary and any changes that might make it unnecessary in the future.

    docker push natefoo/base-glibc-debian-bash:foo

Build the image

. ./galaxy-tool-util/bin/activate
mulled-build \
    --verbose --singularity --use-mamba -c conda-forge,bioconda \
    --test 'some-command-that-returns-true' \
    build-and-test \
    'packageA=X.Y.Z,packageB=X.Y.Z,...'

To use a different or custom base image, set $DEST_BASE_IMAGE like so:

# use the debian-based bioconda base image even if none of the packages specify it should be used:
DEST_BASE_IMAGE='quay.io/bioconda/base-glibc-debian-bash:latest' mulled-build [ARGS ...]
# use the custom image generated in the previous section
DEST_BASE_IMAGE='natefoo/base-glibc-debian-bash:foo' mulled-build [ARGS ...]
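If you build images for several package sets, it can help to assemble the command in Python. A minimal sketch: `mulled_build_command` is a hypothetical helper that only reproduces the flags shown above (`--verbose`, `--singularity`, `--use-mamba`, `-c`, `--test`, `build-and-test`) and passes `DEST_BASE_IMAGE` through the environment. It does not run anything unless you hand the result to `subprocess.run` yourself.

```python
import os
import shlex


def mulled_build_command(packages, test_command, channels=("conda-forge", "bioconda"),
                         dest_base_image=None):
    """Return (argv, env) for a mulled-build invocation (illustrative sketch).

    packages: dict of conda package name -> exact version, e.g. {"r-base": "3.4.1"}
    """
    targets = ",".join(f"{name}={version}" for name, version in packages.items())
    argv = [
        "mulled-build",
        "--verbose", "--singularity", "--use-mamba",
        "-c", ",".join(channels),
        "--test", test_command,
        "build-and-test",
        targets,
    ]
    env = dict(os.environ)
    if dest_base_image:
        # Same effect as prefixing the shell command with DEST_BASE_IMAGE=...
        env["DEST_BASE_IMAGE"] = dest_base_image
    return argv, env


if __name__ == "__main__":
    argv, env = mulled_build_command(
        {"python": "2.7.15", "r-base": "3.4.1"},
        "python -V; R --version",
        channels=("conda-forge",),
        dest_base_image="quay.io/bioconda/base-glibc-debian-bash:latest",
    )
    # Print a copy-pasteable shell command; to actually build, run
    # subprocess.run(argv, env=env, check=True) inside the activated venv.
    print(shlex.join(argv))
```

Keeping the arguments as a list avoids shell quoting problems with `--test` commands that contain spaces or semicolons.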

By default, mulled-build automatically generates the mulled hash name of your container from the packages you specify. This name is important for Galaxy because the container image that Galaxy uses for a particular tool execution is based on the hash generated from the tool's <requirement type="package"> tags.

For example, let's say I have a Galaxy tool with the following requirements:

<requirements>
    <requirement version="3.4.1">R</requirement>
    <requirement version="2.7">python</requirement>
</requirements>
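To feed a tool's requirements to mulled-name.py, you need them as name=version pairs. A minimal sketch using only the standard library: `requirement_pairs` is a hypothetical helper, and it assumes that a requirement without a `type` attribute (as in the example above) is a package requirement.

```python
import xml.etree.ElementTree as ET


def requirement_pairs(tool_xml):
    """Return sorted 'name=version' strings for a tool's package requirements."""
    root = ET.fromstring(tool_xml)
    pairs = []
    for req in root.iter("requirement"):
        # Assumption: a missing type attribute means a package requirement.
        if req.get("type", "package") != "package":
            continue
        name = (req.text or "").strip()
        version = req.get("version")
        # mulled-name.py as written requires a version for every package.
        pairs.append(f"{name}={version}" if version else name)
    return sorted(pairs)


TOOL_XML = """<tool id="example" name="example" version="1.0">
    <requirements>
        <requirement version="3.4.1">R</requirement>
        <requirement version="2.7">python</requirement>
    </requirements>
</tool>"""

if __name__ == "__main__":
    # Prints: R=3.4.1 python=2.7
    print(" ".join(requirement_pairs(TOOL_XML)))
```

The names are kept exactly as written in the tool (here `R`, not `r-base`), because it is the tool's own requirements that determine which image name Galaxy looks for.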

This is probably an old tool from before the conda days, given that the conda R package is named r-base, not R. We could fix the <requirement> tags manually, but these requirements are so old that there probably isn't a mulled BioContainer for them anyway, and if someone reinstalls the tool, our changes would be lost.

We want a container containing python=2.7.15,r-base=3.4.1 but with the mulled hash for python=2.7,R=3.4.1. Use mulled-name.py (in this gist) to generate the name:

(galaxy-tool-util)user@host:~$ mulled-name.py --type=v2 python=2.7 R=3.4.1
mulled-v2-d9e66a3dd0cec7ef78a20cf220d44ffbda883044:fc039d3aa515807e2ecc69b744c962d9be12f559
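For intuition about what mulled-name.py computes, here is a stdlib-only sketch of the multi-package v2 naming scheme as it appears to work in `galaxy.tool_util.deps.mulled.util.v2_image_name`: packages are sorted by name, the newline-joined names are SHA-1 hashed to form the repository name, and the newline-joined versions (in the same order) are hashed to form the tag. Treat it as an approximation. `v2_multi_package_name` is a hypothetical helper, it hashes names exactly as given (Galaxy may normalize them, e.g. lowercase, before hashing), and single-package images use plain `name:version` names instead. Use mulled-name.py for names you actually rely on.

```python
import hashlib


def v2_multi_package_name(packages, build=None):
    """Approximate the mulled v2 name for two or more packages (sketch).

    packages: dict of package name -> version, e.g. {"python": "2.7", "R": "3.4.1"}
    build: optional build identifier appended to the tag as "-<build>"
    """
    if len(packages) < 2:
        raise ValueError("single-package images use plain 'name:version' names")
    # Sort by package name so the result does not depend on requirement order.
    ordered = sorted(packages.items())
    names = "\n".join(name for name, _ in ordered)
    versions = "\n".join(version for _, version in ordered)
    repo_hash = hashlib.sha1(names.encode()).hexdigest()
    tag = hashlib.sha1(versions.encode()).hexdigest()
    if build is not None:
        tag = f"{tag}-{build}"
    return f"mulled-v2-{repo_hash}:{tag}"
```

In this scheme the repository part depends only on the package names and the tag only on the versions, which is why `--name-override` can give a python=2.7.15,r-base=3.4.1 image the name Galaxy computes for python=2.7,R=3.4.1.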

Now use the --name-override flag with mulled-build to set the name accordingly:

mulled-build \
    --verbose --singularity --use-mamba -c conda-forge \
    --test 'python -V; R --version' \
    --name-override=mulled-v2-d9e66a3dd0cec7ef78a20cf220d44ffbda883044:fc039d3aa515807e2ecc69b744c962d9be12f559 \
    build-and-test \
    'python=2.7.15,r-base=3.4.1'

After the build, you should have a Docker image:

user@host:~$ docker images quay.io/biocontainers/mulled-v2-d9e66a3dd0cec7ef78a20cf220d44ffbda883044:fc039d3aa515807e2ecc69b744c962d9be12f559
REPOSITORY                                                                 TAG                                        IMAGE ID       CREATED          SIZE
quay.io/biocontainers/mulled-v2-d9e66a3dd0cec7ef78a20cf220d44ffbda883044   fc039d3aa515807e2ecc69b744c962d9be12f559   575f8833bf1b   57 seconds ago   441MB

And a Singularity image:

user@host:~$ ls -lh singularity_import/mulled-v2-d9e66a3dd0cec7ef78a20cf220d44ffbda883044:fc039d3aa515807e2ecc69b744c962d9be12f559
-rwxr-xr-x 1 user user 144M Mar  8 13:49 singularity_import/mulled-v2-d9e66a3dd0cec7ef78a20cf220d44ffbda883044:fc039d3aa515807e2ecc69b744c962d9be12f559*

Automatically generated image names will have -0 appended to the version. This is the optional build identifier, and it matters to Galaxy only when multiple images match the correct name:version hash (version comparison stops at the -): in that case, the one with the newest build identifier is used.
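A minimal sketch of that selection rule, assuming tags look like `<version_hash>` or `<version_hash>-<build>` with an integer build. `newest_build` is a hypothetical helper; Galaxy's own resolver may differ in details, for example in how it treats missing or non-numeric build identifiers.

```python
def newest_build(image_names, repository, version_hash):
    """Pick the image with the highest build identifier for repository:version_hash.

    image_names: iterable of 'repository:tag' strings, where tag is
    '<version_hash>' or '<version_hash>-<build>'.
    Returns None if nothing matches.
    """
    candidates = []
    for image in image_names:
        repo, _, tag = image.rpartition(":")
        # Version comparison stops at the first '-'; the rest is the build.
        version, _, build = tag.partition("-")
        if repo != repository or version != version_hash:
            continue
        # Assumption: builds are non-negative integers; a missing or
        # non-numeric build identifier ranks below any numeric one.
        rank = int(build) if build.isdigit() else -1
        candidates.append((rank, image))
    return max(candidates)[1] if candidates else None
```

The same `name:tag` form appears in the singularity_import/ filenames shown above, so the helper works on a directory listing as well as on Docker references (for the latter, pass the full repository, e.g. including the quay.io/biocontainers/ prefix).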

Install the image

If you built this image on your Galaxy server and are using Docker to run the tool, you're all set. Otherwise, you'll need to transfer the image to where the tool executes. Alternatively, you can upload it to a Docker registry such as Docker Hub or quay.io under your own namespace, and configure a docker container resolver to inspect that namespace.

If you are using Singularity to run the tool, copy the image to a filesystem shared between your Galaxy server and cluster, and then configure container resolvers in galaxy.yml like so:

galaxy:
  container_resolvers:
    # this contains your manumulled images
    - type: cached_mulled_singularity
      cache_dir: /path/to/dir/containing/images
    # this is for biocontainers pulled from quay.io
    - type: mulled_singularity

See also

  • involucro is the tool that orchestrates the container build and "wrapping"
  • mulled-build is the Galaxy/mulled wrapper around involucro
mulled-name.py

#!/usr/bin/env python3
#
# Generate the "mulled" name of a set of packages.
#
# Note: the "v1" mulled hash is not currently used by anything. Galaxy conda
# environments use a different "v1" hash; for that, use `-t conda`.
# BioContainers and Galaxy's container resolution use the v2 hash.
#
# Depends on galaxy-tool-util: pip install galaxy-tool-util
import argparse
from enum import Enum

from galaxy.tool_util.deps.conda_util import CondaTarget, hash_conda_packages
from galaxy.tool_util.deps.mulled.util import build_target, v1_image_name, v2_image_name


class HashType(Enum):
    CONDA = "conda"
    V1 = "v1"
    V2 = "v2"


hash_type_values = [ht.value for ht in HashType]


def handle_args():
    valid_hash_types = ", ".join(hash_type_values)
    parser = argparse.ArgumentParser(description="Generate the mulled name of a set of packages")
    parser.add_argument("--type", "-t", default=HashType.V2.value, choices=hash_type_values,
                        help=f"Hash type (one of: {valid_hash_types}) [default: {HashType.V2.value}]")
    parser.add_argument("packages", nargs="+", metavar="PACKAGE",
                        help="Packages (name=version) to generate mulled hash for")
    return parser.parse_args()


def get_funcs(hash_type):
    """Return (target constructor, name function) for the requested hash type."""
    if hash_type == HashType.CONDA.value:
        def hash_func(targets):
            return f"mulled-v1-{hash_conda_packages(targets)}"
        return (CondaTarget, hash_func)
    elif hash_type == HashType.V1.value:
        return (build_target, v1_image_name)
    elif hash_type == HashType.V2.value:
        return (build_target, v2_image_name)
    raise RuntimeError(f"Invalid hash type: {hash_type}")


def get_targets(packages, target_func):
    """Turn 'name=version' strings into targets for the chosen hash type."""
    targets = []
    for package in packages:
        assert "=" in package, f"Invalid package syntax: {package}"
        # Split on the first '=' only, then drop any extra '=' so that
        # 'name==version' works too.
        name, version = package.split("=", 1)
        version = version.lstrip("=")
        targets.append(target_func(name, version))
    return targets


if __name__ == "__main__":
    args = handle_args()
    target_func, hash_func = get_funcs(args.type)
    targets = get_targets(args.packages, target_func)
    print(hash_func(targets))