@mwiebe
Forked from mcg1969/variants.md
Created July 8, 2016 15:56
Conda hackery: variants

Motivation

There are many situations where we are inclined to produce multiple variants of the same package, with each variant depending on a different set of low-level dependencies. For instance:

  • A numerical package might rely on the Basic Linear Algebra Subprograms (BLAS). There are a variety of BLAS implementations we might wish to support, including MKL, OpenBLAS, ACML, Accelerate, and ATLAS.
  • We might wish to compile Python against different compilers that are not link-compatible with each other; thus all packages compiled against the CPython API must be recompiled.

The existence of these multiple variants can pose a problem for users: how do they make sure that all of the packages in their environment are compatible with each other? That is: how do we ensure that all packages that rely on BLAS use the same BLAS variant? How do we ensure that all packages with a CPython dependency use the same ABI?

If there are only two variants, then conda's features/track_features facility provides a solution. For instance, if a user installs the nomkl metapackage, it turns on the nomkl feature, which causes all packages that link to BLAS to select an OpenBLAS variant instead of an MKL variant. Unfortunately, features (or perhaps our deployment of them) have proven to be a bit fragile, and they are necessarily limited to two variants.

To address this problem, we propose to formalize an approach for relying on conda's natural dependency resolution facilities. As you might have guessed, we are calling this approach variants.

Variant metapackages

To construct a set of variants, we begin by collecting the following information:

  • A name for the variant class; e.g., blas
  • Names for the variant instances; e.g., mkl, openblas, accelerate, atlas.

These names must be compatible with Windows and Unix filename conventions, and cannot contain dash - characters (underscores are fine). Armed with this information, we proceed to build a set of packages, one for each variant instance, as follows:

  • Package name: the variant class; e.g., blas
  • Build string: the variant instance; e.g., mkl
  • Version number: 1 for the preferred instance; 0 for all others
  • Build number: 0, identical across all instances
  • Dependencies: none

The specific choices of 0 and 1 are not necessarily important for the version and build numbers. However, selecting exactly one variant instance to have version 1, and using identical values in all other cases, is important to communicating the preference information to conda. In theory, you could provide a preference hierarchy using version numbers 2, 3, etc. as well.

As a result of this build process, we will obtain a set of files with names of the form name-0-instance.tar.bz2 or name-1-instance.tar.bz2, assuming that the standard naming convention is employed. For instance, for BLAS, we might have the following filenames:

     blas-1-mkl.tar.bz2
     blas-0-openblas.tar.bz2
     blas-0-accelerate.tar.bz2
     blas-0-atlas.tar.bz2
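The naming scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name variant_filenames is hypothetical, it is not part of conda or conda-build, and it only reproduces the filename convention described here (version 1 for the preferred instance, 0 for the rest, no dashes in names).

```python
def variant_filenames(variant_class, instances, preferred):
    """Return one package filename per variant instance.

    Hypothetical helper, not a conda API. The preferred instance gets
    version 1 and every other instance gets version 0, as described above.
    """
    if preferred not in instances:
        raise ValueError("preferred instance must be one of the instances")
    for name in [variant_class, *instances]:
        # Dashes separate name, version, and build string in the filename.
        if "-" in name:
            raise ValueError(f"name may not contain a dash: {name!r}")
    return [
        f"{variant_class}-{1 if inst == preferred else 0}-{inst}.tar.bz2"
        for inst in instances
    ]


blas_files = variant_filenames(
    "blas", ["mkl", "openblas", "accelerate", "atlas"], preferred="mkl"
)
```

Changing the preferred argument is all it takes to model a different default; the build strings stay the same.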

Using the variants when package building

Once the variants have been built, we can build packages that rely on them. To do so, we simply include the appropriate variant metapackage as a dependency. For instance, the MKL version of a package might have this in its dependency list:

    depends:
       - mkl
       - blas * mkl

Note the use of the wildcard for version number. This gives you the ability to build these packages without knowing which variant is preferred. In fact, you can even change the preferences after the fact without having to rebuild these packages.
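To make the effect of the wildcard concrete, here is a minimal Python sketch of how a three-field spec such as blas * mkl could be matched against a package's name, version, and build string. The function spec_matches is hypothetical and deliberately simplified: it treats the version and build fields as plain glob patterns and does not handle version ranges such as >=12.1,<13. It is not conda's actual matching code.

```python
from fnmatch import fnmatchcase


def spec_matches(spec, name, version, build):
    """Check a 'name [version [build]]' spec against one package.

    Simplified illustration only: missing fields default to '*', and
    the version and build fields are treated as glob patterns.
    """
    parts = spec.split()
    spec_name = parts[0]
    version_pattern = parts[1] if len(parts) > 1 else "*"
    build_pattern = parts[2] if len(parts) > 2 else "*"
    return (
        name == spec_name
        and fnmatchcase(version, version_pattern)
        and fnmatchcase(build, build_pattern)
    )


# The spec keeps matching the mkl metapackage whether or not mkl is
# currently the preferred (version 1) instance.
still_matches_when_preferred = spec_matches("blas * mkl", "blas", "1", "mkl")
still_matches_when_demoted = spec_matches("blas * mkl", "blas", "0", "mkl")
rejects_other_variant = spec_matches("blas * mkl", "blas", "1", "openblas")
```

Because the version field is a wildcard, only the build string decides the match, which is why the preference can change without rebuilding dependent packages.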

One might be tempted to simplify this process by including mkl as a dependency of blas-1-mkl, openblas as a dependency of blas-0-openblas, and so forth. In some cases, this should work just fine, but I would recommend this approach only if that dependency can be made completely version free. In other words, don't make blas-1-mkl depend on mkl 12.1.*; just make it depend on mkl. It is very important to avoid having to update these metapackages whenever new versions of their underlying dependencies are released. If a particular package does require a specific version of MKL, it can still be specified alongside the variant metapackage; e.g.,

    depends:
       - mkl >=12.1,<13
       - blas * mkl

Having said this, in some cases a variant will naturally be tied to particular versions. For instance, suppose we used a variant approach to differentiate between incompatible C++ ABIs. In this case, the individual variant instances might be drawn from a matrix of different C++ compilers and versions; e.g., cppabi-*-gcc5, cppabi-*-icc4, etc. (These are simply examples; I have no specific knowledge of C++ ABI issues.) In this case, it would be desirable for the variant metapackages to include version specifications in their dependencies.

Specifying a variant on the command line

Now that the variants have been put in place, a user can begin taking advantage of them without even knowing they are present. Suppose, for instance, that NumPy and SciPy have been built against multiple BLAS variants. Then performing

    conda create -n newenv python=2.7 numpy scipy

will automatically install blas-1-mkl.tar.bz2, and ensure that the mkl variants of both NumPy and SciPy are selected.

If the user wishes to specify a particular variant, they can do this:

    conda create -n newenv python=2.7 numpy scipy blas=*=openblas

Note the use of the wildcard to specify the version number. This will create the same environment as before, but with the openblas variant. To change variants, the user can simply install a new variant package; for instance,

    conda install blas=*=atlas

will force NumPy and SciPy to be updated to their ATLAS variants.
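The selection behavior described in this section can be modeled with a small Python sketch. It assumes, as the text does, that conda prefers the highest version among the candidates that satisfy the request. The function pick_variant and the candidate list are hypothetical and are not how the conda solver is implemented.

```python
def pick_variant(candidates, requested_build=None):
    """Pick a (version, build) pair from the available metapackages.

    Illustrative model only: with no requested build, the highest
    version wins; with a requested build, only that instance qualifies.
    """
    pool = [
        candidate
        for candidate in candidates
        if requested_build is None or candidate[1] == requested_build
    ]
    if not pool:
        raise LookupError(f"no variant with build {requested_build!r}")
    return max(pool, key=lambda candidate: candidate[0])


blas_candidates = [(1, "mkl"), (0, "openblas"), (0, "accelerate"), (0, "atlas")]

default_choice = pick_variant(blas_candidates)            # no variant requested
explicit_choice = pick_variant(blas_candidates, "atlas")  # like blas=*=atlas
```

Once the metapackage is chosen, every package that depends on blas with a matching build string follows from ordinary dependency resolution.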

Challenge: conda update --all

Using the version number to specify the "preferred" or "default" variant introduces a problem with conda update --all. When this command is run, conda will select the highest version number of the variant class. It will switch the user to this preferred variant instance, whether or not they asked for it.

Unfortunately, giving all of the variant metapackages the same version number eliminates our ability to specify one as the default, and it still runs into problems with conda update --all. Under this scenario, conda will see a tie across all of the variants, and it will break that tie in an undefined manner. There will be no predictability on initial installs of the variant unless it is explicitly specified.

So it is clear that we will need to come up with an improvement to the Conda solver that will allow us to achieve the full behavior we seek. I propose a simple modification: when conda update --all is specified, we do not include variant metapackages in the list of packages to be updated. This will require some formal way to communicate to the solver that a package is not to be included in conda update --all.
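As a rough sketch of the proposed modification, the update list could simply skip any package that is marked as exempt. The metadata key skip_update_all below is purely hypothetical; the proposal does not say how this information would be communicated to the solver, and no such field is known to exist in conda.

```python
def update_all_targets(installed):
    """Return the names of packages that 'update --all' should touch.

    Hypothetical model: a package carrying skip_update_all=True (an
    invented marker) is left out of the update list.
    """
    return [
        package["name"]
        for package in installed
        if not package.get("skip_update_all", False)
    ]


installed_packages = [
    {"name": "numpy"},
    {"name": "scipy"},
    {"name": "blas", "skip_update_all": True},  # variant metapackage
]

targets = update_all_targets(installed_packages)
```

With the metapackage excluded, the user's chosen variant instance stays in place, while NumPy and SciPy remain eligible for updates within that variant.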

If someone wishes to use variant packages effectively with an older version of conda, then they could pin the particular variant metapackage.

Minor challenge: orphan packages

Consider again the following sequence of commands:

    conda create -n newenv python=2.7 numpy scipy blas=*=openblas
    conda install blas=*=mkl

The first command will install the OpenBLAS variants of NumPy and SciPy, which will require the installation of the openblas conda package. The second command will replace NumPy and SciPy with their MKL variants, and install the mkl conda package. The second command, however, does not remove OpenBLAS from the conda environment, even though it is no longer being used.

This is a natural consequence of the way conda works, and is not necessarily a problem if mkl and openblas are properly designed. In fact, we might want both packages to be installed alongside each other. For instance, there might be an application outside of the Python ecosystem that depends on a different BLAS variant than the one we have specified for Python.

Nevertheless, it points to a potential improvement in conda: the ability to detect these "orphan" packages and, upon request (say, with a conda clean command) remove them from an environment. This can be accomplished by examining the install and remove history for a given environment and differentiating between packages that are explicitly installed, those required because of dependencies, and orphans.
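The classification described above can be sketched as a reachability check in Python. This assumes the set of explicitly installed packages has already been recovered from the environment's history; the function find_orphans and the example data are hypothetical, and the dependency lists are simplified to bare package names.

```python
def find_orphans(installed, explicit, depends):
    """Return installed packages that nothing explicit depends on.

    installed: set of package names in the environment
    explicit:  set of names the user asked for directly
    depends:   dict mapping a package name to its dependency names
    """
    needed = set()
    stack = [name for name in explicit if name in installed]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        # Follow dependencies that are actually present in the environment.
        stack.extend(dep for dep in depends.get(name, ()) if dep in installed)
    return set(installed) - needed


# State after switching from the openblas variant to the mkl variant.
installed = {"python", "numpy", "scipy", "blas", "mkl", "openblas"}
explicit = {"python", "numpy", "scipy", "blas"}
depends = {
    "numpy": ["python", "blas", "mkl"],
    "scipy": ["python", "numpy", "blas", "mkl"],
}

orphans = find_orphans(installed, explicit, depends)
```

In this example only openblas is reported, since mkl is still reachable from NumPy and SciPy.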
