Hot-fixing and extending conda environments
Introduction
We deploy root-owned conda environments, which are the basis of the data collection and analysis environments. On one hand, because they are owned by root they are write-protected, which ensures that users cannot accidentally break the environment; on the other hand, because they are write-protected they cannot be upgraded or extended. While we want to run with a stable, standard, well-understood software environment, we do need the ability to upgrade and extend it, both for development and for time-critical hot-fixes.
There are a number of possible solutions to this:
- use sudo to edit the environment in place (via conda, pip, or "by hand")
- create / clone the conda environment into user space
- use $PYTHONPATH and pip install --prefix to create "overlay" directories
Historically, we have primarily gone with the first two options; however, they have significant downsides. Modifying the environment in place requires elevated privileges, and it is very hard to track after the fact what has been done.
This document lays out using "overlays" as a technique to both locally replace already installed packages and to add new packages for development.
General theory of operation
When you run import foo, Python goes through a process to find and load the
requested module. An early step uses the import path to search locations on
disk for the requested module. This path can be accessed via sys.path.
Installation tools typically place files in directories that Python searches by
default, conventionally site-packages. In addition to being directly
controllable from inside a Python process, the entries in sys.path can be
controlled via the PYTHONPATH environment variable. When searching for an
import, Python stops looking as soon as it finds the module, which lets you
effectively shadow modules by putting their locations earlier in the path.
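The shadowing behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions: the module name overlay_demo and the helper names write_module and import_with_overlay are hypothetical, and two temporary directories stand in for the root-owned environment and the overlay.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_module(directory: Path, version: str) -> None:
    """Create a tiny module named overlay_demo (hypothetical) in directory."""
    (directory / "overlay_demo.py").write_text(f"VERSION = {version!r}\n")


def import_with_overlay() -> str:
    """Return the VERSION seen when the overlay precedes the base directory."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "base"
        overlay = Path(tmp) / "overlay"
        base.mkdir()
        overlay.mkdir()
        write_module(base, "1.0")     # stands in for the root-owned environment
        write_module(overlay, "2.0")  # stands in for the user-writable overlay

        saved_path = list(sys.path)
        try:
            # Earlier entries win, so the overlay goes in front of the base.
            sys.path[:0] = [str(overlay), str(base)]
            sys.modules.pop("overlay_demo", None)
            importlib.invalidate_caches()
            module = importlib.import_module("overlay_demo")
            return module.VERSION
        finally:
            # Restore the interpreter state so the demo has no side effects.
            sys.path[:] = saved_path
            sys.modules.pop("overlay_demo", None)


if __name__ == "__main__":
    print(import_with_overlay())
```

Both directories contain a module with the same name; the copy that is found is the one whose directory appears first in sys.path.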
Taken together we can now do two things:
- install modules someplace we can write to as an unprivileged user
- use PYTHONPATH to tell Python to find our modules there
Location, location, location
From Python's point of view these extra files can be anywhere; however, as a matter of policy we are going to use the location
/some/path/overlays/{env_name}/
as the prefix, which means we will have to add the path
/some/path/overlays/{env_name}/lib/{python_version}/site-packages
to the PYTHONPATH (where {python_version} is a directory name such as python3.11).
Similarly, if the package contains anything that will be run from the shell, then
/some/path/overlays/{env_name}/bin
needs to be added to PATH by any mechanism.
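As a rough sketch of this layout, the helper below computes the overlay directories for a given environment name. The function name overlay_dirs is hypothetical, the python_version component is assumed to match the interpreter running the code, and the lib/pythonX.Y/site-packages layout is assumed to be the POSIX one (it differs on Windows).

```python
import sys
from pathlib import Path


def overlay_dirs(overlay_root: str, env_name: str) -> dict:
    """Return the prefix, site-packages and bin directories of an overlay.

    Assumes the POSIX layout <prefix>/lib/pythonX.Y/site-packages, with X.Y
    taken from the interpreter that runs this function.
    """
    python_version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    prefix = Path(overlay_root) / env_name
    return {
        "prefix": prefix,
        "site_packages": prefix / "lib" / python_version / "site-packages",
        "bin": prefix / "bin",
    }


if __name__ == "__main__":
    for name, path in overlay_dirs("/some/path/overlays", "demo").items():
        print(f"{name}: {path}")
```

Running it inside the activated host environment keeps the version component in step with the Python that will later import from the overlay.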
Install a new package for development
To install a new package into our overlay directory, we use pip's --prefix flag:
$ conda activate {env_name}
$ pip install --prefix=/some/path/overlays/{env_name} ...
Any dependencies that are already installed in the host environment will be
picked up (conda provides the meta-data that pip needs to agree a package is
installed) and any missing dependencies will be installed alongside your
requested package. All standard pip command line flags and arguments should
work as expected.
To access the packages you need to arrange for the site-packages directory in
the overlay to be added to the PYTHONPATH / sys.path.
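One possible way to arrange this from Python, sketched under assumptions: the helper names overlay_environ and child_sys_path are hypothetical, and the directories passed in are whatever overlay site-packages and bin paths you settled on. The first helper prepends the overlay to PYTHONPATH and PATH in a copy of the environment; the second starts a fresh interpreter with that environment and reports its sys.path so the result can be checked.

```python
import json
import os
import subprocess
import sys


def overlay_environ(site_packages: str, bin_dir: str, base_env=None) -> dict:
    """Return a copy of the environment with the overlay placed first."""
    env = dict(os.environ if base_env is None else base_env)
    old_pythonpath = env.get("PYTHONPATH")
    env["PYTHONPATH"] = (
        site_packages
        if not old_pythonpath
        else os.pathsep.join([site_packages, old_pythonpath])
    )
    old_path = env.get("PATH")
    env["PATH"] = bin_dir if not old_path else os.pathsep.join([bin_dir, old_path])
    return env


def child_sys_path(env: dict) -> list:
    """Start a fresh interpreter with env and return its sys.path."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys, json; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Prepending rather than appending matters here: the overlay can only shadow the host environment if its entry comes earlier in the search order.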
Upgrade an existing package
If we want to upgrade an existing package using this technique, the above will
fail because, as part of the installation process, pip will (rather sensibly)
attempt to uninstall any existing versions of the package. Because our host
environment requires elevated privileges to modify, the uninstall fails. To
upgrade a package we need to additionally pass the -I (--ignore-installed) flag
to ignore any information about the already installed packages, which prevents
the permissions error. However, this also means that pip is no longer aware of
the already installed dependencies. To avoid re-installing all of the
dependencies along with the target package, we use the --no-deps flag to tell
pip not to try to install them. Thus:
$ conda activate {env_name}
$ pip install \
-I --no-deps \
--prefix=/some/path/overlays/{env_name} \
...
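If these invocations are wrapped in a script, the two variants (new package versus upgrade of an existing one) differ only in the extra flags. The sketch below builds the argument list without running it; the function name pip_overlay_command and the package name example-package are hypothetical, and invoking pip as python -m pip is an assumption made so that the pip of the active environment is used.

```python
import sys


def pip_overlay_command(prefix: str, requirement: str,
                        upgrade_existing: bool = False) -> list:
    """Build the pip command line for installing into an overlay prefix."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade_existing:
        # --ignore-installed is the long form of -I; --no-deps keeps pip from
        # re-installing dependencies it can no longer see as installed.
        cmd += ["--ignore-installed", "--no-deps"]
    cmd += [f"--prefix={prefix}", requirement]
    return cmd


if __name__ == "__main__":
    print(pip_overlay_command("/some/path/overlays/demo", "example-package",
                              upgrade_existing=True))
```

The resulting list can be handed to subprocess.run(cmd, check=True) from inside the activated environment.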
Nice write-up!
A comment on "Upgrade an existing package": One has to be aware that this will fail if the updated version needs updated dependencies as well.
I'm trying out a different approach for a similar goal:
Task: "Get an environment that matches a standard reference, but can be modified in certain aspects"
Steps:
- create the baseline environment from a conda list --explicit spec file (usually stored in a central place)
- apply the desired modifications with conda or pip commands
This can and should be hidden in a script controlled via a configuration file.
Advantages
Disadvantages
--prefix solution (but you could strip out something from the spec file, so that you could get away without the need for -I).
The last point can become a problem if users hold on to their environments for a long time. A key point here is that environments are disposable and can be easily recreated from the config file. So to update to a new state of the baseline, you simply delete the env and create it anew from the config.
- One could even introduce monitoring and notification if the baseline changes, but that may be overkill.