It's good to have pinned/versioned dependencies for reproducible builds: https://pythonspeed.com/articles/pipenv-docker/
The conda-lock and pip-compile tools are helpful for this. But they're not ideal when installing dependencies from both conda and pip, because the two solvers run independently and may generate inconsistent versions. Plus, it's annoying to juggle environment.yml, conda-linux-64.lock, requirements.in, and requirements.txt.
Create an environment-spec.yml with both your conda and pip dependencies:
name: base
channels:
- conda-forge
- defaults
# etc.
dependencies:
- matplotlib
- pandas
- pip # needed to have a pip section below
- scikit-learn
- pip:
- pyplot_themes # only available on PyPI
Write a Dockerfile to install these dependencies, say regenerate_conda_environment.Dockerfile:
# syntax=docker/dockerfile:1
# Note: using miniconda instead of micromamba because micromamba lacks the
# `conda env export` command.
FROM continuumio/miniconda3:4.9.2
COPY environment-spec.yml /environment-spec.yml
# mounts are for conda caching and pip caching
RUN --mount=type=cache,target=/opt/conda/pkgs --mount=type=cache,target=/root/.cache \
conda env create -n regen_env --file /environment-spec.yml
# Export dependencies.
RUN conda env export -n regen_env > /environment-lock.yml
CMD ["cat", "/environment-lock.yml"]
Pair this with a script like regenerate_conda_environment.sh that updates environment-lock.yml:
#!/bin/bash
set -euo pipefail
# Run this script whenever environment-spec.yml changes or you
# want to update to the latest version of your dependencies.
# Install dependencies and export pinned versions.
docker build -t regen_conda_env -f regenerate_conda_environment.Dockerfile .
# Copy environment lock file out from the docker image.
docker run --rm regen_conda_env > environment-lock.yml
Then, in your main Dockerfile, do something like:
# syntax=docker/dockerfile:1
FROM mambaorg/micromamba:0.13.1
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
COPY environment-lock.yml /
RUN --mount=type=cache,target=/opt/conda/pkgs --mount=type=cache,target=/root/.cache \
micromamba install -n base -y --file /environment-lock.yml
COPY . /app
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -e /app
- https://stackoverflow.com/questions/68171629/how-do-i-pin-versioned-dependencies-in-python-when-using-both-conda-and-pip
- conda-lock feature request to support pip dependencies: conda/conda-lock#4
- You can use this approach for pip repo/GitHub installs too (e.g. if you have a private library in a repo).
- You can specify editable installs in environment.yml: https://stackoverflow.com/questions/19042389/conda-installing-upgrading-directly-from-github
- `conda env export` seems to not export repo installs. You can do `grep 'git+ssh' /environment-spec.yml >> /environment-lock.yml` to add it. (This also requires removing the `prefix: /opt/conda/envs/...` line from the lock file.)
- Need to add GitHub's ssh key to `/root/.ssh/known_hosts`. I just ran `ssh-keyscan github.com` locally and copied the results into the Dockerfile.
- If it's a private repo:
  - `docker build` needs your ssh key: `--ssh default=~/.ssh/id_rsa`
  - Add `--mount=type=ssh` in the `conda install` commands in the Dockerfiles.
- If `conda env export` fails with something about invalid version specs, one of your dependencies might have a bug. See conda/conda#8687 for examples of workarounds.
- The `conda env export` step includes `name` and `prefix` fields. These get ignored by installing with `-n base` in the main Dockerfile.
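The grep trick for repo installs can be sketched end to end. This is a minimal illustration, not the real pipeline: the file contents below are hypothetical stand-ins for an actual `conda env export` output and spec file, and the package/repo names are made up.

```shell
# Stand-in for raw `conda env export` output (real output has more fields).
cat > environment-lock.yml <<'EOF'
name: regen_env
dependencies:
  - pandas=1.3.5
  - pip:
    - some-pypi-package==1.0.0
prefix: /opt/conda/envs/regen_env
EOF

# Stand-in spec containing a repo install that the export dropped.
cat > environment-spec.yml <<'EOF'
dependencies:
  - pandas
  - pip:
    - git+ssh://git@github.com/example/private-lib.git
EOF

# Re-add the repo install that `conda env export` dropped...
grep 'git+ssh' environment-spec.yml >> environment-lock.yml
# ...and strip the machine-specific `prefix:` line, so the appended
# pip line ends up directly under the existing `pip:` section.
grep -v '^prefix:' environment-lock.yml > tmp.yml && mv tmp.yml environment-lock.yml
```

Note this relies on the spec file indenting its pip entries the same way `conda env export` does, so the appended line parses as part of the `pip:` list once `prefix:` is gone.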