Pinned/versioned Python dependencies when using both conda and pip

Problem

It's good to have pinned/versioned dependencies for reproducible builds: https://pythonspeed.com/articles/pipenv-docker/

The conda-lock and pip-compile tools are helpful for this. But they're not ideal when installing dependencies from both conda and pip, because the two solvers run independently and may produce inconsistent versions. Plus, it's annoying to juggle environment.yml, conda-linux-64.lock, requirements.in, and requirements.txt.

Solution

Create an environment-spec.yml with both your conda and pip dependencies:

name: base
channels:
  - conda-forge
  - defaults
  # etc.
dependencies:
  - matplotlib
  - pandas
  - pip  # needed to have a pip section below
  - scikit-learn
  - pip:
    - pyplot_themes  # only available on PyPI

Write a Dockerfile to install these dependencies, say regenerate_conda_environment.Dockerfile:

# syntax=docker/dockerfile:1

# Note: using miniconda instead of micromamba because micromamba lacks the
# `conda env export` command.
FROM continuumio/miniconda3:4.9.2

COPY environment-spec.yml /environment-spec.yml
# mounts are for conda caching and pip caching
RUN --mount=type=cache,target=/opt/conda/pkgs --mount=type=cache,target=/root/.cache \
    conda env create -n regen_env --file /environment-spec.yml

# Export dependencies.
RUN conda env export -n regen_env > /environment-lock.yml
CMD ["cat", "/environment-lock.yml"]

Pair this with a script like regenerate_conda_environment.sh that updates environment-lock.yml:

#!/bin/bash
set -euo pipefail

# Run this script whenever environment-spec.yml changes or you
# want to update to the latest version of your dependencies.

# Install dependencies and export pinned versions.
docker build -t regen_conda_env -f regenerate_conda_environment.Dockerfile .
# Copy environment lock file out from the docker image.
docker run --rm regen_conda_env > environment-lock.yml
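
For reference, the exported environment-lock.yml pins every package in the environment (transitive dependencies included) to an exact version and build string, and it also carries name and prefix fields. A heavily abbreviated, purely illustrative example (the versions and build strings below are made up):

name: regen_env
channels:
  - conda-forge
  - defaults
dependencies:
  - matplotlib=3.4.3=py39hf3d152e_0
  - pandas=1.3.3=py39hde0f152_0
  # ...every other conda package in the environment, similarly pinned...
  - pip:
    - pyplot-themes==0.2.0
prefix: /opt/conda/envs/regen_env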

Then, in your main Dockerfile, do something like:

# syntax=docker/dockerfile:1

FROM mambaorg/micromamba:0.13.1

ENV PYTHONUNBUFFERED 1
ENV PYTHONDONTWRITEBYTECODE 1

COPY environment-lock.yml /
RUN --mount=type=cache,target=/opt/conda/pkgs --mount=type=cache,target=/root/.cache \
    micromamba install -n base -y --file /environment-lock.yml

COPY . /app
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -e /app
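
Putting it together, the day-to-day workflow looks roughly like this (the application image name below is just an example):

# BuildKit is required for the --mount options; it's on by default in recent Docker versions.
export DOCKER_BUILDKIT=1

# Re-pin whenever environment-spec.yml changes, or to pick up newer dependency versions:
./regenerate_conda_environment.sh

# Rebuild the application image against the updated environment-lock.yml:
docker build -t myapp .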

Other details

  • You can use this approach for pip repo/GitHub installs too (e.g. if you have a private library in a repo); see the sketch after this list.
    • You can specify editable installs in environment.yml: https://stackoverflow.com/questions/19042389/conda-installing-upgrading-directly-from-github
    • conda env export seems to not export repo installs. You can do grep 'git+ssh' /environment-spec.yml >> /environment-lock.yml to add them back. (This also requires removing the prefix: /opt/conda/envs/... line from the lock file, so the appended entry lands under the pip: section.)
    • You also need to add GitHub's SSH host key to /root/.ssh/known_hosts. I just ran ssh-keyscan github.com locally and copied the results into the Dockerfile.
    • If it's a private repo:
      • docker build needs your SSH key: --ssh default=~/.ssh/id_rsa
      • Add --mount=type=ssh to the conda install commands in the Dockerfiles.
  • If conda env export fails with something about invalid version specs, one of your dependencies might have a bug. See conda/conda#8687 for examples of workarounds.
  • The conda env export output includes name and prefix fields; these are ignored because the main Dockerfile installs into the existing base environment with -n base.
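
As a rough sketch of those pieces (the repository URL, package name, and host-key value below are placeholders, not from the original setup), the pip: section of environment-spec.yml gains a git+ssh entry:

  - pip:
    - pyplot_themes
    - git+ssh://git@github.com/your-org/your-private-lib.git

and regenerate_conda_environment.Dockerfile (and likewise the main Dockerfile) gets GitHub's host key plus an SSH mount on the install step:

# Paste the real output of `ssh-keyscan github.com` here so git+ssh installs don't prompt.
RUN mkdir -p /root/.ssh && \
    echo "github.com ssh-rsa AAAA..." >> /root/.ssh/known_hosts
RUN --mount=type=cache,target=/opt/conda/pkgs --mount=type=cache,target=/root/.cache \
    --mount=type=ssh \
    conda env create -n regen_env --file /environment-spec.yml

Build with your SSH key forwarded:

DOCKER_BUILDKIT=1 docker build --ssh default=~/.ssh/id_rsa \
    -t regen_conda_env -f regenerate_conda_environment.Dockerfile .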