Pinned/versioned Python dependencies when using both conda and pip

Problem

It's good to have pinned/versioned dependencies for reproducible builds: https://pythonspeed.com/articles/pipenv-docker/

The conda-lock and pip-compile tools are helpful for this. But they're not ideal when installing dependencies from both conda and pip, because the two solvers run independently and may produce inconsistent versions. Plus, it's annoying to juggle environment.yml, conda-linux-64.lock, requirements.in, and requirements.txt.

Solution

Create an environment-spec.yml with both your conda and pip dependencies:

name: base
channels:
  - conda-forge
  - defaults
  # etc.
dependencies:
  - matplotlib
  - pandas
  - pip  # needed to have a pip section below
  - scikit-learn
  - pip:
    - pyplot_themes  # only available on PyPI

Write a Dockerfile to install these dependencies, say regenerate_conda_environment.Dockerfile:

# syntax=docker/dockerfile:1

# Note: using miniconda instead of micromamba because micromamba lacks the
# `conda env export` command.
FROM continuumio/miniconda3:4.9.2

COPY environment-spec.yml /environment-spec.yml
# mounts are for conda caching and pip caching
RUN --mount=type=cache,target=/opt/conda/pkgs --mount=type=cache,target=/root/.cache \
    conda env create -n regen_env --file /environment-spec.yml

# Export dependencies.
RUN conda env export -n regen_env > /environment-lock.yml
CMD ["cat", "/environment-lock.yml"]

Pair this with a script like regenerate_conda_environment.sh that updates environment-lock.yml:

#!/bin/bash
set -euo pipefail

# Run this script whenever environment-spec.yml changes or you
# want to update to the latest version of your dependencies.

# Install dependencies and export pinned versions.
docker build -t regen_conda_env -f regenerate_conda_environment.Dockerfile .
# Copy environment lock file out from the docker image.
docker run --rm regen_conda_env > environment-lock.yml
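
For reference, the exported environment-lock.yml pins every package in the environment (transitive dependencies included) to an exact version and build string, and it also carries name and prefix fields. A heavily abbreviated, purely illustrative example (the versions and build strings below are made up):

name: regen_env
channels:
  - conda-forge
  - defaults
dependencies:
  - matplotlib=3.4.3=py39hf3d152e_0
  - pandas=1.3.3=py39hde0f152_0
  # ...every other conda package in the environment, similarly pinned...
  - pip:
    - pyplot-themes==0.2.0
prefix: /opt/conda/envs/regen_env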

Then, in your main Dockerfile, do something like:

# syntax=docker/dockerfile:1

FROM mambaorg/micromamba:0.13.1

ENV PYTHONUNBUFFERED 1
ENV PYTHONDONTWRITEBYTECODE 1

COPY environment-lock.yml /
RUN --mount=type=cache,target=/opt/conda/pkgs --mount=type=cache,target=/root/.cache \
    micromamba install -n base -y --file /environment-lock.yml

COPY . /app
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -e /app
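
Putting it together, the day-to-day workflow looks roughly like this (the application image name below is just an example):

# BuildKit is required for the --mount options; it's on by default in recent Docker versions.
export DOCKER_BUILDKIT=1

# Re-pin whenever environment-spec.yml changes, or to pick up newer dependency versions:
./regenerate_conda_environment.sh

# Rebuild the application image against the updated environment-lock.yml:
docker build -t myapp .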

Other details

  • You can use this approach for pip repo/GitHub installs too (e.g. if you have a private library in a repo); see the sketch after this list.
    • You can specify editable installs in environment.yml: https://stackoverflow.com/questions/19042389/conda-installing-upgrading-directly-from-github
    • conda env export seems to not export repo installs. You can do grep 'git+ssh' /environment-spec.yml >> /environment-lock.yml to add them back. (This also requires removing the prefix: /opt/conda/envs/... line from the lock file, so the appended entry lands under the pip: section.)
    • You also need to add GitHub's SSH host key to /root/.ssh/known_hosts. I just ran ssh-keyscan github.com locally and copied the results into the Dockerfile.
    • If it's a private repo:
      • docker build needs your SSH key: --ssh default=~/.ssh/id_rsa
      • Add --mount=type=ssh to the conda install commands in the Dockerfiles.
  • If conda env export fails with something about invalid version specs, one of your dependencies might have a bug. See conda/conda#8687 for examples of workarounds.
  • The conda env export output includes name and prefix fields; these are ignored because the main Dockerfile installs into the existing base environment with -n base.
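
As a rough sketch of those pieces (the repository URL, package name, and host-key value below are placeholders, not from the original setup), the pip: section of environment-spec.yml gains a git+ssh entry:

  - pip:
    - pyplot_themes
    - git+ssh://git@github.com/your-org/your-private-lib.git

and regenerate_conda_environment.Dockerfile (and likewise the main Dockerfile) gets GitHub's host key plus an SSH mount on the install step:

# Paste the real output of `ssh-keyscan github.com` here so git+ssh installs don't prompt.
RUN mkdir -p /root/.ssh && \
    echo "github.com ssh-rsa AAAA..." >> /root/.ssh/known_hosts
RUN --mount=type=cache,target=/opt/conda/pkgs --mount=type=cache,target=/root/.cache \
    --mount=type=ssh \
    conda env create -n regen_env --file /environment-spec.yml

Build with your SSH key forwarded:

DOCKER_BUILDKIT=1 docker build --ssh default=~/.ssh/id_rsa \
    -t regen_conda_env -f regenerate_conda_environment.Dockerfile .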