

@sinoroc
Last active April 13, 2024 08:35

Python package installers overrides

This is a proposal for a solution to help alleviate some frequent pain points with Python packaging, mainly:

  • Fix bad dependency requirements in metadata
  • Fix bad build system in dependencies
  • Provide more options per-dependency (--pre, --config-settings, --index-url, --find-links, etc.)

This document mostly focuses on the expected user experience, not on how this should be implemented in installers.

This proposal grew out of real issue reports and questions.

In most cases, when "pip" is mentioned in this document, one should read "Python package installers" instead.

Main discussion thread:

Proposition in short

python -m pip install Application --overrides overrides.toml

where overrides.toml looks like this:

[Library]
requirements = [
    "LibraryNightly[cpu]",
]

[LibraryNightly]
index-url = "https://index.internal/simple/"
pre = true

In this scenario, the user wants to install Application. If Application has a dependency on Library, the user wants pip to install LibraryNightly with the cpu extra instead. Additionally, the user wants pip to look for LibraryNightly exclusively on an internal index, and to consider pre-releases of LibraryNightly. All other dependencies should be handled as usual, including any dependencies of LibraryNightly.

Specification

If the installer's dependency resolver encounters a dependency whose name is a top-level section title in overrides.toml, then the installer should replace this and every other occurrence of this dependency with the override as defined in that section.
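Below is a minimal sketch of this replacement rule. It assumes the overrides have already been parsed into a dict. Two choices here are assumptions rather than part of the specification:

  • Section titles are matched after PEP 503 name normalization, so [Library] also matches "library".
  • The replacement is applied only once, so an override such as "Django>=4" under [Django] does not keep replacing itself.

All helper names are hypothetical.

```python
import re

# A PEP 508 requirement string starts with the project name: letters, digits,
# ".", "_" and "-", beginning and ending with a letter or digit.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_name(requirement: str) -> str:
    """Extract the project name from a requirement such as 'Foo[x]>=1'."""
    match = _NAME_RE.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse a project name from {requirement!r}")
    return match.group(1)


def apply_overrides(requirements: list[str], overrides: dict) -> list[str]:
    """Replace each requirement whose project has a 'requirements' override.

    Sections without a 'requirements' key (for example only 'index-url')
    keep the original requirement; they only change how it is fetched.
    Replacements are not overridden again, so [Django] -> "Django>=4" is safe.
    """
    sections = {normalize(name): section for name, section in overrides.items()}
    result: list[str] = []
    for requirement in requirements:
        section = sections.get(normalize(project_name(requirement)), {})
        if "requirements" in section:
            result.extend(section["requirements"])  # an empty list drops it
        else:
            result.append(requirement)
    return result


overrides = {
    "Library": {"requirements": ["LibraryNightly[cpu]"]},
    "importlib": {"requirements": []},
    "requests": {"index-url": "https://repository.internal/simple"},
}
print(apply_overrides(["Library>=1.0", "importlib", "requests"], overrides))
# ['LibraryNightly[cpu]', 'requests']
```

A real resolver would apply this rule each time it discovers a dependency, not once on a flat list. The flat list only keeps the sketch short.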

Examples

Exclude some specific version ranges, similar to pip's constraints.txt:

[Django]
requirements = [
    "Django>=4",
]

Skip a library altogether (do not install it at all):

[importlib]
requirements = []

Enforce some extra:

[Something]
requirements = [
    "Something[feature]",
]

Use a specific distribution file for a specific library:

[Something]
file-uri = "file:///path/to/Something.whl"

Install the CPU variant of PyTorch:

[torch]
index-url = "https://download.pytorch.org/whl/cpu"

Install a pre-release of the CPU variant of torchvision:

[torchvision]
pre = true
index-url = "https://download.pytorch.org/whl/nightly/cpu"

Install a nightly variant of TensorFlow:

[tensorflow]
requirements = [
    "tf-nightly",
]

Fix build dependencies:

[pysam.build-system]
requires = [
    "cython",
    "setuptools",
]

Use an alternative index for a specific dependency:

[requests]
index-url = "https://repository.internal/simple"

Use --find-links for a specific dependency:

[Something]
find-links = "/path/to/wheelhouse"
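The per-project keys used in these examples (pre, index-url, find-links) mirror pip options (--pre, --index-url, --find-links). Today pip only accepts those options for a whole invocation. The hypothetical helper below turns one section into the arguments of a separate pip install run for that project alone, which is roughly the workaround available today. Unlike the proposal, this workaround does not apply while the resolver processes the dependencies of Application.

```python
def pip_install_args(name: str, section: dict) -> list[str]:
    """Build arguments for 'python -m pip <args>' installing one project only.

    Hypothetical helper: pip applies these options to every project in the
    invocation, which is why the proposal wants them scoped per project.
    """
    args = ["install"]
    if "index-url" in section:
        args += ["--index-url", section["index-url"]]
    if "find-links" in section:
        args += ["--find-links", section["find-links"]]
    if section.get("pre", False):
        args.append("--pre")
    args.append(name)
    return args


print(pip_install_args("torchvision", {
    "pre": True,
    "index-url": "https://download.pytorch.org/whl/nightly/cpu",
}))
# ['install', '--index-url', 'https://download.pytorch.org/whl/nightly/cpu', '--pre', 'torchvision']
```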

File format

Let's assume TOML for now.

We need something that is easy to write for humans and easy to parse for computers.

Python's standard library has included tomllib for reading TOML files since Python 3.11.

File name

Suggested file name: overrides.toml.

requirements

Format: list of PEP-508 (or subset?) strings

Replace any requirement mentioning this project with the requirements listed here. List can contain the same project but with a different version range or different list of extras. List can also be empty.

file-uri

Format: URI string

Install the dependency from this URI specifically.

Can be https: or file:.
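Below is a sketch of how an installer might validate this field using only the standard library. The allowed schemes come from the sentence above. Returning a local path for file: URIs is an assumption about what an installer would do next.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname

ALLOWED_SCHEMES = {"https", "file"}


def resolve_file_uri(uri: str) -> str:
    """Return a local path for file: URIs and the URI itself for https: URIs."""
    parsed = urlparse(uri)
    if parsed.scheme not in ALLOWED_SCHEMES:
        # A bare path such as "/path/to/Something.whl" has no scheme at all.
        raise ValueError(f"file-uri must use https: or file:, got {uri!r}")
    if parsed.scheme == "file":
        return url2pathname(parsed.path)
    return uri
```

On a POSIX system, resolve_file_uri("file:///path/to/Something.whl") returns /path/to/Something.whl. The bare path /path/to/Something.whl is rejected.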

path

Format: Path to a source code directory on file system as string

Maybe redundant with file-uri

editable

Format: boolean

Install the dependency in editable mode.

pre

Format: boolean

Consider pre-releases for this dependency only (similar to pip's --pre option).

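find-links

Format: path or URL string

Look for distributions of this dependency at this location (similar to pip's --find-links option).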

index-url

Format: URL string

Any search and/or download for this project name MUST happen on the specified index.

If a required dependency already has an installed version that satisfies the requirements, then there is no need to check that it was installed from a specific index.

Index priority is not a concern in this proposal. If a required dependency has no installed version that satisfies the requirements, and there is an index override, then only this index shall be considered and no other.
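The rules above can be sketched as a small decision function. The signature is hypothetical, and section titles are matched exactly here for brevity.

  • An installed version that already satisfies the requirements means no index is queried.
  • Otherwise, an index-url override replaces every other index.
  • Without an override, the usual indexes apply.

```python
def indexes_to_search(
    name: str,
    overrides: dict[str, dict],
    default_indexes: list[str],
    installed_version_satisfies: bool,
) -> list[str]:
    """Return the indexes an installer may query for one project."""
    if installed_version_satisfies:
        # Already satisfied: no need to check where it was installed from.
        return []
    section = overrides.get(name, {})
    if "index-url" in section:
        # Exclusive: the override replaces every other index, no priorities.
        return [section["index-url"]]
    return list(default_indexes)


overrides = {"torch": {"index-url": "https://download.pytorch.org/whl/cpu"}}
defaults = ["https://pypi.org/simple/"]
print(indexes_to_search("torch", overrides, defaults, False))  # ['https://download.pytorch.org/whl/cpu']
print(indexes_to_search("numpy", overrides, defaults, False))  # ['https://pypi.org/simple/']
print(indexes_to_search("torch", overrides, defaults, True))   # []
```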

Local cache

What happens in the installer's local cache (download or build)? Potentially there could be artifacts from multiple indexes for the same combination:

  • project name
  • version string
  • distribution type (sdist or wheel)

How do pip and other installers handle this currently?
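This sketch does not answer how installers handle this today. It only illustrates the collision. If cache entries are keyed by the three components listed above, artifacts with the same project, version and type from two different indexes map to one entry. Adding the index URL to the key keeps them apart. That key is a hypothetical design, not pip's actual cache layout.

```python
from typing import NamedTuple


class ArtifactKey(NamedTuple):
    """Cache key made of the three components listed above."""

    project: str
    version: str
    dist_type: str  # "sdist" or "wheel"


class IndexAwareKey(NamedTuple):
    """Hypothetical key that also records which index served the artifact."""

    project: str
    version: str
    dist_type: str
    index_url: str


downloads = [
    ("something", "1.2.3", "wheel", "https://pypi.org/simple/"),
    ("something", "1.2.3", "wheel", "https://index.internal/simple/"),
]

# Keyed without the index, the second artifact overwrites the first entry.
by_artifact = {ArtifactKey(*d[:3]): d[3] for d in downloads}
# Keyed with the index, both artifacts can coexist.
by_index = {IndexAwareKey(*d): d[3] for d in downloads}

print(len(by_artifact))  # 1
print(len(by_index))     # 2
```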

[{DependencyName}.build-system]

Format as defined by:

Use cases

A list of currently existing issues with Python packaging that could be solved by this suggestion.

How to fix well-intentioned but ill-advised upper caps?

Some packagers put upper-bound version constraints (upper caps) on the dependencies of their libraries. They have good intentions, but this breaks things, and the breakage is difficult, if not impossible, to undo. No one can predict the future. Adding an upper bound before knowing whether a future major release (in the semver sense) will actually break the usage of the library can be counter-productive.

Further reading:

"dependency conflicts when developing an application, where two dependencies couldn’t be installed due to (unnecessary) upper version bounds on their dependencies. Haven’t had this problem since we switched from poetry to pdm but what I remember is that you basically had to either wait for the maintainers to release a new version with updated dependencies or (temporarily) fork the dependency and bump the sub-dependency yourself. As an application developer I’d really appreciate an escape hatch that let me override whatever the dependency solver thought was correct and just let me use the version that I tell it to."

Source:

How to fix bad version ranges in dependency requirements?

Examples:

How to fix missing or superfluous dependencies?

Some packagers forget to list a dependency, for example setuptools or pip, because those are often assumed to always be available. The opposite also happens: a dependency is listed but is not actually needed, for example something that is part of Python's standard library.

Another case is a package that relies on a library that is no longer in Python's standard library but has a drop-in replacement on PyPI.

How to handle multiple development branches of same project?

Some projects have multiple (git) branches, and dev teams try to let pip choose the right branch by putting the branch name in the version string. It might be better to put the branch name in the project name instead (this is easy-ish with the right CI/CD process). Each developer could then override the actual dependency name in their own local dev environment.

Examples:

How to apply installer settings to a single dependency?

In some cases, one might want to allow pre-releases for one specific dependency only. Examples:

"It may still happen, but I think the bigger issue is that some config_settings are unlikely to apply to all packages involved in an installation. So you end up needing a syntax for “when installing package X, use this config, otherwise ignore it”."

Source:

How to install specific dependencies from specific indexes?

Packagers often want to force the users of their library to install dependencies from a specific index. Of course, it is not up to the packager of a library to impose an index on the user doing the actual installation. What they actually want is to tell their users that some dependencies should be installed from a specific index. But there is no standard way to do this, so it is difficult to communicate correctly.

This also helps prevent dependency confusion attacks, or at least some aspects of them. A full proxy/mirror is the better solution.

Examples:

How to fix faulty build instructions?

Unspecified build dependencies.

Is it possible to override the [build-system] of a dependency or define one when it is missing?

Examples:

How to install a dependency as editable?

Examples:

Other potential use cases

Regarding indexes

This range of issues can usually be solved by setting up and using an alternate package index server. For example something as simple and straightforward as simpleindex can solve quite a lot of such issues.

But for some users, the hurdle to setting this up might be too high. And if this could be solved directly in the installers (pip, and so on) then I believe the configuration should be done as proposed here.

So this part can be considered optional. The proposal has value with or without the handling of indexes. Index support could be left out entirely, or tackled in a later format update.

Impact

Taking pip as an example (the impact would likely be similar for other package installers), the code changes needed to implement this without breaking other things are most likely substantial.

But the improvement to the developer and user experience could be worth it.

pip (and the larger PyPA ecosystem) works on the principle that all distributions with a given "project name" and "version string" are the same, regardless of which index they come from. If "Something v1.2.3" is found on PyPI as well as on another package index server, pip may pick a distribution from either server. There is no concept of priority. So a rather large conceptual change might be necessary to accommodate this proposal.

Competitors

Further ideas

Local pip.conf

It could be useful to also allow some global settings, so that the file would act like a pip.conf that is local to a project.

Replacement for requirements.txt

If we change the initial logic to "everything in the file should be installed" then it could be some kind of superset replacement for requirements.txt.

python -m pip install requirements.toml
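Under that changed logic, one possible reading is that each section contributes its requirements list if it has one, and otherwise the project named by its section title. This reading is an assumption, not something this proposal defines. The hypothetical helper below derives the list of things to install from the parsed file.

```python
def install_targets(overrides: dict[str, dict]) -> list[str]:
    """List what 'pip install requirements.toml' might install (hypothetical)."""
    targets: list[str] = []
    for name, section in overrides.items():
        if "requirements" in section:
            # The listed requirements replace the project (possibly with none).
            targets.extend(section["requirements"])
        else:
            # Sections that only tweak options still install the named project.
            targets.append(name)
    return targets


print(install_targets({
    "Django": {"requirements": ["Django>=4"]},
    "torch": {"index-url": "https://download.pytorch.org/whl/cpu"},
    "importlib": {"requirements": []},
}))
# ['Django>=4', 'torch']
```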

Rejected Ideas

This is a last-resort tool. Its feature set should be small but very powerful.

In most cases, something is rejected because it would imply code complexity in the installer software beyond what seems necessary (complexity outweighs usefulness). If it turns out that the implementation is actually straightforward then the rejection can be reverted.

index-url as list

We could allow a list of index server URLs, but it seems pointless for the use cases envisioned here. We should assume that whoever writes the override already knows exactly which single index should be used for that library.

Use a proxy server instead (such as simpleindex).

Multiple overrides per project

The idea was to allow multiple overrides for the same library, selected by version range, for example:

  • If Library has its version string in range <4 then use override A
  • but if it is in range >=4 then use override B.

It could have looked something like this:

Library = [
    { condition = "<4", requirements = [] },
    { condition = ">=4", requirements = ["AnotherLibrary"] },
]

It would probably be a real mess for the dependency resolver. How would a dependency resolver handle this?

But the more important question: Is that even necessary?

When we write overrides, we already know three things:

  • what the dependency resolver wants to give us
  • that it is not what we want
  • what we want instead

So there is not much point in adding conditions. We already know that pip is going to try to give us Library<4.

There is also the risk of having logical inconsistencies (range overlaps and whatnot).

Multiple override files

This would require some conflict-resolution logic in case a setting exists in multiple files with different values.

If we could design something able to handle this, then maybe we should, but it seems like unnecessary complexity.

Default file locations

Maybe we need three file names and/or locations:

  • One file that can be pushed to the project's source code repository so that some configuration can be shared by all maintainers/developers. Especially if the project is an application where dependencies should be pinned and locked.

  • One file that can be placed in the project's source code directory without pushing into the shared repository (something that will be added to .gitignore) so that this developer can have their own preferences for that particular project.

  • One file that can be placed in a user location (typically .config/ on Linux, see XDG and platformdirs) for this user's preferred settings for all their projects.

Instead, we want overrides to always be explicit. pip should not pick up files automatically. The user should always specify the file explicitly on the command line. We also rejected the idea of having multiple files because of conflict resolution.

Related discussions
