@jaraco
Forked from FFY00/python-vendor-config.md
Last active December 26, 2021 18:01

Python Vendor Configuration

Currently, several Python distributors modify the Python install layout. Making such modifications requires distributors to patch multiple standard library modules. The install layout is not currently meant to be a configurable option in a Python installation, but Python developers, distro packagers, and module authors all hold conflicting assumptions in this area. The outcome is problematic: Python distributors, understandably, fail to modify every place required to satisfy all of these assumptions, so incoherent or outright broken Python distributions are shipped to millions of users. The inconsistencies have taken a big toll on the Python ecosystem. They have made certain parts of Python simply unreliable and forced library and tool authors to resort to workarounds to get their software to behave correctly, which in turn prompted workarounds in Python and the distros, and so on.

These issues have snowballed into a much bigger one and have complicated the Python packaging ecosystem, over which Python core currently has very little control. Most distributors that customize the locations used by the site module will adjust the install locations in distutils to match their desired locations but will not update the sysconfig install schemes to reflect their changes. sysconfig is a module introduced in Python 3.2 that "provides access to Python’s configuration information like the list of installation paths" and is, or would be, the preferred method of getting the Python install locations. On most Linux systems patched as described above (and even on unpatched Python, before recent fixes), sysconfig is inconsistent with distutils and contains incorrect information, forcing most users to use distutils instead.

This problem is compounded by the deprecation of distutils in Python 3.10 and its planned removal in Python 3.12. Facing this deprecation, users have no migration path and are left to hope that distributors will now patch sysconfig instead, but there is no guarantee they will.

There is also a chicken-and-egg problem: distributors will patch the necessary Python components to make software behave correctly, but software authors don't know how to write their software to behave correctly because that relies on knowing the behavior of the Python distributions they support. So, currently, software authors have no way to write forward-compatible code. All they can do is guess that relying on sysconfig is the best bet, which is a very poor position to be in, especially when this reliance affects critical components of the ecosystem like pip.

Therefore, it is incredibly important that Python core takes back control of how these customizations happen, so that it can make sure distribution vendors, package maintainers, and users are not confounded like this again. This document outlines an officially supported mechanism to customize the install layout. Having such a mechanism maintained directly by Python core should help make sure that all modules that need to account for the install layout behave correctly and consistently.

Implementation

The implementation adds three ./configure options, adds two functions to the platform module, and tweaks the current install location.

--with-distributor-id

This option contains a short distributor identifier. The identifier is used by the default install location when constructing unique folder names and is appended to the interpreter name. It must be an ASCII identifier ([A-Za-z_][A-Za-z_0-9]*).

A platform.distributor_id function returns this string and makes it easy to programmatically identify vendor distributions.

Example

Setting --with-distributor-id=my_distro will change the Python lib folder to /usr/lib/python3.9-my_distro and the executable to python-my_distro.
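
The naming rule above can be illustrated with a minimal sketch. The helper names (is_valid_distributor_id, namespaced_names) and the default version and libdir values are hypothetical and exist only for illustration; the real derivation would happen inside ./configure and the build system, not in Python code like this.

```python
import re

# Pattern quoted from the option description: an ASCII identifier.
_DISTRIBUTOR_ID = re.compile(r'[A-Za-z_][A-Za-z_0-9]*')


def is_valid_distributor_id(value):
    """Return True if value matches the documented identifier pattern."""
    return _DISTRIBUTOR_ID.fullmatch(value) is not None


def namespaced_names(distributor_id, version='3.9', libdir='/usr/lib'):
    """Hypothetical helper: derive the lib folder and executable name."""
    if not is_valid_distributor_id(distributor_id):
        raise ValueError(f'invalid distributor id: {distributor_id!r}')
    lib_folder = f'{libdir}/python{version}-{distributor_id}'
    executable = f'python-{distributor_id}'
    return lib_folder, executable


print(namespaced_names('my_distro'))
```

An identifier such as my-distro would be rejected here because the hyphen is not part of the allowed character set.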

--with-distributor-name

This option contains a human-readable distributor name. The name is used when the Python version is displayed to the user.

A platform.distributor_name function returns this string.

Example

Setting --with-distributor-name='My Distro' will result in the following outputs.

$ python --version
My Distro Python 3.10.0
$ python
My Distro Python 3.10.0 (default, Oct 21 2021, 21:07:02) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
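
Tools that want to identify a vendor distribution might query the two proposed platform functions defensively, since neither exists in released Python versions. The sketch below assumes both functions take no arguments and return strings, as the descriptions above suggest; the distributor_info helper is hypothetical.

```python
import platform


def distributor_info():
    """Return (id, name), or None for each value the interpreter lacks.

    platform.distributor_id and platform.distributor_name are the
    functions proposed in this document; they are looked up with
    getattr so the code also runs on interpreters without them.
    """
    get_id = getattr(platform, 'distributor_id', None)
    get_name = getattr(platform, 'distributor_name', None)
    dist_id = get_id() if callable(get_id) else None
    dist_name = get_name() if callable(get_name) else None
    return dist_id, dist_name


dist_id, dist_name = distributor_info()
if dist_id is None:
    print('No distributor information available')
else:
    print(f'Running on {dist_name} ({dist_id})')
```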

--with-vendor-config

This option specifies a Python file that provides settings to customize a couple of things:

  • Adding new install schemes
  • Adding extra install schemes to the site module initialization

Additionally, sysconfig._get_preferred_schemes will be moved to the vendor config under the name get_preferred_schemes. This function was added in Python 3.10 with the intent of distributors overriding it to change the default install scheme.

Example

Using the following config adds a new install scheme that places site packages in distro-packages, as opposed to site-packages, and configures the site module to pick it up, putting the distro-packages location in sys.path.

EXTRA_INSTALL_SCHEMES = {
    'my_distro': {
        'stdlib': '{installed_base}/{platlibdir}/python{py_version_short}',
        'platstdlib': '{platbase}/{platlibdir}/python{py_version_short}',
        'purelib': '{base}/lib/python{py_version_short}/distro-packages',
        'platlib': '{platbase}/{platlibdir}/python{py_version_short}/distro-packages',
        'include': '{installed_base}/include/python{py_version_short}{abiflags}',
        'platinclude': '{installed_platbase}/include/python{py_version_short}{abiflags}',
        'scripts': '{base}/bin',
        'data': '{base}',
    },
}

EXTRA_SITE_INSTALL_SCHEMES = [
    'my_distro',
]
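
To show what registering such a scheme involves, here is a rough sketch of how the path templates could be expanded into concrete directories using the config vars that sysconfig already exposes. The expand_scheme helper is hypothetical and is not the draft implementation; it only illustrates the substitution step that has to happen before the paths can be added to sys.path.

```python
import sysconfig

# Subset of the example scheme above, limited to the site-packages paths.
EXTRA_INSTALL_SCHEMES = {
    'my_distro': {
        'purelib': '{base}/lib/python{py_version_short}/distro-packages',
        'platlib': '{platbase}/{platlibdir}/python{py_version_short}/distro-packages',
    },
}


def expand_scheme(scheme):
    """Hypothetical helper: substitute config vars into path templates."""
    config_vars = dict(sysconfig.get_config_vars())
    # Fallback in case the interpreter does not define platlibdir.
    config_vars.setdefault('platlibdir', 'lib')
    return {
        key: template.format_map(config_vars)
        for key, template in scheme.items()
    }


paths = expand_scheme(EXTRA_INSTALL_SCHEMES['my_distro'])
for key, path in paths.items():
    print(f'{key}: {path}')
```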

Considerations

The main considerations for the proposed implementation were:

  • What distributors need
  • What would be feasible for Python core to maintain

One of the requests from Python distributors was to be able to change the default locations, as they would like to accomplish two things: 1) change the site packages path to alternate locations, and 2) be able to resolve path conflicts with other locations or installations.

The proposed implementation disallows such customization on the basis that it would be too difficult for Python core to maintain: it would break assumptions that modules already make and, similarly, would break a lot of downstream code that relies on assumptions of its own. The site packages directories from the default install scheme should have a constant value, independent of the distribution in use, and should always be used by the site module.

The goals behind (1) should be implemented by adding an extra install scheme, adding it to the site module, and setting get_preferred_schemes to make it the default one.
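
Put together, a vendor config for goal (1) might look like the sketch below. It assumes get_preferred_schemes keeps the return shape of sysconfig._get_preferred_schemes in Python 3.10 (a dict with 'prefix', 'home', and 'user' keys); this document does not specify the final signature, so treat that as an assumption.

```python
# Hypothetical vendor config file, passed via --with-vendor-config.

EXTRA_INSTALL_SCHEMES = {
    'my_distro': {
        'stdlib': '{installed_base}/{platlibdir}/python{py_version_short}',
        'platstdlib': '{platbase}/{platlibdir}/python{py_version_short}',
        'purelib': '{base}/lib/python{py_version_short}/distro-packages',
        'platlib': '{platbase}/{platlibdir}/python{py_version_short}/distro-packages',
        'include': '{installed_base}/include/python{py_version_short}{abiflags}',
        'platinclude': '{installed_platbase}/include/python{py_version_short}{abiflags}',
        'scripts': '{base}/bin',
        'data': '{base}',
    },
}

# Have the site module add the scheme's site-packages paths to sys.path.
EXTRA_SITE_INSTALL_SCHEMES = [
    'my_distro',
]


def get_preferred_schemes():
    """Make 'my_distro' the default scheme for prefix installs.

    Assumption: same return shape as sysconfig._get_preferred_schemes
    in Python 3.10, shown here with the POSIX defaults for the other keys.
    """
    return {
        'prefix': 'my_distro',
        'home': 'posix_home',
        'user': 'posix_user',
    }
```

The default scheme's site-packages directories stay untouched; the distributor's packages simply live in an additional, preferred scheme.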

And (2) can be accomplished by setting --with-distributor-id, which will put all Python paths in a different namespace, preventing any conflict with other Python distributions. This feature supersedes distributors' current approach of altering/overriding --prefix.

Downsides

Increased Startup Latency

The current draft implementation requires sysconfig to be imported by the site module in any environment where a vendor config adds any schemes, slowing down the interpreter initialization. At least some of this cost is essentially required in order to resolve "config vars" in "platlib" and "purelib" paths before registering them as site-packages paths.

An initial implementation reveals an additional ~0.5 ms (1.05x) of interpreter initialization time when schemes are added to site by the vendor config. Although small, this degradation is comparable to some of the hard-won gains by the faster-cpython project.
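
A rough way to approximate this kind of overhead on a given machine is to compare interpreter startup with and without an explicit import of sysconfig. This is only a sketch and not the method used to obtain the figure above; the startup_time helper is hypothetical, and results depend heavily on the platform and on whether site already imports sysconfig in the environment being measured.

```python
import statistics
import subprocess
import sys
import time


def startup_time(code='pass', runs=5):
    """Return the median wall-clock seconds to start Python and run code."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # -I runs the interpreter in isolated mode for more stable results.
        subprocess.run([sys.executable, '-I', '-c', code], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


baseline = startup_time('pass')
with_sysconfig = startup_time('import sysconfig')
print(f'baseline:       {baseline * 1e3:.2f} ms')
print(f'with sysconfig: {with_sysconfig * 1e3:.2f} ms')
```

Process spawn overhead dominates these numbers, so differences well under a millisecond need many more runs (or a dedicated benchmarking tool) to be meaningful.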

There are potential optimizations that may reduce these costs, including:

  • Functionality required by site for resolving site-packages could be split out of sysconfig into a separate module that both site and sysconfig use.
  • That separate module could be frozen.
  • sysconfig could be rewritten to lazily load expensive attributes as they are needed.
  • The site module could cache the result of the resolved site-packages from vendor config.

The modest performance penalty is regrettable, but the benefits of the change justify it even without further optimization, so optimizations should be explored separately.

Acknowledgements

I would like to thank Petr Viktorin, Steve Dower, and Jason R. Coombs for discussing, reviewing, and proposing changes to this proposal.
