@encukou
Forked from FFY00/python-vendor-config.md
Last active October 22, 2021 13:12

Python Vendor Configuration

Currently, several Python distributors modify the Python install layout. Making such modifications requires them to patch multiple standard library modules. The install layout is currently not meant to be a configurable option in a Python installation, but Python developers, distro packagers and module authors all make certain, conflicting assumptions in this area. This has proven problematic because Python distributors, understandably, fail to correctly modify every place required to satisfy all these assumptions, resulting in incoherent or outright broken Python distributions being shipped to millions of users. The inconsistencies have taken a big toll on the Python ecosystem: they have made certain parts of Python simply unreliable and forced library/tool authors to resort to workarounds to get their software to behave correctly, which in turn prompted workarounds in Python and the distros, and so on.

These issues have snowballed into a much bigger one and have put us in a very complicated situation, which Python core currently has very little control over. Most distributors that customize the locations used by the site module adjust the install locations in distutils to match their desired locations, but do not update the sysconfig install schemes to reflect their changes. sysconfig is a module introduced in Python 3.2 that "provides access to Python’s configuration information like the list of installation paths" and is, or would be, the preferred method of getting the Python install locations. On most Linux systems, as mentioned above (and also in unpatched Python, before recent fixes), sysconfig is inconsistent with distutils and contains incorrect information, forcing most users to use distutils instead. That brings us to a big problem: distutils has been deprecated in Python 3.10 and will be removed in Python 3.12. Faced with this, users have no migration path. All they can do is guess that distributors will now patch sysconfig instead, but there is no guarantee they will. There is also a chicken-and-egg problem: distributors will patch the necessary Python components to make software behave correctly, but software authors don't know how to write their software to behave correctly, because that relies on knowing the behavior of the Python distributions they support. So, currently, software authors have no way to write forward-compatible code. All they can do is guess that relying on sysconfig is the best bet, which is a very poor position to be in, especially when this affects critical components of the ecosystem like pip.

Because of this, I believe it is incredibly important that Python core takes back control of how these customizations happen, so that it can make sure we are not left in a position like this again. I propose doing so by adding an officially supported mechanism to customize the install layout. Having such a mechanism maintained directly by Python core should help ensure that all modules that need to account for the install layout behave correctly and that there are no inconsistencies between them.

Implementation

The proposal consists of adding three ./configure options, adding two functions to the platform module, and tweaking the current install location:

--with-distributor-id

This option contains a short distributor identifier. It will be used by the default install location when constructing unique folder names, and will be appended to the interpreter name. It must be an ASCII identifier.

A platform.distributor_id function that returns this string should be added. This makes it easy to programmatically identify vendor distributions.

Example

Setting --with-distributor-id=my_distro will change the Python lib folder to /usr/lib/python3.9-my_distro and the executable to python-my_distro.
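
As a rough illustration of the naming rule described above, the following sketch derives the library folder and executable name from a distributor identifier. The helper names `get_distributor_id` and `distributor_paths`, the `/usr` prefix default, and the reading of "ASCII identifier" as an ASCII-only Python identifier are assumptions made for this example. `platform.distributor_id` is the function proposed here and does not exist in released Python versions, so the lookup falls back to a default.

```python
import platform


def get_distributor_id(default=""):
    """Return the distributor id, or `default` when the proposed API is absent."""
    # platform.distributor_id is proposed in this document, not an existing API.
    func = getattr(platform, "distributor_id", None)
    return func() if callable(func) else default


def distributor_paths(distributor_id, py_version_short, prefix="/usr"):
    """Hypothetical helper: build the names a distributor id would produce."""
    # Assumption: "ASCII identifier" means an ASCII-only Python identifier.
    if not (distributor_id.isascii() and distributor_id.isidentifier()):
        raise ValueError(f"not an ASCII identifier: {distributor_id!r}")
    return {
        "libdir": f"{prefix}/lib/python{py_version_short}-{distributor_id}",
        "executable": f"python-{distributor_id}",
    }


print(distributor_paths("my_distro", "3.9"))
```

With `my_distro` and `3.9` this reproduces the folder and executable names from the example above.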

--with-distributor-name

This option contains a human-readable distributor name. It will be used when the Python version is displayed to the user.

A platform.distributor_name function that returns this string should be added.

Example

Setting --with-distributor-name='My Distro' will result in the following outputs.

$ python --version
My Distro Python 3.10.0
$ python
My Distro Python 3.10.0 (default, Oct 21 2021, 21:07:02) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
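
The banner format above can be approximated in a few lines. This is only a sketch: `version_banner` is a hypothetical helper, and it takes the distributor name as an argument because `platform.distributor_name` is proposed here and is not available in released Python versions.

```python
import platform


def version_banner(distributor_name=None):
    """Hypothetical helper mimicking the `python --version` output shown above."""
    version = platform.python_version()
    if distributor_name:
        # Prefix the usual version string with the distributor name.
        return f"{distributor_name} Python {version}"
    return f"Python {version}"


print(version_banner("My Distro"))
```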

--with-vendor-config

This option contains a Python file that will provide settings to customize a couple of things:

  • Adding new install schemes
  • Adding extra install schemes to the site module initialization

Additionally, sysconfig._get_preferred_schemes should be moved to the vendor config under the name get_preferred_schemes. This function was added in Python 3.10 with the intent of distributors overriding it to change the default install scheme.

Example

Using the following config adds a new install scheme that places site packages in distro-packages, as opposed to site-packages, and configures the site module to pick it up, putting the distro-packages location in sys.path.

EXTRA_INSTALL_SCHEMES = {
    'my_distro': {
        'stdlib': '{installed_base}/{platlibdir}/python{py_version_short}',
        'platstdlib': '{platbase}/{platlibdir}/python{py_version_short}',
        'purelib': '{base}/lib/python{py_version_short}/distro-packages',
        'platlib': '{platbase}/{platlibdir}/python{py_version_short}/distro-packages',
        'include': '{installed_base}/include/python{py_version_short}{abiflags}',
        'platinclude': '{installed_platbase}/include/python{py_version_short}{abiflags}',
        'scripts': '{base}/bin',
        'data': '{base}',
    },
}

EXTRA_SITE_INSTALL_SCHEMES = [
    'my_distro',
]
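
To see where such a scheme would place packages, the path templates can be expanded by hand. The sketch below uses plain `str.format` with made-up variable values (`/usr`, `lib64`, `3.10`) purely for illustration. The real expansion would be done by sysconfig with the interpreter's own configuration variables, and the way site consumes `EXTRA_SITE_INSTALL_SCHEMES` is an assumption, as is the helper name `extra_site_dirs`.

```python
EXTRA_INSTALL_SCHEMES = {
    "my_distro": {
        "purelib": "{base}/lib/python{py_version_short}/distro-packages",
        "platlib": "{platbase}/{platlibdir}/python{py_version_short}/distro-packages",
    },
}
EXTRA_SITE_INSTALL_SCHEMES = ["my_distro"]

# Illustrative values only; a real interpreter supplies its own.
example_vars = {
    "base": "/usr",
    "platbase": "/usr",
    "platlibdir": "lib64",
    "py_version_short": "3.10",
}


def extra_site_dirs(schemes, site_schemes, variables):
    """Hypothetical helper: list the directories site would add to sys.path."""
    dirs = []
    for name in site_schemes:
        for key in ("purelib", "platlib"):
            path = schemes[name][key].format(**variables)
            if path not in dirs:  # purelib and platlib may coincide
                dirs.append(path)
    return dirs


print(extra_site_dirs(EXTRA_INSTALL_SCHEMES, EXTRA_SITE_INSTALL_SCHEMES, example_vars))
```

With these example values the two directories differ only in `lib` versus `lib64`, which is why both purelib and platlib are considered.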

Considerations

The main considerations for the proposed implementation were:

  • What distributors need
  • What would be feasible for Python core to maintain

One of the requests from Python distributors was to be able to change the default locations, as they would like to accomplish two things: 1) change the site-packages path to other locations, and 2) be able to resolve path conflicts with other locations or installations.

The proposed implementation does not allow this, on the basis that it would be too difficult for Python core to maintain: it would break assumptions that modules are already making, and it would similarly break a lot of downstream code because of assumptions that code may be making. The site-packages directories from the default install scheme should have a constant value, independent of the distribution in use, and should always be used by the site module.

The goals behind 1) should be implemented by adding an extra install scheme, adding it to the site module, and setting get_preferred_schemes to make it the default one.
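
Put together, a vendor config achieving 1) might look roughly like the sketch below. The dictionary returned by `get_preferred_schemes` mirrors what `sysconfig._get_preferred_schemes` returns in Python 3.10 (keys `prefix`, `home`, and `user`); whether the vendor config would use the same shape is an assumption, and the scheme is abbreviated to two keys for brevity.

```python
# Sketch of a vendor config file; names follow the example earlier in this document.
EXTRA_INSTALL_SCHEMES = {
    "my_distro": {
        # Abbreviated: a real scheme would define all keys shown in the full example.
        "purelib": "{base}/lib/python{py_version_short}/distro-packages",
        "platlib": "{platbase}/{platlibdir}/python{py_version_short}/distro-packages",
    },
}

# Ask site to add the scheme's package directories to sys.path.
EXTRA_SITE_INSTALL_SCHEMES = ["my_distro"]


def get_preferred_schemes():
    """Make 'my_distro' the default scheme for prefix installs.

    Assumption: same return shape as sysconfig._get_preferred_schemes in 3.10.
    """
    return {
        "prefix": "my_distro",
        "home": "posix_home",
        "user": "posix_user",
    }
```

The default scheme itself stays untouched, so its site-packages directories keep their constant value and remain on sys.path alongside distro-packages.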

And 2) can be accomplished by setting --with-distributor-id, which will put all Python paths in a different namespace, preventing any conflict with other Python distributions. This is mostly needed due to distributors not respecting --prefix.

Downsides

This proposal will require sysconfig to be imported by the site module if the vendor config wants to add any schemes to site, slowing down interpreter initialization.

There are a couple of optimizations that can be done to help with this. sysconfig can be split into two modules, one with only the core bits needed by site, and the full-featured one with everything else. sysconfig could also be rewritten to lazy load expensive attributes as they are needed. An initial implementation of this shows the interpreter initialization time increasing to 1.08x of the baseline when schemes are added to site by the vendor config.
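
The lazy-loading idea could take several forms; one option is a module-level `__getattr__` (PEP 562), another is caching an expensive function on first call. The sketch below shows the second approach with `functools.lru_cache`. The function names and the stand-in for the expensive work are hypothetical and are not taken from sysconfig or from the initial implementation mentioned above.

```python
import functools

# Counts how often the expensive step runs; only for demonstration.
load_count = 0


def _load_build_config():
    """Stand-in for an expensive step such as parsing build-time data."""
    global load_count
    load_count += 1
    return {"py_version_short": "3.10", "platlibdir": "lib"}


@functools.lru_cache(maxsize=None)
def get_config_vars():
    """Compute the config variables on first use, then reuse the result."""
    return _load_build_config()


# Nothing is loaded at import time; the work happens on the first call only.
get_config_vars()
get_config_vars()
print(load_count)
```

Code that never asks for the config variables, such as a site initialization that only needs the scheme templates, would not pay for loading them.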

The change in the interpreter initialization time is certainly unfortunate, but I believe the benefits of this proposal outweigh it.
