Currently, several Python distributors modify the Python install layout. Making such modifications requires them to patch multiple standard library modules. The install layout is currently not meant to be a configurable option in a Python installation, but Python developers, distro packagers and module authors all make conflicting assumptions in this area. This has proven problematic because Python distributors, understandably, fail to correctly modify every place required to satisfy all these assumptions, resulting in incoherent or outright broken Python distributions being shipped to millions of users. The inconsistencies have taken a big toll on the Python ecosystem, as they have made certain parts of Python simply unreliable and forced library and tool authors to resort to workarounds to get their software to behave correctly, which in turn prompted workarounds in Python and the distros, and so on.
These issues have snowballed into a much bigger one and have put us in a very
complicated situation, which Python core currently has very little control over.
Most distributors that customize the locations used by the site module will
adjust the install locations in distutils to match their desired locations,
but will not update the sysconfig install schemes to reflect their changes.
sysconfig is a module introduced in Python 3.2 that "provides access to
Python’s configuration information like the list of installation paths" and is,
or would be, the preferred method of getting the Python install locations.
On most Linux systems, as mentioned above (and even in unpatched Python before
recent fixes), sysconfig is inconsistent with distutils and contains incorrect
information, forcing most users to use distutils instead. That brings us to a
big problem: distutils has been deprecated in Python 3.10 and will be removed
in Python 3.12. Faced with this, users have no migration path; all they can do
is guess that distributors will now patch sysconfig instead, but there is no
guarantee they will. There is also a chicken-and-egg problem: distributors will
patch the necessary Python components to make software behave correctly, but
software authors don't know how to write their software to behave correctly
because that relies on knowing the behavior of the Python distributions they
support. So, currently, software authors have no way to write
forward-compatible code; all they can do is guess that relying on sysconfig is
the best bet, which is a very poor position to be in, especially when this
affects critical components of the ecosystem like pip.
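To make the inconsistency concrete, here is a minimal sketch of how a tool author might compare what the two modules report for the pure-Python install location. The helper names purelib_candidates and describe are made up for this illustration; the sketch only assumes that distutils may or may not be importable on the running interpreter.

```python
import sysconfig
import warnings


def purelib_candidates():
    """Return the pure-Python install location reported by each module."""
    candidates = {"sysconfig": sysconfig.get_path("purelib")}
    try:
        with warnings.catch_warnings():
            # Importing distutils emits a DeprecationWarning on 3.10 and 3.11.
            warnings.simplefilter("ignore", DeprecationWarning)
            from distutils.sysconfig import get_python_lib
    except ImportError:
        # distutils is no longer in the standard library on Python 3.12+.
        candidates["distutils"] = None
    else:
        candidates["distutils"] = get_python_lib()
    return candidates


def describe(candidates):
    """Summarize whether the two sources agree (hypothetical helper)."""
    from_sysconfig = candidates["sysconfig"]
    from_distutils = candidates["distutils"]
    if from_distutils is None:
        return "distutils unavailable; only sysconfig can be queried"
    if from_sysconfig == from_distutils:
        return "sysconfig and distutils agree"
    return f"mismatch: sysconfig={from_sysconfig!r} distutils={from_distutils!r}"


if __name__ == "__main__":
    print(describe(purelib_candidates()))
```

On a distribution that patched only distutils, I would expect the mismatch branch to trigger, but the actual result depends entirely on the distribution in use.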
Because of this, I believe it is incredibly important that Python core takes back control of how these customizations happen, so that it can make sure we are not left in a position like this again. I propose that it do so by adding an officially supported mechanism to customize the install layout. Having such a mechanism maintained directly by Python core should help ensure that all modules that need to account for the install layout behave correctly and that there are no inconsistencies between them.
This consists of adding three ./configure options, two functions to the
platform module, and tweaking the current install location:
The --with-distributor-id option contains a short distributor identifier. It
will be used by the default install location when constructing unique folder
names, and will be appended to the interpreter name. It must be an ASCII
identifier ([A-Za-z_][A-Za-z_0-9]*).
A platform.distributor_id function that returns this string should be added.
This makes it easy to programmatically identify vendor distributions.
Setting --with-distributor-id=my_distro will change the Python lib folder to
/usr/lib/python3.9-my_distro and the executable to python-my_distro.
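As a rough illustration of the rules above, the sketch below validates an identifier against the stated pattern and derives the folder and executable names from it. The function names, the default prefix, and the default version are assumptions made for the example. They are not part of the proposal or of any existing API.

```python
import re

# Pattern quoted in the text: an ASCII identifier.
_DISTRIBUTOR_ID_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def validate_distributor_id(value):
    """Return value unchanged if it is a valid identifier, else raise."""
    if not _DISTRIBUTOR_ID_RE.fullmatch(value):
        raise ValueError(f"invalid distributor id: {value!r}")
    return value


def layout_names(distributor_id, version="3.9", prefix="/usr"):
    """Derive the lib folder and executable name (hypothetical helper)."""
    validate_distributor_id(distributor_id)
    return {
        "libdir": f"{prefix}/lib/python{version}-{distributor_id}",
        "executable": f"python-{distributor_id}",
    }


if __name__ == "__main__":
    print(layout_names("my_distro"))
```

An identifier such as my-distro would be rejected, since the hyphen is not allowed by the pattern.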
The --with-distributor-name option contains a human readable distributor name. It will be used when the Python version is displayed to the user.
A platform.distributor_name function that returns this string should be added.
Setting --with-distributor-name='My Distro' will result in the following
outputs.
$ python --version
My Distro Python 3.10.0
$ python
My Distro Python 3.10.0 (default, Oct 21 2021, 21:07:02) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
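Since platform.distributor_id and platform.distributor_name are only proposed here and do not exist in released Python versions, code that wants to use them would need to feature-detect them. A minimal sketch, where distributor_info is a name invented for this example:

```python
import platform


def distributor_info():
    """Return the distributor id and name, or None where unsupported."""
    # Both functions are proposals; getattr keeps this working on
    # interpreters that do not provide them.
    get_id = getattr(platform, "distributor_id", None)
    get_name = getattr(platform, "distributor_name", None)
    return {
        "id": get_id() if callable(get_id) else None,
        "name": get_name() if callable(get_name) else None,
    }


if __name__ == "__main__":
    print(distributor_info())
```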
This option contains a Python file that will provide settings to customize a couple of things:
- Adding new install schemes
- Adding extra install schemes to the site module initialization
Additionally, sysconfig._get_preferred_schemes should be moved to the vendor
config under the name get_preferred_schemes. This function was added in Python
3.10 with the intent of distributors overriding it to change the default install
scheme.
Using the following config adds a new install scheme that places site packages
in distro-packages, as opposed to site-packages, and configures the site
module to pick it up, putting the distro-packages location in sys.path.
EXTRA_INSTALL_SCHEMES = {
    'my_distro': {
        'stdlib': '{installed_base}/{platlibdir}/python{py_version_short}',
        'platstdlib': '{platbase}/{platlibdir}/python{py_version_short}',
        'purelib': '{base}/lib/python{py_version_short}/distro-packages',
        'platlib': '{platbase}/{platlibdir}/python{py_version_short}/distro-packages',
        'include': '{installed_base}/include/python{py_version_short}{abiflags}',
        'platinclude': '{installed_platbase}/include/python{py_version_short}{abiflags}',
        'scripts': '{base}/bin',
        'data': '{base}',
    },
}
EXTRA_SITE_INSTALL_SCHEMES = [
    'my_distro',
]
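To show what such a scheme resolves to, here is a small sketch that expands the placeholders with plain str.format. The variable values (a /usr prefix, lib64 as platlibdir, Python 3.10, empty abiflags) are example assumptions, and expand_scheme is a hypothetical helper rather than how sysconfig performs the substitution internally.

```python
MY_DISTRO_SCHEME = {
    'stdlib': '{installed_base}/{platlibdir}/python{py_version_short}',
    'platstdlib': '{platbase}/{platlibdir}/python{py_version_short}',
    'purelib': '{base}/lib/python{py_version_short}/distro-packages',
    'platlib': '{platbase}/{platlibdir}/python{py_version_short}/distro-packages',
    'include': '{installed_base}/include/python{py_version_short}{abiflags}',
    'platinclude': '{installed_platbase}/include/python{py_version_short}{abiflags}',
    'scripts': '{base}/bin',
    'data': '{base}',
}

# Example values only; a real interpreter supplies its own.
EXAMPLE_VARS = {
    'base': '/usr',
    'platbase': '/usr',
    'installed_base': '/usr',
    'installed_platbase': '/usr',
    'platlibdir': 'lib64',
    'py_version_short': '3.10',
    'abiflags': '',
}


def expand_scheme(scheme, variables):
    """Substitute the placeholders of every path in an install scheme."""
    return {key: path.format(**variables) for key, path in scheme.items()}


if __name__ == "__main__":
    for key, path in expand_scheme(MY_DISTRO_SCHEME, EXAMPLE_VARS).items():
        print(f"{key}: {path}")
```

With these example values, the purelib and platlib entries are the directories that the site module would be expected to add to sys.path.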
The main considerations for the proposed implementation were:
- What distributors need
- What would be feasible for Python core to maintain
One of the requests from Python distributors was to be able to change the default locations, as they would like to accomplish two things: 1) change the site packages path to other locations, and 2) be able to resolve path conflicts with other locations or installations.
The proposed implementation does not allow this on the basis that it would be
too difficult for Python core to maintain, as it would break assumptions that
modules are already making, and it would similarly break a lot of downstream
code that relies on the same assumptions. The site packages directories from
the default install scheme should have a constant value, independently of the
distribution we are using, and should always be used by the site module.
The goals behind 1) should be implemented by adding an extra install scheme,
adding it to the site module, and setting get_preferred_schemes to make it
the default one.
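A vendor config implementing this could look roughly like the sketch below. The returned mapping mirrors the prefix, home and user keys that sysconfig._get_preferred_schemes returns in Python 3.10. The exact interface the vendor config would expose is an assumption on my part, as is the my_distro scheme name, which must match a scheme defined in EXTRA_INSTALL_SCHEMES.

```python
# Hypothetical vendor config fragment.

EXTRA_SITE_INSTALL_SCHEMES = [
    'my_distro',
]


def get_preferred_schemes():
    """Make the distro scheme the default for prefix installs."""
    return {
        # Installs into the interpreter prefix go to the distro scheme.
        'prefix': 'my_distro',
        # The home and user schemes keep the stock POSIX values.
        'home': 'posix_home',
        'user': 'posix_user',
    }
```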
And 2) can be accomplished by setting --with-distributor-id, which will put all
Python paths in a different namespace, preventing any conflict with other Python
distributions. This is mostly needed due to distributors not respecting
--prefix.
This proposal will require sysconfig to be imported by the site module if
the vendor config wants to add any schemes to site, slowing down the
interpreter initialization.
There are a couple of optimizations that can be done to help with this.
sysconfig can be split into two modules, one with only the core bits needed by
site, and the full-featured one with everything else. sysconfig could also be
rewritten to lazily load expensive attributes as they are needed. On top of
this, the split module that gets imported by site can also be frozen.
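The lazy loading idea can be sketched with functools.cached_property, which defers an expensive computation until first access and then reuses the result. LazyConfig and its loader argument are invented for this illustration and say nothing about how sysconfig would actually be restructured.

```python
import functools
import sysconfig


class LazyConfig:
    """Compute expensive configuration data only when first accessed."""

    def __init__(self, loader):
        self._loader = loader
        self.load_count = 0  # tracks how often the loader really ran

    @functools.cached_property
    def config_vars(self):
        # Runs once; later accesses reuse the cached value.
        self.load_count += 1
        return self._loader()


if __name__ == "__main__":
    config = LazyConfig(sysconfig.get_config_vars)
    print(config.load_count)  # nothing has been loaded yet
    config.config_vars
    config.config_vars
    print(config.load_count)  # the loader ran a single time
```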
An initial implementation of this shows the interpreter initialization time
changing to 1.05x when schemes are added to site by the vendor config.
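For anyone who wants a rough feel for this kind of overhead on their own machine, the sketch below times interpreter startup with and without an explicit sysconfig import. This is not how the 1.05x figure was measured. It only approximates the effect, the helper names are made up, and the result will vary with the Python version and system load.

```python
import subprocess
import sys
import time


def startup_seconds(code, runs=5):
    """Return the best wall-clock time for running code in a new interpreter."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def startup_ratio(runs=5):
    """Ratio of startup time with sysconfig imported to plain startup."""
    baseline = startup_seconds("pass", runs)
    with_sysconfig = startup_seconds("import sysconfig", runs)
    return with_sysconfig / baseline


if __name__ == "__main__":
    print(f"startup ratio: {startup_ratio():.2f}x")
```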
The change in the interpreter initialization time is certainly unfortunate, but I believe the benefits of this proposal outweigh it.
I would like to thank Petr Viktorin and Steve Dower for discussing, reviewing, and proposing changes to this proposal.