Python is used throughout Nixpkgs: we use it for certain scripts, we provide Python libraries, and we provide applications. There are several ways to use Python on Nix, each with its own pros and cons. An overview of all issues with the current Python infrastructure on Nix is available in the placeholder issue 1819.
While many things work really well, there are definitely still issues. This document states how we intend to support Python on Nix, describes our current infrastructure, and contains a proposal for an improved infrastructure that supports the following use cases:
- installing Python applications in a profile. These should expose the program but not the Python modules.
- creating environments for Python development like `virtualenv`, but with the additional possibility of including other non-Python programs.
- temporary Python environments, but also a permanent environment by installing it in a profile.
- Python programs that call other Python programs without mangling the search path for modules. This means, for example, that a Python 2 program can call a Python 3 program without issues.
- namespace packages
- combine any of the above without issues.
Furthermore, we would like to support the following Python tools:
- `virtualenv` for creating virtual environments. While `nix-shell` can do the same and more, many still need `virtualenv`.
- `tox` for testing against multiple environments.
- nuitka, a Python compiler that depends on SCons. The challenge here is that SCons is a Python 2.7 tool while Nuitka can work with any CPython version.
The following are test cases for each of the issues:
TODO
The following issues are supposed to be solved:
- #11423: Have a Python package on `PATH` without adding it to `PYTHONPATH`.
- #16591: `PYTHONPATH` leaks in subprocesses.
- #22688: Do not use `--prefix PYTHONPATH` because it leaks `PYTHONPATH`.
- #23676: Subprocesses do not have modules on their `sys.path`.
- #24128: `wrapPythonPrograms` should not add `(propagated)BuildInputs` build inputs to wrappers.
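Issues #16591 and #22688 come down to ordinary environment inheritance: whatever a wrapper puts in `PYTHONPATH` is passed on to every child process. The sketch below uses only the standard library; a temporary folder stands in for a dependency that a hypothetical wrapper adds with `--prefix PYTHONPATH`, and an unrelated child interpreter then picks that folder up on its `sys.path`.

```python
import os
import subprocess
import sys
import tempfile


def child_sys_path(env):
    """Start a fresh interpreter with `env` and return its sys.path."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def leaks_into_child(extra_dir):
    """Prefix PYTHONPATH like a wrapper would, then inspect an unrelated child."""
    env = dict(os.environ)
    # Roughly what `--prefix PYTHONPATH : extra_dir` does for the wrapped program.
    env["PYTHONPATH"] = os.pathsep.join(
        p for p in (extra_dir, env.get("PYTHONPATH", "")) if p
    )
    # The child never asked for extra_dir, yet it inherits it.
    return extra_dir in child_sys_path(env)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(leaks_into_child(d))  # True
```

If the child were a different Python version, the inherited folder could contain modules meant for the wrong interpreter, which is exactly what breaks a Python 2 program calling a Python 3 program.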
A common distinction is the one between applications and libraries. An application is a standalone program. The application can depend on Python libraries, but any libraries provided by the application (directly or indirectly) should not be shared with or integrated into other applications or environments.
When developing, one is interested in the interpreter, (Python) libraries and possibly some tools that may depend on the exact environment they are used in.
An environment provides all the programs needed. In the case of an application this typically means the only entry point provided by the environment is the application itself, whereas in the case of development environments multiple tools may be available.
Let's clarify each with an example. The e-book suite Calibre is a program that is written in Python. When using Calibre one is not interested in any libraries. One just wants to use the program and thus we call this an application.
The package `numpy` is a library and is used for development. It does provide the program `f2py2`, but this is typically only used in conjunction with the development environment that `numpy` is used in. A similar example is `pytest`; one typically uses `pytest` in the actual development environment.
In some cases this distinction may not be so clear. For example, the Jupyter Notebook is an application that is used for development. It depends on a kernel, which is chosen to match the environment one uses for development, e.g. a Python 2 or 3 kernel. However, since it supports multiple kernels simultaneously, one could split the package, treating the Notebook as an application and the kernels as libraries.
Python code can be distributed in different ways.
The most common format is a Source Distribution, or "sdist". This contains the essential source code along with some metadata for `pip`. Source distributions are typically installed with `setuptools` and can be recognized by their `setup.py` file. They used to be installed with `python setup.py install`, but nowadays they are commonly installed in two steps: first creating a wheel with `python setup.py bdist_wheel`, then installing the wheel.
The wheel isn't just an intermediate step in the building process but is also a popular distribution format. A wheel is a Built Distribution. Wheels are often pure Python but can contain binary code. Wheels are installed with `pip` using `pip install *.whl`. While `setuptools` is most commonly used for building wheels, other tools exist as well; one example is `flit`.
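Whether a wheel is pure Python or contains binary code is visible in its filename, which (per PEP 427) has the form `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. A minimal parser could look like the sketch below; it assumes well-formed filenames and ignores escaping rules, the helper names and example filenames are hypothetical, and real tooling should rather use `packaging.utils.parse_wheel_filename`.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its parts (simplified; assumes a valid name)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelName(*parts)


def is_pure_python(filename: str) -> bool:
    """Pure wheels target no specific ABI and run on any platform."""
    wheel = parse_wheel_filename(filename)
    return wheel.abi_tag == "none" and wheel.platform_tag == "any"


if __name__ == "__main__":
    for whl in (
        "example-1.0-py3-none-any.whl",                       # pure Python
        "example-1.0-cp311-cp311-manylinux_2_17_x86_64.whl",  # binary code
    ):
        print(whl, is_pure_python(whl))
```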
In some cases installation is done entirely differently, e.g. with the help of a `Makefile`.
Libraries can sometimes also provide Python bindings.
Finally, when developing one might want to use an editable or development mode installation with `pip install -e`.
Python modules are installed in `lib/pythonX.X/site-packages/<pname>`. Installed right next to it is the `dist-info` folder, `lib/pythonX.X/site-packages/<pname>-<version>.dist-info`. This folder is needed for `pip`/`setuptools` to determine which packages have been installed.
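This lookup works by scanning `site-packages` for `*.dist-info` folders, not for the modules themselves. The sketch below lays out a hypothetical package `example` in a temporary `lib/python3.11/site-packages` (the version folder is arbitrary) and lists it with the standard library's `importlib.metadata.distributions`. Only a minimal `METADATA` file is written, which is enough for this lookup but is not a complete installation.

```python
import tempfile
from importlib import metadata
from pathlib import Path


def fake_site_packages(root, pname, version):
    """Lay out lib/pythonX.X/site-packages/<pname> plus its dist-info folder."""
    site = root / "lib" / "python3.11" / "site-packages"  # X.X chosen arbitrarily
    (site / pname).mkdir(parents=True)
    (site / pname / "__init__.py").write_text("")
    info = site / f"{pname}-{version}.dist-info"
    info.mkdir()
    # A real dist-info also contains RECORD, WHEEL, etc.; METADATA suffices here.
    (info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {pname}\nVersion: {version}\n"
    )
    return site


def installed_distributions(site):
    """List (name, version) pairs found via *.dist-info folders in `site`."""
    return sorted(
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions(path=[str(site)])
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        site = fake_site_packages(Path(tmp), "example", "1.0")
        print(installed_distributions(site))  # [('example', '1.0')]
```

Deleting the `dist-info` folder would make `example` invisible to this lookup, even though the module folder itself would still be importable from that `site-packages`.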
The exact Python import logic is quite extensive. What follows is a very brief summary:
- Python modules can be imported from folders that are on `sys.path`.
- During startup, the interpreter (through the `site` module) looks for a `sitecustomize.py` module on `sys.path` and imports it. This file can be used to add additional `site-packages` folders to `sys.path`.
- The environment variable `PYTHONPATH`, a list of folders, is also read at startup. These folders are placed on `sys.path` ahead of the standard library and `site-packages` folders.
- Entries that are added directly to `sys.path` are not recursed into. One can instead use `site.addsitedir` to add folders to `sys.path`. `site.addsitedir` does recurse, e.g. by following `.pth` files found in the added folder. `.pth` files list further folders that are added to `sys.path`.
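The difference between appending to `sys.path` and calling `site.addsitedir` can be checked directly. The sketch below creates a temporary site folder containing a `.pth` file that names a second folder, then checks whether that second folder ends up on `sys.path` after each approach. It restores `sys.path` afterwards; the folder and file names are arbitrary.

```python
import os
import site
import sys
import tempfile


def _on_sys_path(path):
    """Compare paths roughly the way site.py does (absolute, case-normalized)."""
    target = os.path.normcase(os.path.abspath(path))
    return any(os.path.normcase(os.path.abspath(p)) == target for p in sys.path if p)


def compare_append_and_addsitedir():
    """Return (found_via_append, found_via_addsitedir) for a .pth-listed folder."""
    saved = list(sys.path)
    with tempfile.TemporaryDirectory() as tmp:
        sitedir = os.path.join(tmp, "site-packages")
        extra = os.path.join(tmp, "extra")
        os.makedirs(sitedir)
        os.makedirs(extra)
        # A .pth file in the site folder naming another folder, one per line.
        with open(os.path.join(sitedir, "extra.pth"), "w") as f:
            f.write(extra + "\n")
        try:
            sys.path.append(sitedir)   # plain entry: extra.pth is ignored
            via_append = _on_sys_path(extra)
            sys.path[:] = saved
            site.addsitedir(sitedir)   # processes extra.pth
            via_addsitedir = _on_sys_path(extra)
        finally:
            sys.path[:] = saved
    return via_append, via_addsitedir


if __name__ == "__main__":
    print(compare_append_and_addsitedir())  # (False, True)
```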
The first entry in `sys.path` is special: it is the directory containing the script that was used to invoke the interpreter.
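A quick way to confirm this is to run a small script from a temporary folder in a child interpreter and compare its `sys.path[0]` with that folder. The sketch assumes CPython 3 and a hypothetical script name `show_path0.py`. It clears `PYTHONSAFEPATH` (Python 3.11+), which would otherwise suppress this entry, and compares real paths because the reported folder may have symlinks resolved.

```python
import os
import subprocess
import sys
import tempfile


def first_sys_path_entry(script_dir):
    """Write a tiny script into script_dir, run it, and return its sys.path[0]."""
    script = os.path.join(script_dir, "show_path0.py")
    with open(script, "w") as f:
        f.write("import sys\nprint(sys.path[0])\n")
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # 3.11+: if set, the script folder is not added
    result = subprocess.run(
        [sys.executable, script], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        same = os.path.realpath(first_sys_path_entry(d)) == os.path.realpath(d)
        print(same)  # True
```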
Another environment variable of interest is `PYTHONHOME`. This environment variable can be used to change the location of the standard Python libraries. By default, the libraries are searched in `prefix/lib/pythonversion` and `exec_prefix/lib/pythonversion`, where `prefix` and `exec_prefix` are installation-dependent directories, both defaulting to `/usr/local`.
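The locations an interpreter actually uses can be inspected from Python itself. This sketch collects the prefixes and the standard-library folder reported by `sysconfig`, together with `PYTHONHOME` if it is set. Inside a virtual environment `sys.prefix` points at the environment, while `sys.base_prefix` still points at the interpreter installation.

```python
import os
import sys
import sysconfig


def stdlib_locations():
    """Collect the prefixes and stdlib folder this interpreter is using."""
    return {
        "prefix": sys.prefix,
        "exec_prefix": sys.exec_prefix,
        "base_prefix": sys.base_prefix,  # differs from prefix inside a venv
        "stdlib": sysconfig.get_path("stdlib"),
        "PYTHONHOME": os.environ.get("PYTHONHOME"),
    }


if __name__ == "__main__":
    for key, value in stdlib_locations().items():
        print(f"{key:12} {value}")
```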
The `sys.argv` attribute represents the list of arguments passed to a Python program. The first value, `argv[0]`, is the script name. Whether this is a full path is OS-dependent; on Linux and Darwin it is the path exactly as passed to the interpreter, which is a full path when the script is started through its shebang after a `PATH` lookup. If the command was executed using the `-c` command line option to the interpreter, `argv[0]` is set to the string `'-c'`. If no script name was passed to the Python interpreter, `argv[0]` is an empty string.
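The three cases can be reproduced with child interpreters. The sketch assumes CPython 3 and writes a hypothetical helper script `show_argv0.py` to a temporary folder. The script is started with its full path, so `argv[0]` comes back as that full path.

```python
import os
import subprocess
import sys
import tempfile

PRINT_ARGV0 = "import sys; print(sys.argv[0])"


def child_argv0(args, stdin=None):
    """Run this interpreter with `args` and return the child's sys.argv[0]."""
    result = subprocess.run(
        [sys.executable, *args],
        input=stdin,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


def argv0_examples(script_dir):
    """Collect argv[0] for the three invocation styles described above."""
    script = os.path.join(script_dir, "show_argv0.py")
    with open(script, "w") as f:
        f.write(PRINT_ARGV0 + "\n")
    return {
        "script": child_argv0([script]),                     # path as passed
        "-c": child_argv0(["-c", PRINT_ARGV0]),              # the string '-c'
        "stdin": child_argv0([], stdin=PRINT_ARGV0 + "\n"),  # empty string
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for style, value in argv0_examples(d).items():
            print(f"{style:7} {value!r}")
```

A program that wants to call itself can combine `sys.executable` with `sys.argv[0]`. That only works if `argv[0]` still names the user-facing program, which is why a wrapper that moves the original script has to fix it.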
The name and full path to the program are of interest because programs might want to call themselves.
Python applications are spread out throughout the Nixpkgs tree, following the general guidelines.
The file `pkgs/top-level/python-packages.nix` contains or refers to all Python library expressions, and these packages can be accessed through `pkgs.pythonXX.pkgs.<name>`. Typically one creates an environment with `pythonXX.withPackages` or `pythonXX.buildEnv`.
The main function for packaging Python packages is `buildPythonPackage`. Furthermore, `buildPythonApplication` exists for applications. The only difference is that `buildPythonPackage` modifies the name to include the interpreter version.
An important argument is `format`, which is used to choose between `setuptools` (sdist), `flit`, `wheel` and `other`. The most common format is `setuptools`. Wheels are also increasingly used in Nixpkgs. The `other` format is used when none of the others apply; in this case the packager needs to provide a `buildPhase` and an `installPhase`.
The goal of `buildPythonPackage` (and `buildPythonApplication`) is to guarantee that applications work and modules can be found.
The Python interpreter provides a setup hook that recurses into the `propagatedBuildInputs` and adds the `site-packages` folder of each to the environment variable `PYTHONPATH`. This allows the package that is being built to find its dependencies. The hook is also run by `nix-shell`. While that makes sense when building or debugging the build, it is also abused for creating temporary environments with `nix-shell -p python3.numpy python3.pytest`.
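The traversal performed by the setup hook can be modelled in a few lines. This is not the hook itself (that is shell code in Nixpkgs) but a hypothetical sketch: `Package` stands in for a derivation, `python_path` computes the resulting `PYTHONPATH`, the store paths are made up, the `python3.11` folder is arbitrary, and skipping already-visited packages is a choice of this sketch.

```python
import os
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    """Hypothetical stand-in for a derivation: a store path plus its
    propagatedBuildInputs."""
    out: str
    propagated: List["Package"] = field(default_factory=list)


def python_path(build_inputs, site_packages="lib/python3.11/site-packages"):
    """Model of the setup hook: collect the site-packages folder of every
    input and, recursively, of everything it propagates."""
    seen = set()
    folders = []

    def visit(pkg):
        if pkg.out in seen:  # this sketch adds each package only once
            return
        seen.add(pkg.out)
        folders.append(os.path.join(pkg.out, site_packages))
        for dep in pkg.propagated:
            visit(dep)

    for pkg in build_inputs:
        visit(pkg)
    return os.pathsep.join(folders)


if __name__ == "__main__":
    idna = Package("/nix/store/bbbb-idna")
    requests = Package("/nix/store/aaaa-requests", [idna])
    app = Package("/nix/store/cccc-app", [requests, idna])
    print(python_path([app]))
```

Every folder collected this way ends up in the environment, which is what makes the `nix-shell -p` usage above work, and also what lets it leak into subprocesses.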
The `wrapPythonPrograms` shell function wraps all executables in a derivation and does two things:
- it uses `site.addsitedir` to update `sys.path` with dependencies. It recursively traverses `propagatedBuildInputs` and `pythonPath`.
- it fixes the name, `sys.argv[0]`, of the script. This has to be done because the wrapper moves the original script.
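Both steps can be modelled in Python. The sketch below is hypothetical: `make_prologue` builds source code that a wrapper could place at the top of the moved script. It resets `sys.argv[0]` to the user-facing program path and extends `sys.path` in-process through `site.addsitedir` rather than through `PYTHONPATH`, so nothing is left in the environment for subprocesses to inherit. The function name and paths are illustrative; the code Nixpkgs actually generates may differ.

```python
def make_prologue(program_path, site_dirs):
    """Return Python source for a hypothetical wrapper prologue.

    The prologue (1) restores sys.argv[0] to the user-facing program path and
    (2) extends sys.path in-process with site.addsitedir.
    """
    lines = [
        "import site",
        "import sys",
        # The wrapper moved the original script, so restore the visible name.
        f"sys.argv[0] = {program_path!r}",
    ]
    for folder in site_dirs:
        # Unlike sys.path.append, addsitedir also processes .pth files.
        lines.append(f"site.addsitedir({folder!r})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_prologue(
        "/nix/store/hypothetical-prog/bin/prog",
        ["/nix/store/hypothetical-dep/lib/python3.11/site-packages"],
    ))
```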
The `buildPythonPackage` function patches the shebangs of all scripts provided. That way, the scripts can find the correct Python interpreter. It also executes `wrapPythonPrograms`. Python applications that are installed can now find their dependencies and will function.
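Shebang patching amounts to rewriting the first line of each script. The sketch below is a simplified, hypothetical model of that step, not the Nixpkgs implementation: it replaces any `#!` line mentioning `python` with a shebang pointing at a given interpreter, drops interpreter arguments, and leaves other files alone. The interpreter path in the example is made up.

```python
def patch_shebang(source: str, interpreter: str) -> str:
    """Point a Python script's shebang at `interpreter` (simplified model).

    Only the first line is touched; interpreter arguments such as `-u` are
    dropped, which a real implementation would need to preserve.
    """
    first, newline, rest = source.partition("\n")
    if not first.startswith("#!") or "python" not in first:
        return source  # not a Python script: leave it untouched
    return f"#!{interpreter}{newline}{rest}"


if __name__ == "__main__":
    script = "#!/usr/bin/env python3\nprint('hello')\n"
    print(patch_shebang(script, "/nix/store/hypothetical-python3/bin/python3"))
```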
The `python.buildEnv` function creates an environment that consists of symbolic links to all files that are provided by the packages that are to be included in the environment. The shebangs of the scripts have already been patched by `buildPythonPackage` to point to the correct Python interpreter. However, that store entry contains just the interpreter, and not the other Python packages that are to be included. Therefore, `python.buildEnv` not only creates symbolic links but also wraps each script with a wrapper that sets `PYTHONHOME` to the interpreter in the newly created environment. This environment can now be installed. The `python.withPackages` function provides a simpler interface to `python.buildEnv`.
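The wrapper's job can be modelled as computing a modified environment and then replacing itself with the real script. The sketch below is hypothetical (function names and the store path are made up); it only shows that `PYTHONHOME` is set to the environment's root, so that the interpreter looks for its libraries, including `site-packages`, inside the symlinked environment.

```python
import os


def wrapper_environment(env_root, base=None):
    """Environment a hypothetical buildEnv wrapper would run its script with."""
    environ = dict(os.environ if base is None else base)
    # Make the interpreter treat the symlinked environment as its home, so
    # libraries (including site-packages) are looked up inside env_root.
    environ["PYTHONHOME"] = env_root
    return environ


def exec_wrapped(script, args, env_root):
    """Replace the current process with `script`, as a wrapper script would."""
    os.execve(script, [script, *args], wrapper_environment(env_root))
```

Because `PYTHONHOME` is an ordinary environment variable, any Python interpreter started from such a wrapped script inherits it as well.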
- add a `sitecustomize.py` to the interpreter that listens to `NIX_PYTHONPATH` and/or `NIX_PYTHON_PTH`, with the latter referring to a `.pth` file.
- when installing individual libraries in a profile, the interpreter cannot find them. It was suggested to use `propagatedUserEnvPkgs`.
- use `exec -a name program` to set the name of `program`. Python, however, does not support `exec -a`.
- wrap the interpreter and set the name of the program through `sitecustomize.py`. The attribute `sys.argv` is unavailable at that point.
- patch the interpreter to listen to an environment variable, `NIX_PYTHON_NAME`, that defines the name. https://github.com/python/cpython/blob/3.5/Python/sysmodule.c#L2050
- use `--set PYTHONPATH`. This breaks a Python feature. See the discussion.
- add a `sitecustomize.py` to the interpreter that listens to `NIX_PYTHONPATH` and/or `NIX_PYTHON_PTH`, with the latter referring to a `.pth` file.
- patch the interpreter to listen to an environment variable, `NIX_PYTHON_NAME`, that defines the name. https://github.com/python/cpython/blob/3.5/Python/sysmodule.c#L2050
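The `sitecustomize.py` proposal can be sketched as follows. This is a minimal, hypothetical sketch: it assumes `NIX_PYTHONPATH` holds `os.pathsep`-separated folders and that `NIX_PYTHON_PTH` names a file with one folder per line. It also removes both variables from the environment after reading them, an assumption made here so that they do not leak into subprocesses the way `PYTHONPATH` does.

```python
import os
import site


def apply_nix_pythonpath(environ=os.environ):
    """Add folders named by NIX_PYTHONPATH / NIX_PYTHON_PTH via site.addsitedir.

    Hypothetical sketch of the proposed sitecustomize.py. Both variables are
    removed from `environ` once read, so child processes do not inherit them.
    """
    folders = [p for p in environ.pop("NIX_PYTHONPATH", "").split(os.pathsep) if p]
    pth_file = environ.pop("NIX_PYTHON_PTH", "")
    if pth_file:
        with open(pth_file) as f:
            # Assumed format: one folder per line, '#' starts a comment line.
            folders += [
                line.strip() for line in f
                if line.strip() and not line.lstrip().startswith("#")
            ]
    for folder in folders:
        # addsitedir also processes .pth files inside each folder.
        site.addsitedir(folder)
    return folders


# sitecustomize.py is imported once at interpreter startup, so run immediately.
apply_nix_pythonpath()
```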
IMHO, the distinction between an application and a library is not something that can generally be decided at packaging time. (You mention Jupyter Notebook as an example.) Each user might have a different opinion on where to draw the line.
What are the problems with exposing modules/libraries in profiles? I can think of a few, but they are not unsolvable. (You can cut yourself with a knife, but that doesn't mean we should stop using knives altogether.)
I think this is similar to the "plugin" problem in NixOS, which I think is not that big of a problem after all, because I don't think it hurts that much to allow programs to look in `$NIX_PROFILES`. The funny thing is that NixOS is quite inconsistent in that regard today: Python does not look in `$NIX_PROFILES`, but Perl and a few others do.