As a user of scientific computing software, I find that Python packages always trip over binary extensions that need to cooperate with binaries on the target system. Packaging these as wheels becomes impossible, because there is currently no way for a user to specify that they want a wheel, but specifically the wheel that is compatible with their local library.
Take `mpi4py`, the most popular parallelization scheme in my field. It needs to know which MPI implementation is used: MPICH, OpenMPI, ...? What `mpi4py` does is provide a source distribution that detects the local library at install time, and errors out if it can't. So the user is tasked with installing the binary dependencies of a Python package before they can install the Python package. Python users have, by design, zero knowledge of building binary software; we `pip install` or riot ┬─┬ノ(ಠ_ಠノ).
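For concreteness, the workflow the source distribution forces on the user looks roughly like this (the package manager command and names are examples and vary per system):

```
# First obtain an MPI implementation and its headers yourself, e.g.:
$ sudo apt install libopenmpi-dev        # or: brew install open-mpi, module load openmpi, ...
# Only then can the Python package be built against it:
$ pip install --no-binary=mpi4py mpi4py  # compiles against whatever mpicc it finds
```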
To include the binary dependency, the maintainer of the package should be able to offer the installing user several choices:
- Do you want me to install a bundled MPI implementation for you?
- Or, would you like me to detect a local library you already have, and build against that?
User's response:
- I don't have MPI and don't know how to build it; please give me the wheel bundled with OpenMPI: `pip install mpi4py{mpi:openmpi}`
- I already have MPI, and am aware (or was made aware by `mpi4py`'s docs) that it needs to cooperate with my local library: `pip install mpi4py{mpi:local}` (the closest spelling available today is sketched below)
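Today the `{mpi:local}` case can only be approximated with build-time environment variables. A rough sketch using `MPICC`, which `mpi4py`'s install docs describe as the hook for pointing the build at a specific compiler wrapper (the path here is just an example):

```
$ env MPICC=/opt/mpich/bin/mpicc pip install --no-binary=mpi4py mpi4py
```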
Same story for `h5py`: `pip install h5py{hdf5:local}` / `pip install h5py{hdf5:bundled}` / `pip install h5py{hdf5:bundled, parallel:true}`.
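For comparison, `h5py`'s current equivalents again go through environment variables (`HDF5_DIR` and `HDF5_MPI` are documented by h5py; the paths are examples, and note there is no bundled-and-parallel wheel today, so the parallel case must build against a local HDF5):

```
$ pip install h5py                                        # ~ {hdf5:bundled}: the wheel ships its own HDF5
$ HDF5_DIR=/opt/hdf5 pip install --no-binary=h5py h5py    # ~ {hdf5:local}
$ CC=mpicc HDF5_MPI=ON HDF5_DIR=/opt/hdf5 \
    pip install --no-binary=h5py h5py                     # ~ parallel build against local HDF5
```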
Same for any neuroscience simulator that requires MPI, GPU, ...: `pip install arbor{mpi:local, gpu:nvidia}`, `pip install NEURON{mpi:local}`, `pip install nest{mpi:local}`.
I recall reading a PEP, or discussions about one, w.r.t. CUDA. How do you distribute versions of wheels for different CUDA archs? Or CPU types? E.g. we can't upload Arbor wheels with vectorization (AVX, AVX2, AVX512, etc.). Might want to look that discussion up first; I would be interested to hear whether there is any movement.
One point of view is that system admins are in charge of providing 'fundamental' packages/software, and therefore also of providing wheels. MPI does not even depend only on distro and version, but also on the build, and some SC centers provide custom versions of whatever MPI package they use (to map better to the network arch, I presume); you're never going to be able to provide wheels for those unless you work at that SC center. What is the use of having a version of a wheel on PyPI if it can only be used on one particular machine? In an ideal world, `pip` would not be equal to PyPI (because it isn't!), but on the other hand, system admins are never going to be able to track their users at the speed those users want to move. Also, wheels and `pip` have the problem of being Python specific. Spack is probably a better choice there.

Another way is to provide 'fat wheels': bundle a matrix of versions of deps/builds and load the right version at runtime. This is already possible (I do something like that for a personal Python project; a sketch follows below). It's not the prettiest or cleanest (or lightest in terms of user diskspace), but it gets the job done. Again I would ask: is a wheel on PyPI really the right place?
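To make the fat-wheel idea concrete, here is a minimal sketch of the runtime-selection half; it also touches the AVX question from above. Everything in it is hypothetical (the `mypkg._variants` layout, the module names), and the CPU-flag probe is Linux-only:

```python
# Hypothetical __init__.py of a fat wheel that ships three builds of the same
# extension (generic, AVX2, AVX-512) and picks one when the package is imported.
import importlib

def _cpu_flags():
    """Return the CPU feature flags on Linux, or an empty set elsewhere."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def _pick_variant():
    flags = _cpu_flags()
    if "avx512f" in flags:
        return "mypkg._variants.ext_avx512"
    if "avx2" in flags:
        return "mypkg._variants.ext_avx2"
    return "mypkg._variants.ext_generic"

# Import the selected build and re-export its public names, so users just see
# `mypkg` regardless of which binary actually got loaded.
_ext = importlib.import_module(_pick_variant())
globals().update({k: v for k, v in vars(_ext).items() if not k.startswith("_")})
```

The same dispatch could key on an MPI vendor probe instead of CPU flags; the cost is that every variant ships inside the wheel, which is exactly the diskspace complaint above.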