@Micket
Last active January 15, 2019 17:19

Background

Compiling Python together with the SciPy bundle at the toolchain level has disadvantages:

  1. Unnecessarily duplicated Python interpreters
  2. Any module with a direct or indirect runtime dependency on Python is forced to be at the toolchain level as well. This primarily

Having the runtime dependency libpython at GCCcore would significantly reduce the number of modules, and ensure that all GCCcore-based toolchains automatically have access to a wider base of modules.

Some affected configs that are currently at the toolchain level (due to Python) are:

  • Mesa, Qt, PyQt, Tkinter, GTK, PyGTK, GObject-Introspection, PyGObject, PyCairo, PyOpenGL, wxPython, CGAL, GDAL,
  • Meson, pkgconfig, SCons, SWIG, Mako, ANTLR, wheel,
  • Boost, PyYAML, lxml, Pillow, PostgreSQL, nd2reader, PIMS, xarray, configparser, future, cftime, Greenlet, sympy

(The above list is just from browsing modules from past toolchains; some may not be applicable for moving to GCCcore, nor is the list exhaustive.)

Limitations

Some discussions have been going on about how we handle PYTHONPATH in EB. That is a completely separate issue, and any outcome of it can be applied here regardless.

The list of packages in the Python easyconfigs that can't/shouldn't move to GCCcore are:

  • numpy
  • scipy
  • pandas
  • mpi4py
  • deap (unsure about this one)

Anything that in turn depends on these specific packages would stay at the toolchain level.
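The rule above is transitive, which can be sketched as a small graph walk. This is only an illustration: the dependency map below is a hypothetical example and is not taken from actual easyconfigs.

```python
# Packages the text lists as having to stay at the toolchain level.
TOOLCHAIN_ONLY = {"numpy", "scipy", "pandas", "mpi4py", "deap"}


def must_stay_at_toolchain(pkg, deps, _seen=None):
    """Return True if pkg is, or transitively depends on, a toolchain-only package.

    deps maps a package name to the list of its direct runtime dependencies.
    """
    seen = set() if _seen is None else _seen
    if pkg in TOOLCHAIN_ONLY:
        return True
    if pkg in seen:  # already visited on this walk; also guards against cycles
        return False
    seen.add(pkg)
    return any(must_stay_at_toolchain(d, deps, seen) for d in deps.get(pkg, []))


# Hypothetical dependency map, for illustration only.
example_deps = {
    "xarray": ["numpy", "pandas"],
    "PyYAML": ["Python"],
    "lxml": ["Python"],
    "PIMS": ["xarray"],
}

# Packages from the example map that could move to GCCcore.
movable = sorted(p for p in example_deps if not must_stay_at_toolchain(p, example_deps))
print(movable)
```

With a real dependency map extracted from the easyconfigs repository, the same walk would give a first estimate of how many configs could actually move.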

Suggestion 1: Just split existing package bundle

Keeping SciPy and friends separate would let the current Python config, and its related dependents, be moved to GCCcore.

  • Positive: Change in EasyBuild is trivial (just change a few easyconfigs).
  • Positive: No need for new names (like bare, base, core suffixes). Just Python-x.x.x-GCCcore-x.x.x.eb.
  • Positive/Negative: A separate SciPy module lets users search for it among the modules (no need to explain that it's part of Python). Existing users have to be told that, from now on, it is no longer part of Python.
  • Negative: The exp function may be slower in the GCCcore version compared to an Intel version. It is uncertain whether this has an impact on any real code, as it doesn't affect vectorized operations in NumPy.

The scipy, numpy, pandas, mpi4py, deap packages can either be partially packaged as a bundle "SciPy", or as individual packages. There are (old) individual packages for pandas and numpy, so it might be more consistent to just keep them individual? Individual packages are also easier to debug than bundles when something goes wrong in the install.
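As a rough sketch of the bundle variant, an easyconfig along the following lines might be used. Everything here is an assumption made for illustration: the use of the PythonBundle easyblock, the toolchain name, the module class, and all version strings (left as x.x.x placeholders) are not taken from an existing, tested easyconfig.

```python
# Hypothetical easyconfig sketch for a toolchain-level "SciPy" bundle.
# All versions are placeholders; nothing here has been built or tested.
easyblock = 'PythonBundle'  # assumption: an easyblock for bundling Python packages

name = 'SciPy'
version = 'x.x.x'

homepage = 'https://www.scipy.org'
description = "Bundle of numerical Python packages that need the full toolchain"

# Toolchain level, so the packages can use the toolchain's BLAS/LAPACK and MPI.
toolchain = {'name': 'foss', 'version': 'x.x.x'}

# The interpreter would come from the Python module built at GCCcore.
dependencies = [('Python', 'x.x.x')]

# The packages the text lists as unable to move to GCCcore.
exts_list = [
    ('numpy', 'x.x.x'),
    ('scipy', 'x.x.x'),
    ('pandas', 'x.x.x'),
    ('mpi4py', 'x.x.x'),
    ('deap', 'x.x.x'),  # the text is unsure whether this one belongs here
]

moduleclass = 'lang'
```

The individual-package variant would instead be one small easyconfig per entry in exts_list, each depending on the same GCCcore-level Python.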

Suggestion 2: Introduce a Python-core config

This is exemplified in: easybuilders/easybuild-easyconfigs#6537

Similar to suggestion 1, but by introducing a new name, Python-core, we can keep the toolchain-level Python.

  • Negative: Requires fixes in the easybuild framework to recognize Python-core as a Python package.
  • Positive: Modules look basically the same for users.
  • Negative: Have to keep explaining to new users that numpy is part of Python.
  • Negative: same exp function issue as with suggestion 1.
  • Negative: Requires slightly uglier easyconfigs
  • Negative: Issues with configs/code that use EBROOTPYTHON
  • Negative: Issues with configs that use get_software_root('Python')
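The last two points come down to lookups keyed on the software name "Python". The sketch below illustrates the problem with a simplified stand-in for the framework's naming logic; the character mapping and the install path are assumptions for illustration and this is not the framework's actual code.

```python
def root_env_var(name):
    """Approximate the name of the EBROOT* variable a module would set.

    Simplified stand-in (assumed mapping: '+' becomes 'plus', '-' becomes
    'min', '.' is dropped, then the result is uppercased).
    """
    for old, new in (("+", "plus"), ("-", "min"), (".", "")):
        name = name.replace(old, new)
    return "EBROOT" + name.upper()


def find_python_root(env):
    """Mimic a lookup that only knows about the software name 'Python'."""
    return env.get(root_env_var("Python"))


# Environment after loading only a hypothetical Python-core module.
env = {root_env_var("Python-core"): "/apps/Python-core/x.x.x"}

print(root_env_var("Python-core"))
print(find_python_root(env))  # None: the lookup by name 'Python' finds nothing
```

Any easyblock or easyconfig that asks for the root of "Python" would therefore need to be taught about the second name, which is the framework change mentioned in the first bullet.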

Suggestion 3: Shadowing libpython

Suggested by Jack Perdue: build Python (base) at GCCcore, as well as building Python at the toolchain level.

  • Negative: Shadowing may introduce ABI problems
  • Negative: Leaves modules broken, requiring users to manually load a Python version before the modules can be used.
  • Negative: Requires building more Pythons.
  • Positive/Negative: Python gets built with icc (probably avoids the exp-function issue?)

Suggestion 4: Using MKL with GCCcore without toolchain

This approach, as done locally by Damian(?), makes very significant changes to the whole toolchain level.

  • Positive: Has other great benefits, but will only save a few extra modules.
  • Negative: Requires changes that are very unlikely to make it into the next toolchains.
  • Positive/Negative: Numpy is just compiled with GCC, shared between all toolchains.

Suggestion 5: Only as a build dep

Ref: easybuilders/easybuild-easyconfigs#5072

  • Negative: Doesn't tackle the runtime dependencies, so it won't have much impact?

Suggestion 6: Use a hidden python module and shadow it

Ref: easybuilders/easybuild-easyconfigs#4962

  • Negative: ABI compatibility to consider
  • Negative: Leaves modules broken; requires users to pick a Python to load before dependents (like PyGTK) would work.

Known ABI difference between intel and gcc compiled versions:

objdump -t libpython2.7.so.1.0 | grep _PyUnicode
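To compare two builds rather than inspect one, the output of the command above can be diffed. The following is a minimal sketch, assuming objdump is on the PATH and Python 3.7 or newer; the symbol names in the sample strings are invented for illustration and are not real libpython symbols.

```python
import subprocess


def unicode_symbols(objdump_output, pattern="_PyUnicode"):
    """Collect symbol names containing pattern from `objdump -t` text output."""
    symbols = set()
    for line in objdump_output.splitlines():
        fields = line.split()
        # In objdump's symbol table listing the symbol name is the last field.
        if fields and pattern in fields[-1]:
            symbols.add(fields[-1])
    return symbols


def dump_symbols(library_path):
    """Run `objdump -t` on a shared library and return its stdout as text."""
    result = subprocess.run(
        ["objdump", "-t", library_path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def abi_difference(output_a, output_b):
    """Return (symbols only in a, symbols only in b), each sorted."""
    a, b = unicode_symbols(output_a), unicode_symbols(output_b)
    return sorted(a - b), sorted(b - a)


# Invented sample output standing in for two different builds.
sample_a = (
    "0000000000001000 g     F .text  0000000000000010 _PyUnicode_ExampleA\n"
    "0000000000002000 g     F .text  0000000000000010 _PyUnicode_Shared\n"
)
sample_b = (
    "0000000000001000 g     F .text  0000000000000010 _PyUnicode_ExampleB\n"
    "0000000000002000 g     F .text  0000000000000010 _PyUnicode_Shared\n"
)
only_a, only_b = abi_difference(sample_a, sample_b)
print(only_a, only_b)
```

On real libraries, `abi_difference(dump_symbols(path_gcc), dump_symbols(path_intel))` would list the symbols that exist in only one of the two builds, where both paths are placeholders for the respective libpython files.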

Conclusion:

I (Mikael Öhman @ C3SE) prefer suggestion 1. All in all, I don't think the changes for users are going to be that problematic. I don't think the exp function issue is likely to affect much real code. Suggestion 2 was tried in production and, while doable, is a hassle with no benefit.

exp-function performance with GCCcore is tied to glibc, so the issue should already be present in all foss and fosscuda toolchains.
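Whether the exp difference is measurable for scalar Python code can be checked with a small timing script run under each interpreter build. This is a rough sketch using only the standard library; the argument value and repeat counts are arbitrary choices, and interpreter call overhead may well dominate the result.

```python
import math
import timeit


def time_exp(repeats=5, number=200_000):
    """Return the best wall time in seconds for `number` calls to math.exp."""
    timer = timeit.Timer("exp(1.2345)", globals={"exp": math.exp})
    return min(timer.repeat(repeat=repeats, number=number))


if __name__ == "__main__":
    best = time_exp()
    print(f"best of 5 runs: {best:.4f} s for 200000 calls to math.exp")
```

Comparing the number printed by a GCCcore-built interpreter with that of an Intel-built one gives a first indication of whether the issue matters outside of vectorized NumPy code.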
