@effigies
Last active January 25, 2024 13:42
Contemporary Python Packaging - 2020

Contemporary Python Packaging

This document lays out a set of Python packaging practices. I don't claim they are best practices, but they fit my needs, and might fit yours.

Validity

This document has been superseded as of January 2023.

This was written in July 2020, superseding this gist from 2019.

As of this writing, Python 3.5 is approaching its end-of-life and many packages have already set a minimum version of Python 3.6. This document should be superseded or disregarded no later than the Python 3.7 end-of-life. If you cite this as a justification for your behavior, please stop doing so at that time.

Summary

  • For versioning, use versioneer
  • For describing your build requirements, use pyproject.toml
  • For all static (and some dynamic-ish) metadata, use setup.cfg
  • A very small setup.py for dynamic metadata and to tie everything together
  • Use MANIFEST.in for files that cannot otherwise be included in the sdist

Changes from previous revision

  • Updates to how to include data files in packages.
    • Reduce scope of MANIFEST.in
    • Do not use include_package_data = True in setup.cfg
  • Dropped tests_require from setup.cfg - python setup.py test was deprecated in setuptools 41.5
  • Make pyproject.toml more clearly optional; some use cases are made harder by it.

Likely future changes

  • Versioneer is starting to show its age, and has not accepted a change since 2017. There are alternatives that are worth considering, but I haven't evaluated them yet.

Perspective

My position in the Python ecosystem will color my perspective and approach to packaging, and, presumably, how much weight you give what I think.

I am most active with the NIPY collection of projects, generally related to neuroimaging and neuroscience. I am currently the lead maintainer of NiBabel and do a reasonable amount of maintenance work for Nipype, PyBIDS, fMRIPrep and a few other packages closely related to the aforementioned.

I am not an active developer in CPython, PyPA or any packaging-related tools. I have not followed deep arguments, but have relied mostly on PEPs, documentation and sporadic searches to identify the current state of the art.

So my perspective is less concerned with what packaging should become, and more with what works today and where things appear to be heading, as I look to prepare or update packaging infrastructure for several tools. Infrastructure I hope to stop thinking about so much.

Desiderata

Motivating my recommendations are a few desiderata, in rough order of importance:

  1. Installation should work, from source, on fairly old systems. Debian Stable (10; "buster") is my touchstone here.
  2. Prefer declarative syntax, and limit dynamic metadata, as much as possible.
  3. Enable revision-based versions, with minimal opportunity for error.
  4. Limit custom code to an absolute minimum. (Partially redundant with limiting dynamic metadata.)

To operationalize (1), the following approaches should all install correctly:

  • pip install .
  • python setup.py sdist && pip install dist/*.tar.gz
  • python setup.py bdist_wheel && pip install dist/*.whl
  • python setup.py install

And development/editable mode should work:

  • python setup.py develop
  • pip install -e .

This also means that newer, better build systems that do not rely on setuptools are not really under consideration here.

To operationalize (3), all of the above should produce an install with the same version string, and setting the version should be done from a version control tag if possible. Assuming a git repository, the following should also work:

  • git archive -o archive.tar.gz $TAG && pip install archive.tar.gz
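
A rough way to confirm that each install method above yields the same version string is to query the installed metadata after every install. This is only a sketch: the helper names `installed_version` and `version_matches` are hypothetical, `package` stands in for your distribution name, and it assumes Python 3.8+ for `importlib.metadata` (the `importlib_metadata` backport offers the same API on older interpreters).

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_matches(dist_name, expected):
    """True if the distribution is installed with exactly the expected version."""
    return installed_version(dist_name) == expected


if __name__ == "__main__":
    # Hypothetical check, run in a fresh environment after each install method.
    print(installed_version("package"))
```

Running this after each of the install commands, in a clean environment each time, should print the same string every time if the versioning is wired up consistently.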

Recommendations

I recommend a setuptools-based approach, using setup.cfg to declare as much of the metadata as possible, along with an OPTIONAL pyproject.toml as laid out in PEP 518. Versioneer is used to handle versioning.

pyproject.toml

A bare minimum pyproject.toml is as follows:

[build-system]
requires = ["setuptools >= 30.3.0", "wheel"]

Additional build dependencies such as cython and numpy might be put here.

I would relax the pyproject.toml recommendation to a mere suggestion this year. While it has mostly been fine, I have seen a case where pip install --user -e . fails, so it is not as consequence-free as I thought.

setup.cfg

As of setuptools 30.3.0, most packaging metadata can be set declaratively in setup.cfg.

The following skeleton can be used as a model.

[metadata]
url = https://github.com/your/package
author = You
author_email = your@email.tld
maintainer = You
maintainer_email = your@email.tld
description = A package
long_description = file:README.rst
long_description_content_type = text/x-rst; charset=UTF-8
license = GPL
classifiers =
    Programming Language :: Python

[options]
python_requires = >= 3.6
install_requires =
packages = find:

[options.package_data]
* =
    data/*

[options.extras_require]
doc =
    sphinx
test =
    pytest
    coverage
all =
    %(doc)s
    %(test)s

I recommend against using the include_package_data option, which counterintuitively overrides the package_data options with the directives in MANIFEST.in.

I want to draw attention to the python_requires metadata which will prevent pip from attempting to install on incompatible systems. When you drop 3.5 - or any other versions - update the python_requires to avoid breaking downstream tools that still install on unsupported versions.

In addition to plain key-value pairs, there are some constrained options for common dynamic metadata. For example, long_description = file:<filename> allows you to place a long description in a separate file, to be included in your documentation. packages = find: replaces the find_packages() call often used in setup.py. Finally, interpolated strings are used in extras_require to provide a meta-extra like all.
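
To see what the interpolated `all` extra expands to, the `[options.extras_require]` section from the skeleton can be read with the standard library's `configparser`, which resolves `%(name)s` references within a section. This is a sketch of the mechanism, not of how setuptools itself processes the file; the `expand_extras` helper is a hypothetical name.

```python
import configparser

# The extras section from the skeleton above, as a string for illustration.
SETUP_CFG = """
[options.extras_require]
doc =
    sphinx
test =
    pytest
    coverage
all =
    %(doc)s
    %(test)s
"""


def expand_extras(cfg_text):
    """Map each extra to its requirement list, with %(...)s references resolved."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser["options.extras_require"]
    return {
        name: [line.strip() for line in value.splitlines() if line.strip()]
        for name, value in section.items()
    }


extras = expand_extras(SETUP_CFG)
print(extras["all"])
```

The `all` entry ends up as the concatenation of the `doc` and `test` lists, so adding a requirement to either one updates `all` without further edits.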

I recommend not placing the version in setup.cfg.

setup.py

The dynamic components of my package setup are as follows:

#!/usr/bin/env python
import sys
from setuptools import setup
import versioneer

# Give setuptools a hint to complain if it's too old a version
# 30.3.0 allows us to put most metadata in setup.cfg
# Should match pyproject.toml
SETUP_REQUIRES = ['setuptools >= 30.3.0']
# This enables setuptools to install wheel on-the-fly
SETUP_REQUIRES += ['wheel'] if 'bdist_wheel' in sys.argv else []

if __name__ == '__main__':
    setup(name='package',
          version=versioneer.get_version(),
          cmdclass=versioneer.get_cmdclass(),
          setup_requires=SETUP_REQUIRES,
          )

I place the package name in setup.py mostly because, without this, GitHub will not recognize your package to place it in its dependency graphs.

By using versioneer in setup.py as opposed to adding version = attr:package.__version__ to the setup.cfg, we avoid the issue of missing import-time dependencies. versioneer.get_cmdclass() tells setuptools how to encode the current version into various installation methods.

Finally, setup_requires is mostly here as a fall-back to let old versions of setuptools provide a user-readable explanation for failures.
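
If you want to convince yourself of what the conditional `wheel` requirement does, the same logic can be pulled into a small function that takes the argument list explicitly. The function name `build_setup_requires` is hypothetical and only here for illustration; the behavior is meant to mirror the `SETUP_REQUIRES` lines in the setup.py above.

```python
def build_setup_requires(argv):
    """Return the setup_requires list for a given command line.

    Mirrors the setup.py above: setuptools is always required, and
    wheel is added only when the bdist_wheel command is requested.
    """
    requires = ["setuptools >= 30.3.0"]
    if "bdist_wheel" in argv:
        requires.append("wheel")
    return requires


print(build_setup_requires(["setup.py", "sdist"]))
print(build_setup_requires(["setup.py", "bdist_wheel"]))
```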

Versioneer

Versioneer will set the version based on your git tag, and handle all of the install cases I described in desiderata.

This requires an additional section to your setup.cfg:

[versioneer]
VCS = git
style = pep440
versionfile_source = package/_version.py
versionfile_build = package/_version.py
tag_prefix =
parentdir_prefix = 

It can then be installed from your repository root with:

pip install versioneer
versioneer install

Once done, it places a copy of itself in your repository root, so other users do not need to install it for it to be used correctly.

N.B. Versioneer does not work out of the box with git archives for non-tag releases. If you need any archived revision, this will not be sufficient. I don't know of a general solution to that problem at this point, as git archive substitution is quite limited.
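
To make the tag-based versions concrete, here is a simplified approximation of how a `git describe --tags --long --dirty` string maps onto a pep440-style version. It is not Versioneer's actual code and skips its edge cases (untagged repositories, for one); the function name and regular expression are my own for illustration.

```python
import re

# Matches output of the form TAG-DISTANCE-gHEX[-dirty]
DESCRIBE_RE = re.compile(
    r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<short>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def render_pep440(describe, tag_prefix=""):
    """Turn a `git describe --long` string into TAG[+DISTANCE.gHEX[.dirty]]."""
    match = DESCRIBE_RE.match(describe)
    if match is None:
        raise ValueError("unrecognized describe output: %r" % describe)
    tag = match["tag"]
    if not tag.startswith(tag_prefix):
        raise ValueError("tag %r lacks prefix %r" % (tag, tag_prefix))
    version = tag[len(tag_prefix):]
    distance = int(match["distance"])
    dirty = match["dirty"] is not None
    if distance or dirty:
        # A local version segment starts with "+", or continues with "."
        separator = "." if "+" in version else "+"
        version += "%s%d.g%s" % (separator, distance, match["short"])
        if dirty:
            version += ".dirty"
    return version


print(render_pep440("1.2.0-0-gabc1234"))        # exactly on the tag
print(render_pep440("1.2.0-3-gabc1234-dirty"))  # 3 commits past, uncommitted changes
```

A checkout sitting exactly on a tag gets the bare tag as its version, while anything past the tag picks up a local version suffix that records the commit distance and abbreviated hash.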

Package Data / MANIFEST.in

This was probably the most confusing thing to nail down, so I want to lay it out clearly.

The package_data metadata determines what data files inside your package directory will follow your Python files into their install location. This is the same as saying these files will be packaged in a wheel, since a wheel is (more or less) unzipped into your site-packages/ directory.

MANIFEST.in determines what data files are included in your sdist on top of what is included in your package_data. Use it to include anything outside your package directory that you want included in source. Note that there are some defaults that you don't need to specify.

DO NOT use include_package_data = True. That will change the rules of how this all works to something even less intuitive.
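
A quick way to check where a data file actually ended up is to list the contents of the built artifacts and compare. The sketch below is only illustrative: the helper names `wheel_members` and `sdist_members` are hypothetical, and it assumes a gzipped-tar sdist whose files sit under a single top-level directory.

```python
import tarfile
import zipfile


def wheel_members(wheel_path):
    """Return the file names stored in a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_path) as whl:
        return set(whl.namelist())


def sdist_members(sdist_path):
    """Return file names in a .tar.gz sdist, minus the top-level directory."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        names = [member.name for member in tar.getmembers() if member.isfile()]
    return {name.split("/", 1)[1] for name in names if "/" in name}
```

Files matched by package_data should show up in both sets, while files pulled in only through MANIFEST.in should appear in the sdist set alone. The wheel will also contain its own .dist-info metadata, so the two sets are not expected to be identical.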

Dependencies

The minimum setuptools version needed for setup.cfg to work is 30.3.0, although more fields have been defined since then, and the minimum pip needed for PEP 518 compatibility is 10.0.0.

Notes on new build systems and legacy operating systems

As noted above, setuptools was the only system under serious consideration, simply because it has long been the standard to run python setup.py. Until pip 10+ is universal, alternative build systems will create headaches that I don't want to deal with.

CentOS 7, for instance, still packages pip 7.1 and setuptools 0.9.8, which means the above will not work out of the box (though this may be changing... I'm having a hard time reading pkgs.org). However, sticking with setuptools and setup_requires ensures that a user will at least be told to upgrade setuptools.
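
To check a system against the minimum versions given under Dependencies, a simple numeric comparison of release segments is enough. This is a deliberately simplified sketch with hypothetical helper names; it ignores pre-release and local version markers, so it is not a substitute for a real version parser.

```python
import itertools


def parse_release(version):
    """Parse leading numeric dotted segments, e.g. '30.3.0' -> (30, 3, 0)."""
    parts = []
    for piece in version.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed release is at least the minimum release."""
    have, need = parse_release(installed), parse_release(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that '30.3' compares equal to '30.3.0'
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# The CentOS 7 versions mentioned above, against the stated minimums
print(meets_minimum("0.9.8", "30.3.0"))  # setuptools
print(meets_minimum("7.1", "10.0.0"))    # pip
```

Both checks come out false for the CentOS 7 versions, which is the situation where setup_requires at least produces a readable complaint.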

License

To the extent copyright can be claimed, I disclaim it under CC0.

@arokem
arokem commented Sep 8, 2021

Thanks for writing these great guidelines!

Regarding this one:

Versioneer is starting to show its age, and has not accepted a change since 2017. There are alternatives that are worth considering, but I haven't evaluated them yet.

We have been using setuptools_scm to manage versioning in a few software packages.

The main issue that we have encountered in our workflow, that also combines frequent uploads to test pypi is that the version strings need to be monkeyed with to be compliant, which we do in our setup file.

Happy to hear your thoughts about this!

@effigies
effigies commented Sep 8, 2021

So for pybids we've switched to the pep440-pre flavor of Versioneer, which plays nicely with test.pypi.org:

https://github.com/bids-standard/pybids/blob/master/setup.cfg#L74-L80

This ends up with <lastversion>.post0.dev<N> where N is the commit count. You get some collisions across branches, but not enough to be annoying. You also end up losing the git hash, but I've never actually gotten a bug report with a hash, so it doesn't feel like a big loss in value.

@arokem
arokem commented Sep 8, 2021

Thanks!
