@astrojuanlu
Last active February 9, 2023 09:04
Rough notes for a potential blog post or series "Python packaging is better than you think"

title: Python packaging is better than you think
created at: Mon Jun 20 2022 17:48:50 GMT+0000 (Coordinated Universal Time)
updated at: Mon Jun 20 2022 17:48:58 GMT+0000 (Coordinated Universal Time)

Python packaging is better than you think

Alternative titles: "Stop saying Python packaging is terrible", "Python packaging for the 99 %"

Proof that there's an audience for this: https://twitter.com/juanluisback/status/1538936104824492033

Unbundle what people really mean when they say that "Python packaging is bad":

  • bootstrapping Python for development
    • OS-specific
    • surprisingly, more difficult on Linux, since there are too many options and also Python is a core part of the system
    • hard only because there is no canonical method, or because of bad docs
    • problem solved by Anaconda
    • pyenv too: more intrusive and limited to Linux and macOS, but offers a wider range of Python versions
  • diagnosing packaging problems
    • a real mess because bootstrapping is hard and therefore people end up with chaotic Python installations
    • takes skill, but some simple tricks help: which python (tells you where the interpreter comes from), which pip, python -m pip to make sure pip matches that interpreter, import sys; print(sys.prefix) to be really sure
  • installing system-wide binaries based on Python
    • use pipx or fades and forget about it
    • avoid system Python like the plague
    • you could use environments for this, but you'd have to remember to activate them, which is not very convenient: avoid it if you don't need it!
  • managing environments
    • absolutely not OS-specific after the bootstrapping is done
    • only 2 kinds of environments exist
      • conda environments, managed by conda
      • python environments, managed by stdlib venv, pyenv, virtualenv, PEP 582
  • dealing with non-Python dependencies
    • Python native solution for non-Python dependencies is bundling shared libraries inside wheels. mostly works!
    • however, wheels can be quite fat (tensorflow, pytorch), not have enough specificity (GPU vs non-GPU etc), not be available for certain packages (RAPIDS), or lead to incompatibilities (Cartopy & rasterio)
    • conda solves this, and it will be difficult for pip to solve it in the general case. use conda, it's fine!
  • declaring environment dependencies
    • Python cannot install/import several versions of the same package in the same environment as opposed to Node.js
    • that might be a good thing though! security patches are applied uniformly. too long to discuss
    • but this leads of course to conflicts! which must be handled somehow
    • libraries doing weird things with dependencies is not Python's fault (upper version pins are now frowned upon, for example)
    • pip solves dependencies these days! even though the backtracking output is often not verbose enough for good diagnosis
    • mamba is a blazing fast replacement for conda
  • installing environment dependencies
    • conda, pip, poetry, pdm work fine, there are probably others
    • but there's lots of outdated advice: Pipenv is largely dead
    • conda and pip don't interoperate very well, so they need to be combined with care
    • pip-tools and poetry are currently lagging behind in terms of standards adoption and bug fixing, but they are excellent projects and will get there with some time
  • publishing packages
    • nowadays most needs are solved by PEP 621 pyproject.toml
    • you can use setuptools, flit, hatch and pdm and your metadata will look 90 % the same
    • a separate tool (e.g. twine) is needed for publishing, but is that really so bad?
  • hot-reloading
    • editable installations are now standardized, not a problem for the majority
    • unless you're using Meson, like SciPy does, in which case there's still no good solution
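
The diagnosis tricks from the list above (which python, which pip, sys.prefix) can also be collected from inside Python itself. This is only a minimal sketch: describe_python is a hypothetical helper name, and the in_venv check assumes a stdlib venv or a recent virtualenv (a conda environment will report False, since its prefix and base prefix are the same).

```python
import shutil
import sys


def describe_python():
    """Collect the facts that usually explain a confusing installation."""
    return {
        # Absolute path of the interpreter running this code
        "executable": sys.executable,
        # Root of the active environment
        "prefix": sys.prefix,
        # Root of the installation the environment was created from
        "base_prefix": sys.base_prefix,
        # True inside a venv/virtualenv; conda environments are not detected
        "in_venv": sys.prefix != sys.base_prefix,
        # What the shell would find for `python` and `pip` (None if not on PATH)
        "python_on_path": shutil.which("python"),
        "pip_on_path": shutil.which("pip"),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

If executable and python_on_path disagree, or pip_on_path lives under a different prefix, that is usually the chaotic installation showing itself, and python -m pip is the safer way to call pip.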

Inspiration and links

@zooba

zooba commented Jun 21, 2022

PEP 582 is used by PDM by default (can be disabled), and I think I've seen it as an option on at least one other. It's a shame if it's completely dead, as manually enabling it as default in Python and using it with PDM is really pleasant.

I mean, I obviously agree that it's a great workflow 😄 But hopefully referencing PDM is enough to give people the information that there's simpler/lighter options than venv without suggesting that it's on its way to become a CPython standard.

@astrojuanlu
Author

Thanks a lot @henryiii for the comments! Just a minor follow-up

Is pip-tools lagging?

These two PRs are proving to be a challenge: jazzband/pip-tools#1539 and jazzband/pip-tools#1329. They mean that pip-tools does not work well with the new pip versions. Hopefully they'll get to fix these issues soon though, I love pip-tools (fingers crossed)

@benjyw

benjyw commented Jun 27, 2022

Thanks for taking on this important, interesting, nuanced (and occasionally frustrating) topic!

I wonder if it might make sense to talk about pex in this context. In case you're not familiar with it, it's a tool for packaging first-party and third-party code into a single executable file. So deploying python code becomes as simple as copying that file (in a container, or directly to a machine), as long as a compatible python interpreter is discoverable on the target. Pex uses pip under the covers to resolve requirements, and it can generate and use cross-platform lockfiles so that pex builds are repeatable.

You can use the pex tool directly, or via pants, which is a full-fledged polyglot build system with a design focus on scaling Python repos, that uses pex heavily.

The pex/pants stack is, among other things, an attempt to tackle the python packaging challenge with rigor, making those kinds of workflows faster and repeatable even across multiple platforms, and in large or growing codebases.

[Full disclosure: I am one of the maintainers of pex and pants, so necessarily opinionated... that said, happy to provide more info if desired]

@henryiii

How does pex compare with a .pyz file? I've supplied a .pyz file for Particle for a long time, and other than not being very well documented (such as how to pack the packages you depend on), it does work fairly well. It doesn't really support downloading dependencies or binary dependencies, but on the flip side I guess pex needs an Internet connection. (I need to go watch the pants PyCon talk in its entirety, I only saw a bit of it live)

@benjyw

benjyw commented Jun 27, 2022

pex is similar in spirit to zipapp, but:

  • pex has robust support for third-party requirements, while zipapp only supports zipping up first-party code from a single directory. Pex can bundle first-party code (from any selection of files and directories) and also (transitive) third-party requirements (you can sort-of do this with zipapp by first installing the requirements into the source directory, but that's a bit of a copout).

  • A pex can be multi-platform, containing wheels for multiple platforms in a single file.

  • pex can generate a lockfile so that third-party requirements can be resolved repeatably and much more quickly than by rerunning a pip resolve.

  • A lot of effort has gone into performance optimization in pex. For example, a pex can run like zipapp, directly from the zipfile. It can also splat itself out to disk on first invocation. Or it can create a proper venv on first run, so that subsequent runs have the same performance as a venv.

  • pex is very flexible, with many build and runtime options, such as virtually "merging" multiple pex files at runtime.
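
For comparison, the stdlib side of this (a .pyz built with zipapp from a single directory of first-party code) might look like the sketch below. The directory name hello_app, the archive name hello.pyz and the helper build_pyz are made up for illustration, and no third-party requirements are bundled; it says nothing about how pex itself is invoked.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_pyz(workdir):
    """Zip a one-file app directory into a runnable .pyz archive."""
    source = pathlib.Path(workdir) / "hello_app"  # hypothetical app directory
    source.mkdir()
    # With no explicit entry point, the archive runs __main__.py from its root
    (source / "__main__.py").write_text('print("hello from a pyz")\n')
    target = pathlib.Path(workdir) / "hello.pyz"
    zipapp.create_archive(source, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pyz = build_pyz(tmp)
        # Running the archive still needs a compatible interpreter on the target
        result = subprocess.run(
            [sys.executable, str(pyz)], capture_output=True, text=True, check=True
        )
        print(result.stdout, end="")
```

Anything the app imports beyond the standard library would have to be copied into the source directory first, which is the workaround described in the first bullet.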

Hope this helps! Happy to provide more info.

@astrojuanlu
Author

Thanks a lot @benjyw for adding pex to the mix! I think "packaging Python applications in binary form" is a whole different topic though, probably deserving its own blog post. However, I'd say 99 % of people struggle with more basic stuff.

@astrojuanlu
Author

I came here to amend my "pip-tools is lagging" comment: pip-tools 6.7.0 included jazzband/pip-tools#1519 with a bunch of fixes (including jazzband/pip-tools#1505) and looks like 6.8.0 is very close to including pip's 2020 dependency resolver, finally 🎉

@astrojuanlu
Author

Developments from the past days:

@astrojuanlu
Author

Recent developments:
