

@astrojuanlu
Last active February 9, 2023 09:04
Rough notes for a potential blog post or series "Python packaging is better than you think"

title: Python packaging is better than you think created at: Mon Jun 20 2022 17:48:50 GMT+0000 (Coordinated Universal Time) updated at: Mon Jun 20 2022 17:48:58 GMT+0000 (Coordinated Universal Time)

Python packaging is better than you think

Alternative titles: "Stop saying Python packaging is terrible", "Python packaging for the 99 %"

Proof that there's an audience for this: https://twitter.com/juanluisback/status/1538936104824492033


Unbundle what people really mean when they say that "Python packaging is bad":

  • bootstrapping Python for development
    • OS-specific
    • surprisingly, it is more difficult on Linux, since there are too many options and Python is also a core part of the system
    • hard only because there is no canonical method and the docs are poor
    • problem solved by Anaconda
    • pyenv too: more intrusive and Linux-specific, but offers a wider range of Python versions
  • diagnosing packaging problems
    • a real mess because bootstrapping is hard and therefore people end up with chaotic Python installations
    • takes skill, but there are some simple tricks: which python (tells you where it comes from), which pip, python -m pip to make sure, and import sys; print(sys.prefix) to be really sure
  • installing system-wide binaries based on Python
    • use pipx or fades and forget about it
    • avoid system Python like the plague
    • you could use environments for this, but you'd have to remember to activate them, which is not very convenient: avoid them if you don't need them!
  • managing environments
    • absolutely not OS-specific after the bootstrapping is done
    • only 2 kinds of environments exist
      • conda environments, managed by conda
      • Python environments, managed by stdlib venv, pyenv, virtualenv, PEP 582
  • dealing with non-Python dependencies
    • Python's native solution for non-Python dependencies is bundling shared libraries inside wheels. It mostly works!
    • however, wheels can be quite fat (tensorflow, pytorch), not have enough specificity (GPU vs non-GPU etc), not be available for certain packages (RAPIDS), or lead to incompatibilities (Cartopy & rasterio)
    • conda solves this, and it will be difficult for pip to solve it in the general case. Use conda, it's fine!
  • declaring environment dependencies
    • unlike Node.js, Python cannot install/import several versions of the same package in the same environment
    • that might be a good thing though! Security patches are applied uniformly. Too long to discuss here
    • but this of course leads to conflicts, which must be handled somehow
    • libraries doing weird things with dependencies is not Python's fault (for example, upper version caps are now frowned upon)
    • pip has a real dependency resolver these days! Even so, its backtracking output is often not verbose enough for a good diagnosis
    • mamba is a blazing fast replacement for conda
  • installing environment dependencies
    • conda, pip, poetry, pdm work fine, there are probably others
    • but there's lots of outdated advice: Pipenv is largely dead
    • conda and pip don't interoperate very well, so they need to be combined with care
    • pip-tools and poetry are currently lagging behind in terms of standards adoption and bug fixing, but they are excellent projects and will get there with some time
  • publishing packages
    • nowadays most needs are solved by PEP 621 pyproject.toml
    • you can use setuptools, flit, hatch and pdm and your metadata will look 90 % the same
    • a separate tool (e.g. twine) is needed for publishing; is that really so bad?
  • hot-reloading
    • editable installations are now standardized, not a problem for the majority
    • unless you're using Meson, like SciPy does, in which case there's still no good solution
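
The diagnosis tricks from the list (which python, python -m pip, sys.prefix) can be rolled into one throwaway script that you run with the interpreter you're suspicious of. A minimal standard-library sketch; describe_interpreter is just a hypothetical name. Note that conda environments are full installations, so the venv check below is only meaningful for stdlib venv/virtualenv environments.

```python
import importlib.util
import shutil
import sys


def describe_interpreter() -> dict:
    """Collect the facts you usually need when debugging "wrong Python" problems."""
    pip_spec = importlib.util.find_spec("pip")
    return {
        # Absolute path of the interpreter actually running this code
        "executable": sys.executable,
        # Root of the active environment (venv, conda env, or the base install)
        "prefix": sys.prefix,
        # Root of the base installation the environment was created from
        "base_prefix": sys.base_prefix,
        # For venv/virtualenv, prefix differs from base_prefix (conda envs: always False)
        "in_venv": sys.prefix != sys.base_prefix,
        # First "python" on PATH, i.e. what `which python` would report (may be None)
        "python_on_path": shutil.which("python"),
        # Where `import pip` resolves for *this* interpreter, if pip is installed
        "pip_location": pip_spec.origin if pip_spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key:>15}: {value}")
```

If python_on_path and executable disagree, the python you type in the shell is not the one you think it is.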

Inspiration and links

@zooba

zooba commented Jun 21, 2022

Great overview of the issues out there! Looking forward to the posts.

avoid system Python like the plague

Unless you're distributing your app as another system package 😉 That's exactly what it's for.

python environments, managed by stdlib venv, pyenv, virtualenv, PEP 582

PEP 582 is essentially withdrawn. If you're going to mention it, please only mention it in the context that the existing mess made it too hard to migrate to something simpler without breaking everyone's edge cases.

Pipenv is largely dead

I think they've restarted recently, but the point about the ridiculous breadth of advice out there is correct. There's far too much of it to know where to even start, or to get a sense of what's a real recommendation.

nowadays most needs are solved by PEP 621

This recent thread may be informative on this point: https://discuss.python.org/t/pep517s-definition-of-frontend-and-backend-is-unclear-to-me/16575 (as is the packaging forum more generally, but I think I've seen you following it?)

hot-reloading ... in which case there's still no good solution

If you've got any native code at all, there's still no good solution. I personally just put my src directory in PYTHONPATH and skip the editable install anyway.
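
A minimal sketch of that PYTHONPATH approach for a pure-Python src layout, using a temporary directory and a hypothetical package called mypkg: the child interpreter imports straight from src/ with no install step, so edits are picked up on the next run. Compiled extension modules would still need to be built separately.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway project with a src/ layout; "mypkg" is a hypothetical package name
    src = Path(tmp) / "src"
    pkg = src / "mypkg"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("GREETING = 'hello from src'\n")

    # Prepend src/ to PYTHONPATH instead of doing an editable install
    env = dict(os.environ)
    env["PYTHONPATH"] = os.pathsep.join(filter(None, [str(src), env.get("PYTHONPATH")]))

    # A fresh interpreter sees the current contents of src/mypkg, no reinstall needed
    result = subprocess.run(
        [sys.executable, "-c", "import mypkg; print(mypkg.GREETING)"],
        env=env, capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())  # hello from src
```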

@astrojuanlu
Author

Thanks a lot @zooba! I agree with all your remarks

@henryiii

Looks good! A few thoughts while reading:

pyenv is Unix-specific, not Linux-specific. Also it's a fork of rbenv - the Ruby version is very standard, while the Python version is more of a "one of many" choice. The biggest issue with it is that the shim mechanism is terrible and breaks Python discovery for most tools (tox, nox, CMake, etc.). I still use it though to grab a specific version of Python and just deal with the fact that nox "fails" on all the versions it can't actually access because they are shims. There are some projects out there for simpler binary distribution of Python that might help in the near future.

Strong second on pipx. That also solves the issue of using separate tools (twine, build) - just use pipx run twine or pipx run build. Then they are never more than a week old and you don't have to pre-install anything. I use pipx run for pretty much everything that's not in homebrew and used daily.

For wheels, I'd say it's fine to use them unless you need something specific (like heavy data science work needing GPU PyTorch or something), in which case it's fine to use Conda. You don't need conda just for simple compiled dependencies - many libraries ship cibuildwheel-built wheels these days. I'd recommend at least mentioning cibuildwheel, as it's been huge in simplifying compiled wheel building. That reminds me - compiling your own code with conda is usually painful - the compilers conda-forge package really helps, but if you don't include that, you often mix system and conda compilers and segfault. pybind11 gets an issue about once a week on that, not counting Gitter, I think.
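
To make the "specificity" point concrete: a wheel filename carries exactly three compatibility tags (Python, ABI, platform) plus an optional build tag, following the binary distribution format {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. A small parser sketch; parse_wheel_filename is a hypothetical helper that does no validation beyond counting components, and it shows there is simply no slot for things like GPU vs CPU builds.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into name, version, optional build tag and tag sets."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components: {filename}")
    # Each tag may be a compressed set joined with '.', e.g. "py2.py3"
    return WheelName(
        dist, version, build,
        tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split(".")),
    )


if __name__ == "__main__":
    print(parse_wheel_filename("six-1.16.0-py2.py3-none-any.whl"))
```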

I'd mention hatch in the list of environment tools; it doesn't provide locking environments yet, but it does provide multiple environments (think nox/tox), which pdm/poetry do not. I'd probably call the PEP 517 backend "hatchling" as well, just to differentiate the two parts.

Is pip-tools lagging? At least pip-compile, there's no standard lock file format, so not sure there's a standard there to lag behind. And Poetry plans to support PEP 621 in Poetry 2.0 (but they've talked about 1.2 for years, and are over a year past first alpha for it and it's still not out; so they expect 2.0 to be very far off).
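
Absent a standard format, a "lock file" mostly boils down to a flat list of name==version pins. A minimal sketch that snapshots the current environment via importlib.metadata, roughly like pip freeze (no editable or VCS handling); note that pip-compile works the other way round, resolving pins from abstract requirements rather than recording what happens to be installed. freeze is a hypothetical helper name.

```python
from importlib.metadata import distributions
from typing import List


def freeze() -> List[str]:
    """Return sorted 'name==version' pins for every distribution this interpreter sees."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs with missing metadata
            pins.add(f"{name}=={dist.version}")
    # A set removes duplicates when the same dist appears on several sys.path entries
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze()))
```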

I like the mention of no capping, obviously. :)

Scikit-build is likely to be rewritten like Meson is. But we still won't have a solution of live reloading. There's also a project to make extension building something pluggable into PEP 517 builders (extensionlib by @ofek is a start on that project).

PS: Not sure if all or any of the above needs to be in the article series, just pointing out things to make sure you know them, and then you can and should decide on the perfect level of detail to include. :) I usually put too much. ;)


Couple of followups for @zooba's notes:

PEP 582 is used by PDM by default (can be disabled), and I think I've seen it as an option on at least one other. It's a shame if it's completely dead, as manually enabling it as default in Python and using it with PDM is really pleasant. Maybe a good replacement would be to mention the Python Launcher for Unix? It has native support for .venv which is also great and fills a similar purpose.
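
For reference, the lookup PEP 582 proposed is simple: if a __pypackages__/X.Y/lib directory (X.Y being the running Python version) exists next to the script, put it on sys.path. A rough sketch of doing that by hand, since CPython itself never shipped it; both helper names are hypothetical, and details of the PEP such as how the script directory is determined are skipped.

```python
import sys
from pathlib import Path


def pypackages_dir(project_root: Path) -> Path:
    """Directory PEP 582 proposed to search: __pypackages__/X.Y/lib under the project."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return project_root / "__pypackages__" / version / "lib"


def enable_pypackages(project_root: Path) -> bool:
    """Emulate the proposal by hand: put the directory first on sys.path if it exists."""
    lib = pypackages_dir(project_root)
    if lib.is_dir() and str(lib) not in sys.path:
        sys.path.insert(0, str(lib))
        return True
    return False


if __name__ == "__main__":
    print(enable_pypackages(Path.cwd()))
```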

Agree, Pipenv is maintained by the maintainer of PDM (@frostming), and is very much not dead. It's not the "one and only correct way to manage environments" either, which is where it went wrong for a while.

I second highlighting that the system Python is used for making system packages. It's also "okay" to use the system Python for venvs if you are happy with the version (don't know if it's great practice, but it is easy and works without risking breaking anything).
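
Creating a venv on top of whatever interpreter you have, system Python included, needs only the stdlib venv module. A minimal sketch, assuming a throwaway temp directory; with_pip=False just keeps the example offline and quick, and symlinks mirrors what python -m venv does by default on non-Windows systems.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    # Build an isolated environment on top of the interpreter running this script
    venv.create(env_dir, with_pip=False, symlinks=(sys.platform != "win32"))

    # The environment's interpreter lives in bin/ on Unix and Scripts/ on Windows
    if sys.platform == "win32":
        env_python = env_dir / "Scripts" / "python.exe"
    else:
        env_python = env_dir / "bin" / "python"

    # Inside the venv, sys.prefix is the environment and sys.base_prefix the base install
    out = subprocess.run(
        [str(env_python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout.strip())  # True
```

Nothing is installed into the system Python itself, which is why this is the low-risk way to use it.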

@zooba

zooba commented Jun 21, 2022

PEP 582 is used by PDM by default (can be disabled), and I think I've seen it as an option on at least one other. It's a shame if it's completely dead, as manually enabling it as default in Python and using it with PDM is really pleasant.

I mean, I obviously agree that it's a great workflow 😄 But hopefully referencing PDM is enough to let people know that there are simpler/lighter options than venv, without suggesting that it's on its way to becoming a CPython standard.

@astrojuanlu
Author

Thanks a lot @henryiii for the comments! Just a minor follow-up

Is pip-tools lagging?

These two PRs, jazzband/pip-tools#1539 and jazzband/pip-tools#1329, are proving to be a challenge, which means that pip-tools does not work well with the new pip versions. Hopefully they'll get to fix these issues soon though, I love pip-tools (fingers crossed)

@benjyw

benjyw commented Jun 27, 2022

Thanks for taking on this important, interesting, nuanced (and occasionally frustrating) topic!

I wonder if it might make sense to talk about pex in this context. In case you're not familiar with it, it's a tool for packaging first-party and third-party code into a single executable file. So deploying python code becomes as simple as copying that file (in a container, or directly to a machine), as long as a compatible python interpreter is discoverable on the target. Pex uses pip under the covers to resolve requirements, and it can generate and use cross-platform lockfiles so that pex builds are repeatable.

You can use the pex tool directly, or via pants, which is a full-fledged polyglot build system with a design focus on scaling Python repos, that uses pex heavily.

The pex/pants stack is, among other things, an attempt to tackle the python packaging challenge with rigor, making those kinds of workflows faster and repeatable even across multiple platforms, and in large or growing codebases.

[Full disclosure: I am one of the maintainers of pex and pants, so necessarily opinionated... that said, happy to provide more info if desired]

@henryiii

How does pex compare with a .pyz file? I've supplied a .pyz file for Particle for a long time, and other than not being very well documented (such as how to pack the packages you depend on), it works fairly well. It doesn't really support downloading dependencies or binary dependencies, though on the flip side I guess pex needs an Internet connection. (I need to go watch the pants PyCon talk in its entirety, I only saw a bit of it live)
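
For comparison, building a .pyz with the stdlib zipapp module takes a couple of calls. A minimal sketch with a hypothetical myapp directory; the zipapp docs suggest vendoring dependencies first with python -m pip install <requirement> --target myapp, which only helps for pure-Python packages because extension modules can't be imported from inside a zip.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # First-party code: a directory with a __main__.py entry point ("myapp" is hypothetical)
    app_dir = Path(tmp) / "myapp"
    app_dir.mkdir()
    (app_dir / "__main__.py").write_text("import sys\nprint('running from', sys.argv[0])\n")

    # Pack the directory into a single file with a shebang line
    target = Path(tmp) / "myapp.pyz"
    zipapp.create_archive(app_dir, target, interpreter="/usr/bin/env python3")

    # Run the single-file archive with the current interpreter
    out = subprocess.run([sys.executable, str(target)], capture_output=True, text=True, check=True)
    print(out.stdout.strip())
```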

@benjyw

benjyw commented Jun 27, 2022

pex is similar in spirit to zipapp, but:

  • pex has robust support for third-party requirements, while zipapp only supports zipping up first-party code from a single directory. Pex can bundle first-party code (from any selection of files and directories) and also (transitive) third-party requirements (you can sort of do this with zipapp by first installing the requirements into the source directory, but that's a bit of a cop-out).

  • A pex can be multi-platform, containing wheels for multiple platforms in a single file.

  • pex can generate a lockfile so that third-party requirements can be resolved repeatably and much more quickly than by rerunning a pip resolve.

  • A lot of effort has gone into performance optimization in pex. For example, a pex can run like zipapp, directly from the zipfile. It can also splat itself out to disk on first invocation. Or it can create a proper venv on first run, so that subsequent runs have the same performance as a venv.

  • pex is very flexible, with many build and runtime options, such as virtually "merging" multiple pex files at runtime.

Hope this helps! Happy to provide more info.

@astrojuanlu
Author

Thanks a lot @benjyw for adding pex to the mix! I think "packaging Python applications in binary form" is a whole different topic though, probably deserving its own blog post. However, I'd say 99 % of people struggle with more basic stuff.

@astrojuanlu
Author

I came here to amend my "pip-tools is lagging" comment: pip-tools 6.7.0 included jazzband/pip-tools#1519 with a bunch of fixes (including jazzband/pip-tools#1505) and looks like 6.8.0 is very close to including pip's 2020 dependency resolver, finally 🎉

@astrojuanlu
Author

Developments from the past days:

@astrojuanlu
Author

Recent developments:
