@jimratliff
Last active October 8, 2022 16:24
Discussion of conflicting uses of “package” when discussing Python and distribution of Python software.

Unpacking “package” terminology in Python

Introduction

“Package” is an overloaded term in the context of Python projects

In the context of Python projects, “package” is often used, confusingly, to refer to at least three non-equivalent entities:

  • A structure that organizes modules for importation or execution, often a directory that contains a __init__.py file along with one or more Python modules.
  • The Python project itself
  • A file (such as a wheel (.whl) or a source archive) associated with a particular release and even a particular platform of a particular Python project. This file can be downloaded and installed to make the project’s code available to the user.

Leading Python organizations provide definitions that could and should prevent confusion, but even those organizations continue to use “package” in confusing ways

I compile definitions related to the above uses of “package” from the Python Language Reference, the Python Package Index (PyPI), and the Python Packaging Authority (PyPA).

Both PyPI and PyPA are aware that confusion can arise from use of the term “package.” PyPI acknowledges that

Sometimes those terms [“project,” “release,” “file,” and “package”] are confusing because they’re used to describe different things in other contexts.

PyPA very helpfully distinguishes between “Import Package” and “Distribution Package.” PyPA notes that each of these two distinct concepts “is more commonly referred to with the single word ‘package,’” but assures that “this guide may use the expanded term when more clarity is needed to prevent confusion” between these two types of package. If only they followed through on that good idea! For example, although PyPA does officially define “distribution package,” it appears to walk away from it at every opportunity.

The terminological conventions I adopt

The conventions I adopt, and urge others to adopt, are nothing new. These exist already in documentation.

What is novel (revolutionary?) is that I pledge to take these seriously, and urge others to do so, too, rather than yield to the temptation of the false economy of using “package” unadorned when doing so is either unclear or misleading.

My resolutions:

  • “package” (without a modifier) or “import package” will refer to only the standard Python definition of a package as an organizer of modules (often a directory containing a __init__.py file).
  • “project” (not “package”) will refer to a collection of related releases and files and the information about them.
  • “distribution package” will refer to a file (such as a .whl wheel file) that can be downloaded in order to install a particular release of a project.

In one sense, this proposal does nothing more than accept and adopt (slightly cherry-picked) official definitions. My proposal advocates for (a) eschewing “package” when “project” is meant and (b) embracing the official definition of “distribution package.”

Official definitions

Here I compile official definitions of the relevant terms from Python organizations that are more or less related to each other. Despite that relationship, there are tensions in the definitions across these organizations.

Standard Python definition of a “package”

Quite apart from projects on PyPI, package is a well-defined concept in Python. Packages help organize modules and provide a naming hierarchy. (See, e.g., §5.2 Packages from The Python Language Reference.) These can be:

  • a regular/traditional package, typically implemented as a directory containing a __init__.py file. When a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package’s namespace. The __init__.py file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.
  • a namespace package, which is a composite of various “portions,” each of which contributes a subpackage to the parent package. Portions may reside in different locations on the file system. Portions may also be found in zip files, on the network, or anywhere else that Python searches during import. Namespace packages may or may not correspond directly to objects on the file system; they may be virtual modules that have no concrete representation. With namespace packages, there is no parent/__init__.py file. In fact, there may be multiple parent directories found during import search, where each one is provided by a different portion. Thus parent/one may not be physically located next to parent/two.
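The difference between the two kinds of package can be seen in a small experiment. The following is only a sketch: the directory and package names (site_a, site_b, regular_pkg_demo, ns_parent_demo) are hypothetical and are created in a temporary directory purely for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# All names below are hypothetical; they exist only for this demonstration.
root = Path(tempfile.mkdtemp())

# A regular package: a directory that contains an __init__.py file.
regular_dir = root / "site_a" / "regular_pkg_demo"
regular_dir.mkdir(parents=True)
(regular_dir / "__init__.py").write_text("GREETING = 'hello'\n")

# A namespace package: two portions in different locations, and no
# ns_parent_demo/__init__.py anywhere. Each portion contributes a subpackage.
for site, sub in (("site_a", "one"), ("site_b", "two")):
    portion = root / site / "ns_parent_demo" / sub
    portion.mkdir(parents=True)
    (portion / "__init__.py").write_text(f"NAME = {sub!r}\n")

# Make both locations visible to the import system.
sys.path[:0] = [str(root / "site_a"), str(root / "site_b")]
importlib.invalidate_caches()

regular_pkg = importlib.import_module("regular_pkg_demo")
sub_one = importlib.import_module("ns_parent_demo.one")
sub_two = importlib.import_module("ns_parent_demo.two")
namespace_pkg = importlib.import_module("ns_parent_demo")

# The regular package executed its __init__.py on import.
print(regular_pkg.GREETING)
# The namespace package has one __path__ entry per portion and no file of its own.
print(list(namespace_pkg.__path__))
print(getattr(namespace_pkg, "__file__", None))
```

Note that ns_parent_demo/one and ns_parent_demo/two live under different parent directories, yet both import as subpackages of the same parent.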

PyPI’s official definitions of “project,” “release,” and “package”

From PyPI’s Help documentation:

We use a number of terms to describe software available on PyPI, like “project”, “release”, “file”, and “package”. Sometimes those terms are confusing because they’re used to describe different things in other contexts. Here’s how we use them on PyPI:

A “project” on PyPI is the name of a collection of releases and files, and information about them. Projects on PyPI are made and shared by other members of the Python community so that you can use them.

A “release” on PyPI is a specific version of a project. For example, the requests project has many releases, like “requests 2.10” and “requests 1.2.1”. A release consists of one or more “files”.

A “file”, also known as a “package”, on PyPI is something that you can download and install. Because of different hardware, operating systems, and file formats, a release may have several files (packages), like an archive containing source code or a binary wheel.

PyPA’s definitions of “distribution package,” “import package,” and “project”

From the Python Packaging Authority’s Glossary:

Distribution Package

A versioned archive file that contains Python packages, modules, and other resource files that are used to distribute a Release. The archive file is what an end-user will download from the internet and install.

A distribution package is more commonly referred to with the single words “package” or “distribution”, but this guide may use the expanded term when more clarity is needed to prevent confusion with an Import Package (which is also commonly called a “package”) or another kind of distribution (e.g. a Linux distribution or the Python language distribution), which are often referred to with the single term “distribution”.

Import Package

A Python module which can contain other modules or recursively, other packages.

An import package is more commonly referred to with the single word “package”, but this guide will use the expanded term when more clarity is needed to prevent confusion with a Distribution Package which is also commonly called a “package”.

Project

A library, framework, script, plugin, application, or collection of data or other resources, or some combination thereof that is intended to be packaged into a Distribution.

Since most projects create Distributions using either PEP 518 build-system, distutils or setuptools, another practical way to define projects currently is something that contains a pyproject.toml, setup.py, or setup.cfg file at the root of the project source directory.

Python projects must have unique names, which are registered on PyPI. Each project will then contain one or more Releases, and each release may comprise one or more distributions.

Note that there is a strong convention to name a project after the name of the package that is imported to run that project. However, this doesn’t have to hold true. It’s possible to install a distribution from the project ‘foo’ and have it provide a package importable only as ‘bar’.
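The ‘foo’/‘bar’ situation described in that last paragraph can be simulated with the standard library’s importlib.metadata. This is a rough sketch under stated assumptions: the project name foo, the import package bar, and the version are hypothetical, and the fake “installation” is written by hand into a temporary directory. The top_level.txt file is a setuptools convention rather than a packaging standard, so it will not be present for every installed project.

```python
import tempfile
from importlib import metadata
from pathlib import Path

# Simulate an installed project by hand. All names are hypothetical.
site_dir = Path(tempfile.mkdtemp())

# The import package that the project provides.
(site_dir / "bar").mkdir()
(site_dir / "bar" / "__init__.py").write_text("")

# The project's installed metadata lives in a *.dist-info directory.
dist_info = site_dir / "foo-1.0.dist-info"
dist_info.mkdir()
(dist_info / "METADATA").write_text(
    "Metadata-Version: 2.1\nName: foo\nVersion: 1.0\n"
)
# Assumption: setuptools-style top_level.txt listing the import packages.
(dist_info / "top_level.txt").write_text("bar\n")

dist = metadata.Distribution.at(dist_info)
installed_project = dist.metadata["Name"]
installed_version = dist.version
provided_imports = dist.read_text("top_level.txt").split()

print(f"project {installed_project!r} (release {installed_version}) "
      f"provides import package(s) {provided_imports}")
```

The project name and the import package name come from two different places, which is why nothing forces them to agree.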

Discussion of the official definitions

The distinction drawn by PyPA’s definitions of “Import Package” and “Distribution Package” is appropriate and very helpful.

  • “Import Package” corresponds to the standard Python package, both regular/traditional packages (with a __init__.py) and namespace packages.
  • “Distribution Package” corresponds to PyPI’s definition of a package (in the .whl or source archive sense), for example, a particular “wheel” (.whl) file, such as chess-1.9.0-py3-none-any.whl, or source archive, such as chess-1.9.0.tar.gz. (See the release chess 1.9.0.)
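A wheel’s file name already encodes which project, which release, and which platform the distribution package is for. The sketch below splits a wheel file name into those fields, following the wheel naming convention of name-version(-build)-python-abi-platform.whl. The helper parse_wheel_filename and the WheelName type are my own illustrative names, not part of any library, and the parsing is deliberately simplistic.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    """Fields encoded in a wheel file name (illustrative type, not a library API)."""
    project: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between the version and the Python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return WheelName(*parts)


wheel = parse_wheel_filename("chess-1.9.0-py3-none-any.whl")
print(wheel)
# WheelName(project='chess', version='1.9.0', python_tag='py3', abi_tag='none', platform_tag='any')
```

Here the file is one distribution package belonging to release 1.9.0 of the project chess, and the tags py3, none, and any indicate that it is not tied to a particular platform.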

Bifurcating the concept of package into the import and distribution senses could go a long way toward preventing and resolving much of the confusion that arises.

PyPA formulates helpful definitions with one hand, and ignores them with the other

Unfortunately, even PyPA continues to use “package” by itself when doing so causes confusion.

Take, for example, PyPA’s own “Installing Packages” tutorial. Before clicking on this tutorial’s link, ask yourself: What kind of package does this tutorial tell you how to install? Not sure? Does it help to know that its first sentence is the following?

This section covers the basics of how to install Python packages.

“Python packages” should be a giveaway. If you Google the phrase “Python package,” you’ll find results like (a) “Python Packages,” by GeeksforGeeks, which goes into detail about the “import package” sense of package, and (b) “Most Popular Python Packages in 2021,” by Kateryna Koidan, which likewise states that “A Python package is a directory of a collection of modules.” So that settles it: this tutorial tells you how to install a Python package in the sense of a directory that contains modules, right?

Well, nope. In that above-quoted sentence, the word “packages” is hyperlinked to the PyPA’s glossary entry for “Distribution Package,” i.e., a wheel file or source archive. So why didn’t PyPA simply say “Installing Distribution Packages” and preempt the confusion?

Or consider setuptools, a project of PyPA. The setuptools Quickstart endeavors to explain how to specify the metadata in your setup.cfg or setup.py file. One of the keywords for which you need to specify a value is name. The example they provide is:

name = mypackage

To what sense of “package” does this refer? An import package? A distribution package? Actually, neither of the above. This should be filled out with the name of your project. The name keyword is where PyPI gets the project name of your software.
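To make the distinction concrete, here is a sketch of a setup.cfg in which the project name and the import package name deliberately differ, read back with the standard library’s configparser. The names foo and bar and the version are hypothetical; name (under [metadata]) and packages (under [options]) are the setuptools keywords.

```python
import configparser

# A hypothetical setup.cfg: the project is called "foo",
# but the import package it ships is called "bar".
SETUP_CFG = """
[metadata]
name = foo
version = 1.0

[options]
packages = bar
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

cfg_project_name = parser["metadata"]["name"]               # the project name
cfg_import_packages = parser["options"]["packages"].split()  # the import packages

print("project name:", cfg_project_name)
print("import packages:", cfg_import_packages)
```

Had the Quickstart example read name = myproject, the keyword’s meaning would have been much harder to mistake.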

Resolving these confusions

I distill my proposal for using “package” and “project”:

  • Use “distribution package” to refer to a .whl or source archive file to be downloaded and installed.
  • Use “import package” or simply “package” to refer to a regular/traditional or namespace package that serves to organize modules.
    • Allowing simply “package” without any other qualifier to serve this purpose is justified because PyPI’s sense of distribution package is much less often used than the standard Python definition of import package.
  • Use “project” (not “package”) when you’re referring to the overarching software effort that over time spawns multiple releases and versions.