Skip to content

Instantly share code, notes, and snippets.

@lelandbatey
Last active January 20, 2024 11:27
Show Gist options
  • Save lelandbatey/8383da16b197ec4fbe7270b186bf8a61 to your computer and use it in GitHub Desktop.
Save lelandbatey/8383da16b197ec4fbe7270b186bf8a61 to your computer and use it in GitHub Desktop.
Python Packaging vs Golang Packaging: Breaking down the differences in terminology and what it means for you

Python Packaging vs Golang Packaging

The original question was:

But I'm not sure that the library name = "cp-compat-logs-logger" defined in pyproject.toml would work. I tried importing that in my httpclient library and it complained. Also, I see in the log_bridge library the logger is imported as from cp_compat_logs_logger.logger import Logger, but cp_compat_logs_logger is not the library name, so how is that working?

I think you've asked a totally valid question about "what's up with the names here?" The short answer is "Python has messy conventions, so the name you use to poetry install <name> is different than the name you use in code when import name." It's conventional to have the poetry install <name> (let's call this the "distribution name") use dashes as a delimiter. However, actual module names in Python cannot have dashes, so the name used in code during import name (let's call this the "module name") will (usually) use underscores in place of dashes.

Longer explanation with more context:

How do Golang packages work (as a point of comparison)

I DESPERATELY wish Python packages/modules/libraries worked the way that Golang works but sadly Python packaging doesn't work the way that Golang packages work. In Golang, a folder with .go files in it is a "package" and everything in that package is compiled together. In Golang, a module is a collection of related Go packages (folders with .go files) which contains a go.mod file at the root. A Golang module has a single "module path" which you use to import the module in other code, and you declare it in go.mod with a module <import path> statement. If you make the module available on the network at a URL, and you make the "module path" of that module the URL where you can find that module, e.g. "github.com/gorilla/mux", then the Golang tooling will try to download that module from that location. Thus the act of distributing a Golang module means making the source code available over the internet (typically inside a version control system (VCS) repository, such as a Git repo). In this way "ability to access the source code git repo of a Go module" and "ability to install a Go module" are synonymous, and this is one of Golang's biggest innovations in the code packaging landscape. Amazing! So convenient! Look how well thought out and consistent this all is!

Let's review terms:

Term Definition
Go package A folder with .go file in it. A package can have sub-packages.
Go module A collection of related Go packages with a go.mod file at the root
Distributing a Go module Make the module available on the internet (usually in a git repo) so that the "module" name is also where to download the module

How Python packages work

Sadly, the world of Python packaging is NOT clear and cohesive in the way Golang modules are. Golang has made all our lives easier by bringing two separate concepts together so that being able to do one means being able to do the other; if you can clone the Git repo of a Go module, then you can also seamlessly install, develop against, and have your program depend on that Golang module. This is all made possible because Golang only distributes libraries in source-code form.

The key difference between Python and Golang is the same as the difference between other older languages and Golang, older languages like C and Java. Just like in the world of C and Java, Python has "the code" and has the "compiled distribution" which is created from the code. This distinction exists in C, Java, and Python, but not in Golang, and is emblematic of the difference in packaging between these older languages and Golang. Like the distinction between "the code" and "the compiled distribution", the Python packaging world has more interrelated concepts than Golang, and each concept has subtle but important distinctions from other related concepts. Let's dig into the vocabulary around those concepts, then we'll cover how those concepts relate to Golang concepts.

In Python, a Project is a "library, framework, script, plugin, application, or collection of data or other resources, or some combination thereof that is intended to be packaged into a Distribution." A Distribution or Distribution Package is "a versioned archive file that contains Python packages, modules, and other resource files that are used to distribute a Release. The archive file is what an end-user will download from the internet and install." A package or Import Package is "a Python module which can contain other modules or recursively, other packages.". A module is "the basic unit of code reusability in Python", typically in the form of a pure module which is "a Module written in Python and contained in a single .py file (and possibly associated .pyc and/or .pyo files)." A Release is "a snapshot of a Project at a particular point in time, denoted by a version identifier." Note that a distribution is necessarily a release, but packaged up so that it can be distributed.

Term Definition Example
Project "library, framework, script, plugin, application, or collection of data or other resources, or some combination thereof that is intended to be packaged into a Distribution." A folder with a pyproject.toml in it is a project, since pyproject.toml is a format for describing Projects in the Python ecosystem.
Distribution or Distribution Package "a versioned archive file that contains Python packages, modules, and other resource files that are used to distribute a Release. The archive file is what an end-user will download from the internet and install." A .whl file is an example, which is what pip or poetry downloads from a PyPi when installing a Python package.
package or Import Package "a Python module which can contain other modules or recursively, other packages." A folder which contains a file named __init__.py is an import package, and can contain other sub-packages (recursive subfolders each also with an __init__.py) or modules (.py files in the folder)
module & pure module "the basic unit of code reusability in Python", typically in form of a pure module which is "a Module written in Python and contained in a single .py file." A .py file is a module. A .py file in a folder with a __init__.py file in it is a module within a package. Note that when actually importing code, when you import a package it's imported as a "module object" inside the running python program. We use module/package here as a filesystem and package-management term, but "packages" don't really exist as a thing when writing Python code.

How these relate to each other can be described as follows:

Project is a library/framework/application/etc written in Python. They're composed of
    → packages (folders with __init__.py) which are then themselves composed of
        → modules, .py files

Projects are installed by programs like pip and poetry by downloading
    → distributions, usually in .whl files which are special zip files containing
        → packages (folders with __init__.py) which are then themselves composed of
            → modules, .py (or .pyc) files on disk
        → a distribution also contains other special files which help tooling
          know exactly how to install all the other
          dependencies/scripts/applications/etc in the distribution in order
          for the "project" to be correctly installed.

Differences between Python and Golang

This table does its best to compare the Golang and Python terms around packaging:

Term Equivalent Term in Go
Project Go module (git repo with go.mod in it)
Distribution or Distribution Package Doesn't really exist in go, can think of it as combined with concept of module
package or Import Package Go package
module & pure module Go can't address down to the individual file in the way Python does, so there isn't an equivalent in the Golang world.

So what's this all this have to do with the bad names?

What this all means is that we have a Project named cp-compat-logs-logger which is published to our internal PyPi registry as a distribution with a name like cp-compat-logs-logger-0.1.0. The name of this distribution is based on the name of the folder with the pyproject.toml file in it, which is cp-compat-logs-logger. However, inside of that distribution the top-level python package (and remember, packages are folders with __init__.py in them) which is installed is called cp_compat_logs_logger. The name of the top-level Python package comes from the package in the Project, which is cp_compat_logs_logger. So to summarize:

cp-compat-logs-logger/ # ← This folder name determines the `poetry install <name>`
├── pyproject.toml
└── cp_compat_logs_logger/ # ← This folder name determines how you `import <name>` in code
	├── __init__.py
	└── logger.py
@ashish111333
Copy link

thanks,this was helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment