theory/extension-ecosystem-jobs-tools.md

## extension-ecosystem-jobs-tools.md

      
    Raw
  

              extension-ecosystem-jobs-tools.md
            
          
NOTE: Moved to the Postgres Wiki

Extension Ecosystem: Jobs and Tools

This document sketches an idealized Postgres extension ecosystem.
Why?

The past year has seen a surge in interest in Postgres extension distribution. A
number of people have noted in particular the challenges in finding and
installing extensions.
PGXN, released way back in 2011, aimed to be the canonical registry for the
community, but currently indexes only the 350 distributions that developers
have made the effort to publish --- perhaps a third of all [known
extensions][extension-gist]
Furthermore, PGXN distributes source code packages, requiring developers to
download, compile, install, and test them. Binary packagers and cloud providers
have added a limited number of extensions to their registries, such as
apt.postgresql.org, yum.postgresql.org, and the like, but neither PGXN nor
the pgxn client provide integration into such tooling.
As a result of these challenges, a number of new entrants have emerged in recent
months, all aiming to be the source of record for extensions and a focus on
discovery and ease of installation. These include dbdev, trunk,
pgxman, pgpm. They too exhibit limitations, such as minimal OS and platform
availability and, notably, the manual process to add extensions, further
constraining inclusion. None has even reached the number provided by PGXN.
These challenges and the interest and energy put into exploring new solutions
make clear that the time has come to revisit the whole idea of the PostgreSQL
extension ecosystem: to work though the jobs to be done,
specify the tools to do those jobs, and outline a plan for the broader
Postgres community to design and build them.
🎬 Let's get started.

Jobs to be Done

Theodore Levitt cites a famous adage:

"Last year 1 million quarter-inch drills were sold," Leo McGivena once said,
"not because people wanted quarter-inch drills but because they wanted
quarter-inch holes."
People don't buy products; they buy the expectation of benefits.

Today the most cited descriptor underpinning Jobs to be Done Theory, thanks
to Clayton Christensen, we bear it in mind to think through the "jobs" of an
idealized Postgres extension ecosystem --- without reference to existing
Postgres solutions. However, we cite examples for the jobs from other projects
and communities, both to clarify the jobs to be done and to reference prior art
from which to borrow.
So what are the jobs to be done by the ideal Postgres extension ecosystem?
Authoritative

Job: Be the official, canonical source of record for all Postgres extensions
Developers need to know the one place to reliably release and find all
publicly-available Postgres extensions. It can't be a subset, but must be
comprehensive, the assumed default recognized and used by the community and
included in the Postgres documentation.
Exemplars:

The Go Module Index indexes Go package that it
automatically discovers
crates.io indexes Rust crates published to it
PAUSE indexes Perl modules uploaded to it
RubyGems indexes Ruby Gems published to it
PyPI indexes Python distributions uploaded to it

Beneficial

Job: It's expected for extension developers to develop and publish their
extensions in the ecosystem.
Extension developers want their extensions to be a part of the ecosystem, to
gain all the advantages and privileges of participation, without undo effort.
Publishing an extension should be as straightforward as possible. Ideally it
simply requires maintaining a metadata file and publishing releases to the
authoritative index.
Exemplars:

Go Package Discovery automatically discovers and indexes
go.mod-configured packages
Registered crates.io developers publish Rust crates defined by Cargo
Manifest Files via the cargo CLI
Registered NPM developers publish NPM projects defined by package.json files
via the npm publish command
Registered PAUSE developers publish Perl modules defined by CPAN Meta
Spec files via the cpan-upload CLI or web API

Integrated

Job: It's easy for developers to start writing and publishing extensions.
Extension developers need excellent tooling to simplify the process of creating,
maintaining, testing, and distributing extensions. Such tools should be
well-integrated into the extension ecosystem, and empower developers to get
started and easily follow best practices throughout the extension development
lifecycle: from beginning development, managing dependencies and other metadata,
testing it, and releasing it.
Exemplars:

cargo
Go module-aware commands
Language Servers

Discoverable

Job: Make extensions easy to find and understand
There's no use in being the canonical index of available extensions if no one
can find them. There needs to be a well-known web site where people find
extensions, learn their purposes, distinguish them from each other, and read
their docs. Ideally most people use integrated search rather than a search
engine, but extension pages should appear high up in search engine results, too.
Exemplars:

pkg.go.dev
crates.io
metacpan.org
rubygems.org
pypi.org

Informative

Job: Make extension documentation consistent, standardized, and comprehensive
Extensions are only as easy to use as their documentation enables. It's
therefore important to provide well-organized, standardized, and accessible
documentation --- and that it be expedient for developers to document their
extension packages. Ideally development tools encourage the inclusion of
documentation in an extension. With standardized documentation, it's easy to
publish docs in multiple formats, especially on the canonical
discovery site.
Exemplars:

Go Doc Comments
rustdoc

Programmable

Job: Provide stable, easy-to-use, and comprehensive APIs with which downstream
developers can build other tools and products
Provide excellent, well-documented, intuitive web APIs that allow anyone to
build useful tools and services on the index. The jobs below require it, but
creative coders will invent and mashup APIs in ways we haven't thought of. Web
APIs enable imaginative solutions within the ecosystem and without. Some example
APIs:

Recent releases feeds
Extension metadata
Full text search
Event webhooks to notify registered services

Exemplars:

metacpan-api
RubyGems API
PyPI API
PyPI project releases RSS feed

Installable

Job: Provide automatic binary packaging for wide variety of platforms
It must be as easy as possible for users to install all indexed extensions.
Ideally, independent services consume the index APIs to be
notified of new releases and automatically build binary packages for as wide a
variety of OSes, OS versions, architectures (arm64, amd64, etc.), and
PostgreSQL versions as possible.
Some example repositories to support or enable to build and distribute
extensions from the core registry:

Community packaging: apt.postgresql.org, yum.postgresql.org
Binary packaging: Homebrew, Chocolatey, etc.
Vendor packaging & distribution: pgxman, dbdev, pgpm, trunk

Exemplars: ❓❓
Trusted

Job: Validate extensions and protect from supply chain vulnerabilities
Provide tools and features to earn and retain the user trust by ensuring that
that extensions indexed by the canonical registry have been
validated, come from trustworthy sources, and are unlikely to be a vector for
supply chain attacks. This means some combination of author validation (e.g.
badging projects from well-known sources), checksums, public key signing, and
perhaps static code analysis.
Exemplars:

How Go Mitigates Supply Chain Attacks
Improving Supply Chain Security for Rust Through Artifact Signing
GitHub Code Scanning
Docker Scout

Manageable

Job: Provide intuitive, powerful interfaces for installing and managing
extensions
Provide applications to manage the extensions in one or more Postgres clusters.
At a minimum provide a command line interface that manages extensions for a
locally-installed Postgres --- as currently required for it to install packages,
for example.¹
Users need to easily manage their Postgres extensions in a single, consistent
manner. The extensions manager should be aware of installed extensions,
available extensions, dependency graphs, and known vulnerabilities; allow for
search (via the APIs); and of course install or upgrade
extensions.
Exemplars:

cpan, cpanminus, cpm
cargo
Go module-aware commands
pip
gem
apt-get
yum


Tools

Tools fulfill the jobs to be done. Or rather categories of
tools, since some may be deployed a single, well-defined service, while others
denote a type of service or tool that may be implemented by multiple literal
apps (many of which we have yet to dream up!).
Core Registry

Jobs:

Authoritative
Beneficial
Programmable
Trusted

The core registry is the single, canonical service that indexes Postgres
extensions (ideally replacing hacks like this). It provides
the APIs for fetching the complete index, individual extension metadata, release
notifications, and more.
Some requirements to get to that point:

A community-defined and maintained metadata standard to enable indexing,
building, and installing extensions. Contents may include, among other
things:

PGXN metadata
control file metadata
Binary dependencies (configure, build, test, and runtime)
Unpackaged dependencies (Python packages, Perl modules, Ruby gems, Java
JARs, etc.)
Platform requirements (OS, architecture)
Conflicts
Required configurations (e.g., "must be in shared_preload", "requires
restart", etc.)
New metadata, e.g., TLE-enabled, extension type, etc.
Other useful metadata pioneered by the pgxman buildkit,
trunk.toml, and other registries


A versioning standard, such as SemVer, to recognize new releases and index
them appropriately and in meaningful order
An extension registration and hosting pattern; options to consider:²

Automatically index releases hosted on and namespaced to GitHub,
BitBucket, GitLab, etc., akin to the Go Module Index
Provide developer registration and host all indexed extensions
internally, with extension names defining namespaces; akin to PGXN,
crates.io, PAUSE, RubyGems, and PyPI


Curation and ownership: always use the distribution metadata provided by the
developer, but consider generating default metadata for extensions that
don't provide the metadata. But allow it to be superseded by the extension
owner if they later add it.

Trusted, registered clients provide metadata and can use the index for
additional use cases, such as:

Incrementing download counts
URI templates for additional links in the discovery site
(e.g., binary downloads)
Registering a webhook to be notified of new releases or other events

Notionally the Core Registry is a single service, but in reality it may
encompass multiple services. Some examples:³

Extension discovery
Event notification (webhook queuing and dispatch)
User management⁴ (registration, permissions)
Release management (indexing, namespacing)
Feed service
Sum service (auditable checksum database which will be used by the go
command to authenticate modules)

Prior Art:

PGXN Manager
Trunk
pgxman
dbdev
pgpm

API Service

Jobs:

Informative
Programmable
Trusted

A service to provide all the web APIs for the index and search
site (in fact, the search site itself should be a thin wrapper
around this API). This would be the canonical programmable
resource for developers building services and mashups with Postgres extension
data. It might also provide an API to fetch clean HTML rendered from extension
documentation.
All versions of extensions will be present and queryable via the API.
Over time it will expand to support additional features, such as:

Metrics collection from trusted third parties: download counts, test, build,
install successes and failures, etc.
Repository metadata (stars, commit count, last update, etc.)
Version comparisons (diffs)
User comments and/or reviews
Categorization/Tagging/Bundling

Prior Art:

PGXN API

Search Site

Jobs:

Discoverable
Informative
Programmable
Trusted

A web site that presents a compelling interface over the API,
providing effective search, categorization, standardized presentation of docs,
and links to other resources. Provides extension home pages generally linked by
third party search engines in the first few results when searching for extension
names.⁵
The web site is likely to be the most feature rich service; many features can be
added over time to the API and rendered here, such as detailed
metadata, version history, user reviews, user-provided documentation, bundling,
etc.
Required features:

Search engine indexing (Google, Duck Duck Go, Kari, etc.)
Full text search akin to pkg.go.dev and crates.io
Release home pages
User home pages
Beautifully-formatted extension documentation
Extension categorization
User feedback and reviews
Metrics for downloads, usage, most visited extensions, build/test/install
failures, vulnerabilities, etc.
User management UX (registration, user profile)
Release management UX (ownership, permissions, archiving)

Prior Art:

PGXN
Trunk
pgxman
dbdev
pgpm

Packaging

Jobs:

Installable
Trusted

Downstream services that consume from the core registry to
automatically, download and verify source code, then build and sign binary
distributions. Initial targets include apt.postgresql.org,
yum.postgresql.org, and packaging required for Tembo.
Such a service can be manually managed in the short term, as it is today: a
curated list of extensions with hand-edited specifications (RPM spec files, Apt
files, Homebrew Bottles, Chocolatey packages, etc.).
But work toward the ideal end state, where community packaging repositories
support all major platforms, OSes, PostgreSQL, and major versions of all,
automatically built from extension metadata provided by the core
registry upon release. The flow would look something like this:

Developers release versioned extension source files and metadata to the
core registry, which notifies trusted downstream services
Such trusted services install dependencies documented in the metadata, then
download, verify, build, test, sign, and upload to apt.postgresql.org,
yum.postgresql.org, Trunk, pgxman, etc.

A complementary service provides an API to represent each release and binary
packaging distribution, allowing clients to query for the appropriate packaging
repository for a given OS, architecture, and Postgres cluster (as well as source
distributions to be compiled into non-packaged PostgreSQL clusters).
Name Mapping Service

Jobs:

Installable

Automated packaging likely requires some sort of canonical name mapping service,
so an extension author can just require, say, libxml2 and the automated
packaging system can find out what things need to be installed to meet that
canonical name on each OS and version.
Prior Art:

❓❓

Installation

Jobs:

Installable
Discoverable
Informative
Programmable
Integrated
Trusted

An SDK and CLI and perhaps other interfaces that use the API and
Packaging services to quickly and easily verify and install
extensions on supported platforms, architectures, and Postgres versions. Should
also support source installation (perhaps borrowing from the packaging service's
techniques).
Ideally implemented as an SDK so that thin wrappers can be provided for various
environments: A CLI, a native Mac or Windows app, a web app, etc.

SDK to build interfaces for downloading and verify binaries from packaging
services --- or source from canonical index and compiling --- and
installing⁶ on wide variety of OSes (and architectures) and Postgres
versions
CLI for supported OSes (Linux, MacOS, Windows, *BSD) and architectures
(amd64, arm64, i386, etc.), with an extensible architecture to integrate
development features and allow third parties to quickly
features
Native apps or web apps?
Catalog of extensions for specific Postgres clusters, so the
installer/extension⁷ manager knows what is installed

Prior Art:

pgxnclient
pgxman
dbdev
trunk
pgpm

Development

Jobs:

Beneficial
Manageable
Integrated

An SDK and CLI and interfaces that use the API to manage the
extension development lifecycle, from starting an extension, managing
dependencies and other metadata, testing it, and releasing it to the core
registry.
Ideally an SDK and CLI that complements or, better still, extends the
management CLI by adding the appropriate subcommands. Features:

Initialize new extension project, scaffolding the essential files, metadata,
and community-recommended project organization.
Commands to edit metadata, eventually format and lint source files and docs,
add new files (for upgrades, tests, etc.), and run tests.
Commands to bundle, sign, and publish a release.
Continuous integration tools to enable testing on multiple platforms, OSes,
and Postgres versions.
Continuous deployment tools to bundle and release an extension.

Prior Art:

pgxn-utils
cargo-pgrx

Other Tool Ideas

Brainstorming other tool ideas, to be developed by anyone with the itch to
scratch.
Testing

Downstream service that consumes from the core index and automatically tests
extensions on all major platforms, OSes, PostgreSQL, and major versions of all.
May be part of the ideal end state for packaging, since there's do much
overleap? Provides a matrix for each extension⁸ as data and perhaps a
web page linked from the search site. Perl example.
Tooling

Editor plugins or a language server to support extension auto-completion.
Certification of Validation

A service similar to Testing but focused on additional quality
metrics, such as build, test, and install successes and failures; static code
analysis warnings and errors; vulnerabilities, vulnerable dependencies, coding
errors, etc. Reports issues back to the API service to appear on
site extension home pages.

Questions

Given the efforts made toward building extension Postgres distribution services
and tools in the past, not to mention the ambitions of some the project.

What will PGXN's role be in this ideal extension ecosystem?
What will be the role of other extension registries?
What should be the format of standard metadata, and what will become of
existing metadata (control file, PGXN META.json, Trunk.toml, pgxman
extension.yaml, pgrx Cargo.toml)?
What should be the source of record for identity? The PostgreSQL community
account might be ideal
Do we want to auto-discover and -index extensions akin to how Go package
discovery? If not, how can we make participation as
simple as possible? Includes metadata management, identity and
authentication, publishing and permissions.
Auto-discovery would also raise the question of namespacing. How would we
tie discovered extensions like github.com/theory/pg-semver to Postgres
extension naming conventions? If we just allow the last part of the name,
how would we deal with naming conflicts?
For new components, what language or languages should we use? Defer to
whatever each contributor is comfortable with? Establish some community
standards to increase credibility, and attract developers, and/or
simplify deployment?
Who's going to do what and how can be best collaborate? Do we nail down the
core registry and go from there? Something else?

Once we have some answers or move in the direction of solid answers to these
(and surely other!) questions, we can start to build a project plan.


Footnotes


Would be interesting to provide some method for bundling
extensions, so that one could, say, build a bundle in a CI/CD pipeline and
not need the CLI to install it on a production host. But on the other hand,
many will do this via Docker builds or similar patterns anyway. ↩


This really is the key decision to make. Go-style auto-discovery
is super cool and ensures comprehensive indexing without the overhead of
developer registration or source storage. On the other hand, it requires Go
package-style $host/$user/$package namespacing to avoid naming Conflicts
between hosting services. On the other other hand, it directly allows for
alternative and private source distribution. But see also the explanation
for why crates.io doesn't use GitHub for its registry. ↩


Inspired by Go registry services, among others. ↩


We'd also need to consider authentication and identity management. Do
we trust relevant OAuth2 providers like GitHub and
BitBucket? Or perhaps we require use of the postgresql.org
account? ↩


This will be an interesting challenge, as we don't want to overdo SEO
(which is gross and usually wrong), but need some method to get results
higher in the search results. Maybe best to host it on PostgreSQL.org and
reach out to search engine companies? ↩


Do permissions need thinking through? ↩


Integrate with Postgres core config or make a separate thing? ↩


Years ago someone actually implemented this! The service
consumed from PGXN and tested extensions on a matrix of Postgres versions
and OSes and made a very nice report. I'd meant to work with them to
standardize it, host it on a PGXN subdomain, and link from each page on
PGXN, but didn't get around to it and the project stalled. Will try to dig
it up and see if we can build on it. ↩