Visions for a distributed package repository in WordPress

Many if not most WordPress users are now aware of the challenge that a single point of failure in the package ecosystem poses. Even though WordPress users are (currently) able to upload plugins directly through the user interface, distributing a plugin outside the repository that WordPress.org offers is incredibly challenging.

AspirePress exists entirely to solve this problem.

Our focus is on building a sustainable, distributed, federated model of managing and distributing packages for WordPress.

The advantage of distributed over centralized

The internet is known for having distributed core systems. In fact, the internet itself is a distributed system: any person can connect to the internet from anywhere and interact with essentially any other machine online today, if they know where to find it. How do they find those machines? That’s what the Domain Name System (DNS) is for: another distributed system, which provides an easy translation layer from example.com to 93.184.215.14.

Distributed systems aren't without drawbacks, like eventual rather than instant consistency (think DNS). But they do offer one thing that we currently do not have in the WordPress ecosystem: they cannot be controlled by any single party.

While the internet surely has authorities for determining good actors from bad, it essentially allows all peers equal access to all other peers. The institution of authorities on the internet to determine good actors from bad was an add-on, not a design feature: spam filtering and mitigation of DDoS attacks came as later components, not originally built-in ones.

Distributed systems are designed to prevent any one person or system taking them over. If I drop the DNS records that I manage from the internet, the only thing that happens is that my records (and anyone else’s that I control) are lost from circulation. Anyone pointing those domains at new nameservers and replacing the missing records would effectively recreate the access that was lost.

Distributed repositories can work similarly. In a truly distributed, federated model there is no “single point of failure” or “authority”. Instead, each peer is responsible for determining the validity, acceptability, and authority of each peering node, and assigning levels of trust to the information they serve. Some peers may choose not to federate with other peers. Some peers may accept packages from peers while rejecting others.

How the ACF/SCF fiasco could have been avoided

Until now, the only source of truth for automatic updates to WordPress has been the WordPress.org API, which is hard-coded into every installation of WordPress out there. It’s not even configurable in the wp-config.php file. And with plugins leaving the WordPress repository, or being blocked or closed by WordPress.org, we now have many sources of truth and a fractured community ecosystem.

In that model, it was simple for the ACF/SCF fiasco to unfold: replacing Advanced Custom Fields (ACF) with Secure Custom Fields (SCF) on the same slug meant that anyone asking for updates got SCF without any kind of checksum or signature being checked. It was implicitly trusted and accepted by WordPress.

Contrast this with a distributed system where one system goes rogue and replaces a plugin with another plugin, perhaps containing malware (SCF did not contain malware). Upon discovery of the malfeasance, the distributed network could choose to defederate with the offending repository, as well as refuse to serve that content. For a short time this would create a system where some users are at risk and others are protected (a drawback of eventual consistency). But it would eventually resolve in favor of the authentic, genuine software being distributed to everybody.
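
To make the defederation step concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Mirror` class, its fields, and the peer names are illustrations only, not part of any AspirePress specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Mirror:
    """A hypothetical mirror that tracks which peers it federates with."""

    name: str
    federated_peers: set[str] = field(default_factory=set)
    packages: dict[str, str] = field(default_factory=dict)  # slug -> origin peer

    def accept(self, slug: str, origin: str) -> bool:
        """Store a package only if it comes from a peer we still federate with."""
        if origin not in self.federated_peers:
            return False
        self.packages[slug] = origin
        return True

    def defederate(self, peer: str) -> None:
        """Stop federating with a peer and stop serving anything obtained from it."""
        self.federated_peers.discard(peer)
        self.packages = {
            slug: origin for slug, origin in self.packages.items() if origin != peer
        }


# Usage: one peer goes rogue, so this mirror drops it and its content.
mirror = Mirror("mirror-a", federated_peers={"peer-good", "peer-rogue"})
mirror.accept("example-plugin", "peer-rogue")
mirror.defederate("peer-rogue")
print(sorted(mirror.packages))
```

Until every mirror has made the same decision, some users would still be served the rogue content, which is the eventual-consistency drawback described above.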

In short, the kind of supply chain attack executed recently by WordPress.org would be difficult to execute. And there are ways of making it more difficult still.

Code signing and authenticity checking

One way of ensuring that users get what they’re asking for is to check it for authenticity against a known good source.

For example, a plugin author would create a package of their asset, hash the file, and sign that hash with their private key. They would then push the package and the signature to the package repository of their choice. The repository, which knows the author's public key, can use it to verify the authenticity of the signature.

Upon receipt, the repository would hash what it received and compare the result with the signed hash. If they match, it would validate the signature to ensure the hash itself is authentic. It would then add a signature made with its own private key, attesting that it verified the information, and pass the entire chain to all the other mirrors for distribution.
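
The flow above can be sketched in a few lines of Python. This is a toy under loud assumptions: it uses textbook RSA over publicly known Mersenne primes purely so that it runs on the standard library, which makes the keys completely insecure. A real implementation would use a vetted cryptography library and a modern scheme such as Ed25519. All function names and the layout of the chain are hypothetical.

```python
import hashlib

# TOY KEYS, NOT SECURE: textbook RSA over publicly known Mersenne primes,
# chosen only so this sketch runs on the standard library.
AUTHOR_PRIMES = (2**521 - 1, 2**607 - 1)
REPO_PRIMES = (2**607 - 1, 2**1279 - 1)


def make_keypair(primes, e=65537):
    """Return (public_key, private_key), each an (exponent, modulus) pair."""
    p, q = primes
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (e, n), (d, n)


def digest_int(data: bytes) -> int:
    """SHA-256 of the data as an integer, so it can be fed to pow()."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign(data: bytes, private_key) -> int:
    """Sign the hash of the data with the private key."""
    d, n = private_key
    return pow(digest_int(data), d, n)


def verify(data: bytes, signature: int, public_key) -> bool:
    """Check that the signature matches the hash of the data."""
    e, n = public_key
    return pow(signature, e, n) == digest_int(data)


def attested_bytes(package_hash: str, author_signature: int) -> bytes:
    """The bytes the repository signs: the hash plus the author's signature."""
    return f"{package_hash}:{author_signature:x}".encode("ascii")


def repository_intake(package, claimed_hash, author_signature,
                      author_public, repo_private):
    """Hash check, author signature check, then the repository's attestation."""
    if hashlib.sha256(package).hexdigest() != claimed_hash:
        raise ValueError("package does not match its published hash")
    if not verify(package, author_signature, author_public):
        raise ValueError("author signature is not valid")
    attestation = sign(attested_bytes(claimed_hash, author_signature), repo_private)
    return {
        "hash": claimed_hash,
        "author_signature": author_signature,
        "repo_attestation": attestation,
    }


# Usage: the author signs, the repository checks and countersigns.
author_public, author_private = make_keypair(AUTHOR_PRIMES)
repo_public, repo_private = make_keypair(REPO_PRIMES)
package = b"contents of example-plugin.zip"
chain = repository_intake(
    package,
    hashlib.sha256(package).hexdigest(),
    sign(package, author_private),
    author_public,
    repo_private,
)
# A mirror that trusts the repository only needs to check its attestation.
print(verify(attested_bytes(chain["hash"], chain["author_signature"]),
             chain["repo_attestation"], repo_public))
```

A more cautious mirror could also call `verify` on the package with the author's public key, rerunning the original check instead of relying on the repository's attestation.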

With this system we have some advantages. First, we know that the plugin author authored the commit. Since a private key is kept secret by the plugin author, even if someone hard forked the repository they can't authenticate the releases they produce (ACF couldn't have been forked under this model).

Next, because the repository attests to the authenticity of the plugin, and the repository is trusted by other peers, that information can be trusted as authentic without rerunning the checks. If a repository is particularly paranoid, it can rerun the checks and even issue its own certificate of authenticity.

This isn't new technology either: JSON Web Tokens (JWTs) work similarly. The first two portions of a JWT (the header and the payload) are signed with either a symmetric or an asymmetric key. We're proposing asymmetric keys to ensure no one party holds "all the keys."
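
For readers who have not looked inside a JWT, here is a minimal sketch of how the first two portions are signed. It uses the symmetric HS256 variant only because that runs on Python's standard library. The proposal here is for asymmetric keys, where the signing step would use a private key instead of a shared secret. The helper names and the payload fields are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def b64url(raw: bytes) -> str:
    """Base64url without padding, the encoding JWTs use for each portion."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_jwt(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature, signing the first two portions."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_jwt(token: str, secret: bytes) -> bool:
    """Recompute the signature over header.payload and compare in constant time."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)


# Usage: any change to the header or payload invalidates the signature.
token = make_jwt({"slug": "example-plugin", "version": "1.2.3"}, b"shared-secret")
print(verify_jwt(token, b"shared-secret"))
```

With a symmetric key, anyone who can verify a token can also forge one, which is the reason to prefer asymmetric keys for package signing.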

The best part is that this process can be entirely automated. For example, a private key stored as a secret in GitHub can be used to sign the package, and then GitHub publishes the public keys of users for authentication purposes. The repository can simply check a list of trusted public keys to verify that the right key was used for signing. If the private key is compromised, that key can be dropped from the public keys, and the mirror will no longer consider it a valid authentication source.
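
The trusted-key list and the revocation step might look something like the sketch below. The `TrustedKeyList` class and `accept_release` function are hypothetical, and the `verify` argument stands in for whatever real signature check a mirror would use. The usage example plugs in a deliberately fake check so that the sketch stays self-contained.

```python
from __future__ import annotations

from typing import Callable, Optional


class TrustedKeyList:
    """Hypothetical registry kept by a mirror: key ID -> public key."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def trust(self, key_id: str, public_key: bytes) -> None:
        self._keys[key_id] = public_key

    def revoke(self, key_id: str) -> None:
        """Drop a compromised key; later releases signed with it are rejected."""
        self._keys.pop(key_id, None)

    def get(self, key_id: str) -> Optional[bytes]:
        return self._keys.get(key_id)


def accept_release(
    keys: TrustedKeyList,
    key_id: str,
    package: bytes,
    signature: bytes,
    verify: Callable[[bytes, bytes, bytes], bool],
) -> bool:
    """Accept a release only if it was signed with a key that is still trusted."""
    public_key = keys.get(key_id)
    if public_key is None:
        return False  # unknown or revoked key
    return verify(package, signature, public_key)


# Usage, with a FAKE stand-in for a real signature check.
def fake_verify(package: bytes, signature: bytes, public_key: bytes) -> bool:
    return signature == public_key + package


keys = TrustedKeyList()
keys.trust("author-1", b"pub-1:")
print(accept_release(keys, "author-1", b"zip", b"pub-1:zip", fake_verify))
keys.revoke("author-1")
print(accept_release(keys, "author-1", b"zip", b"pub-1:zip", fake_verify))
```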

This process also significantly improves the current model, which offers no verification that a package was provided by the author other than the SVN account of the user being authenticated. This very system allowed the ACF/SCF crisis to occur.

Trust amongst peers

In order for a distributed model to work, there has to be trust between peers. Therefore, we are proposing a model that offers three levels of trust. Peers also always have the option of Zero Trust, meaning they do not trust a peer at all and ignore anything the peer provides.

  1. Basic Trust. This level of trust requires verification of a peer through another, more trusted peer. For example, if a Basic Trust peer were to publish a new plugin, the peer implementing Basic Trust would either have to check the authenticity of the package for itself, or trust another peer to have completed the same checks, before trusting that the package is authentic. This is useful for new peers that the community does not know well.
  2. Implicit Trust. With Implicit Trust, a peer trusts that another peer does the verification required and is generally reliable as a source of truth. However, it still prefers the next level of trust over this peer. So, if another source provides information that conflicts with a peer at the Implicit Trust level, the higher-level source controls and overwrites the information from the lower-level peer.
  3. Source of Truth Trust. When a peer is a Source of Truth, it is considered an authority for all things related to packages and assets. For example, that might be AspireCloud itself, or another trusted partner that disseminates large numbers of plugin and theme updates from trusted sources. There should be few peers set to the Source of Truth Trust level. The goal of a distributed system is to have more than one, so that the system is not vulnerable to a single source of truth, but at this level the trust implied is absolute.

A peer must have at least one other peer at the Implicit Trust level in order to receive updates. All peers generally start at the Basic Trust level, and must be manually elevated to Source of Truth Trust. However, when creating a mirror, it is assumed that the mirror would have the option to assert one or two Source of Truth Trusts to pull its data from.
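
As one possible reading of these rules, the sketch below models the trust levels and resolves conflicting claims about a package. It rests on several assumptions not stated above: a claim is represented as a package hash, a Basic Trust claim counts only when a more trusted peer reports the same hash, "at least one Implicit Trust peer" is read as Implicit Trust or higher, and ties between peers at the same level are broken arbitrarily. All names are hypothetical.

```python
from __future__ import annotations

from enum import IntEnum
from typing import Optional


class Trust(IntEnum):
    ZERO = 0
    BASIC = 1
    IMPLICIT = 2
    SOURCE_OF_TRUTH = 3


def resolve(claims: dict[str, str], trust: dict[str, Trust]) -> Optional[str]:
    """Pick which package hash to believe, given one claim per peer."""
    usable = []
    for peer, claimed in claims.items():
        level = trust.get(peer, Trust.BASIC)  # unknown peers start at Basic Trust
        if level == Trust.ZERO:
            continue  # Zero Trust: ignore anything this peer provides
        if level == Trust.BASIC:
            # Basic Trust: needs a more trusted peer reporting the same hash.
            corroborated = any(
                other != peer
                and claims[other] == claimed
                and trust.get(other, Trust.BASIC) > Trust.BASIC
                for other in claims
            )
            if not corroborated:
                continue
        usable.append((level, claimed))
    if not usable:
        return None
    # The highest trust level wins any conflict.
    return max(usable, key=lambda item: item[0])[1]


def can_receive_updates(trust: dict[str, Trust]) -> bool:
    """Assumed reading: at least one peer at Implicit Trust or higher."""
    return any(level >= Trust.IMPLICIT for level in trust.values())


# Usage: a Source of Truth overrides a conflicting Implicit Trust peer.
levels = {
    "aspirecloud": Trust.SOURCE_OF_TRUTH,
    "mirror-b": Trust.IMPLICIT,
    "newcomer": Trust.BASIC,
}
print(resolve({"aspirecloud": "hash-1", "mirror-b": "hash-2"}, levels))
```

This sketch leaves out the other route a Basic Trust claim has to acceptance, which is the receiving peer checking the package's authenticity for itself.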

Conclusion

This is a vision, not a technical architecture for the future. It outlines the goals of a distributed, federated system that implicitly and explicitly trusts other peers, and offers a vision of a world where the ACF/SCF fiasco would be impossible. AspirePress is looking for individuals interested in working on a specification for this vision, and on the development of a standard for mirrors to pass information to one another, not just in the WordPress space but in any space where code repositories are distributed over HTTP(S).

@toderash

change "Distributed systems aren’t without their drawbacks: latency, eventual consistency, etc. But"
to "Distributed systems aren't without drawbacks like eventual rather than instant consistency (think of DNS caching), but"
because latency can improve with distributed systems, and "eventual" is a scary unquantified word

change "Institution of authorities on the internet" and "extra components, not built-in ones."
to "The institution of trusted authorities on the internet" and "later components, not originally built-in ones."
because smooth wording and some components now built in even though not original

change "Distributed systems are designed to be resilient from any one person taking them over. if tomorrow, I drop"
to "Distributed systems are designed to prevent any one person or system taking them over. If I drop"
because style, clarity, capitalization

change "determining the validity, acceptability and authority of each other peer, and deciding what to do with their information. Some peers may choose not to federate with other peers. Some peers may accept packages from peers and not others"
to "determining the validity, acceptability, and authority of each peering node, and assigning levels of trust to the information they serve. Some peers may choose not to federate with other peers. Some peers may accept packages from peers while rejecting others"
because style and clarity (add comma, rewording)

change "supply chain attack executed by WordPress.org"
to "supply chain attack executed recently by WordPress.org"
because specificity

change "In the model that I have toyed with, a plugin author would create"
to "For example, a plugin author would create"
because remove implied uncertainty

change "I am proposing"
to "We are proposing"
because speaking for group adds some already-established consensus for increased impact - same for any other "I" statements

change "HTTP" to "HTTP(S)"

Rule of thumb for acronyms: define on first use, like "Advanced Custom Fields (ACF)" then can use ACF thereafter. Also JWT, etc.

wrt peer trust, source of truth trust, can add that DNS works this way. Any nameserver can mirror domain records from another, but for certain domains it is tagged as the source of truth, so anyone wanting to find aspirepress.org has to ask ns.whatever what server it lives on if they don't already know the answer. Each domain also carries a "TTL" for when they must update their cache, basically saying something like "If you haven't checked with me in the last 3 hours, you should ask again in case something has changed."

Not sure how technical / non-technical this needs to be, but removing some TLAs & jargon and explaining some concepts might help the mid-tech audience. (Low-tech probably need not apply.)

@toderash

change "Right now, the only source of truth" to "Until now, the only source of truth" - add something about some plugins now leaving the .org directory, either because they're blocked or because they've chosen to withdraw. This means we already have multiple sources of truth for different plugins.

also, section on signing & authenticity, can we add something about this being an improvement over the current system? i.e., improvement over the now-splitting to multiple unsigned sources of truth as well as an improvement over .org's unsigned source of truth, which it didn't do because it presumed a monopoly.

@asirota

asirota commented Oct 17, 2024

One other thing about distributed systems -- they gain value as they scale with more distribution. Meaning that the more content in each mirror and the more mirrors the more overall the distribution network is valued. Think movies/movie theaters, books/libraries etc etc.

I think it's important, when talking about distributed systems, to articulate that building such a system is valuable, and that the value is distributed as well instead of concentrated at one node (as it is today for the most part with WP.org).
