@arschles
Last active November 16, 2023 11:32
Why We Should Use Monolithic Repositories

I think we should have all our code in a monolithic repository.

I've detailed the big benefits of having one, addressed possible issues with having one, and listed a few risks of not moving to a monorepo below.

Benefits To Adopting a Monolithic Repo

Golang package dependencies

  1. Single vendor/ dir at the top level of deis/deis
  2. All internal packages use the same external dependencies
  3. Internal package dependencies need not be vendored (for example, there would be no need to vendor deis/pkg, as we currently do in many repos)

Search & refactoring

  1. Search all Deis code at once on your filesystem
  2. Run the Go toolchain (e.g. oracle, vet, ...) on the entire Deis codebase at once. The same goes for other tooling
  3. Refactor a low-level dependency (e.g. deis/pkg) and fix all code that depends on it in a single PR. This is related to #2 and #3 in the previous section. Effectively, you get 'atomic' changes across the whole suite.
  4. If an RPC interface changes, modify all dependent code at once in a single PR

Builds

There is exactly one place to see the health of all Deis software.

Community health

Any contributor or prospective contributor:

  1. Can look at one repository to gain context for how their contribution fits into the entire Deis ecosystem
  2. Only needs to set up one development environment on a single repository

Issues When Operating a Monolithic Repo

We need to determine which component(s) each commit or PR modifies, and which component(s) each GitHub issue refers to

  1. Commit messages and PR titles should include the component(s) being modified. I suggest prefixing the title with brackets that contain a comma-separated list, for example [builder,runner]. [all] is also valid.
  2. Issues and pull requests should be labelled with the component(s) that are being modified, so it's easy for maintainers to filter toward their area of expertise
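To make the bracket convention concrete, here is a minimal Go sketch of how a bot or CI check might pull the component list out of a commit message or PR title. The helper name parseComponents and its error messages are hypothetical, not existing Deis tooling; it treats [all] as just another component name and leaves expanding it to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseComponents extracts the bracketed component list from a title such as
// "[builder,runner] fix port handling" and returns the component names plus
// the rest of the title. Hypothetical helper; "all" is returned as a literal name.
func parseComponents(title string) ([]string, string, error) {
	if !strings.HasPrefix(title, "[") {
		return nil, "", errors.New("title has no [component] prefix")
	}
	end := strings.Index(title, "]")
	if end < 0 {
		return nil, "", errors.New("unterminated [component] prefix")
	}
	var components []string
	for _, c := range strings.Split(title[1:end], ",") {
		c = strings.TrimSpace(c)
		if c == "" {
			return nil, "", errors.New("empty component name")
		}
		components = append(components, c)
	}
	return components, strings.TrimSpace(title[end+1:]), nil
}

func main() {
	comps, rest, err := parseComponents("[builder,runner] fix port handling")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(comps, rest)
}
```

The same function could back a required status check that rejects PRs whose titles lack a prefix, and its output could drive the component labels described in #2.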

We need to isolate running components

  1. Code can be tightly coupled at the source level while the running components remain loosely coupled. We already tightly couple to deis/pkg by vendoring it.
  2. Our tests should verify that each service satisfies its component-level contract, even when underlying dependencies change. We already do this too.

We need to build independent artifacts

  1. Our goal is to be able to release each component separately, without having to rev everything at once.
  2. We may achieve #1 by using Git tags that include the names of the components being released. For example, v2.1.1-builder indicates that we are releasing version 2.1.1 of builder.
  3. We already encode the semantic version into each component's codebase, and that will not change
  4. We may write a simple shell script which parses the tag name for (1) the component(s) that should be released and (2) the new version number of the component(s), and then builds and releases those new components.
  5. This mechanism effectively namespaces Git tags on the monorepo, and still allows us to release individual components independently
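The tag-parsing step in #4 could look roughly like the following Go sketch (the proposal suggests a shell script; Go is used here only for illustration). It assumes a tag format of vX.Y.Z-<component>, with multiple components separated by commas, which the proposal does not pin down; parseReleaseTag and Release are hypothetical names, and the actual build-and-publish step is left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// releaseTag matches assumed tags of the form v<major>.<minor>.<patch>-<components>,
// where <components> is one or more comma-separated names,
// e.g. "v2.1.1-builder" or "v2.1.1-builder,runner".
var releaseTag = regexp.MustCompile(`^v(\d+\.\d+\.\d+)-([a-z0-9][a-z0-9-]*(?:,[a-z0-9][a-z0-9-]*)*)$`)

// Release describes what a single tag asks the release job to build.
type Release struct {
	Version    string
	Components []string
}

// parseReleaseTag splits a namespaced tag into its version and component names.
func parseReleaseTag(tag string) (Release, error) {
	m := releaseTag.FindStringSubmatch(tag)
	if m == nil {
		return Release{}, fmt.Errorf("tag %q is not of the form vX.Y.Z-component[,component...]", tag)
	}
	return Release{Version: m[1], Components: strings.Split(m[2], ",")}, nil
}

func main() {
	r, err := parseReleaseTag("v2.1.1-builder")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range r.Components {
		// A real release job would build and publish component c here (not shown).
		fmt.Printf("release %s at version %s\n", c, r.Version)
	}
}
```

Because the version part is restricted to three dot-separated numbers, everything after the first hyphen is treated as the component list, so component names may themselves contain hyphens.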

We need to lock down privileges for a single component

We can't do this in a monolithic repo. We should enforce a rule that any contributor whom we can't or won't trust to contribute to one or more components may not contribute directly to the monorepo.

Rebasing or merging master will be a nightmare!

All else being equal, a single monorepo will always move faster than multiple smaller repos, so contributors will hit rebase or merge conflicts more often. However, those conflicts carry the benefit of ensuring a pristine build across all Deis components.

The example scenario below illustrates the benefits of rebasing on a monolithic repo over multiple independent repos.

Timeline:

  1. Contributor A branches on deis/minio (which has a dependency on deis/pkg) and starts work
  2. Contributor B merges code to master (via the standard PR process) in deis/pkg that causes a rebase conflict for contributor A

Resolving the conflict on multiple repositories

Assuming A has a vendored deis/pkg in deis/minio, they don't have to resolve any conflicts. However, the conflict is deferred until the vendored package is updated. Later, a contributor C (who may be A, two weeks from now) will have cause to update the deis/pkg version, which will pull in a large number of changes at once, including the breaking one. Effectively we've batched and deferred the changes and conflicts, and assigned a third party to apply and resolve them later.

Resolving the conflict on a single repository

The breaking change is exposed to A as soon as they rebase onto master and hit the conflict (which may happen some time after B's merge). This process has the following benefits:

  1. Contributor A wrote the code, found the conflict, and will fix it. Most of the time, those three events will all happen within days at most. If contributor B needs to get involved, they will be more likely to remember details of their change
  2. Even if the rebase produced no conflicts, it happened soon after changes were made to master, and contributor A has a shorter changelist to look at. This fact is especially useful if tests begin failing after the rebase.

CI Builds will take forever!

Any change to a monorepo will trigger the GitHub webhook and start a Travis build. If build/test cycles in CI get prohibitively slow, we can write smarter logic to execute go build, go test and other applicable commands only on the portions of the tree that have changed. Google's Blaze build system and Twitter's Pants build system both do this on repositories that are orders of magnitude larger.
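A first cut at that smarter logic might map the files changed in a PR (for example, the output of `git diff --name-only`) to the top-level component directories they live in, and run go build and go test only there. The Go sketch below assumes each component sits in its own top-level directory and treats vendor/ and pkg/ (assumed shared directories) plus any root-level file as touching everything; packagesToTest is a hypothetical name, and strings.Cut needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sharedDirs are assumed top-level directories that every component depends on;
// a change to any of them means the whole tree must be rebuilt and retested.
var sharedDirs = map[string]bool{"vendor": true, "pkg": true}

// packagesToTest maps changed file paths to go build/go test package patterns,
// assuming each component lives in its own top-level directory of the monorepo.
func packagesToTest(changed []string) []string {
	dirs := map[string]bool{}
	for _, path := range changed {
		top, _, nested := strings.Cut(path, "/")
		if !nested || sharedDirs[top] {
			// Root-level files (Makefile, .travis.yml, ...) or shared code: test everything.
			return []string{"./..."}
		}
		dirs[top] = true
	}
	patterns := make([]string, 0, len(dirs))
	for d := range dirs {
		patterns = append(patterns, "./"+d+"/...")
	}
	sort.Strings(patterns) // deterministic order for CI logs
	return patterns
}

func main() {
	changed := []string{"builder/server.go", "builder/server_test.go", "router/main.go"}
	// The CI script would pass these patterns to go build and go test.
	fmt.Println(packagesToTest(changed))
}
```

This sketch ignores imports between components: if builder imported router, a change under router/ should also retest builder. Handling that correctly means walking the dependency graph, which is the job Blaze and Pants take on.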

Risks of keeping independent repos

As code velocity increases (inevitable for OSS):

  • Each project will have a correspondingly harder time managing its dependency versions
  • Maintaining and improving packages at lower levels of the dependency graph will become more expensive in programmer time
  • External dependency versions are more likely to conflict
  • Contributors face a higher barrier to entry to get their development environment set up (clone multiple repos, get builds running, merge upstream on multiple repos, ...)