I think we should have all our code in a monolithic repository.
Below, I detail the main benefits of having one, address the likely issues with operating one, and mention a few risks of not moving to a monorepo.
Benefits To Adopting a Monolithic Repo
Golang package dependencies
- One vendor/ dir at the top level of the repo
- All internal packages use the same external dependencies
- Internal package dependencies need not be vendored (for example, no need to vendor deis/pkg as we do in many repos now)
Search & refactoring
- Search all Deis code at once on your filesystem
- Use the go toolchain on the entire Deis codebase at once (e.g. vet, ...). Same goes for other tooling
- Refactor a low-level dependency (e.g. deis/pkg) and fix all code that depends on it in a single PR. This is related to #2 and #3 in the previous section. Effectively, you get 'atomic' changes across the whole suite.
- If an RPC interface changes, modify all dependencies at once in a single PR
There is one and only one place to see the health of all Deis software
Any contributor or prospective contributor:
- Can look at one repository to gain context for how their contribution fits into the entire Deis ecosystem
- Only needs to set up one development environment on a single repository
Issues When Operating a Monolithic Repo
We need to determine which component(s) each commit or PR modifies and each GH issue refers to
- Commit messages and PR titles should include the component(s) that are being modified. I suggest prefixing the title with brackets that contain a comma-separated list of component names. A prefix of [all] is also valid.
- Issues and pull requests should be labelled with the component(s) that are being modified, so it's easy for maintainers to filter toward their area of expertise
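As a rough illustration of how tooling could read the bracketed prefix suggested above, here is a minimal Go sketch. The function name ParseComponents and the example title are hypothetical, and the exact prefix grammar (whitespace handling, allowed characters) is an assumption rather than something this proposal settles.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseComponents extracts the component list from a title such as
// "[builder,router] some change". It returns the components and the
// remainder of the title. The name and grammar are illustrative only.
func ParseComponents(title string) ([]string, string, error) {
	if !strings.HasPrefix(title, "[") {
		return nil, "", fmt.Errorf("title %q has no [component] prefix", title)
	}
	end := strings.Index(title, "]")
	if end < 0 {
		return nil, "", fmt.Errorf("title %q has an unterminated prefix", title)
	}
	var components []string
	for _, c := range strings.Split(title[1:end], ",") {
		// Tolerate spaces after commas and skip empty entries.
		if c = strings.TrimSpace(c); c != "" {
			components = append(components, c)
		}
	}
	if len(components) == 0 {
		return nil, "", fmt.Errorf("title %q names no components", title)
	}
	return components, strings.TrimSpace(title[end+1:]), nil
}

func main() {
	// Hypothetical PR title; the component names are examples only.
	components, rest, err := ParseComponents("[builder, router] update shared logging")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(components, rest)
}
```

A check like this could run in CI to reject titles without a prefix, or to apply matching labels automatically.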
We need to isolate running components
- Code can be tightly coupled but each component may still be loosely coupled. We already tightly couple to deis/pkg by vendoring it
- Our tests should verify that each service satisfies its component-level contract, even when underlying dependencies change. We already do this too
We need to build independent artifacts
- Our goal is to be able to release each component separately, without having to rev everything at once.
- We may achieve this by using Git tags that include the package names of the components that are being updated. For example, v2.1.1-builder indicates that we are releasing version 2.1.1 of builder
- We already encode the semantic version into each component's codebase, and that will not change
- We may write a simple shell script which parses the tag name for (1) the component(s) that should be released and (2) the new version number of the component(s), and then builds and releases those new components.
- This mechanism effectively namespaces Git tags on the mono repo, and still allows us to release individual components independently
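The tag parsing described above could look roughly like the following. The proposal mentions a shell script; this sketch uses Go only to show the parsing logic. The names Release and ParseReleaseTag are hypothetical, and it assumes a tag carries exactly one component in the form vX.Y.Z-component, as in the v2.1.1-builder example. Tags that name several components would need a richer format.

```go
package main

import (
	"fmt"
	"regexp"
)

// Release describes what a namespaced tag asks us to release.
// This type is illustrative and not part of any existing Deis code.
type Release struct {
	Version   string // semantic version without the leading "v", e.g. "2.1.1"
	Component string // component name, e.g. "builder"
}

// tagPattern matches tags shaped like v<major>.<minor>.<patch>-<component>.
// The allowed component characters are an assumption.
var tagPattern = regexp.MustCompile(`^v(\d+\.\d+\.\d+)-([a-z][a-z0-9-]*)$`)

// ParseReleaseTag splits a tag such as "v2.1.1-builder" into its parts.
func ParseReleaseTag(tag string) (Release, error) {
	m := tagPattern.FindStringSubmatch(tag)
	if m == nil {
		return Release{}, fmt.Errorf("tag %q is not of the form vX.Y.Z-component", tag)
	}
	return Release{Version: m[1], Component: m[2]}, nil
}

func main() {
	rel, err := ParseReleaseTag("v2.1.1-builder")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints: release builder at version 2.1.1
	fmt.Printf("release %s at version %s\n", rel.Component, rel.Version)
}
```

A release job would then build and publish only the directory for the parsed component, leaving every other component at its current version.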
We need to lock down privileges for a single component
We can't do this in a monolithic repo. We should enforce a rule that any contributor whom we can't or won't trust to contribute to one or more components may not contribute directly to the monorepo.
Rebasing or merging master will be a nightmare!
All else being equal, a single monorepo will always move faster than multiple, smaller repos. However, rebase or merge conflicts carry the benefit of ensuring a pristine build across all Deis components.
The following example scenario illustrates the benefits of rebasing on a monolithic repo over multiple independent repos.
- Contributor A forks deis/minio (which has a dependency on deis/pkg) and starts work
- Contributor B merges code to master (via the standard PR process) in deis/pkg that causes a rebase conflict for contributor A
Resolving the conflict on multiple repositories
Because contributor A has a vendored copy of deis/pkg in deis/minio, they don't have to resolve any conflicts. However, the conflict is only deferred until the vendored package is updated. Later, a contributor C (who may be A, two weeks from now) will have cause to update the deis/pkg version, which will pull in a large number of changes at once, including the breaking one. Effectively, we've batched and deferred the changes and conflicts, and assigned a third party to apply and resolve them later.
Resolving the conflict on a single repository
The breaking change is exposed immediately to A when the rebase conflict occurs (this may also be later in time). This process has the following benefits:
- Contributor A wrote the code, found the conflict, and will fix it. Most of the time, those three events will all happen within days at most. If contributor B needs to get involved, they will be more likely to remember details of their change
- Even if the rebase produced no conflicts, it happened soon after changes were made to master, and contributor A has a shorter changelist to look at. This fact is especially useful if tests begin failing after the rebase.
CI Builds will take forever!
Any change to a monorepo will trigger the GitHub webhook and will start a Travis build. If build/test cycles in CI get prohibitively slow, we can write smarter logic to execute go test and other applicable commands on portions of the tree that have changed. Google's Blaze build system and Twitter's Pants build system both do this on repositories that are orders of magnitude larger in size.
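One simple form of that "smarter logic" is sketched below in Go, under stated assumptions. Given a list of changed file paths (for example, the output of git diff --name-only against master), it returns the package directories to pass to go test. The function name ChangedPackageDirs and the example paths are hypothetical. This version only tests packages whose own files changed. It does not follow the dependency graph to packages that import them, which a real implementation would also need to do.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// ChangedPackageDirs maps changed file paths (relative to the repo root)
// to a sorted, de-duplicated list of Go package directories.
// Non-Go files are ignored. Dependents of changed packages are NOT included.
func ChangedPackageDirs(files []string) []string {
	seen := map[string]bool{}
	for _, f := range files {
		if !strings.HasSuffix(f, ".go") {
			continue
		}
		dir := path.Dir(f)
		if dir == "." {
			seen["."] = true // file sits at the repo root
		} else {
			seen["./"+dir] = true
		}
	}
	dirs := make([]string, 0, len(seen))
	for d := range seen {
		dirs = append(dirs, d)
	}
	sort.Strings(dirs)
	return dirs
}

func main() {
	// Hypothetical changed files; the paths are examples only.
	changed := []string{
		"builder/server.go",
		"builder/server_test.go",
		"pkg/time/time.go",
		"README.md",
	}
	// Prints: go test ./builder ./pkg/time
	fmt.Println("go test " + strings.Join(ChangedPackageDirs(changed), " "))
}
```

When a shared package changes, the safe fallback is still to run the full suite, since any component may depend on it.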
Risks of keeping independent repos
As code velocity increases (inevitable for OSS):
- Each project will have a correspondingly harder time managing its dependency versions
- Maintaining and improving packages at lower levels of the dependency graph will become more expensive in programmer time
- External dependency versions are more likely to conflict
- Contributors face a higher barrier to entry to get their development environment set up (clone multiple repos, get builds running, merge upstream on multiple repos, ...)