arschles/monorepo.md

## monorepo.md

      
I think we should keep all of our code in a single monolithic repository (a monorepo).
Below, I detail the main benefits of having one, address possible issues with operating one, and mention a few risks of not moving to one.
Benefits To Adopting a Monolithic Repo

Golang package dependencies

Single vendor/ dir at the top level of deis/deis
All internal packages use the same external dependencies
Internal package dependencies need not be vendored (for example, no need to vendor deis/pkg as we do now in many repos)

Search & refactoring

Search all Deis code at once on your filesystem
Use the Go toolchain on the entire Deis codebase at once (e.g. oracle, vet, ...). The same goes for other tooling
Refactor a low-level dependency (e.g. deis/pkg) and fix all code that depends on it in a single PR. This follows from the second and third points in the previous section. Effectively, you get 'atomic' changes across the whole suite.
If an RPC interface changes, update everything that depends on it in a single PR

Builds
There is one and only one place to see the health of all Deis software
Community health
Any contributor or prospective contributor:

Can look at one repository to gain context for how their contribution fits into the entire Deis ecosystem
Only needs to set up one development environment on a single repository

Issues When Operating a Monolithic Repo

We need to determine which component(s) each commit, PR, or GitHub issue refers to

Commit messages and PR titles should include the component(s) that are being modified. I suggest prefixing the title with brackets that contain a comma-separated list of components. For example: [builder,runner]. [all] is also valid.
Issues and pull requests should be labelled with the component(s) that are being modified, so it's easy for maintainers to filter toward their area of expertise
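
As a rough illustration of the title convention above, here is a minimal Go sketch that pulls the component list out of a commit or PR title. The function name `ParseComponents` and the error behavior are assumptions made for this example; nothing like it exists in Deis today.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseComponents is a hypothetical helper (not part of any Deis repo).
// It splits a title such as "[builder,runner] fix slug upload" into its
// component list and the remaining title text.
func ParseComponents(title string) ([]string, string, error) {
	title = strings.TrimSpace(title)
	if !strings.HasPrefix(title, "[") {
		return nil, "", fmt.Errorf("title %q does not start with a [component] prefix", title)
	}
	end := strings.Index(title, "]")
	if end < 0 {
		return nil, "", fmt.Errorf("title %q has an unterminated [component] prefix", title)
	}
	var components []string
	for _, c := range strings.Split(title[1:end], ",") {
		c = strings.TrimSpace(c)
		if c == "" {
			return nil, "", fmt.Errorf("title %q has an empty component name", title)
		}
		components = append(components, c)
	}
	return components, strings.TrimSpace(title[end+1:]), nil
}
```

Calling `ParseComponents("[builder,runner] fix slug upload")` yields the components `builder` and `runner` plus the remaining title text. A CI check or labelling bot could use the same logic to reject untagged titles or to apply labels automatically.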

We need to isolate running components

Code can be tightly coupled at build time while the running components remain loosely coupled. We already tightly couple to deis/pkg by vendoring it
Our tests should verify that each service satisfies its component-level contract, even when underlying dependencies change. We already do this, too

We need to build independent artifacts

Our goal is to be able to release each component separately, without having to rev everything at once.
We can achieve this goal by using Git tags that include the package names of the components that are being updated. For example, v2.1.1-builder indicates that we are releasing version 2.1.1 of builder
We already encode the semantic version into each component's codebase, and that will not change
We may write a simple shell script which parses the tag name for (1) the component(s) that should be released and (2) the new version number of the component(s), and then builds and releases those new components.
This mechanism effectively namespaces Git tags on the mono repo, and still allows us to release individual components independently
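
The tag-parsing step described above could look roughly like the following. This is a sketch in Go rather than the shell script mentioned above, and it assumes the single-component `v<major>.<minor>.<patch>-<component>` form from the example. How several components would be encoded in one tag is not specified here, so the sketch does not attempt it. The `Release` type and `ParseReleaseTag` name are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Release is a hypothetical type describing what a tag asks us to release.
type Release struct {
	Version   string // e.g. "2.1.1"
	Component string // e.g. "builder"
}

// ParseReleaseTag parses tags of the assumed form v<major>.<minor>.<patch>-<component>,
// such as "v2.1.1-builder". Everything after the first "-" is treated as the
// component name.
func ParseReleaseTag(tag string) (Release, error) {
	if !strings.HasPrefix(tag, "v") {
		return Release{}, fmt.Errorf("tag %q does not start with \"v\"", tag)
	}
	version, component, found := strings.Cut(tag[1:], "-")
	if !found || version == "" || component == "" {
		return Release{}, fmt.Errorf("tag %q is not of the form v<version>-<component>", tag)
	}
	parts := strings.Split(version, ".")
	if len(parts) != 3 {
		return Release{}, fmt.Errorf("version %q does not have three dot-separated parts", version)
	}
	for _, part := range parts {
		// Each part must parse as an integer.
		if _, err := strconv.Atoi(part); err != nil {
			return Release{}, fmt.Errorf("version part %q in tag %q is not a number", part, tag)
		}
	}
	return Release{Version: version, Component: component}, nil
}
```

A release job triggered by a new tag could call `ParseReleaseTag` and then build and publish only the named component. Note that this simple split would misread semantic-version pre-release tags (for example `v2.1.1-beta-builder`), so a real implementation would need a stricter convention.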

We need to lock down privileges for a single component
We can't do this in a monolithic repo. We should enforce a rule that any contributor whom we can't or won't trust to contribute to one or more components may not contribute directly to the monorepo.
Rebasing or merging master will be a nightmare!
All else being equal, a single monorepo will always move faster than multiple, smaller repos, so rebases and merges against master will be more frequent. However, rebase or merge conflicts carry the benefit of ensuring a pristine build across all Deis components.
The example scenario below illustrates the benefits of rebasing on a monolithic repo over multiple independent repos.
Timeline:

Contributor A branches on deis/minio (which has a dependency on deis/pkg) and starts work
Contributor B merges code to master (via the standard PR process) in deis/pkg that causes a rebase conflict for contributor A

Resolving the conflict on multiple repositories
Assuming A has a vendored deis/pkg in deis/minio, they don't have to resolve any conflicts. However, the conflict is deferred until the vendored package is updated. Later, a contributor C (who may be A, two weeks from now) will have cause to update the deis/pkg version, which will pull in a large number of changes at once, including the breaking one. Effectively we've batched and deferred the changes and conflicts, and assigned a third party to apply and resolve them later.
Resolving the conflict on a single repository
The breaking change is exposed to A as soon as they rebase and hit the conflict (the rebase itself may happen some time after B's merge). This process has the following benefits:

Contributor A wrote the code, found the conflict, and will fix it. Most of the time, those three events will all happen within days at most. If contributor B needs to get involved, they will be more likely to remember details of their change
Even if the rebase produced no conflicts, it happened soon after changes were made to master, and contributor A has a shorter changelist to look at. This fact is especially useful if tests begin failing after the rebase.

CI builds will take forever!
Any change to the monorepo will trigger the GitHub webhook and start a Travis build. If build/test cycles in CI get prohibitively slow, we can write smarter logic to execute go build, go test and other applicable commands only on the portions of the tree that have changed. Google's Blaze build system and Twitter's Pants build system both do this on repositories that are orders of magnitude larger in size.
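
To make the "only build what changed" idea concrete, here is a minimal Go sketch. It assumes a layout that this proposal does not actually specify: each component lives in its own top-level directory, and shared code lives in the top-level vendor/ directory or in root-level files. The function name `ChangedComponents` is hypothetical. The input is a list of changed paths relative to the repository root, such as the output of `git diff --name-only`.

```go
package main

import (
	"sort"
	"strings"
)

// ChangedComponents is a hypothetical helper. Given changed file paths relative
// to the repo root, it returns the sorted set of top-level directories
// (assumed to be components) that were touched. If a root-level file or
// anything under vendor/ changed, it reports rebuildAll = true instead,
// because any component might be affected.
func ChangedComponents(paths []string) (components []string, rebuildAll bool) {
	seen := map[string]bool{}
	for _, p := range paths {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		dir, _, nested := strings.Cut(p, "/")
		if !nested || dir == "vendor" {
			return nil, true
		}
		if !seen[dir] {
			seen[dir] = true
			components = append(components, dir)
		}
	}
	sort.Strings(components)
	return components, false
}
```

A CI script could then run `go build` and `go test` only for the returned directories, falling back to the whole tree when `rebuildAll` is true. This ignores dependencies between components, which tools like Blaze and Pants track explicitly, so treat it as a starting point only.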
Risks of keeping independent repos

As code velocity increases (inevitable for OSS):

Each project will have a correspondingly harder time managing its dependency versions
Maintaining and improving packages at lower levels of the dependency graph will become more expensive in programmer time
External dependency versions are more likely to conflict
Contributors face a higher barrier to entry to get their development environment set up (clone multiple repos, get builds running, merge upstream on multiple repos, ...)