@matthiasr
Created August 23, 2023 20:47
On monorepos and many repositories

With the caveat that this is based on a sample size of 2, I would lean towards one repository per deliverable thing – one app, one service, or service with worker components.

Tooling support for monorepos is not great. You can make do, but you end up either running Bazel or wishing you were running Bazel. The usual hosted CI feels geared towards repos per use case, GitHub gets creaky with large amounts of code and change, and IDEs start to struggle.

In a monorepo, it's easy to share code, but hard to keep a handle on ownership of that shared code. With separate repositories it's much more legible to management when a core piece of code is unowned and unmaintained, so it's easier to get clear funding for maintaining it.

Versioned releases of shared frameworks, and generally smaller repositories, make it a lot easier to see what's in a deploy. Consequently they make deployment and rollback a lot safer. You pay a price for that though. Either you live with a huge amount of drift in those core frameworks, making it nearly impossible to roll out something like a change to how you do service discovery, or you invest in tooling, and people power, to push those things out. You then need to make sure all these repositories are actually buildable, the tests pass, and deployment works.

$job[-1] went radical and forced every main pipeline to run, and to deploy to production, at least once a week. Teams would get alerts if that broke, and were expected to prioritize fixing it over sprint work. This also addressed a problem where a repository would not get any material changes for a while, and then every product feature that required touching it had to plan for weeks of work to get it going again. That made us avoid touching it, which dug the hole deeper. Forcing regular test and deploy runs made the owning teams deal with breakage as it came up. You still have the same number of random issues, but dealing with them one at a time is much easier.
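
To make the weekly rule concrete, here is a minimal sketch of the check behind such an alert. It assumes you can get, per repository, the timestamp of the last successful production deploy. The function name, the repository names, and the data shape are all made up for illustration; this is not how $job[-1] actually implemented it.

```python
from datetime import datetime, timedelta, timezone


def stale_repositories(last_success, now, max_age=timedelta(days=7)):
    """Return names of repositories whose last successful production
    deploy is older than max_age, longest-overdue first."""
    stale = [(ts, name) for name, ts in last_success.items() if now - ts > max_age]
    return [name for ts, name in sorted(stale)]


# Hypothetical data: repository name -> last successful production deploy.
now = datetime(2023, 8, 23, tzinfo=timezone.utc)
last_success = {
    "billing-service": now - timedelta(days=2),
    "search-service": now - timedelta(days=9),
    "legacy-importer": now - timedelta(days=40),
}

print(stale_repositories(last_success, now))
```

Everything on that list is a repository whose owners should hear about it. The point of the weekly cadence is that the list stays short, and each entry is a small problem.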

On the other hand, in a monorepo, making cross-cutting code changes is easy, but keeping the whole thing deployable is even harder. There's such a high rate of change that you probably can't deploy every service on every commit, and it's hard to get everyone to depend on everyone else's tests, so you end up building micro repositories within the monorepo ("ensure you only trigger the tests/build in the affected project"). Except now you have a lot more grey areas in between these projects, with unclear mandate, knowledge, and responsibility.
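
As a rough sketch of what "only trigger the affected project" tends to look like, the following maps changed file paths to project root directories. The directory layout and function name are assumptions for illustration. It also ignores dependencies between projects, which is the part that real build tools spend most of their effort on.

```python
from pathlib import PurePosixPath


def affected_projects(changed_paths, project_roots):
    """Map changed files to the project roots that contain them.

    Returns (sorted affected roots, paths that belong to no project).
    """
    roots = {root: PurePosixPath(root).parts for root in project_roots}
    affected, unowned = set(), []
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        matches = [r for r, rp in roots.items() if parts[: len(rp)] == rp]
        if matches:
            # If project roots nest, the most specific one wins.
            affected.add(max(matches, key=lambda r: len(roots[r])))
        else:
            unowned.append(path)
    return sorted(affected), unowned


# Hypothetical monorepo layout.
roots = ["services/billing", "services/search", "libs/common"]
changed = [
    "services/billing/api.py",
    "services/billing/tests/test_api.py",
    "libs/common/retry.py",
    "scripts/release.sh",
]

print(affected_projects(changed, roots))
```

The second return value is the grey area: files that changed but belong to no project, so no pipeline and no team is obviously responsible for them.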

If I were to start fresh, I would start with one repository per deployable or ownable unit. Every repository needs a team that owns it, and potentially other teams that contribute. This ownership must mean something – there's a baseline cost to each thing a team owns, and handover must come with knowledge transfer.

Some of those repositories will feel more monorepo-ish, e.g. those for mobile apps, but for a backend microservice architecture, that means one per service. If you have enough that there is value in sharing code, factor out the repeated concerns into a central framework repository. Make frequent releases of that, and have tooling that automatically brings all the repositories that use it up to date. Merge those updates on green. Follow up if they fail.
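
The "merge on green, follow up on failure" rule can be sketched as a simple triage over the automated update pull requests. The record shape and the `ci_status` values below are hypothetical; a real version would read them from whatever your code host and CI expose.

```python
def triage_updates(pull_requests):
    """Split framework-update pull requests by CI result.

    Returns (repos to merge, repos needing follow-up). Anything still
    running is left alone until the next pass.
    """
    to_merge, to_follow_up = [], []
    for pr in pull_requests:
        if pr["ci_status"] == "green":
            to_merge.append(pr["repo"])
        elif pr["ci_status"] == "red":
            to_follow_up.append(pr["repo"])
    return to_merge, to_follow_up


# Hypothetical state after a new framework release was pushed out.
updates = [
    {"repo": "billing-service", "ci_status": "green"},
    {"repo": "search-service", "ci_status": "red"},
    {"repo": "legacy-importer", "ci_status": "pending"},
]

print(triage_updates(updates))
```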

Force everything to pass all the way from code to deployment regularly, even if it's not changing. Automatically feed ambient updates, like the common framework and base containers, into that regular stream of deploys. Hold the owners accountable for keeping this stream flowing.

Explicitly assign ownership of the common framework to a team close to the infrastructure platform. There is a lot of synergy in having these closely aligned. Empower that team to go into every codebase and adapt the points of use, or force all teams to adopt changes within some SLA.

You can do all these things in a single repository, but it is too easy for something to slip. Separate repositories enforce clear boundaries of ownership, and allow you to get on a path where the complexity of each individual deployment is exactly one or zero changes.
