@martinamps
Created November 21, 2015 16:15
Google: The motivation for a monolithic codebase
Rachel Potvin - Engineering Manager
Started career in video game industry.
The company worked on multiple games at once
One game per repo
Each repo held its own copy of the game engine, and the copies diverged
Teams wanted features from each other's diverged engines, and merge conflicts ensued.
One giant shared codebase
Google repository scale
1 billion files (generated source, config files, data files, documentation; includes copies in release branches, etc.)
9 million source files
2 billion lines of code
35 million commits
86 terabytes of content
45k commits/day
Repository Usage
25 thousand googlers in dozens of offices around the world
15 thousand commits per day by humans
30 thousand commits per day by automated systems
800k QPS of reads at daily peak; 500k on average
Mostly from distributed build/test tools. See bazel.io for an open-source subset.
Perspective
Linux Kernel repo
15 million LOC
40,000 files
Google repo
15 million lines in 250k files changed per week by humans
2 Billion LOC, 9 million files
Google systems and workflows
Sync user workspace to repo (basically a fork)
Write Code
Code review
Commit
All code is reviewed before commit by humans and tooling
Each directory has a set of owners who must approve changes to their area of the repo
Tests and automated checks are performed before and after commit
Auto-rollback of a commit may occur in the case of widespread breakage
Google has a tree structure showing the owners who must give approval before a commit passes.
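The per-directory ownership rule above can be sketched roughly as follows. This is a minimal illustration, not Google's implementation: the `OWNERS` table, the usernames, and the rule that the nearest enclosing directory with owners decides are all assumptions made for the example.

```python
from pathlib import PurePosixPath

# Hypothetical owners table: directory -> usernames allowed to approve changes there.
OWNERS = {
    "search": {"alice"},
    "search/indexing": {"bob", "carol"},
    "ads": {"dave"},
}

def owning_directory(path, owners=OWNERS):
    """Return the nearest enclosing directory that lists owners, or None."""
    for parent in PurePosixPath(path).parents:
        key = str(parent)
        if key in owners:
            return key
    return None

def missing_approvals(changed_files, approvers, owners=OWNERS):
    """Return the owned directories touched by a change that no owner has approved."""
    missing = set()
    for path in changed_files:
        directory = owning_directory(path, owners)
        if directory is None:
            continue  # assumption: paths with no owning directory are not checked here
        if not owners[directory] & set(approvers):
            missing.add(directory)
    return sorted(missing)

# A change touching two areas, approved only by an owner of search/indexing.
print(missing_approvals(["search/indexing/shard.cc", "ads/bid.cc"], {"bob"}))
```

In this sketch a commit would be allowed only when `missing_approvals` returns an empty list.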
Piper
Stores a single, large repo
Implemented on top of standard Google infrastructure, replicated across 10 data centers worldwide
CitC
Cloud based storage backend and a local file system view
Users see local changes overlaid on top of the full Piper source tree
Users can navigate and edit files across the entire codebase
Supports regular tooling on local machines, since (it sounds like) it behaves essentially like NFS.
All writes are saved as CitC snapshots to make rollbacks easy; tooling works from snapshots.
Only modified files are stored in their workspace, but CitC allows you to see the entire codebase seamlessly
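The overlay idea (only modified files stored locally, with the whole tree visible) can be modelled in a few lines. This is a toy sketch and not how CitC is built: the `Workspace` class, its method names, and the use of in-memory dicts in place of a file system are assumptions for illustration.

```python
from collections import ChainMap

class Workspace:
    """Toy overlay: local edits shadow a shared, read-only base tree."""

    def __init__(self, base_tree):
        self._local = {}                               # only modified files live here
        self._view = ChainMap(self._local, base_tree)  # lookups try local first, then base
        self.snapshots = []                            # one snapshot per write

    def read(self, path):
        return self._view[path]

    def write(self, path, content):
        self._local[path] = content
        self.snapshots.append(dict(self._local))       # copy of local state after this write

    def rollback(self, snapshot_index):
        """Restore the local edits to the state captured by an earlier snapshot."""
        self._local.clear()
        self._local.update(self.snapshots[snapshot_index])

    def modified_files(self):
        return sorted(self._local)

base = {"a/main.cc": "v1", "b/lib.cc": "v1"}
ws = Workspace(base)
ws.write("a/main.cc", "v2")
print(ws.read("a/main.cc"), ws.read("b/lib.cc"), ws.modified_files())
```

The base tree is never copied or mutated, which is the property that lets a workspace stay small while still exposing every file.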
Tools
Critique
Code review
CodeSearch
Code browsing, exploration, understanding and archeology
Tricorder
Static analysis of code surfaced in Critique, CodeSearch. Code quality, test results, etc.
Can offer suggestions for fixes to common errors with one click acceptance
Presubmits
Customizable checks, testing, can block commit
TAP
Comprehensive testing before and after commit, auto-rollback
Allows teams to defend code against breaking changes from others
Rosie
Large scale change distribution and management
After teams make changes, tests happen and Rosie automatically submits a PR equivalent
Google does trunk based development
Combined with a centralized repo, that defines the monolithic model
Piper users work at “head”, a consistent view of the codebase.
Commits are immediately visible and usable by other engineers
Branching is incredibly rare
Avoids painful merges
Branches are used for releases - snapshot of trunk with optional cherrypicks.
Simple conditionals can mean different versions of code are executed in production.
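As a rough sketch of what such a conditional might look like: old and new code paths both live at head, and a flag decides which one runs. The flag name, the `FLAGS` dict, and the two ranking functions are hypothetical and not taken from the talk.

```python
# Hypothetical flag store; a real system would load this from configuration.
FLAGS = {"use_new_ranker": False}

def rank_old(results):
    return sorted(results)

def rank_new(results):
    return sorted(results, reverse=True)

def rank(results, flags=FLAGS):
    """Both code paths are committed to trunk; the flag picks one at run time."""
    if flags.get("use_new_ranker", False):
        return rank_new(results)
    return rank_old(results)

print(rank([2, 1, 3]))
print(rank([2, 1, 3], {"use_new_ranker": True}))
```

The new path can be committed to trunk while switched off, which is one way to avoid a long-lived feature branch.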
Advantages of a monolithic repository
Unified versioning
Single source of truth
No confusion about which is the authoritative version of a file
No forking of shared libraries
No painful cross-repo merging of copied code
No artificial boundaries between teams/projects
Supports gradual refactoring and reorganization of codebase
Changes to base libraries are instantly propagated through the dependency chain, greatly simplifying dependency management
No broken dependencies downstream (e.g. the diamond problem: D depends on B and C, which both depend on A, but at different versions)
Entire history of project remains intact and browsable
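The diamond dependency case can be made concrete with a small check. This is an illustrative sketch only: the `PINS` table and the version strings are made up, and the traversal ignores which version of a library is actually resolved.

```python
# Hypothetical multi-repo world: each library pins its own versions of its dependencies.
PINS = {
    "D": {"B": "1.0", "C": "1.0"},
    "B": {"A": "1.0"},
    "C": {"A": "2.0"},
}

# With a single repo there is only one version of everything: whatever is at head.
MONOREPO = {
    "D": {"B": "head", "C": "head"},
    "B": {"A": "head"},
    "C": {"A": "head"},
}

def conflicts(root, pins):
    """Return libraries that `root` transitively requires at more than one version."""
    required = {}
    stack, visited = [root], set()
    while stack:
        lib = stack.pop()
        if lib in visited:
            continue
        visited.add(lib)
        for dep, version in pins.get(lib, {}).items():
            required.setdefault(dep, set()).add(version)
            stack.append(dep)
    return {dep: sorted(versions) for dep, versions in required.items() if len(versions) > 1}

print(conflicts("D", PINS))
print(conflicts("D", MONOREPO))
```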
Extensive code sharing and reuse
simplified dependency management
atomic changes
Make large, backwards incompatible changes easily
Change hundreds/thousands of files in a single consistent operation
Rename a class or function in a single commit, with no broken builds or tests
large scale refactoring
codebase modernization
Single view of the codebase facilitates clean-up and modernization efforts
Can be centrally managed by dedicated specialists
e.g. updating the codebase to make use of C++11 features
Monolithic codebase captures all dependency information
Old APIs can be removed with confidence
Software errors or design mistakes can be found and fixed across the entire codebase, then guarded against with new compiler warnings or presubmit checks
collaboration across teams
flexible team boundaries and code ownership
code visibility and clear structure providing implicit team namespacing
easier to reason about relationship between code
Costs associated with this model
Tooling investments are valuable but can be costly
Development to scale tools
Cost of execution of computationally intensive tools (e.g. builds)
Codebase complexity is a risk to productivity
encourages tons of sharing and reuse
Very easy to add dependencies
Unnecessary dependencies increase:
exposure to build breakage
binary sizes
costs for building/testing and maintenance
Code health must be a priority
Tools have been built to:
Find and remove unused/underused dependencies and dead code
Support large-scale clean-ups and refactoring
Google introduced API visibility, with default set to “private”
APIs must explicitly be set as appropriate for use
APIs can be marked as deprecated
Lesson learned: Add these early to encourage sane/hygienic dependency structures.
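A minimal sketch of a default-private visibility check, under stated assumptions: the `VISIBILITY` table, the package paths, and the `"*"` marker for public targets are all hypothetical, and "private" is taken here to mean usable only from inside the target's own package.

```python
# Hypothetical visibility declarations: target -> package prefixes allowed to depend on it.
VISIBILITY = {
    "base/strings": ["*"],          # explicitly opened up to everyone
    "search/internal": ["search"],  # usable anywhere under search/
}

def is_within(path, package):
    """True if `path` is `package` itself or nested below it."""
    return path == package or path.startswith(package + "/")

def may_depend(client, target, visibility=VISIBILITY):
    """Decide whether `client` may depend on `target`; undeclared targets are private."""
    if is_within(client, target):
        return True  # code inside the target's own package can always use it
    allowed = visibility.get(target, [])  # no declaration means private
    return any(prefix == "*" or is_within(client, prefix) for prefix in allowed)

print(may_depend("ads/bidding", "base/strings"))
print(may_depend("ads/bidding", "search/internal"))
```

Because the default is private, a new dependency on an undeclared target fails the check until its owners deliberately widen the visibility.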
Conclusions
Monolithic codebase != monolithic software design
Monolithic model of source management works well when coupled with an engineering culture of transparency and collaboration
Google has invested heavily in scalability and productivity tooling to support this model, due to the significant advantages it provides
This may or may not be the right approach for all companies
Google has shown this model can scale to a repo with 1 billion files and 35 million commits, and thousands of users around the globe