@timfoster
Created May 4, 2018 16:41
What I want from a build tool
The following are some of the attributes that I'd want to see from any
build system.
Some or all of these may be present in the build system used at Joyent
today (which I'm still trying to learn).
These thoughts are not tied to any one development model or build
flavour: they apply as much to the builds that a developer does while
iterating on a change as they do to the official builds that would
happen after an integration or on a nightly or per-release cadence.
We like transparent build processes and dislike "black boxes" where a
button is pushed and "magic" happens that developers don't understand.
Easy to learn and use
A build tool should be easy to learn, but it should also not prevent
users from doing things "the long way" ('make all' will always work).
However, having a build wrapper helps with the following (a sketch of
one possible wrapper follows this list):
- validation of build machine/environment/tools
- assistance in log parsing/error detection
- ease of discovery of build phases
- assistance for deployment of built bits
- archiving past builds and comparing two builds
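To make that concrete, here is a minimal sketch of what such a wrapper
could look like. The 'buildctl' name, the subcommands and the helper
script are all invented for illustration, not a proposal for a real
interface:

    #!/bin/sh
    # buildctl - hypothetical wrapper sketch; the name and subcommands are invented.
    # Each subcommand is a thin layer over targets 'make' already provides, so
    # 'make all' keeps working for anyone who prefers the long way.
    set -eu

    case "${1:-help}" in
        verify) sh tools/check_build_env.sh ;;       # validate build machine/environment/tools
        build)  make all 2>&1 | tee build.log ;;     # run the real build, keeping a log to parse
        phases) echo "verify build test package" ;;  # ease of discovery of build phases
        *)      echo "usage: buildctl verify|build|phases" >&2; exit 2 ;;
    esac

The exact shape doesn't matter; the point is that every subcommand maps
onto something a developer could also do by hand.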
Anything we do in a build tool should satisfy both "power users" as well
as people who've never done a build before. If the build tool gets in
the way of anyone doing productive work, then we've failed.
Adherence to a CBE ("common build environment")
A build of any component's source code will have requirements on the
system it is being built on. Having a well-defined build environment,
ideally one common to as many components as possible, is important.
Making sure we only build on blessed CBEs is vital, as differences in
build environments can result in build breakage, or worse, runtime bugs
in that component which may not be detected at build-time.
Changes to the CBE need to be carefully managed so as not to introduce
breakage in any component that depends on the old behaviour, or on any
part of the older CBE that's not present in the new one.
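As a sketch of how the "only build on blessed CBEs" gate could work
(the cbe.manifest file and its format are invented for illustration),
the wrapper's 'verify' step could refuse to go any further when the
machine doesn't match:

    #!/bin/sh
    # check_build_env.sh - hypothetical CBE check; the cbe.manifest format is
    # invented. Each line is "<command> <expected version substring>".
    set -eu
    fail=0
    while read -r tool want; do
        have=$("$tool" --version 2>/dev/null | head -1) || have="(missing)"
        case "$have" in
            *"$want"*) ;;                 # tool matches the blessed CBE
            *) echo "CBE mismatch: $tool: want '$want' have '$have'" >&2
               fail=1 ;;
        esac
    done < cbe.manifest
    exit $fail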
Of course, shrink-to-fit applies: a developer can build on their Mac if
the component allows, but nightly/production builds ought to always use
the official build environment. Problems introduced due to a developer
not building on the CBE prior to putback may change a component's
development policies.
Fail-fast - when we blow up, do so as close as possible to the crime scene
Digging through phantom build failures in log files to uncover the
actual reason for breakage is not acceptable. Builds should blow up
early, and make a loud noise when they do so.
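In shell terms, a lot of this is just refusing to march on past the
first error. A sketch of the sort of preamble I'd want at the top of
every build script (bash-specific because of pipefail and the ERR
trap):

    #!/bin/bash
    # Fail-fast preamble: stop at the first error, close to the crime scene.
    set -o errexit    # any failing command aborts the build
    set -o nounset    # treat unset variables as bugs, not as empty strings
    set -o pipefail   # a failure anywhere in a pipeline fails the pipeline
    trap 'echo "BUILD FAILED: $0 line $LINENO" >&2' ERR   # make a loud noise, and say where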
Reproducible builds
Building the same source code on the same build machine should result in
the same built bits.
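One cheap way to keep ourselves honest about this (a sketch; the
build1/build2 output directories are hypothetical) is to checksum the
artifacts of two builds of the same source and diff the results:

    # Hypothetical check: build the same source twice, then compare the bits.
    ( cd build1/output && find . -type f -exec sha256sum {} + | sort -k2 ) > sums.1
    ( cd build2/output && find . -type f -exec sha256sum {} + | sort -k2 ) > sums.2
    diff -u sums.1 sums.2 && echo "builds are bit-for-bit identical"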
Deterministic builds
Related to the above, building the same software twice on a similarly
loaded build machine should produce build artifacts in roughly the same
amount of time.
Network-local
This is here partly to satisfy the above two requirements, but from
experience, build tools that rely on the network being up (whether
that's to locate build dependencies, build machines, or deposit build
artifacts) are prone to failure.
The network goes down, is slow for remote users, or can host
dependencies or build machines that change over time, possibly even
during the course of a build.
We should nail down the build environment so that a completely
disconnected user can build our software. If that involves them
maintaining a local cache of $world, so be it. This tends to also help
building in completely isolated lab environments, or when developers are
on the road, etc. By adding this requirement, we start to have more
control over the CBE and thus more likelihood of always producing the
same software.
One could imagine a build machine hosting its own imgapi or manta
instance (or even just a simple http server) from which images required
for the build are pulled and to which build artifacts are posted,
running its own VM with a pre-populated pkgsrc server, etc.
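As the simplest possible sketch of that (the mirror host, directory
layout and port are all invented), a build machine could keep a local
mirror of its inputs and serve them over loopback HTTP, so a build
never has to leave the box:

    # Hypothetical local mirror: populate it while connected, build from it offline.
    CACHE=/var/tmp/build-cache
    mkdir -p "$CACHE/images" "$CACHE/pkgsrc"
    # one-time (or periodic) sync, run only when the network is available:
    #   rsync -a mirror-host::images/  "$CACHE/images/"
    #   rsync -a mirror-host::pkgsrc/  "$CACHE/pkgsrc/"
    # serve the cache locally; the build's dependency URLs point at 127.0.0.1:
    ( cd "$CACHE" && python3 -m http.server 8000 --bind 127.0.0.1 )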
As a remote developer, being network-disconnected is particularly
important to me: being able to stand up a full build environment
locally, without throwing bits back and forth over the Atlantic link is
vital.
But, allow for developer conveniences
The above absolutely does NOT preclude integration with your CI of
choice! All of the work to come up with a sane local build helps when
you then start running those builds on Jenkins as well - it's just a
"local build" running on a remote machine. One could easily imagine a
build tool subcommand that submits jobs to Jenkins on your behalf.
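For example (a sketch only - the job name, URL and token here are
placeholders, and a real setup may also need a CSRF crumb), such a
subcommand could just drive Jenkins' remote-access API with curl:

    #!/bin/sh
    # buildctl jenkins - hypothetical subcommand: submit the current branch as a
    # parameterised Jenkins build. URL, job name and credentials are placeholders.
    set -eu
    JENKINS_URL=${JENKINS_URL:-https://jenkins.example.com}
    BRANCH=$(git rev-parse --abbrev-ref HEAD)
    curl -fsS -X POST \
        --user "$USER:$JENKINS_API_TOKEN" \
        "$JENKINS_URL/job/my-component/buildWithParameters?BRANCH=$BRANCH"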
If there are other things we can do in a build tool that make
developers' lives easier, we should absolutely do that.
For example, in the past for Solaris OS/Net builds, we had a "build
pkgserve" command that started an HTTP IPS server so developers could
install bits on systems that didn't have NFS access to the build
machine.
Likewise, in the past we had a phase of the build that constructed the
ZFS Storage Appliance ISO and upgrade images - tasks that few developers
had ever learned to do were now just another (optional) build phase.
One could imagine similar conveniences to invoke APIs to import
constructed Triton images to a test machine.
Crucially though, the build wrapper is not a CI system in itself - we do
not want to replace what Jenkins does perfectly well. However, a
well-written build system can make the creation of Jenkins jobs
significantly easier, as there's much less logic to implement as part of
the Jenkins job.
Versioning - at least include SCM data in build artifacts
Having git changeset information in the build artifacts makes it
straightforward to determine what changes are included in a given
build. Similarly, including that information in the build logs, along
with a dump of the build environment, is very useful when tracking down
build failures.
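A sketch of the sort of stamping I mean, assuming the wrapper writes a
'buildstamp' file that is both bundled into the artifacts and echoed
into the build log (the file name and layout are invented):

    # Hypothetical build stamp: record what was built, where, and from what.
    {
        echo "git-commit:   $(git rev-parse HEAD)"
        echo "git-describe: $(git describe --always --dirty)"
        echo "git-branch:   $(git rev-parse --abbrev-ref HEAD)"
        echo "build-host:   $(uname -a)"
        echo "build-time:   $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    } > buildstamp
    cat buildstamp              # ...so the same data lands in the build log too
    env | sort >> buildstamp    # the build environment dump, for post-mortems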
Easy to read logs
This goes without saying.
Useful notifications
If the build sends notifications, it should do so concisely, showing
relevant data from the build to help quickly diagnose errors, or locate
build artifacts.
Avoid monolithic builds, allow composition ('make all' is fine if 'all: foo bar baz')
When a build phase fails, having to restart the entire build again is
counter-productive. If there are logical build phases, we should allow
the user to invoke only the phase they need (I'm looking at you,
nightly.sh!). At the same time, do not attempt to manage build-phase
dependency resolution: that's what make is for. If it's not obvious that
one phase depends on another, that's usually an indicator that the build
phases are too granular and ought to be combined.
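To sketch what I mean using the 'all: foo bar baz' example from the
heading (the phase names are obviously placeholders), the wrapper just
needs to expose each phase one-to-one:

    # Hypothetical 'buildctl phase' subcommand: 'all: foo bar baz' lives in the
    # Makefile, and a failed 'baz' can be re-run without redoing 'foo' and 'bar'.
    phase=${1:-all}
    case "$phase" in
        foo|bar|baz|all) make "$phase" 2>&1 | tee "log.$phase" ;;
        *) echo "usage: buildctl phase <foo|bar|baz|all>" >&2; exit 2 ;;
    esac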
Do as much work during the build as possible
If the build is capable of catching software problems, it should do so,
whether that's static analysis of code (lint, Coverity, Fortify, etc.)
or even simple code-style checking. I like to treat the build as the
first line of defence for code quality, and problems caught earlier are
massively cheaper to fix.
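As a sketch (whether a component drives these via make targets or via
dedicated wrapper subcommands doesn't much matter, and the target names
here are assumptions), the default build could simply refuse to produce
artifacts until the cheap checks pass:

    #!/bin/sh
    # Hypothetical 'check-then-build': catch problems at their cheapest point.
    set -eu
    make check     # code-style and other cheap checks, assuming such a target exists
    make lint      # static analysis, if the component provides it
    make all       # only build once the checks are clean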
Learn from Lullaby?
Some personal history - I've tackled a problem similar to this before,
rewriting the build system used by a few hundred Solaris developers.
Changing the tools that engineers are forced to use on a daily basis can
be disruptive, and there was some initial resistance to the idea of
change, but I believe we were successful in our goals to simplify the
build and make a meaningful difference to the speed at which we were
able to develop Solaris. [ talk to robj, mgerdts or jlevon, all of whom
got to deal with the changes ]
I hope I can help to improve the lives of developers at Joyent too.
https://timsfoster.wordpress.com/2017/08/10/project-lullaby/
https://timsfoster.wordpress.com/2018/02/23/project-lullaby-build1-log-files/