@timfoster
Created May 4, 2018 16:41
What I want from a build tool
The following are some of the attributes that I'd want to see from any
build system.
Some or all of these may be present in the build system used at Joyent
today (which I'm still trying to learn).
These thoughts are not tied to any one development model or build
flavour: they apply as much to the builds that a developer does while
iterating on a change as they do to the official builds that would
happen after an integration or on a nightly or per-release cadence.
We like transparent build processes and dislike "black boxes" where a
button is pushed and "magic" happens that developers don't understand.
Easy to learn and use
A build tool should be easy to learn, but it should also not prevent
users from doing things "the long way" ('make all' will always work).
However, having a build wrapper helps with the following (a sketch of
one possible wrapper follows this list):
- validation of build machine/environment/tools
- assistance in log parsing/error detection
- ease of discovery of build phases
- assistance for deployment of built bits
- archiving past builds and comparing two builds
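To make that concrete, here is a minimal sketch of what such a wrapper
could look like. The 'buildctl' name, the subcommands and the helper
script are all invented for illustration, not a proposal for a real
interface:

    #!/bin/sh
    # buildctl - hypothetical wrapper sketch; the name and subcommands are invented.
    # Each subcommand is a thin layer over targets 'make' already provides, so
    # 'make all' keeps working for anyone who prefers the long way.
    set -eu

    case "${1:-help}" in
        verify) sh tools/check_build_env.sh ;;       # validate build machine/environment/tools
        build)  make all 2>&1 | tee build.log ;;     # run the real build, keeping a log to parse
        phases) echo "verify build test package" ;;  # ease of discovery of build phases
        *)      echo "usage: buildctl verify|build|phases" >&2; exit 2 ;;
    esac

The exact shape doesn't matter; the point is that every subcommand maps
onto something a developer could also do by hand.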
Anything we do in a build tool should satisfy both "power users" as well
as people who've never done a build before. If the build tool gets in
the way of anyone doing productive work, then we've failed.
Adherence to a CBE ("common build environment")
A build of any component's source code will have requirements on the
system it is being built on. Having a well-defined build environment,
ideally one common to as many components as possible, is important.
Making sure we only build on blessed CBEs is vital, as differences in
build environments can result in build breakage, or worse, runtime bugs
in that component which may not be detected at build-time.
Changes to the CBE need to be carefully managed so as not to introduce
breakage in any component that depends on the old behaviour, or on any
part of the older CBE that's not present in the new one.
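As a sketch of how the "only build on blessed CBEs" gate could work
(the cbe.manifest file and its format are invented for illustration),
the wrapper's 'verify' step could refuse to go any further when the
machine doesn't match:

    #!/bin/sh
    # check_build_env.sh - hypothetical CBE check; the cbe.manifest format is
    # invented. Each line is "<command> <expected version substring>".
    set -eu
    fail=0
    while read -r tool want; do
        have=$("$tool" --version 2>/dev/null | head -1) || have="(missing)"
        case "$have" in
            *"$want"*) ;;                 # tool matches the blessed CBE
            *) echo "CBE mismatch: $tool: want '$want' have '$have'" >&2
               fail=1 ;;
        esac
    done < cbe.manifest
    exit $fail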
Of course, shrink-to-fit applies: a developer can build on their Mac if
the component allows, but nightly/production builds ought to always use
the official build environment. Problems introduced due to a developer
not building on the CBE prior to putback may change a component's
development policies.
Fail-fast - when we blow up, do so as close as possible to the crime scene
Digging through phantom build failures in log files to uncover the
actual reason for breakage is not acceptable. Builds should blow up
early, and make a loud noise when they do so.
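In shell terms, a lot of this is just refusing to march on past the
first error. A sketch of the sort of preamble I'd want at the top of
every build script (bash-specific because of pipefail and the ERR
trap):

    #!/bin/bash
    # Fail-fast preamble: stop at the first error, close to the crime scene.
    set -o errexit    # any failing command aborts the build
    set -o nounset    # treat unset variables as bugs, not as empty strings
    set -o pipefail   # a failure anywhere in a pipeline fails the pipeline
    trap 'echo "BUILD FAILED: $0 line $LINENO" >&2' ERR   # make a loud noise, and say where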
Reproducible builds
Building the same source code on the same build machine should result in
the same built bits.
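One cheap way to keep ourselves honest about this (a sketch; the
build1/build2 output directories are hypothetical) is to checksum the
artifacts of two builds of the same source and diff the results:

    # Hypothetical check: build the same source twice, then compare the bits.
    ( cd build1/output && find . -type f -exec sha256sum {} + | sort -k2 ) > sums.1
    ( cd build2/output && find . -type f -exec sha256sum {} + | sort -k2 ) > sums.2
    diff -u sums.1 sums.2 && echo "builds are bit-for-bit identical"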
Deterministic builds
Related to the above, building the same software twice on a similarly
loaded build machine should produce build artifacts in roughly the same
amount of time.
Network-local
This is here partly to satisfy the above two requirements, but from
experience, build tools that rely on the network being up (whether
that's to locate build dependencies, build machines, or deposit build
artifacts) are prone to failure.
The network goes down, is slow for remote users, or can host
dependencies or build machines that change over time, possibly even
during the course of a build.
We should nail down the build environment so that a completely
disconnected user can build our software. If that involves them
maintaining a local cache of $world, so be it. This tends to also help
building in completely isolated lab environments, or when developers are
on the road, etc. By adding this requirement, we start to have more
control over the CBE and thus more likelihood of always producing the
same software.
One could imagine a build machine hosting its own imgapi or manta
instance (or even just a simple http server) from which images required
for the build are pulled and to which build artifacts are posted,
running its own VM with a pre-populated pkgsrc server, etc.
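As the simplest possible sketch of that (the mirror host, directory
layout and port are all invented), a build machine could keep a local
mirror of its inputs and serve them over loopback HTTP, so a build
never has to leave the box:

    # Hypothetical local mirror: populate it while connected, build from it offline.
    CACHE=/var/tmp/build-cache
    mkdir -p "$CACHE/images" "$CACHE/pkgsrc"
    # one-time (or periodic) sync, run only when the network is available:
    #   rsync -a mirror-host::images/  "$CACHE/images/"
    #   rsync -a mirror-host::pkgsrc/  "$CACHE/pkgsrc/"
    # serve the cache locally; the build's dependency URLs point at 127.0.0.1:
    ( cd "$CACHE" && python3 -m http.server 8000 --bind 127.0.0.1 )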
As a remote developer, being network-disconnected is particularly
important to me: being able to stand up a full build environment
locally, without throwing bits back and forth over the Atlantic link is
vital.
But, allow for developer conveniences
The above absolutely does NOT preclude integration with your CI of
choice! All of the work to come up with a sane local build helps when
you then start running those builds on Jenkins as well - it's just a
"local build" running on a remote machine. One could easily imagine a
build tool subcommand that submits jobs to Jenkins on your behalf.
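For example (a sketch only - the job name, URL and token here are
placeholders, and a real setup may also need a CSRF crumb), such a
subcommand could just drive Jenkins' remote-access API with curl:

    #!/bin/sh
    # buildctl jenkins - hypothetical subcommand: submit the current branch as a
    # parameterised Jenkins build. URL, job name and credentials are placeholders.
    set -eu
    JENKINS_URL=${JENKINS_URL:-https://jenkins.example.com}
    BRANCH=$(git rev-parse --abbrev-ref HEAD)
    curl -fsS -X POST \
        --user "$USER:$JENKINS_API_TOKEN" \
        "$JENKINS_URL/job/my-component/buildWithParameters?BRANCH=$BRANCH"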
If there are other things we can do in a build tool that make
developers' lives easier, we should absolutely do that.
For example, in the past for Solaris OS/Net builds, we had a "build
pkgserve" command that started an HTTP IPS server so developers could
install bits on systems that didn't have NFS access to the build
machine.
Likewise, in the past we had a phase of the build that constructed the
ZFS Storage Appliance ISO and upgrade images - tasks that few developers
had ever learned to do were now just another (optional) build phase.
One could imagine similar conveniences to invoke APIs to import
constructed Triton images to a test machine.
Crucially though, the build wrapper is not a CI system in itself - we do
not want to replace what Jenkins does perfectly well. However, a
well-written build system can make the creation of Jenkins jobs
significantly easier, as there's much less logic to implement as part of
the Jenkins job.
Versioning - at least include SCM data in build artifacts
Having git changeset information in the build artifacts makes it
straightforward to determine what changes are included in a given
build. Similarly, including that information in the build logs, along
with a dump of the build environment, is very useful when tracking down
build failures.
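A sketch of the sort of stamping I mean, assuming the wrapper writes a
'buildstamp' file that is both bundled into the artifacts and echoed
into the build log (the file name and layout are invented):

    # Hypothetical build stamp: record what was built, where, and from what.
    {
        echo "git-commit:   $(git rev-parse HEAD)"
        echo "git-describe: $(git describe --always --dirty)"
        echo "git-branch:   $(git rev-parse --abbrev-ref HEAD)"
        echo "build-host:   $(uname -a)"
        echo "build-time:   $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    } > buildstamp
    cat buildstamp              # ...so the same data lands in the build log too
    env | sort >> buildstamp    # the build environment dump, for post-mortems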
Easy to read logs
This goes without saying.
Useful notifications
If the build sends notifications, it should do so concisely, showing
relevant data from the build to help quickly diagnose errors, or locate
build artifacts.
Avoid monolithic builds, allow composition ('make all' is fine if 'all: foo bar baz')
When a build phase fails, having to restart the entire build again is
counter-productive. If there are logical build phases, we should allow
the user to invoke only the phase they need (I'm looking at you,
nightly.sh!). At the same time, do not attempt to manage build-phase
dependency resolution: that's what make is for. If it's not obvious that
one phase depends on another, that's usually an indicator that the build
phases are too granular and ought to be combined.
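To sketch what I mean using the 'all: foo bar baz' example from the
heading (the phase names are obviously placeholders), the wrapper just
needs to expose each phase one-to-one:

    # Hypothetical 'buildctl phase' subcommand: 'all: foo bar baz' lives in the
    # Makefile, and a failed 'baz' can be re-run without redoing 'foo' and 'bar'.
    phase=${1:-all}
    case "$phase" in
        foo|bar|baz|all) make "$phase" 2>&1 | tee "log.$phase" ;;
        *) echo "usage: buildctl phase <foo|bar|baz|all>" >&2; exit 2 ;;
    esac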
Do as much work during the build as possible
If the build is capable of catching software problems, it should do so,
whether that's static analysis of code (lint, Coverity, Fortify, etc.)
or even simple code-style checking. I like to treat the build as the
first line of defence for code quality, and problems caught earlier are
massively cheaper to fix.
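As a sketch (whether a component drives these via make targets or via
dedicated wrapper subcommands doesn't much matter, and the target names
here are assumptions), the default build could simply refuse to produce
artifacts until the cheap checks pass:

    #!/bin/sh
    # Hypothetical 'check-then-build': catch problems at their cheapest point.
    set -eu
    make check     # code-style and other cheap checks, assuming such a target exists
    make lint      # static analysis, if the component provides it
    make all       # only build once the checks are clean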
Learn from Lullaby?
Some personal history - I've tackled a problem similar to this before,
rewriting the build system used by a few hundred Solaris developers.
Changing the tools that engineers are forced to use on a daily basis can
be disruptive, and there was some initial resistance to the idea of
change, but I believe we were successful in our goals to simplify the
build and make a meaningful difference to the speed at which we were
able to develop Solaris. [ talk to robj, mgerdts or jlevon, all of whom
got to deal with the changes ]
I hope I can help to improve the lives of developers at Joyent too.
https://timsfoster.wordpress.com/2017/08/10/project-lullaby/
https://timsfoster.wordpress.com/2018/02/23/project-lullaby-build1-log-files/