@dwilliamson
Created January 20, 2018 18:57
Build System doc
Assuming we're talking about asset builds and, to an extent, the final build itself, the best working solution I've implemented in the past was based on SCons. Sorry for the verbosity but it's lunch time and I'm bored :)
My reasons for choosing it were simple:
- It's Python. Very easy to understand and change. The language itself allows you to perform non-trivial tasks such as parsing an XML tree.
- I wanted to share the responsibility of maintenance and extension with my programmers and didn't want to subject them to some obscure syntax/system.
- Even though it's pretty slow compared to systems like Jam, the time is completely absorbed by building the assets themselves and becomes a non-issue.
The asset system was unified, meaning: every type of asset in the game went through this one dependency-based build process. The backend was a single executable that had plugins responsible for converting from one type of data to another, for example:
- Mesh XML to platform specific mesh format.
- Collision XML to compressed/optimised collision
format.
- Texture binary to platform specific texture format.
- Animation binary to compressed binary.
There were also various other plugins that generated
extra data from input files, such as:
- Generating a light grid from a prelit scene.
- Building spatial subdivision data for an input
scene.
Exporters for each package (e.g. MAX) were written to export data in the formats the plugins understood. All fine and simple, but in the hands of content developers, unwieldy and highly error prone (export mesh, run converter; export collision, run converter - which one, again???). We could have chained these from our content packages but then our build process would have become heavily restrictive and scattered.
So in steps SCons, where you can specify "builders" for each asset type. The build process itself is written in Python and you can easily chain OS calls. You have the following tools at your disposal (a sketch follows the list):
- If the MD5 of a source file changes, build the target (when profiled, this made no noticeable performance impact compared to timestamp checks and saved an awful lot of builds).
- If a target file is deleted, rebuild it from its
source.
- Allow you to specify multiple targets.
- Targets can be generated as a result of scanning the input file for "outputs" (e.g. scan a mesh for the textures it references).
- Chain dependencies together: make the output of one
build process the input of another.
- Full asset clean (removes all targets) and rebuild.
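To make that concrete, here's a minimal sketch of an SConstruct that uses a few of those pieces: MD5-based change detection, one builder, a scanner that pulls referenced files out of the source, and chaining via the returned node. The converter executable, its flags and the texture regex are hypothetical stand-ins for the real plugins:

```python
# SConstruct - minimal sketch of one asset builder; the converter
# executable, its flags and the texture regex are hypothetical.
# Environment, Builder and Scanner are provided by SCons in this file.
import re

env = Environment(ASSET_PLATFORM='pc')

# Content-based (MD5) change detection instead of timestamps.
env.Decider('MD5')

# Builder: mesh XML in, platform-specific binary mesh out.
mesh_builder = Builder(
    action='MeshConverter.exe --platform=$ASSET_PLATFORM --in=$SOURCE --out=$TARGET',
    suffix='.mesh',
    src_suffix='.mesh.xml')
env.Append(BUILDERS={'Mesh': mesh_builder})

# Scanner: pull referenced textures out of the mesh XML so they become
# dependencies of the target and trigger rebuilds too.
def scan_mesh(node, env, path):
    return [env.File(t) for t in
            re.findall(r'texture="([^"]+)"', node.get_text_contents())]

env.Append(SCANNERS=Scanner(function=scan_mesh, skeys=['.mesh.xml']))

# Chaining: the node returned here can be fed straight into another builder.
binary_mesh = env.Mesh('level01.mesh.xml')
```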
I built a whole series of build scripts using SCons, the result of which was a single file you called on any directory to turn source assets into target assets. Because of SCons' dependency evaluation, you were always ensured a minimal, accurate build. The directories would be recursively evaluated so you could run it from parent folders if you liked. The chaining of builds was especially useful.
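A rough sketch of that "run it on any directory" behaviour, assuming the env and builders from the previous sketch (the extensions and the collision builder are made up):

```python
# Walk the tree below the invocation directory and hand every recognised
# source asset to its builder; SCons then works out the minimal rebuild
# set on its own. Extensions and builder names are hypothetical.
import os

for dirpath, dirnames, filenames in os.walk('.'):
    for name in filenames:
        src = os.path.join(dirpath, name)
        if name.endswith('.mesh.xml'):
            env.Mesh(src)
        elif name.endswith('.coll.xml'):
            env.Collision(src)  # hypothetical collision builder
```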
Dependencies could be anything, too. One I added late
in the process was build tool dependency. If the
programmers checked in a new conversion plugin, it
would force a rebuild of all assets that the plugin
was responsible for generating.
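In SCons terms that's just an extra explicit dependency on the tool binary itself, something like this (the tool path is hypothetical):

```python
# Make the generated data depend on the converter binary, so a newly
# checked-in tool forces a rebuild of everything it produces.
converter = env.File('#/tools/MeshConverter.exe')   # hypothetical path
built = env.Mesh('level01.mesh.xml')
env.Depends(built, converter)
```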
Finally, the build process was easy to understand and fast (you just hit build using a P4V custom tool). Being able to chain the build process without having to educate a team of content developers on how to work with the new process was also valuable (even more so considering we had varying platforms). The next problem was getting this generated data to the server (the content developers were responsible for managing the source). The solution I chose was to not send it at all - trusting lots of variable-state clients for your final build is a very bad idea.
I wrote a daemon for P4 that would sit waiting for checkins. It would scan input directories, reduce them to a minimal build set and simply trigger the build on the server using all the latest tools. This data could always be trusted. To prevent content developers from checking in target files, I wrote a submit trigger that rejected them (some artists would now and again just checkout/submit entire directories when they got lazy).
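A sketch of what such a submit trigger can look like; the depot path, trigger name and the exact query used to list the in-flight files are assumptions and may need adjusting for your server:

```python
#!/usr/bin/env python
# Hypothetical change-submit trigger: reject any changelist that touches the
# generated/target asset area. Installed via 'p4 triggers' with something like:
#   block-targets change-submit //depot/... "python reject_targets.py %changelist%"
# The depot path is made up, and the query used to list the in-flight files
# may vary between server versions.
import subprocess
import sys

TARGET_AREA = '//depot/project/built/'

def main(change):
    listing = subprocess.check_output(['p4', 'files', '@=' + change]).decode()
    offenders = [l for l in listing.splitlines() if l.startswith(TARGET_AREA)]
    if offenders:
        sys.stderr.write('Generated files are built on the server; do not submit them:\n')
        sys.stderr.write('\n'.join(offenders) + '\n')
        return 1
    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv[1]))
```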
The final steps were logging and build process modulation via input properties. Each plugin output a detailed log of what was going on, with clearly marked errors and warnings that could be scanned by the Python build process. These were written out to XML files which were then displayed in a live JavaScript-based reporting page. Developers just logged in to the website where they could browse the asset tree to see the state of the server build, and whether they'd broken anything (e.g. committing a mesh with lots of errors). (I really should have sent them an email, too.)
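As an illustration of the log scraping, assuming each plugin writes something like message elements with a severity attribute (the tag and attribute names are hypothetical):

```python
# Pull anything marked as a warning or error out of a plugin's XML log so
# it can be aggregated for the report page. Tag/attribute names hypothetical.
import xml.etree.ElementTree as ET

def scan_log(path):
    issues = []
    for msg in ET.parse(path).getroot().iter('message'):
        if msg.get('severity') in ('warning', 'error'):
            issues.append((msg.get('severity'), msg.text))
    return issues
```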
Build modulation was done with simple XML files which held default settings for each asset type; settings could "inherit" from other settings and ultimately be overridden per file. These were scanned by the Python code to send build parameters to each plugin (having a translation stage between the XML property and the final command-line parameter was very useful). Editing was only done by the programmers since it was rare that you needed to modulate more than the default set of parameters.
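A sketch of how inheritable settings plus the property-to-command-line translation might look; the element names, attributes and flag table are all hypothetical:

```python
# Merge a named <settings> block over anything it inherits from, then
# translate the merged properties into command-line flags for the plugin.
# Assumes an acyclic inheritance chain; names are illustrative.
import xml.etree.ElementTree as ET

def load_settings(root, name):
    node = root.find(".//settings[@name='%s']" % name)
    params = {}
    parent = node.get('inherit')
    if parent is not None:
        params.update(load_settings(root, parent))
    for p in node.findall('param'):
        params[p.get('name')] = p.get('value')
    return params

# Translation stage: XML property name -> final command-line flag.
FLAGS = {'mip_levels': '--mips', 'compression': '--compress'}

def to_command_line(params):
    return ['%s=%s' % (FLAGS[k], v) for k, v in params.items() if k in FLAGS]
```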
At the time this was deployed on two projects and
scaled very well, despite the two projects using
different tools to manage their assets. However, there
are many ways to improve on what was already pretty
fun to work with (throwing a live DB into the mix
brings all sorts of time-saving possibilities).
I've used SCons in the past for building IDL files
(Visual Studio -- any version -- is a woefully
inadequate build solution). It would scan the input
.vcproj file for potential build sources but I'm not
sure I'd do it again. It was nice to use but with
source files, the startup time of your build solution
really does matter. Something like Jam might be better
in this scenario.
Then there's the final packaged build. We just had a
few batch files for each platform that stuck all our
target files in a pack file for final master. We had
the tools to optimise this process much, much further,
but no time or people to do it (and that said, it
worked very well anyway!).
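The real pack format isn't described here, but just for illustration, a trivial packer along those lines might look like this (the index layout is entirely hypothetical):

```python
# Purely illustrative pack-file writer: concatenate the target files and
# append a small index of (name, offset, size) entries. The real formats
# were platform-specific and not this.
import json
import struct

def write_pack(pack_path, files):
    index = []
    with open(pack_path, 'wb') as out:
        for name in files:
            with open(name, 'rb') as f:
                data = f.read()
            index.append({'name': name, 'offset': out.tell(), 'size': len(data)})
            out.write(data)
        # Trailing index: JSON blob followed by its length.
        blob = json.dumps(index).encode('utf-8')
        out.write(blob)
        out.write(struct.pack('<I', len(blob)))
```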
- Don
Yeah, I understand where you're coming from - a couple of years ago I probably would have been all idealistic about it, but these days I'm all too aware of the immovable managers, or the managers that like to jump onto the latest buzzword (I actually used to be one of the latter).
But it's quite simple, there's no "secret" to it - you have most of it in place yourself. The entire thing was based on SCons and was done from a main build server:
Somebody checks-in updated source assets.
Build server wakes up and inspects changelist.
The set of input files is reduced to a number of shared paths (I didn't want to identify the single most general shared path, for obvious reasons) - a rough sketch of this reduction follows the list.
SCons is run on these paths.
The output build files are then checked-in for everybody to sync locally.
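For illustration, reducing a changelist's file list to a handful of shared directories might look something like this (fixed-depth truncation is just one possible policy):

```python
# Reduce a changelist's files to a small set of shared directories to run
# SCons over, rather than collapsing everything to a single common root.
import os

def shared_build_paths(files, depth=3):
    paths = set()
    for f in files:
        parts = os.path.dirname(f).split('/')
        paths.add('/'.join(parts[:depth]))
    return sorted(paths)

# shared_build_paths(['assets/levels/l01/a.mesh.xml',
#                     'assets/levels/l01/b.coll.xml',
#                     'assets/chars/hero/hero.mesh.xml'])
# -> ['assets/chars/hero', 'assets/levels/l01']
```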
IIRC it also checked in the local SCons files so that building this stuff locally would simply be a no-op. Working locally would go like this:
Check out source asset(s) and modify.
Identify a directory below what you were running (could be as general as "levels").
Press "build" and SCons is run locally (nothing is checked in - I didn't want to trust the client machines for final data - *cough*Unreal*cough*).
So the server would know what was changed, and the user would require at least a little knowledge of what type of asset they changed (characters, levels, etc). Both would require an initial "startup" build if you start from nothing but source assets, but this was very rarely done, especially for the client machines as all the build output was checked in for them. I think the entire thing was 80k of Python source (SCons excluded).
This could have been helped for the user by using the NTFS journal API, but I didn't have the time for that (and it was real quick anyway so there was no need). That said, the server-side code could also have used the journal API and we could have reused the same code client-side, but I didn't think of that at the time. It's worth noting that the build code was used unchanged on both server and client - the only difference being that on the server the output would be checked in at the end.
The dependencies were sometimes pretty complex and I kept hitting limitations with SCons (sometimes it would "break" when doing large complex builds - can't remember the specifics exactly). We'd have an entire build pipeline that was hard to set up in SCons (5 levels deep, different dependency paths depending on platform), so more recently I did something far simpler: ditched SCons and did all the dependency management myself in Python. Just store a big dictionary mapping each input filename to a list of dependencies and pickle it.
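A minimal sketch of that approach - a pickled dictionary of per-file dependencies, with content hashes deciding what to rebuild. The file layout and the choice of MD5 are illustrative:

```python
# Home-grown dependency tracking: map each input file to the hashes of
# itself and its dependencies, pickled between runs.
import hashlib
import os
import pickle

DB_PATH = '.build_deps.pickle'

def load_db():
    if not os.path.exists(DB_PATH):
        return {}
    with open(DB_PATH, 'rb') as f:
        return pickle.load(f)

def save_db(db):
    with open(DB_PATH, 'wb') as f:
        pickle.dump(db, f)

def file_hash(path):
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def needs_build(src, deps, db):
    # Rebuild if the source or any dependency changed since the hashes
    # recorded after the last successful build.
    recorded = db.get(src, {})
    return any(file_hash(p) != recorded.get(p) for p in [src] + deps)

def record_build(src, deps, db):
    db[src] = {p: file_hash(p) for p in [src] + deps}
```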
I have the chance to work on one of these again soon, which I'm really looking forward to.