lalunamel/gist:716de8bb16cbf1d942324fc2120931ee

## gistfile1.md

      
Understanding how Xcode builds

Raw notes are at the top
Human-readable blog post is in the middle
Raw Notes:

llbuild is the build system used by Xcode and the Swift Package Manager.
Understanding llbuild will allow you to understand how Xcode builds your app.
Enabling debug features in Xcode to make llbuild spit out logging information

Run this in your terminal:
defaults write com.apple.dt.XCBuild EnableBuildDebugging -bool YES
This will tell XCBuild to enable build debugging, which will in turn tell llbuild to enable build debugging.
You'll want to turn this off when you're done as it will produce build artifacts that take up a fair amount of disk space if you're building many times a day, every day.
Artifacts produced by EnableBuildDebugging


manifest.xcbuild

This is the build file that describes your build as a yml file with a bunch of json in it.


build.db

This is an SQLite database that holds build information. Essentially, this is the cache behind the build that enables incremental compilation. It stores cache keys related to the artifacts that were built previously, which are stored in the derived data folder.


build.trace

The trace file is the log of what llbuild actually did as it executed manifest.xcbuild


manifest.xcbuild (build file)

This file describes your build.
It has a few different parts: client, target, nodes, and commands.
I've not seen client or target contain particularly useful information. nodes and commands are where it's at.
You can find the source documentation for each of these things in the llbuild docs (linked under Sources), but I'll cover them in my own words now.
Node

A node represents some input or output of the build process. Typically, this is a file.
Nodes can have attributes.
Misc facts about nodes

By default, if a node ends in a /, then it is considered a directory.
By default, if a node matches the pattern <.*> (where .* is the regex for "any number of characters"), like <link>, then it is considered "virtual" and is assumed to not represent a file.
If a node has the attribute is-command-timestamp, then it will be used to hold the timestamp when a command was completed. This timestamp can be used to see if a command that was run in a previous build needs to be re-run.
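The two naming defaults above can be sketched in a few lines of Python. This only illustrates the conventions described in these notes, it is not llbuild's implementation; `classify_node` and the sample node names are made up for the example, and explicit attributes in the build file could override these defaults.

```python
import re

# A name wrapped in angle brackets, e.g. "<link>", is treated as virtual.
VIRTUAL_PATTERN = re.compile(r"<.*>")


def classify_node(name: str) -> str:
    """Guess a node's default kind from its name alone (hypothetical helper)."""
    if VIRTUAL_PATTERN.fullmatch(name):
        return "virtual"
    if name.endswith("/"):
        return "directory"
    return "file"


# Sample names are invented for illustration.
for node_name in ["<link>", "/tmp/build/Products/", "/tmp/build/main.o"]:
    print(node_name, "->", classify_node(node_name))
```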
Command

A command is some task to be run by the build system. This can be something simple like "copy the contents of this framework somewhere else" (ProcessXCFramework), or "compile these Swift files" (Debug:CompileSwiftSources).
A command has a tool, description, probably inputs and outputs and then a few other attributes depending on what tool is specified.
There are a few built in tools listed here.
Here are all the tools used by a build I just ran:
phony
mkdir
shell
stale-file-removal

auxiliary-file
copy-plist
copy-strings-file
create-build-directory
embed-swift-stdlib
file-copy
info-plist-processor
process-product-entitlements
process-xcframework
register-execution-policy-exception

Inputs and Outputs

The inputs and outputs of a command are quite important - they determine whether the command should be run, or if the output from a previous run of the command is sufficient and we can all save a bunch of time by skipping it. This is what's known as "incremental compilation" or "compilation avoidance".
In the parlance of llbuild, a command creates a rule, which creates a task.
commands will always be run unless the output they produce is the same from one build to the next.
rules will run if their "signature" has changed from one build to the next OR any of their inputs have different "file info" from one build to the next
Running Commands

A Shell command is the type of command that will compile your source code. Since compiling source code is probably the biggest thing you want to avoid, let's take a look at what causes Shell commands to be re-run on an incremental build.
Firstly, a Shell command is a subclass of ExternalCommand - you can find more information on when an ExternalCommand is run by looking at the source code here.
An ExternalCommand is run if:

it's deemed to be always out of date, which is an attribute that can be specified on a command within the build file.
if the prior value wasn't "successful", where "successful" means that the command was 1. executed and 2. for every output specified for the command, that output's stat information is recorded
for each output of a command

if the output is "virtual", don't consider that output
if the output has different "information" than before, run the command. "information" is the result of the Unix stat call and is retrieved here. It includes a file's st_dev, st_ino, st_mode, st_size, st_mtime - you can read more about what each of those means here.
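To make the "information" comparison concrete, here is a minimal Python sketch that records the same stat fields for a file and checks whether they still match later. The `FileInfo` tuple and both helper functions are hypothetical names for this example; llbuild does the equivalent in C++.

```python
import os
import tempfile
from typing import NamedTuple


class FileInfo(NamedTuple):
    """The stat fields listed above (container invented for this sketch)."""
    device: int    # st_dev
    inode: int     # st_ino
    mode: int      # st_mode
    size: int      # st_size
    mtime_ns: int  # st_mtime, in nanoseconds


def get_file_info(path: str) -> FileInfo:
    st = os.stat(path)
    return FileInfo(st.st_dev, st.st_ino, st.st_mode, st.st_size, st.st_mtime_ns)


def output_changed(recorded: FileInfo, path: str) -> bool:
    """True if the file on disk no longer matches what was recorded."""
    return get_file_info(path) != recorded


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.o")
    with open(path, "w") as f:
        f.write("first build")
    recorded = get_file_info(path)
    print("right after recording:", output_changed(recorded, path))

    # Appending changes st_size, so the recorded info no longer matches.
    with open(path, "a") as f:
        f.write(" plus a change")
    print("after modifying the file:", output_changed(recorded, path))
```

Note that nothing here reads the file's contents. A file that is rewritten with identical bytes still gets a new modification time, so it counts as changed.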


If a command is run (which is controlled by the outputs for the command), it will create a new rule. The rule may or may not be run based on its inputs.
Running Rules that are created by Commands


BuildEngine#scanRule

Check if it needs to run ("checking-rule-needs-to-run")

Does it need to run because the signature changed? ("signature-changed")

ruleInfo.rule->signature != ruleInfo.result.signature


ruleInfo.rule is from this run


ruleInfo.result is from last run

how are rules identified between runs? do they have a stable ID?

they are identified through their KeyID
a KeyID is a sequential database ID (integer) that maps to an actual key (the string representation of a rule, like "N/Users/.../GraphQLSchema.swift") through the key_names table


how are rules created?

BuildEngine#addRule
a rule is looked up from the db (SQLiteBuildDB#lookupRuleResult)

if it's found, return it
if it's not, don't do anything and allow the newly created rule to be empty


Where do these rule signatures get computed?

BuildSystem#lookupRule is where new rules get their values (including their signature)

BuildKey::Kind::Command - command->getSignature()
BuildKey::Kind::Node - node->getSignature()


Only Commands and Nodes have signatures


What's in a signature?

BuildKey::Kind::Command - command->getSignature()

combine the name of the command, the names of all the inputs to the command, the names of all the outputs, whether or not the command allows missing inputs, whether or not the command allows modified outputs, and whether or not the command is always out of date


BuildKey::Kind::Node - node->getSignature()

combine the type of the node with the names of all the producers for the node

what is the type of a node?
BuildNode.h - Plain, Directory, DirectoryStructure, Virtual
what's a producer for a node?

a producer is a Command that can produce a node


So based on the above, the signature is not going to change when you, e.g., change the contents of a file. It will change when you rename, add, or remove a file. It's more of a structural thing that will only ever change when files or dependencies are added or removed.
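A rough Python sketch of the command signature described above shows why it is structural. It hashes only names and flags, so editing a file's contents leaves it untouched while renaming a file changes it. The notes say llbuild combines these with hash_combine; the use of `hashlib.sha256` here, the function name, and the sample file names are assumptions made for illustration.

```python
import hashlib
from typing import List


def command_signature(name: str, inputs: List[str], outputs: List[str],
                      allow_missing_inputs: bool = False,
                      allow_modified_outputs: bool = False,
                      always_out_of_date: bool = False) -> str:
    """Hash the command's name, input/output names, and flags (not file contents)."""
    parts = [
        name,
        f"inputs={len(inputs)}", *inputs,
        f"outputs={len(outputs)}", *outputs,
        str(allow_missing_inputs),
        str(allow_modified_outputs),
        str(always_out_of_date),
    ]
    digest = hashlib.sha256()
    for part in parts:
        # Length-prefix each part so ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(f"{len(part)}:{part};".encode("utf-8"))
    return digest.hexdigest()


base = command_signature("CompileSwiftSources", ["A.swift", "B.swift"], ["A.o", "B.o"])
same_names = command_signature("CompileSwiftSources", ["A.swift", "B.swift"], ["A.o", "B.o"])
renamed = command_signature("CompileSwiftSources", ["A.swift", "C.swift"], ["A.o", "C.o"])

print("same file names, same signature:", base == same_names)
print("renamed file, same signature:", base == renamed)
```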


Does it need to run because it's "invalid"? ("invalid-value")

How does this differ from the signature check above?

The signature is computed based on file names, compiler arguments, and the like
Whether or not a rule is valid has to do with whether the contents or modification times of the files involved have changed since last time


BuildKey::Kind::Command - CommandTask::isResultValid(engine, *command, BuildValue::fromData(value))

Delegate to the command - command.isResultValid(buildSystem.getBuildSystem(), value)

See above for this chain of events - checking if a command is valid is essentially checking if the outputs are all there


BuildKey::Kind::Node - FileInputNodeTask::isResultValid(engine, *node, BuildValue::fromData(value))

BuildSystem.cpp#FileInputNodeTask#isResultValid

compares "value" to current file "info"

value

is stored in the db
where's it created?

anywhere getFileInfo is called


info

BuildInfo.cpp#BuildNode::getFileInfo

FileSystem.cpp#FileInfo getFileInfo

FileInfo.cpp#FileInfo::getInfoForPath

{
device: st_dev (device the file's inode resides on),
inode: st_ino (inode number),
mode: st_mode (inode protection mode),
size: st_size (size, in bytes, of the file),
modTime.seconds: st_mtim.tv_sec (time of last file modification in seconds),
modTime.nanoseconds: st_mtim.tv_nsec (time of last file modification in nanoseconds),
}


Does it need to run because the input has been rebuilt? ("input-rebuilt")

The system that enqueues the rules for running (BuildEngine.cpp#executeTasks) is basically a big while loop. It goes like this:

While there are new rules in the queue, pick the top one off the stack

If the rule is picked up but has not finished scanning, skip it and pick up the next rule ("rule-scanning-deferred-on-input")
Do all the checks above (signature changed, invalid value)
Check to make sure that the rule has all the inputs it needs to run

Inputs from previous builds are still valid (this is how caching/derived data works)
If it doesn't, enqueue new rules to scan all its inputs and then mark the rule as "deferred" and put it back in the queue

The trace includes "rule-scanning-deferred-on-task" when this happens
When the inputs are all scanned and built and the task is picked up again, it needs to be run, and the reason given is "input-rebuilt"


If it does, run it


Do any of the inputs for a rule need to be run? ("rule-does-not-need-to-run")

If nothing depends on the rule, or all the inputs for a rule have already been run, it does not need to be run.


If the rule passes all the tests above and does actually need to run, enqueue rule for scanning (aka task execution in BuildEngine.cpp#executeTasks) ("rule-scheduled-for-scanning")
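The order of the checks above can be summarized in a small Python sketch. It is a simplification based on these notes, not a port of BuildEngine.cpp: the `Rule` and `PriorResult` classes, the `reason_to_run` function, and the `computed_at` build counter are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PriorResult:
    """What the database remembers about a rule from the last build."""
    signature: str
    value: object
    computed_at: int  # build number in which the value was computed


@dataclass
class Rule:
    key: str
    signature: str
    inputs: List[str]
    # Stand-in for isResultValid: checks the stored value against the current state.
    is_result_valid: Callable[[object], bool]


def reason_to_run(rule: Rule, prior: Optional[PriorResult],
                  input_computed_at: Dict[str, int]) -> Optional[str]:
    """Return the trace reason a rule must run, or None if it can be skipped."""
    if prior is None:
        return "never-built"
    if rule.signature != prior.signature:
        return "signature-changed"
    if not rule.is_result_valid(prior.value):
        return "invalid-value"
    for input_key in rule.inputs:
        # An input computed more recently than this rule forces a re-run.
        if input_computed_at.get(input_key, 0) > prior.computed_at:
            return "input-rebuilt"
    return None


rule = Rule("compile", "sig-1", ["A.swift"], lambda value: value == "ok")
prior = PriorResult("sig-1", "ok", computed_at=3)

print(reason_to_run(rule, None, {}))               # never-built
print(reason_to_run(rule, prior, {"A.swift": 2}))  # None
print(reason_to_run(rule, prior, {"A.swift": 4}))  # input-rebuilt
```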


Scan / Execute the rule (BuildEngine.cpp#executeTasks)

For every rule enqueued for scanning

BuildEngine.cpp#processRuleScanRequest

For every input for the rule

"Check if it needs to run" (see above). If it does, scan it.
If the rule has been scanned already and does in fact need to run, scan the input ("rule-scanning-next-input")

BuildEngine.cpp#demandRule

Assert that the rule has already been scanned
If the rule is "complete", exit
If the rule is "in progress" already, exit
If the rule doesn't actually need to run ("rule-does-not-need-to-run" from before), mark it "complete" and exit
If all of those checks don't cause an early exit, create a new task for this rule ("created-task-for-rule")

BuildSystem.cpp#createTask

the type of task created for each rule is determined in BuildSystem.cpp#BuildSystemEngineDelegate::lookupRule
the most relevant tasks for investigating build slowness are probably VirtualInputNodeTask,  DirectoryInputNodeTask, DirectoryStructureInputNodeTask, FileInputNodeTask, ProducedNodeTask


Start the task by enqueuing it into a queue called readyTaskInfos


BuildEngine.cpp#executeTasks is the thing that writes the rule results to the db with setRuleResult
BuildEngine.cpp#taskIsComplete is where task results (signatures and values and computed_at) are recorded
build.trace (trace file)

All the trace keywords and what they mean


new-task - when a new task is created (BuildEngine.cpp#ruleInfo.rule->createTask)
new-rule - when a new rule is created (BuildSystem.cpp#new BuildSystemRule)
build-started - when the build is started (BuildEngine.cpp#trace->buildStarted())
handling-build-input-request - when a build input request is handled from BuildEngine.cpp#inputRequests. A "build input request" is a request made by a rule to do some work
created-task-for-rule - when a rule creates a task to do some work on its behalf
handling-task-input-request - when a task input request is handled from BuildEngine.cpp#inputRequests. A "task input request" is a request made by a task to do some work
paused-input-request-for-rule-scan - when a rule is scanned, but already marked as "pending scan", so it's skipped and not scanned twice
readying-task-input-request - when a rule's inputs are computed/completed and the work that the rule represents is enqueued
added-rule-pending-task - when a rule's inputs are not computed/completed and the work that the rule represents is attempted to be enqueued (but fails because its inputs are not ready)
completed-task-input-request - when a rule is dequeued after it's been enqueued by "readying-task-input-request"
updated-task-wait-count - when a task is no longer waiting on an input (tasks wait on all their inputs before they're run)
unblocked-task - when a task is no longer waiting on any inputs (happens right after "updated-task-wait-count")
readied-task - when a task is dequeued from readyTaskInfos queue and ready to run. The readyTaskInfos queue contains tasks that are waiting on no inputs
finished-task - when a task is dequeued from finishedTaskInfos. Tasks are placed on this queue when they are completed. A task is "changed" if its value was computed in the current build (and not pulled from a prior build).
build-ended - when the build ends
checking-rule-needs-to-run - when a rule is scanned to determine whether or not it needs to be run
rule-scheduled-for-scanning - when it is determined that a rule needs to be run and it is enqueued for processing (where its inputs are checked to make sure it's ready to run, then it's executed)
rule-scanning-next-input - while a rule is processed, when one of its inputs is retrieved and enqueued for scanning, and has been scanned already
rule-scanning-deferred-on-input - while a rule is processed, when one of its inputs is retrieved, has not been scanned, and is therefore enqueued for scanning
rule-scanning-deferred-on-task - while a rule is processed, when one of its inputs is retrieved, has been scanned already, but the task representing that input has not been completed
rule-needs-to-run, never-built - when a rule is scanned and has not been run and therefore is marked as "needs to run"
rule-needs-to-run, signature-changed - when a rule is scanned and the file associated with the rule for this run has a different signature than that of the previous cached build
rule-needs-to-run, invalid-value - when a rule is scanned and the file associated with the rule for this run has a different stat output (file modification time and other file metadata) than that of the previous cached build
rule-needs-to-run, input-missing - this is a possible trace output, but it isn't currently used anywhere
rule-needs-to-run, input-rebuilt - when the rule has been computed at a certain time, but has an input that's been computed more recently
rule-does-not-need-to-run - if the rule has no dependencies
cycle-force-rule-needs-to-run - force a rule to be run in order to break a build cycle llbuild has detected
cycle-supply-prior-value - when a rule is forced to be run in order to break a build cycle and the value from the previous build is set as the rule result

A "signature" is computed by hashing some inputs with hash_combine like in ExternalCommand#getSignature
Questions left to answer:

What's the difference between the BuildEngine and the BuildSystem?

The BuildEngine seems to be the thing doing things - enqueuing commands, rules, tasks, telling them to run, printing trace output
The BuildSystem seems to be the thing that contains the logic for all the parts - "what does this type of task do?", "how does this rule know it needs to run"?


Things left to do:

Approach this document as if I didn't know anything and I had a goal: I notice my iOS build is taking too long, what can I do about it?
Maybe create an architecture diagram of how all the data moves around and where it's stored


Cleaned up blog post version of the above

Debugging Xcode build performance by understanding llbuild

Xcode is a powerful tool that allows developers to create amazing apps, games, and full featured applications. I've used it as a mobile developer in the past and now as a mobile infrastructure engineer. Most of the time it does its job without complaint, but sometimes it doesn't, and doesn't give much indication as to why.
I've explored the nitty gritty of how Android apps get built efficiently and here I'll document the same, but for the build system used by Xcode called llbuild.
llbuild was integrated into Xcode fairly recently (Xcode 10, 2018) and is still sometimes referred to as "the new build system". The next version is in the works, but that's probably a long way out and will change a fair bit before it's used.
Both build systems for Android and iOS (Gradle and llbuild) operate with similar fundamental concepts. Gradle operates on "tasks" with "inputs" and "outputs". llbuild uses "commands", "rules", and "tasks" - all of which also have "inputs" and "outputs". In order for builds to operate efficiently, the inputs and outputs from a build are recorded and stored. On subsequent builds those outputs are reused if the inputs haven't changed. That's the basic idea behind incremental compilation, also called compilation avoidance.
So if a clean build takes 10 minutes, subsequent builds with no changes should hypothetically run in seconds. The power of caching!
But what happens when the dream of incremental compilation doesn't come true? Up until recently I chalked it up to "just the way things are". Today, though, let's dive in and figure out what's really going on and how we can reach the promised land of build system performance and 0 second incremental compilation.
Investigating a slow incremental build

Finding the slow part

Before we get into the details, let's find a part of the build to investigate. For this I recommend Xcode's "build with timing summary" or the third party tool XCLogParser. A ton has already been written on how to use these tools to find slow aspects of a build, so I won't duplicate that here - a quick Google search will turn up some great guides.
The information we'll be working with

Now we'll get some files in front of us and explore what they do and how they're structured.
Firstly, run this in your terminal: defaults write com.apple.dt.XCBuild EnableBuildDebugging -bool YES.
That will tell Xcode to enable build debugging output, which will tell llbuild to do the same. Once you're done debugging your builds, change that YES to a NO and rerun the command. The artifacts that get produced are a bit less than 100 MB per run, which can really add up if you're building all day every day.
Now, build your project with Xcode. The build logs (View > Navigators > Reports) will tell you about a few new files:

manifest.xcbuild

This is the build file that describes your build as a yml file with a bunch of json in it. It's fed to the build system as a set of instructions.


build.db

This is an SQLite database that holds prior build metadata. It's one half of the cache that allows incremental compilation. The other half is the actual files stored in DerivedData.


build.trace

The trace file is the log of what llbuild actually did as it executed the instructions in manifest.xcbuild
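Since build.db is an ordinary SQLite file, one low-risk way to get oriented is to list its tables before querying anything. The Python sketch below uses only the standard `sqlite3` module and SQLite's own `sqlite_master` catalog; `list_tables` is a name made up for this example, and the path you pass in is whatever your build log reports.

```python
import sqlite3
import sys
from pathlib import Path
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the table names in an SQLite file, opened read-only."""
    # mode=ro makes sure that poking around can't modify the build cache.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Usage: python list_tables.py /path/to/build.db
if __name__ == "__main__" and len(sys.argv) == 2:
    for table in list_tables(sys.argv[1]):
        print(table)
```

It is probably safest to run this against a copy of build.db, or at least while Xcode is not in the middle of a build.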


Investigating

Now, let's look at those files and investigate what's going on. Open up manifest.xcbuild and build.trace in your text editor of choice (careful - they're big!). You can find the exact location of those files by looking at the build log file in the Report Navigator.
This next part will be fairly freeform - you're going to need to put on your detective hat and explore:

Start by taking a look at the build.trace and try and find any log lines related to the slow aspect of the build you're investigating.

Eventually you should find a line that mentions the framework you're interested in and whatever operation is taking a long time - maybe the string :Debug:CompileSwiftSources if you're curious about why the framework is being recompiled. Try and find the line that describes the entire slow operation you're interested in and not just one part, i.e. compiling an entire framework rather than an individual file.

After you've located the slow operation, the next question to answer is "why did this happen?"
To answer this, just employ the standard process you use every day to debug regular programs: start from the observed behavior, walk backwards to find its cause, and repeat until you find the root of the problem.

As you're working through the trace, use the handy glossary at the bottom to understand what each line means.
Eventually you'll reach a trace line that makes you say "huh, I didn't modify that file!" or "why'd that change?". What you do next will totally depend on the aspect of the build you're investigating, the cause of the slowness, and so on. I'll leave that up to you!
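Because the trace files are big, a short script can help narrow things down before reading line by line. The Python sketch below collects every rule that was marked as needing to run along with the reason, and resolves rule IDs to their keys using the "new-rule" lines. It assumes a "rule-needs-to-run" line carries the rule ID as its second element and the reason somewhere after it. The sample lines are invented for the example and only the first one is taken from the glossary, so check a few lines of your own trace before relying on it.

```python
import re
from typing import Dict, Iterable, List, Tuple

QUOTED_FIELD = re.compile(r'"([^"]*)"')

# Reasons listed in the glossary of trace keywords.
KNOWN_REASONS = {"never-built", "signature-changed", "invalid-value",
                 "input-missing", "input-rebuilt"}

# Invented sample; a real build.trace has many thousands of lines.
SAMPLE_TRACE = """\
{ "new-rule", "R7897", "N/Users/blah/foo/bar/file.json" }
{ "rule-needs-to-run", "R7897", "invalid-value" }
{ "rule-needs-to-run", "R7853", "input-rebuilt", "R7897" }
{ "rule-does-not-need-to-run", "R7854" }
"""


def rules_that_ran(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (rule key or ID, reason) for every rule-needs-to-run line."""
    keys: Dict[str, str] = {}  # rule ID -> rule key, learned from "new-rule" lines
    results: List[Tuple[str, str]] = []
    for line in lines:
        fields = QUOTED_FIELD.findall(line)
        if len(fields) < 3:
            continue
        if fields[0] == "new-rule":
            keys[fields[1]] = fields[2]
        elif fields[0] == "rule-needs-to-run":
            reason = next((f for f in fields[2:] if f in KNOWN_REASONS), "unknown")
            # Fall back to the raw rule ID if no "new-rule" line was seen for it.
            results.append((keys.get(fields[1], fields[1]), reason))
    return results


for key, reason in rules_that_ran(SAMPLE_TRACE.splitlines()):
    print(reason, key)
```

To use it on a real trace, pass an open file object instead of the sample, for example `rules_that_ran(open(path))` with the path from your build log.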
Wrapping Up

Assuming you've followed the steps above, you've walked backwards from the observed slow behavior and found its root cause, whether that's an errant file modification, mis-configured build settings, or something else. Hopefully you've been able to remedy that root cause and are on your way to faster, more consistent builds.
Remember that you don't have to tackle every single slowdown all at once! Any non-trivial Xcode project will have multiple steps that can be improved or optimized.
Good luck, and keep working towards 0 second incremental builds in Xcode!
Glossary

Anatomy of a trace line

{ "new-rule", "R7897", "N/Users/blah/foo/bar/file.json" }

The first element is the "trace keyword" - it describes what this trace line is doing
The second element is the rule ID, formatted something like R###
The third element is the rule key, and it usually looks like the path to a file. The N at the front is a marker for the build system. It might be a different capital letter sometimes.

{ "rule-scanning-next-input", "R7853", "R7854" }

The first element is the trace keyword
The second element is a rule ID for the rule that's being scanned
The third element is a rule ID for the input of the rule that's being scanned. 

Think of it this way: the third element is an input to the second, and therefore to scan the second element, all its inputs must be scanned as well - that's what's happening here.
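If you would rather script against the trace than read it by eye, the two example lines above can be split into fields with a few lines of Python. This is a sketch that assumes every element is a double-quoted string with no embedded quotes; `TraceLine`, `parse_trace_line` and `is_rule_id` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

QUOTED_FIELD = re.compile(r'"([^"]*)"')


class TraceLine(NamedTuple):
    keyword: str     # the trace keyword, e.g. "new-rule"
    args: List[str]  # everything after the keyword


def parse_trace_line(line: str) -> Optional[TraceLine]:
    """Split one trace line into its keyword and remaining elements."""
    fields = QUOTED_FIELD.findall(line)
    if not fields:
        return None
    return TraceLine(keyword=fields[0], args=fields[1:])


def is_rule_id(value: str) -> bool:
    """Rule IDs look like R followed by digits, e.g. R7897."""
    return re.fullmatch(r"R\d+", value) is not None


examples = [
    '{ "new-rule", "R7897", "N/Users/blah/foo/bar/file.json" }',
    '{ "rule-scanning-next-input", "R7853", "R7854" }',
]
for example in examples:
    parsed = parse_trace_line(example)
    print(parsed.keyword, parsed.args, [is_rule_id(arg) for arg in parsed.args])
```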

Trace keywords


new-task - when a new task is created
new-rule - when a new rule is created
build-started - when the build is started
handling-build-input-request - when a build input request is handled from the BuildEngine. A "build input request" is a request made by a rule to do some work
created-task-for-rule - when a rule creates a task to do some work on its behalf
handling-task-input-request - when a task input request is handled from the BuildEngine. A "task input request" is a request made by a task to do some work
paused-input-request-for-rule-scan - when a rule is scanned, but already marked as "pending scan", so it's skipped and not scanned twice
readying-task-input-request - when a rule's inputs are computed/completed and the work that the rule represents is enqueued
added-rule-pending-task - when a rule's inputs are not computed/completed and the work that the rule represents is attempted to be enqueued (but fails because its inputs are not ready)
completed-task-input-request - when a rule is dequeued after it's been enqueued by "readying-task-input-request"
updated-task-wait-count - when a task is no longer waiting on an input (tasks wait on all their inputs before they're run)
unblocked-task - when a task is no longer waiting on any inputs (happens right after "updated-task-wait-count")
readied-task - when a task is dequeued from readyTaskInfos queue and ready to run. The readyTaskInfos queue contains tasks that are waiting on no inputs
finished-task - when a task is dequeued from finishedTaskInfos. Tasks are placed on this queue when they are completed. A task is "changed" if its value was computed in the current build (and not pulled from a prior build).
build-ended - when the build ends
checking-rule-needs-to-run - when a rule is scanned to determine whether or not it needs to be run
rule-scheduled-for-scanning - when it is determined that a rule needs to be run and it is enqueued for processing (where its inputs are checked to make sure it's ready to run, then it's executed)
rule-scanning-next-input - while a rule is processed, when one of its inputs is retrieved and enqueued for scanning, and has been scanned already
rule-scanning-deferred-on-input - while a rule is processed, when one of its inputs is retrieved, has not been scanned, and is therefore enqueued for scanning
rule-scanning-deferred-on-task - while a rule is processed, when one of its inputs is retrieved, has been scanned already, but the task representing that input has not been completed
rule-needs-to-run, never-built - when a rule is scanned and has not been run and therefore is marked as "needs to run"
rule-needs-to-run, signature-changed - when a rule is scanned and the file associated with the rule for this run has a different signature than that of the previous cached build
rule-needs-to-run, invalid-value - when a rule is scanned and the file associated with the rule for this run has a different stat output (file modification time and other file metadata) than that of the previous cached build
rule-needs-to-run, input-missing - this is a possible trace output, but it isn't currently used anywhere
rule-needs-to-run, input-rebuilt - when the rule has been computed at a certain time, but has an input that's been computed more recently
rule-does-not-need-to-run - if the rule has no dependencies
cycle-force-rule-needs-to-run - force a rule to be run in order to break a build cycle llbuild has detected
cycle-supply-prior-value - when a rule is forced to be run in order to break a build cycle and the value from the previous build is set as the rule result

manifest.xcbuild (build file)

This file describes your build.
It has a few different parts: client, target, nodes, and commands.
I've not seen client or target contain particularly useful information. nodes and commands are where it's at.
You can find the source documentation for each of these things in the llbuild docs (linked under Sources), but I'll cover them in my own words now.
Nodes

A node represents some input or output of the build process. Typically, this is a file.
Nodes can have attributes.
Sources


Gist from Daniel Dunbar (works at Apple on build systems) describing how to turn on build debugging

https://gist.github.com/ddunbar/2dda0e836c855ea96759d1d05f086d69


swift-llbuild repo

https://github.com/apple/swift-llbuild


Docs on llbuild

https://llbuild.readthedocs.io/en/latest/buildsystem.html


Blog post on this whole thing

https://asifmohd.github.io/ios/2021/03/11/xcbuild-debug-info.html