Tl;dr: we're aiming for the task.extra.chainOfTrust.inputs model below. This is a writeup of the goals, choices, and decision.
For Chain of Trust verification, we need to be able to trace the graph back to the tree or trusted task (e.g., the signing task). To do that, it makes sense to explicitly add information to the graph at decision task time, rather than making scriptworker query the entire upstream graph and guess which tasks need validation.
The tasks that need validation are the tasks that can modify inputs. For a build signing task, the build, the docker-image task, and the decision task(s) can modify the inputs, and mere graph traversal can work. For the balrog submitter or google play submitter, which live at the end of the graph, we [initially] need to be able to ignore tests and other tasks that tell us about the inputs, but don't modify them. At some point we may decide to inspect and validate the entire graph before scriptworker tasks can proceed, but not during phase 1.
In the task.dependencies model, we add all of the tasks we need to validate into the task.dependencies list, possibly excluding the decision task (we can guess the decision taskId from the taskGroupId).
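As a rough sketch of how a consumer could read this model, assuming a task definition is a plain dict with taskGroupId and dependencies keys: the helper name tasks_to_validate and all taskIds below are made up for illustration and are not scriptworker API.

```python
def tasks_to_validate(task):
    """Return the taskIds to validate under the task.dependencies model.

    Assumes the decision taskId equals the taskGroupId (the guess described
    above) and appends it when it is not already listed as a dependency.
    """
    task_ids = list(task.get("dependencies", []))
    decision_task_id = task["taskGroupId"]
    if decision_task_id not in task_ids:
        task_ids.append(decision_task_id)
    return task_ids


# Hypothetical signing task definition; every taskId is a placeholder.
signing_task = {
    "taskGroupId": "decisionTaskId000",
    "dependencies": ["buildTaskId000", "dockerImageTaskId000"],
}

print(tasks_to_validate(signing_task))
```

Everything needed is in the task definition itself, so no queue lookups are required to build the list.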
We have download_artifacts(), which currently limits valid artifact downloads to artifacts from dependent tasks. This check is to avoid arbitrary links to evil.com or randomTaskId. This check works well with the task.dependencies model.
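The kind of check described here might look like the sketch below. This is not the actual download_artifacts() implementation: the allowed host, the URL path shape, and the function names are all assumptions chosen for illustration.

```python
from urllib.parse import urlparse

# Assumed values, for illustration only.
VALID_SCHEMES = ("https",)
VALID_NETLOCS = ("queue.taskcluster.net",)


def artifact_task_id(url):
    """Return the taskId embedded in an artifact URL, or None if malformed.

    Assumes paths shaped like /v1/task/<taskId>/artifacts/<name>.
    """
    parsed = urlparse(url)
    if parsed.scheme not in VALID_SCHEMES or parsed.netloc not in VALID_NETLOCS:
        return None
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 5 or parts[:2] != ["v1", "task"] or parts[3] != "artifacts":
        return None
    return parts[2]


def is_valid_artifact_url(url, valid_task_ids):
    """Accept a URL only if it points at an artifact of an allowed task."""
    task_id = artifact_task_id(url)
    return task_id is not None and task_id in valid_task_ids
```

Under the task.dependencies model, valid_task_ids would simply be the task's dependencies, plus the decision taskId if it is not listed.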
- already written. I have a tested patch to add the docker-image task to the signing task's dependencies, and we can already add the decision task to the list by referencing the taskGroupId. And download_artifacts() already can limit downloads to those taskIds.
- explicitly state dependencies. Rather than relying on implicit dependencies, we're explicitly saying that we rely on the artifacts of these tasks in order to sign. Also, this is a second, explicit check to make sure all the tasks in that list finish successfully before we even attempt to trigger the signing task.
- ease of querying: we don't have to make a bunch of taskcluster queries to know which tasks to start downloading from; it's in our task definition.
- we have an idea of how many tasks a scriptworker validation step is going to have to inspect, at graph creation time, as opposed to a dynamically populated list that can grow huge without anyone noticing until it times out.
- moves the logic of determining which tasks should be validated to humans writing the graph, rather than scriptworker inferring from task definitions and workerTypes
- there's an upper limit to the number of dependencies a task can have (100). I only see scriptworker tasks as having a handful of dependencies, but it's possible that can increase later.
- there's a cost to each task's dependency, server-side.
- in the discussion, it appears we may have to trace the docker-image task back to the tree, meaning we may need to add the docker-image's decision task to the list of valid taskIds.
- there is no label, just a list of taskIds; we have to query before we know what each task is.
- if we want to visualize the graph with dot, it's cleaner with a simple dependency tree. I don't think we should optimize for dot, since we could do another normalize pass to create the simple dependency tree, but mentioning because this came up in the meeting.
In the task.extra.chainOfTrust.inputs model, we add inputs of each task that need validating in task.extra.chainOfTrust.inputs. These can be labeled. For instance,
12:59 <dustin> so the build task.extra.chainOfTrust.inputs would be [{"docker-image": "09328502938502"}]
13:00 <dustin> and then verification would entail checking that the task indeed depends on taskId 09328502938502
13:00 <dustin> and then validating taskId 09328502938502 as a docker-image-build task
...
13:03 <aki> i can hardcode what keys i expect in task.extra.chainOfTrust.inputs as well, per type of job, i think
...
13:03 <dustin> and one of the verification steps is that if it's mentioned in task.extra.chainOfTrust.inputs, it should be in task.dependencies too
The signing task would see the build task as a dependency, look up its task definition, find and follow the inputs, and add the appropriate decision task(s) to the list of tasks to validate as well.
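A minimal sketch of that traversal follows, assuming inputs is a list of single-key {label: taskId} dicts as in the chat excerpt. The task_defs dict stands in for queue lookups, and build_chain, get_inputs, and all taskIds other than the one quoted above are hypothetical.

```python
def get_inputs(task):
    """Return {label: taskId} merged from task.extra.chainOfTrust.inputs."""
    inputs = task.get("extra", {}).get("chainOfTrust", {}).get("inputs", [])
    merged = {}
    for entry in inputs:
        merged.update(entry)
    return merged


def build_chain(task_id, task_defs):
    """Follow inputs upstream from task_id; return {taskId: label}.

    Raises ValueError when an input is not also in task.dependencies,
    which is the verification step mentioned in the chat.
    """
    chain = {}
    seen = set()
    to_visit = [task_id]
    while to_visit:
        current = to_visit.pop()
        if current in seen:
            continue
        seen.add(current)
        task = task_defs[current]
        for label, input_id in sorted(get_inputs(task).items()):
            if input_id not in task.get("dependencies", []):
                raise ValueError(
                    f"{current}: input {label} ({input_id}) is not a dependency"
                )
            chain.setdefault(input_id, label)
            to_visit.append(input_id)
    return chain


# Placeholder graph: signing -> build -> docker-image.
task_defs = {
    "signingTaskId000": {
        "dependencies": ["buildTaskId000"],
        "extra": {"chainOfTrust": {"inputs": [{"build": "buildTaskId000"}]}},
    },
    "buildTaskId000": {
        "dependencies": ["09328502938502"],
        "extra": {"chainOfTrust": {"inputs": [{"docker-image": "09328502938502"}]}},
    },
    "09328502938502": {"dependencies": []},
}

print(build_chain("signingTaskId000", task_defs))
```

Each entry in the result carries its label, so the validation step knows which kind of task it is checking without guessing.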
- explicitly state dependencies in a different way. task.extra.chainOfTrust.inputs can have labels along with the taskIds, which means we don't have to guess what role each task played (is abcd1234 the decision task, or the docker-image task?). however, we do have to traverse the graph before we can find all upstream dependencies.
- no hardcoded upper limit to the number of CoT dependencies a task can have. The limit is the amount of validation time the scriptworker can take before performing its task, without hitting scaling or timeout issues.
- able to deal with much more complex graphs, since the logic is in scriptworker rather than a human's head.
- this allows us to put a bunch of other stuff in task.dependencies, like tests, for graph purposes. if they're not in task.extra.chainOfTrust.inputs, they won't add noise to the chainOfTrust validation.
- dot optimized!
- not written yet: we'll need to populate task.extra.chainOfTrust.inputs in all appropriate tasks, and have download_artifacts allow all of the appropriate taskIds in valid artifact urls, then add the graph traversal logic and validation before we start validating each task.
- extra complexity in querying and building the chain. Rather than laying this all out in the scriptworker tasks' definitions at decision time, we have to rebuild it in every scriptworker validation step. Scriptworker tasks may appear dozens of times in a single graph, and each of them may be retried multiple times; that may eventually be true for every single CI graph, making this client-side cost multiply.
- we could get around this by adding the important upstream tasks into the scriptworker tasks' task.extra.chainOfTrust.inputs, since the scriptworker task will need to download artifacts from the decision, docker-image, and build tasks to validate. This is a combination solution that I think will still get pushback since it's not a link-by-link solution.
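The combination approach in that last bullet could be sketched at decision time roughly as follows. This assumes inputs is a list of single-key {label: taskId} dicts; flatten_inputs and every taskId here are hypothetical.

```python
def flatten_inputs(task_id, task_defs):
    """Collect every upstream {label: taskId} reachable through inputs.

    task_defs maps taskId -> task definition, as the decision task would
    have it in memory while generating the graph.
    """
    flat = []
    seen = set()
    to_visit = [task_id]
    while to_visit:
        task = task_defs[to_visit.pop()]
        inputs = task.get("extra", {}).get("chainOfTrust", {}).get("inputs", [])
        for entry in inputs:
            for label, input_id in entry.items():
                if input_id in seen:
                    continue
                seen.add(input_id)
                flat.append({label: input_id})
                to_visit.append(input_id)
    return flat


# Placeholder graph: signing -> build -> docker-image.
graph = {
    "signingTaskId000": {
        "dependencies": ["buildTaskId000"],
        "extra": {"chainOfTrust": {"inputs": [{"build": "buildTaskId000"}]}},
    },
    "buildTaskId000": {
        "dependencies": ["dockerImageTaskId000"],
        "extra": {"chainOfTrust": {"inputs": [{"docker-image": "dockerImageTaskId000"}]}},
    },
    "dockerImageTaskId000": {"dependencies": []},
}

# Write the flattened list back into the scriptworker task definition,
# keeping task.dependencies in sync with the inputs.
signing = graph["signingTaskId000"]
flat = flatten_inputs("signingTaskId000", graph)
signing["extra"]["chainOfTrust"]["inputs"] = flat
signing["dependencies"] = sorted({tid for entry in flat for tid in entry.values()})
```

The traversal cost is then paid once per graph at decision time instead of in every scriptworker validation step, at the price of no longer being link-by-link.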