escapewindow/chainOfTrustDependencyTraversal.md

## chainOfTrustDependencyTraversal.md

      
    Tl;dr: we're aiming for the task.extra.chainOfTrust.inputs model below.  This is a writeup of the goals, choices, and decision.
## chain of trust dependency traversal

For Chain of Trust verification, we need to be able to trace the graph back to the tree or trusted task (e.g., the signing task).  To do that, it makes sense to explicitly add information to the graph at decision task time, rather than making scriptworker query the entire upstream graph and guess which tasks need validation.
The tasks that need validation are the tasks that can modify inputs.  For a build signing task, the build, the docker-image task, and the decision task(s) can modify the inputs, and mere graph traversal can work.  For the balrog submitter or google play submitter, which live at the end of the graph, we [initially] need to be able to ignore tests and other tasks that tell us about the inputs, but don't modify them.  At some point we may decide to inspect and validate the entire graph before scriptworker tasks can proceed, but not during phase 1.
### task.dependencies model

In the task.dependencies model, we add all of the tasks we need to validate into the task.dependencies list, possibly excluding the decision task (we can guess the decision taskId from the taskGroupId).
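As a minimal sketch of this model, the list of tasks to validate can be read straight off the task definition. The helper name `tasks_to_validate` and the example taskIds are hypothetical; the only assumptions taken from the text are that `dependencies` lists the upstream taskIds and that the decision taskId can be guessed from `taskGroupId`.

```python
def tasks_to_validate(task):
    """Return the taskIds to validate under the task.dependencies model.

    Hypothetical helper for illustration, not part of scriptworker.
    """
    task_ids = list(task.get("dependencies", []))
    # Assumption from the text: the decision taskId matches the taskGroupId.
    decision_task_id = task.get("taskGroupId")
    if decision_task_id and decision_task_id not in task_ids:
        task_ids.append(decision_task_id)
    return task_ids


# Made-up signing task definition, trimmed to the relevant fields.
signing_task = {
    "taskGroupId": "decisionTaskId",
    "dependencies": ["buildTaskId", "dockerImageTaskId"],
}
print(tasks_to_validate(signing_task))
```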
We have download_artifacts() which currently limits valid artifact downloads to artifacts from dependent tasks.  This check is to avoid arbitrary links to evil.com or randomTaskId.  This check works well with the task.dependencies model.
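The kind of check described here might look roughly like the sketch below. This is not the actual `download_artifacts()` code: the function name, the `queue.example.com` host, and the `v1/task/<taskId>/artifacts/...` path layout are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def artifact_url_is_allowed(url, valid_task_ids, allowed_host="queue.example.com"):
    """Return True only if url points at an artifact of an allowed taskId.

    Hypothetical check; the host and path layout are assumptions.
    """
    parsed = urlparse(url)
    # Reject arbitrary hosts (the "evil.com" case).
    if parsed.scheme != "https" or parsed.hostname != allowed_host:
        return False
    parts = parsed.path.strip("/").split("/")
    # Assumed path layout: v1/task/<taskId>/artifacts/<artifact path>
    if len(parts) < 5 or parts[:2] != ["v1", "task"] or parts[3] != "artifacts":
        return False
    # Reject taskIds we do not depend on (the "randomTaskId" case).
    return parts[2] in valid_task_ids


valid = {"buildTaskId", "dockerImageTaskId"}
good = "https://queue.example.com/v1/task/buildTaskId/artifacts/public/build/target.tar.bz2"
print(artifact_url_is_allowed(good, valid))
```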
#### plusses

- already written.  I have a tested patch to add the docker-image task to the signing task's dependencies, and we can already add the decision task to the list by referencing the taskGroupId.  And download_artifacts() already can limit downloads to those taskIds.
- explicitly state dependencies.  Rather than relying on implicit dependencies, we're explicitly saying that we rely on the artifacts of these tasks in order to sign.  Also, this is a second, explicit check to make sure all the tasks in that list finish successfully before we even attempt to trigger the signing task.
- ease of querying: we don't have to make a bunch of taskcluster queries to know which tasks to start downloading from; it's in our task definition.
- we have an idea of how many tasks a scriptworker validation step is going to have to inspect, at graph creation time, as opposed to a dynamically populated list that can grow huge without anyone noticing until it times out.
- moves the logic of determining which tasks should be validated to humans writing the graph, rather than scriptworker inferring from task definitions and workerTypes

#### minuses

- there's an upper limit to the number of dependencies a task can have (100).  I only see scriptworker tasks as having a handful of dependencies, but it's possible that number will grow later.
- there's a server-side cost to each dependency a task declares.
- in the discussion, it appears we may have to trace the docker-image task back to the tree, meaning we may need to add the docker-image task's decision task to the list of valid taskIds.
- there is no label, just a list of taskIds; we have to query each task before we know what it is.
- if we want to visualize the graph with dot, it's cleaner with a simple dependency tree.  I don't think we should optimize for dot, since we could do another normalize pass to create the simple dependency tree, but I'm mentioning it because this came up in the meeting.

### task.extra.chainOfTrust.inputs model

In the task.extra.chainOfTrust.inputs model, each task lists the upstream tasks whose output it consumes, and which therefore need validating, in its own task.extra.chainOfTrust.inputs.  These entries can be labeled.  For instance,
12:59 <dustin> so the build task.extra.chainOfTrust.inputs would be [{"docker-image": "09328502938502"}]
13:00 <dustin> and then verification would entail checking that the task indeed depends on taskId 09328502938502
13:00 <dustin> and then validating taskId 09328502938502 as a docker-image-build task
...
13:03 <aki> i can hardcode what keys i expect in task.extra.chainOfTrust.inputs as well, per type of job, i think
...
13:03 <dustin> and one of the verification steps is that if it's mentioned in task.extra.chainOfTrust.inputs, it should be in task.dependencies too
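
The verification step from the discussion above (anything mentioned in task.extra.chainOfTrust.inputs should also be in task.dependencies) could be sketched like this. The function name is hypothetical, and the list-of-single-key-dicts shape is taken from the example in the chat rather than from any finalized schema.

```python
def check_inputs_are_dependencies(task):
    """Return (label, taskId) pairs from task.extra.chainOfTrust.inputs.

    Raises ValueError if an input taskId is not in task.dependencies.
    Hypothetical helper; the inputs shape follows the chat example.
    """
    dependencies = set(task.get("dependencies", []))
    inputs = task.get("extra", {}).get("chainOfTrust", {}).get("inputs", [])
    labeled = []
    for entry in inputs:
        for label, task_id in entry.items():
            if task_id not in dependencies:
                raise ValueError(
                    "%s input %s is not in task.dependencies" % (label, task_id)
                )
            labeled.append((label, task_id))
    return labeled


# Made-up build task, trimmed to the relevant fields.
build_task = {
    "dependencies": ["09328502938502"],
    "extra": {"chainOfTrust": {"inputs": [{"docker-image": "09328502938502"}]}},
}
print(check_inputs_are_dependencies(build_task))
```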

The signing task would see the build task as a dependency, look up its task definition, find and follow the inputs, and add the appropriate decision task(s) to the list of tasks to validate as well.
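
A rough sketch of that traversal, under stated assumptions: `task_defs` is a hypothetical in-memory stand-in for looking up task definitions from the queue, the decision taskId is assumed to match the taskGroupId, and the taskIds and the "build" label are made up.

```python
def collect_chain(start_task_id, task_defs):
    """Walk task.extra.chainOfTrust.inputs upstream from start_task_id.

    Returns {taskId: label} for every upstream task that needs validation.
    """
    chain = {}
    to_visit = [(start_task_id, "start")]
    while to_visit:
        task_id, label = to_visit.pop()
        if task_id in chain:
            continue  # already seen via another path
        chain[task_id] = label
        task = task_defs[task_id]
        # Assumption: the decision taskId matches the taskGroupId.
        group_id = task.get("taskGroupId")
        if group_id and group_id not in chain:
            to_visit.append((group_id, "decision"))
        inputs = task.get("extra", {}).get("chainOfTrust", {}).get("inputs", [])
        for entry in inputs:
            for input_label, input_id in entry.items():
                to_visit.append((input_id, input_label))
    # The starting task itself is not one of its own upstream tasks.
    del chain[start_task_id]
    return chain


task_defs = {
    "signingId": {
        "taskGroupId": "decisionId",
        "extra": {"chainOfTrust": {"inputs": [{"build": "buildId"}]}},
    },
    "buildId": {
        "taskGroupId": "decisionId",
        "extra": {"chainOfTrust": {"inputs": [{"docker-image": "dockerImageId"}]}},
    },
    "dockerImageId": {"taskGroupId": "dockerDecisionId"},
    "decisionId": {"taskGroupId": "decisionId"},
    "dockerDecisionId": {"taskGroupId": "dockerDecisionId"},
}
print(collect_chain("signingId", task_defs))
```

Note that the docker-image task here comes from a different task group, so its decision task is picked up separately.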
#### plusses

- explicitly state dependencies in a different way.  task.extra.chainOfTrust.inputs can have labels along with the taskIds, which means we don't have to guess what role each task played (is abcd1234 the decision task, or the docker-image task?).  however, we do have to traverse the graph before we can find all upstream dependencies.
- no hardcoded upper limit to the number of CoT dependencies a task can have.  The limit is the amount of validation time the scriptworker can take before performing its task, without hitting scaling or timeout issues.
- able to deal with much more complex graphs, since the logic is in scriptworker rather than a human's head.
- this allows us to put a bunch of other stuff in task.dependencies, like tests, for graph purposes.  if they're not in task.extra.chainOfTrust.inputs, they won't add noise to the chainOfTrust validation.
- dot optimized!

#### minuses

- not written yet: we'll need to populate task.extra.chainOfTrust.inputs in all appropriate tasks, and have download_artifacts allow all of the appropriate taskIds in valid artifact urls, then add the graph traversal logic and validation before we start validating each task.
- extra complexity in querying and building the chain.  Rather than laying this all out in the scriptworker tasks' definitions at decision time, we have to rebuild it in every scriptworker validation step.  Scriptworker tasks may appear dozens of times in a single graph, and each of them may be retried multiple times; that may eventually be true for every single CI graph, which multiplies this client-side cost.

  we could get around this by adding the important upstream tasks into the scriptworker tasks' task.extra.chainOfTrust.inputs, since the scriptworker task will need to download artifacts from the decision, docker-image, and build tasks to validate.  This is a combination solution that I think will still get pushback since it's not a link-by-link solution.
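
To make the combination idea concrete, a scriptworker task definition might carry all of the important upstream tasks directly, as in the sketch below. The helper name, the labels other than "docker-image", and the taskIds are hypothetical; nothing here is a settled schema.

```python
def combined_inputs(upstream_tasks):
    """Build a task.extra.chainOfTrust.inputs list from {label: taskId}."""
    return [{label: task_id} for label, task_id in upstream_tasks.items()]


# Everything the signing task needs to download from and validate,
# laid out at decision time instead of rebuilt by graph traversal.
upstream = {
    "decision": "decisionTaskId",
    "docker-image": "dockerImageTaskId",
    "build": "buildTaskId",
}
signing_task = {
    "taskGroupId": "decisionTaskId",
    # Inputs are also listed as dependencies, per the check discussed above.
    "dependencies": sorted(upstream.values()),
    "extra": {"chainOfTrust": {"inputs": combined_inputs(upstream)}},
}
print(signing_task["extra"]["chainOfTrust"]["inputs"])
```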