This document is an overview of how concepts from TES map to concepts in AWS Batch.
AWS Batch ("Batch") has a few basic concepts that need to be understood before we can make a comparison to concepts in TES. Some relate directly to TES and others do not.
`Job`
: A `Job` is a unit of work executed by AWS Batch. Jobs can be executed as containerized applications via Amazon ECS in an ECS cluster. Containerized jobs can reference a container image, command, and parameters. The general structure of a `Job` must be pre-defined via a `JobDefinition`. You can submit a large number of independent, simple jobs. More information here.
`JobDefinition`
: A `JobDefinition` specifies how jobs are to be run. While each `Job` must reference a definition, many of the parameters specified in the job definition can be overridden at runtime. Attributes specified in a job definition include: the Docker image, number of CPUs, memory, the command to run, environment variables, data volumes, and the AWS permissions needed (e.g. access to a particular private S3 bucket). More information here.
`JobQueue`
: Jobs are submitted to a `JobQueue`, where they reside until they can be scheduled to run in a compute environment. You can have multiple job queues, for example one using On-Demand instances and one using Spot instances. `JobQueue`s have a priority that the scheduler uses to determine which jobs in which queue should be evaluated for execution first. More information here.
`JobState`
: The current state of a submitted `Job`. More information here.
`ComputeEnvironment`
: The underlying compute and storage resources used to run jobs from a particular `JobQueue`. More than one `JobQueue` can be mapped to a given `ComputeEnvironment`. A `ComputeEnvironment` can be either managed (Batch provisions and deprovisions compute resources automatically) or unmanaged (you control the underlying resources that `Job`s are sent to). More information here.
From the above, you can see that the closest analogues in TES are `Executor` and `Task`, which correspond (roughly) to `JobDefinition` and `Job`. The mapping is not exact, since a single TES `Task` holds a vector of processes to be executed (`Executor[]`). In AWS Batch, this naively translates to a set of serially dependent job submissions.¹
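To make the serial-dependency idea concrete, here is a minimal sketch that turns an `Executor` vector into a chain of Batch `SubmitJob` request payloads linked via `dependsOn`. The task name, queue, and the shape of the `executors` input are hypothetical; real code would feed each payload to boto3's `batch.submit_job(**payload)` and use the returned `jobId` for the dependency.

```python
# Sketch: translate a TES Executor vector into serially dependent
# AWS Batch SubmitJob request payloads. No AWS calls are made here;
# only the request dicts are constructed.

def build_job_chain(task_name, job_queue, executors):
    """executors: list of dicts with a 'job_definition' key plus an
    optional 'command' override. Returns SubmitJob payloads in which
    each job depends on the previous one."""
    payloads = []
    prev_job_id = None
    for i, ex in enumerate(executors):
        payload = {
            "jobName": f"{task_name}-exec-{i}",
            "jobQueue": job_queue,
            "jobDefinition": ex["job_definition"],
        }
        if "command" in ex:
            payload["containerOverrides"] = {"command": ex["command"]}
        if prev_job_id is not None:
            # Batch only starts this job once the previous one SUCCEEDED
            payload["dependsOn"] = [{"jobId": prev_job_id}]
        payloads.append(payload)
        # In real code prev_job_id would come from the SubmitJob
        # response; the job name is used as a stand-in here.
        prev_job_id = payload["jobName"]
    return payloads
```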
At a high level, any TES-compliant provider endpoint built on top of AWS Batch has a couple of requirements:

- It must already have a configured `ComputeEnvironment` and `JobQueue`.
- It must have its own registry to track previously submitted tasks, and upon discovery of a new type of `Executor`, it will likely have to create a new `JobDefinition` to fulfill the full `Task` submission request.
There are other requirements specific to a TES API endpoint; we will address those in turn as we walk the TES ontology tree, starting with leaf concepts and moving up to collection types.
`FileType` and `TaskView` have no analogue in AWS Batch. It would be up to a service provider to define how to interpret these concepts.
The `State` enumeration is used in a `Task.state` response and represents the current state of a submitted task. Since a single TES `Task` may contain an `Executor` vector, there is a disconnect with Batch. Specifically, `JobState` is returned in the response to a `DescribeJobs` API request, within a JSON structure, as `jobs[].status`. The overall `Task` state may be a function of that collection.
In addition to the above, AWS Batch has a different set of enumerations, with a clearly defined state transition.
```
+-----------+     +----------+     +----------+     +---------+     +--------+
| SUBMITTED | --> | RUNNABLE | --> | STARTING | --> | RUNNING | --> | FAILED |
+-----------+     +----------+     +----------+     +---------+     +--------+
      |                ^                                 |
      v                |                                 v
 +---------+           |                           +-----------+
 | PENDING | ----------+                           | SUCCEEDED |
 +---------+                                       +-----------+
```
The following table is a rough mapping from TES `State` to Batch `JobState`:
TES State | Batch JobState | Note |
---|---|---|
UNKNOWN | | Possibly usable for canceled jobs |
QUEUED | SUBMITTED, PENDING or RUNNABLE | Could be any of these. See the state transition diagram |
INITIALIZING | STARTING | |
RUNNING | RUNNING | |
PAUSED | | Jobs can only be canceled or terminated in Batch |
COMPLETE | SUCCEEDED | |
ERROR | FAILED | |
SYSTEM_ERROR | FAILED | |
CANCELED | FAILED | CancelJob only sets this when the job has not progressed to the STARTING or RUNNING state; otherwise the job must be terminated with a TerminateJob API call. The reason for either canceling or terminating will be in the Batch job details under statusReason |
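Since a TES `Task` spans several Batch jobs, the mapping table has to be combined with an aggregation rule. The sketch below applies the table to a serial job chain; the aggregation choices (any `FAILED` means `ERROR`, otherwise report the least-progressed job, which in a serial chain is the one currently active or waiting) are one possible convention, not anything Batch or TES prescribes.

```python
# Sketch: derive a TES Task state from the Batch job statuses of the
# Task's job chain, per the mapping table above.

BATCH_TO_TES = {
    "SUBMITTED": "QUEUED",
    "PENDING": "QUEUED",
    "RUNNABLE": "QUEUED",
    "STARTING": "INITIALIZING",
    "RUNNING": "RUNNING",
    "SUCCEEDED": "COMPLETE",
    "FAILED": "ERROR",
}

# Batch's state transition order, used to find the "earliest" status.
ORDER = ["SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING", "SUCCEEDED"]

def task_state(job_statuses):
    if not job_statuses:
        return "UNKNOWN"
    if any(s == "FAILED" for s in job_statuses):
        return "ERROR"
    if all(s == "SUCCEEDED" for s in job_statuses):
        return "COMPLETE"
    # Report the least-progressed job: in a serially dependent chain
    # this is the job that is currently running or waiting to run.
    earliest = min(job_statuses, key=ORDER.index)
    return BATCH_TO_TES[earliest]
```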
The TES `Ports` type is not applicable to AWS Batch. Amazon ECS, which Batch is built on top of, does support port mappings for a container, but Batch does not expose this feature.
The various `*Log` types in TES are spread across a few Batch types. The main difference between AWS Batch and TES is that TES is explicit about reporting where the runtime outputs are, using `OutputFileLog`, while Batch leaves output handling up to the user. In Batch, the `STDOUT` and `STDERR` of a `Job` are sent to CloudWatch Logs. Note that CloudWatch Logs have a configurable retention time, and any TES service built on top of AWS Batch would need to account for how it handles job data over the long term.
`OutputFileLog` is the metadata associated with one of the output files produced by the set of all `Executor`s of a `Task`. Batch leaves handling of output files to the process itself. For example, if you had a `Job` that runs BWA, the output would be a BAM file, and it would be up to the launched container's process to move that BAM file to storage such as S3. We will discuss this further when we discuss `Task`.
As the output of an individual `Job`, the `ExecutorLog` type has some direct analogues in a Batch `Job` description returned by status queries against the API.
EL attribute | Job attribute | Note |
---|---|---|
string start_time | jobs[].startedAt | |
string end_time | jobs[].stoppedAt | |
string stdout | Stored in CloudWatch Logs | |
string stderr | Stored in CloudWatch Logs | |
int32 exit_code | jobs[].exitCode | |
string host_ip | | Possibly available from containerInstanceArn information |
repeated Ports ports | Not applicable for Batch | |
As an aggregator of `ExecutorLog` and `OutputFileLog`, the `TaskLog` class has no real corollary in a Batch `Job` and must be computed.
TaskLog attribute | AWS Batch computed value |
---|---|
repeated ExecutorLog logs[] | The set of values mapped to ExecutorLog |
map<string, string> metadata | Useful items not reported elsewhere like jobs[].statusReason |
string start_time | first Executor jobs[].startedAt |
string end_time | last Executor jobs[].stoppedAt |
repeated OutputFileLog outputs | Problematic for a lot of reasons |
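A minimal sketch of that computation, assuming the input is the `jobs[]` array of a `DescribeJobs` response for the Task's job chain (Batch reports `startedAt`/`stoppedAt` as epoch milliseconds):

```python
# Sketch: compute TaskLog start_time, end_time, and metadata from
# Batch DescribeJobs results. Outputs are omitted, as discussed above.

from datetime import datetime, timezone

def iso8601(ms):
    """Convert Batch epoch-millisecond timestamps to ISO 8601."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()

def task_log(job_details):
    started = [j["startedAt"] for j in job_details if "startedAt" in j]
    stopped = [j["stoppedAt"] for j in job_details if "stoppedAt" in j]
    return {
        # first Executor's start, last Executor's stop
        "start_time": iso8601(min(started)) if started else None,
        "end_time": iso8601(max(stopped)) if stopped else None,
        # stash useful per-job details like statusReason
        "metadata": {j["jobId"]: j.get("statusReason", "")
                     for j in job_details},
    }
```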
A `Resources` message maps closest to the `JobDefinition` container properties (most of which can be overridden at `Job` runtime). A notable exception is that the volume size given to a container is handled at the level of a `ComputeEnvironment`, not at the individual job level.
Resources attr | JobDefinition attr | Note |
---|---|---|
uint32 cpu_cores | jobDefinition.containerProperties.vcpus | These are hyperthreaded cores |
bool preemptible | Handled by virtue of which JobQueue the Job was submitted to | |
double ram_gb | jobDefinition.containerProperties.memory | Integer in MiB |
double size_gb | A container's volumes and mountPoints properties would need to account for this | |
repeated string zones | Handled at the level of ComputeEnvironment and JobQueue | |
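A sketch of the directly mappable part of this table, converting TES `Resources` fields into `containerProperties` values (memory rounded up to an integer MiB, as Batch requires) and returning the fields that cannot be handled at the job level so the caller can deal with them elsewhere:

```python
# Sketch: translate TES Resources into JobDefinition containerProperties.
# Only cpu_cores and ram_gb map directly; size_gb, zones, and
# preemptible must be handled via the ComputeEnvironment or JobQueue.

import math

def resources_to_container_properties(resources):
    props = {
        "vcpus": int(resources.get("cpu_cores", 1)),
        # Batch takes memory as an integer number of MiB
        "memory": math.ceil(resources.get("ram_gb", 1) * 1024),
    }
    unmapped = {k: resources[k]
                for k in ("size_gb", "zones", "preemptible")
                if k in resources}
    return props, unmapped
```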
Batch job parameters are simple key-value pairs that represent default values or parameter-substitution placeholders and are defined within a `JobDefinition`. Parameters in a job submission request override any corresponding parameter defaults from the job definition. This is a big departure from the TES `TaskParameter`, which is meant to define the file inputs and outputs for a set of operations. Any mapping would be subject to a lot of conventions and would be specific to an implementation of TES on top of AWS Batch.
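As an illustration of the kind of convention that would be needed, here is one hypothetical scheme that flattens TES `TaskParameter` inputs and outputs into Batch's flat key-value `parameters` map, using key prefixes to keep them apart. The prefix naming is an implementation choice of this sketch, not anything Batch or TES defines.

```python
# Sketch of one possible convention: flatten TES TaskParameters into
# a Batch parameters map, prefixing keys to distinguish inputs from
# outputs. Assumes each parameter dict carries 'url' and 'path'.

def task_params_to_batch(inputs, outputs):
    params = {}
    for i, p in enumerate(inputs):
        params[f"input_{i}_url"] = p["url"]
        params[f"input_{i}_path"] = p["path"]
    for i, p in enumerate(outputs):
        params[f"output_{i}_url"] = p["url"]
        params[f"output_{i}_path"] = p["path"]
    return params
```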
As mentioned, the simplest mapping of a TES `Task::Executor` vector to Batch is to encode the `Executor` vector as a set of serially dependent `Job`s, each with a matching `JobDefinition`.
Executor attr | JobDefinition containerProperties | Notes |
---|---|---|
string image_name | image | |
repeated string cmd | command | |
string workdir | mountPoints | Conventions needed |
string stdin | mountPoints | Conventions needed |
string stdout | mountPoints | Conventions needed |
string stderr | mountPoints | Conventions needed |
repeated Ports ports | Not applicable for Batch | |
map<string,string> environ | environment | Also a key-value array |
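The direct rows of this table can be sketched as a small builder. workdir, stdin, stdout, and stderr are omitted because they need mountPoint conventions, as noted above; the vcpus/memory defaults here are placeholders, since Batch requires them in container properties but TES puts them in `Resources`.

```python
# Sketch: build JobDefinition containerProperties from a TES Executor,
# per the mapping table above. environ becomes Batch's name/value
# array form; environment entries are sorted for determinism.

def executor_to_container_properties(executor, vcpus=1, memory_mib=1024):
    return {
        "image": executor["image_name"],
        "command": list(executor["cmd"]),
        "vcpus": vcpus,
        "memory": memory_mib,
        "environment": [{"name": k, "value": v}
                        for k, v in sorted(executor.get("environ", {}).items())],
    }
```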
Not to beat a dead horse, but have I mentioned that Tasks do not map cleanly to Batch? Below I've made my best attempt to do so, but this is worth a discussion.
Task attr | JobDefinition or Job attr | Notes |
---|---|---|
string id | jobs[].jobId | |
State state | jobs[].status | |
string name | jobs[].jobName | |
string project | Could utilize JobQueue by convention. | |
string description | No use in Batch | |
repeated TaskParameter inputs | jobDefinition.parameters | Conventions needed |
repeated TaskParameter outputs | jobDefinition.parameters | Conventions needed |
Resources resources | See mapping above. | |
repeated Executor executors | See mapping above. | |
repeated string volumes | jobDefinition.containerProperties volumes and mountPoints | Conventions needed |
map<string, string> tags | Not used. | |
repeated TaskLog logs | See mapping above |
Both of Batch's API requests for job information (`DescribeJobs` and `ListJobs`) return an array of results. You can give an array of `jobId`s as a filtering parameter, but the result is still an array of job information even when only one `jobId` is given.
Batch also does not support querying jobs by their `jobName`, while TES allows defining a name prefix to search on. This would have to be handled outside of Batch.
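One way to handle it is client-side filtering over `ListJobs` result pages. The sketch below assumes `pages` is an iterable of `ListJobs` response dicts (e.g. from a boto3 paginator), each containing a `jobSummaryList` array:

```python
# Sketch: emulate TES's name-prefix search on top of Batch's ListJobs,
# which cannot filter by jobName. Filtering happens client side.

def jobs_with_name_prefix(pages, prefix):
    matches = []
    for page in pages:
        for summary in page.get("jobSummaryList", []):
            if summary.get("jobName", "").startswith(prefix):
                matches.append(summary)
    return matches
```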
AWS Batch differentiates between canceling and terminating a `Job`.
A Batch `CancelJob` request will cancel a `Job` in the `PENDING` or `RUNNABLE` state, but is a no-op for jobs that have entered the `STARTING` or `RUNNING` state. For the latter, Batch requires an explicit `TerminateJob` request to be issued for the job. On a successful cancel or termination, Batch sets the job detail `status` to `FAILED` and the `statusReason` to the reason provided to the `CancelJob` or `TerminateJob` API call.
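A TES `CancelTask` implementation therefore has to pick the right Batch API based on the job's current status. A minimal sketch of that dispatch (returning the API name rather than calling AWS):

```python
# Sketch: choose the Batch API for a TES CancelTask request, since
# CancelJob is a no-op once a job has reached STARTING or RUNNING.

def cancel_action(job_status):
    if job_status in ("SUBMITTED", "PENDING", "RUNNABLE"):
        return "CancelJob"
    if job_status in ("STARTING", "RUNNING"):
        return "TerminateJob"
    return None  # already in a terminal state; nothing to do
```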
Footnotes

1. Alternatively, one could implement a system where Batch only receives a request to run a TES Task launcher container with privileged access and enough resource allocations for the serial `Executor`s, but this is getting ahead of ourselves.