@hisplan
Last active February 3, 2018 00:35
cwltoil --help
usage: cwltoil [-h] [--logOff] [--logCritical] [--logError] [--logWarning]
[--logInfo] [--logDebug] [--logLevel LOGLEVEL]
[--logFile LOGFILE] [--rotatingLogging] [--workDir WORKDIR]
[--stats] [--clean {always,onError,never,onSuccess}]
[--cleanWorkDir {always,never,onSuccess,onError}]
[--clusterStats [CLUSTERSTATS]] [--restart]
[--batchSystem BATCHSYSTEM] [--disableHotDeployment]
[--scale SCALE] [--mesosMaster MESOSMASTERADDRESS]
[--parasolCommand PARASOLCOMMAND]
[--parasolMaxBatches PARASOLMAXBATCHES] [--provisioner {aws}]
[--nodeType TYPE] [--nodeOptions OPTIONS] [--minNodes NUM]
[--maxNodes NUM] [--preemptableNodeType TYPE]
[--preemptableNodeOptions OPTIONS] [--minPreemptableNodes NUM]
[--maxPreemptableNodes NUM] [--alphaPacking ALPHAPACKING]
[--betaInertia BETAINERTIA] [--scaleInterval SCALEINTERVAL]
[--preemptableCompensation PREEMPTABLECOMPENSATION]
[--maxServiceJobs MAXSERVICEJOBS]
[--maxPreemptableServiceJobs MAXPREEMPTABLESERVICEJOBS]
[--deadlockWait DEADLOCKWAIT] [--defaultMemory INT]
[--defaultCores FLOAT] [--defaultDisk INT]
[--defaultPreemptable] [--readGlobalFileMutableByDefault]
[--maxCores INT] [--maxMemory INT] [--maxDisk INT]
[--retryCount RETRYCOUNT] [--maxJobDuration MAXJOBDURATION]
[--rescueJobsFrequency RESCUEJOBSFREQUENCY] [--disableCaching]
[--maxLogFileSize MAXLOGFILESIZE] [--writeLogs [WRITELOGS]]
[--writeLogsGzip [WRITELOGSGZIP]] [--realTimeLogging]
[--sseKey SSEKEY] [--cseKey CSEKEY]
[--setEnv NAME=VALUE or NAME]
[--servicePollingInterval SERVICEPOLLINGINTERVAL]
[--badWorker BADWORKER]
[--badWorkerFailInterval BADWORKERFAILINTERVAL]
[--jobStore JOBSTORE] [--conformance-test] [--not-strict]
[--no-container] [--quiet] [--basedir BASEDIR]
[--outdir OUTDIR] [--version]
[--preserve-environment VAR1,VAR2 [VAR1,VAR2 ...]]
jobStore cwltool [cwljob]
positional arguments:
cwltool
cwljob
optional arguments:
-h, --help show this help message and exit
--jobStore JOBSTORE
--conformance-test
--not-strict
--no-container
--quiet
--basedir BASEDIR
--outdir OUTDIR
--version show program's version number and exit
--preserve-environment VAR1,VAR2 [VAR1,VAR2 ...]
Preserve specified environment variables when running
CommandLineTools
Logging Options:
Options that control logging
--logOff Same as --logCritical
--logCritical Turn on logging at level CRITICAL and above. (default
is INFO)
--logError Turn on logging at level ERROR and above. (default is
INFO)
--logWarning Turn on logging at level WARNING and above. (default
is INFO)
--logInfo Turn on logging at level INFO and above. (default is
INFO)
--logDebug Turn on logging at level DEBUG and above. (default is
INFO)
--logLevel LOGLEVEL Log at given level (may be either OFF (or CRITICAL),
ERROR, WARN (or WARNING), INFO or DEBUG). (default is
INFO)
--logFile LOGFILE File to log in
--rotatingLogging Turn on rotating logging, which prevents log files
getting too big.
toil core options:
Options to specify the location of the Toil workflow and turn on stats
collation about the performance of jobs.
jobStore The location of the job store for the workflow. A job
store holds persistent information about the jobs and
files in a workflow. If the workflow is run with a
distributed batch system, the job store must be
accessible by all worker nodes. Depending on the
desired job store implementation, the location should
be formatted according to one of the following
schemes: file:<path> where <path> points to a
directory on the file system; aws:<region>:<prefix>
where <region> is the name of an AWS region like us-
west-2 and <prefix> will be prepended to the names of
any top-level AWS resources in use by the job store, e.g.
S3 buckets. azure:<account>:<prefix>
google:<project_id>:<prefix> TODO: explain For
backwards compatibility, you may also specify ./foo
(equivalent to file:./foo or just file:foo) or /bar
(equivalent to file:/bar).
--workDir WORKDIR Absolute path to directory where temporary files
generated during the Toil run should be placed. Temp
files and folders will be placed in a directory
toil-<workflowID> within workDir (the workflowID is
generated by Toil and will be reported in the workflow
logs). Default is determined by the user-defined
environment variable TOIL_TEMPDIR, or the
environment variables (TMPDIR, TEMP, TMP) via mkdtemp.
This directory needs to exist on all machines running
jobs.
--stats Records statistics about the toil workflow to be used
by 'toil stats'.
--clean {always,onError,never,onSuccess}
Determines the deletion of the jobStore upon
completion of the program. Choices: 'always',
'onError','never', 'onSuccess'. The --stats option
requires information from the jobStore upon completion
so the jobStore will never be deleted with that flag.
If you wish to be able to restart the run, choose
'never' or 'onSuccess'. Default is 'never' if stats is
enabled, and 'onSuccess' otherwise
--cleanWorkDir {always,never,onSuccess,onError}
Determines deletion of temporary worker directory upon
completion of a job. Choices: 'always', 'never',
'onSuccess', 'onError'. Default = always. WARNING: This option
should be changed for debugging only. Running a full
pipeline with this option could fill your disk with
intermediate data.
--clusterStats [CLUSTERSTATS]
If enabled, writes out JSON resource usage statistics
to a file. The default location for this file is the
current working directory, but an absolute path can
also be passed to specify where this file should be
written. This option only applies when using scalable
batch systems.
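As a quick illustration of the job store locator formats listed above, the following sketch parses a locator string into its scheme and fields. Note that `parse_locator` is a hypothetical helper written for this example only, not part of Toil's API:

```python
# Sketch: split a Toil job store locator into (scheme, fields), following the
# formats described in the help text above. Illustrative only.

def parse_locator(locator):
    """Return (scheme, fields) for a job store locator string."""
    if ":" not in locator or locator.startswith((".", "/")):
        # Backwards compatibility: bare paths like ./foo or /bar mean a
        # file job store.
        return ("file", [locator])
    scheme, rest = locator.split(":", 1)
    if scheme == "file":
        return ("file", [rest])
    if scheme in ("aws", "azure", "google"):
        # e.g. aws:<region>:<prefix>, azure:<account>:<prefix>
        return (scheme, rest.split(":", 1))
    raise ValueError("unknown job store scheme: %s" % scheme)

print(parse_locator("aws:us-west-2:my-run"))  # ('aws', ['us-west-2', 'my-run'])
print(parse_locator("./foo"))                 # ('file', ['./foo'])
```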
toil options for restarting an existing workflow:
Allows the restart of an existing workflow
--restart If --restart is specified, Toil will attempt to
restart the existing workflow at the location pointed
to by the --jobStore option. Will raise an exception
if the workflow does not exist.
toil options for specifying the batch system:
Allows the specification of the batch system, and arguments to the batch
system/big batch system (see below).
--batchSystem BATCHSYSTEM
The type of batch system to run the job(s) with,
currently can be one of singleMachine, parasol,
gridEngine, lsf or mesos. default=singleMachine
--disableHotDeployment
Should hot-deployment of the user script be
deactivated? If True, the user script/package should
be present at the same location on all workers.
default=False
--scale SCALE A scaling factor to change the number of cores
requested by all submitted tasks. Used in the
singleMachine batch system. default=1
--mesosMaster MESOSMASTERADDRESS
The host and port of the Mesos master, separated by a
colon. default=localhost:5050
--parasolCommand PARASOLCOMMAND
The name or path of the parasol program. Will be
looked up on PATH unless it starts with a slash.
default=parasol
--parasolMaxBatches PARASOLMAXBATCHES
Maximum number of job batches the Parasol batch
system is allowed to create. One batch is created for
jobs with a unique set of resource requirements.
default=10000
toil options for autoscaling the cluster of worker nodes:
Allows the specification of the minimum and maximum number of nodes in an
autoscaled cluster, as well as parameters to control the level of
provisioning.
--provisioner {aws} The provisioner for cluster auto-scaling. The
currently supported choices are 'cgcloud' or 'aws'. The
default is None.
--nodeType TYPE Node type for non-preemptable nodes. The syntax
depends on the provisioner used. For the cgcloud and
AWS provisioners this is the name of an EC2 instance
type, for example 'c3.8xlarge'. The default is None.
--nodeOptions OPTIONS
Provisioning options for the non-preemptable node
type. The syntax depends on the provisioner used.
Neither the CGCloud nor the AWS provisioner support
any node options. The default is None.
--minNodes NUM Minimum number of non-preemptable nodes in the
cluster, if using auto-scaling. The default is 0.
--maxNodes NUM Maximum number of non-preemptable nodes in the
cluster, if using auto-scaling. The default is 10.
--preemptableNodeType TYPE
Node type for preemptable nodes. The syntax depends on
the provisioner used. For the cgcloud and AWS
provisioners this is the name of an EC2 instance type,
followed by a colon and the price in dollars to bid for
a spot instance, for example 'c3.8xlarge:0.42'. The
default is None.
--preemptableNodeOptions OPTIONS
Provisioning options for the preemptable node type.
The syntax depends on the provisioner used. Neither
the CGCloud nor the AWS provisioner support any node
options. The default is None.
--minPreemptableNodes NUM
Minimum number of preemptable nodes in the cluster, if
using auto-scaling. The default is 0.
--maxPreemptableNodes NUM
Maximum number of preemptable nodes in the cluster, if
using auto-scaling. The default is 0.
--alphaPacking ALPHAPACKING
The total number of nodes estimated to be required to
compute the issued jobs is multiplied by the alpha
packing parameter to produce the actual number of
nodes requested. Values of this coefficient greater
than one will tend to over provision and values less
than one will under provision. default=0.8
--betaInertia BETAINERTIA
A smoothing parameter to prevent unnecessary
oscillations in the number of provisioned nodes. If
the number of nodes is within the beta inertia of the
currently provisioned number of nodes then no change
is made to the number of requested nodes. default=1.2
--scaleInterval SCALEINTERVAL
The interval (seconds) between assessing if the scale
of the cluster needs to change. default=30
--preemptableCompensation PREEMPTABLECOMPENSATION
The preference of the autoscaler to replace
preemptable nodes with non-preemptable nodes, when
preemptable nodes cannot be started for some reason.
Defaults to 0.0. This value must be between 0.0 and
1.0, inclusive. A value of 0.0 disables such
compensation, a value of 0.5 compensates two missing
preemptable nodes with a non-preemptable one. A value
of 1.0 replaces every missing pre-emptable node with a
non-preemptable one.
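Taken together, the scaling parameters above can be modeled with a short sketch. This is a simplified illustration of the documented behavior, not Toil's actual autoscaler code; the defaults are taken from the help text, and the linear compensation rule is inferred from the 0.5 and 1.0 examples given:

```python
import math

def nodes_to_request(estimated_nodes, current_nodes,
                     alpha_packing=0.8, beta_inertia=1.2):
    """Simplified model of the node-count decision described above."""
    # --alphaPacking: scale the raw estimate; values > 1 over-provision,
    # values < 1 under-provision.
    desired = math.ceil(estimated_nodes * alpha_packing)
    # --betaInertia: if the desired count is within the inertia band around
    # the currently provisioned count, keep the current count to avoid
    # oscillation.
    if current_nodes / beta_inertia <= desired <= current_nodes * beta_inertia:
        return current_nodes
    return desired

def compensating_nodes(missing_preemptable, compensation=0.0):
    """--preemptableCompensation: non-preemptable nodes started to replace
    missing preemptable ones; 0.5 -> one per two missing, 1.0 -> one each."""
    return round(missing_preemptable * compensation)
```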
toil options for limiting the number of service jobs and detecting service deadlocks:
Allows the specification of the maximum number of service jobs in a
cluster. By keeping this limited we can avoid all the nodes being occupied
with services, which would cause a deadlock.
--maxServiceJobs MAXSERVICEJOBS
The maximum number of service jobs that can be run
concurrently, excluding service jobs running on
preemptable nodes. default=9223372036854775807
--maxPreemptableServiceJobs MAXPREEMPTABLESERVICEJOBS
The maximum number of service jobs that can run
concurrently on preemptable nodes.
default=9223372036854775807
--deadlockWait DEADLOCKWAIT
The minimum number of seconds to observe the cluster
stuck running only the same service jobs before
throwing a deadlock exception. default=60
toil options for cores/memory requirements:
The options to specify default cores/memory requirements (if not specified
by the jobs themselves), and to limit the total amount of memory/cores
requested from the batch system.
--defaultMemory INT The default amount of memory to request for a job.
Only applicable to jobs that do not specify an
explicit value for this requirement. Standard suffixes
like K, Ki, M, Mi, G or Gi are supported. Default is
2.0 Gi
--defaultCores FLOAT The default number of CPU cores to dedicate to a job.
Only applicable to jobs that do not specify an
explicit value for this requirement. Fractions of a
core (for example 0.1) are supported on some batch
systems, namely Mesos and singleMachine. Default is
1.0
--defaultDisk INT The default amount of disk space to dedicate to a job.
Only applicable to jobs that do not specify an
explicit value for this requirement. Standard suffixes
like K, Ki, M, Mi, G or Gi are supported. Default is
2.0 Gi
--defaultPreemptable
--readGlobalFileMutableByDefault
Toil disallows modification of read global files by
default. This flag makes read global files mutable by
default; however, it also defeats the purpose of
shared caching via hard links to save space. Default
is False
--maxCores INT The maximum number of CPU cores to request from the
batch system at any one time. Standard suffixes like
K, Ki, M, Mi, G or Gi are supported. Default is 8.0 Ei
--maxMemory INT The maximum amount of memory to request from the batch
system at any one time. Standard suffixes like K, Ki,
M, Mi, G or Gi are supported. Default is 8.0 Ei
--maxDisk INT The maximum amount of disk space to request from the
batch system at any one time. Standard suffixes like
K, Ki, M, Mi, G or Gi are supported. Default is 8.0 Ei
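The size suffixes accepted by the options above (K, Ki, M, Mi, G, Gi) can be illustrated with a small converter. This sketch uses one common interpretation, plain suffixes as decimal (SI) multiples and 'i' suffixes as binary multiples; Toil's own parser may differ, so treat it as an illustration of the suffix scheme rather than its implementation:

```python
import re

# Sketch: convert human-readable sizes such as "2Gi" or "500M" to bytes.
# Assumes SI meaning for K/M/G and binary meaning for Ki/Mi/Gi.
_MULTIPLIERS = {
    "": 1,
    "K": 10**3, "M": 10**6, "G": 10**9,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
}

def to_bytes(spec):
    match = re.fullmatch(r"([0-9.]+)\s*([KMG]i?)?", spec)
    if not match:
        raise ValueError("bad size: %r" % spec)
    number, suffix = match.groups()
    return int(float(number) * _MULTIPLIERS[suffix or ""])

print(to_bytes("2Gi"))   # 2147483648
print(to_bytes("500M"))  # 500000000
```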
toil options for rescuing/killing/restarting jobs:
The options for jobs that either run too long/fail or get lost (some batch
systems have issues!)
--retryCount RETRYCOUNT
Number of times to retry a failing job before giving
up and labeling the job as failed. default=0
--maxJobDuration MAXJOBDURATION
Maximum runtime of a job (in seconds) before we kill
it (this is a lower bound, and the actual time before
killing the job may be longer).
default=9223372036854775807
--rescueJobsFrequency RESCUEJOBSFREQUENCY
Period of time to wait (in seconds) between checking
for missing/overlong jobs, that is, jobs which get lost
by the batch system. Expert parameter. default=3600
toil miscellaneous options:
Miscellaneous options
--disableCaching Disables caching in the file store. This flag must be
set to use a batch system that does not support
caching, such as Grid Engine, Parasol, LSF, or Slurm.
--maxLogFileSize MAXLOGFILESIZE
The maximum size of a job log file to keep (in bytes),
log files larger than this will be truncated to the
last X bytes. Setting this option to zero will prevent
any truncation. Setting this option to a negative
value will truncate from the beginning. Default=62.5 K
--writeLogs [WRITELOGS]
Write worker logs received by the leader into their
own files at the specified path. The current working
directory will be used if a path is not specified
explicitly. Note: By default only the logs of failed
jobs are returned to leader. Set log level to 'debug'
to get logs back from successful jobs, and adjust
'maxLogFileSize' to control the truncation limit for
worker logs.
--writeLogsGzip [WRITELOGSGZIP]
Identical to --writeLogs except the logs files are
gzipped on the leader.
--realTimeLogging Enable real-time logging from workers to masters
--sseKey SSEKEY Path to file containing 32 character key to be used
for server-side encryption on awsJobStore. SSE will
not be used if this flag is not passed.
--cseKey CSEKEY Path to file containing 256-bit key to be used for
client-side encryption on azureJobStore. By default,
no encryption is used.
--setEnv NAME=VALUE or NAME, -e NAME=VALUE or NAME
Set an environment variable early on in the worker. If
VALUE is omitted, it will be looked up in the current
environment. Independently of this option, the worker
will try to emulate the leader's environment before
running a job. Using this option, a variable can be
injected into the worker process itself before it is
started.
--servicePollingInterval SERVICEPOLLINGINTERVAL
Interval of time service jobs wait between polling for
the existence of the keep-alive flag (default=60)
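The --maxLogFileSize truncation rules described above can be sketched as follows. This is an illustrative helper, not Toil's implementation, and "truncate from the beginning" is read here as keeping the first |max_size| bytes:

```python
def truncate_log(data, max_size):
    """Sketch of the --maxLogFileSize truncation rules described above."""
    if max_size == 0:
        return data              # zero disables truncation
    if max_size < 0:
        return data[:-max_size]  # negative: keep the first |max_size| bytes
    return data[-max_size:]      # positive: keep the last max_size bytes
```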
toil debug options:
Debug options
--badWorker BADWORKER
For testing purposes randomly kill 'badWorker'
proportion of jobs using SIGKILL, default=0.0
--badWorkerFailInterval BADWORKERFAILINTERVAL
When killing the job, pick uniformly within the
interval from 0.0 to 'badWorkerFailInterval' seconds
after the worker starts, default=0.01
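The debug options above can be modeled with a small sketch of the kill decision: a given proportion of workers is selected for a SIGKILL at a time drawn uniformly from the fail interval. This is an illustration of the documented behavior, not Toil's code:

```python
import random

def bad_worker_plan(bad_worker=0.0, fail_interval=0.01, rng=None):
    """Sketch of --badWorker / --badWorkerFailInterval: decide whether to
    SIGKILL a worker and, if so, how many seconds after it starts."""
    rng = rng or random.Random()
    if rng.random() < bad_worker:            # kill this proportion of workers
        return rng.uniform(0.0, fail_interval)
    return None                              # worker is left alone

# With --badWorker 0.0 (the default) no worker is ever killed.
assert bad_worker_plan(0.0) is None
```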