One of my major pains in using HPC resources is figuring out (or remembering) the syntax required to create certain jobs, check the state of the queue(s) I am interested in, estimate how long my jobs will wait to run given how heavily utilized the cluster is, find out how many nodes are available for the given queue(s), and so on.
To make all of this easier, I would love a script that generates a batch script for me based on the system I am running on, whether it be Slurm or Torque (PBS/PBS-Pro), and that can easily give me what I want. I would also want to be able to generate template batch scripts and append to them, so being able to define a base template for my project would be very useful (i.e. a base template generated from this script, which is then used to generate more specific, less generalized, fully complete scripts).
- Detect whether the script is being run on PBS/PBS-Pro or Slurm and invoke the appropriate script.
- Read in the state of the queue(s) and populate a table of queue information, such as maximum time limit, approximate resources, list of nodes and their specifications, etc.
- Display the number of available nodes in each queue, as well as the specification of the queue in question.
- Give the option to query more information about a queue, such as the nodes it has available and whatever information can be obtained about those nodes.
- Allow creating/adding parameters to the scripts in question and exporting them either as a template (incomplete) or as a full script (complete).
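The first item in the list above could be sketched roughly as follows. This is only a minimal sketch, assuming that finding `sbatch`/`sinfo` or `qsub`/`qstat` on `PATH` is a good enough signal; the function name `detect_scheduler` and its `which` parameter are hypothetical and only there for illustration. Slurm is checked first because some Slurm installations also provide `qsub`-style wrapper commands.

```python
import shutil
from typing import Callable, Optional


def detect_scheduler(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Guess the batch system from the commands available on PATH.

    Returns "slurm", "pbs", or None. The `which` argument is a
    hypothetical hook so the lookup can be replaced in tests.
    """
    # Check Slurm first: a Slurm cluster may also expose qsub/qstat wrappers.
    if which("sbatch") and which("sinfo"):
        return "slurm"
    # Torque and PBS Pro both provide qsub and qstat.
    if which("qsub") and which("qstat"):
        return "pbs"
    return None


if __name__ == "__main__":
    print(detect_scheduler())
```

On a machine with neither set of commands this prints `None`, which a real tool would presumably turn into an error message.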
Queues  | Available | Projected Wait-Time
-----------------------------------------
default | 32/64     | 0 Minutes
>>> set nodes=64
>>> show
Queues  | Available | Projected Wait-Time
-----------------------------------------
default | 32/64     | 9:30:27
>>> set output="job.out" error="job.err"
>>> export job.sh
>>> import job.sh
>>> get output error
job.out
job.err
>>>
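The `set`, `export`, `import`, and `get` steps in the session above might map onto something like the sketch below. It is an illustration under assumptions, not a finished design: the `DIRECTIVES` table and the `export_script`/`import_script` names are hypothetical, only the three parameters from the session are covered, and the PBS entries use Torque-style `-l nodes=` (PBS Pro sites typically use `-l select=` instead).

```python
from typing import Dict

# Hypothetical mapping from session parameter names to directive lines.
DIRECTIVES: Dict[str, Dict[str, str]] = {
    "slurm": {
        "nodes": "#SBATCH --nodes={}",
        "output": "#SBATCH --output={}",
        "error": "#SBATCH --error={}",
    },
    "pbs": {
        "nodes": "#PBS -l nodes={}",  # Torque-style; an assumption
        "output": "#PBS -o {}",
        "error": "#PBS -e {}",
    },
}


def export_script(params: Dict[str, object], scheduler: str = "slurm") -> str:
    """Render the parameters as the header of a batch script."""
    lines = ["#!/bin/bash"]
    for name, value in params.items():
        lines.append(DIRECTIVES[scheduler][name].format(value))
    return "\n".join(lines) + "\n"


def import_script(text: str, scheduler: str = "slurm") -> Dict[str, str]:
    """Recover known parameters from a script; values come back as strings."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        for name, template in DIRECTIVES[scheduler].items():
            prefix = template.format("")  # directive text before the value
            if line.startswith(prefix):
                params[name] = line[len(prefix):].strip()
    return params


if __name__ == "__main__":
    script = export_script({"nodes": 64, "output": "job.out", "error": "job.err"})
    print(script, end="")
    print(import_script(script))
```

Keeping the parameters in a plain dictionary means a template (incomplete) and a full script (complete) differ only in which keys have been set, so the same export step could serve both.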