- Submitting a job for immediate scheduler processing (not held)
- Ingest module sets the first event to “ingested” (or maybe “submitted”)
- Submitting a job as held
- Cancelling a job
- From submitted
- From allocated
- From running
- List active jobs (and estimate their current state)
- Am I in the queue, running, or finished/inactive?
- Wait for job to be inactive (and then get the result)
- Holding/releasing a job
- Job shell: needs to know when the job grows/shrinks
- Job shell: needs to know when it is going to be killed (pre-emptive queues)
- Job shell: needs to know when checkpoint/restart is occurring
- User changes the jobspec (increases the number of nodes/walltime or changes the executable)
- Sched needs to be informed of the change
- The interface needs to validate the new jobspec and replace the old one
- Scheduler: needs to obtain the set of active jobs (and continually monitor it) after a (re-)load
- Exec system: needs to know when resources are allocated to a job
- Sched/Exec: need to know when a job is cancelled
- Scheduler: needs to know when a job is released (unheld)
- Exec: needs to know when the job grows/shrinks (due to resources going up/down OR the sched adding/removing resources)
- Admins expedite the job (change the priority)
- Can insert the expedition/unexpedition into the event log
- Space delimiters between elements of an event, newline delimiters between events
- Timestamp
- Floating point seconds since epoch
- String representation
- At least millisecond resolution (if not more)
- Topic
- String
- Namespaced
- Freeform space is allowed (newlines not allowed)
- Implementation should scrub out newlines from freeform, to preserve integrity
- “Relevant topics” for various modules
- Scheduler: submitted
- Exec: allocated
- Is the “ingest.submit” pub/sub event really needed if the scheduler is using the multi-key kvs watch?
- How to prevent people from writing to keys of a job that is now inactive?
- Might need to add ENODIR
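
The event format notes above (space-delimited elements, newline-delimited events, a floating-point timestamp as a string, a topic, and a freeform remainder with newlines scrubbed) could be sketched roughly as follows. This is only an illustration under assumptions: the function names `encode_event`, `decode_event`, and `decode_log` are hypothetical, the six-decimal timestamp is one way to meet the "at least millisecond" requirement, and treating everything after the topic as freeform text is a guess at the intended layout.

```python
import time


def encode_event(topic, context="", timestamp=None):
    """Encode one event as 'timestamp topic [context]' plus a newline.

    Hypothetical helper: the element order and precision are assumptions.
    """
    if timestamp is None:
        timestamp = time.time()  # floating point seconds since epoch
    if not topic or any(c.isspace() for c in topic):
        raise ValueError("topic must be a non-empty string without whitespace")
    # Scrub newlines from the freeform part so one event stays on one line.
    context = context.replace("\r", " ").replace("\n", " ")
    line = f"{timestamp:.6f} {topic}"  # microsecond resolution
    if context:
        line += f" {context}"
    return line + "\n"


def decode_event(line):
    """Split one event line into (timestamp, topic, context)."""
    parts = line.rstrip("\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed event: {line!r}")
    timestamp = float(parts[0])
    topic = parts[1]
    # Spaces are allowed in the freeform remainder, so it is kept whole.
    context = parts[2] if len(parts) > 2 else ""
    return timestamp, topic, context


def decode_log(text):
    """Decode a whole log: events are separated by newlines only."""
    return [decode_event(line) for line in text.split("\n") if line]
```

A consumer such as the scheduler or exec system could then scan the decoded log for the topics it cares about (for example “submitted” or “allocated”), and the topic of the last event gives a rough estimate of a job's current state.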