@SteVwonder
Last active August 30, 2018 17:33
Notes from the Flux Team Brainstorming Sessions

Use-Cases

User-initiated

Handled by ingest module

  • Submitting a job for immediate scheduler processing (not held)
    • Ingest module creates the first event in the log: “ingested” (or maybe “submitted”)
  • Submitting a job as held

Proxied through job manager module

  • Cancelling a job
    • From submitted
    • From allocated
    • From running
  • List active jobs (and estimate their current state)
    • Am I in the queue, running, or finished/inactive?
  • Wait for job to be inactive (and then get the result)
  • Holding/releasing a job
  • Job shell: needs to know when the job grows/shrinks
  • Job shell: needs to know when it is going to be killed (pre-emptive queues)
  • Job shell: needs to know when checkpoint/restart is occurring
  • User changes the jobspec (increases the number of nodes/walltime or changes the executable)
    • Sched needs to be informed of the change
    • The interface needs to validate the new jobspec and replace the old one
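
The cancellation cases above (from submitted, from allocated, from running) can be sketched as a small state check. This is only an illustration: the `JobState` enum, the `cancel` helper, and the assumption that a cancelled job goes straight to inactive are hypothetical and not part of the notes or of any Flux API. Held jobs are left out.

```python
from enum import Enum


class JobState(Enum):
    # State names taken from the notes; the enum itself is hypothetical.
    SUBMITTED = "submitted"
    ALLOCATED = "allocated"
    RUNNING = "running"
    INACTIVE = "inactive"


# The notes list exactly these three states as valid starting points for a cancel.
CANCELLABLE = frozenset({JobState.SUBMITTED, JobState.ALLOCATED, JobState.RUNNING})


def cancel(state):
    """Return the state after a cancel request.

    Assumption: a cancelled job becomes inactive directly. Raises ValueError
    if the job is already inactive.
    """
    if state not in CANCELLABLE:
        raise ValueError(f"cannot cancel a job in state {state.value!r}")
    return JobState.INACTIVE


def is_active(state):
    """Answer the coarse question 'is this job still active?'."""
    return state is not JobState.INACTIVE
```

A real job manager would also have to notify sched/exec about the cancellation, which this sketch does not model.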

System

  • Scheduler: needs to obtain the set of active jobs (and continually monitor that) after a (re-)load
  • Exec system: needs to know when resources are allocated to a job
  • Sched/Exec: need to know when a job is cancelled
  • Scheduler: needs to know when a job is released (unheld)
  • Exec: needs to know when the job grows/shrinks (due to resources going up/down OR the sched adding/removing resources)
  • Admins expedite the job (change the priority)
    • Can insert the expedition/unexpedition into the event log

Event log format

  • Space delimiters between elements of an event, newline delimiters between events
  • Timestamp
    • Floating point seconds since epoch
    • String representation
    • At least millisecond resolution (if not more)
  • Topic
    • String
    • Namespaced
  • Freeform space is allowed (newlines not allowed)
    • Implementation should scrub out newlines from freeform, to preserve integrity
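
A minimal sketch of the format described above, assuming the field order is timestamp, topic, then the optional freeform part. The function names, the six-decimal timestamp precision, and the `job.submitted` style of namespacing used in the usage example are illustrative assumptions, not something the notes specify.

```python
import time


def format_event(topic, context="", timestamp=None):
    """Render one event as 'timestamp topic [context]', without a trailing newline."""
    if timestamp is None:
        timestamp = time.time()
    if not topic or any(ch.isspace() for ch in topic):
        raise ValueError("topic must be a non-empty string without whitespace")
    # Scrub newlines from the freeform part so one event always stays on one line.
    context = context.replace("\r", " ").replace("\n", " ").strip()
    # Six decimal places: microsecond resolution in the string representation.
    fields = [f"{timestamp:.6f}", topic]
    if context:
        fields.append(context)
    return " ".join(fields)


def parse_event(line):
    """Split one event line back into (timestamp, topic, context)."""
    parts = line.rstrip("\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed event: {line!r}")
    context = parts[2] if len(parts) == 3 else ""
    return float(parts[0]), parts[1], context


def parse_log(text):
    """Parse a newline-delimited event log, skipping empty lines."""
    return [parse_event(line) for line in text.split("\n") if line]
```

For example, `format_event("job.submitted", "user=alice\nnodes=4")` yields a single line in which the embedded newline has been replaced by a space, so appending it to a log cannot corrupt the neighbouring events.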

RFC 16

  • “Relevant topics” for various modules
    • Scheduler: submitted
    • Exec: allocated
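
One way to picture the "relevant topics" idea is a per-module filter over event log lines. This is a sketch under assumptions: the `RELEVANT_TOPICS` table and both helpers are hypothetical, the module keys `sched` and `exec` are made up for the example, and the topic names are copied from the list above without any namespace prefix.

```python
# Hypothetical table: which event topics each module reacts to.
RELEVANT_TOPICS = {
    "sched": frozenset({"submitted"}),
    "exec": frozenset({"allocated"}),
}


def topic_of(line):
    """Return the topic of an event line shaped like 'timestamp topic [freeform]'."""
    parts = line.split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed event: {line!r}")
    return parts[1]


def relevant_events(module, log_lines):
    """Keep only the event lines whose topic the given module cares about."""
    wanted = RELEVANT_TOPICS.get(module, frozenset())
    return [line for line in log_lines if topic_of(line) in wanted]
```

A module that is not in the table simply receives no events, which keeps the filter safe to call for unknown module names.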

Side-notes

  • Is the “ingest.submit” pub/sub event really needed if the scheduler is using the multi-key kvs watch?
  • How to prevent people from writing to the keys of a job that is now inactive?
    • Might need to add ENODIR