@Tabrizian
Last active January 5, 2020 22:55

Scheduling

Can we apply reinforcement learning to ML job scheduling while accounting for the case where jobs may be preempted?

| Paper | State Representation | Action Representation | Reward Function | RL Customizations | Policy Representation | Goal |
| --- | --- | --- | --- | --- | --- | --- |
| Resource Management with Deep Reinforcement Learning (HotNets '16) | An image showing the current resource allocation of the cluster and the resource demands of the jobs waiting to be scheduled | Whether to admit a job or not | Negative of the total slowdown | REINFORCE + subtracting a baseline to reduce the variance | Simple feed-forward network | Reducing the average job slowdown |
| Device Placement Optimization with Reinforcement Learning (ICML '17) | A computation graph + a list of devices | The given placement strategy | Square root of the total running time in a given placement, or a large constant for placements that raise an exception | REINFORCE + subtracting a baseline term | A sequence-to-sequence neural network | Reducing the average job completion time |
| A Hierarchical Model For Device Placement (ICLR '18) | A computation graph + a list of devices | The given placement strategy | Minus the square root of the total running time in a given placement, or a large constant for placements that raise an exception | REINFORCE + subtracting a baseline term | Two sequence-to-sequence neural networks, one for the placer and one for the optimizer | Reducing the average job completion time |
| Chic (IWQoS '19) | The training model, remaining batches, number of workers, number of servers | Whether to increase the number of PSs/workers or do nothing | Average completion time + a penalty for wrong actions | Cross-entropy method | Simple feed-forward network | Determining the PS/worker requirements |
| Harmony (INFOCOM '19) | Current job model mapping, worker/PS type requirement, worker/PS allocation, physical server resources, a matrix showing the current placement | Placement decision for the new jobs | Negative sum of the normalized training speed | REINFORCE + subtracting a baseline + epsilon exploration + experience replay | Simple feed-forward network | Reducing the interference |
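
Several of the approaches above pair REINFORCE with a variance-reduction term. The sketch below is a toy illustration of that idea, not the implementation from any of the listed papers: it trains a softmax policy over three hypothetical placements in a one-step setting, uses minus the square root of the running time as the reward, and subtracts a moving-average baseline from the reward before the gradient step. The function names, running times, learning rate, and decay factor are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_with_baseline(rewards, steps=2000, lr=0.1, decay=0.9, seed=0):
    """Train a softmax policy on a one-step problem with fixed rewards.

    rewards[i] is the (hypothetical) reward for choosing action i.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # moving average of observed rewards
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        # Subtracting the baseline lowers the variance of the update
        # without changing its expected direction.
        advantage = reward - baseline
        # For a softmax policy, d log pi(action) / d logit_i
        # equals onehot(action)_i - probs_i.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline = decay * baseline + (1.0 - decay) * reward
    return softmax(logits)


# Hypothetical running times (seconds) of three candidate placements.
running_times = [4.0, 1.0, 9.0]
rewards = [-math.sqrt(t) for t in running_times]
probs = reinforce_with_baseline(rewards)
best = probs.index(max(probs))
print("action probabilities:", probs)
print("preferred placement:", best)
```

In a real scheduler the policy would be a neural network conditioned on the cluster state and the reward would come from running or simulating the schedule; the bandit setting here only isolates the baseline-subtraction step.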