Can reinforcement learning be applied to scheduling ML jobs when jobs may be preempted?
Paper | State Representation | Action Representation | Reward Function | RL Customizations | Policy Representation | Goal |
---|---|---|---|---|---|---|
Resource Management with Deep Reinforcement Learning (HotNets '16) | An image of the cluster's current resource allocation and the resource demands of the jobs waiting to be scheduled | Which of the first M waiting jobs to schedule, or a void action that schedules nothing more in the current timestep | Negative of the total job slowdown | REINFORCE with a baseline subtracted from the return to reduce variance | Simple feed-forward network | Reducing the average job slowdown |
Device Placement Optimization with Reinforcement Learning (ICML '17) | A computation graph + a list of devices | A placement assigning each operation to a device | Square root of the running time of the placement (a cost to minimize), or a large constant for invalid placements (e.g., out of memory) | REINFORCE with a baseline subtracted to reduce variance | Sequence-to-sequence network | Reducing the running time of the placed computation graph |
A Hierarchical Model For Device Placement (ICLR '18) | A computation graph + a list of devices | A grouping of operations plus a device for each group | Negative square root of the running time of the placement, or a large penalty for invalid placements | REINFORCE with a baseline subtracted to reduce variance | Two networks trained jointly: a feed-forward Grouper that groups operations and a sequence-to-sequence Placer that assigns groups to devices | Reducing the running time of the placed computation graph |
Chic (IWQoS '19) | The training model, remaining batches, number of workers, number of servers | Add a PS, add a worker, or do nothing | Based on average completion time, with a penalty for wrong actions | Cross-entropy method | Simple feed-forward network | Determining how many PSs and workers each job needs |
Harmony (INFOCOM '19) | Current job-model mapping, worker/PS type requirements, worker/PS allocations, physical server resources, and a matrix of the current placement | Placement of newly arrived jobs | Sum of the normalized training speeds of the running jobs | REINFORCE with a baseline + epsilon-greedy exploration + experience replay | Simple feed-forward network | Reducing interference between co-located jobs |
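DeepRM, both device-placement papers, and Harmony are listed as training with REINFORCE while subtracting a baseline from the return to reduce variance. The sketch below illustrates that idea under simplifying assumptions: a stateless softmax policy with one logit per action and an exponential moving-average baseline. The papers use different baselines and real state encodings. The function name, hyperparameters, and the job durations in the usage are hypothetical.

```python
import math
import random


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_with_baseline(reward_fn, num_actions, episodes=2000, lr=0.1, beta=0.9, seed=0):
    """REINFORCE on a stateless softmax policy with a moving-average baseline.

    Subtracting the baseline b from the return keeps the gradient estimate
    unbiased (b does not depend on the current action) while lowering variance.
    """
    rng = random.Random(seed)
    theta = [0.0] * num_actions  # one logit per action (no state, for brevity)
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(theta)
        action = rng.choices(range(num_actions), weights=probs)[0]
        ret = reward_fn(action, rng)  # one-step episode, so return == reward
        advantage = ret - baseline
        for k in range(num_actions):
            # d/d theta_k of log softmax(theta)[action] = 1[k == action] - probs[k]
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log
        # Exponential moving average of the observed returns.
        baseline = beta * baseline + (1 - beta) * ret
    return softmax(theta)


# Hypothetical usage: choose which of three waiting jobs to run; the reward is
# the negative job duration plus noise, so the shortest job (index 1) is best.
durations = [5.0, 1.0, 3.0]
probs = reinforce_with_baseline(lambda a, rng: -durations[a] + rng.gauss(0.0, 0.5), 3)
print([round(p, 3) for p in probs])
```

Without the baseline, every return here is negative, so each sampled action would be pushed down on every step; the baseline turns the update into "better or worse than usual", which is what makes the signal less noisy.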
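The DeepRM reward, "negative of the total job slowdown", is delivered per timestep: each job still in the system contributes -1/T_j, where T_j is its ideal duration. Summed over an episode, a job present for C_j timesteps contributes -C_j/T_j, which is exactly its negative slowdown. A small worked check of that identity, using a hypothetical three-job schedule:

```python
from dataclasses import dataclass


@dataclass
class Job:
    arrival: int   # timestep the job enters the system
    finish: int    # timestep the job completes (exclusive)
    duration: int  # ideal duration T_j, i.e., running time with no waiting


def step_reward(jobs, t):
    # Per-timestep reward: -1/T_j for every job still in the system at t.
    return sum(-1.0 / j.duration for j in jobs if j.arrival <= t < j.finish)


def total_slowdown(jobs):
    # Slowdown of a job = completion time (finish - arrival) / ideal duration.
    return sum((j.finish - j.arrival) / j.duration for j in jobs)


# Hypothetical schedule: slowdowns are 4/2, 2/2, and 6/3.
jobs = [Job(0, 4, 2), Job(1, 3, 2), Job(0, 6, 3)]
horizon = max(j.finish for j in jobs)
episode_return = sum(step_reward(jobs, t) for t in range(horizon))
print(episode_return, total_slowdown(jobs))
```

Spreading the penalty over time gives the agent a dense signal at every step instead of a single reward when each job finishes.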
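Chic is listed as using the cross-entropy method. The table does not give Chic's training details, so the following is only a generic CEM sketch over a real-valued parameter vector, not Chic's implementation. In an RL setting, `score_fn(params)` would be the average episode return of a policy with those parameters. The quadratic objective, the function name, and all hyperparameters are hypothetical stand-ins.

```python
import random
import statistics


def cross_entropy_method(score_fn, dim, iterations=50, population=50,
                         elite_frac=0.2, init_std=10.0, seed=0):
    """Maximize score_fn over R^dim with the cross-entropy method (CEM).

    Each iteration samples candidates from an axis-aligned Gaussian, keeps the
    top elite_frac of them by score, and refits the Gaussian to that elite set.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        samples.sort(key=score_fn, reverse=True)  # best candidates first
        elite = samples[:n_elite]
        mean = [statistics.mean(col) for col in zip(*elite)]
        # The small constant keeps the distribution from collapsing to a point.
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elite)]
    return mean


# Hypothetical stand-in objective whose maximum is at `target`.
target = [3.0, -2.0]
best = cross_entropy_method(
    lambda x: -sum((xi - ti) ** 2 for xi, ti in zip(x, target)), dim=2)
print([round(b, 2) for b in best])
```

A wide initial standard deviation matters here: if the Gaussian shrinks before it covers the optimum, CEM can stall short of it, a known weakness of the method.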