Tabrizian/Scheduling.md

# Scheduling

Can reinforcement learning (RL) be applied to scheduling machine learning (ML) jobs when those jobs may be preempted?


| Paper | State Representation | Action Representation | Reward Function | RL Customizations | Policy Representation | Goal |
|---|---|---|---|---|---|---|
| Resource Management with Deep Reinforcement Learning (HotNets '16) | An image showing the cluster's current resource allocation and the resource demands of the jobs waiting to be scheduled | Whether to admit a job or not | Negative of the total slowdown | REINFORCE with a baseline subtracted from the return to reduce variance | Simple feed-forward network | Reducing the average job slowdown |
| Device Placement Optimization with Reinforcement Learning (ICML '17) | A computation graph + a list of devices | A placement of the graph's operations onto the devices | Square root of the total running time of the placement, or a large constant for placements that fail to execute | REINFORCE with a baseline subtracted | A sequence-to-sequence neural network | Reducing the average job completion time |
| A Hierarchical Model For Device Placement (ICLR '18) | A computation graph + a list of devices | A placement of the graph's operations onto the devices | Negative square root of the total running time of the placement, or a large constant for placements that fail to execute | REINFORCE with a baseline subtracted | Two networks: a feed-forward grouper that assigns operations to groups and a sequence-to-sequence placer that assigns groups to devices | Reducing the average job completion time |
| Chic (IWQoS '19) | The training model, remaining batches, number of workers, and number of servers | Whether to add a parameter server (PS), add a worker, or do nothing | Average completion time plus a penalty for wrong actions | Cross-entropy method | Simple feed-forward network | Determining the number of PSs and workers each job needs |
| Harmony (INFOCOM '19) | Current job-model mapping, worker/PS type requirements, worker/PS allocation, physical server resources, and a matrix of the current placement | Placement of newly arriving jobs | Negative sum of the normalized training speeds | REINFORCE with a baseline subtracted, ε-greedy exploration, and experience replay | Simple feed-forward network | Reducing interference among co-located jobs |
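Four of the five papers (all but Chic) train with REINFORCE and subtract a baseline from the reward to reduce the variance of the gradient estimate. The sketch below is a deliberately tiny, stateless version of that update, assuming a softmax policy over three candidate placements with made-up running times, a reward equal to the negative square root of the running time (mirroring the device placement rows), and an exponential moving average of past rewards as the baseline. The function names, running times, and hyperparameters are hypothetical, and the papers compute their baselines in different ways; this illustrates the update rule only, not any paper's training setup.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_with_baseline(run_times, steps=3000, lr=0.1, beta=0.9, seed=0):
    """Toy REINFORCE over a fixed set of candidate placements (hypothetical setup).

    run_times: made-up running time of each candidate placement.
    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(run_times)
    baseline = None  # exponential moving average of past rewards
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one placement from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # Negative square root of running time: faster placements score higher.
        reward = -math.sqrt(run_times[action])
        if baseline is None:
            baseline = reward
        # The baseline depends only on past rewards, so subtracting it
        # reduces variance without biasing the gradient estimate.
        advantage = reward - baseline
        # For a softmax policy, d log pi(action) / d logit_k = 1[k == action] - pi_k.
        for k in range(len(logits)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_pi
        # Update the baseline only after using it for this step's advantage.
        baseline = beta * baseline + (1 - beta) * reward
    return softmax(logits)


probs = reinforce_with_baseline([9.0, 4.0, 16.0])
print([round(p, 3) for p in probs])
```

With these running times the middle placement (running time 4.0) is the fastest, so it should end up with most of the probability mass. In the papers the logits come from a neural network conditioned on the state, and the same `advantage * grad_log_pi` term is backpropagated into the network weights instead of updating the logits directly.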
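Chic is the only paper in the table that uses the cross-entropy method (CEM) instead of a policy gradient. As a rough illustration of the generic CEM loop (sample candidates, keep the elite fraction, refit the sampling distribution to the elites), the sketch below searches over the number of workers and PSs for a single job against a made-up completion-time model. The cost model, function names, option ranges, and hyperparameters are all hypothetical. Chic pairs CEM with a feed-forward policy network, so its actual procedure differs: this sketch applies CEM directly to the configuration rather than to network weights.

```python
import random


def completion_time(workers, ps):
    """Hypothetical completion-time model, made up for illustration:
    compute time shrinks with more workers, PS traffic grows with the
    number of workers per PS, and each PS adds a fixed overhead."""
    return 48 / workers + 2 * workers / ps + 3 * ps


def cross_entropy_method(cost, worker_options, ps_options, iters=20,
                         samples=100, elite_frac=0.2, alpha=0.7, seed=0):
    """Minimize cost(workers, ps) with CEM over two independent categoricals."""
    rng = random.Random(seed)
    # Start from uniform sampling distributions.
    w_probs = [1 / len(worker_options)] * len(worker_options)
    p_probs = [1 / len(ps_options)] * len(ps_options)
    n_elite = max(1, int(samples * elite_frac))
    best, best_cost = None, float("inf")
    for _ in range(iters):
        # 1. Sample candidate (workers, ps) configurations.
        batch = [(rng.choices(worker_options, weights=w_probs)[0],
                  rng.choices(ps_options, weights=p_probs)[0])
                 for _ in range(samples)]
        # 2. Keep the elite fraction with the lowest cost.
        batch.sort(key=lambda wp: cost(*wp))
        elites = batch[:n_elite]
        if cost(*elites[0]) < best_cost:
            best, best_cost = elites[0], cost(*elites[0])
        # 3. Refit each categorical to the elite frequencies, smoothed by alpha.
        w_freq = [sum(w == opt for w, _ in elites) / n_elite for opt in worker_options]
        p_freq = [sum(p == opt for _, p in elites) / n_elite for opt in ps_options]
        w_probs = [alpha * f + (1 - alpha) * q for f, q in zip(w_freq, w_probs)]
        p_probs = [alpha * f + (1 - alpha) * q for f, q in zip(p_freq, p_probs)]
    return best, best_cost


best, best_cost = cross_entropy_method(completion_time, list(range(1, 9)), list(range(1, 5)))
print(best, round(best_cost, 3))
```

Tracking the best configuration seen so far is a common convenience; the smoothing factor `alpha` keeps the sampling distributions from collapsing onto the first batch of elites too early.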