Example of how one could abstract away high-level options to leverage MDP solvers. For context, an option can be thought of as a tuple that includes the following (a sketch of one possible representation follows the list):

  • an initiation set, i.e., where you can start executing the option,
  • a policy, i.e., which primitive actions are chosen while the option is active,
  • and state- (and possibly also action-) dependent termination probabilities, i.e., when you stop following the currently active option.

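For reference, this tuple could be represented directly in code. The following is a minimal sketch; the Option class and its field names are illustrative and not part of the snippet below:

from dataclasses import dataclass
from typing import Any, Callable, Set

State = Any
Action = Any

@dataclass
class Option:
    # Initiation set: states in which the option may be started.
    initiation_set: Set[State]
    # Intra-option policy: maps the current state to a primitive action.
    policy: Callable[[State], Action]
    # Termination condition: probability of stopping in a given state.
    termination_prob: Callable[[State], float]
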
The advantage of such an abstraction is that you can simply treat options as regular actions and solve the high-level problem as you would any MDP, provided the solver supports variable discounts: an option that runs for k primitive steps before terminating contributes a discount of gamma^k, so the effective discount varies from one high-level transition to the next.
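
To make the variable-discount requirement concrete, here is a minimal tabular sketch (my own illustration, not code from this gist). It assumes you have an option-level reward table R and a discounted transition model F that already folds the option's random duration into the transition probabilities:

import numpy as np

def smdp_value_iteration(F, R, num_iters=1000):
    """Value iteration that treats options as regular actions.

    R[s, o]:     expected discounted reward accumulated while option o runs
    F[s, o, t]:  discounted transition model, E[gamma**k * 1{o ends in state t}],
                 where k is the option's (variable) duration
    """
    V = np.zeros(F.shape[0])
    for _ in range(num_iters):
        # Standard Bellman backup; the per-transition discount lives inside F.
        Q = R + np.einsum("sot,t->so", F, V)
        V = Q.max(axis=1)
    return V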

The generative interface offers:

def build_eager_policy(name, postprocess_fn, loss_fn, stats_fn):
    class EagerPolicy:
        def __init__(self, action_space, obs_space):
            self.model = get_model(action_space, obs_space)
            self.optimizer = make_optimizer()

        def postprocess_trajectory(self, batch):
            # Delegate trajectory postprocessing to the injected function.
            return postprocess_fn(batch)

    # loss_fn and stats_fn are presumably wired up in methods omitted here;
    # the original snippet is truncated.
    EagerPolicy.__name__ = name
    return EagerPolicy
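
A hypothetical call site might then look like the following; the three injected functions and the spaces are placeholders, since this section does not show real ones:

# Hypothetical usage; all arguments below are placeholders.
MyPolicy = build_eager_policy(
    name="MyPolicy",
    postprocess_fn=lambda batch: batch,     # e.g., compute advantages here
    loss_fn=lambda policy, batch: 0.0,      # assumed signature
    stats_fn=lambda policy, batch: {},      # assumed signature
)
policy = MyPolicy(action_space, obs_space)  # spaces from your environment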