Example of how one could abstract away high-level options to leverage MDP solvers. For context, an option can be thought of as a tuple that includes the following (a sketch of one possible representation follows the list):

  • an initiation set, i.e., where you can start executing the option,
  • a policy, i.e., which primitive actions are chosen while the option is active,
  • and state- (and possibly also action-) dependent termination probabilities, i.e., when you stop following the currently active option.

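For reference, this tuple could be represented directly in code. The following is a minimal sketch; the Option class and its field names are illustrative and not part of the snippet below:

from dataclasses import dataclass
from typing import Any, Callable, Set

State = Any
Action = Any

@dataclass
class Option:
    # Initiation set: states in which the option may be started.
    initiation_set: Set[State]
    # Intra-option policy: maps the current state to a primitive action.
    policy: Callable[[State], Action]
    # Termination condition: probability of stopping in a given state.
    termination_prob: Callable[[State], float]
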
The advantage of such an abstraction is that you can simply treat options as regular actions and solve the high-level problem as you would any MDP, provided the solver supports variable discounts: an option that runs for k primitive steps before terminating contributes a discount of gamma^k, so the effective discount varies from one high-level transition to the next.
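
To make the variable-discount requirement concrete, here is a minimal tabular sketch (my own illustration, not code from this gist). It assumes you have an option-level reward table R and a discounted transition model F that already folds the option's random duration into the transition probabilities:

import numpy as np

def smdp_value_iteration(F, R, num_iters=1000):
    """Value iteration that treats options as regular actions.

    R[s, o]:     expected discounted reward accumulated while option o runs
    F[s, o, t]:  discounted transition model, E[gamma**k * 1{o ends in state t}],
                 where k is the option's (variable) duration
    """
    V = np.zeros(F.shape[0])
    for _ in range(num_iters):
        # Standard Bellman backup; the per-transition discount lives inside F.
        Q = R + np.einsum("sot,t->so", F, V)
        V = Q.max(axis=1)
    return V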

The generative interface offers:

def build_eager_policy(name, postprocess_fn, loss_fn, stats_fn):
    class EagerPolicy:
        def __init__(self, action_space, obs_space):
            self.model = get_model(action_space, obs_space)
            self.optimizer = make_optimizer()

        def postprocess_trajectory(self, batch):
            # Delegate trajectory postprocessing to the injected function.
            return postprocess_fn(batch)

    # loss_fn and stats_fn are presumably wired up in methods omitted here;
    # the original snippet is truncated.
    EagerPolicy.__name__ = name
    return EagerPolicy
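
A hypothetical call site might then look like the following; the three injected functions and the spaces are placeholders, since this section does not show real ones:

# Hypothetical usage; all arguments below are placeholders.
MyPolicy = build_eager_policy(
    name="MyPolicy",
    postprocess_fn=lambda batch: batch,     # e.g., compute advantages here
    loss_fn=lambda policy, batch: 0.0,      # assumed signature
    stats_fn=lambda policy, batch: {},      # assumed signature
)
policy = MyPolicy(action_space, obs_space)  # spaces from your environment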