@revanurambareesh
Created January 1, 2019 16:21
MDP State object
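from typing import Dict, List

import numpy as np

# Player, State_Desc, Action, Qvalue_Frequency and RewardType are defined elsewhere in the project.


class State:
    def __init__(self, player: Player, state_desc: State_Desc):
        # A state description must have exactly 9 entries.
        assert len(state_desc) == 9
        # An action is possible only where the matching entry is still '-'.
        self.possible_actions: List = [action.name for i, action in enumerate(Action) if state_desc[i] == '-']
        # One Qvalue_Frequency record per possible action, built from (RewardType(0), 0).
        self.p: Dict = {action: Qvalue_Frequency((RewardType(0), 0)) for action in self.possible_actions}
        self.v_star = -np.inf
        self.state_desc: State_Desc = state_desc
        self.policy = None  # Represents best action to be taken
        self.player: Player = player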
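
The snippet relies on types that are not shown here, so the following is only a minimal, self-contained sketch of the same idea. It assumes a state description is a 9-character string and uses a hypothetical nine-member `Action` enum (`CELL_0` to `CELL_8`) as a stand-in for the real one. The `greedy_policy` helper is also an assumption: it shows one plausible way `v_star` and `policy` could be filled in from per-action Q-values, starting from the same initial values (`-inf` and `None`) as the class above.

```python
import math
from enum import Enum
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the Action enum: one member per entry, in index order.
Action = Enum("Action", [f"CELL_{i}" for i in range(9)])


def possible_actions(state_desc: str) -> List[str]:
    """Names of the actions whose entry in state_desc is still '-'."""
    assert len(state_desc) == 9
    return [action.name for i, action in enumerate(Action) if state_desc[i] == "-"]


def greedy_policy(q_values: Dict[str, float]) -> Tuple[float, Optional[str]]:
    """Return (v_star, policy): the largest Q-value and the action that attains it.

    Assumption: v_star is the maximum Q-value over the state's actions.
    With no actions, the initial values (-inf, None) are returned unchanged.
    """
    v_star, policy = -math.inf, None
    for action, q in q_values.items():
        if q > v_star:
            v_star, policy = q, action
    return v_star, policy


actions = possible_actions("XO-------")  # first two entries are already taken
q_values = {name: 0.0 for name in actions}
q_values["CELL_4"] = 0.5  # pretend one action has a better estimate
print(actions)
print(greedy_policy(q_values))
```

Keeping `possible_actions` as a pure function of the state description makes the filtering easy to check in isolation, before any Q-value bookkeeping is attached to it.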