@eerkaijun
Created September 14, 2020 03:06
# The agent takes one step per time step: it chooses the next action and
# performs a SARSA-style update of the network for the previous step.
def agent_step(self, reward, state):
    # reward (r_t) is the reward obtained from the previous step; state (s_{t+1}) is the state for the current step
    act_values = self.model.predict(state)[0]  # action values for the current time step
    action = self.agent_take_action(act_values)  # action chosen in the current time step
    # Update the neural network using the previous step's transition
    target = reward + self.discount * act_values[action]
    target_f = self.model.predict(self.prev_state)  # action values for the previous step
    target_f[0][self.prev_action] = target  # overwrite only the value of the action actually taken
    self.model.fit(self.prev_state, target_f)
    # Remember this step for the next update and hand the chosen action back to the environment
    self.prev_state = state
    self.prev_action = action
    return action
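The update above bootstraps from the value of the action actually selected for the next step, `r_t + discount * Q(s_{t+1}, a_{t+1})`, which is the SARSA (on-policy) target rather than Q-learning's max over actions. Because `target_f` equals the network's current predictions everywhere except at `prev_action`, the regression error, and therefore the gradient, is nonzero only for that one output. The sketch below shows the step running end to end under stated assumptions: `LinearModel` is a hypothetical NumPy stand-in for the Keras model (same `predict`/`fit` on a batch of shape `(1, n_features)`), and the epsilon-greedy `agent_take_action`, `agent_start`, and all hyperparameters are hypothetical, since the gist does not show them.

```python
import numpy as np


class LinearModel:
    """Hypothetical stand-in for a Keras model: linear action values,
    predict/fit on batches of shape (batch, n_features)."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = np.zeros((n_features, n_actions))
        self.lr = lr

    def predict(self, x):
        return x @ self.w  # returns a new array of shape (batch, n_actions)

    def fit(self, x, y):
        # One gradient step on the squared error between predictions and targets
        error = self.predict(x) - y
        self.w -= self.lr * (x.T @ error) / len(x)


class SarsaAgent:
    def __init__(self, n_features, n_actions, discount=0.9, epsilon=0.1, seed=0):
        self.model = LinearModel(n_features, n_actions)
        self.discount = discount
        self.epsilon = epsilon
        self.n_actions = n_actions
        self.rng = np.random.default_rng(seed)

    def agent_take_action(self, act_values):
        # Hypothetical epsilon-greedy policy over the predicted action values
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_actions))
        return int(np.argmax(act_values))

    def agent_start(self, state):
        # First step: no previous transition yet, so just pick and remember an action
        self.prev_state = state
        self.prev_action = self.agent_take_action(self.model.predict(state)[0])
        return self.prev_action

    def agent_step(self, reward, state):
        act_values = self.model.predict(state)[0]
        action = self.agent_take_action(act_values)
        # SARSA target: bootstrap from the action the policy actually chose
        target = reward + self.discount * act_values[action]
        target_f = self.model.predict(self.prev_state)
        target_f[0][self.prev_action] = target
        self.model.fit(self.prev_state, target_f)
        self.prev_state, self.prev_action = state, action
        return action


# Usage: two one-hot states, greedy policy (epsilon=0) for a deterministic run
agent = SarsaAgent(n_features=2, n_actions=2, discount=0.9, epsilon=0.0)
s0 = np.array([[1.0, 0.0]])
s1 = np.array([[0.0, 1.0]])
a0 = agent.agent_start(s0)
a1 = agent.agent_step(1.0, s1)
# Q(s0, a0) has moved from 0.0 toward the target 1.0 by one learning-rate step
print(agent.model.predict(s0)[0])
```

With all weights starting at zero, the target for the first transition is `1.0 + 0.9 * 0.0 = 1.0`, and one step with learning rate 0.1 moves `Q(s0, 0)` to 0.1 while leaving `Q(s0, 1)` untouched.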