@icoxfog417
Last active March 19, 2019 23:05
Actor Critic
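def update(self, states, actions, rewards):
    # The critic estimates the value of each state.
    values = self.critic(states)
    advantages = rewards - values
    action_probs = self.actor(states)
    # Probability the actor assigned to the action actually taken.
    selected_action_probs = action_probs[self.to_one_hot(actions)]
    neg_logs = -log(selected_action_probs)
    # If backprop is executed on policy_loss as written, its gradient will
    # also affect the critic, because advantages depends on values!
    policy_loss = reduce_mean(neg_logs * advantages)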
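
The warning in the comment can be checked numerically. The following is a minimal, framework-free sketch, not the gist author's implementation: the linear actor (`theta`), the linear critic (`w`), the `w_frozen` argument, and the sample data are all hypothetical, and a finite difference stands in for backprop. It estimates how much the policy loss changes when the critic weight changes, first with the advantage computed from the live critic, then with the advantage computed from a constant copy, which imitates what `tf.stop_gradient` in TensorFlow or `.detach()` in PyTorch would do.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_loss(theta, w, states, actions, rewards, w_frozen=None):
    """Mean of -log pi(a|s) * advantage for a toy linear actor and critic.

    theta:    per-action weights, logits = [t * s for t in theta] (hypothetical actor)
    w:        critic weight, value = w * s (hypothetical critic)
    w_frozen: if given, the advantage uses this constant copy of the critic
              weight, imitating a stop-gradient on the critic output.
    """
    critic_w = w if w_frozen is None else w_frozen
    total = 0.0
    for s, a, r in zip(states, actions, rewards):
        advantage = r - critic_w * s
        probs = softmax([t * s for t in theta])
        total += -math.log(probs[a]) * advantage
    return total / len(states)


def grad_wrt_critic(loss_fn, w, eps=1e-6):
    """Central finite-difference estimate of d loss / d w."""
    return (loss_fn(w + eps) - loss_fn(w - eps)) / (2 * eps)


# Made-up batch, for illustration only.
states = [1.0, 2.0, 0.5]
actions = [0, 1, 0]
rewards = [1.0, 0.0, 2.0]
theta = [0.3, -0.2]
w = 0.5

# Advantage depends on the live critic weight: the policy loss moves the critic.
leaky = grad_wrt_critic(
    lambda x: policy_loss(theta, x, states, actions, rewards), w
)
# Advantage uses a frozen copy: the policy loss no longer depends on the critic.
blocked = grad_wrt_critic(
    lambda x: policy_loss(theta, x, states, actions, rewards, w_frozen=w), w
)

print("critic gradient from policy_loss, advantage not detached:", leaky)
print("critic gradient from policy_loss, advantage detached:    ", blocked)
```

With the frozen copy, the critic receives no gradient from the policy loss and would be trained only by its own value loss, which the snippet above does not show.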