@icoxfog417
Created March 19, 2019 23:11
def update(self, states, actions, rewards):
    values = self.critic(states)
    # stop_gradient keeps the policy loss from backpropagating into the critic
    advantages = rewards - tf.stop_gradient(values)
    action_probs = self.actor(states)
    # Pick out the probability of each taken action with a one-hot mask
    one_hot = tf.one_hot(actions, depth=action_probs.shape[-1])
    selected_action_probs = tf.reduce_sum(action_probs * one_hot, axis=1)
    neg_logs = -tf.math.log(selected_action_probs)
    policy_loss = tf.reduce_mean(neg_logs * advantages)
    return policy_loss
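The same advantage-weighted policy-gradient loss can be sketched in plain NumPy to check the arithmetic (the probabilities, rewards, and value estimates below are made-up illustration values, and `policy_loss` is a hypothetical helper, not part of the gist's class):

```python
import numpy as np

def policy_loss(action_probs, actions, rewards, values):
    # Advantage: how much better the observed return is than the critic's
    # estimate. Treating values as constants mirrors tf.stop_gradient above.
    advantages = rewards - values
    # One-hot mask selects the probability assigned to each taken action
    one_hot = np.eye(action_probs.shape[1])[actions]
    selected = np.sum(action_probs * one_hot, axis=1)
    neg_logs = -np.log(selected)
    return np.mean(neg_logs * advantages)

probs = np.array([[0.2, 0.8],   # state 0: action 1 is likely
                  [0.5, 0.5]])  # state 1: uniform
loss = policy_loss(probs,
                   actions=np.array([1, 0]),
                   rewards=np.array([1.0, 0.0]),
                   values=np.array([0.5, 0.5]))
```

A positive advantage increases the weight on `-log pi(a|s)`, pushing the taken action's probability up; a negative advantage pushes it down.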