@icoxfog417
Created March 19, 2019 23:15
def update(self, states, actions, rewards, values):
    # Values (or advantages) are computed outside of the update step.
    advantages = rewards - values
    action_probs = self.actor(states)
    # Keep only the probability of the action actually taken in each state.
    selected_action_probs = action_probs[self.to_one_hot(actions)]
    neg_logs = - log(selected_action_probs)
    policy_loss = reduce_mean(neg_logs * advantages)
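
The snippet above is framework-style pseudocode (`log`, `reduce_mean`, and `self.to_one_hot` are not defined in it). As a rough illustration only, the same loss can be sketched in plain Python on a tiny hand-made batch. The function name `policy_loss` and the sample numbers are assumptions made for this example and are not part of the original gist.

```python
import math


def policy_loss(action_probs, actions, rewards, values):
    """Mean of -log pi(a|s) * advantage over a batch of plain Python lists.

    Hypothetical stand-alone version of the update above: no actor network,
    the action probabilities are passed in directly.
    """
    # Advantage: how much better the observed reward was than the value estimate.
    advantages = [r - v for r, v in zip(rewards, values)]
    # Probability the policy assigned to the action actually taken in each state.
    selected = [probs[a] for probs, a in zip(action_probs, actions)]
    neg_logs = [-math.log(p) for p in selected]
    # Average the weighted negative log-probabilities over the batch.
    return sum(n * adv for n, adv in zip(neg_logs, advantages)) / len(actions)


# Made-up batch of two states with two possible actions each.
action_probs = [[0.5, 0.5], [0.25, 0.75]]
actions = [0, 1]
rewards = [1.0, 0.0]
values = [0.0, 1.0]

loss = policy_loss(action_probs, actions, rewards, values)
print(loss)
```

Here the first sample has a positive advantage, so minimizing the loss pushes the probability of action 0 up, while the second has a negative advantage, so the probability of action 1 is pushed down.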