
@Danaze
Created August 25, 2020 16:23
import numpy as np

def QLupdate(self, state, action, reward, new_state):
    # Q-learning (off-policy): move Q(s, a) toward r + discount * max over a' of Q(s', a')
    self.Q_table[state][action] += self.learning_rate * (reward + self.discount * np.max(self.Q_table[new_state]) - self.Q_table[state][action])

def SARSAupdate(self, state, action, reward, new_state, next_action):
    # SARSA (on-policy): move Q(s, a) toward r + discount * Q(s', a'), where a' is the action actually taken next
    self.Q_table[state][action] += self.learning_rate * (reward + self.discount * self.Q_table[new_state][next_action] - self.Q_table[state][action])
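The two methods differ only in the bootstrap term: Q-learning uses the best next-state value, SARSA uses the value of the next action the agent actually chose. The sketch below shows one update of each rule on the same transition so the difference is visible. It assumes the methods live on a class holding a NumPy `Q_table` plus `learning_rate` and `discount` attributes; the class name `TabularAgent`, the helper `make_agent`, and the numbers used are hypothetical, chosen only for illustration.

```python
import numpy as np


class TabularAgent:
    """Hypothetical container for the attributes the gist's methods expect."""

    def __init__(self, n_states, n_actions, learning_rate=0.5, discount=0.9):
        self.Q_table = np.zeros((n_states, n_actions))  # Q(s, a), initialised to 0
        self.learning_rate = learning_rate  # step size (alpha)
        self.discount = discount            # discount factor (gamma)

    def QLupdate(self, state, action, reward, new_state):
        # Off-policy target: greedy (max) value of the next state
        target = reward + self.discount * np.max(self.Q_table[new_state])
        self.Q_table[state][action] += self.learning_rate * (target - self.Q_table[state][action])

    def SARSAupdate(self, state, action, reward, new_state, next_action):
        # On-policy target: value of the action actually chosen in the next state
        target = reward + self.discount * self.Q_table[new_state][next_action]
        self.Q_table[state][action] += self.learning_rate * (target - self.Q_table[state][action])


def make_agent():
    # Hypothetical setup: 3 states, 2 actions, next state 1 has Q values [1.0, 3.0]
    agent = TabularAgent(n_states=3, n_actions=2)
    agent.Q_table[1] = [1.0, 3.0]
    return agent


# Same transition (s=0, a=0, r=1, s'=1) for both rules;
# SARSA is told the agent's next action was a'=0 (the lower-valued one)
ql = make_agent()
ql.QLupdate(state=0, action=0, reward=1.0, new_state=1)

sarsa = make_agent()
sarsa.SARSAupdate(state=0, action=0, reward=1.0, new_state=1, next_action=0)

print(ql.Q_table[0, 0], sarsa.Q_table[0, 0])
```

With a learning rate of 0.5 and a discount of 0.9, Q-learning moves Q(0, 0) halfway toward 1 + 0.9 × 3 = 3.7 (about 1.85), while SARSA moves it halfway toward 1 + 0.9 × 1 = 1.9 (about 0.95). SARSA therefore reflects the exploratory action the policy actually took, whereas Q-learning always assumes the greedy action will follow.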