@ikbendewilliam
Created February 17, 2021 10:47
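import numpy as np

def train_single_step(self, state0, state1, a, reward, maximum_discount):
    # Current Q-value estimates for state0, shape (1, n_actions).
    Q0 = self.predict(state0)
    # Best Q-value reachable from state1; np.argmax would return an action index, not a value.
    Q1 = np.max(self.predict(state1)[0])
    # Bellman target for the action actually taken; other actions keep their predictions.
    Q0[0][a] = reward + maximum_discount * Q1
    self.model.fit(np.array(state0).reshape(1, -1), Q0, epochs=1, verbose=0)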
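
The target computation in `train_single_step` can be checked in isolation. The following is a minimal sketch that uses plain NumPy arrays in place of the model's predictions; the helper name `q_target` and the sample Q-values are hypothetical and not part of the original gist.

```python
import numpy as np

def q_target(q_current, q_next, action, reward, discount):
    """Return a copy of q_current with the taken action's entry set to the Bellman target."""
    # Copy so the caller's prediction array is not modified in place.
    target = np.array(q_current, dtype=float)
    # Standard Q-learning target: reward plus discounted best next-state value.
    target[0][action] = reward + discount * np.max(q_next[0])
    return target

# Hypothetical predictions for a 3-action problem (stand-ins for self.predict output).
q0 = np.array([[0.2, 0.5, 0.1]])   # Q-values for state0
q1 = np.array([[1.0, 3.0, 2.0]])   # Q-values for state1

target = q_target(q0, q1, action=0, reward=1.0, discount=0.9)
print(target)
```

Here the best next-state value is 3.0, so the entry for action 0 becomes 1.0 + 0.9 * 3.0. Using `np.argmax` instead would substitute the index 1 for that value, which is why the distinction matters for training.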