@NMZivkovic
Created July 7, 2019 12:29
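import random
import numpy as np

# Method of the agent class: fit the Q-network on a random minibatch of stored transitions.
def retrain(self, batch_size):
    minibatch = random.sample(self.expirience_replay, batch_size)
    for state, action, reward, next_state, terminated in minibatch:
        # Current Q-value estimates for this state; only the taken action's entry is updated.
        target = self.q_network.predict(state)
        if terminated:
            # Terminal transition: there is no future reward to bootstrap from.
            target[0][action] = reward
        else:
            # Bellman target: reward plus the discounted best Q-value from the target network.
            t = self.target_network.predict(next_state)
            target[0][action] = reward + self.gamma * np.amax(t)
        self.q_network.fit(state, target, epochs=1, verbose=0)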
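
The loop in retrain computes one target per transition. As a rough sketch only, the same target rule can be written for a whole batch in NumPy, assuming the Q-values for the states and next states are already available as arrays. The function name `dqn_targets` and all of its parameters are hypothetical and not part of the gist; no Keras model is involved here.

```python
import numpy as np

def dqn_targets(q_values, next_q_values, actions, rewards, terminated, gamma):
    """Copy q_values and overwrite each taken action's entry with its target.

    Hypothetical helper: q_values and next_q_values are (batch, n_actions)
    arrays standing in for the outputs of the Q-network and target network.
    """
    targets = np.array(q_values, dtype=float, copy=True)
    rewards = np.asarray(rewards, dtype=float)
    terminated = np.asarray(terminated, dtype=bool)
    # Discounted best next-state value, zeroed for terminal transitions.
    bootstrap = np.where(terminated, 0.0, gamma * np.max(next_q_values, axis=1))
    # Update only the entry of the action taken in each row.
    targets[np.arange(len(actions)), actions] = rewards + bootstrap
    return targets

# Toy batch of two transitions with two actions each (made-up numbers).
q = np.array([[1.0, 2.0], [3.0, 4.0]])
next_q = np.array([[0.5, 1.5], [2.0, 0.0]])
targets = dqn_targets(q, next_q, actions=[0, 1], rewards=[1.0, -1.0],
                      terminated=[False, True], gamma=0.9)
print(targets)
```

The first row bootstraps from the best next-state value (1.0 + 0.9 * 1.5), while the second row is terminal and keeps only the reward, which mirrors the two branches of the loop.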