@AurelianTactics
Created December 18, 2017 14:13
import numpy as np

# Unpack the sampled replay-memory minibatch into per-field NumPy arrays.
states_batch, action_batch, reward_batch, next_states_batch, done_batch = map(np.array, zip(*minibatch))
q_values_next = target_model.predict(next_states_batch, batch_size=BATCH)
targets = np.zeros((BATCH, ACTIONS))  # one row per sample, one column per action
ti_tuple = np.arange(BATCH)  # row indices for fancy indexing into targets
# done_batch is assumed stored as 0 for terminal transitions, so a terminal target reduces to the reward
targets[ti_tuple, action_batch] = reward_batch + done_batch * GAMMA * np.amax(q_values_next, axis=1)
loss += model.train_on_batch(states_batch, targets)
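A minimal, self-contained sketch of the Q-learning target computation above, with the minibatch fields and the target network's predictions stubbed out as hand-written arrays (the `done_batch` values are assumed to be 0 for terminal transitions and 1 otherwise, matching how the snippet multiplies them in directly):

```python
import numpy as np

BATCH, ACTIONS, GAMMA = 4, 2, 0.99

# Stubbed minibatch fields (normally sampled from replay memory).
action_batch = np.array([0, 1, 1, 0])
reward_batch = np.array([1.0, 0.0, 0.5, 1.0])
done_batch = np.array([1, 1, 0, 1])  # 0 marks a terminal transition

# Stand-in for target_model.predict(next_states_batch, batch_size=BATCH).
q_values_next = np.array([[0.2, 0.4],
                          [0.1, 0.3],
                          [0.5, 0.6],
                          [0.7, 0.2]])

# Write the Bellman target only into the column of the action actually taken.
targets = np.zeros((BATCH, ACTIONS))
targets[np.arange(BATCH), action_batch] = (
    reward_batch + done_batch * GAMMA * np.amax(q_values_next, axis=1)
)

print(targets[0, 0])  # 1.0 + 1 * 0.99 * 0.4 = 1.396
print(targets[2, 1])  # 0.5 (terminal transition: future reward masked out)
```

Note that the untaken-action columns stay at zero, so `train_on_batch` also pulls those Q-value outputs toward zero; some DQN implementations instead fill those columns with the current model's own predictions so only the taken action's error contributes to the loss.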