@pragatibaheti
Created March 11, 2020 21:21
# Bellman equation: the Q-value for (current_state, action) is the immediate
# reward plus gamma times max_value, the maximum future reward.
Q[current_state, action] = R[current_state, action] + gamma * max_value
# Log the value just written to Q (the updated Q-value, not max_value itself).
print('updated Q-value', Q[current_state, action])
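# Return a score: the sum of Q with every entry expressed as a percentage of
# the largest entry. Return 0 while Q has no positive entry, which also avoids
# dividing by zero.
if np.max(Q) > 0:
    return np.sum(Q / np.max(Q) * 100)
else:
    return 0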
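
The fragment above relies on names defined elsewhere (`Q`, `R`, `max_value`, and an enclosing function for the `return` statements). A minimal self-contained sketch of how the pieces might fit together follows. It is not the gist author's full code: the function signature, the 3-state reward matrix, and the rule that taking `action` moves the agent to the state with the same index are all assumptions made for illustration.

```python
import numpy as np


def update(Q, R, current_state, action, gamma):
    """Apply one Bellman update to Q in place and return a normalized score."""
    # Assumption: taking `action` moves the agent to the state with the same
    # index, so the best future value is the largest entry in row `action`.
    max_value = np.max(Q[action])

    # Bellman update: immediate reward plus discounted maximum future reward.
    Q[current_state, action] = R[current_state, action] + gamma * max_value

    # Score: sum of Q scaled so that its largest entry equals 100.
    if np.max(Q) > 0:
        return np.sum(Q / np.max(Q) * 100)
    return 0


# Hypothetical 3-state example: only the move from state 1 to state 2 pays off.
R = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 100.0],
              [0.0, 0.0, 0.0]])
Q = np.zeros_like(R)

score_first = update(Q, R, current_state=1, action=2, gamma=0.8)
score_second = update(Q, R, current_state=0, action=1, gamma=0.8)
print(Q)
print(score_first, score_second)
```

Passing `Q` and `R` as arguments rather than reading them as globals keeps the sketch easy to test. The second call shows the reward propagating backward: `Q[0, 1]` picks up the discounted value of the state it leads to.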