DQN Algorithm for Lunar Lander
initialize replay memory R
initialize action-value function Q (with random weights)
observe initial state s
repeat
    select an action a
        with probability ε select a random action
        otherwise select a = argmax_a' Q(s, a')
    carry out action a
    observe reward r and new state s'
    store experience <s, a, r, s'> in replay memory R
    sample random transitions <ss, aa, rr, ss'> from replay memory R
    calculate target for each minibatch transition
        if ss' is a terminal state then tt = rr
        otherwise tt = rr + γ max_a' Q(ss', a')
    train the Q network using (tt − Q(ss, aa))² as loss
    s = s'
until terminated
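
Below is a minimal runnable sketch of this loop in Python, assuming the Gymnasium "LunarLander-v2" environment and PyTorch. The network architecture, learning rate, ε value, replay-buffer and batch sizes are illustrative assumptions, not values taken from this gist.

import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("LunarLander-v2")
n_obs = env.observation_space.shape[0]   # 8 state variables
n_act = env.action_space.n               # 4 discrete actions

# action-value function Q with random weights (sizes are illustrative)
q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, n_act))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

replay = deque(maxlen=100_000)            # replay memory R
gamma, epsilon, batch_size = 0.99, 0.1, 64

for episode in range(500):
    s, _ = env.reset()
    done = False
    while not done:
        # with probability ε select a random action,
        # otherwise select a = argmax_a' Q(s, a')
        if random.random() < epsilon:
            a = env.action_space.sample()
        else:
            with torch.no_grad():
                a = int(q_net(torch.as_tensor(s, dtype=torch.float32)).argmax())

        # carry out action a, observe reward r and new state s'
        s_next, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated

        # store experience <s, a, r, s'> in replay memory R
        replay.append((s, a, r, s_next, terminated))
        s = s_next

        if len(replay) < batch_size:
            continue

        # sample random transitions <ss, aa, rr, ss'> from replay memory R
        batch = random.sample(replay, batch_size)
        ss, aa, rr, ss_next, term = map(np.array, zip(*batch))
        ss = torch.as_tensor(ss, dtype=torch.float32)
        aa = torch.as_tensor(aa, dtype=torch.int64)
        rr = torch.as_tensor(rr, dtype=torch.float32)
        ss_next = torch.as_tensor(ss_next, dtype=torch.float32)
        term = torch.as_tensor(term, dtype=torch.float32)

        # tt = rr                              if ss' is terminal
        # tt = rr + γ max_a' Q(ss', a')        otherwise
        with torch.no_grad():
            tt = rr + gamma * (1.0 - term) * q_net(ss_next).max(dim=1).values

        # train the Q network using (tt − Q(ss, aa))² as loss
        q_sa = q_net(ss).gather(1, aa.unsqueeze(1)).squeeze(1)
        loss = ((tt - q_sa) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

This sketch omits refinements commonly added on top of the basic loop (a separate target network, ε decay, gradient clipping); it only mirrors the steps of the pseudocode above.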