DQN Algorithm for Lunar Lander
initialize replay memory R
initialize action-value function Q (with random weights)
observe initial state s
repeat
    select an action a
        with probability ϵ select a random action
        otherwise select a = argmax_a' Q(s, a')
    carry out action a
    observe reward r and new state s'
    store experience <s, a, r, s'> in replay memory R
    sample a random minibatch of transitions <s, a, r, s'> from replay memory R
    calculate the target for each minibatch transition:
        if s' is a terminal state then t = r
        otherwise t = r + γ max_a' Q(s', a')
    train the Q network using (t − Q(s, a))² as the loss
    s = s'
until terminated
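The core pieces of the loop above — the replay memory, ϵ-greedy action selection, and the TD target t = r + γ max_a' Q(s', a') — can be sketched in plain Python/NumPy. This is a minimal illustration, not the gist author's implementation; in practice Q would be a neural network (e.g. Keras or PyTorch) trained on 8-dimensional Lunar Lander states, and the names here (`ReplayBuffer`, `epsilon_greedy`, `td_targets`) are hypothetical helpers chosen for clarity.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity replay memory R storing <s, a, r, s', done> tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniformly sample a random minibatch of transitions.
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r, s_next, done

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, else argmax_a' Q(s, a')."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))


def td_targets(rewards, q_next, dones, gamma=0.99):
    """Compute t = r if s' is terminal, otherwise t = r + gamma * max_a' Q(s', a').

    rewards: shape (batch,); q_next: Q(s', ·), shape (batch, n_actions);
    dones: 1.0 for terminal transitions, 0.0 otherwise.
    """
    return rewards + gamma * np.max(q_next, axis=1) * (1.0 - dones)
```

The `(1.0 - dones)` mask implements the pseudocode's branch in vectorized form: for terminal transitions the bootstrap term is zeroed out and the target reduces to the bare reward. The network would then be fit toward these targets with a squared-error loss, matching the (t − Q(s, a))² step.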