DQN Algorithm for Lunar Lander
initialize replay memory R
initialize action-value function Q (with random weights)
observe initial state s
repeat
    select an action a
        with probability ε select a random action
        otherwise select a = argmax_a' Q(s, a')
    carry out action a
    observe reward r and new state s'
    store experience <s, a, r, s'> in replay memory R
    sample random transitions <ss, aa, rr, ss'> from replay memory R
    calculate target for each minibatch transition
        if ss' is a terminal state then tt = rr
        otherwise tt = rr + γ max_a' Q(ss', a')
    train the Q network using (tt − Q(ss, aa))² as loss
    s = s'
until terminated
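
Below is a minimal runnable sketch of this loop in Python, assuming the Gymnasium "LunarLander-v2" environment and PyTorch. The network architecture, learning rate, ε value, replay-buffer and batch sizes are illustrative assumptions, not values taken from this gist.

import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("LunarLander-v2")
n_obs = env.observation_space.shape[0]   # 8 state variables
n_act = env.action_space.n               # 4 discrete actions

# action-value function Q with random weights (sizes are illustrative)
q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, n_act))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

replay = deque(maxlen=100_000)            # replay memory R
gamma, epsilon, batch_size = 0.99, 0.1, 64

for episode in range(500):
    s, _ = env.reset()
    done = False
    while not done:
        # with probability ε select a random action,
        # otherwise select a = argmax_a' Q(s, a')
        if random.random() < epsilon:
            a = env.action_space.sample()
        else:
            with torch.no_grad():
                a = int(q_net(torch.as_tensor(s, dtype=torch.float32)).argmax())

        # carry out action a, observe reward r and new state s'
        s_next, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated

        # store experience <s, a, r, s'> in replay memory R
        replay.append((s, a, r, s_next, terminated))
        s = s_next

        if len(replay) < batch_size:
            continue

        # sample random transitions <ss, aa, rr, ss'> from replay memory R
        batch = random.sample(replay, batch_size)
        ss, aa, rr, ss_next, term = map(np.array, zip(*batch))
        ss = torch.as_tensor(ss, dtype=torch.float32)
        aa = torch.as_tensor(aa, dtype=torch.int64)
        rr = torch.as_tensor(rr, dtype=torch.float32)
        ss_next = torch.as_tensor(ss_next, dtype=torch.float32)
        term = torch.as_tensor(term, dtype=torch.float32)

        # tt = rr                              if ss' is terminal
        # tt = rr + γ max_a' Q(ss', a')        otherwise
        with torch.no_grad():
            tt = rr + gamma * (1.0 - term) * q_net(ss_next).max(dim=1).values

        # train the Q network using (tt − Q(ss, aa))² as loss
        q_sa = q_net(ss).gather(1, aa.unsqueeze(1)).squeeze(1)
        loss = ((tt - q_sa) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

This sketch omits refinements commonly added on top of the basic loop (a separate target network, ε decay, gradient clipping); it only mirrors the steps of the pseudocode above.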