
@awjuliani
Last active July 18, 2023 19:18
An implementation of a Deep Recurrent Q-Network in Tensorflow.
@IbrahimSobh

Thank you, excellent work!

I would like to discuss many points … :)

It was very cool to use DRQN, Double-DQN and Dueling-DQN in one setup! (impressive)

“In order to only propagate accurate gradients through the network, we will mask the first half of the losses for each trace”
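(For reference, my rough understanding of that mask as a TensorFlow 1.x sketch — batch_size, trace_length, and td_error are placeholder names, not necessarily the ones used in the gist:)

    # td_error: per-timestep squared TD error, shape [batch_size * trace_length]
    # Zero the loss for the first half of each trace, keep the second half.
    maskA = tf.zeros([batch_size, trace_length // 2])
    maskB = tf.ones([batch_size, trace_length // 2])
    mask = tf.reshape(tf.concat([maskA, maskB], 1), [-1])
    # Only the masked-in timesteps contribute gradient.
    loss = tf.reduce_mean(td_error * mask)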

I was thinking it could be very nice if the network could learn how much gradient to use, via some sort of gating mechanism. What do you think? Is it doable? How? Would it help?

Could you please point me to the exact reference where the idea of using only half of the gradients is described?

Finally, I am a PhD candidate and was thinking of making a contribution; however, it looks very hard given all the scientific work described above!

For example, here are some ideas:
1- Do you think using skip connections as in ResNet could enable deeper networks and thus improve the results? https://arxiv.org/pdf/1512.03385v1.pdf (a rough sketch of what I mean follows this list)
2- Instead of random initialization, could a simple idea such as pre-training the network with auto-encoders on game images speed up training? What about transfer learning?
3- A3C: Can we use the same methods (DRQN) with A3C, and have a super A3C-DRQN-DD-DQN algorithm? https://arxiv.org/pdf/1602.01783.pdf
4- Planning: Do you think it would be good to train a model of the environment, using a neural network as function approximator, and then use algorithms like Dyna-2 to plan the next move? Generally speaking, can we think of a planned DQN?
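For point 1, this is the kind of residual (skip) connection I mean — just a TensorFlow 1.x illustration, not code from the gist:

    def residual_block(x, filters):
        # Assumes x already has `filters` channels so the addition below is valid.
        h = tf.layers.conv2d(x, filters, 3, padding='same', activation=tf.nn.relu)
        h = tf.layers.conv2d(h, filters, 3, padding='same', activation=None)
        return tf.nn.relu(h + x)  # the skip connection adds the input back in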

What direction, in your opinion, could make a good contribution?

Thank you

@chhung3

chhung3 commented Jul 28, 2017

Thank you very much for the code.

I have a question. It seems that the "deep" part is in the CNN, not in the recurrent part. Also, multi-layer RNNs don't seem to be very popular; I could hardly find examples on the web. Is that true?
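For reference, this is the kind of stacked recurrent part I mean — a TensorFlow 1.x sketch using MultiRNNCell (inputs is assumed to be a [batch, time, features] tensor; this is not code from the gist):

    # Stack several LSTM layers into one recurrent cell.
    cells = [tf.nn.rnn_cell.BasicLSTMCell(num_units=64) for _ in range(3)]
    stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
    # Unroll over time; inputs has shape [batch, time, features].
    outputs, state = tf.nn.dynamic_rnn(stacked_cell, inputs, dtype=tf.float32)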

@samsenyang

Thanks for this amazing code!
Why did you define the LSTMCell outside the Qnetwork class, rather than inside it?
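What I mean is the pattern of building the cells outside and passing them into the constructors, roughly like this (a sketch with approximate names, not the exact gist code):

    # One cell per network, created outside and handed to each constructor.
    cell = tf.contrib.rnn.BasicLSTMCell(num_units=h_size, state_is_tuple=True)
    cellT = tf.contrib.rnn.BasicLSTMCell(num_units=h_size, state_is_tuple=True)
    mainQN = Qnetwork(h_size, cell, 'main')
    targetQN = Qnetwork(h_size, cellT, 'target')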

@samsenyang

samsenyang commented Apr 27, 2019

And why did you split the outputs of the recurrent layer? Can I feed the outputs of the recurrent layer directly into the advantage layer and the value layer?
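For reference, the split I am asking about is the dueling head, which I understand roughly like this (a sketch with placeholder names such as rnn_output and num_actions, not the exact gist code):

    # Dueling head: split the RNN output into advantage and value streams.
    streamA, streamV = tf.split(rnn_output, 2, axis=1)
    advantage = tf.layers.dense(streamA, num_actions, activation=None)
    value = tf.layers.dense(streamV, 1, activation=None)
    # Combine: Q = V + (A - mean(A)), keeping the two streams identifiable.
    Qout = value + (advantage - tf.reduce_mean(advantage, axis=1, keepdims=True))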

@samsenyang

I didn't see the connection between targetQN and targetOps in the code, so I would really like to know how exactly the parameters of targetQN get updated by updateTarget(targetOps, sess).
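For what it's worth, my understanding of the usual pattern in these tutorials is that targetOps is a list of assign ops built from tf.trainable_variables(), where the first half of the variables belongs to the main network and the second half to the target network (a sketch of that idea, not the exact helper code from the gist):

    def updateTargetGraph(tfVars, tau):
        # First half of tfVars = main network, second half = target network.
        total_vars = len(tfVars)
        op_holder = []
        for idx, var in enumerate(tfVars[0:total_vars // 2]):
            target_var = tfVars[idx + total_vars // 2]
            # Soft update: target <- tau * main + (1 - tau) * target
            op_holder.append(target_var.assign(
                tau * var.value() + (1 - tau) * target_var.value()))
        return op_holder

    def updateTarget(op_holder, sess):
        # Running the assign ops moves the main-network weights into the target network.
        for op in op_holder:
            sess.run(op)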
