DRLND-links-references

https://www.alexirpan.com/2018/02/14/rl-hard.html

Prioritized Experience Replay: https://github.com/jaromiru/AI-blog/blob/master/Seaquest-DDQN-PER.py

https://github.com/Ullar-Kask/TD3-PER/blob/master/Pytorch/src/PER.py
https://knowledge.udacity.com/questions/46815
https://knowledge.udacity.com/questions/56910
https://knowledge.udacity.com/questions/54781

The paper: https://arxiv.org/pdf/1511.05952.pdf
https://www.semanticscholar.org/paper/A-novel-DDPG-method-with-prioritized-experience-Hou-Liu/027d002d205e49989d734603ff0c2f7cbfa6b6dd

https://wpumacay.github.io/research_blog/posts/deeprlnd-project1-part3/
https://medium.com/@kinwo/learning-to-play-tennis-from-scratch-with-self-play-using-ddpg-ac7389eb980e
https://towardsdatascience.com/training-two-agents-to-play-tennis-8285ebfaec5f

https://openai.com/blog/learning-dexterity/

https://towardsdatascience.com/pytorch-vs-tensorflow-spotting-the-difference-25c75777377b

Great job providing ideas for further experiments with the project in the future!

As pointed out in the report, you should also try implementing Prioritized Experience Replay. It improves performance and significantly reduces training time, and it should also help stabilize performance to some extent. A fast implementation of Prioritized Experience Replay is possible using a special data structure called a sum tree. I found a good implementation here.
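For reference, here is a minimal sum-tree sketch in Python/NumPy (not taken from the linked implementation; the class and method names are illustrative). Leaves hold per-transition priorities and each internal node holds the sum of its children, so priority updates and proportional sampling are both O(log n).

```python
import numpy as np

class SumTree:
    """Binary tree whose leaves store priorities; each internal node stores
    the sum of its children, so sampling proportional to priority is O(log n)."""

    def __init__(self, capacity):
        self.capacity = capacity                 # max number of stored transitions
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes + leaves
        self.data = [None] * capacity            # transitions, aligned with the leaves
        self.write = 0                           # next leaf slot to overwrite
        self.size = 0

    def total(self):
        return self.tree[0]                      # root holds the sum of all priorities

    def add(self, priority, transition):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:                         # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, s):
        """Descend from the root to the leaf whose cumulative-priority
        interval contains s (with 0 <= s <= total())."""
        idx = 0
        while True:
            left, right = 2 * idx + 1, 2 * idx + 2
            if left >= len(self.tree):           # reached a leaf
                break
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = right
        data_idx = idx - self.capacity + 1
        return idx, self.tree[idx], self.data[data_idx]
```

To sample a batch, the usual approach (as in the linked implementations) is to split `[0, total())` into `batch_size` segments and call `get()` with one uniform draw per segment; the returned leaf index is then used to update that transition's priority after computing its new TD error.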

I'd also recommend checking the following posts to get familiar with more reinforcement learning algorithms.

- Asynchronous Actor-Critic Agents (A3C)
- [Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO)](https://medium.com/@sanketgujar95/trust-region-policy-optimization-trpo-and-proximal-policy-optimization-ppo-e6e7075f39ed)

Here is an implementation of PPO on the Tennis environment. Training was slow, but the final average score achieved was almost 1.25 (with some fluctuation). You should definitely try PPO in the future.
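For reference, a minimal sketch of PPO's clipped surrogate loss in PyTorch (the function and argument names are illustrative, not taken from the linked Tennis implementation):

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017).
    new_log_probs / old_log_probs: log pi(a|s) under the current and behaviour policies.
    advantages: advantage estimates for the sampled actions."""
    ratio = torch.exp(new_log_probs - old_log_probs)              # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # PPO maximizes the clipped objective, so return its negative mean as a loss
    return -torch.min(unclipped, clipped).mean()
```

The clipping keeps each policy update close to the behaviour policy, which is what makes PPO more stable than vanilla policy gradients at the cost of slower per-update progress.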
