Proximal Policy Optimization with Generalized Advantage Estimation
By Patrick Coady: Learning Artificial Intelligence
The same learning algorithm was used to train agents for each of the ten OpenAI Gym MuJoCo continuous control environments. The only difference between evaluations was the number of episodes used per training batch; all other settings were the same. The code is available in the GitHub repository. The exact code used to generate the submissions is in the
The README.md file in the GitHub repository provides additional details on the algorithm and usage instructions.
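For readers unfamiliar with the advantage estimator named in the title, here is a minimal sketch of Generalized Advantage Estimation for a single episode. It is not taken from the repository: the function name `gae_advantages` is hypothetical, and the default `gamma` and `lam` values are illustrative assumptions rather than the settings used for the submissions. It assumes `values` holds one more entry than `rewards`, with the last entry being the bootstrap value (0 for a terminal state).

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one episode (illustrative sketch).

    rewards: list of T per-step rewards.
    values:  list of T + 1 value estimates; values[T] is the bootstrap
             value for the state after the last step (0.0 if terminal).
    gamma, lam: discount and GAE parameters (assumed defaults).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the discounted sum of later residuals.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    # With lam = 1 and a zero value function, the advantages reduce to
    # discounted returns.
    print(gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0))
```

Setting `lam=0` reduces the estimate to the one-step TD residual, while `lam=1` gives the discounted return minus the value baseline; values in between trade bias against variance.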