Deep reinforcement learning using an asynchronous advantage actor-critic (A3C) model written in TensorFlow.
This AI does not rely on hand-engineered rules or features. Instead, it masters the environment by looking at raw pixels and learning from experience, just as humans do.
For Pong, an average score of 18 was reached in 72 hours of training on an 8-core CPU. Training and evaluation code is available at github.com/andreimuntean/a3c.
- OpenAI Gym 0.8
- TensorFlow 1.0