Deep reinforcement learning using an asynchronous advantage actor-critic (A3C) model written in TensorFlow.
This AI does not rely on hand-engineered rules or features. Instead, it masters the environment by looking at raw pixels and learning from experience, just as humans do.
The Flappy Bird Gym environment was mastered in 48 hours of training using a CPU with 8 cores. Training and evaluation code is available at github.com/andreimuntean/a3c.
- OpenAI Gym 0.8
- TensorFlow 1.0