Solve the CartPole environment on OpenAI Gym with a simple Q-learning algorithm (parameter-tuned version)
- gamma: 0.99
- bin size: [3, 3, 8, 5]
- low bound: [None, -0.5, None, -math.radians(50)]
- high bound: [None, 0.5, None, math.radians(50)]
- learning rate update rule: max(0.1, min(0.5, 1.0 - math.log10((t + 1) / 25)))
- epsilon update rule: max(0.01, min(1.0, 1.0 - math.log10((t + 1) / 25)))
These parameter settings follow sakulkar's algorithm.
A version without parameter tuning is also available here, so you can see the effect of this tuning for yourself.
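The parameters above can be sketched in code as follows. This is a minimal illustration, not the repository's actual implementation: the helper names (`discretize`, `learning_rate`, `epsilon`) are made up for this example, and where the bound list says `None` (meaning "use the environment's own limit"), CartPole's position and angle thresholds are substituted as placeholders.

```python
import math
import numpy as np

BINS = [3, 3, 8, 5]  # bins per state dimension, as listed above
# "None" bounds are replaced here by CartPole's own thresholds
# (position +/-2.4, pole angle +/-12 degrees) as an assumption.
LOW  = [-2.4, -0.5, -math.radians(12), -math.radians(50)]
HIGH = [ 2.4,  0.5,  math.radians(12),  math.radians(50)]

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices."""
    idx = []
    for x, lo, hi, n in zip(obs, LOW, HIGH, BINS):
        edges = np.linspace(lo, hi, n + 1)[1:-1]  # n - 1 interior edges
        idx.append(int(np.digitize(x, edges)))
    return tuple(idx)

def learning_rate(t):
    # Decays from 0.5 toward a floor of 0.1 as episode t grows.
    return max(0.1, min(0.5, 1.0 - math.log10((t + 1) / 25)))

def epsilon(t):
    # Decays from 1.0 toward a floor of 0.01.
    return max(0.01, min(1.0, 1.0 - math.log10((t + 1) / 25)))
```

Both schedules stay at their upper clamp for the first 25 episodes (since `log10((t + 1) / 25)` is negative there) and then decay logarithmically, so exploration and the learning rate shrink together as training progresses.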
Model Overview