This gist documents my OpenAI Gym evaluations at:
- https://gym.openai.com/evaluations/eval_bW731pNgS2i1dZB0JvojA [failed to solve]
- https://gym.openai.com/evaluations/eval_FrOsJ2oiRb2EB3oh5l5Lg [solved in 23 episodes]
I attempted to reproduce the quickest documented CartPole-v0 solution to date, by iaroslav-ai, which reports 29 episodes to solve and is documented at https://gym.openai.com/evaluations/eval_yCJkgBGRl2Nfn3TKbvkkg
I don't know why I had a different result the first time, or indeed why it was faster the second time.
I also saw it fail with a traceback:
/srv/s/openai/gym/gym/envs/classic_control/rendering.py in render(self, return_rgb_array)
---> 95 arr = arr.reshape(self.height, self.width, 4)
ValueError: total size of new array must be unchanged
Note: no GPU/CUDA was used.
Variance in solve time comes from randomness in both the environment and the agent. If you set the random seeds in both, the result should be the same every time.
This is partly why the 'Algorithms' section was created: so that performance can be measured over multiple evaluations (an algorithm's true ability should be independent of randomness).
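
To make the seeding point concrete, here is a minimal sketch using only the Python standard library. `ToyEnv`, `ToyAgent`, and `episodes_to_solve` are hypothetical stand-ins invented for illustration. They are not Gym's API and not the agent from the evaluations above. The sketch only shows that fixing both seeds makes a run repeatable, and that different seeds give a spread of solve times that is better summarised over several runs.

```python
import random
import statistics


class ToyEnv:
    """Hypothetical stand-in for an environment with its own randomness."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def episode_solved(self, skill):
        # An episode counts as solved when the agent's skill beats a noisy threshold.
        return skill > self.rng.random()


class ToyAgent:
    """Hypothetical stand-in for a learning agent with its own randomness."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.skill = 0.0

    def learn(self):
        # Skill grows by a random amount after each episode.
        self.skill += self.rng.uniform(0.0, 0.1)


def episodes_to_solve(env_seed, agent_seed, streak_needed=5, max_episodes=1000):
    """Count episodes until `streak_needed` consecutive episodes are solved."""
    env, agent = ToyEnv(env_seed), ToyAgent(agent_seed)
    streak = 0
    for episode in range(1, max_episodes + 1):
        streak = streak + 1 if env.episode_solved(agent.skill) else 0
        if streak >= streak_needed:
            return episode
        agent.learn()
    return max_episodes


# Same seeds for both environment and agent: the run repeats exactly.
first = episodes_to_solve(env_seed=0, agent_seed=0)
second = episodes_to_solve(env_seed=0, agent_seed=0)
print(first == second)  # True

# Different seeds: solve time varies, so report statistics over many runs.
results = [episodes_to_solve(env_seed=s, agent_seed=s + 100) for s in range(20)]
print(statistics.mean(results), statistics.stdev(results))
```

With a real Gym environment you would seed the environment through whatever seeding call your Gym version provides, and seed the agent's own random number generators (for example its weight initialisation and exploration) separately.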