"""
Solves the CartPole-v1 environment on OpenAI Gym using policy search.
Same algorithm as for CartPole-v0.
A neural network is used to store the policy.
At the end of each episode, the target value for each action taken is
updated with the total normalized reward (up to a learning rate).
"""
Solves the CartPole-v0 environment on OpenAI Gym using policy search.
A neural network is used to store the policy.
At the end of each episode, the target value for each action taken is
updated with the total normalized reward (up to a learning rate).
Then a standard supervised-learning backprop pass is executed on the
entire batch.
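
The per-episode target update described above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code from this file: the function name `episode_targets`, its parameters, and the purely additive form of the update are all hypothetical, and the way the total reward is normalized is left to the caller because the docstring does not specify it.

```python
import numpy as np


def episode_targets(probs, actions, normalized_reward, learning_rate=0.1):
    """Build supervised training targets for one finished episode.

    Hypothetical helper, not taken from the original file.

    probs: array-like of shape (T, n_actions), the policy network's
        outputs for the T states visited during the episode.
    actions: array-like of shape (T,), index of the action taken
        at each step.
    normalized_reward: scalar total episode reward after normalization
        (the normalization scheme is an assumption left to the caller).
    learning_rate: scale of the update applied to each target.
    """
    # Start from the network's own predictions (copied, so the input
    # is left unmodified) so that actions which were not taken keep
    # their current values.
    targets = np.array(probs, dtype=float)
    steps = np.arange(len(actions))
    # Assumed update rule: shift the target of each taken action by the
    # normalized total reward, scaled by the learning rate.
    targets[steps, np.asarray(actions)] += learning_rate * normalized_reward
    return targets
```

The visited states and the returned targets would then form the batch for the ordinary supervised backprop step that the docstring mentions. A positive normalized reward raises the targets of the actions that were taken, and a negative one lowers them.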