xiaoFu ffrige

## CartPole-v1.py
"""
Solves the CartPole-v1 environment in OpenAI Gym using policy search.

Uses the same algorithm as the CartPole-v0 solution.

A neural network represents the policy.

At the end of each episode, the target value for each action taken is
updated with the episode's total normalized reward, scaled by a learning rate.
"""

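The target update described above can be sketched in NumPy as follows. The docstring does not say how the reward is normalized, so this sketch assumes each episode's total reward is standardized (zero mean, unit variance) across a batch of episodes. The function names `normalize_rewards` and `update_targets`, the clipping of targets to [0, 1], and the default learning rate are hypothetical choices for illustration.

```python
import numpy as np


def normalize_rewards(episode_totals):
    """Standardize each episode's total reward across the batch.

    Assumption: 'normalized' means zero mean and unit variance over the batch.
    Returns zeros when every episode scored the same.
    """
    totals = np.asarray(episode_totals, dtype=float)
    std = totals.std()
    if std == 0.0:
        return np.zeros_like(totals)
    return (totals - totals.mean()) / std


def update_targets(predictions, actions, normalized_reward, learning_rate=0.1):
    """Build training targets for one episode.

    Starts from the network's current outputs (one row per step, one column
    per action), moves the entry of each action actually taken by
    learning_rate * normalized_reward, and clips the result to [0, 1].
    """
    targets = np.array(predictions, dtype=float, copy=True)
    rows = np.arange(len(actions))
    targets[rows, actions] += learning_rate * normalized_reward
    return np.clip(targets, 0.0, 1.0)
```

For example, with CartPole's two actions, `update_targets([[0.5, 0.5]], [0], 1.0, learning_rate=0.1)` raises the target for action 0 to 0.6 and leaves action 1 at 0.5. Episodes that score below the batch average get a negative normalized reward, so their actions are pushed down rather than up.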
## CartPole-v0.py
"""
Solves the CartPole-v0 environment in OpenAI Gym using policy search.

A neural network represents the policy.

At the end of each episode, the target value for each action taken is
updated with the episode's total normalized reward, scaled by a learning rate.

A standard supervised-learning backpropagation step is then run on the
entire batch.
"""
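The final step of the v0 docstring, a standard supervised backprop pass over the entire batch, might look like the following NumPy sketch. It assumes a small fully connected network (the hypothetical `PolicyNetwork`, with one tanh hidden layer and one sigmoid output per action) fitted by mean squared error to the per-action target values. The layer sizes, activations, loss, and learning rate are illustrative assumptions, not details taken from the original scripts.

```python
import numpy as np


class PolicyNetwork:
    """Hypothetical one-hidden-layer policy network for CartPole
    (4 observation values in, one output per action)."""

    def __init__(self, n_inputs=4, n_hidden=16, n_actions=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def forward(self, states):
        # tanh hidden layer, then a sigmoid per action so outputs lie in (0, 1)
        h = np.tanh(states @ self.w1 + self.b1)
        out = 1.0 / (1.0 + np.exp(-(h @ self.w2 + self.b2)))
        return h, out

    def predict(self, states):
        return self.forward(np.atleast_2d(np.asarray(states, dtype=float)))[1]

    def act(self, state, rng):
        # sample an action with probability proportional to the network outputs
        p = self.predict(state)[0]
        return int(rng.choice(len(p), p=p / p.sum()))

    def train_batch(self, states, targets, lr=0.1):
        """One gradient-descent step on mean squared error over the whole
        batch; returns the loss measured before the update."""
        states = np.atleast_2d(np.asarray(states, dtype=float))
        targets = np.asarray(targets, dtype=float)
        h, out = self.forward(states)
        err = out - targets
        loss = float(np.mean(err ** 2))
        # gradient of the mean over all n * k outputs, through the sigmoid
        d_z2 = 2.0 * err * out * (1.0 - out) / err.size
        # backpropagate into the tanh layer (uses w2 before it is updated)
        d_z1 = (d_z2 @ self.w2.T) * (1.0 - h ** 2)
        self.w2 -= lr * h.T @ d_z2
        self.b2 -= lr * d_z2.sum(axis=0)
        self.w1 -= lr * states.T @ d_z1
        self.b1 -= lr * d_z1.sum(axis=0)
        return loss


# Usage with stand-in data instead of real environment rollouts.
rng = np.random.default_rng(1)
net = PolicyNetwork()
states = rng.normal(size=(32, 4))        # placeholder batch of observations
targets = np.tile([0.8, 0.2], (32, 1))   # placeholder per-action targets
losses = [net.train_batch(states, targets, lr=0.5) for _ in range(300)]
action = net.act(states[0], rng)
```

Sigmoid outputs keep every prediction in (0, 1), which lets `act` sample from them after renormalizing; a softmax output trained with cross-entropy would be an equally reasonable design choice.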