@hetelek
Created April 12, 2017 21:51
Attempt to solve Cart Pole by adding random noise to the best weights.
import tensorflow as tf
import gym

stddev = 1.0
render = True
monitor = True

# Best weights found so far, plus a perturbed copy to evaluate this episode.
best_weights = tf.Variable(tf.truncated_normal(shape=[4, 1]))
current_weights = tf.Variable(best_weights.initialized_value())
# Candidate = best weights + Gaussian noise.
recalculate_current = tf.assign(current_weights,
                                tf.add(best_weights, tf.random_normal(shape=[4, 1], stddev=stddev)))
set_best = tf.assign(best_weights, current_weights)

# Linear policy: take action 1 when the weighted sum of the observation is >= 0.
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.cast(tf.less_equal(0.0, tf.matmul(x, current_weights)), tf.int32)

env = gym.make('CartPole-v0')
if monitor:
    env = gym.wrappers.Monitor(env, '/tmp/cartpole-experiment-1', force=True)

observation = env.reset()
if render:
    env.render()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    best = 0
    current = 0
    while True:
        action = sess.run(y, feed_dict={x: [observation]})[0][0]
        observation, reward, done, info = env.step(action)
        current += reward
        if render:
            env.render()
        if done:
            # Keep the candidate weights if this episode matched or beat the best so far.
            if current >= best:
                best = current
                sess.run(set_best)
                print('new best: ' + str(best))
            current = 0
            sess.run(recalculate_current)
            observation = env.reset()
@hypernicon
Just in case you aren't aware, the formal name for this algorithm is a (1+1) Evolution Strategy ((1+1) ES). It is an instance of a simple, standard evolutionary method invented in the 1960s by Rechenberg (https://en.wikipedia.org/wiki/Evolution_strategy).
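The same keep-one-parent, mutate-and-replace loop from the gist can be sketched without TensorFlow or Gym. This is a minimal (1+1) ES on a toy objective; the function, dimensions, and iteration count are illustrative assumptions, not part of the original gist:

```python
import numpy as np

def one_plus_one_es(objective, dim, stddev=1.0, iterations=200, seed=0):
    """Minimal (1+1) Evolution Strategy: keep a single parent, perturb it
    with Gaussian noise, and replace the parent whenever the candidate
    scores at least as well (ties accepted, as in the gist)."""
    rng = np.random.default_rng(seed)
    best = rng.normal(size=dim)               # initial parent
    best_score = objective(best)
    for _ in range(iterations):
        candidate = best + rng.normal(scale=stddev, size=dim)  # mutate
        score = objective(candidate)
        if score >= best_score:               # maximizing, as in the gist
            best, best_score = candidate, score
    return best, best_score

# Toy objective: maximize -||w||^2, so the optimum is the zero vector.
w, score = one_plus_one_es(lambda w: -np.sum(w ** 2), dim=4)
```

In the gist, `objective` corresponds to the total episode reward from CartPole and `dim=4` to the four observation weights; the fixed `stddev` mirrors the gist's noise scale (a full ES would typically adapt it, e.g. via the 1/5 success rule).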
