# Andy Zhang zhangandyx

• San Francisco, CA
Created Jul 4, 2017
Cleaned up CartPole
View cross_entropy.py
```python
# Source: http://rl-gym-doc.s3-website-us-west-2.amazonaws.com/mlss/lab1.html
import gym
import numpy as np
from gym.wrappers.monitoring import Monitor
from policy import Policy

# Task settings:
env = gym.make('CartPole-v0')  # Change as needed
env = Monitor(env, 'tmp/cart-pole-cross-entropy-1', force=True)
```
Created Jul 4, 2017
Monte Carlo EM CartPole-v0 with exponentially weighted variance
View evaluation.py
```python
from utils import make_policy

def do_episode(policy, env, max_steps, render=False):
    total_rew = 0
    ob = env.reset()
    for t in range(max_steps):
        a = policy.act(ob)
        (ob, reward, done, _info) = env.step(a)
        total_rew += reward
```
Created Jul 4, 2017
Monte Carlo EM - weighted sampling of mean/variance of theta by reward
View evaluation.py
```python
from utils import make_policy

def do_episode(policy, env, max_steps, render=False):
    total_rew = 0
    ob = env.reset()
    for t in range(max_steps):
        a = policy.act(ob)
        (ob, reward, done, _info) = env.step(a)
        total_rew += reward
```
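The "weighted sampling of mean/variance of theta by reward" idea in this gist's title can be sketched in plain Python. The function name and toy data below are illustrative, not taken from the gist:

```python
def weighted_mean_var(thetas, rewards):
    """Reward-weighted mean and variance of sampled parameter vectors.

    Each theta is a list of floats; the weights are the (non-negative)
    episode rewards, so high-reward samples pull the distribution
    toward themselves.
    """
    total = sum(rewards)
    dim = len(thetas[0])
    mean = [sum(w * th[i] for th, w in zip(thetas, rewards)) / total
            for i in range(dim)]
    var = [sum(w * (th[i] - mean[i]) ** 2 for th, w in zip(thetas, rewards)) / total
           for i in range(dim)]
    return mean, var

# Toy check: two samples; the higher-reward one dominates the new mean.
thetas = [[0.0, 0.0], [1.0, 2.0]]
rewards = [1.0, 3.0]
mean, var = weighted_mean_var(thetas, rewards)
# mean -> [0.75, 1.5]
```

The fitted Gaussian is then sampled again for the next round of rollouts, which is the EM-style loop the title refers to.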
Created Jul 4, 2017
Cross Entropy (Evolutionary Strategy) on CartPole-v0 - somewhat overparameterized
View cross_entropy.py
```python
# Source: http://rl-gym-doc.s3-website-us-west-2.amazonaws.com/mlss/lab1.html
import gym
import numpy as np
from gym.wrappers.monitoring import Monitor
from evaluation import noisy_evaluation, do_episode
from utils import get_dim_theta, make_policy

# Task settings:
env = gym.make('CartPole-v0')  # Change as needed
```
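For reference, the core of the cross-entropy method used across these gists is: sample parameters from a Gaussian, keep the top "elite" fraction by score, and refit the Gaussian to the elites. A minimal 1-D sketch on a toy objective (function and variable names here are illustrative, not the gist's):

```python
import random

def cem_1d(f, mean=0.0, std=5.0, batch=50, elite_frac=0.2, iters=30):
    """Minimal 1-D cross-entropy method: maximize f by iteratively
    refitting a Gaussian to the top-scoring samples."""
    n_elite = max(1, int(batch * elite_frac))
    for _ in range(iters):
        samples = [random.gauss(mean, std) for _ in range(batch)]
        elites = sorted(samples, key=f, reverse=True)[:n_elite]
        mean = sum(elites) / n_elite
        std = (sum((x - mean) ** 2 for x in elites) / n_elite) ** 0.5 + 1e-6
    return mean

random.seed(0)
best = cem_1d(lambda x: -(x - 3.0) ** 2)  # maximum at x = 3
```

In the CartPole gists the samples are full parameter vectors of a linear policy and `f` is an episode-return evaluation, but the update rule is the same.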
Created Jul 4, 2017
CartPole-v0 Cross Entropy Method with Minimal Params
View cart_pole_cem_3.py
```python
# Source: http://rl-gym-doc.s3-website-us-west-2.amazonaws.com/mlss/lab1.html
import gym
import numpy as np
from gym.spaces import Discrete, Box
from gym.wrappers.monitoring import Monitor

# ================================================================
# Policies
# ================================================================
```
Created Jul 4, 2017
CartPole-v0 Cross Entropy Method - Affine Function and Base HyperParams
View cart_pole_cem_1
```python
# Source: http://rl-gym-doc.s3-website-us-west-2.amazonaws.com/mlss/lab1.html
import gym
import numpy as np
from gym.spaces import Discrete, Box
from gym.wrappers.monitoring import Monitor

# ================================================================
# Policies
# ================================================================
```
Created Jul 4, 2017
CartPole-v0 Cross Entropy Method with no bias
View cart_pole_cem_2.py
```python
# Source: http://rl-gym-doc.s3-website-us-west-2.amazonaws.com/mlss/lab1.html
import gym
import numpy as np
from gym.spaces import Discrete, Box
from gym.wrappers.monitoring import Monitor

# ================================================================
# Policies
# ================================================================
```
Created Jul 1, 2017
CartPole - Hill Climb v4 - Correct hill climb and reduced noise & variance (MC-10)
View hill_climb_4
```python
import gym
import numpy as np
from gym.wrappers.monitoring import Monitor

MC_POLICY_EVAL_EP = 10
BASE_NOISE_FACTOR = 0.1
NUM_POLICY_EVAL = 500

env = gym.make('CartPole-v0')
env = Monitor(env, 'tmp/cart-pole-hill-climb-4', force=True)
```
Created Jul 1, 2017
CartPole-v0 Hill Climb + MC(10) + Gaussian Noise (Sigma 0.5)
View hill_climb_3.py
```python
import gym
import numpy as np
from gym.wrappers.monitoring import Monitor

MC_POLICY_EVAL_EP = 10
BASE_NOISE_FACTOR = 0.5
NUM_POLICY_EVAL = 500

env = gym.make('CartPole-v0')
```
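The hill-climb variants above share one loop: perturb the current best parameter vector with Gaussian noise (sigma set by `BASE_NOISE_FACTOR`) and keep the perturbation only if it scores higher. A self-contained sketch on a toy objective, with illustrative names rather than the gist's:

```python
import random

def hill_climb(f, dim=4, noise=0.5, iters=200, seed=0):
    """Greedy hill climbing: perturb the current best parameter
    vector with Gaussian noise and keep only improvements."""
    rng = random.Random(seed)
    best = [0.0] * dim
    best_score = f(best)
    for _ in range(iters):
        cand = [x + rng.gauss(0.0, noise) for x in best]
        score = f(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

# Toy objective: maximize -||theta - 1||^2 (optimum at all ones).
best, score = hill_climb(lambda th: -sum((x - 1.0) ** 2 for x in th))
```

The `MC_POLICY_EVAL_EP = 10` setting corresponds to averaging each candidate's score over 10 episodes, which reduces the variance of the comparison against the current best.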
Created Jul 1, 2017
CartPole-v0 Hill Climb with MC(10) Eval + Simulated Annealing
View hill_climb_2.py
```python
import gym
import numpy as np
from gym.wrappers.monitoring import Monitor

env = gym.make('CartPole-v0')
env = Monitor(env, 'tmp/cart-pole-hill-climb-2', force=True)
print("Action space: {0}".format(env.action_space))
print("Observation space: {0}\n\tLow: {1}\n\tHigh: {2}".format(
    env.observation_space,
```
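The simulated-annealing part of this gist's title refers to the standard acceptance rule: always take improvements, sometimes take worse candidates with probability exp(delta / T), and cool T over time. A minimal 1-D sketch under those assumptions (names and constants are illustrative, not from the gist):

```python
import math
import random

def anneal(f, x0, temp=1.0, cooling=0.95, noise=0.5, iters=300, seed=0):
    """Simulated annealing on a scalar objective: accept improvements
    unconditionally, accept worse moves with probability exp(delta / T),
    and geometrically cool the temperature each step."""
    rng = random.Random(seed)
    x, score = x0, f(x0)
    for _ in range(iters):
        cand = x + rng.gauss(0.0, noise)
        delta = f(cand) - score
        if delta > 0 or rng.random() < math.exp(delta / temp):
            x, score = cand, delta + score
        temp = max(temp * cooling, 1e-6)
    return x, score

x, score = anneal(lambda v: -(v - 2.0) ** 2, x0=0.0)  # maximum at v = 2
```

As the temperature shrinks the rule degenerates to the greedy hill climb of the other gists, so annealing mainly helps escape early local optima.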