@joschu
Last active August 1, 2020 19:09

I used the cross-entropy method (an evolutionary / derivative-free optimization algorithm) to optimize small two-layer neural networks.
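
For concreteness, here is a minimal, generic sketch of the cross-entropy method in this setting. It is not the modular_rl implementation: the `evaluate` function (which would run an episode with the policy defined by a flat parameter vector and return its total reward), the elite fraction of 0.2, and the exact way the extra noise enters the variance are illustrative assumptions.

```python
import numpy as np

def cem(evaluate, dim, n_iter=250, batch_size=200, elite_frac=0.2,
        extra_std=0.01, seed=0):
    """Generic cross-entropy method over a flat parameter vector of length dim."""
    rng = np.random.RandomState(seed)
    mean = np.zeros(dim)          # mean of the Gaussian search distribution
    std = np.ones(dim)            # per-parameter standard deviation
    n_elite = int(round(batch_size * elite_frac))
    for it in range(n_iter):
        # Sample a population of parameter vectors around the current mean.
        thetas = mean + std * rng.randn(batch_size, dim)
        rewards = np.array([evaluate(th) for th in thetas])
        # Keep the top-performing ("elite") fraction and refit the Gaussian.
        elites = thetas[np.argsort(rewards)[-n_elite:]]
        mean = elites.mean(axis=0)
        # extra_std keeps the variance from collapsing prematurely (see [1]).
        std = np.sqrt(elites.var(axis=0) + extra_std ** 2)
        print("iter %3d  mean reward %8.2f  max %8.2f"
              % (it, rewards.mean(), rewards.max()))
    return mean
```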

The code used to obtain these results can be found at https://github.com/joschu/modular_rl, commit 3324639f82a81288e9d21ddcb6c2a37957cdd361. The command-line invocations used for all the environments are listed in the text file below. Note that the exact same parameters were used for all tasks. The important parameters are:

  • hid_sizes=10,5: hidden layer sizes of the MLP (see the sketch after this list)
  • extra_std=0.01: noise added to the variance, see [1]
  • batch_size=200: number of episodes per batch
  • seed=0: random seed
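
To give a sense of how small these networks are, the sketch below maps a flat CEM parameter vector onto a two-hidden-layer MLP with hid_sizes=10,5. It is my own illustration, not the modular_rl policy class; the tanh nonlinearity and the absence of any output squashing are assumptions.

```python
import numpy as np

def mlp_param_count(obs_dim, act_dim, hid_sizes=(10, 5)):
    """Number of weights and biases in an MLP with the given layer sizes."""
    sizes = [obs_dim] + list(hid_sizes) + [act_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def mlp_forward(theta, obs, obs_dim, act_dim, hid_sizes=(10, 5)):
    """Compute the action for one observation, reading weights from the flat vector theta."""
    sizes = [obs_dim] + list(hid_sizes) + [act_dim]
    x, idx = np.asarray(obs), 0
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = theta[idx:idx + n_in * n_out].reshape(n_in, n_out)
        idx += n_in * n_out
        b = theta[idx:idx + n_out]
        idx += n_out
        x = x @ W + b
        if i < len(sizes) - 2:   # tanh on hidden layers only (an assumption)
            x = np.tanh(x)
    return x

# e.g. an environment with an 11-dim observation and 3-dim action space
# gives a parameter vector of length 193 -- small enough for CEM.
print(mlp_param_count(11, 3))
```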

The program is single-threaded and deterministic. I used float32 precision, with THEANO_FLAGS=floatX=float32.
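
If you want to double-check that the flag took effect, something like the following should print float32 (a quick sanity check, not part of the original instructions):

```python
import theano
# THEANO_FLAGS=floatX=float32 must be set in the environment before Theano is imported.
print(theano.config.floatX)
```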

The following instructions will let you conveniently run all of the experiments at once.

  1. Find a computer with many CPUs.
  2. If it's a headless computer, sudo apt-get install xvfb. Then type xvfb-run -s "-screen 0 1400x900x24" /bin/bash to enter a shell where all your commands will benefit from a fake monitor provided by xvfb.
  3. Navigate into the modular_rl directory.
  4. export THEANO_FLAGS=floatX=float32; export outdir=/YOUR/PATH/HERE; export NUM_CPUS=YOUR_NUMBER_OF_CPUS
  5. Run all experiments with the following command: cat experiments/2-cem-scripts.txt | xargs -n 1 -P $NUM_CPUS bash -c

You can also set --video=0 in these scripts to disable video recording. If video is disabled, you won't need the xvfb commands.
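
If you prefer, the xargs invocation in step 5 can be replaced by a small Python driver. The snippet below is a convenience sketch, not part of modular_rl; it assumes outdir has already been exported and that you run it from the modular_rl directory.

```python
import os
import subprocess
from multiprocessing.pool import ThreadPool

# Each line of the scripts file is one quoted "python run_cem.py ..." command.
with open("experiments/2-cem-scripts.txt") as f:
    cmds = [line.strip().strip('"') for line in f if line.strip()]

def run(cmd):
    # shell=True so that $outdir is expanded from the exported environment.
    return subprocess.run(cmd, shell=True).returncode

# Run up to one job per CPU in parallel, like xargs -P.
with ThreadPool(os.cpu_count()) as pool:
    print(pool.map(run, cmds))
```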

[1] Szita, István, and András Lőrincz. "Learning Tetris using the noisy cross-entropy method." Neural Computation 18.12 (2006): 2936-2941.

"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Walker2d-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-walker"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Swimmer-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-swimmer"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Hopper-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-hopper"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=MountainCar-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-mountaincar"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Ant-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-ant"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Acrobot-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-acrobot"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=InvertedPendulum-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-ip"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=InvertedDoublePendulum-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-idp"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Reacher-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-reacher"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=HalfCheetah-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-hc"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Humanoid-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-humanoid"
"python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=CartPole-v0 --extra_std=0.01 --seed=0 --outfile=$outdir/cem10-5-cartpole"

gdb commented Apr 28, 2016

Reproduced your results here, and marked as reviewed. To make the continuous-control environments work, I needed to run as xvfb-run -s "-screen 0 1400x900x24" bash:

- Swimmer-v0: https://gym.openai.com/evaluations/eval_uFANs5TDKojctCZqXFQ
- Walker2d-v0: https://gym.openai.com/evaluations/eval_tkcXsIeJRvefJQtjUDGtA
- Reacher-v0: https://gym.openai.com/evaluations/eval_Md6ZBNK6TD2lDcNKHKsyKA
- Swimmer-v0: https://gym.openai.com/evaluations/eval_CzUylyo9T6qQiTRDgV0sww
- InvertedPendulum-v0: https://gym.openai.com/evaluations/eval_qPP7LMJ4ROmDAiYgZ2UHLw
- Hopper-v0: https://gym.openai.com/evaluations/eval_m7gi0JuRhqNktyqHRKS0Q
- Humanoid-v0: https://gym.openai.com/evaluations/eval_yvpDQ1mJS9al78xDkVIw
- HalfCheetah-v0: https://gym.openai.com/evaluations/eval_2t1oMldWQbqIfQ01Oua5ew
- InvertedDoublePendulum-v0: https://gym.openai.com/evaluations/eval_LmdVJgQRlCkhVaCcwGOdA
- Acrobot-v0: https://gym.openai.com/evaluations/eval_P8qNlwMcQnWxLmhngLMzhQ
- CartPole-v0: https://gym.openai.com/evaluations/eval_qVBzATKSGCQ0jTcwFgqhA
- MountainCar-v0: https://gym.openai.com/evaluations/eval_ZiGs3eBmTsie6e7XXdr4A

joschu (Author) commented Apr 28, 2016

Oops, I've fixed the xvfb-run instruction.

Sohojoe commented May 28, 2016
