@joschu
Last active April 20, 2024 17:30
TRPO-GAE (Version 0) Writeup

The code used to obtain these results is at https://github.com/joschu/modular_rl, commit 50cdfdf375e69d86e3db6eb2ad0218ea6aebf371. The command lines used for all the environments are listed in the text file below. Note that the exact same parameters and policies were used for all tasks, except for timesteps_per_batch, which was varied based on the difficulty of the task. The important parameters are:

  • gamma=0.995: discount factor
  • lam=0.97: the lambda parameter of generalized advantage estimation; see the GAE paper for an explanation
  • agent=TrpoAgent: name of the class, which specifies the policy and value function architecture. In this case, we used two hidden layers of size 64, with tanh activations
  • cg_damping: multiple of the identity added to the Fisher matrix in the conjugate gradient step
  • timesteps_per_batch: we collect trajectories until this number of timesteps is exceeded
  • seed=0: random seed
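As a rough illustration of what gamma and lam control, here is a minimal sketch of the generalized advantage estimator in plain Python. The function name, the list-based inputs, and the convention that `values` carries one extra bootstrap entry are assumptions made for this example; this is not code from modular_rl.

```python
def gae_advantages(rewards, values, gamma=0.995, lam=0.97):
    """Generalized advantage estimates for a single trajectory.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), i.e. T + 1 entries; use 0.0 for the last
             entry if the trajectory ended in a terminal state.
    """
    T = len(rewards)
    if len(values) != T + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * T
    running = 0.0
    # Work backwards so each step reuses the accumulated sum of later terms.
    for t in reversed(range(T)):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With lam=1 and a zero value function this reduces to the discounted return, and with lam=0 it reduces to the one-step TD residual; values in between trade variance against bias.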

The program is single-threaded and deterministic. I used Theano with float64 precision, by setting THEANO_FLAGS=floatX=float64. float32 will probably work just as well, but it's not any faster on this experiment, so I didn't bother.

The following commands will let you conveniently run all of the experiments at once.

  1. Find a computer with many CPUs.
  2. If it's a headless computer, sudo apt-get install xvfb. Then type xvfb-run -s "-screen 0 1400x900x24" /bin/bash to enter a shell where all your commands will benefit from a fake monitor provided by xvfb.
  3. Navigate into the modular_rl directory.
  4. export THEANO_FLAGS=floatX=float64; export outdir=/YOUR/PATH/HERE; export NUM_CPUS=YOUR_NUMBER_OF_CPUS
  5. Run all experiments with the following command: cat experiments/2-trpo-scripts.txt | xargs -n 1 -P $NUM_CPUS bash -c

You can also set --video=0 in these scripts to disable video recording. If video is disabled, you won't need the xvfb commands.
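For readers who would rather drive the runs from Python, the xargs pipeline above can be approximated as follows. This is only a sketch: the helper names `parse_command_lines` and `run_all` are made up for this example, and it assumes each line of the script file is one double-quoted command, as in the text file below.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def parse_command_lines(text):
    """Split the file contents the way `xargs -n 1` would: every quoted
    string becomes one command, with the surrounding quotes removed."""
    commands = []
    for line in text.splitlines():
        commands.extend(shlex.split(line))
    return commands


def run_one(command):
    # Same as `bash -c "<command>"`; bash expands $outdir from the environment.
    return subprocess.run(["bash", "-c", command]).returncode


def run_all(commands, num_workers):
    """Run at most num_workers commands at a time (like `xargs -P`) and
    return their exit codes in input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(run_one, commands))
```

A typical call would read `experiments/2-trpo-scripts.txt`, pass its contents to `parse_command_lines`, and hand the result to `run_all` with `int(os.environ["NUM_CPUS"])` workers. Threads are enough here because each worker only waits on a child process.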

"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=5000 --env=Pendulum-v0 --outfile=$outdir/Pendulum-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=5000 --env=CartPole-v0 --outfile=$outdir/CartPole-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=5000 --env=MountainCar-v0 --outfile=$outdir/MountainCar-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=5000 --env=Acrobot-v0 --outfile=$outdir/Acrobot-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=5000 --env=InvertedPendulum-v0 --outfile=$outdir/InvertedPendulum-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=15000 --env=Reacher-v0 --outfile=$outdir/Reacher-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=15000 --env=InvertedDoublePendulum-v0 --outfile=$outdir/InvertedDoublePendulum-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=25000 --env=HalfCheetah-v0 --outfile=$outdir/HalfCheetah-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=25000 --env=Hopper-v0 --outfile=$outdir/Hopper-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=25000 --env=Swimmer-v0 --outfile=$outdir/Swimmer-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=25000 --env=Walker2d-v0 --outfile=$outdir/Walker2d-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=50000 --env=Ant-v0 --outfile=$outdir/Ant-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=50000 --env=Humanoid-v0 --outfile=$outdir/Humanoid-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=AirRaid-ram-v0 --outfile=$outdir/AirRaid-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Alien-ram-v0 --outfile=$outdir/Alien-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Amidar-ram-v0 --outfile=$outdir/Amidar-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Assault-ram-v0 --outfile=$outdir/Assault-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Asterix-ram-v0 --outfile=$outdir/Asterix-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Asteroids-ram-v0 --outfile=$outdir/Asteroids-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Atlantis-ram-v0 --outfile=$outdir/Atlantis-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=BankHeist-ram-v0 --outfile=$outdir/BankHeist-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=BattleZone-ram-v0 --outfile=$outdir/BattleZone-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=BeamRider-ram-v0 --outfile=$outdir/BeamRider-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Berzerk-ram-v0 --outfile=$outdir/Berzerk-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Bowling-ram-v0 --outfile=$outdir/Bowling-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Boxing-ram-v0 --outfile=$outdir/Boxing-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Breakout-ram-v0 --outfile=$outdir/Breakout-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Carnival-ram-v0 --outfile=$outdir/Carnival-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Centipede-ram-v0 --outfile=$outdir/Centipede-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=ChopperCommand-ram-v0 --outfile=$outdir/ChopperCommand-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=CrazyClimber-ram-v0 --outfile=$outdir/CrazyClimber-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=DemonAttack-ram-v0 --outfile=$outdir/DemonAttack-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=DoubleDunk-ram-v0 --outfile=$outdir/DoubleDunk-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=ElevatorAction-ram-v0 --outfile=$outdir/ElevatorAction-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Enduro-ram-v0 --outfile=$outdir/Enduro-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=FishingDerby-ram-v0 --outfile=$outdir/FishingDerby-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Freeway-ram-v0 --outfile=$outdir/Freeway-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Frostbite-ram-v0 --outfile=$outdir/Frostbite-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Gopher-ram-v0 --outfile=$outdir/Gopher-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Gravitar-ram-v0 --outfile=$outdir/Gravitar-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=IceHockey-ram-v0 --outfile=$outdir/IceHockey-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Jamesbond-ram-v0 --outfile=$outdir/Jamesbond-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=JourneyEscape-ram-v0 --outfile=$outdir/JourneyEscape-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Kangaroo-ram-v0 --outfile=$outdir/Kangaroo-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Krull-ram-v0 --outfile=$outdir/Krull-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=KungFuMaster-ram-v0 --outfile=$outdir/KungFuMaster-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=MontezumaRevenge-ram-v0 --outfile=$outdir/MontezumaRevenge-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=MsPacman-ram-v0 --outfile=$outdir/MsPacman-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=NameThisGame-ram-v0 --outfile=$outdir/NameThisGame-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Phoenix-ram-v0 --outfile=$outdir/Phoenix-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Pitfall-ram-v0 --outfile=$outdir/Pitfall-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Pong-ram-v0 --outfile=$outdir/Pong-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Pooyan-ram-v0 --outfile=$outdir/Pooyan-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=PrivateEye-ram-v0 --outfile=$outdir/PrivateEye-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Qbert-ram-v0 --outfile=$outdir/Qbert-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Riverraid-ram-v0 --outfile=$outdir/Riverraid-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=RoadRunner-ram-v0 --outfile=$outdir/RoadRunner-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Robotank-ram-v0 --outfile=$outdir/Robotank-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Seaquest-ram-v0 --outfile=$outdir/Seaquest-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Skiing-ram-v0 --outfile=$outdir/Skiing-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Solaris-ram-v0 --outfile=$outdir/Solaris-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=SpaceInvaders-ram-v0 --outfile=$outdir/SpaceInvaders-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=StarGunner-ram-v0 --outfile=$outdir/StarGunner-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Tennis-ram-v0 --outfile=$outdir/Tennis-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=TimePilot-ram-v0 --outfile=$outdir/TimePilot-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Tutankham-ram-v0 --outfile=$outdir/Tutankham-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=UpNDown-ram-v0 --outfile=$outdir/UpNDown-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Venture-ram-v0 --outfile=$outdir/Venture-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=VideoPinball-ram-v0 --outfile=$outdir/VideoPinball-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=WizardOfWor-ram-v0 --outfile=$outdir/WizardOfWor-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=YarsRevenge-ram-v0 --outfile=$outdir/YarsRevenge-ram-v0.h5"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0 --timesteps_per_batch=40000 --env=Zaxxon-ram-v0 --outfile=$outdir/Zaxxon-ram-v0.h5"
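The command lines above differ only in --env, --timesteps_per_batch, and --outfile. A small generator along the following lines could reproduce them. It is a hypothetical helper written for illustration, not necessarily how the file was produced, and the dictionary lists just a few of the environments.

```python
# Flags shared by every run, copied from the command lines above.
COMMON_FLAGS = (
    "--gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent "
    "--max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=250 --seed=0"
)

# Illustrative subset of the batch sizes used above.
BATCH_SIZES = {
    "Pendulum-v0": 5000,
    "Reacher-v0": 15000,
    "HalfCheetah-v0": 25000,
    "Humanoid-v0": 50000,
    "Pong-ram-v0": 40000,
}


def make_command(env, timesteps_per_batch):
    """Build one quoted command line in the format of the script file."""
    return (
        '"python run_pg.py {flags} --timesteps_per_batch={n} '
        '--env={env} --outfile=$outdir/{env}.h5"'
    ).format(flags=COMMON_FLAGS, n=timesteps_per_batch, env=env)


def make_all_commands(batch_sizes):
    """One command line per environment, in dictionary order."""
    return [make_command(env, n) for env, n in batch_sizes.items()]
```

Writing `"\n".join(make_all_commands(BATCH_SIZES))` to a file gives input in the same quoted, one-command-per-line format that the xargs invocation expects.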

poweic commented Apr 17, 2017

Thanks for sharing and this is very cool.
I have a question about timesteps_per_batch. You said it varies based on the difficulty of the task: for example, you use 5000 for InvertedPendulum-v1 and 50000 for Humanoid-v1. But doesn't a larger batch mean less frequent updates? What if you used 5000 or 10000 for Humanoid-v1? Does it make a significant difference?
Thanks

mrtnmcc commented Jul 23, 2017

Hi reppolice, that error about 'Variable' object has no attribute 'set_value' can be fixed by changing your configured backend from TensorFlow to Theano.
