joschu/1-trpo-gae-v0-writeup-UPDATE.md

## 1-trpo-gae-v0-writeup-UPDATE.md

      
Same code and parameters as https://gist.github.com/joschu/e42a050b1eb5cfbb1fdc667c3450467a, but run on the updated (v1) MuJoCo environments. The new scripts are provided below.
Ran on commit 987cb5d229027045fd0390533832e173237f81b6; there shouldn't be any functional differences relative to the code used for the previous writeup.
Also, I (inadvertently) ran everything for 500 iterations instead of 250.

  
## 2-trpo-scripts.txt
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=5000 --env=InvertedPendulum-v1 --outfile=$outdir/InvertedPendulum"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=15000 --env=Reacher-v1 --outfile=$outdir/Reacher"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=15000 --env=InvertedDoublePendulum-v1 --outfile=$outdir/InvertedDoublePendulum"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=25000 --env=HalfCheetah-v1 --outfile=$outdir/HalfCheetah"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=25000 --env=Hopper-v1 --outfile=$outdir/Hopper"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=25000 --env=Swimmer-v1 --outfile=$outdir/Swimmer"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=25000 --env=Walker2d-v1 --outfile=$outdir/Walker2d"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=50000 --env=Ant-v1 --outfile=$outdir/Ant"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=50000 --env=Humanoid-v1 --outfile=$outdir/Humanoid"
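
The commands above differ only in `--timesteps_per_batch`, `--env`, and `--outfile`, so they can be regenerated from a small table. The following is a minimal sketch, not part of the original scripts: the helper names `build_command` and `nominal_timesteps` are hypothetical, and the timestep count assumes each iteration collects roughly `timesteps_per_batch` steps (actual batches may differ slightly).

```python
# Flags shared by every run, copied from the commands above.
COMMON_FLAGS = {
    "gamma": "0.995",
    "lam": "0.97",
    "agent": "modular_rl.agentzoo.TrpoAgent",
    "max_kl": "0.01",
    "cg_damping": "0.1",
    "activation": "tanh",
    "n_iter": "500",
    "seed": "0",
}

# Per-environment batch sizes, also copied from the commands above.
TIMESTEPS_PER_BATCH = {
    "InvertedPendulum": 5000,
    "Reacher": 15000,
    "InvertedDoublePendulum": 15000,
    "HalfCheetah": 25000,
    "Hopper": 25000,
    "Swimmer": 25000,
    "Walker2d": 25000,
    "Ant": 50000,
    "Humanoid": 50000,
}


def build_command(env_name, outdir="$outdir"):
    """Return the run_pg.py command line for one environment (hypothetical helper)."""
    flags = dict(COMMON_FLAGS)  # copy so the shared table is not mutated
    flags["timesteps_per_batch"] = str(TIMESTEPS_PER_BATCH[env_name])
    flags["env"] = f"{env_name}-v1"
    flags["outfile"] = f"{outdir}/{env_name}"
    args = " ".join(f"--{key}={value}" for key, value in flags.items())
    return f"python run_pg.py {args}"


def nominal_timesteps(env_name):
    """Approximate total environment steps: n_iter * timesteps_per_batch."""
    return int(COMMON_FLAGS["n_iter"]) * TIMESTEPS_PER_BATCH[env_name]


if __name__ == "__main__":
    for name in TIMESTEPS_PER_BATCH:
        print(build_command(name))
```

Under that assumption, the 500-iteration runs correspond to about 2.5 million steps for InvertedPendulum and about 25 million steps each for Ant and Humanoid.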