@joschu
Last active August 15, 2016 02:27
Ran trpo-gae-v0 on new environments

Same exact code and parameters as https://gist.github.com/joschu/e42a050b1eb5cfbb1fdc667c3450467a, but run on the updated (v1) MuJoCo environments. The new scripts are provided below. I ran on commit 987cb5d229027045fd0390533832e173237f81b6, but there shouldn't be any functional differences from the code used in the previous writeup.

Also, I (inadvertently) ran everything for 500 iterations instead of 250.

"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=5000 --env=InvertedPendulum-v1 --outfile=$outdir/InvertedPendulum"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=15000 --env=Reacher-v1 --outfile=$outdir/Reacher"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=15000 --env=InvertedDoublePendulum-v1 --outfile=$outdir/InvertedDoublePendulum"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=25000 --env=HalfCheetah-v1 --outfile=$outdir/HalfCheetah"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=25000 --env=Hopper-v1 --outfile=$outdir/Hopper"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=25000 --env=Swimmer-v1 --outfile=$outdir/Swimmer"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=25000 --env=Walker2d-v1 --outfile=$outdir/Walker2d"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=50000 --env=Ant-v1 --outfile=$outdir/Ant"
"python run_pg.py --gamma=0.995 --lam=0.97 --agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 --activation=tanh --n_iter=500 --seed=0 --timesteps_per_batch=50000 --env=Humanoid-v1 --outfile=$outdir/Humanoid"
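The commands above differ only in the environment name and in `--timesteps_per_batch`, so they can be regenerated from a small table. The following is a minimal sketch, not part of the original scripts: the names `TIMESTEPS_PER_BATCH` and `build_command` are hypothetical, and the flag values are simply copied from the commands listed above.

```python
# Hypothetical helper that rebuilds the command lines listed above.
# Batch sizes are copied verbatim from those commands; nothing here is
# taken from the modular_rl code itself.
TIMESTEPS_PER_BATCH = {
    "InvertedPendulum": 5000,
    "Reacher": 15000,
    "InvertedDoublePendulum": 15000,
    "HalfCheetah": 25000,
    "Hopper": 25000,
    "Swimmer": 25000,
    "Walker2d": 25000,
    "Ant": 50000,
    "Humanoid": 50000,
}


def build_command(env_name, timesteps, outdir="$outdir"):
    """Return one run_pg.py command line for the given v1 environment."""
    return (
        "python run_pg.py --gamma=0.995 --lam=0.97 "
        "--agent=modular_rl.agentzoo.TrpoAgent --max_kl=0.01 --cg_damping=0.1 "
        "--activation=tanh --n_iter=500 --seed=0 "
        f"--timesteps_per_batch={timesteps} --env={env_name}-v1 "
        f"--outfile={outdir}/{env_name}"
    )


# One command per environment, in the same order as the list above.
commands = [build_command(name, n) for name, n in TIMESTEPS_PER_BATCH.items()]

if __name__ == "__main__":
    for command in commands:
        print(command)
```

Keeping the shared hyperparameters in one place makes it harder for a single run to drift from the rest, for example if `--n_iter` is changed back to 250.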

floodsung commented Aug 15, 2016

Did you change the reward on different mujoco envs? Thank you!
