John Schulman joschu

## 1-trpo-gae-v0-writeup-UPDATE.md

Last active August 15, 2016 02:27

Ran trpo-gae-v0 on new environments.

I used exactly the same code and parameters as https://gist.github.com/joschu/e42a050b1eb5cfbb1fdc667c3450467a, but ran them on the updated (v1) MuJoCo environments. The new scripts are provided below.
I ran this on commit 987cb5d229027045fd0390533832e173237f81b6, but there should be no functional differences from the previous writeup.
Also, I (inadvertently) ran everything for 500 iterations instead of 250.

  
## 1-cem-v1-writeup.md

Last active February 22, 2017 01:16

This is a tiny update to https://gist.github.com/joschu/a21ed1259d3f8c7bdff178fb47bc6fc1#file-1-cem-v0-writeup-md:

- I ran the experiments on the v1 MuJoCo environments.
- I reduced the added-noise parameter extra_std from 0.01 to 0.001.

I used the cross-entropy method (an evolutionary, derivative-free optimization method) to optimize small two-layer neural networks.
The code used to obtain these results is at https://github.com/joschu/modular_rl, commit ba42955b41d7f419470a95d875af1ab7e7ee66fc.
The command line used for all the environments is in the text file below.
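The cross-entropy method mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not the modular_rl implementation: it samples parameter vectors from a diagonal Gaussian, keeps the top-scoring elite fraction, and refits the mean and standard deviation. How extra_std enters is an assumption here (its square is added to the elite variance every iteration, which keeps the distribution from collapsing too early); the function name cem and the parameters batch_size, elite_frac, initial_std, and their defaults are hypothetical.

```python
import numpy as np


def cem(f, th_mean, n_iter=100, batch_size=100, elite_frac=0.2,
        initial_std=1.0, extra_std=0.001, seed=0):
    """Maximize f over flat parameter vectors with the cross-entropy method.

    Assumption: extra_std**2 is added to the fitted per-coordinate variance.
    """
    rng = np.random.default_rng(seed)
    th_mean = np.asarray(th_mean, dtype=float).ravel()
    th_std = np.full(th_mean.size, float(initial_std))
    n_elite = max(1, int(round(batch_size * elite_frac)))
    for _ in range(n_iter):
        # Sample a batch of candidate parameter vectors from N(mean, diag(std**2))
        ths = th_mean + th_std * rng.standard_normal((batch_size, th_mean.size))
        scores = np.array([f(th) for th in ths])
        # Keep the highest-scoring samples (the "elite" set)
        elite = ths[np.argsort(scores)[-n_elite:]]
        # Refit the Gaussian to the elite set, plus the extra noise term
        th_mean = elite.mean(axis=0)
        th_std = np.sqrt(elite.var(axis=0) + extra_std ** 2)
    return th_mean


# Usage on a toy objective whose maximum is at `target`
target = np.array([1.0, -2.0, 0.5])
best = cem(lambda th: -np.sum((th - target) ** 2), np.zeros(3), initial_std=2.0)
print(best)
```

Without the extra noise term, the fitted standard deviation can shrink to nearly zero before the mean reaches a good region; a small floor keeps some exploration going.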

  
## 1-cem-v0-writeup.md

Last active August 1, 2020 19:09

I used the cross-entropy method (an evolutionary, derivative-free optimization method) to optimize small two-layer neural networks.
The code used to obtain these results is at https://github.com/joschu/modular_rl, commit 3324639f82a81288e9d21ddcb6c2a37957cdd361.
The command line used for all the environments is in the text file below.
Note that exactly the same parameters were used for all tasks.
The important parameters are:

- hid_sizes=10,5: hidden layer sizes of the MLP
- extra_std=0.01: noise added to the variance, see [1]
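With hid_sizes=10,5, CEM searches over one flat vector holding all the weights and biases of the MLP. The sketch below shows one way to count and unpack those parameters. The layout (weights then bias, layer by layer), the tanh hidden activations, the linear output, and the observation/action sizes in the usage lines are all assumptions for illustration; the helper names are hypothetical, and modular_rl's policy may be parameterized differently.

```python
import numpy as np


def mlp_shapes(obs_dim, act_dim, hid_sizes=(10, 5)):
    """(weight_shape, bias_shape) for each layer of the MLP."""
    sizes = [obs_dim, *hid_sizes, act_dim]
    return [((n_in, n_out), (n_out,)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def n_params(shapes):
    """Length of the flat parameter vector that CEM optimizes."""
    return sum(int(np.prod(w)) + int(np.prod(b)) for w, b in shapes)


def unflatten(theta, shapes):
    """Split a flat vector into per-layer (W, b) arrays (assumed layout)."""
    params, i = [], 0
    for w_shape, b_shape in shapes:
        n_w, n_b = int(np.prod(w_shape)), int(np.prod(b_shape))
        W = theta[i:i + n_w].reshape(w_shape)
        i += n_w
        b = theta[i:i + n_b]
        i += n_b
        params.append((W, b))
    if i != theta.size:
        raise ValueError("theta has the wrong length for these shapes")
    return params


def act(theta, obs, shapes):
    """Deterministic action: tanh hidden layers, linear output (assumed)."""
    h = np.asarray(obs, dtype=float)
    layers = unflatten(theta, shapes)
    for W, b in layers[:-1]:
        h = np.tanh(h @ W + b)
    W, b = layers[-1]
    return h @ W + b


# Usage with hypothetical dimensions
obs_dim, act_dim = 11, 3
shapes = mlp_shapes(obs_dim, act_dim)
theta = np.random.default_rng(0).standard_normal(n_params(shapes))
print(n_params(shapes), act(theta, np.zeros(obs_dim), shapes))
```

Keeping the network this small keeps the search space to a couple of hundred dimensions, which is the regime where a derivative-free method like CEM is practical.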


## 1-trpo-gae-v0-writeup.md

Last active April 20, 2024 17:30

TRPO-GAE (Version 0) Writeup

The code used to obtain these results is at https://github.com/joschu/modular_rl, commit 50cdfdf375e69d86e3db6eb2ad0218ea6aebf371.
The command line used for all the environments is in the text file below.
Note that exactly the same parameters and policies were used for all tasks, except for timesteps_per_batch, which was varied based on the difficulty of the task.
The important parameters are:

- gamma=0.995: discount factor
- lam=0.97: GAE parameter lambda; see the GAE paper for an explanation
- agent=TrpoAgent: name of the class that specifies the policy and value function architecture. In this case, we used two hidden layers of size 64 with tanh activations.
- cg_damping: multiple of the identity matrix added to the Fisher information matrix when solving for the step direction with conjugate gradient
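For reference, gamma and lam enter through the generalized advantage estimator: with TD residuals delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), the advantage estimate is A_t = sum over l >= 0 of (gamma * lam)^l * delta_{t+l}. The sketch below computes this backwards over a single trajectory. The function name and the last_value argument (the bootstrap value V(s_T), 0 for a terminated episode) are illustrative assumptions, not modular_rl's API.

```python
import numpy as np


def gae_advantages(rewards, values, gamma=0.995, lam=0.97, last_value=0.0):
    """GAE(gamma, lam) advantages for one trajectory.

    rewards[t] = r_t and values[t] = V(s_t) for t = 0..T-1;
    last_value = V(s_T) (0 if the episode terminated).
    """
    rewards = np.asarray(rewards, dtype=float)
    # Append V(s_T) so that values[t + 1] exists for the final step
    values = np.append(np.asarray(values, dtype=float), last_value)
    # TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros_like(deltas)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv


print(gae_advantages([1.0, 0.0, 1.0], [0.5, 0.4, 0.6]))
```

Setting lam=1 recovers the discounted return minus the value baseline (low bias, high variance), while lam=0 gives the one-step TD residual (lower variance, more bias); lam=0.97 sits close to the first end.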

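cg_damping matters because TRPO obtains its step direction by solving F x = g with conjugate gradient, where F is the Fisher information matrix, accessed only through Fisher-vector products. Adding cg_damping times the identity bounds the smallest eigenvalue away from zero and keeps the solve well conditioned. The sketch below is a generic damped conjugate gradient solver; the function name, the explicit 3x3 matrix standing in for F, and the damping value 0.1 are arbitrary choices for the demo, not values from this writeup.

```python
import numpy as np


def conjugate_gradient(fvp, g, cg_damping, n_iters=10, tol=1e-10):
    """Approximately solve (F + cg_damping * I) x = g.

    fvp(v) must return F @ v; F itself is never formed.
    """
    def damped(v):
        return fvp(v) + cg_damping * v

    x = np.zeros_like(g, dtype=float)
    r = np.array(g, dtype=float)  # residual g - A x, with x = 0 initially
    p = r.copy()                  # current search direction
    rr = r @ r
    for _ in range(n_iters):
        if rr < tol:
            break
        Ap = damped(p)
        alpha = rr / (p @ Ap)     # step length along p
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p  # next A-conjugate direction
        rr = rr_new
    return x


# Usage with a small symmetric positive definite stand-in for F
F = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]])
g = np.array([1.0, 2.0, 3.0])
x = conjugate_gradient(lambda v: F @ v, g, cg_damping=0.1)
print(x, np.linalg.solve(F + 0.1 * np.eye(3), g))
```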

## test_mod_derivative.py
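```python
import theano
import theano.tensor as TT

x = TT.scalar('x')
y = TT.scalar('y')

# Expression whose derivative is being tested: z = x**2 mod y
z = TT.mod(x**2, y)
# Smooth alternative for comparison:
# z = x**2 + y**2

# Compile the forward function and the gradient dz/dx
f = theano.function([x, y], z, allow_input_downcast=True)
dfdx = theano.function([x, y], TT.grad(z, x), allow_input_downcast=True)

# Evaluate at a point where x**2 = 9 is not a multiple of y = 4
print(f(3.0, 4.0))
print(dfdx(3.0, 4.0))
```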
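Away from points where x**2 is an exact multiple of y, floor(x**2 / y) is locally constant, so the derivative of mod(x**2, y) with respect to x is simply 2x. A NumPy-only central-difference check, independent of Theano and with arbitrarily chosen test points, gives a reference to compare dfdx against:

```python
import numpy as np


def z(x, y):
    # Same expression as the Theano graph: mod(x**2, y)
    return np.mod(x ** 2, y)


def numeric_dzdx(x, y, eps=1e-6):
    # Central finite difference in x
    return (z(x + eps, y) - z(x - eps, y)) / (2 * eps)


def analytic_dzdx(x, y):
    # floor(x**2 / y) is locally constant, so d/dx mod(x**2, y) = 2x
    # wherever x**2 is not an exact multiple of y
    return 2 * x


for x, y in [(3.0, 4.0), (1.5, 2.0), (-2.2, 1.7)]:
    print(x, y, numeric_dzdx(x, y), analytic_dzdx(x, y))
```

At points where x**2 is an exact multiple of y the function jumps, so neither the finite difference nor any symbolic gradient is meaningful there.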