Skip to content

Instantly share code, notes, and snippets.

@Nov05
Last active April 15, 2024 18:38
Show Gist options
  • Save Nov05/1d49183a91456a63e13782e5f49436be to your computer and use it in GitHub Desktop.
Save Nov05/1d49183a91456a63e13782e5f49436be to your computer and use it in GitHub Desktop.

Udacity Deep Reinforcement Learning - p2 & deeprl env setup

πŸ‘‰ check the drlnd_py310 env setup notes
πŸ‘‰ check the p1 env setup notes
πŸ‘‰ course curriculum
πŸ‘‰ Colab notebooks


Window 11, VSCode, Minicoda, Powershell

πŸ‘‰ copy from the env where cuda and pytorch have been installed
🟒 conda create --name drlnd_p2 --clone drlnd (Python 3.6)

(base) PS D:\github\udacity-deep-reinforcement-learning\python> conda create --name drlnd_p2 --clone drlnd
Source:      D:\Users\*\miniconda3\envs\drlnd
Destination: D:\Users\*\miniconda3\envs\drlnd_p2
Packages: 159
Files: 13970
  • or check how to install cuda + pytorch in windows 11
    conda install cuda --channel "nvidia/label/cuda-12.1.0"
  • or go to https://pytorch.org/, and select the right version to install
    ❌ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    🟒 conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install torchmeta
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidi

🟒 Follow these steps to install mujoco-py on Windows

🟒 Powershell $env:PATH += ";C:\Users\*\.mujoco\mjpro150\bin"
 Powershell $env:path -split ";" to display path variables

🟒 download mujoco-py-1.50.1.68.tar.gz from https://pypi.org/project/mujoco-py/1.50.1.68/#files

pip install "cython<3"  
pip install mujoco-py-1.50.1.68.tar.gz  
python D:\github\udacity-deep-reinforcement-learning\python\mujoco-py\examples\body_interaction.py  
  • you might need this pip install lockfile and some other packages. install them according to the error messages.
  • a worse case is that your python version is too high (maybe >=3.9?), you might need to install mujoco_py manually.
  • now you should be able to see this.

πŸ‘‰ install gym atari and lincense
https://stackoverflow.com/a/69602242

pip install -U gym
pip install -U gym[atari,accept-rom-license]
pip install bleach==1.5.0  
pip install --upgrade numpy   
pip install --upgrade tensorboard

πŸ‘‰ install OpenAI Baselines

pip install --upgrade pip setuptools wheel   
pip install opencv-python==4.5.5.64  
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
  • for python 3.11, you can pip install opencv-python.
    and i Successfully installed opencv-python-4.9.0.80.

πŸ‘‰ intall the rest packages for the deeprl folder.
pip install -r .\deeprl_files\requirements.txt

  • requirements.txt
# torch
# torchvision
# torchmeta 
# gym==0.15.7
# tensorflow==1.15.0
# opencv-python==4.0.0.21
atari-py
scikit-image==0.14.2
tqdm
pandas
pathlib
seaborn
# roboschool==1.0.34
dm-control2gym  
tensorflow-io
  • for python 3.11, losen the version requirement scikit-image.
    I got scikit-image-0.22.0 installed.

πŸ‘‰ test the env setup

  • run notebooks
python -m ipykernel install --user --name=drlnd_p2
jupyter notebook D:\github\udacity-deep-reinforcement-learning\p2_continuous-control\Continuous_Control.ipynb  
jupyter notebook D:\github\udacity-deep-reinforcement-learning\p2_continuous-control\Crawler.ipynb  

🟒 python -m deeprl.component.envs

if __name__ == '__main__':
    import time
    ## num_envs=5 will only create 3 env and cause error
    ## "results = _flatten_list(results)"
    ## in "baselines\baselines\common\vec_env\subproc_vec_env.py"
    task = Task('Hopper-v2', num_envs=3, single_process=False)
    state = task.reset()

    ## This might be helpful for custom env debugging
    # env_dict = gym.envs.registration.registry.env_specs.copy()
    # for item in env_dict.items():
    #     print(item)

    start_time = time.time()
    while True:
        action = np.random.rand(task.action_space.shape[0])
        next_state, reward, done, _ = task.step(action)
        print(done)
        if time.time()-start_time > 10: ## run about 10s
            break  
    task.close()

🟒 run examples:
D:\github\udacity-deep-reinforcement-learning\python\deeprl_files\examples.py

if __name__ == '__main__':
    mkdir('log')
    mkdir('tf_log')
    set_one_thread()
    random_seed()
    # -1 is CPU, an non-negative integer is the index of GPU
    # select_device(-1)
    select_device(0) ## GPU
    
    game = 'Reacher-v2'
    # a2c_continuous(game=game)
    # ppo_continuous(game=game)
    ddpg_continuous(game=game)    




folder ./python/deeprl structure

https://github.com/ShangtongZhang/DeepRL
https://github.com/ChalamPVS/Unity-Reacher

🟒 copied python files from repo @ShangtongZhang/DeepRL to repo @Nov05/udacity-deep-reinforcement-learning under the './python' folder.

DeepRL/template_jobs.py

ddpg_continuous(game='Reacher-v2', run=0, env=env,
	remark=ddpg_continuous.__name__)

DeepRL/examples.py

def ddpg_continuous(**kwargs):
	config.task_fn = lambda: Task(config.game, env=env)
	run_steps(DDPGAgent(config))

deep_rl/utils/config.py

class Config:
	def __init__(self):
		self.task_fn = None

DeepRL/deep_rl/utils/misc.py

def run_steps(agent):
    config = agent.config
    agent.step()

deep_rl/agent/DDPG_agent.py

class DDPGAgent(BaseAgent):
	self.task = config.task_fn()
	def step(self):

deep_rl/component/envs.py

def make_env(env_id, seed, rank, episode_life=True):
class Task:
    def __init__(self,
                 name,
                 num_envs=1,
		 env=env,
if __name__ == '__main__':
    task = Task('Hopper-v2', 5, single_process=False)
@Nov05
Copy link
Author

Nov05 commented Apr 7, 2024

🟒⚠️ issue solved: training has been slow. added torch.nn.BatchNorm1d, however, got the following error. my task has multiple unity envs, each env has multiple agents, torch.Size([1, 1, 33]) means there is 1 env 1 agent.

2024-04-07 03:32:29,914 - root - INFO: Episode 0, Step 0, 0.00 s/episode
🟒 Unity environment has been resetted.
πŸ‘‰ torch.Size([1, 1, 33]) BatchNorm1d(33, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  • refer to this set of model training hypermeters
  • solution: the original code is for mujoco, 1 env with 1 agent, hence the shape of tensors, such as actions and states, are 2 dimensional. for unity, 1 env with multiple agents, hence the shape of tensors need to reduce 1 dimension for the neural networks.

@Nov05
Copy link
Author

Nov05 commented Apr 9, 2024

🟒⚠️ issue solved: neural network nn.BatchNorm1d layer threw error ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, state_size]) when it was actually evaluating. during training, tensor sizes are usually like [mini_batch_size, state_size], no error will be given. it turned out that i forgot to turn on eval mode of the network. it makes sense that you can't normalize a single channel of values. and this layer probably is skipped during evaluation.

    ## neural network
    config.network_fn = lambda: DeterministicActorCriticNet(
        config.state_dim,  
        config.action_dim,  
        actor_body=FCBody(config.state_dim, (128,128), gate=nn.LeakyReLU, 
                          init_method='uniform_fan_in', 
                          batch_norm=nn.BatchNorm1d,),
        critic_body=FCBody(config.state_dim+config.action_dim, (128,128), gate=nn.LeakyReLU, 
                           init_method='uniform_fan_in', batch_norm=nn.BatchNorm1d),
        actor_opt_fn=lambda params: torch.optim.Adam(params, lr=1e-4),
        ## for the critic optimizer, it seems that 1e-3 won't converge
        critic_opt_fn=lambda params: torch.optim.Adam(params, lr=1e-4, weight_decay=1e-5),  
        # batch_norm=nn.BatchNorm1d,
        )
DeterministicActorCriticNet(
  (phi_body): DummyBody()
  (actor_body): FCBody(
    (layers): ModuleList(
      (0): Linear(in_features=33, out_features=128, bias=True)
      (1): LeakyReLU(negative_slope=0.01)
      (2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): Linear(in_features=128, out_features=128, bias=True)
      (4): LeakyReLU(negative_slope=0.01)
      (5): Linear(in_features=128, out_features=4, bias=True)
      (6): Tanh()
    )
  )
  (critic_body): FCBody(
    (layers): ModuleList(
      (0): Linear(in_features=37, out_features=128, bias=True)
      (1): LeakyReLU(negative_slope=0.01)
      (2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): Linear(in_features=128, out_features=128, bias=True)
      (4): LeakyReLU(negative_slope=0.01)
      (5): Linear(in_features=128, out_features=1, bias=True)
    )
  )
)

@Nov05
Copy link
Author

Nov05 commented Apr 15, 2024

🟒⚠️ issue solved: alphazero folder jupyter notebook: %matplotlib notebook threw Javascript Error: IPython is not defined.

$ jupyter notebook ..\alphazero\alphazero-TicTacToe-advanced.ipynb
jupyter lab --version
pip install --upgrade jupyterlab
ipython --version
pip install --upgrade ipython

my env drlnd_py310 upgraded jupyterlab from 4.1.4 to jupyterlab-4.1.6, ipython from 8.22.2 to ipython-8.23.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment