Last active April 15, 2024 18:38
Udacity Deep Reinforcement Learning - p2 & deeprl env setup

👉 check the drlnd_py310 env setup notes
👉 check the p1 env setup notes
👉 course curriculum
👉 Colab notebooks

Windows 11, VSCode, Miniconda, PowerShell

👉 clone the env where CUDA and PyTorch have already been installed
🟢 conda create --name drlnd_p2 --clone drlnd (Python 3.6)

(base) PS D:\github\udacity-deep-reinforcement-learning\python> conda create --name drlnd_p2 --clone drlnd
Source:      D:\Users\*\miniconda3\envs\drlnd
Destination: D:\Users\*\miniconda3\envs\drlnd_p2
Packages: 159
Files: 13970
  • or check how to install CUDA + PyTorch on Windows 11
    conda install cuda --channel "nvidia/label/cuda-12.1.0"
  • or go to, and select the right version to install
    pip3 install torch torchvision torchaudio --index-url
    🟢 conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install torchmeta
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

🟢 Follow these steps to install mujoco-py on Windows

🟢 PowerShell: $env:PATH += ";C:\Users\*\.mujoco\mjpro150\bin"
 PowerShell: $env:path -split ";" displays the PATH entries
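To confirm from Python that the MuJoCo bin folder actually made it onto PATH (the same check as the PowerShell split above), a minimal sketch could look like this. The helper names `path_entries` and `on_path` are hypothetical, and the MuJoCo location is an assumption to adjust.

```python
import os


def path_entries(path_value=None, sep=os.pathsep):
    """Split a PATH-style string into entries, like PowerShell's `$env:path -split ';'`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Skip empty entries produced by leading, trailing, or doubled separators.
    return [p for p in path_value.split(sep) if p]


def on_path(target, path_value=None, sep=os.pathsep):
    """Return True if `target` is one of the PATH entries.

    Case and trailing slashes are ignored, because Windows paths are
    case-insensitive and a trailing backslash names the same folder.
    """
    def norm(p):
        return p.rstrip("\\/").lower()

    return norm(target) in {norm(p) for p in path_entries(path_value, sep)}


if __name__ == "__main__":
    # Hypothetical location: adjust to wherever mjpro150 was unpacked.
    mujoco_bin = os.path.join(os.path.expanduser("~"), ".mujoco", "mjpro150", "bin")
    print("MuJoCo bin on PATH:", on_path(mujoco_bin))
```

Run it in the same PowerShell session after the `$env:PATH +=` line; a new session will not see that change unless PATH was also set permanently.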

🟢 download mujoco-py- from

pip install "cython<3"  
pip install mujoco-py-  
python D:\github\udacity-deep-reinforcement-learning\python\mujoco-py\examples\  
  • you might need to pip install lockfile and some other packages; install them as the error messages indicate.
  • in the worst case, your Python version is too new (maybe >= 3.9?), and you might need to install mujoco_py manually.
  • now you should be able to see this.

👉 install gym with Atari support and accept the ROM license

pip install -U gym
pip install -U gym[atari,accept-rom-license]
pip install bleach==1.5.0  
pip install --upgrade numpy   
pip install --upgrade tensorboard

👉 install OpenAI Baselines

pip install --upgrade pip setuptools wheel   
pip install opencv-python==  
git clone
cd baselines
pip install -e .
  • for Python 3.11, you can just pip install opencv-python;
    I successfully installed opencv-python-

👉 install the remaining packages for the deeprl folder.
pip install -r .\deeprl_files\requirements.txt

  • requirements.txt
# torch
# torchvision
# torchmeta 
# gym==0.15.7
# tensorflow==1.15.0
# opencv-python==
# roboschool==1.0.34
  • for Python 3.11, loosen the version requirement for scikit-image.
    I got scikit-image-0.22.0 installed.

👉 test the env setup

  • run notebooks
python -m ipykernel install --user --name=drlnd_p2
jupyter notebook D:\github\udacity-deep-reinforcement-learning\p2_continuous-control\Continuous_Control.ipynb  
jupyter notebook D:\github\udacity-deep-reinforcement-learning\p2_continuous-control\Crawler.ipynb  

🟢 python -m deeprl.component.envs

if __name__ == '__main__':
    import time
    ## num_envs=5 will only create 3 env and cause error
    ## "results = _flatten_list(results)"
    ## in "baselines\baselines\common\vec_env\"
    task = Task('Hopper-v2', num_envs=3, single_process=False)
    state = task.reset()

    ## This might be helpful for custom env debugging
    # env_dict = gym.envs.registration.registry.env_specs.copy()
    # for item in env_dict.items():
    #     print(item)

    start_time = time.time()
    while True:
        action = np.random.rand(task.action_space.shape[0])
        next_state, reward, done, _ = task.step(action)
        if time.time()-start_time > 10: ## run about 10s
            break

🟢 run examples:

if __name__ == '__main__':
    # -1 is CPU, a non-negative integer is the index of GPU
    # select_device(-1)
    select_device(0) ## GPU
    game = 'Reacher-v2'
    # a2c_continuous(game=game)
    # ppo_continuous(game=game)

folder ./python/deeprl structure

🟢 copied python files from repo @ShangtongZhang/DeepRL to repo @Nov05/udacity-deep-reinforcement-learning under the './python' folder.


ddpg_continuous(game='Reacher-v2', run=0, env=env, ...)


def ddpg_continuous(**kwargs):
    ...
    config.task_fn = lambda: Task(..., env=env)


class Config:
    def __init__(self):
        self.task_fn = None


def run_steps(agent):
    config = agent.config
    ...


class DDPGAgent(BaseAgent):
    def __init__(self, config):
        ...
        self.task = config.task_fn()
    def step(self):
        ...


def make_env(env_id, seed, rank, episode_life=True):
    ...

class Task:
    def __init__(self, ...):
        ...

if __name__ == '__main__':
    task = Task('Hopper-v2', 5, single_process=False)
Nov05 commented Mar 5, 2024

🟢⚠️ issue solved: Unity env multiprocessing issue in Windows. check the result video.

(drlnd_py311) PS D:\github\udacity-deep-reinforcement-learning> cd python
(drlnd_py311) PS D:\github\udacity-deep-reinforcement-learning\python> python -m experiments.deeprl_ddpg_continous
WARNING:tensorflow:From D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\site-packages\keras\src\ The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

E0305 03:11:07.896000000  2688 src/core/ext/transport/chttp2/server/] UNKNOWN:No address added out of total 1 resolved for '[::]:5005' {created_time:"2024-03-05T09:11:07.869297+00:00", children:[UNKNOWN:Failed to prepare server socket {fd:4188, target_address:"ipv6:%5B::%5D:5005", created_time:"2024-03-05T09:11:07.8692565+00:00", children:[UNAVAILABLE:WSA Error {created_time:"2024-03-05T09:11:07.8692022+00:00", wsa_error:10048, grpc_status:14, os_error:"Only one usage of each socket address (protocol/network address/port) is normally permitted.\r\n", syscall:"bind"}]}]}
Traceback (most recent call last):
  File "D:\github\udacity-deep-reinforcement-learning\python\unityagents\", line 51, in initialize
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\site-packages\grpc\", line 1329, in add_insecure_port
    return _common.validate_port_binding_result(
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\site-packages\grpc\", line 181, in validate_port_binding_result
    raise RuntimeError(_ERROR_MESSAGE_PORT_BINDING_FAILED % address)
RuntimeError: Failed to bind to address [::]:5005; set GRPC_VERBOSITY=debug environment variable to see detailed error message.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\github\udacity-deep-reinforcement-learning\python\experiments\", line 53, in <module>
    ddpg_continuous(game='unity-Reacher-v2', run=0,
  File "D:\github\udacity-deep-reinforcement-learning\python\experiments\", line 35, in ddpg_continuous
  File "D:\github\udacity-deep-reinforcement-learning\python\deeprl\agent\", line 17, in __init__
    self.task = config.task_fn()
  File "D:\github\udacity-deep-reinforcement-learning\python\experiments\", line 16, in <lambda>
    config.task_fn = lambda: Task(, env_fn_kwargs=config.env_fn_kwargs, single_process=True)
  File "D:\github\udacity-deep-reinforcement-learning\python\deeprl\component\", line 277, in __init__
    env_fn, self.env_type = get_env_fn(game, env_fn_kwargs=self.env_fn_kwargs,
  File "D:\github\udacity-deep-reinforcement-learning\python\deeprl\component\", line 74, in get_env_fn
    env = env_fn_mappings[env_type](**kwargs)
  File "D:\github\udacity-deep-reinforcement-learning\python\unityagents\", line 63, in __init__
    aca_params = self.send_academy_parameters(rl_init_parameters_in)
  File "D:\github\udacity-deep-reinforcement-learning\python\unityagents\", line 506, in send_academy_parameters
    return self.communicator.initialize(inputs).rl_initialization_output
  File "D:\github\udacity-deep-reinforcement-learning\python\unityagents\", line 54, in initialize
    raise UnityTimeOutException(
unityagents.exception.UnityTimeOutException: Couldn't start socket communication because worker number 0 is still in use. You may need to manually close a previously opened environment or use a different worker number.

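✅ Move the pipe creation out of the class and just assign the conns to the instance attributes (as in the code below). That way, the conns of the objects instantiated from this class no longer turn into class variables that all point to the same conns. The original code created the pipe that way because it wasn't clear how the original upstream .__init__() was implemented, so it couldn't be changed.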

  • change code in D:\github\udacity-deep-reinforcement-learning\python\unityagents\
    test: python -m test2.test_unity_multiprocessing
self.unity_to_external = UnityToExternalServicerImplementation()
self.unity_to_external.parent_conn, self.unity_to_external.child_conn = Pipe()
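A minimal sketch of the class-variable pitfall described above, using only the standard library. `Servicer` is a hypothetical stand-in for `UnityToExternalServicerImplementation`, and `make_servicer` is an illustrative helper, not code from the repo.

```python
from multiprocessing import Pipe


class SharedConns:
    # Anti-pattern: Pipe() runs once, when the class body executes, so the two
    # connections become class variables shared by every instance.
    parent_conn, child_conn = Pipe()


class Servicer:
    """Hypothetical stand-in for UnityToExternalServicerImplementation."""
    pass


def make_servicer():
    # Pattern from the note: create the pipe outside the class and assign the
    # connections to the new object, so each object gets its own pipe.
    servicer = Servicer()
    servicer.parent_conn, servicer.child_conn = Pipe()
    return servicer


a, b = SharedConns(), SharedConns()
print("shared:", a.parent_conn is b.parent_conn)      # shared: True

s1, s2 = make_servicer(), make_servicer()
print("separate:", s1.parent_conn is s2.parent_conn)  # separate: False

# Each per-object pipe carries its own messages.
s1.parent_conn.send("hello from s1")
print(s1.child_conn.recv())                           # hello from s1
```

With shared class-level conns, two Unity environments in one process would read and write the same pipe, which is the kind of cross-talk the fix avoids.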

Nov05 commented Mar 5, 2024

✅ Unity env with two subprocesses.
👉 run python -m test2.test_unity_multiprocessing.

  • change the D:\github\udacity-deep-reinforcement-learning\python\unityagents\
class UnityEnvironment(object):
    def executable_launcher(self, file_name, docker_training, no_graphics):
        ...
                    print("⚠️ launch_string:", launch_string)
                    self.proc1 = subprocess.Popen(
                        [launch_string, '--port', str(self.port)])
                    self.proc2 = subprocess.Popen(
                        [launch_string, '--port', str(self.port+20)])
    def _close(self):
        self._loaded = False
        ...
        if self.proc1 is not None:
            ...
        import time
        ...
        if self.proc2 is not None:
            ...
  • launched the env: 2 procs were created, but only the newest one had graphics. Need to test whether both are working.
    env1 = UnityEnvironment(file_name=f1, no_graphics=False)
  • output:
(drlnd_py311) PS D:\github\udacity-deep-reinforcement-learning\python> python -m experiments.unity_multiprocessing
WARNING:tensorflow:From D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\site-packages\keras\src\ The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

⚠️ launch_string: D:\github\udacity-deep-reinforcement-learning\python\..\data\Reacher_Windows_x86_64_20\Reacher.exe
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
                goal_size -> 5.0
                goal_speed -> 1.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , ,
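The proc1/proc2 change above boils down to starting two copies of the same executable on different ports and making sure both are terminated on close. A minimal standard-library sketch, assuming the executable accepts `--port` like the Unity build does; `launch_pair`, `close_pair`, and the sleeping Python child used as a placeholder are hypothetical.

```python
import subprocess
import sys


def launch_pair(cmd, port, offset=20):
    """Start two copies of `cmd`, one on `port` and one on `port + offset`."""
    proc1 = subprocess.Popen(cmd + ["--port", str(port)])
    proc2 = subprocess.Popen(cmd + ["--port", str(port + offset)])
    return proc1, proc2


def close_pair(procs, timeout=5):
    """Terminate every process that is still running, then wait for it to exit."""
    for proc in procs:
        if proc is not None and proc.poll() is None:
            proc.terminate()
    for proc in procs:
        if proc is None:
            continue
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # last resort if terminate() was ignored
            proc.wait()


if __name__ == "__main__":
    # Placeholder child: a Python process that just sleeps and ignores --port.
    cmd = [sys.executable, "-c", "import time; time.sleep(30)"]
    procs = launch_pair(cmd, port=5005)
    print([p.args[-1] for p in procs])  # ['5005', '5025']
    close_pair(procs)
```

Terminating both handles in `close` matters on Windows in particular: a leftover Unity process keeps its port bound, which is what produces the "worker number 0 is still in use" error earlier in these notes.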

Nov05 commented Mar 5, 2024

⚠️ issue open: tensorflow==2.15.0 causes the following error.

(drlnd_py311) PS D:\github\udacity-deep-reinforcement-learning\python> python -m tests.test_bc
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 112, in _get_module_details
  File "D:\github\udacity-deep-reinforcement-learning\python\tests\", line 9, in <module>
    from unitytrainers import *
  File "D:\github\udacity-deep-reinforcement-learning\python\unitytrainers\", line 2, in <module>
    from .models import *
  File "D:\github\udacity-deep-reinforcement-learning\python\unitytrainers\", line 5, in <module>
    import tensorflow.contrib.layers as c_layers
ModuleNotFoundError: No module named 'tensorflow.contrib'

this solution doesn't work for me.

import tensorflow.compat.v1 as tf

this solution works for me. I am using python==3.11.7 and tensorflow==2.15.0. However, I'm not sure this is the right solution.

# import tensorflow.contrib.layers as c_layers
import keras.layers as c_layers

Nov05 commented Mar 7, 2024

⁉️ question: is it possible to control individual agents asynchronously in a Unity env (.\unityagents\)?

Nov05 commented Mar 11, 2024

🟢⚠️ issue solved: Unity + deeprl (multiprocessing, unsuccessful)
wrap the class with a func, so that cloudpickle won't throw "TypeError: can't pickle _thread._local objects".


## wrap the class with a func, or Multiprocessing will throw
## "TypeError: cannot pickle '_thread.lock' object"
def make_unity(**kwargs):
    return lambda: UnityEnvironment(**kwargs)

env_types = {'dm', 'atari', 'gym', 'unity'}
env_fn_mappings = {'dm': dm_control2gym.make,
                   'atari': make_atari,
                   'gym': gym.make,
                   'unity': make_unity}
# adapted from
## refactored, func for unity added, by nov05
def get_env_fn(game, ## could be called "id", "env_id" in other functions
               env_fn_kwargs = None,

    ## get env type
    env_type, kwargs = None, dict()
    if game.startswith("unity"):
        env_type = 'unity'
    elif game.startswith("dm"):
        env_type = 'dm'
        _, domain, task = game.split('-')
        kwargs.update({'domain_name': domain, 'task_name': task})
    elif hasattr(gym.envs, 'atari') and \
        isinstance(env.unwrapped, gym.envs.atari.atari_env.AtariEnv):
        env_type = 'atari'
    else:
        env_type = 'gym'

    ## create env    
    env = env_fn_mappings[env_type](**kwargs)

    if env_type!='unity':
        env.seed(seed + rank)
        env = OriginalReturnWrapper(env)
        if env_type=='atari':
            env = wrap_deepmind(env,
            obs_shape = env.observation_space.shape
            if len(obs_shape)==3:
                env = TransposeImage(env)
                env = FrameStack(env, 4)
    return lambda:env, env_type ## return the env as a func
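Why wrapping helps, as a minimal sketch: an env instance that owns a lock cannot be pickled, but a zero-argument factory can, and the env is then built inside the worker. `FakeUnityEnv` is a hypothetical stand-in for `UnityEnvironment` (which, per the traceback in these notes, ends up holding a `_thread.lock`), and this version swaps the lambda-plus-cloudpickle approach above for `functools.partial`, which the standard `pickle` module can handle.

```python
import functools
import pickle
import threading


class FakeUnityEnv:
    """Hypothetical stand-in for UnityEnvironment: it holds a lock,
    which the pickle module refuses to serialize."""

    def __init__(self, port=5005):
        self.port = port
        self._lock = threading.Lock()


def make_unity(**kwargs):
    """Return a picklable zero-argument factory instead of an env instance."""
    return functools.partial(FakeUnityEnv, **kwargs)


# Pickling an env *instance* fails because of the lock it owns.
instance_error = None
try:
    pickle.dumps(FakeUnityEnv())
except TypeError as err:
    instance_error = err
print("instance:", instance_error)

# Pickling the *factory* only serializes a reference to the class plus kwargs;
# the env (and its lock) is created after unpickling, i.e. in the worker process.
env_fn = make_unity(port=5006)
env = pickle.loads(pickle.dumps(env_fn))()
print("rebuilt env on port", env.port)  # rebuilt env on port 5006
```

The factory approach only helps if nothing calls it in the parent process first; if the env is created eagerly and then wrapped, the unpicklable instance is still what gets sent.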

  • TypeError: cannot pickle '_thread.lock' object
🟢 RpcCommunicator at port 5005 is initializing...
🟢 RpcCommunicator at port 5006 is initializing...
TypeError                                 Traceback (most recent call last)
<timed exec> in <module>

/content/python/deeprl/component/ in __init__(self, game, num_envs, env_fn_kwargs, envs, single_process, log_dir, episode_life, seed)
    299         else:
    300             wrapper_kwargs = {'env_fns': self.env_fns}
--> 301         self.envs_wrapper = Wrapper(**wrapper_kwargs)
    303         self.observation_space = self.envs_wrapper.observation_space

/content/python/baselines/baselines/common/vec_env/ in __init__(self, env_fns, spaces, context)
     55             p.daemon = True  # if the main process crashes, we should not cause things to hang
     56             with clear_mpi_env_vars():
---> 57                 p.start()
     58         for remote in self.work_remotes:
     59             remote.close()

/usr/lib/python3.10/multiprocessing/ in start(self)
    119                'daemonic processes are not allowed to have children'
    120         _cleanup()
--> 121         self._popen = self._Popen(self)
    122         self._sentinel = self._popen.sentinel
    123         # Avoid a refcycle if the target function holds an indirect

/usr/lib/python3.10/multiprocessing/ in _Popen(process_obj)
    286         def _Popen(process_obj):
    287             from .popen_spawn_posix import Popen
--> 288             return Popen(process_obj)
    290         @staticmethod

/usr/lib/python3.10/multiprocessing/ in __init__(self, process_obj)
     30     def __init__(self, process_obj):
     31         self._fds = []
---> 32         super().__init__(process_obj)
     34     def duplicate_for_child(self, fd):

/usr/lib/python3.10/multiprocessing/ in __init__(self, process_obj)
     17         self.returncode = None
     18         self.finalizer = None
---> 19         self._launch(process_obj)
     21     def duplicate_for_child(self, fd):

/usr/lib/python3.10/multiprocessing/ in _launch(self, process_obj)
     45         try:
     46             reduction.dump(prep_data, fp)
---> 47             reduction.dump(process_obj, fp)
     48         finally:
     49             set_spawning_popen(None)

/usr/lib/python3.10/multiprocessing/ in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     62 #

/content/python/baselines/baselines/common/vec_env/ in __getstate__(self)
    193     def __getstate__(self):
    194         import cloudpickle
--> 195         return cloudpickle.dumps(self.x)
    197     def __setstate__(self, ob):

/usr/local/lib/python3.10/dist-packages/cloudpickle/ in dumps(obj, protocol)
     60     with io.BytesIO() as file:
     61         cp = CloudPickler(file, protocol=protocol)
---> 62         cp.dump(obj)
     63         return file.getvalue()

/usr/local/lib/python3.10/dist-packages/cloudpickle/ in dump(self, obj)
    536     def dump(self, obj):
    537         try:
--> 538             return Pickler.dump(self, obj)
    539         except RuntimeError as e:
    540             if "recursion" in e.args[0]:

TypeError: cannot pickle '_thread.lock' object

Nov05 commented Mar 11, 2024

🟢⚠️ issue solved: Gym game + deeprl example (multiprocessing) runs successfully in Colab (Linux) but crashes on Windows.
✅ solution: downgrade Python 3.11 to Python 3.10.

  • multiprocessing and python 3.11 conflict? TypeError: code() argument 13 must be str, not int

  • instantiating class Task causes the error.

from deeprl import *
task = Task('Hopper-v2', num_envs=2, single_process=False) 
(drlnd_py311) PS D:\github\udacity-deep-reinforcement-learning\python> python -m tests2.test_deeprl_envs 
🟢 Process SpawnProcess-1 has started.
🟢 Process SpawnProcess-2 has started.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\multiprocessing\", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\multiprocessing\", line 132, in _main
    self = reduction.pickle.load(from_parent)
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\vec_env\", line 200, in __setstate__
    self.x = pickle.loads(ob)
TypeError: code() argument 13 must be str, not int
Traceback (most recent call last):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\multiprocessing\", line 328, in _recv_bytes
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\multiprocessing\", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
    nread, err = ov.GetOverlappedResult(True)
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\multiprocessing\", line 132, in _main
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:
    self = reduction.pickle.load(from_parent)

Traceback (most recent call last):
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\vec_env\", line 200, in __setstate__
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\github\udacity-deep-reinforcement-learning\python\tests2\", line 120, in <module>
    self.x = pickle.loads(ob)
    test1() ## gym fn, deeprl
TypeError: code() argument 13 must be str, not int
  File "D:\github\udacity-deep-reinforcement-learning\python\tests2\", line 21, in test1
    task = Task('Hopper-v2', num_envs=num_envs, single_process=single_process)
  File "D:\github\udacity-deep-reinforcement-learning\python\deeprl\component\", line 301, in __init__
    self.envs_wrapper = Wrapper(**wrapper_kwargs)
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\vec_env\", line 63, in __init__
    observation_space, action_space, self.spec = self.remotes[0].recv()
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\multiprocessing\", line 250, in recv
    buf = self._recv_bytes()
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\multiprocessing\", line 337, in _recv_bytes
    raise EOFError
Exception ignored in: <function SubprocVecEnv.__del__ at 0x000001E71FE06660>
Traceback (most recent call last):
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\vec_env\", line 108, in __del__
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\vec_env\", line 98, in close
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\vec_env\", line 92, in close_extras
    remote.send(('close', None))
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\multiprocessing\", line 206, in send
  File "D:\Users\guido\miniconda3\envs\drlnd_py311\Lib\multiprocessing\", line 289, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
BrokenPipeError: [WinError 232] The pipe is being closed

Nov05 commented Mar 11, 2024

🟢⚠️ issue solved: in conda env drlnd_py310, tensorflow==2.16.1 caused the following errors. ✅ Downgrading to tensorflow==2.15.0 solved the issue. Colab is currently using tensorflow==2.15.0 as well.

  • run a Baselines example
    python -m --alg=ppo2 --env=PongNoFrameskip-v4 --save_path=~/models/PongNoFrameskip-v4_1M_ppo2 --log_path=~/log
(drlnd_py310) PS D:\github\udacity-deep-reinforcement-learning> python -m --alg=ppo2 --env=PongNoFrameskip-v4 --save_path=~/models/PongNoFrameskip-v4_1M_ppo2 --log_path=~/log
Logging to C:\Users\guido/log
env_type: atari
⚠️ <function make_vec_env.<locals>.make_thunk.<locals>.<lambda> at 0x000002C5A3ADB9A0>
🟢 Process SpawnProcess-1 has started.
🟢 Process SpawnProcess-2 has started.
🟢 Process SpawnProcess-3 has started.
🟢 Process SpawnProcess-4 has started.
🟢 Process SpawnProcess-5 has started.
🟢 Process SpawnProcess-6 has started.
🟢 Process SpawnProcess-7 has started.
🟢 Process SpawnProcess-8 has started.
🟢 Process SpawnProcess-9 has started.
🟢 Process SpawnProcess-10 has started.
🟢 Process SpawnProcess-11 has started.
🟢 Process SpawnProcess-12 has started.
Training ppo2 on atari:PongNoFrameskip-v4 with arguments 
{'nsteps': 128, 'nminibatches': 4, 'lam': 0.95, 'gamma': 0.99, 'noptepochs': 4, 'log_interval': 1, 'ent_coef': 0.01, 'lr': <function atari.<locals>.<lambda> at 0x000002C5A3ADA440>, 'cliprange': 0.1, 'network': 'cnn'}
input shape is (84, 84, 4)
Traceback (most recent call last):
  File "D:\Users\guido\miniconda3\envs\drlnd_py310\lib\", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Users\guido\miniconda3\envs\drlnd_py310\lib\", line 86, in _run_code
    exec(code, run_globals)
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\", line 250, in <module>
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\", line 211, in main
    model, env = train(args, extra_args)
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\", line 77, in train
    model = learn(
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\ppo2\", line 97, in learn
    network = policy_network_fn(ob_space.shape)
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\", line 68, in network_fn
    return nature_cnn(input_shape, **conv_kwargs)
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\", line 21, in nature_cnn
    h = tf.cast(h, tf.float32) / 255.
  File "D:\Users\guido\miniconda3\envs\drlnd_py310\lib\site-packages\tensorflow\python\util\", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "D:\Users\guido\miniconda3\envs\drlnd_py310\lib\site-packages\keras\src\backend\common\", line 92, in __tf_tensor__
    raise ValueError(
ValueError: A KerasTensor cannot be used as input to a TensorFlow function. A KerasTensor is a symbolic placeholder for a shape and dtype, used when constructing Keras Functional models or Keras Functions. You can only use it as input to a Keras layer or a Keras operation (from the namespaces `keras.layers` and `keras.operations`). You are likely doing something like:

x = Input(...)
tf_fn(x) # Invalid.

What you should do instead is wrap `tf_fn` in a layer:

class MyLayer(Layer):
    def call(self, x):
        return tf_fn(x)

x = MyLayer()(x)

Exception ignored in: <function SubprocVecEnv.__del__ at 0x000002C5A39B9870>
Traceback (most recent call last):
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\vec_env\", line 109, in __del__
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\vec_env\", line 98, in close
  File "d:\github\udacity-deep-reinforcement-learning\python\baselines\baselines\common\vec_env\", line 93, in close_extras
    remote.send(('close', None))
  File "D:\Users\guido\miniconda3\envs\drlnd_py310\lib\multiprocessing\", line 206, in send
  File "D:\Users\guido\miniconda3\envs\drlnd_py310\lib\multiprocessing\", line 280, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
BrokenPipeError: [WinError 232] The pipe is being closed
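The KerasTensor error most likely comes from TensorFlow 2.16 making Keras 3 the default `tf.keras`, which Baselines' TF code predates. A small guard that fails fast when a newer TensorFlow sneaks into the env could look like this; the function names are hypothetical and the 2.16 cutoff simply encodes what these notes observed.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """'2.15.0' -> (2, 15); '2.16.1' -> (2, 16)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def tf_version_supported(version, max_exclusive=(2, 16)):
    """True below 2.16 (2.15.0 worked in these notes; 2.16.1 did not)."""
    return parse_major_minor(version) < max_exclusive


def check_installed_tensorflow():
    """Return the installed TensorFlow version (None if absent); raise if >= 2.16."""
    try:
        version = metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None
    if not tf_version_supported(version):
        raise RuntimeError(
            f"tensorflow=={version} is installed; these notes pin tensorflow==2.15.0"
        )
    return version
```

Calling `check_installed_tensorflow()` at the top of a training script returns None when TensorFlow is absent, so it only stops the run when an incompatible version is actually present.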

Nov05 commented Mar 14, 2024

🟢⁉️ question closed: 'vector_action_descriptions': ['', '', '', ''] in class BrainParameters cannot be pickled during multiprocessing piping. However, the following lines ran just fine. I don't understand why.
✅ alright, this father-f*cker, the value of BrainParameters.vector_action_descriptions, isn't a list of strings. Rather, it is <class 'google.protobuf.pyext._message.RepeatedScalarContainer'>, which does not seem to be serializable.


brain_info = {'vector_action_descriptions':['','','',''], 'something':9}
  • you can find class BrainParameters definition here.
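One possible workaround (an assumption, not necessarily what the repo does): convert the repeated protobuf field to a plain `list` before the object goes through the pipe. The sketch uses a hypothetical `RepeatedScalarStandIn` class that, like the real container, iterates fine but refuses to be pickled.

```python
import pickle


class RepeatedScalarStandIn:
    """Hypothetical stand-in for protobuf's RepeatedScalarContainer: it iterates
    like a list of strings but cannot be pickled."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __reduce_ex__(self, protocol):
        raise TypeError("cannot pickle 'RepeatedScalarStandIn' object")


brain_params = {
    "vector_action_descriptions": RepeatedScalarStandIn(["", "", "", ""]),
    "something": 9,
}

# Sending the raw container through a pipe (which pickles it) fails.
pickle_error = None
try:
    pickle.dumps(brain_params)
except TypeError as err:
    pickle_error = err
print("raw:", pickle_error)

# Converting the repeated field to a plain list before sending fixes it.
safe_params = dict(
    brain_params,
    vector_action_descriptions=list(brain_params["vector_action_descriptions"]),
)
restored = pickle.loads(pickle.dumps(safe_params))
print("converted:", restored)  # converted: {'vector_action_descriptions': ['', '', '', ''], 'something': 9}
```

This also explains why the hand-written dict ran fine: its value really is a `list`, while the field read from protobuf only looks like one when printed.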

Nov05 commented Mar 15, 2024

🟢⚠️ issue solved: random seed problem. In .\python\tests2\, seeds only seem to affect the balls: if the seeds are different, each ball moves differently; if the seeds are the same, the balls in different environment instances move identically. However, what we need here is the randomness of the Unity environment, e.g. for Reacher-v2. Strangely, in another Python file, .\python\tests2\, every environment is different no matter whether the seeds are different.

✅ First of all, the env controls the ball movements, and they are fine, always fine: the balls move randomly, which means the random seeds always work. The actions control the sticks, and if you write ❌ [randn()] * num_envs, you get a list that repeats the same array num_envs times, so of course the sticks move identically in different envs. Instead, use [randn() for _ in range(num_envs)] to get a different draw for each env. This was a stupid mistake.

    ## ✅ a fresh random action array for each env
    for _ in range(max_steps):
        actions = [np.random.randn(task.envs_wrapper.num_agents, task.action_space.shape[0]) for _ in range(task.num_envs)]
    env_fn_kwargs = {'file_name': env_file_name, 'no_graphics': no_graphics}
    task = Task('unity-Reacher-v2', num_envs=num_envs, seeds=[1,1],
                env_fn_kwargs=env_fn_kwargs, single_process=single_process)
    ## ❌ the same action array repeated for every env
    for _ in range(max_steps):
        actions = [np.random.randn(task.envs_wrapper.num_agents, task.action_space.shape[0])] * task.num_envs
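The ❌/✅ difference in the snippet above is plain Python list semantics, not anything Unity-specific. A minimal check, using NumPy's `default_rng` generator in place of `np.random.randn` (a swap for reproducibility; the shapes follow the snippet with 2 envs, 1 agent, and an action size of 4).

```python
import numpy as np

num_envs, num_agents, action_size = 2, 1, 4
rng = np.random.default_rng(0)

# ❌ One array, referenced num_envs times: every env receives the same action.
same = [rng.standard_normal((num_agents, action_size))] * num_envs

# ✅ A fresh draw per env: each env receives its own action.
different = [rng.standard_normal((num_agents, action_size)) for _ in range(num_envs)]

print(same[0] is same[1])                          # True
print(np.array_equal(different[0], different[1]))  # False
```

`[x] * n` copies references, not values, so it is also risky when the envs or wrappers mutate the action array in place.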
  • terminal outputs
(drlnd_py310) PS D:\github\udacity-deep-reinforcement-learning\python> python -m tests2.test_deeprl_envs
👉 Random seed: 335424301
🟢 RpcCommunicator at port 5005 is initializing...
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
                goal_size -> 5.0
                goal_speed -> 1.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , ,
👉 Random seed: 916458839
🟢 RpcCommunicator at port 5006 is initializing...
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
                goal_size -> 5.0
                goal_speed -> 1.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , ,
🟢 Task has started...
  • does it have anything to do with multiprocessing? No. Single processing gives the same result.
import multiprocessing as mp
class UnitySubprocVecEnv(VecEnv):
    def __init__(self, ...):
        ...
        ctx = mp.get_context(context)
        self.remotes, self.work_remotes = zip(*[ctx.Pipe() for _ in range(self.num_envs)])
        self.ps = [ctx.Process(target=unity_worker, args=(work_remote, remote, CloudpickleWrapper(env_fn)))
                   for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)]
  • seed doesn't work in the Unity environment Python code.

$ python -m tests2.test_deeprl_envs, single_process = True


    def _generate_reset_input(self, training, config) -> UnityRLInput: # type: ignore
        for key in config:
            rl_in.environment_parameters.float_parameters[key] = config[key]
        # rl_in.environment_parameters.float_parameters['seed'] = np.random.randint(-2147483648, 2147483647) ## added by nov05
        # print('👉 rl_in.environment_parameters.float_parameters[\'seed\']:', rl_in.environment_parameters.float_parameters['seed'])
    def send_academy_parameters(self, init_parameters: UnityRLInitializationInput) -> UnityRLInitializationOutput: # type: ignore
        inputs = UnityInput()
        ## seed will be stored in "inputs.rl_initialization_input.seed"
        print('👉 inputs.rl_initialization_input.seed:', inputs.rl_initialization_input.seed)
        return self.communicator.initialize(inputs).rl_initialization_output

Nov05 commented Mar 23, 2024

  • one solution for reference: an env with 1 agent, score reached 30+ after 280 episodes. check the code.

  • one visual result for reference: an env with 20 agents, trained

  • Shangtong Zhang's deeprl

  •'s DDPG score playing mujoco reacher: -4.01

  • my code (integrated with deeprl): 1. (mujoco) reacher-v2_train, 2. (mujoco) reacher-v2_eval, 3. unity-reacher_train, 4. unity-reacher-v2_eval
    🟢⚠️ issue solved: The models don't seem to learn for the Unity Reacher game. They perform well in Mujoco Reacher (reaching a score of -5), but their learning halts after 40 episodes when playing Unity Reacher (reaching only a score of 6 instead of the expected score of 30+). Possible causes include bugs in the logic to get episodic_return_train for multiple environments or issues with the hyperparameter configurations.

  • solution: the models were probably not learning due to the following causes:

    • q_critic and q_target had a shape of (mini_batch_size, 1), and the MSE loss came out as a tensor with torch.Size([]). (That is a 0-dim scalar tensor, not an empty one, and the new loss line below computes the same value, so this item was probably not a real cause.)
    • optimizer learning rate was 1e-3, probably too large
    • optimizer params included phi_body, a dummy module
    • zero_grad on the network, rather than the optimizer (theoretically it shouldn't be a problem)
    • when using the local network to generate actions, eval mode was not turned on
    • ...

❌ the old code:

actor_opt_fn=lambda params: torch.optim.Adam(params, lr=1e-3)
self.actor_opt = actor_opt_fn(list(self.actor_body.parameters()) + list(self.phi_body.parameters()))
critic_loss = (q_critic - q_target).pow(2).mul(0.5).sum(-1).mean()  ## returns a 0-dim (scalar) tensor, torch.Size([])

🟢 my code:

actor_opt_fn=lambda params: torch.optim.Adam(params, lr=1e-4)
self.actor_opt = actor_opt_fn(list(self.actor_body.parameters()))  ## added by nov05
critic_loss = torch.mean((q_critic-q_target).pow(2).mul(0.5).sum(-1), 0)  ## half MSE (same value as the old line)
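A quick NumPy check of the loss shapes, assuming NumPy's reductions behave like PyTorch's here (shape `()` corresponds to `torch.Size([])`; an actually empty tensor would have shape `(0,)`). Both loss expressions reduce a `(mini_batch_size, 1)` difference to the same 0-dim value, which is half the MSE rather than an RMSE.

```python
import numpy as np

rng = np.random.default_rng(0)
mini_batch_size = 64
q_critic = rng.standard_normal((mini_batch_size, 1))
q_target = rng.standard_normal((mini_batch_size, 1))

half_sq = 0.5 * (q_critic - q_target) ** 2           # shape (64, 1)

old_loss = half_sq.sum(-1).mean()                    # sum over the size-1 axis, then mean
new_loss = np.mean(half_sq.sum(-1), 0)               # same reduction, written differently
rmse = np.sqrt(np.mean((q_critic - q_target) ** 2))  # what an RMSE would actually be

print(np.shape(old_loss), np.shape(new_loss))        # () ()  -> 0-dim scalars, not empty
print(np.isclose(old_loss, new_loss))                # True
print(np.isclose(old_loss, 0.5 * rmse ** 2))         # True: the loss is half the MSE
```

So the learning-rate change and dropping `phi_body` from the optimizer are the more likely reasons the Unity runs started learning.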

Nov05 commented Apr 7, 2024

🟢⚠️ issue solved: training had been slow, so I added torch.nn.BatchNorm1d, but got the following error. My task has multiple Unity envs and each env has multiple agents; torch.Size([1, 1, 33]) means 1 env with 1 agent.

2024-04-07 03:32:29,914 - root - INFO: Episode 0, Step 0, 0.00 s/episode
🟢 Unity environment has been resetted.
👉 torch.Size([1, 1, 33]) BatchNorm1d(33, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  • refer to this set of model training hyperparameters
  • solution: the original code is for MuJoCo, 1 env with 1 agent, so tensors such as actions and states are 2-dimensional. For Unity, each env has multiple agents, so the tensors carry one extra dimension, which has to be removed (e.g., by merging the env and agent dimensions) before they are fed to the neural networks.
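`nn.BatchNorm1d` expects input of shape `(N, C)` or `(N, C, L)`, so a `[1, 1, 33]` tensor is read as 1 sample with 1 channel of length 33, which does not match `BatchNorm1d(33)`. A minimal NumPy sketch of merging the env and agent dimensions before the network and restoring them afterwards; the helper names are hypothetical.

```python
import numpy as np


def flatten_agents(x):
    """(num_envs, num_agents, features) -> (num_envs * num_agents, features)."""
    num_envs, num_agents = x.shape[:2]
    return x.reshape(num_envs * num_agents, *x.shape[2:])


def unflatten_agents(y, num_envs, num_agents):
    """(num_envs * num_agents, features) -> (num_envs, num_agents, features)."""
    return y.reshape(num_envs, num_agents, *y.shape[1:])


states = np.zeros((1, 1, 33))                   # 1 env, 1 agent, state_size 33
print(flatten_agents(states).shape)             # (1, 33)

states = np.zeros((2, 20, 33))                  # 2 envs x 20 agents
batch = flatten_agents(states)                  # (40, 33): one row per agent
actions = np.zeros((batch.shape[0], 4))         # e.g. network output, action_size 4
print(unflatten_agents(actions, 2, 20).shape)   # (2, 20, 4)
```

A single-row `(1, 33)` batch still needs the network in eval mode, since training-mode BatchNorm cannot compute batch statistics from one value per channel.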

Nov05 commented Apr 9, 2024

🟢⚠️ issue solved: the nn.BatchNorm1d layer threw ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, state_size]) while the network was actually being used for evaluation. During training, tensor sizes are usually [mini_batch_size, state_size], so no error is raised. It turned out that I forgot to switch the network to eval mode. It makes sense: in training mode the layer normalizes with batch statistics, which can't be computed from a single value per channel. In eval mode the layer is not skipped; it normalizes with its running mean and variance instead.

    ## neural network
    config.network_fn = lambda: DeterministicActorCriticNet(
        actor_body=FCBody(config.state_dim, (128,128), gate=nn.LeakyReLU, 
        critic_body=FCBody(config.state_dim+config.action_dim, (128,128), gate=nn.LeakyReLU, 
                           init_method='uniform_fan_in', batch_norm=nn.BatchNorm1d),
        actor_opt_fn=lambda params: torch.optim.Adam(params, lr=1e-4),
        ## for the critic optimizer, it seems that 1e-3 won't converge
        critic_opt_fn=lambda params: torch.optim.Adam(params, lr=1e-4, weight_decay=1e-5),  
        # batch_norm=nn.BatchNorm1d,
  (phi_body): DummyBody()
  (actor_body): FCBody(
    (layers): ModuleList(
      (0): Linear(in_features=33, out_features=128, bias=True)
      (1): LeakyReLU(negative_slope=0.01)
      (2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): Linear(in_features=128, out_features=128, bias=True)
      (4): LeakyReLU(negative_slope=0.01)
      (5): Linear(in_features=128, out_features=4, bias=True)
      (6): Tanh()
  (critic_body): FCBody(
    (layers): ModuleList(
      (0): Linear(in_features=37, out_features=128, bias=True)
      (1): LeakyReLU(negative_slope=0.01)
      (2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): Linear(in_features=128, out_features=128, bias=True)
      (4): LeakyReLU(negative_slope=0.01)
      (5): Linear(in_features=128, out_features=1, bias=True)
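A small helper that makes the eval-mode switch hard to forget, sketched under the assumption that the network follows the `torch.nn.Module` mode API (`training`, `eval()`, `train(mode)`). `eval_mode` and `DummyNet` are hypothetical; `DummyNet` only exists so the sketch runs without PyTorch.

```python
from contextlib import contextmanager


@contextmanager
def eval_mode(net):
    """Temporarily put a torch.nn.Module-like `net` into eval mode.

    The previous mode is restored on exit, even if the body raises.
    """
    was_training = net.training
    net.eval()
    try:
        yield net
    finally:
        net.train(was_training)


class DummyNet:
    """Stand-in with the same mode API, so the sketch runs without PyTorch."""

    def __init__(self):
        self.training = True

    def eval(self):
        return self.train(False)

    def train(self, mode=True):
        self.training = mode
        return self


net = DummyNet()
with eval_mode(net):
    print("inside:", net.training)   # inside: False
print("after:", net.training)        # after: True
```

With PyTorch it would be used as `with eval_mode(self.network), torch.no_grad(): action = self.network(state)` (`self.network` being a placeholder for the actor), so BatchNorm1d uses its running statistics for the single-sample forward pass and training mode is restored afterwards.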

Nov05 commented Apr 15, 2024

🟢⚠️ issue solved: alphazero folder jupyter notebook: %matplotlib notebook threw Javascript Error: IPython is not defined.

$ jupyter notebook ..\alphazero\alphazero-TicTacToe-advanced.ipynb
jupyter lab --version
pip install --upgrade jupyterlab
ipython --version
pip install --upgrade ipython

In my env drlnd_py310, I upgraded jupyterlab from 4.1.4 to 4.1.6 and ipython from 8.22.2 to 8.23.0.

