// Other version info:
// Unity 2020.3.10f1
// ML-Agents Unity package: 2.0.0-exp.1
//===
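Before the training log, a quick way to double-check the Python side of this setup. A minimal sketch; it only assumes the torch build listed in the version banner below is installed:

import torch

# Should match the PyTorch line in the version banner below.
print(torch.__version__)          # 1.7.1+cu110
# True when a GPU and the CUDA 11.0 runtime (cudart64_110.dll) are usable.
print(torch.cuda.is_available())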
Version information:
  ml-agents: 0.26.0,
  ml-agents-envs: 0.26.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.1+cu110
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 2.0.0-exp.1 and communication version 1.5.0
[INFO] Connected new brain: Behaviour_Arena0?team=0
2021-06-04 11:03:16.890577: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
[INFO] Hyperparameters for behavior name Behaviour_Arena0:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      learning_rate_schedule: linear
    network_settings:
      normalize: False
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
      memory: None
      goal_conditioning_type: hyper
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
        network_settings:
          normalize: False
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
          memory: None
          goal_conditioning_type: hyper
      gail:
        gamma: 0.99
        strength: 0.25
        network_settings:
          normalize: False
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
          memory: None
          goal_conditioning_type: hyper
    init_path: None
    keep_checkpoints: 5
    checkpoint_interval: 500000
    max_steps: 2500000
    time_horizon: 64
    summary_freq: 60000
    threaded: False
    self_play: None
    behavioral_cloning:
      demo_path: ProjectFolder/Assets/Demos
      steps: 2500000
      strength: 0.25
      samples_per_update: 0
      num_epoch: None
      batch_size: None
[WARNING] Trainer has no policies, not saving anything.
Traceback (most recent call last):
  File "C:\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\proj\python-envs\sample-env\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\learn.py", line 250, in main
    run_cli(parse_command_line())
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\learn.py", line 246, in run_cli
    run_training(run_seed, options)
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\learn.py", line 125, in run_training
    tc.start_learning(env_manager)
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\trainer_controller.py", line 173, in start_learning
    self._reset_env(env_manager)
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\trainer_controller.py", line 107, in _reset_env
    self._register_new_behaviors(env_manager, env_manager.first_step_infos)
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\trainer_controller.py", line 268, in _register_new_behaviors
    self._create_trainers_and_managers(env_manager, new_behavior_ids)
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\trainer_controller.py", line 166, in _create_trainers_and_managers
    self._create_trainer_and_manager(env_manager, behavior_id)
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\trainer_controller.py", line 137, in _create_trainer_and_manager
    policy = trainer.create_policy(
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 119, in create_policy
    return self.create_torch_policy(parsed_behavior_id, behavior_spec)
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\ppo\trainer.py", line 226, in create_torch_policy
    policy = TorchPolicy(
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 65, in __init__
    self.actor = SimpleActor(
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\torch\networks.py", line 592, in __init__
    self.network_body = NetworkBody(observation_specs, network_settings)
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\torch\networks.py", line 212, in __init__
    self._body_endoder = LinearEncoder(
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\torch\layers.py", line 148, in __init__
    linear_layer(
  File "c:\proj\python-envs\sample-env\lib\site-packages\mlagents\trainers\torch\layers.py", line 49, in linear_layer
    layer = torch.nn.Linear(input_size, output_size)
  File "c:\proj\python-envs\sample-env\lib\site-packages\torch\nn\modules\linear.py", line 83, in __init__
    self.reset_parameters()
  File "c:\proj\python-envs\sample-env\lib\site-packages\torch\nn\modules\linear.py", line 86, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "c:\proj\python-envs\sample-env\lib\site-packages\torch\nn\init.py", line 381, in kaiming_uniform_
    std = gain / math.sqrt(fan)
ZeroDivisionError: float division by zero
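Reading the traceback bottom-up: linear_layer(...) built the policy's first torch.nn.Linear with input_size == 0, so Kaiming initialization computed a fan-in of 0 and divided by zero. In ML-Agents that input size is derived from the behavior's observation specs, so the likely cause is that the Behaviour_Arena0 agent exposes no observations at all (for example, Vector Observation Space Size set to 0 with no sensor components attached). A minimal sketch of just the PyTorch side of the failure, assuming torch==1.7.1 as in the log; newer PyTorch versions skip initializing zero-element tensors with a warning instead of raising:

import torch

# A zero-width Linear is what NetworkBody ends up constructing when the
# behavior reports no observations: kaiming_uniform_ computes
# std = gain / math.sqrt(fan_in) with fan_in == in_features == 0.
try:
    torch.nn.Linear(0, 256)
except ZeroDivisionError as err:
    print(f"reproduced: {err}")  # reproduced: float division by zero

For reference, the trainer configuration passed to mlagents-learn for this run: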
behaviors:
  Behaviour_Arena0:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.01
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      gail:
        gamma: 0.99
        strength: 0.25
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
        learning_rate: 0.0003
        use_actions: false
        use_vail: false
        demo_path: ProjectFolder/Assets/Demos
    keep_checkpoints: 5
    max_steps: 2500000
    time_horizon: 64
    summary_freq: 60000
    behavioral_cloning:
      demo_path: ProjectFolder/Assets/Demos
      steps: 2500000
      strength: 0.25
      samples_per_update: 0
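The configuration above is plain YAML, so a short sanity check can confirm the keys before handing the file to mlagents-learn. A hedged sketch: "config/trainer_config.yaml" is a placeholder path for this file, and PyYAML is assumed available (it is installed as an ml-agents dependency):

import yaml

# "config/trainer_config.yaml" is a placeholder for the YAML above.
with open("config/trainer_config.yaml") as f:
    cfg = yaml.safe_load(f)

behavior = cfg["behaviors"]["Behaviour_Arena0"]
assert behavior["trainer_type"] == "ppo"
# GAIL and behavioral cloning read demonstrations from the same place;
# demo_path may point at a single .demo file or a directory of them.
print(behavior["reward_signals"]["gail"]["demo_path"])
print(behavior["behavioral_cloning"]["demo_path"])

Note that both demo_path entries point at a directory, which ml-agents accepts as long as it contains .demo files recorded with Unity's Demonstration Recorder.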