@xwjiang2010
Created April 19, 2022 22:14
== Status ==
Current time: 2022-04-19 14:41:36 (running for 00:02:26.83)
Memory usage on this node: 7.4/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
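For context, the bracket reported above (Max Size (n)=27, Milestone (r)=1) is consistent with one successive-halving bracket under HyperBand's default reduction factor of 3. A minimal sketch of that schedule, assuming eta=3 (the helper name is illustrative, not a Ray Tune API):

```python
def successive_halving_schedule(n, r, eta=3):
    """Yield (num_trials, resource_milestone) rungs for a single
    successive-halving bracket, as HyperBand would run it."""
    schedule = []
    while n >= 1:
        schedule.append((n, r))
        if n == 1:
            break
        n //= eta  # keep the best 1/eta of trials
        r *= eta   # give survivors eta times the budget
    return schedule

# The bracket in the log: Max Size (n)=27, Milestone (r)=1.
print(successive_halving_schedule(27, 1))
# → [(27, 1), (9, 3), (3, 9), (1, 27)]
```

With only 4 of the 27 slots filled (4 concurrent trials), the bracket shows `completed=0.0%` until trials start reaching milestones.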
== Status ==
Current time: 2022-04-19 14:41:37 (running for 00:02:27.84)
Memory usage on this node: 9.1/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
== Status ==
Current time: 2022-04-19 14:41:42 (running for 00:02:32.84)
Memory usage on this node: 8.8/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
== Status ==
Current time: 2022-04-19 14:41:47 (running for 00:02:37.85)
Memory usage on this node: 8.8/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
Result for hyper_optim_26cb78b8:
  best_loss: 0.023672818150612285
  date: 2022-04-19_14-41-50
  done: false
  early_stopping_count: 0
  epoch: 1
  experiment_id: 8b8120dd501b4824bc49b82da0fda0bf
  hostname: ip-172-31-12-94
  iterations_since_restore: 1
  node_ip: 172.31.12.94
  pid: 644
  should_checkpoint: true
  time_since_restore: 14.10544204711914
  time_this_iter_s: 14.10544204711914
  time_total_s: 14.10544204711914
  timestamp: 1650404510
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 26cb78b8
  val_loss: 0.023672818150612285
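The `Result for …` blocks in this log are flat `key: value` dumps. A small illustrative parser (not part of Ray; the function name is an assumption) can turn one into a dict for quick inspection:

```python
import ast

def parse_result_block(text):
    """Parse a flat 'key: value' Tune result dump into a dict,
    coercing numbers where Python literal syntax allows and
    keeping everything else (dates, hex ids, 'false') as strings."""
    result = {}
    for line in text.strip().splitlines():
        key, _, raw = line.partition(":")
        raw = raw.strip()
        try:
            result[key.strip()] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            result[key.strip()] = raw
    return result

block = """\
best_loss: 0.023672818150612285
epoch: 1
trial_id: 26cb78b8
val_loss: 0.023672818150612285"""
parsed = parse_result_block(block)
print(parsed["epoch"], parsed["val_loss"])
# → 1 0.023672818150612285
```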
2022-04-19 14:41:51,029 INFO commands.py:293 -- Checking External environment settings
2022-04-19 14:41:52,957 WARN util.py:133 -- The `head_node` field is deprecated and will be ignored. Use `head_node_type` and `available_node_types` instead.
2022-04-19 14:41:52,957 WARN util.py:138 -- The `worker_nodes` field is deprecated and will be ignored. Use `available_node_types` instead.
Authenticating
Loaded Anyscale authentication token from variable.
2022-04-19 14:41:54,758 INFO command_runner.py:357 -- Fetched IP: 172.31.12.94
2022-04-19 14:41:54,758 INFO log_timer.py:27 -- NodeUpdater: ins_JYiZiYxkMuELacpf249rU7Vw: Got IP [LogTimer=36ms]
2022-04-19 14:51:12,679 WARNING util.py:164 -- The `callbacks.on_trial_result` operation took 561.807 s, which may be a performance bottleneck.
2022-04-19 14:51:12,679 WARNING util.py:164 -- The `process_trial_result` operation took 561.808 s, which may be a performance bottleneck.
2022-04-19 14:51:12,679 WARNING util.py:164 -- Processing trial results took 561.808 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune.
2022-04-19 14:51:12,679 WARNING util.py:164 -- The `process_trial` operation took 561.809 s, which may be a performance bottleneck.
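The ~560 s `process_trial_result` warnings above are exactly what Tune's hint addresses: report less often. One common mitigation, sketched here with a hypothetical training loop (this is not the gist author's actual trainable), is to report only every k-th epoch plus the final one:

```python
def train(report, n_epochs=30, report_every=5):
    """Hypothetical training loop that throttles reporting: only every
    `report_every`-th epoch (and the last) is sent to Tune, so the
    driver's event loop processes far fewer results."""
    reported = []
    for epoch in range(1, n_epochs + 1):
        val_loss = 1.0 / epoch  # stand-in for a real validation pass
        if epoch % report_every == 0 or epoch == n_epochs:
            report({"epoch": epoch, "val_loss": val_loss})
            reported.append(epoch)
    return reported

print(train(lambda metrics: None))
# → [5, 10, 15, 20, 25, 30]
```

The trade-off is coarser scheduler decisions: with HyperBand milestones measured in reported iterations, reporting every k epochs effectively rescales the milestones by k.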
== Status ==
Current time: 2022-04-19 14:51:12 (running for 00:12:02.75)
Memory usage on this node: 8.9/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.9%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Current best trial: 26cb78b8 with val_loss=0.023672818150612285 and parameters={'model_name': 'Transformer', 'num_labels': 3, 'batch_size': 8, 'lr': 4.1848774014953113e-05, 'warmup': 0.020064220578974626, 'w_decay': 0.16619568921799385, 'n_epochs': 30, 'max_length': 512}
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup | iter | total time (s) | val_loss | epoch | early_stopping_count |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | | | | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | | | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | 1 | 14.1054 | 0.0236728 | 1 | 0 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | | | | | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
2022-04-19 14:51:12,686 WARNING ray_trial_executor.py:659 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance.
Result for hyper_optim_26dbfec2:
  best_loss: 0.023672818150612285
  date: 2022-04-19_14-41-50
  done: false
  early_stopping_count: 0
  epoch: 1
  experiment_id: 468aa3a8b08949c4ab2fadca98bc3fcd
  hostname: ip-172-31-17-28
  iterations_since_restore: 1
  node_ip: 172.31.17.28
  pid: 644
  should_checkpoint: true
  time_since_restore: 14.207346200942993
  time_this_iter_s: 14.207346200942993
  time_total_s: 14.207346200942993
  timestamp: 1650404510
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 26dbfec2
  val_loss: 0.023672818150612285
2022-04-19 14:51:12,727 WARN commands.py:269 -- Loaded cached provider configuration
2022-04-19 14:51:12,727 WARN commands.py:273 -- If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
2022-04-19 14:51:13,829 INFO command_runner.py:357 -- Fetched IP: 172.31.17.28
2022-04-19 14:51:13,829 INFO log_timer.py:27 -- NodeUpdater: ins_8Ap3Ap8Vz7BequRrxaa1sS58: Got IP [LogTimer=53ms]
2022-04-19 15:00:30,001 WARNING util.py:164 -- The `callbacks.on_trial_result` operation took 557.314 s, which may be a performance bottleneck.
2022-04-19 15:00:30,001 WARNING util.py:164 -- The `process_trial_result` operation took 557.315 s, which may be a performance bottleneck.
2022-04-19 15:00:30,002 WARNING util.py:164 -- Processing trial results took 557.315 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune.
2022-04-19 15:00:30,002 WARNING util.py:164 -- The `process_trial` operation took 557.315 s, which may be a performance bottleneck.
== Status ==
Current time: 2022-04-19 15:00:30 (running for 00:21:20.07)
Memory usage on this node: 8.9/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=1.9%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Current best trial: 26cb78b8 with val_loss=0.023672818150612285 and parameters={'model_name': 'Transformer', 'num_labels': 3, 'batch_size': 8, 'lr': 4.1848774014953113e-05, 'warmup': 0.020064220578974626, 'w_decay': 0.16619568921799385, 'n_epochs': 30, 'max_length': 512}
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup | iter | total time (s) | val_loss | epoch | early_stopping_count |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | | | | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | | | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | 1 | 14.1054 | 0.0236728 | 1 | 0 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | 1 | 14.2073 | 0.0236728 | 1 | 0 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
2022-04-19 15:00:30,010 WARNING ray_trial_executor.py:659 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance.
Result for hyper_optim_25af1e94:
  best_loss: 0.023672818150612285
  date: 2022-04-19_14-41-50
  done: false
  early_stopping_count: 0
  epoch: 1
  experiment_id: a2dcf332dc6b4985850fd7f8bbe46ba9
  hostname: ip-172-31-12-94
  iterations_since_restore: 1
  node_ip: 172.31.12.94
  pid: 645
  should_checkpoint: true
  time_since_restore: 14.113289833068848
  time_this_iter_s: 14.113289833068848
  time_total_s: 14.113289833068848
  timestamp: 1650404510
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 25af1e94
  val_loss: 0.023672818150612285
2022-04-19 15:00:31,211 INFO command_runner.py:357 -- Fetched IP: 172.31.12.94
2022-04-19 15:00:31,211 INFO log_timer.py:27 -- NodeUpdater: ins_JYiZiYxkMuELacpf249rU7Vw: Got IP [LogTimer=34ms]
2022-04-19 15:09:47,116 WARNING util.py:164 -- The `callbacks.on_trial_result` operation took 557.105 s, which may be a performance bottleneck.
2022-04-19 15:09:47,117 WARNING util.py:164 -- The `process_trial_result` operation took 557.106 s, which may be a performance bottleneck.
2022-04-19 15:09:47,117 WARNING util.py:164 -- Processing trial results took 557.106 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune.
2022-04-19 15:09:47,117 WARNING util.py:164 -- The `process_trial` operation took 557.107 s, which may be a performance bottleneck.
== Status ==
Current time: 2022-04-19 15:09:47 (running for 00:30:37.18)
Memory usage on this node: 8.9/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=2.8%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Current best trial: 25af1e94 with val_loss=0.023672818150612285 and parameters={'model_name': 'Transformer', 'num_labels': 3, 'batch_size': 32, 'lr': 0.0012146952471943553, 'warmup': 0.0722013249900006, 'w_decay': 0.06618419913715975, 'n_epochs': 30, 'max_length': 512}
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup | iter | total time (s) | val_loss | epoch | early_stopping_count |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | 1 | 14.1133 | 0.0236728 | 1 | 0 |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | | | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | 1 | 14.1054 | 0.0236728 | 1 | 0 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | 1 | 14.2073 | 0.0236728 | 1 | 0 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
2022-04-19 15:09:47,126 WARNING ray_trial_executor.py:659 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance.
Result for hyper_optim_263ec5a8:
  best_loss: 0.023672818150612285
  date: 2022-04-19_14-41-50
  done: false
  early_stopping_count: 0
  epoch: 1
  experiment_id: 07840c8729d343e9a935c6aa7b65c6c8
  hostname: ip-172-31-17-28
  iterations_since_restore: 1
  node_ip: 172.31.17.28
  pid: 645
  should_checkpoint: true
  time_since_restore: 14.20270586013794
  time_this_iter_s: 14.20270586013794
  time_total_s: 14.20270586013794
  timestamp: 1650404510
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 263ec5a8
  val_loss: 0.023672818150612285
2022-04-19 15:09:48,617 INFO command_runner.py:357 -- Fetched IP: 172.31.17.28
2022-04-19 15:09:48,617 INFO log_timer.py:27 -- NodeUpdater: ins_8Ap3Ap8Vz7BequRrxaa1sS58: Got IP [LogTimer=34ms]
(hyper_optim pid=645, ip=172.31.12.94) 2022-04-19 15:09:50,116 INFO trainable.py:89 -- Checkpoint size is 2235556379 bytes
(hyper_optim pid=644, ip=172.31.17.28) 2022-04-19 15:09:50,234 INFO trainable.py:89 -- Checkpoint size is 2235556379 bytes
(hyper_optim pid=644, ip=172.31.12.94) 2022-04-19 15:09:50,326 INFO trainable.py:89 -- Checkpoint size is 2235556379 bytes
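The checkpoint size logged above, 2235556379 bytes, is easier to read in GiB; a roughly 2 GiB payload per checkpoint may also help explain the slow result processing if checkpoints are handled inline with result reporting (an assumption, not something the log states):

```python
CHECKPOINT_BYTES = 2235556379  # from the trainable.py log lines above

def to_gib(n_bytes):
    """Convert a byte count to GiB (binary, 2**30 bytes)."""
    return n_bytes / 2**30

print(f"{to_gib(CHECKPOINT_BYTES):.2f} GiB")
# → 2.08 GiB
```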