Created April 19, 2022 22:14
== Status == | |
Current time: 2022-04-19 14:41:36 (running for 00:02:26.83) | |
Memory usage on this node: 7.4/62.0 GiB | |
Using HyperBand: num_stopped=0 total_brackets=1 | |
Round #0: | |
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4} | |
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects | |
Result logdir: /home/ray/ray_results/bohb_test | |
Number of trials: 4/4 (4 RUNNING) | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+ | |
| Trial name | status | loc | batch_size | lr | w_decay | warmup | | |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------| | |
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | | |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+ | |
== Status == | |
Current time: 2022-04-19 14:41:37 (running for 00:02:27.84) | |
Memory usage on this node: 9.1/62.0 GiB | |
Using HyperBand: num_stopped=0 total_brackets=1 | |
Round #0: | |
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4} | |
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects | |
Result logdir: /home/ray/ray_results/bohb_test | |
Number of trials: 4/4 (4 RUNNING) | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+ | |
| Trial name | status | loc | batch_size | lr | w_decay | warmup | | |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------| | |
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | | |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+ | |
== Status == | |
Current time: 2022-04-19 14:41:42 (running for 00:02:32.84) | |
Memory usage on this node: 8.8/62.0 GiB | |
Using HyperBand: num_stopped=0 total_brackets=1 | |
Round #0: | |
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4} | |
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects | |
Result logdir: /home/ray/ray_results/bohb_test | |
Number of trials: 4/4 (4 RUNNING) | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+ | |
| Trial name | status | loc | batch_size | lr | w_decay | warmup | | |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------| | |
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | | |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+ | |
== Status == | |
Current time: 2022-04-19 14:41:47 (running for 00:02:37.85) | |
Memory usage on this node: 8.8/62.0 GiB | |
Using HyperBand: num_stopped=0 total_brackets=1 | |
Round #0: | |
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4} | |
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects | |
Result logdir: /home/ray/ray_results/bohb_test | |
Number of trials: 4/4 (4 RUNNING) | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+ | |
| Trial name | status | loc | batch_size | lr | w_decay | warmup | | |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------| | |
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | | |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+ | |
Result for hyper_optim_26cb78b8: | |
best_loss: 0.023672818150612285 | |
date: 2022-04-19_14-41-50 | |
done: false | |
early_stopping_count: 0 | |
epoch: 1 | |
experiment_id: 8b8120dd501b4824bc49b82da0fda0bf | |
hostname: ip-172-31-12-94 | |
iterations_since_restore: 1 | |
node_ip: 172.31.12.94 | |
pid: 644 | |
should_checkpoint: true | |
time_since_restore: 14.10544204711914 | |
time_this_iter_s: 14.10544204711914 | |
time_total_s: 14.10544204711914 | |
timestamp: 1650404510 | |
timesteps_since_restore: 0 | |
training_iteration: 1 | |
trial_id: 26cb78b8 | |
val_loss: 0.023672818150612285 | |
2022-04-19 14:41:51,029 INFO commands.py:293 -- Checking External environment settings | |
2022-04-19 14:41:52,957 WARN util.py:133 -- The `head_node` field is deprecated and will be ignored. Use `head_node_type` and `available_node_types` instead. | |
2022-04-19 14:41:52,957 WARN util.py:138 -- The `worker_nodes` field is deprecated and will be ignored. Use `available_node_types` instead. | |
Authenticating | |
Loaded Anyscale authentication token from variable. | |
2022-04-19 14:41:54,758 INFO command_runner.py:357 -- Fetched IP: 172.31.12.94 | |
2022-04-19 14:41:54,758 INFO log_timer.py:27 -- NodeUpdater: ins_JYiZiYxkMuELacpf249rU7Vw: Got IP [LogTimer=36ms] | |
2022-04-19 14:51:12,679 WARNING util.py:164 -- The `callbacks.on_trial_result` operation took 561.807 s, which may be a performance bottleneck. | |
2022-04-19 14:51:12,679 WARNING util.py:164 -- The `process_trial_result` operation took 561.808 s, which may be a performance bottleneck. | |
2022-04-19 14:51:12,679 WARNING util.py:164 -- Processing trial results took 561.808 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune. | |
2022-04-19 14:51:12,679 WARNING util.py:164 -- The `process_trial` operation took 561.809 s, which may be a performance bottleneck. | |
== Status == | |
Current time: 2022-04-19 14:51:12 (running for 00:12:02.75) | |
Memory usage on this node: 8.9/62.0 GiB | |
Using HyperBand: num_stopped=0 total_brackets=1 | |
Round #0: | |
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.9%): {RUNNING: 4} | |
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects | |
Current best trial: 26cb78b8 with val_loss=0.023672818150612285 and parameters={'model_name': 'Transformer', 'num_labels': 3, 'batch_size': 8, 'lr': 4.1848774014953113e-05, 'warmup': 0.020064220578974626, 'w_decay': 0.16619568921799385, 'n_epochs': 30, 'max_length': 512} | |
Result logdir: /home/ray/ray_results/bohb_test | |
Number of trials: 4/4 (4 RUNNING) | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+ | |
| Trial name | status | loc | batch_size | lr | w_decay | warmup | iter | total time (s) | val_loss | epoch | early_stopping_count | | |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------| | |
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | | | | | | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | | | | | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | 1 | 14.1054 | 0.0236728 | 1 | 0 | | |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | | | | | | | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+ | |
2022-04-19 14:51:12,686 WARNING ray_trial_executor.py:659 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance. | |
Result for hyper_optim_26dbfec2: | |
best_loss: 0.023672818150612285 | |
date: 2022-04-19_14-41-50 | |
done: false | |
early_stopping_count: 0 | |
epoch: 1 | |
experiment_id: 468aa3a8b08949c4ab2fadca98bc3fcd | |
hostname: ip-172-31-17-28 | |
iterations_since_restore: 1 | |
node_ip: 172.31.17.28 | |
pid: 644 | |
should_checkpoint: true | |
time_since_restore: 14.207346200942993 | |
time_this_iter_s: 14.207346200942993 | |
time_total_s: 14.207346200942993 | |
timestamp: 1650404510 | |
timesteps_since_restore: 0 | |
training_iteration: 1 | |
trial_id: 26dbfec2 | |
val_loss: 0.023672818150612285 | |
2022-04-19 14:51:12,727 WARN commands.py:269 -- Loaded cached provider configuration | |
2022-04-19 14:51:12,727 WARN commands.py:273 -- If you experience issues with the cloud provider, try re-running the command with --no-config-cache. | |
2022-04-19 14:51:13,829 INFO command_runner.py:357 -- Fetched IP: 172.31.17.28 | |
2022-04-19 14:51:13,829 INFO log_timer.py:27 -- NodeUpdater: ins_8Ap3Ap8Vz7BequRrxaa1sS58: Got IP [LogTimer=53ms] | |
2022-04-19 15:00:30,001 WARNING util.py:164 -- The `callbacks.on_trial_result` operation took 557.314 s, which may be a performance bottleneck. | |
2022-04-19 15:00:30,001 WARNING util.py:164 -- The `process_trial_result` operation took 557.315 s, which may be a performance bottleneck. | |
2022-04-19 15:00:30,002 WARNING util.py:164 -- Processing trial results took 557.315 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune. | |
2022-04-19 15:00:30,002 WARNING util.py:164 -- The `process_trial` operation took 557.315 s, which may be a performance bottleneck. | |
== Status == | |
Current time: 2022-04-19 15:00:30 (running for 00:21:20.07) | |
Memory usage on this node: 8.9/62.0 GiB | |
Using HyperBand: num_stopped=0 total_brackets=1 | |
Round #0: | |
Bracket(Max Size (n)=27, Milestone (r)=1, completed=1.9%): {RUNNING: 4} | |
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects | |
Current best trial: 26cb78b8 with val_loss=0.023672818150612285 and parameters={'model_name': 'Transformer', 'num_labels': 3, 'batch_size': 8, 'lr': 4.1848774014953113e-05, 'warmup': 0.020064220578974626, 'w_decay': 0.16619568921799385, 'n_epochs': 30, 'max_length': 512} | |
Result logdir: /home/ray/ray_results/bohb_test | |
Number of trials: 4/4 (4 RUNNING) | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+ | |
| Trial name | status | loc | batch_size | lr | w_decay | warmup | iter | total time (s) | val_loss | epoch | early_stopping_count | | |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------| | |
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | | | | | | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | | | | | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | 1 | 14.1054 | 0.0236728 | 1 | 0 | | |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | 1 | 14.2073 | 0.0236728 | 1 | 0 | | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+ | |
2022-04-19 15:00:30,010 WARNING ray_trial_executor.py:659 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance. | |
Result for hyper_optim_25af1e94: | |
best_loss: 0.023672818150612285 | |
date: 2022-04-19_14-41-50 | |
done: false | |
early_stopping_count: 0 | |
epoch: 1 | |
experiment_id: a2dcf332dc6b4985850fd7f8bbe46ba9 | |
hostname: ip-172-31-12-94 | |
iterations_since_restore: 1 | |
node_ip: 172.31.12.94 | |
pid: 645 | |
should_checkpoint: true | |
time_since_restore: 14.113289833068848 | |
time_this_iter_s: 14.113289833068848 | |
time_total_s: 14.113289833068848 | |
timestamp: 1650404510 | |
timesteps_since_restore: 0 | |
training_iteration: 1 | |
trial_id: 25af1e94 | |
val_loss: 0.023672818150612285 | |
2022-04-19 15:00:31,211 INFO command_runner.py:357 -- Fetched IP: 172.31.12.94 | |
2022-04-19 15:00:31,211 INFO log_timer.py:27 -- NodeUpdater: ins_JYiZiYxkMuELacpf249rU7Vw: Got IP [LogTimer=34ms] | |
2022-04-19 15:09:47,116 WARNING util.py:164 -- The `callbacks.on_trial_result` operation took 557.105 s, which may be a performance bottleneck. | |
2022-04-19 15:09:47,117 WARNING util.py:164 -- The `process_trial_result` operation took 557.106 s, which may be a performance bottleneck. | |
2022-04-19 15:09:47,117 WARNING util.py:164 -- Processing trial results took 557.106 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune. | |
2022-04-19 15:09:47,117 WARNING util.py:164 -- The `process_trial` operation took 557.107 s, which may be a performance bottleneck. | |
== Status == | |
Current time: 2022-04-19 15:09:47 (running for 00:30:37.18) | |
Memory usage on this node: 8.9/62.0 GiB | |
Using HyperBand: num_stopped=0 total_brackets=1 | |
Round #0: | |
Bracket(Max Size (n)=27, Milestone (r)=1, completed=2.8%): {RUNNING: 4} | |
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects | |
Current best trial: 25af1e94 with val_loss=0.023672818150612285 and parameters={'model_name': 'Transformer', 'num_labels': 3, 'batch_size': 32, 'lr': 0.0012146952471943553, 'warmup': 0.0722013249900006, 'w_decay': 0.06618419913715975, 'n_epochs': 30, 'max_length': 512} | |
Result logdir: /home/ray/ray_results/bohb_test | |
Number of trials: 4/4 (4 RUNNING) | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+ | |
| Trial name | status | loc | batch_size | lr | w_decay | warmup | iter | total time (s) | val_loss | epoch | early_stopping_count | | |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------| | |
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | 1 | 14.1133 | 0.0236728 | 1 | 0 | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | | | | | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | 1 | 14.1054 | 0.0236728 | 1 | 0 | | |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | 1 | 14.2073 | 0.0236728 | 1 | 0 | | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+ | |
2022-04-19 15:09:47,126 WARNING ray_trial_executor.py:659 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance. | |
Result for hyper_optim_263ec5a8: | |
best_loss: 0.023672818150612285 | |
date: 2022-04-19_14-41-50 | |
done: false | |
early_stopping_count: 0 | |
epoch: 1 | |
experiment_id: 07840c8729d343e9a935c6aa7b65c6c8 | |
hostname: ip-172-31-17-28 | |
iterations_since_restore: 1 | |
node_ip: 172.31.17.28 | |
pid: 645 | |
should_checkpoint: true | |
time_since_restore: 14.20270586013794 | |
time_this_iter_s: 14.20270586013794 | |
time_total_s: 14.20270586013794 | |
timestamp: 1650404510 | |
timesteps_since_restore: 0 | |
training_iteration: 1 | |
trial_id: 263ec5a8 | |
val_loss: 0.023672818150612285 | |
2022-04-19 15:09:48,617 INFO command_runner.py:357 -- Fetched IP: 172.31.17.28 | |
2022-04-19 15:09:48,617 INFO log_timer.py:27 -- NodeUpdater: ins_8Ap3Ap8Vz7BequRrxaa1sS58: Got IP [LogTimer=34ms] | |
(hyper_optim pid=645, ip=172.31.12.94) 2022-04-19 15:09:50,116 INFO trainable.py:89 -- Checkpoint size is 2235556379 bytes | |
(hyper_optim pid=644, ip=172.31.17.28) 2022-04-19 15:09:50,234 INFO trainable.py:89 -- Checkpoint size is 2235556379 bytes | |
(hyper_optim pid=644, ip=172.31.12.94) 2022-04-19 15:09:50,326 INFO trainable.py:89 -- Checkpoint size is 2235556379 bytes |