@xwjiang2010
Created April 19, 2022 22:14
== Status ==
Current time: 2022-04-19 14:41:36 (running for 00:02:26.83)
Memory usage on this node: 7.4/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
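For context, the bracket reported above (Max Size (n)=27, Milestone (r)=1) is consistent with one successive-halving bracket under HyperBand's default reduction factor of 3. A minimal sketch of that schedule, assuming eta=3 (the helper name is illustrative, not a Ray Tune API):

```python
def successive_halving_schedule(n, r, eta=3):
    """Yield (num_trials, resource_milestone) rungs for a single
    successive-halving bracket, as HyperBand would run it."""
    schedule = []
    while n >= 1:
        schedule.append((n, r))
        if n == 1:
            break
        n //= eta  # keep the best 1/eta of trials
        r *= eta   # give survivors eta times the budget
    return schedule

# The bracket in the log: Max Size (n)=27, Milestone (r)=1.
print(successive_halving_schedule(27, 1))
# → [(27, 1), (9, 3), (3, 9), (1, 27)]
```

With only 4 of the 27 slots filled (4 concurrent trials), the bracket shows `completed=0.0%` until trials start reaching milestones.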
== Status ==
Current time: 2022-04-19 14:41:37 (running for 00:02:27.84)
Memory usage on this node: 9.1/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
== Status ==
Current time: 2022-04-19 14:41:42 (running for 00:02:32.84)
Memory usage on this node: 8.8/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
== Status ==
Current time: 2022-04-19 14:41:47 (running for 00:02:37.85)
Memory usage on this node: 8.8/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
Result for hyper_optim_26cb78b8:
  best_loss: 0.023672818150612285
  date: 2022-04-19_14-41-50
  done: false
  early_stopping_count: 0
  epoch: 1
  experiment_id: 8b8120dd501b4824bc49b82da0fda0bf
  hostname: ip-172-31-12-94
  iterations_since_restore: 1
  node_ip: 172.31.12.94
  pid: 644
  should_checkpoint: true
  time_since_restore: 14.10544204711914
  time_this_iter_s: 14.10544204711914
  time_total_s: 14.10544204711914
  timestamp: 1650404510
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 26cb78b8
  val_loss: 0.023672818150612285
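The `Result for …` blocks in this log are flat `key: value` dumps. A small illustrative parser (not part of Ray; the function name is an assumption) can turn one into a dict for quick inspection:

```python
import ast

def parse_result_block(text):
    """Parse a flat 'key: value' Tune result dump into a dict,
    coercing numbers where Python literal syntax allows and
    keeping everything else (dates, hex ids, 'false') as strings."""
    result = {}
    for line in text.strip().splitlines():
        key, _, raw = line.partition(":")
        raw = raw.strip()
        try:
            result[key.strip()] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            result[key.strip()] = raw
    return result

block = """\
best_loss: 0.023672818150612285
epoch: 1
trial_id: 26cb78b8
val_loss: 0.023672818150612285"""
parsed = parse_result_block(block)
print(parsed["epoch"], parsed["val_loss"])
# → 1 0.023672818150612285
```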
2022-04-19 14:41:51,029 INFO commands.py:293 -- Checking External environment settings
2022-04-19 14:41:52,957 WARN util.py:133 -- The `head_node` field is deprecated and will be ignored. Use `head_node_type` and `available_node_types` instead.
2022-04-19 14:41:52,957 WARN util.py:138 -- The `worker_nodes` field is deprecated and will be ignored. Use `available_node_types` instead.
Authenticating
Loaded Anyscale authentication token from variable.
2022-04-19 14:41:54,758 INFO command_runner.py:357 -- Fetched IP: 172.31.12.94
2022-04-19 14:41:54,758 INFO log_timer.py:27 -- NodeUpdater: ins_JYiZiYxkMuELacpf249rU7Vw: Got IP [LogTimer=36ms]
2022-04-19 14:51:12,679 WARNING util.py:164 -- The `callbacks.on_trial_result` operation took 561.807 s, which may be a performance bottleneck.
2022-04-19 14:51:12,679 WARNING util.py:164 -- The `process_trial_result` operation took 561.808 s, which may be a performance bottleneck.
2022-04-19 14:51:12,679 WARNING util.py:164 -- Processing trial results took 561.808 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune.
2022-04-19 14:51:12,679 WARNING util.py:164 -- The `process_trial` operation took 561.809 s, which may be a performance bottleneck.
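The ~560 s `process_trial_result` warnings above are exactly what Tune's hint addresses: report less often. One common mitigation, sketched here with a hypothetical training loop (this is not the gist author's actual trainable), is to report only every k-th epoch plus the final one:

```python
def train(report, n_epochs=30, report_every=5):
    """Hypothetical training loop that throttles reporting: only every
    `report_every`-th epoch (and the last) is sent to Tune, so the
    driver's event loop processes far fewer results."""
    reported = []
    for epoch in range(1, n_epochs + 1):
        val_loss = 1.0 / epoch  # stand-in for a real validation pass
        if epoch % report_every == 0 or epoch == n_epochs:
            report({"epoch": epoch, "val_loss": val_loss})
            reported.append(epoch)
    return reported

print(train(lambda metrics: None))
# → [5, 10, 15, 20, 25, 30]
```

The trade-off is coarser scheduler decisions: with HyperBand milestones measured in reported iterations, reporting every k epochs effectively rescales the milestones by k.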
== Status ==
Current time: 2022-04-19 14:51:12 (running for 00:12:02.75)
Memory usage on this node: 8.9/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.9%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Current best trial: 26cb78b8 with val_loss=0.023672818150612285 and parameters={'model_name': 'Transformer', 'num_labels': 3, 'batch_size': 8, 'lr': 4.1848774014953113e-05, 'warmup': 0.020064220578974626, 'w_decay': 0.16619568921799385, 'n_epochs': 30, 'max_length': 512}
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup | iter | total time (s) | val_loss | epoch | early_stopping_count |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | | | | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | | | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | 1 | 14.1054 | 0.0236728 | 1 | 0 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | | | | | |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
2022-04-19 14:51:12,686 WARNING ray_trial_executor.py:659 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance.
Result for hyper_optim_26dbfec2:
  best_loss: 0.023672818150612285
  date: 2022-04-19_14-41-50
  done: false
  early_stopping_count: 0
  epoch: 1
  experiment_id: 468aa3a8b08949c4ab2fadca98bc3fcd
  hostname: ip-172-31-17-28
  iterations_since_restore: 1
  node_ip: 172.31.17.28
  pid: 644
  should_checkpoint: true
  time_since_restore: 14.207346200942993
  time_this_iter_s: 14.207346200942993
  time_total_s: 14.207346200942993
  timestamp: 1650404510
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 26dbfec2
  val_loss: 0.023672818150612285
2022-04-19 14:51:12,727 WARN commands.py:269 -- Loaded cached provider configuration
2022-04-19 14:51:12,727 WARN commands.py:273 -- If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
2022-04-19 14:51:13,829 INFO command_runner.py:357 -- Fetched IP: 172.31.17.28
2022-04-19 14:51:13,829 INFO log_timer.py:27 -- NodeUpdater: ins_8Ap3Ap8Vz7BequRrxaa1sS58: Got IP [LogTimer=53ms]
2022-04-19 15:00:30,001 WARNING util.py:164 -- The `callbacks.on_trial_result` operation took 557.314 s, which may be a performance bottleneck.
2022-04-19 15:00:30,001 WARNING util.py:164 -- The `process_trial_result` operation took 557.315 s, which may be a performance bottleneck.
2022-04-19 15:00:30,002 WARNING util.py:164 -- Processing trial results took 557.315 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune.
2022-04-19 15:00:30,002 WARNING util.py:164 -- The `process_trial` operation took 557.315 s, which may be a performance bottleneck.
== Status ==
Current time: 2022-04-19 15:00:30 (running for 00:21:20.07)
Memory usage on this node: 8.9/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=1.9%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Current best trial: 26cb78b8 with val_loss=0.023672818150612285 and parameters={'model_name': 'Transformer', 'num_labels': 3, 'batch_size': 8, 'lr': 4.1848774014953113e-05, 'warmup': 0.020064220578974626, 'w_decay': 0.16619568921799385, 'n_epochs': 30, 'max_length': 512}
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup | iter | total time (s) | val_loss | epoch | early_stopping_count |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | | | | | |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | | | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | 1 | 14.1054 | 0.0236728 | 1 | 0 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | 1 | 14.2073 | 0.0236728 | 1 | 0 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
2022-04-19 15:00:30,010 WARNING ray_trial_executor.py:659 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance.
Result for hyper_optim_25af1e94:
  best_loss: 0.023672818150612285
  date: 2022-04-19_14-41-50
  done: false
  early_stopping_count: 0
  epoch: 1
  experiment_id: a2dcf332dc6b4985850fd7f8bbe46ba9
  hostname: ip-172-31-12-94
  iterations_since_restore: 1
  node_ip: 172.31.12.94
  pid: 645
  should_checkpoint: true
  time_since_restore: 14.113289833068848
  time_this_iter_s: 14.113289833068848
  time_total_s: 14.113289833068848
  timestamp: 1650404510
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 25af1e94
  val_loss: 0.023672818150612285
2022-04-19 15:00:31,211 INFO command_runner.py:357 -- Fetched IP: 172.31.12.94
2022-04-19 15:00:31,211 INFO log_timer.py:27 -- NodeUpdater: ins_JYiZiYxkMuELacpf249rU7Vw: Got IP [LogTimer=34ms]
2022-04-19 15:09:47,116 WARNING util.py:164 -- The `callbacks.on_trial_result` operation took 557.105 s, which may be a performance bottleneck.
2022-04-19 15:09:47,117 WARNING util.py:164 -- The `process_trial_result` operation took 557.106 s, which may be a performance bottleneck.
2022-04-19 15:09:47,117 WARNING util.py:164 -- Processing trial results took 557.106 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune.
2022-04-19 15:09:47,117 WARNING util.py:164 -- The `process_trial` operation took 557.107 s, which may be a performance bottleneck.
== Status ==
Current time: 2022-04-19 15:09:47 (running for 00:30:37.18)
Memory usage on this node: 8.9/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=2.8%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Current best trial: 25af1e94 with val_loss=0.023672818150612285 and parameters={'model_name': 'Transformer', 'num_labels': 3, 'batch_size': 32, 'lr': 0.0012146952471943553, 'warmup': 0.0722013249900006, 'w_decay': 0.06618419913715975, 'n_epochs': 30, 'max_length': 512}
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
| Trial name | status | loc | batch_size | lr | w_decay | warmup | iter | total time (s) | val_loss | epoch | early_stopping_count |
|----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------|
| hyper_optim_25af1e94 | RUNNING | 172.31.12.94:645 | 32 | 0.0012147 | 0.0661842 | 0.0722013 | 1 | 14.1133 | 0.0236728 | 1 | 0 |
| hyper_optim_263ec5a8 | RUNNING | 172.31.17.28:645 | 8 | 0.0138314 | 0.0611623 | 0.0618856 | | | | | |
| hyper_optim_26cb78b8 | RUNNING | 172.31.12.94:644 | 8 | 4.18488e-05 | 0.166196 | 0.0200642 | 1 | 14.1054 | 0.0236728 | 1 | 0 |
| hyper_optim_26dbfec2 | RUNNING | 172.31.17.28:644 | 32 | 0.000968953 | 0.274445 | 0.0061904 | 1 | 14.2073 | 0.0236728 | 1 | 0 |
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+--------+------------------+------------+---------+------------------------+
2022-04-19 15:09:47,126 WARNING ray_trial_executor.py:659 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance.
Result for hyper_optim_263ec5a8:
  best_loss: 0.023672818150612285
  date: 2022-04-19_14-41-50
  done: false
  early_stopping_count: 0
  epoch: 1
  experiment_id: 07840c8729d343e9a935c6aa7b65c6c8
  hostname: ip-172-31-17-28
  iterations_since_restore: 1
  node_ip: 172.31.17.28
  pid: 645
  should_checkpoint: true
  time_since_restore: 14.20270586013794
  time_this_iter_s: 14.20270586013794
  time_total_s: 14.20270586013794
  timestamp: 1650404510
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 263ec5a8
  val_loss: 0.023672818150612285
2022-04-19 15:09:48,617 INFO command_runner.py:357 -- Fetched IP: 172.31.17.28
2022-04-19 15:09:48,617 INFO log_timer.py:27 -- NodeUpdater: ins_8Ap3Ap8Vz7BequRrxaa1sS58: Got IP [LogTimer=34ms]
(hyper_optim pid=645, ip=172.31.12.94) 2022-04-19 15:09:50,116 INFO trainable.py:89 -- Checkpoint size is 2235556379 bytes
(hyper_optim pid=644, ip=172.31.17.28) 2022-04-19 15:09:50,234 INFO trainable.py:89 -- Checkpoint size is 2235556379 bytes
(hyper_optim pid=644, ip=172.31.12.94) 2022-04-19 15:09:50,326 INFO trainable.py:89 -- Checkpoint size is 2235556379 bytes
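The checkpoint size logged above, 2235556379 bytes, is easier to read in GiB; a roughly 2 GiB payload per checkpoint may also help explain the slow result processing if checkpoints are handled inline with result reporting (an assumption, not something the log states):

```python
CHECKPOINT_BYTES = 2235556379  # from the trainable.py log lines above

def to_gib(n_bytes):
    """Convert a byte count to GiB (binary, 2**30 bytes)."""
    return n_bytes / 2**30

print(f"{to_gib(CHECKPOINT_BYTES):.2f} GiB")
# → 2.08 GiB
```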