Created April 20, 2022 21:03
(base) ray@ip-172-31-79-189:~/e2e-tests$ vi a.py
(base) ray@ip-172-31-79-189:~/e2e-tests$ python a.py
2022-04-20 13:51:42,108 INFO main.py:985 -- [RayXGBoost] Created 4 new actors (4 total actors). Waiting until actors are ready for training.
2022-04-20 13:51:45,235 INFO main.py:1030 -- [RayXGBoost] Starting XGBoost training.
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) [13:51:45] task [xgboost.ray]:140243459005904 got new rank 1
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) [13:51:45] task [xgboost.ray]:140569271499600 got new rank 0
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) [13:51:45] task [xgboost.ray]:139676711361104 got new rank 3
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) [13:51:45] task [xgboost.ray]:140698607811280 got new rank 2
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) [13:51:46] DEBUG: ../src/tree/updater_gpu_hist.cu:819: [GPU Hist]: Configure
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) [13:51:46] DEBUG: ../src/tree/updater_gpu_hist.cu:819: [GPU Hist]: Configure
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) [13:51:46] DEBUG: ../src/tree/updater_gpu_hist.cu:819: [GPU Hist]: Configure
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) [13:51:46] DEBUG: ../src/tree/updater_gpu_hist.cu:819: [GPU Hist]: Configure
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO Bootstrap : Using [0]ens3:172.31.86.190<0> [1]vethdad8eac:fe80::f4e5:4aff:fec4:1c01%vethdad8eac<0>
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190)
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO NET/Socket : Using [0]ens3:172.31.86.190<0> [1]vethdad8eac:fe80::f4e5:4aff:fec4:1c01%vethdad8eac<0>
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO Using network Socket
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO Trees [0] -1/-1/-1->3->2|2->3->-1/-1/-1 [1] -1/-1/-1->3->2|2->3->-1/-1/-1 [2] 2/0/-1->3->1|1->3->2/0/-1 [3] 2/0/-1->3->1|1->3->2/0/-1
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Bootstrap : Using [0]ens3:172.31.82.234<0> [1]vethcda9fd2:fe80::4b8:cbff:fe19:7384%vethcda9fd2<0>
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234)
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO NET/Socket : Using [0]ens3:172.31.82.234<0> [1]vethcda9fd2:fe80::4b8:cbff:fe19:7384%vethcda9fd2<0>
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Using network Socket
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Trees [0] 1/3/-1->2->0|0->2->1/3/-1 [1] 1/3/-1->2->0|0->2->1/3/-1 [2] -1/-1/-1->2->3|3->2->-1/-1/-1 [3] -1/-1/-1->2->3|3->2->-1/-1/-1
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Channel 00 : 1[1e0] -> 2[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Channel 00 : 2[1e0] -> 3[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO Bootstrap : Using [0]ens3:172.31.75.107<0> [1]veth5b15dc1:fe80::5c95:6bff:fee7:232a%veth5b15dc1<0>
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107)
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO NET/Socket : Using [0]ens3:172.31.75.107<0> [1]veth5b15dc1:fe80::5c95:6bff:fee7:232a%veth5b15dc1<0>
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO Using network Socket
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO Trees [0] -1/-1/-1->1->2|2->1->-1/-1/-1 [1] -1/-1/-1->1->2|2->1->-1/-1/-1 [2] 3/-1/-1->1->-1|-1->1->3/-1/-1 [3] 3/-1/-1->1->-1|-1->1->3/-1/-1
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO Channel 00 : 0[1e0] -> 1[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO Channel 00 : 1[1e0] -> 2[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO Channel 00 : 2[1e0] -> 1[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Bootstrap : Using [0]ens3:172.31.68.8<0> [1]veth41a4151:fe80::34b7:f4ff:fe0d:bbfa%veth41a4151<0>
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8)
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO NET/Socket : Using [0]ens3:172.31.68.8<0> [1]veth41a4151:fe80::34b7:f4ff:fe0d:bbfa%veth41a4151<0>
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Using network Socket
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) NCCL version 2.7.3+cuda11.0
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Channel 00/04 : 0 1 2 3
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Channel 01/04 : 0 1 2 3
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Channel 02/04 : 0 1 2 3
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Channel 03/04 : 0 1 2 3
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1|-1->0->2/-1/-1 [1] 2/-1/-1->0->-1|-1->0->2/-1/-1 [2] -1/-1/-1->0->3|3->0->-1/-1/-1 [3] -1/-1/-1->0->3|3->0->-1/-1/-1
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Channel 00 : 3[1e0] -> 0[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Channel 00 : 0[1e0] -> 1[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Channel 00 : 2[1e0] -> 0[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO Channel 00 : 2[1e0] -> 3[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO Channel 00 : 3[1e0] -> 0[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO Channel 00 : 3[1e0] -> 2[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO Channel 01 : 2[1e0] -> 3[1e0] [receive] via NET/Socket/1
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO Channel 01 : 3[1e0] -> 0[1e0] [send] via NET/Socket/1
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Channel 00 : 3[1e0] -> 2[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Channel 00 : 2[1e0] -> 0[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Channel 00 : 0[1e0] -> 2[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Channel 00 : 2[1e0] -> 1[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Channel 01 : 1[1e0] -> 2[1e0] [receive] via NET/Socket/1
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Channel 01 : 2[1e0] -> 3[1e0] [send] via NET/Socket/1
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO Channel 01 : 0[1e0] -> 1[1e0] [receive] via NET/Socket/1
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO Channel 01 : 1[1e0] -> 2[1e0] [send] via NET/Socket/1
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Channel 00 : 0[1e0] -> 2[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Channel 01 : 3[1e0] -> 0[1e0] [receive] via NET/Socket/1
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Channel 01 : 0[1e0] -> 1[1e0] [send] via NET/Socket/1
2022-04-20 13:52:15,304 INFO main.py:1105 -- Training in progress (30 seconds since last restart).
2022-04-20 13:52:45,359 INFO main.py:1105 -- Training in progress (60 seconds since last restart).
2022-04-20 13:53:15,414 INFO main.py:1105 -- Training in progress (90 seconds since last restart).
2022-04-20 13:53:45,469 INFO main.py:1105 -- Training in progress (120 seconds since last restart).
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO Call to connect returned Connection timed out, retrying
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Call to connect returned Connection timed out, retrying
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO Call to connect returned Connection timed out, retrying
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Call to connect returned Connection timed out, retrying
2022-04-20 13:54:15,524 INFO main.py:1105 -- Training in progress (150 seconds since last restart).
2022-04-20 13:54:45,578 INFO main.py:1105 -- Training in progress (180 seconds since last restart).
2022-04-20 13:55:15,631 INFO main.py:1105 -- Training in progress (210 seconds since last restart).
2022-04-20 13:55:45,684 INFO main.py:1105 -- Training in progress (240 seconds since last restart).
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) ip-172-31-86-190:312:362 [0] NCCL INFO Call to connect returned Connection timed out, retrying
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) ip-172-31-68-8:343:394 [0] NCCL INFO Call to connect returned Connection timed out, retrying
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) ip-172-31-75-107:310:361 [0] NCCL INFO Call to connect returned Connection timed out, retrying
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) ip-172-31-82-234:311:362 [0] NCCL INFO Call to connect returned Connection timed out, retrying
2022-04-20 13:56:15,739 INFO main.py:1105 -- Training in progress (270 seconds since last restart).
2022-04-20 13:56:45,792 INFO main.py:1105 -- Training in progress (301 seconds since last restart).
^CProcess Process-1:
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ray/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray/main.py", line 201, in run
    self.accept_workers(nworker)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/xgboost/tracker.py", line 313, in accept_workers
    fd, s_addr = self.sock.accept()
  File "/home/ray/anaconda3/lib/python3.7/socket.py", line 212, in accept
    fd, addr = self._accept()
KeyboardInterrupt
^CTraceback (most recent call last):
  File "a.py", line 32, in <module>
    xgboost_params={"verbosity": 3},
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/util/xgboost/release_test_util.py", line 158, in train_ray
    **kwargs,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray/main.py", line 1431, in train
    **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray/main.py", line 1110, in _train
    not_ready, num_returns=len(not_ready), timeout=1)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1999, in wait
    fetch_local,
  File "python/ray/_raylet.pyx", line 1403, in ray._raylet.CoreWorker.wait
  File "python/ray/_raylet.pyx", line 169, in ray._raylet.check_status
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1135, in shutdown
    disconnect(_exiting_interpreter)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1676, in disconnect
    worker.import_thread.join_import_thread()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/import_thread.py", line 58, in join_import_thread
    self.t.join()
  File "/home/ray/anaconda3/lib/python3.7/threading.py", line 1044, in join
    self._wait_for_tstate_lock()
  File "/home/ray/anaconda3/lib/python3.7/threading.py", line 1060, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
terminate called without an active exception
Aborted (core dumped)
(base) ray@ip-172-31-79-189:~/e2e-tests$
(base) ray@ip-172-31-79-189:~/e2e-tests$
(base) ray@ip-172-31-79-189:~/e2e-tests$
(base) ray@ip-172-31-79-189:~/e2e-tests$ ^C
(base) ray@ip-172-31-79-189:~/e2e-tests$ vi a.py
(base) ray@ip-172-31-79-189:~/e2e-tests$ python a.py
2022-04-20 13:57:33,377 INFO main.py:985 -- [RayXGBoost] Created 4 new actors (4 total actors). Waiting until actors are ready for training.
2022-04-20 13:57:36,400 INFO main.py:1030 -- [RayXGBoost] Starting XGBoost training.
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:36] task [xgboost.ray]:140692871935056 got new rank 0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:36] task [xgboost.ray]:140095556672912 got new rank 2
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:36] task [xgboost.ray]:139997260704848 got new rank 3
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:36] task [xgboost.ray]:140544974912016 got new rank 1
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:36] DEBUG: ../src/tree/updater_gpu_hist.cu:819: [GPU Hist]: Configure
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:36] DEBUG: ../src/tree/updater_gpu_hist.cu:819: [GPU Hist]: Configure
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:36] DEBUG: ../src/tree/updater_gpu_hist.cu:819: [GPU Hist]: Configure
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:36] DEBUG: ../src/tree/updater_gpu_hist.cu:819: [GPU Hist]: Configure
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Bootstrap : Using [0]ens3:172.31.68.8<0>
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8)
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO NET/Socket : Using [0]ens3:172.31.68.8<0>
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Using network Socket
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) NCCL version 2.7.3+cuda11.0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Bootstrap : Using [0]ens3:172.31.82.234<0>
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234)
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket : Using [0]ens3:172.31.82.234<0>
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Using network Socket
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Trees [0] 1/3/-1->2->0|0->2->1/3/-1 [1] -1/-1/-1->2->3|3->2->-1/-1/-1
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 1[1e0] -> 2[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 2[1e0] -> 3[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 3[1e0] -> 2[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Bootstrap : Using [0]ens3:172.31.86.190<0>
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190)
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO NET/Socket : Using [0]ens3:172.31.86.190<0>
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Using network Socket
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Trees [0] -1/-1/-1->3->2|2->3->-1/-1/-1 [1] 2/0/-1->3->1|1->3->2/0/-1
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 00 : 2[1e0] -> 3[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 00 : 3[1e0] -> 0[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 00 : 3[1e0] -> 2[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Bootstrap : Using [0]ens3:172.31.75.107<0>
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107)
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO NET/Socket : Using [0]ens3:172.31.75.107<0>
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Using network Socket
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Trees [0] -1/-1/-1->1->2|2->1->-1/-1/-1 [1] 3/-1/-1->1->-1|-1->1->3/-1/-1
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 00 : 0[1e0] -> 1[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 00 : 1[1e0] -> 2[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 00 : 2[1e0] -> 1[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 00/02 : 0 1 2 3
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 01/02 : 0 1 2 3
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1|-1->0->2/-1/-1 [1] -1/-1/-1->0->3|3->0->-1/-1/-1
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 00 : 3[1e0] -> 0[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 00 : 0[1e0] -> 1[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 00 : 2[1e0] -> 0[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 00 : 0[1e0] -> 2[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 01 : 3[1e0] -> 0[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 2[1e0] -> 0[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 0[1e0] -> 2[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 2[1e0] -> 1[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 01 : 1[1e0] -> 2[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 01 : 2[1e0] -> 3[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 01 : 3[1e0] -> 2[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 2[1e0] -> 3[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 3[1e0] -> 0[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 0[1e0] -> 3[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 3[1e0] -> 1[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 1[1e0] -> 3[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 3[1e0] -> 2[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO comm 0x7f4e4273b220 rank 3 nranks 4 cudaDev 0 busId 1e0 - Init COMPLETE
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 01 : 0[1e0] -> 1[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 01 : 1[1e0] -> 2[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 01 : 3[1e0] -> 1[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 01 : 1[1e0] -> 3[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO comm 0x7fcdbcaacd00 rank 1 nranks 4 cudaDev 0 busId 1e0 - Init COMPLETE
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Trees [0] -1/-1/-1->1->2|2->1->-1/-1/-1 [1] 3/-1/-1->1->-1|-1->1->3/-1/-1
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 00 : 0[1e0] -> 1[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 01 : 0[1e0] -> 1[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 01 : 0[1e0] -> 3[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO comm 0x7ff032568ed0 rank 0 nranks 4 cudaDev 0 busId 1e0 - Init COMPLETE
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 00/02 : 0 1 2 3
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 01/02 : 0 1 2 3
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1|-1->0->2/-1/-1 [1] -1/-1/-1->0->3|3->0->-1/-1/-1
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 00 : 3[1e0] -> 0[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 00 : 0[1e0] -> 1[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 00 : 2[1e0] -> 0[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO comm 0x7f6567024c20 rank 2 nranks 4 cudaDev 0 busId 1e0 - Init COMPLETE
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Trees [0] 1/3/-1->2->0|0->2->1/3/-1 [1] -1/-1/-1->2->3|3->2->-1/-1/-1
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 1[1e0] -> 2[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 2[1e0] -> 3[1e0] [send] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 3[1e0] -> 2[1e0] [receive] via NET/Socket/0
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 2[1e0] -> 0[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 0[1e0] -> 2[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Trees [0] -1/-1/-1->3->2|2->3->-1/-1/-1 [1] 2/0/-1->3->1|1->3->2/0/-1 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 00 : 2[1e0] -> 3[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 00 : 3[1e0] -> 0[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 00 : 3[1e0] -> 2[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 2[1e0] -> 3[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 3[1e0] -> 0[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 00 : 1[1e0] -> 2[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 00 : 2[1e0] -> 1[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 01 : 0[1e0] -> 1[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 00 : 0[1e0] -> 2[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 01 : 3[1e0] -> 0[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 01 : 0[1e0] -> 1[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Channel 01 : 0[1e0] -> 3[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO comm 0x7ff0315dc530 rank 0 nranks 4 cudaDev 0 busId 1e0 - Init COMPLETE | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 00 : 2[1e0] -> 1[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 01 : 1[1e0] -> 2[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 01 : 2[1e0] -> 3[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO Channel 01 : 3[1e0] -> 2[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) ip-172-31-82-234:435:485 [0] NCCL INFO comm 0x7f650e332210 rank 2 nranks 4 cudaDev 0 busId 1e0 - Init COMPLETE | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 0[1e0] -> 3[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 3[1e0] -> 1[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 1[1e0] -> 3[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO Channel 01 : 3[1e0] -> 2[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) ip-172-31-86-190:435:485 [0] NCCL INFO comm 0x7f4e42968a90 rank 3 nranks 4 cudaDev 0 busId 1e0 - Init COMPLETE | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] ======== NCCL Statistics======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] AllReduce calls: 0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] AllReduce total MiB communicated: 0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] ======== Monitor (3): SketchContainer ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] Merge: 0.003206s, 4 calls @ 3206us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] ======== Monitor (3): SketchContainer ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] AllReduce: 1e-06s, 1 calls @ 1us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] MakeCuts: 0s, 1 calls @ 0us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] Merge: 0.003206s, 4 calls @ 3206us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] Prune: 0.000275s, 1 calls @ 275us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:37] Unique: 0.000191s, 1 calls @ 191us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] ======== NCCL Statistics======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] AllReduce calls: 0 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] AllReduce total MiB communicated: 0 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] ======== Monitor (1): SketchContainer ======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] Merge: 0.002256s, 4 calls @ 2256us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] ======== Monitor (1): SketchContainer ======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] AllReduce: 1e-06s, 1 calls @ 1us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] MakeCuts: 1e-06s, 1 calls @ 1us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] Merge: 0.002256s, 4 calls @ 2256us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] Prune: 0.000387s, 1 calls @ 387us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:37] Unique: 0.000194s, 1 calls @ 194us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 01 : 1[1e0] -> 2[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 01 : 3[1e0] -> 1[1e0] [receive] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO Channel 01 : 1[1e0] -> 3[1e0] [send] via NET/Socket/0 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) ip-172-31-75-107:434:484 [0] NCCL INFO comm 0x7fcdbd4567f0 rank 1 nranks 4 cudaDev 0 busId 1e0 - Init COMPLETE | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] ======== NCCL Statistics======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] AllReduce calls: 0 | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] AllReduce total MiB communicated: 0 | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] ======== Monitor (0): SketchContainer ======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] Merge: 0.002154s, 4 calls @ 2154us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] ======== Monitor (0): SketchContainer ======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] AllReduce: 0s, 1 calls @ 0us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] MakeCuts: 0s, 1 calls @ 0us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] Merge: 0.002154s, 4 calls @ 2154us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] Prune: 0.000266s, 1 calls @ 266us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:37] Unique: 0.000186s, 1 calls @ 186us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Launch mode Parallel | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) ip-172-31-68-8:500:551 [0] NCCL INFO Launch mode Parallel | |
[0] train-logloss:0.69248 train-error:0.48505 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] ======== NCCL Statistics======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] AllReduce calls: 0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] AllReduce total MiB communicated: 0 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] ======== Monitor (2): SketchContainer ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] Merge: 0.002279s, 4 calls @ 2279us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] ======== Monitor (2): SketchContainer ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] AllReduce: 1e-06s, 1 calls @ 1us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] MakeCuts: 0s, 1 calls @ 0us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] Merge: 0.002279s, 4 calls @ 2279us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] Prune: 0.000286s, 1 calls @ 286us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:37] Unique: 0.00019s, 1 calls @ 190us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
[1] train-logloss:0.69196 train-error:0.48088 | |
[2] train-logloss:0.69141 train-error:0.47766 | |
[3] train-logloss:0.69083 train-error:0.47294 | |
[4] train-logloss:0.69023 train-error:0.46812 | |
[5] train-logloss:0.68977 train-error:0.46494 | |
[6] train-logloss:0.68928 train-error:0.46292 | |
[7] train-logloss:0.68868 train-error:0.45878 | |
[8] train-logloss:0.68815 train-error:0.45628 | |
[9] train-logloss:0.68766 train-error:0.45414 | |
[10] train-logloss:0.68726 train-error:0.45273 | |
[11] train-logloss:0.68686 train-error:0.45062 | |
[12] train-logloss:0.68641 train-error:0.44842 | |
[13] train-logloss:0.68581 train-error:0.44549 | |
[14] train-logloss:0.68533 train-error:0.44346 | |
[15] train-logloss:0.68469 train-error:0.44029 | |
[16] train-logloss:0.68420 train-error:0.43867 | |
[17] train-logloss:0.68360 train-error:0.43638 | |
[18] train-logloss:0.68309 train-error:0.43474 | |
[19] train-logloss:0.68246 train-error:0.43240 | |
[20] train-logloss:0.68185 train-error:0.43016 | |
[21] train-logloss:0.68121 train-error:0.42690 | |
[22] train-logloss:0.68071 train-error:0.42520 | |
[23] train-logloss:0.68017 train-error:0.42310 | |
[24] train-logloss:0.67966 train-error:0.42142 | |
[25] train-logloss:0.67910 train-error:0.41922 | |
[26] train-logloss:0.67853 train-error:0.41694 | |
[27] train-logloss:0.67811 train-error:0.41578 | |
[28] train-logloss:0.67762 train-error:0.41457 | |
[29] train-logloss:0.67700 train-error:0.41233 | |
[30] train-logloss:0.67642 train-error:0.41058 | |
[31] train-logloss:0.67592 train-error:0.40899 | |
[32] train-logloss:0.67531 train-error:0.40591 | |
[33] train-logloss:0.67479 train-error:0.40434 | |
[34] train-logloss:0.67435 train-error:0.40312 | |
[35] train-logloss:0.67391 train-error:0.40231 | |
[36] train-logloss:0.67350 train-error:0.40128 | |
[37] train-logloss:0.67292 train-error:0.39933 | |
[38] train-logloss:0.67237 train-error:0.39804 | |
[39] train-logloss:0.67187 train-error:0.39643 | |
[40] train-logloss:0.67153 train-error:0.39571 | |
[41] train-logloss:0.67102 train-error:0.39385 | |
[42] train-logloss:0.67052 train-error:0.39232 | |
[43] train-logloss:0.67001 train-error:0.39097 | |
[44] train-logloss:0.66950 train-error:0.38964 | |
[45] train-logloss:0.66917 train-error:0.38918 | |
[46] train-logloss:0.66875 train-error:0.38830 | |
[47] train-logloss:0.66829 train-error:0.38679 | |
[48] train-logloss:0.66776 train-error:0.38525 | |
[49] train-logloss:0.66723 train-error:0.38376 | |
[50] train-logloss:0.66673 train-error:0.38200 | |
[51] train-logloss:0.66611 train-error:0.38024 | |
[52] train-logloss:0.66558 train-error:0.37866 | |
[53] train-logloss:0.66502 train-error:0.37794 | |
[54] train-logloss:0.66443 train-error:0.37612 | |
[55] train-logloss:0.66403 train-error:0.37510 | |
[56] train-logloss:0.66346 train-error:0.37338 | |
[57] train-logloss:0.66298 train-error:0.37265 | |
[58] train-logloss:0.66247 train-error:0.37158 | |
[59] train-logloss:0.66192 train-error:0.37024 | |
[60] train-logloss:0.66150 train-error:0.36951 | |
[61] train-logloss:0.66097 train-error:0.36825 | |
[62] train-logloss:0.66041 train-error:0.36676 | |
[63] train-logloss:0.66004 train-error:0.36609 | |
[64] train-logloss:0.65955 train-error:0.36491 | |
[65] train-logloss:0.65906 train-error:0.36377 | |
[66] train-logloss:0.65850 train-error:0.36249 | |
[67] train-logloss:0.65796 train-error:0.36137 | |
[68] train-logloss:0.65743 train-error:0.36039 | |
[69] train-logloss:0.65695 train-error:0.35918 | |
[70] train-logloss:0.65658 train-error:0.35866 | |
[71] train-logloss:0.65613 train-error:0.35782 | |
[72] train-logloss:0.65567 train-error:0.35692 | |
[73] train-logloss:0.65508 train-error:0.35554 | |
[74] train-logloss:0.65460 train-error:0.35479 | |
[75] train-logloss:0.65424 train-error:0.35379 | |
[76] train-logloss:0.65386 train-error:0.35296 | |
[77] train-logloss:0.65350 train-error:0.35207 | |
[78] train-logloss:0.65297 train-error:0.35044 | |
[79] train-logloss:0.65251 train-error:0.34952 | |
[80] train-logloss:0.65200 train-error:0.34912 | |
[81] train-logloss:0.65160 train-error:0.34858 | |
[82] train-logloss:0.65110 train-error:0.34719 | |
[83] train-logloss:0.65066 train-error:0.34609 | |
[84] train-logloss:0.65010 train-error:0.34526 | |
[85] train-logloss:0.64969 train-error:0.34493 | |
[86] train-logloss:0.64927 train-error:0.34408 | |
[87] train-logloss:0.64881 train-error:0.34358 | |
[88] train-logloss:0.64822 train-error:0.34230 | |
[89] train-logloss:0.64778 train-error:0.34158 | |
[90] train-logloss:0.64730 train-error:0.34050 | |
[91] train-logloss:0.64677 train-error:0.33865 | |
[92] train-logloss:0.64630 train-error:0.33776 | |
[93] train-logloss:0.64578 train-error:0.33718 | |
[94] train-logloss:0.64542 train-error:0.33624 | |
[95] train-logloss:0.64498 train-error:0.33533 | |
[96] train-logloss:0.64449 train-error:0.33414 | |
[97] train-logloss:0.64396 train-error:0.33329 | |
[98] train-logloss:0.64341 train-error:0.33245 | |
[99] train-logloss:0.64280 train-error:0.33094 | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] ======== Monitor (0): Learner ======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] Configure: 1e-06s, 1 calls @ 1us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] EvalOneIter: 0.150938s, 100 calls @ 150938us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] GetGradient: 0.004865s, 100 calls @ 4865us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] PredictRaw: 0.448541s, 100 calls @ 448541us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] UpdateOneIter: 4.5934s, 100 calls @ 4593401us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] ======== Monitor (0): GBTree ======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] BoostNewTrees: 5.07532s, 100 calls @ 5075322us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] CommitModel: 0.000131s, 100 calls @ 131us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] ======== Device 0 Memory Allocations: ======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] Peak memory usage: 43MiB | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] Number of allocations: 20528 | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] ======== Monitor (0): updater_gpu_hist ======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] InitData: 0.429466s, 100 calls @ 429466us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] InitDataOnce: 0.429373s, 1 calls @ 429373us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] Update: 5.06787s, 100 calls @ 5067875us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] UpdatePredictionCache: 0.006143s, 100 calls @ 6143us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] ======== NCCL Statistics======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] AllReduce calls: 3112 | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] AllReduce total MiB communicated: 486 | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] ======== Monitor (0): gradient_based_sampler ======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] Sample: 0.000153s, 100 calls @ 153us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] ======== Monitor (0): GPUHistMakerDevice0 ======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] AllReduce: 0.081929s, 3112 calls @ 81929us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] BuildHist: 0.175306s, 3012 calls @ 175306us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] EvaluateSplits: 0.113801s, 3012 calls @ 113801us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] FinalisePosition: 0.002524s, 100 calls @ 2524us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] InitRoot: 0.230931s, 100 calls @ 230931us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] Reset: 0.019521s, 100 calls @ 19521us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] UpdatePosition: 3.42325s, 3012 calls @ 3423249us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] ======== Monitor (0): ellpack_page ======== | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] BinningCompression: 0.003812s, 1 calls @ 3812us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] InitCompressedData: 0.00021s, 1 calls @ 210us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) [13:57:43] Quantiles: 0.221693s, 1 calls @ 221693us | |
(_RemoteRayXGBoostActor pid=500, ip=172.31.68.8) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] ======== Monitor (2): Learner ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] Configure: 1e-06s, 1 calls @ 1us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] EvalOneIter: 0.15664s, 100 calls @ 156640us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] GetGradient: 0.006135s, 100 calls @ 6135us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] PredictRaw: 0.509217s, 100 calls @ 509217us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] UpdateOneIter: 5.55571s, 100 calls @ 5555714us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] ======== Monitor (2): GBTree ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] BoostNewTrees: 5.9744s, 100 calls @ 5974403us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] CommitModel: 0.000169s, 100 calls @ 169us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] ======== Device 0 Memory Allocations: ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] Peak memory usage: 37MiB | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] Number of allocations: 20528 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] ======== Monitor (2): updater_gpu_hist ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] InitData: 0.37422s, 100 calls @ 374220us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] InitDataOnce: 0.374127s, 1 calls @ 374127us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] Update: 5.96447s, 100 calls @ 5964471us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] UpdatePredictionCache: 0.008169s, 100 calls @ 8169us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] ======== NCCL Statistics======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] AllReduce calls: 3112 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] AllReduce total MiB communicated: 486 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] ======== Monitor (2): gradient_based_sampler ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] Sample: 0.000144s, 100 calls @ 144us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] ======== Monitor (2): GPUHistMakerDevice0 ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] AllReduce: 0.088283s, 3112 calls @ 88283us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] BuildHist: 0.186446s, 3012 calls @ 186446us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] EvaluateSplits: 0.116248s, 3012 calls @ 116248us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] FinalisePosition: 0.003242s, 100 calls @ 3242us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] InitRoot: 1.17477s, 100 calls @ 1174765us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] Reset: 0.022092s, 100 calls @ 22092us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] UpdatePosition: 3.42277s, 3012 calls @ 3422773us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] ======== Monitor (0): ellpack_page ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] BinningCompression: 0.003649s, 1 calls @ 3649us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] InitCompressedData: 0.000246s, 1 calls @ 246us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) [13:57:43] Quantiles: 0.204815s, 1 calls @ 204815us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.82.234) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] ======== Monitor (3): Learner ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] Configure: 1e-06s, 1 calls @ 1us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] EvalOneIter: 0.15206s, 100 calls @ 152060us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] GetGradient: 0.004642s, 100 calls @ 4642us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] PredictRaw: 0.461168s, 100 calls @ 461168us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] UpdateOneIter: 5.61095s, 100 calls @ 5610947us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] ======== Monitor (3): GBTree ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] BoostNewTrees: 6.0802s, 100 calls @ 6080202us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] CommitModel: 0.000132s, 100 calls @ 132us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] ======== Device 0 Memory Allocations: ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] Peak memory usage: 37MiB | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] Number of allocations: 20528 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] ======== Monitor (3): updater_gpu_hist ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] InitData: 0.41892s, 100 calls @ 418920us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] InitDataOnce: 0.41883s, 1 calls @ 418830us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] Update: 6.07251s, 100 calls @ 6072513us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] UpdatePredictionCache: 0.006267s, 100 calls @ 6267us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] ======== NCCL Statistics======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] AllReduce calls: 3112 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] AllReduce total MiB communicated: 486 | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] ======== Monitor (3): gradient_based_sampler ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] Sample: 0.000151s, 100 calls @ 151us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] ======== Monitor (3): GPUHistMakerDevice0 ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] AllReduce: 0.080225s, 3112 calls @ 80225us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] BuildHist: 0.168289s, 3012 calls @ 168289us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] EvaluateSplits: 0.101093s, 3012 calls @ 101093us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] FinalisePosition: 0.002548s, 100 calls @ 2548us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] InitRoot: 1.24785s, 100 calls @ 1247850us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] Reset: 0.01821s, 100 calls @ 18210us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] UpdatePosition: 3.44804s, 3012 calls @ 3448039us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] ======== Monitor (0): ellpack_page ======== | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] BinningCompression: 0.003466s, 1 calls @ 3466us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] InitCompressedData: 0.000238s, 1 calls @ 238us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) [13:57:43] Quantiles: 0.201594s, 1 calls @ 201594us | |
(_RemoteRayXGBoostActor pid=435, ip=172.31.86.190) | |
[13:57:43] WARNING: ../src/gbm/gbtree.cc:390: Loading from a raw memory buffer on CPU only machine. Changing tree_method to hist. | |
[13:57:43] WARNING: ../src/learner.cc:248: No visible GPU is found, setting `gpu_id` to -1 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] ======== Monitor (1): Learner ======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] Configure: 1e-06s, 1 calls @ 1us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] EvalOneIter: 0.154381s, 100 calls @ 154381us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] GetGradient: 0.004744s, 100 calls @ 4744us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] PredictRaw: 0.479793s, 100 calls @ 479793us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] UpdateOneIter: 5.58157s, 100 calls @ 5581572us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] ======== Monitor (1): GBTree ======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] BoostNewTrees: 6.03191s, 100 calls @ 6031913us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] CommitModel: 0.000145s, 100 calls @ 145us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] ======== Device 0 Memory Allocations: ======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] Peak memory usage: 37MiB | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] Number of allocations: 20528 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] ======== Monitor (1): updater_gpu_hist ======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] InitData: 0.397994s, 100 calls @ 397994us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] InitDataOnce: 0.397904s, 1 calls @ 397904us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] Update: 6.02337s, 100 calls @ 6023370us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] UpdatePredictionCache: 0.006977s, 100 calls @ 6977us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] ======== NCCL Statistics======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] AllReduce calls: 3112 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] AllReduce total MiB communicated: 486 | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] ======== Monitor (1): gradient_based_sampler ======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] Sample: 0.000156s, 100 calls @ 156us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] ======== Monitor (1): GPUHistMakerDevice0 ======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] AllReduce: 0.083578s, 3112 calls @ 83578us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] BuildHist: 0.17274s, 3012 calls @ 172740us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] EvaluateSplits: 0.103818s, 3012 calls @ 103818us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] FinalisePosition: 0.002837s, 100 calls @ 2837us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] InitRoot: 1.20745s, 100 calls @ 1207448us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] Reset: 0.018836s, 100 calls @ 18836us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] UpdatePosition: 3.43934s, 3012 calls @ 3439342us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] ======== Monitor (0): ellpack_page ======== | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] BinningCompression: 0.003531s, 1 calls @ 3531us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] InitCompressedData: 0.000213s, 1 calls @ 213us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) [13:57:43] Quantiles: 0.20527s, 1 calls @ 205270us | |
(_RemoteRayXGBoostActor pid=434, ip=172.31.75.107) | |
[13:57:43] WARNING: ../src/gbm/gbtree.cc:390: Loading from a raw memory buffer on CPU only machine. Changing tree_method to hist. | |
[13:57:43] WARNING: ../src/learner.cc:248: No visible GPU is found, setting `gpu_id` to -1 | |
2022-04-20 13:57:43,694 INFO main.py:1512 -- [RayXGBoost] Finished XGBoost training on training data with total N=250,000 in 12.03 seconds (7.29 pure XGBoost training time). | |
TRAIN TIME TAKEN: 12.28 seconds | |
Final training error: 0.3309 | |
PASSED. |