Created April 22, 2022 15:01
UCX 1.11.2 debug log for failing osu_scatter with forced sysV
This file has been truncated.
[1650638541.466186] [ndv4:68756:0] debug.c:1198 UCX DEBUG using signal stack 0x2b1397c2a000 size 141824 | |
[1650638541.466330] [ndv4:68756:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638541.466349] [ndv4:68756:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2b1397a98000 | |
[1650638541.466373] [ndv4:68756:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638541.466382] [ndv4:68756:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638541.466389] [ndv4:68756:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638541.469621] [ndv4:68756:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638541.469653] [ndv4:68756:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638541.469692] [ndv4:68756:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638541.469695] [ndv4:68756:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638541.469704] [ndv4:68756:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638541.469711] [ndv4:68756:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638541.469714] [ndv4:68756:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638541.469720] [ndv4:68756:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638541.469722] [ndv4:68756:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638541.469725] [ndv4:68756:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638541.469727] [ndv4:68756:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638541.469730] [ndv4:68756:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638541.469740] [ndv4:68756:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638541.478932] [ndv4:68756:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638541.479298] [ndv4:68756:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638541.479321] [ndv4:68756:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638541.479553] [ndv4:68756:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638541.479564] [ndv4:68756:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638541.479575] [ndv4:68756:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638541.479585] [ndv4:68756:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638541.479594] [ndv4:68756:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638541.479604] [ndv4:68756:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638541.479615] [ndv4:68756:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638541.479655] [ndv4:68756:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638541.480021] [ndv4:68756:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638541.487568] [ndv4:68756:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638541.487935] [ndv4:68756:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638541.536026] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0xccc470 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638541.536156] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638541.536738] [ndv4:68756:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638541.541945] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638541.541969] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638541.542003] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638541.542084] [ndv4:68756:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638541.542112] [ndv4:68756:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638541.543539] [ndv4:68756:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638541.543552] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638541.543806] [ndv4:68756:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638541.543871] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638541.543878] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638541.546928] [ndv4:68756:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638541.546935] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638541.549626] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638541.549632] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638541.552269] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638541.552275] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638541.554560] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638541.554566] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638541.557493] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638541.557499] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638541.557835] [ndv4:68756:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638541.569796] [ndv4:68756:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638541.570313] [ndv4:68756:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638541.571539] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0xca5040 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638541.571551] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638541.571554] [ndv4:68756:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638541.572075] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638541.572084] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638541.572089] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638541.572131] [ndv4:68756:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638541.573356] [ndv4:68756:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638541.573361] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638541.573717] [ndv4:68756:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638541.573892] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638541.573897] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638541.576205] [ndv4:68756:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638541.576264] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638541.579826] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638541.579833] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638541.587690] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638541.587697] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638541.591428] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638541.591434] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638541.597662] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638541.597669] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638541.597898] [ndv4:68756:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638541.606792] [ndv4:68756:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638541.607351] [ndv4:68756:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638541.610325] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0xcc4c50 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638541.610337] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638541.610341] [ndv4:68756:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638541.610515] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638541.610523] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638541.610529] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638541.610572] [ndv4:68756:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638541.614064] [ndv4:68756:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638541.614079] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638541.614531] [ndv4:68756:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638541.615058] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638541.615066] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638541.619588] [ndv4:68756:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638541.619596] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638541.623172] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638541.623185] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638541.626271] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638541.626277] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638541.628544] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638541.628551] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638541.632437] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638541.632444] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638541.632705] [ndv4:68756:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638541.639879] [ndv4:68756:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638541.640401] [ndv4:68756:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638541.641545] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0xcc4b40 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638541.641584] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638541.641589] [ndv4:68756:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638541.641892] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638541.641903] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638541.641910] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638541.641959] [ndv4:68756:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638541.643679] [ndv4:68756:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638541.643688] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638541.643935] [ndv4:68756:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638541.644041] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638541.644047] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638541.647292] [ndv4:68756:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638541.647300] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638541.649251] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638541.649257] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638541.650435] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638541.650442] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638541.653604] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638541.653611] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638541.656310] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638541.656317] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638541.656596] [ndv4:68756:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638541.665110] [ndv4:68756:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib4 vendor_id: 0x15b3 device_id: 4124 | |
[1650638541.665545] [ndv4:68756:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib4: disable ODP because it's not supported for DevX QP | |
[1650638541.668526] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x13ae360 [id=65 ref 1] uct_ib_async_event_handler() to hash | |
[1650638541.668556] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 65 events 0x1 mode thread_spinlock | |
[1650638541.668560] [ndv4:68756:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib4' (InfiniBand channel adapter) with 1 ports | |
[1650638541.668790] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638541.668799] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638541.668805] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638541.668847] [ndv4:68756:0] ib_md.c:1319 UCX DEBUG mlx5_ib4: using registration cache | |
[1650638541.670164] [ndv4:68756:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638541.670172] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638541.670515] [ndv4:68756:0] ib_md.c:1604 UCX DEBUG mlx5_ib4: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638541.670830] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638541.670837] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638541.672512] [ndv4:68756:0] topo.c:99 UCX DEBUG bus id 0x105000000 doesn't exist. sys_dev = 4 | |
[1650638541.672519] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638541.674099] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638541.674106] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638541.677269] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638541.677275] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638541.687049] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638541.687057] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638541.689678] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638541.689686] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638541.689926] [ndv4:68756:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638541.699545] [ndv4:68756:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib5 vendor_id: 0x15b3 device_id: 4124 | |
[1650638541.699921] [ndv4:68756:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib5: disable ODP because it's not supported for DevX QP | |
[1650638541.702288] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0xcc4880 [id=67 ref 1] uct_ib_async_event_handler() to hash | |
[1650638541.702327] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 67 events 0x1 mode thread_spinlock | |
[1650638541.702331] [ndv4:68756:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib5' (InfiniBand channel adapter) with 1 ports | |
[1650638541.702773] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638541.702783] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638541.702789] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638541.702837] [ndv4:68756:0] ib_md.c:1319 UCX DEBUG mlx5_ib5: using registration cache | |
[1650638541.704709] [ndv4:68756:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638541.704717] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638541.704995] [ndv4:68756:0] ib_md.c:1604 UCX DEBUG mlx5_ib5: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638541.705010] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638541.705015] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638541.709012] [ndv4:68756:0] topo.c:99 UCX DEBUG bus id 0x106000000 doesn't exist. sys_dev = 5 | |
[1650638541.709020] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638541.713662] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638541.713670] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638541.718645] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638541.718652] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638541.723005] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638541.723012] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638541.726838] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638541.726859] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638541.727132] [ndv4:68756:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638541.740267] [ndv4:68756:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib6 vendor_id: 0x15b3 device_id: 4124 | |
[1650638541.740712] [ndv4:68756:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib6: disable ODP because it's not supported for DevX QP | |
[1650638541.742973] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0xcc61a0 [id=69 ref 1] uct_ib_async_event_handler() to hash | |
[1650638541.743018] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 69 events 0x1 mode thread_spinlock | |
[1650638541.743023] [ndv4:68756:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib6' (InfiniBand channel adapter) with 1 ports | |
[1650638541.743677] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638541.743691] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638541.743698] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638541.743757] [ndv4:68756:0] ib_md.c:1319 UCX DEBUG mlx5_ib6: using registration cache | |
[1650638541.744879] [ndv4:68756:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638541.744888] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638541.745484] [ndv4:68756:0] ib_md.c:1604 UCX DEBUG mlx5_ib6: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638541.745587] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638541.745593] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638541.752159] [ndv4:68756:0] topo.c:99 UCX DEBUG bus id 0x107000000 doesn't exist. sys_dev = 6 | |
[1650638541.752168] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638541.757649] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638541.757656] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638541.763532] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638541.763540] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638541.769533] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638541.769540] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638541.775539] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638541.775545] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638541.775774] [ndv4:68756:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638541.785158] [ndv4:68756:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib7 vendor_id: 0x15b3 device_id: 4124 | |
[1650638541.785545] [ndv4:68756:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib7: disable ODP because it's not supported for DevX QP | |
[1650638541.787775] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x13afa40 [id=71 ref 1] uct_ib_async_event_handler() to hash | |
[1650638541.787804] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 71 events 0x1 mode thread_spinlock | |
[1650638541.787807] [ndv4:68756:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib7' (InfiniBand channel adapter) with 1 ports | |
[1650638541.788637] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638541.788647] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638541.788652] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638541.788695] [ndv4:68756:0] ib_md.c:1319 UCX DEBUG mlx5_ib7: using registration cache | |
[1650638541.791159] [ndv4:68756:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638541.791167] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638541.791397] [ndv4:68756:0] ib_md.c:1604 UCX DEBUG mlx5_ib7: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638541.792002] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638541.792008] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638541.823645] [ndv4:68756:0] topo.c:99 UCX DEBUG bus id 0x108000000 doesn't exist. sys_dev = 7 | |
[1650638541.823654] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638541.827698] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638541.827705] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638541.855391] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638541.855399] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638541.857956] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638541.857961] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638541.876086] [ndv4:68756:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638541.876094] [ndv4:68756:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638541.876186] [ndv4:68756:0] ucp_context.c:1556 UCX DEBUG created ucp context 0xcb1f10 0xcb1f10 [10 mds 42 tls] features 0x1 tl bitmap 0x3ffffffffff 0x0 | |
[1650638541.972950] [ndv4:68756:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 8447 bytes with hugetlb | |
[1650638541.973057] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8368 | |
[1650638541.974408] [ndv4:68756:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 4292720 bytes with hugetlb | |
[1650638541.974423] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool mm_recv_desc: allocated chunk 0x2b139ef2f018 of 4296680 bytes with 512 elements | |
[1650638541.974998] [ndv4:68756:0] mm_iface.c:600 UCX DEBUG created mm iface 0x142e220 FIFO id 0xd002b va 0x2b1397ff4000 size 12288 (128 x 64 elems) | |
[1650638541.975050] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[0]=0x142e220 using sysv/memory on worker 0x1d084d0 | |
[1650638541.981692] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638541.981704] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638541.981997] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638541.982002] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638541.993010] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638542.002683] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638542.002720] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.002723] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.002776] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.003837] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.003845] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638542.004354] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x1439430: created RC QP 0x2121 on mlx5_ib0:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638542.006517] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[1]=0x1439430 using rc_verbs/mlx5_ib0:1 on worker 0x1d084d0 | |
[1650638542.006705] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.006712] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.006928] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.006933] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.007510] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638542.007784] [ndv4:68756:0] ib_device.c:1394 UCX DEBUG max IB CQE size is 128 | |
[1650638542.009142] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.009152] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.009155] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.009229] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.009752] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x202f010 of 8176 bytes with 127 elements | |
[1650638542.009972] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.009993] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.010046] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638542.010052] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.020121] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x139cb00 [id=78 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.020161] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 78 events 0x1 mode thread_spinlock | |
[1650638542.020804] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[2]=0x1446fb0 using rc_mlx5/mlx5_ib0:1 on worker 0x1d084d0 | |
[1650638542.021162] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.021170] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.021847] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.021853] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.023118] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638542.036074] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.036087] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.036091] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.036109] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.036672] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638542.036681] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.036726] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638542.036730] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.036739] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x139c860 [id=80 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.036763] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 80 events 0x1 mode thread_spinlock | |
[1650638542.037415] [ndv4:68756:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638542.047046] [ndv4:68756:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x2031050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2144 | |
[1650638542.047394] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.047400] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.047423] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2b13a1b49008 of 151544 bytes with 1052 elements | |
[1650638542.051105] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b139f400000..0x2b13a1a00000 on mlx5_ib0 lkey 0x80800 rkey 0x80800 access 0xf flags 0x3e4 | |
[1650638542.051131] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2b139f400018 of 39845864 bytes with 4752 elements | |
[1650638542.051306] [ndv4:68756:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x2031050 | |
[1650638542.051340] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[3]=0x2031050 using dc_mlx5/mlx5_ib0:1 on worker 0x1d084d0 | |
[1650638542.051558] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.051567] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.051728] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.051733] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.052570] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638542.054059] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.054535] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x14445f0: created UD QP 0x212a on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.055559] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.055948] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.055955] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.056066] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.056071] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.056571] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a1b6e000..0x2b13a1bf3000 on mlx5_ib0 lkey 0x80900 rkey 0x80900 access 0xf flags 0x3e4 | |
[1650638542.056577] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13a1b6e018 of 544744 bytes with 128 elements | |
[1650638542.056581] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.057614] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x14445f0: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638542.058552] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x14445f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638542.059154] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x14445f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638542.060177] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x14445f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638542.061151] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x14445f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638542.061204] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x14445f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638542.061702] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x14445f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638542.069592] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x14445f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638542.069972] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.079823] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x13a09c0 [id=81 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.079867] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 81 events 0x5 mode thread_spinlock | |
[1650638542.080370] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[4]=0x14445f0 using ud_verbs/mlx5_ib0:1 on worker 0x1d084d0 | |
[1650638542.080412] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.080420] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.080664] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.080669] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.081190] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638542.090539] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.090958] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x144f920: created UD QP 0x212b on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.090976] [ndv4:68756:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638542.091780] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.092235] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.092241] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.092363] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638542.092367] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638542.092905] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a1bf3000..0x2b13a1c78000 on mlx5_ib0 lkey 0x80a00 rkey 0x80a00 access 0xf flags 0x3e4 | |
[1650638542.092912] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13a1bf3018 of 544744 bytes with 128 elements | |
[1650638542.092918] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.093508] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x144f920: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638542.094465] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x144f920: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638542.095625] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x144f920: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638542.096290] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x144f920: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638542.096879] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x144f920: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638542.097867] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x144f920: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638542.098806] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x144f920: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638542.099084] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x144f920: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638542.099092] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.099125] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.099130] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x13a0680 [id=82 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.099153] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 82 events 0x5 mode thread_spinlock | |
[1650638542.099166] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[5]=0x144f920 using ud_mlx5/mlx5_ib0:1 on worker 0x1d084d0 | |
[1650638542.099535] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.099541] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.099808] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.099815] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.107732] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638542.116296] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638542.116305] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.116308] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.116344] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.117347] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.117355] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638542.117910] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x25d8050: created RC QP 0x2077 on mlx5_ib1:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638542.118368] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[6]=0x25d8050 using rc_verbs/mlx5_ib1:1 on worker 0x1d084d0 | |
[1650638542.118582] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.118588] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.118720] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.118725] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.119597] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638542.121629] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.121639] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.121642] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.121677] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.122160] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x2531010 of 8176 bytes with 127 elements | |
[1650638542.122456] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.122464] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.122503] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638542.122507] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.122520] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x139f910 [id=85 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.122548] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 85 events 0x1 mode thread_spinlock | |
[1650638542.122558] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[7]=0x245e020 using rc_mlx5/mlx5_ib1:1 on worker 0x1d084d0 | |
[1650638542.122758] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.122764] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.122936] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.122941] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.123477] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638542.125956] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.125971] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.125974] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.126031] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.126673] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638542.126680] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.126713] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638542.126716] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.126723] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0xccde70 [id=87 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.126753] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 87 events 0x1 mode thread_spinlock | |
[1650638542.127715] [ndv4:68756:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638542.144166] [ndv4:68756:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x277d010: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x209a | |
[1650638542.153058] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.153067] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.153077] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2b13a4479008 of 151544 bytes with 1052 elements | |
[1650638542.157126] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a1e00000..0x2b13a4400000 on mlx5_ib1 lkey 0x80400 rkey 0x80400 access 0xf flags 0x3e4 | |
[1650638542.157148] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2b13a1e00018 of 39845864 bytes with 4752 elements | |
[1650638542.157304] [ndv4:68756:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x277d010 | |
[1650638542.157340] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[8]=0x277d010 using dc_mlx5/mlx5_ib1:1 on worker 0x1d084d0 | |
[1650638542.157856] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.157870] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.158043] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.158049] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.159813] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638542.161848] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.162183] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x1443710: created UD QP 0x20b9 on mlx5_ib1:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.162955] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.170450] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.170458] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.170568] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.170574] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.171376] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a449e000..0x2b13a4523000 on mlx5_ib1 lkey 0x80500 rkey 0x80500 access 0xf flags 0x3e4 | |
[1650638542.171382] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13a449e018 of 544744 bytes with 128 elements | |
[1650638542.171386] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.172154] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1443710: adding gid fe80::15:5dff:fd34:1c to hash on device mlx5_ib1 port 1 index 0) | |
[1650638542.173326] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1443710: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 1) | |
[1650638542.174087] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1443710: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 2) | |
[1650638542.175003] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1443710: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 3) | |
[1650638542.175694] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1443710: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 4) | |
[1650638542.176419] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1443710: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 5) | |
[1650638542.176824] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1443710: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 6) | |
[1650638542.178000] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1443710: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 7) | |
[1650638542.178569] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.178577] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x1450fb0 [id=88 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.178611] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 88 events 0x5 mode thread_spinlock | |
[1650638542.178620] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[9]=0x1443710 using ud_verbs/mlx5_ib1:1 on worker 0x1d084d0 | |
[1650638542.178801] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.178819] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.178862] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.178866] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.179830] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638542.181637] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.182181] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x1d286c0: created UD QP 0x20da on mlx5_ib1:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.182188] [ndv4:68756:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638542.183081] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.183494] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.183502] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.183528] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638542.183544] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638542.184151] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a4523000..0x2b13a45a8000 on mlx5_ib1 lkey 0x80600 rkey 0x80600 access 0xf flags 0x3e4 | |
[1650638542.184158] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13a4523018 of 544744 bytes with 128 elements | |
[1650638542.184162] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.184200] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1d286c0: adding gid fe80::15:5dff:fd34:1c to hash on device mlx5_ib1 port 1 index 0) | |
[1650638542.194058] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1d286c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 1) | |
[1650638542.194681] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1d286c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 2) | |
[1650638542.194845] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1d286c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 3) | |
[1650638542.195150] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1d286c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 4) | |
[1650638542.195572] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1d286c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 5) | |
[1650638542.196443] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1d286c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 6) | |
[1650638542.196773] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x1d286c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 7) | |
[1650638542.196780] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.196816] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.196820] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x142fb90 [id=89 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.196853] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 89 events 0x5 mode thread_spinlock | |
[1650638542.196866] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[10]=0x1d286c0 using ud_mlx5/mlx5_ib1:1 on worker 0x1d084d0 | |
[1650638542.196878] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.196884] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.196943] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.196948] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.197798] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638542.199958] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638542.199978] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.199982] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.200035] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.201310] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.201317] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638542.201805] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x2b70020: created RC QP 0x20bd on mlx5_ib2:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638542.202158] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[11]=0x2b70020 using rc_verbs/mlx5_ib2:1 on worker 0x1d084d0 | |
[1650638542.202445] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.202456] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.202725] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.202739] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.203641] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638542.206445] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.206461] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.206464] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.206516] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.206986] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x2e91010 of 8176 bytes with 127 elements | |
[1650638542.207280] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.207286] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.207356] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib2 length=2048) failed: Invalid argument | |
[1650638542.207360] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.207370] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x2466cc0 [id=92 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.207398] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 92 events 0x1 mode thread_spinlock | |
[1650638542.207409] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[12]=0x2cbd030 using rc_mlx5/mlx5_ib2:1 on worker 0x1d084d0 | |
[1650638542.207470] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.207475] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.207658] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.207664] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.217072] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638542.219152] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.219170] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.219173] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.219252] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.219818] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638542.219829] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.219883] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib2 length=2048) failed: Invalid argument | |
[1650638542.219887] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.219897] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x2cc5f00 [id=94 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.219921] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 94 events 0x1 mode thread_spinlock | |
[1650638542.220651] [ndv4:68756:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638542.229903] [ndv4:68756:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x2e93050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2100 | |
[1650638542.237007] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.237030] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.237041] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2b13a6da9008 of 151544 bytes with 1052 elements | |
[1650638542.241110] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a4600000..0x2b13a6c00000 on mlx5_ib2 lkey 0x80400 rkey 0x80400 access 0xf flags 0x3e4 | |
[1650638542.241133] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2b13a4600018 of 39845864 bytes with 4752 elements | |
[1650638542.241332] [ndv4:68756:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x2e93050 | |
[1650638542.241363] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[13]=0x2e93050 using dc_mlx5/mlx5_ib2:1 on worker 0x1d084d0 | |
[1650638542.241603] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.241620] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.241819] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.241824] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.248757] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638542.258538] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.258966] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x21fc500: created UD QP 0x20e7 on mlx5_ib2:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.259778] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.260071] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.260089] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.260605] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.260624] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.261322] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a6dce000..0x2b13a6e53000 on mlx5_ib2 lkey 0x80500 rkey 0x80500 access 0xf flags 0x3e4 | |
[1650638542.261327] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13a6dce018 of 544744 bytes with 128 elements | |
[1650638542.261332] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.262110] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x21fc500: adding gid fe80::15:5dff:fd34:1d to hash on device mlx5_ib2 port 1 index 0) | |
[1650638542.263028] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x21fc500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 1) | |
[1650638542.264868] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x21fc500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 2) | |
[1650638542.264896] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x21fc500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 3) | |
[1650638542.272459] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x21fc500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 4) | |
[1650638542.272809] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x21fc500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 5) | |
[1650638542.273550] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x21fc500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 6) | |
[1650638542.273909] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x21fc500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 7) | |
[1650638542.274326] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.274334] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x14371b0 [id=95 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.274365] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 95 events 0x5 mode thread_spinlock | |
[1650638542.274379] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[14]=0x21fc500 using ud_verbs/mlx5_ib2:1 on worker 0x1d084d0 | |
[1650638542.274700] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.274706] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.274772] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.274776] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.275444] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638542.277412] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.278022] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x142c5a0: created UD QP 0x20e8 on mlx5_ib2:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.278031] [ndv4:68756:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638542.278771] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.286671] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.286707] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.286966] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638542.286971] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638542.287421] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a6e53000..0x2b13a6ed8000 on mlx5_ib2 lkey 0x80600 rkey 0x80600 access 0xf flags 0x3e4 | |
[1650638542.287433] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13a6e53018 of 544744 bytes with 128 elements | |
[1650638542.287438] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.288501] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x142c5a0: adding gid fe80::15:5dff:fd34:1d to hash on device mlx5_ib2 port 1 index 0) | |
[1650638542.294741] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x142c5a0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 1) | |
[1650638542.295288] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x142c5a0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 2) | |
[1650638542.296130] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x142c5a0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 3) | |
[1650638542.296431] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x142c5a0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 4) | |
[1650638542.296873] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x142c5a0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 5) | |
[1650638542.297111] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x142c5a0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 6) | |
[1650638542.297488] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x142c5a0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 7) | |
[1650638542.297496] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.297533] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.297538] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x2b78c40 [id=96 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.297564] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 96 events 0x5 mode thread_spinlock | |
[1650638542.297578] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[15]=0x142c5a0 using ud_mlx5/mlx5_ib2:1 on worker 0x1d084d0 | |
[1650638542.297747] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.297754] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.297922] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.297928] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.308064] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638542.318199] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638542.318319] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.318324] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.318385] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.319437] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.319447] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638542.319931] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x328d0a0: created RC QP 0x20cf on mlx5_ib3:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638542.320328] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[16]=0x328d0a0 using rc_verbs/mlx5_ib3:1 on worker 0x1d084d0 | |
[1650638542.320551] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.320557] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.320897] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.320905] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.321788] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638542.333137] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.333152] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.333155] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.333260] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.333945] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x35ae010 of 8176 bytes with 127 elements | |
[1650638542.334175] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.334184] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.334275] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib3 length=2048) failed: Invalid argument | |
[1650638542.334279] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.334290] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x1451c70 [id=99 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.334324] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 99 events 0x1 mode thread_spinlock | |
[1650638542.334348] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[17]=0x33da030 using rc_mlx5/mlx5_ib3:1 on worker 0x1d084d0 | |
[1650638542.334762] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.334769] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.335406] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.335414] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.335841] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638542.344977] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.344996] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.344999] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.345056] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.345519] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638542.345531] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.345570] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib3 length=2048) failed: Invalid argument | |
[1650638542.345574] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.345583] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x306faa0 [id=101 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.345611] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 101 events 0x1 mode thread_spinlock | |
[1650638542.346275] [ndv4:68756:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638542.355552] [ndv4:68756:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x35b0050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x20fa | |
[1650638542.355812] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.355819] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.355828] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2b13a96d9008 of 151544 bytes with 1052 elements | |
[1650638542.359873] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a7000000..0x2b13a9600000 on mlx5_ib3 lkey 0x80400 rkey 0x80400 access 0xf flags 0x3e4 | |
[1650638542.359895] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2b13a7000018 of 39845864 bytes with 4752 elements | |
[1650638542.360034] [ndv4:68756:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x35b0050 | |
[1650638542.360070] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[18]=0x35b0050 using dc_mlx5/mlx5_ib3:1 on worker 0x1d084d0 | |
[1650638542.360507] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.360519] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.360765] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.360771] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.368445] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638542.379290] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.379636] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x378c060: created UD QP 0x2108 on mlx5_ib3:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.380339] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.381154] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.381162] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.381285] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.381291] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.381832] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a96fe000..0x2b13a9783000 on mlx5_ib3 lkey 0x80500 rkey 0x80500 access 0xf flags 0x3e4 | |
[1650638542.381839] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13a96fe018 of 544744 bytes with 128 elements | |
[1650638542.381844] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.382520] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x378c060: adding gid fe80::15:5dff:fd34:1e to hash on device mlx5_ib3 port 1 index 0) | |
[1650638542.403796] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x378c060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 1) | |
[1650638542.404704] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x378c060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 2) | |
[1650638542.406464] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x378c060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 3) | |
[1650638542.406642] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x378c060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 4) | |
[1650638542.407418] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x378c060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 5) | |
[1650638542.424963] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x378c060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 6) | |
[1650638542.425320] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x378c060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 7) | |
[1650638542.425710] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.425720] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x33e2f50 [id=102 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.425756] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 102 events 0x5 mode thread_spinlock | |
[1650638542.425767] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[19]=0x378c060 using ud_verbs/mlx5_ib3:1 on worker 0x1d084d0 | |
[1650638542.425905] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.425912] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.426196] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.426201] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.427059] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638542.428322] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.428931] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x38aa050: created UD QP 0x2109 on mlx5_ib3:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.428941] [ndv4:68756:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638542.429942] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.430091] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.430097] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.430193] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638542.430198] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638542.430846] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a9783000..0x2b13a9808000 on mlx5_ib3 lkey 0x80600 rkey 0x80600 access 0xf flags 0x3e4 | |
[1650638542.430854] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13a9783018 of 544744 bytes with 128 elements | |
[1650638542.430859] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.431464] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x38aa050: adding gid fe80::15:5dff:fd34:1e to hash on device mlx5_ib3 port 1 index 0) | |
[1650638542.440778] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x38aa050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 1) | |
[1650638542.441295] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x38aa050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 2) | |
[1650638542.442015] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x38aa050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 3) | |
[1650638542.442300] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x38aa050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 4) | |
[1650638542.443418] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x38aa050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 5) | |
[1650638542.451493] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x38aa050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 6) | |
[1650638542.466187] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x38aa050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 7) | |
[1650638542.466197] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.466259] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.466265] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x2959ad0 [id=103 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.466300] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 103 events 0x5 mode thread_spinlock | |
[1650638542.466311] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[20]=0x38aa050 using ud_mlx5/mlx5_ib3:1 on worker 0x1d084d0 | |
[1650638542.466721] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.466727] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.473973] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.473996] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.481427] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638542.483576] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638542.483593] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.483597] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.483650] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.484688] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.484699] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638542.485294] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x39aa0a0: created RC QP 0x2119 on mlx5_ib4:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638542.485769] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[21]=0x39aa0a0 using rc_verbs/mlx5_ib4:1 on worker 0x1d084d0 | |
[1650638542.486342] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.486348] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.492335] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.492346] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.493752] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638542.496120] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.496134] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.496138] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.496188] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.496966] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x3ccb010 of 8176 bytes with 127 elements | |
[1650638542.497243] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.497251] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.497297] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib4 length=2048) failed: Invalid argument | |
[1650638542.497301] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.497312] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x33e2d20 [id=106 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.497349] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 106 events 0x1 mode thread_spinlock | |
[1650638542.497362] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[22]=0x3af7030 using rc_mlx5/mlx5_ib4:1 on worker 0x1d084d0 | |
[1650638542.497376] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.497382] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.497707] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.497724] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.504953] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638542.506802] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.506825] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.506828] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.506885] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.507310] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638542.507319] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.507358] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib4 length=2048) failed: Invalid argument | |
[1650638542.507362] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.507373] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x3affe80 [id=108 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.507402] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 108 events 0x1 mode thread_spinlock | |
[1650638542.508021] [ndv4:68756:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638542.518492] [ndv4:68756:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x3ccd050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x213e | |
[1650638542.519702] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.519713] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.519723] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2b13ac00a008 of 151544 bytes with 1052 elements | |
[1650638542.523876] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13a9a00000..0x2b13ac000000 on mlx5_ib4 lkey 0x80400 rkey 0x80400 access 0xf flags 0x3e4 | |
[1650638542.523901] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2b13a9a00018 of 39845864 bytes with 4752 elements | |
[1650638542.524041] [ndv4:68756:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x3ccd050 | |
[1650638542.524078] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[23]=0x3ccd050 using dc_mlx5/mlx5_ib4:1 on worker 0x1d084d0 | |
[1650638542.524200] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.524267] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.524568] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.524573] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.525534] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638542.535811] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.536143] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x3ea9060: created UD QP 0x215c on mlx5_ib4:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.536856] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.545547] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.545556] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.545604] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.545617] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.546166] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13ac02f000..0x2b13ac0b4000 on mlx5_ib4 lkey 0x80500 rkey 0x80500 access 0xf flags 0x3e4 | |
[1650638542.546178] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13ac02f018 of 544744 bytes with 128 elements | |
[1650638542.546183] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.547464] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ea9060: adding gid fe80::15:5dff:fd34:1f to hash on device mlx5_ib4 port 1 index 0) | |
[1650638542.548974] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ea9060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 1) | |
[1650638542.549541] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ea9060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 2) | |
[1650638542.550106] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ea9060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 3) | |
[1650638542.550511] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ea9060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 4) | |
[1650638542.551034] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ea9060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 5) | |
[1650638542.551307] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ea9060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 6) | |
[1650638542.551522] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ea9060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 7) | |
[1650638542.551877] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.551887] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x3985e10 [id=109 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.551902] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 109 events 0x5 mode thread_spinlock | |
[1650638542.551914] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[24]=0x3ea9060 using ud_verbs/mlx5_ib4:1 on worker 0x1d084d0 | |
[1650638542.552065] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.552072] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.552455] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.552462] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.552884] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638542.554348] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.554642] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x3fc7460: created UD QP 0x2176 on mlx5_ib4:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.554653] [ndv4:68756:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638542.555513] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.565438] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.565450] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.565696] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638542.565701] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638542.567303] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13ac0b4000..0x2b13ac139000 on mlx5_ib4 lkey 0x80600 rkey 0x80600 access 0xf flags 0x3e4 | |
[1650638542.567312] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13ac0b4018 of 544744 bytes with 128 elements | |
[1650638542.567317] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.568445] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3fc7460: adding gid fe80::15:5dff:fd34:1f to hash on device mlx5_ib4 port 1 index 0) | |
[1650638542.576343] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3fc7460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 1) | |
[1650638542.577303] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3fc7460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 2) | |
[1650638542.578302] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3fc7460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 3) | |
[1650638542.591343] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3fc7460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 4) | |
[1650638542.591634] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3fc7460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 5) | |
[1650638542.592623] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3fc7460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 6) | |
[1650638542.602050] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3fc7460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 7) | |
[1650638542.602060] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.602100] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.602105] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x3fc7fb0 [id=110 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.602149] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 110 events 0x5 mode thread_spinlock | |
[1650638542.602159] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[25]=0x3fc7460 using ud_mlx5/mlx5_ib4:1 on worker 0x1d084d0 | |
[1650638542.602438] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.602448] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.602571] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.602577] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.604056] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638542.606339] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638542.606356] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.606359] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.606413] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.607407] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.607413] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638542.607943] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x40c70a0: created RC QP 0x1ccb on mlx5_ib5:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638542.608330] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[26]=0x40c70a0 using rc_verbs/mlx5_ib5:1 on worker 0x1d084d0 | |
[1650638542.608493] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.608500] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.608671] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.608676] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.609009] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638542.610124] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.610138] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.610141] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.610189] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.610824] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x43e8010 of 8176 bytes with 127 elements | |
[1650638542.611094] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.611102] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.611137] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib5 length=2048) failed: Invalid argument | |
[1650638542.611140] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.611151] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x40cff50 [id=113 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.611172] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 113 events 0x1 mode thread_spinlock | |
[1650638542.611180] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[27]=0x4214030 using rc_mlx5/mlx5_ib5:1 on worker 0x1d084d0 | |
[1650638542.611336] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.611347] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.611521] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.611527] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.624650] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638542.626187] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.626205] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.626291] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.626348] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.626831] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638542.626838] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.626872] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib5 length=2048) failed: Invalid argument | |
[1650638542.626876] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.626883] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x43f6fc0 [id=115 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.626907] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 115 events 0x1 mode thread_spinlock | |
[1650638542.627550] [ndv4:68756:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638542.638291] [ndv4:68756:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x43ea050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x1cee | |
[1650638542.638705] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.638714] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.638726] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2b13ae93c008 of 151544 bytes with 1052 elements | |
[1650638542.643153] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13ac200000..0x2b13ae800000 on mlx5_ib5 lkey 0x80400 rkey 0x80400 access 0xf flags 0x3e4 | |
[1650638542.643180] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2b13ac200018 of 39845864 bytes with 4752 elements | |
[1650638542.643363] [ndv4:68756:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x43ea050 | |
[1650638542.643397] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[28]=0x43ea050 using dc_mlx5/mlx5_ib5:1 on worker 0x1d084d0 | |
[1650638542.643423] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.643435] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.643664] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.643669] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.644374] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638542.645720] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.646131] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x31ba3f0: created UD QP 0x1cd4 on mlx5_ib5:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.646862] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.646942] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.646950] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.647365] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.647382] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.647854] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13ae961000..0x2b13ae9e6000 on mlx5_ib5 lkey 0x80500 rkey 0x80500 access 0xf flags 0x3e4 | |
[1650638542.647861] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13ae961018 of 544744 bytes with 128 elements | |
[1650638542.647866] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.647913] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x31ba3f0: adding gid fe80::15:5dff:fd34:20 to hash on device mlx5_ib5 port 1 index 0) | |
[1650638542.648755] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x31ba3f0: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 1) | |
[1650638542.657111] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x31ba3f0: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 2) | |
[1650638542.657198] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x31ba3f0: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 3) | |
[1650638542.658364] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x31ba3f0: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 4) | |
[1650638542.659676] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x31ba3f0: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 5) | |
[1650638542.660016] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x31ba3f0: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 6) | |
[1650638542.669389] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x31ba3f0: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 7) | |
[1650638542.669752] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.669764] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x2b4b720 [id=116 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.669796] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 116 events 0x5 mode thread_spinlock | |
[1650638542.669807] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[29]=0x31ba3f0 using ud_verbs/mlx5_ib5:1 on worker 0x1d084d0 | |
[1650638542.669822] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.669828] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.670137] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.670143] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.671296] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638542.673066] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638542.673492] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x46e4050: created UD QP 0x1cd5 on mlx5_ib5:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638542.673500] [ndv4:68756:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638542.674166] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638542.674655] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.674662] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.674783] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638542.674789] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638542.675184] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13ae9e6000..0x2b13aea6b000 on mlx5_ib5 lkey 0x80600 rkey 0x80600 access 0xf flags 0x3e4 | |
[1650638542.675191] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13ae9e6018 of 544744 bytes with 128 elements | |
[1650638542.675195] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638542.683541] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x46e4050: adding gid fe80::15:5dff:fd34:20 to hash on device mlx5_ib5 port 1 index 0) | |
[1650638542.683855] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x46e4050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 1) | |
[1650638542.684929] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x46e4050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 2) | |
[1650638542.685283] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x46e4050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 3) | |
[1650638542.686300] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x46e4050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 4) | |
[1650638542.695638] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x46e4050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 5) | |
[1650638542.696188] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x46e4050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 6) | |
[1650638542.697813] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x46e4050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 7) | |
[1650638542.697821] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.697858] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638542.697863] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x40a22e0 [id=117 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638542.697889] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 117 events 0x5 mode thread_spinlock | |
[1650638542.697901] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[30]=0x46e4050 using ud_mlx5/mlx5_ib5:1 on worker 0x1d084d0 | |
[1650638542.698013] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638542.698019] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638542.698304] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638542.698311] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638542.698489] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638542.700280] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638542.700300] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.700303] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.700354] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.701358] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.701369] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638542.701917] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x47e40a0: created RC QP 0x1c26 on mlx5_ib6:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638542.702304] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[31]=0x47e40a0 using rc_verbs/mlx5_ib6:1 on worker 0x1d084d0 | |
[1650638542.702457] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638542.702463] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638542.702590] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638542.702596] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638542.711871] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638542.713187] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.713201] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.713204] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.713290] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.713799] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x4b05010 of 8176 bytes with 127 elements | |
[1650638542.714053] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638542.714063] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.714098] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib6 length=2048) failed: Invalid argument | |
[1650638542.714102] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.714112] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x47bf2f0 [id=120 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.714138] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 120 events 0x1 mode thread_spinlock | |
[1650638542.714148] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[32]=0x4931030 using rc_mlx5/mlx5_ib6:1 on worker 0x1d084d0 | |
[1650638542.714295] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638542.714301] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638542.714486] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638542.714491] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638542.715153] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638542.716516] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638542.716534] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638542.716537] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638542.716592] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638542.716978] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638542.716986] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638542.717019] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib6 length=2048) failed: Invalid argument | |
[1650638542.717023] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638542.717029] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x4b13fc0 [id=122 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638542.717061] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 122 events 0x1 mode thread_spinlock | |
[1650638544.271600] [ndv4:68756:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638544.295584] [ndv4:68756:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x4b07050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x21ab | |
[1650638544.297628] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638544.298195] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638544.298301] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2b13b126e008 of 151544 bytes with 1052 elements | |
[1650638544.302339] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13aec00000..0x2b13b1200000 on mlx5_ib6 lkey 0x80400 rkey 0x80400 access 0xf flags 0x3e4 | |
[1650638544.302362] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2b13aec00018 of 39845864 bytes with 4752 elements | |
[1650638544.302501] [ndv4:68756:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x4b07050 | |
[1650638544.302533] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[33]=0x4b07050 using dc_mlx5/mlx5_ib6:1 on worker 0x1d084d0 | |
[1650638544.302840] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638544.303181] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638544.303675] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638544.303750] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638544.305410] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638544.306624] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638544.307151] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x4ce3060: created UD QP 0x21f8 on mlx5_ib6:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638544.307846] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638544.308440] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638544.308772] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638544.309068] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638544.309321] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638544.309810] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13b1293000..0x2b13b1318000 on mlx5_ib6 lkey 0x80500 rkey 0x80500 access 0xf flags 0x3e4 | |
[1650638544.309817] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13b1293018 of 544744 bytes with 128 elements | |
[1650638544.309822] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638544.310639] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x4ce3060: adding gid fe80::15:5dff:fd34:21 to hash on device mlx5_ib6 port 1 index 0) | |
[1650638544.311146] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x4ce3060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 1) | |
[1650638544.311679] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x4ce3060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 2) | |
[1650638544.311996] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x4ce3060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 3) | |
[1650638544.321086] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x4ce3060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 4) | |
[1650638544.321841] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x4ce3060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 5) | |
[1650638544.336504] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x4ce3060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 6) | |
[1650638544.336998] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x4ce3060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 7) | |
[1650638544.337620] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638544.337630] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x38d7530 [id=123 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638544.337664] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 123 events 0x5 mode thread_spinlock | |
[1650638544.337676] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[34]=0x4ce3060 using ud_verbs/mlx5_ib6:1 on worker 0x1d084d0 | |
[1650638544.338068] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638544.338348] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638544.338972] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638544.339494] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638544.339825] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638544.341074] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638544.341870] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x3ff47b0: created UD QP 0x2234 on mlx5_ib6:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638544.341878] [ndv4:68756:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638544.342652] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638544.350400] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638544.350645] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638544.358891] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638544.359164] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638544.359743] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13b1318000..0x2b13b139d000 on mlx5_ib6 lkey 0x80600 rkey 0x80600 access 0xf flags 0x3e4 | |
[1650638544.359751] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13b1318018 of 544744 bytes with 128 elements | |
[1650638544.359757] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638544.360038] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ff47b0: adding gid fe80::15:5dff:fd34:21 to hash on device mlx5_ib6 port 1 index 0) | |
[1650638544.360652] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ff47b0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 1) | |
[1650638544.367844] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ff47b0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 2) | |
[1650638544.368812] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ff47b0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 3) | |
[1650638544.368913] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ff47b0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 4) | |
[1650638544.369193] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ff47b0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 5) | |
[1650638544.370123] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ff47b0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 6) | |
[1650638544.380775] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x3ff47b0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 7) | |
[1650638544.380786] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638544.380888] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638544.380895] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x38d72e0 [id=124 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638544.380926] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 124 events 0x5 mode thread_spinlock | |
[1650638544.380948] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[35]=0x3ff47b0 using ud_mlx5/mlx5_ib6:1 on worker 0x1d084d0 | |
[1650638544.388181] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638544.388281] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638544.388553] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638544.389290] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638544.389905] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638544.391627] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638544.391644] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638544.391647] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638544.391696] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638544.392885] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638544.392891] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638544.393431] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x4f010a0: created RC QP 0x21c1 on mlx5_ib7:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638544.394733] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[36]=0x4f010a0 using rc_verbs/mlx5_ib7:1 on worker 0x1d084d0 | |
[1650638544.403722] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638544.403760] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638544.404299] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638544.404523] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638544.405267] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638544.416590] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638544.416606] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638544.416609] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638544.416658] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638544.417281] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x5222010 of 8176 bytes with 127 elements | |
[1650638544.417546] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638544.417553] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638544.417592] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib7 length=2048) failed: Invalid argument | |
[1650638544.417596] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638544.417607] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x4e01830 [id=127 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638544.417635] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 127 events 0x1 mode thread_spinlock | |
[1650638544.417646] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[37]=0x504e030 using rc_mlx5/mlx5_ib7:1 on worker 0x1d084d0 | |
[1650638544.417868] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638544.418343] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638544.418800] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638544.419199] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638544.744299] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638544.746379] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638544.746414] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638544.746418] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638544.746478] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638544.747134] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638544.747159] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638544.747256] [ndv4:68756:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib7 length=2048) failed: Invalid argument | |
[1650638544.747260] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638544.747276] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x4edc6c0 [id=129 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638544.747299] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 129 events 0x1 mode thread_spinlock | |
[1650638544.753840] [ndv4:69188:0] debug.c:1198 UCX DEBUG using signal stack 0x2ae4caddb000 size 141824 | |
[1650638544.754475] [ndv4:69188:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638544.754498] [ndv4:69188:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2ae4cac2f000 | |
[1650638544.755638] [ndv4:69188:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638544.755651] [ndv4:69188:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638544.755658] [ndv4:69188:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638544.771042] [ndv4:69188:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638544.771071] [ndv4:69188:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638544.771117] [ndv4:69188:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638544.771121] [ndv4:69188:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638544.771129] [ndv4:69188:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638544.771137] [ndv4:69188:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638544.771140] [ndv4:69188:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638544.771146] [ndv4:69188:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638544.771149] [ndv4:69188:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638544.771151] [ndv4:69188:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638544.771153] [ndv4:69188:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638544.771155] [ndv4:69188:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638544.771169] [ndv4:69188:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638544.799847] [ndv4:69188:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638544.802391] [ndv4:69188:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638544.802417] [ndv4:69188:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638544.802984] [ndv4:69188:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638544.802997] [ndv4:69188:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638544.803007] [ndv4:69188:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638544.803018] [ndv4:69188:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638544.803028] [ndv4:69188:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638544.803039] [ndv4:69188:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638544.803050] [ndv4:69188:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638544.803091] [ndv4:69188:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638544.808704] [ndv4:69188:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638544.831114] [ndv4:69188:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638544.832428] [ndv4:69188:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638544.844010] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc4b340 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638544.844116] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638544.844652] [ndv4:69188:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638544.850581] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638544.851328] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638544.851358] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638544.851428] [ndv4:69188:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638544.851451] [ndv4:69188:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638544.864571] [ndv4:69188:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638544.864589] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638544.864861] [ndv4:69188:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638544.865337] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638544.865457] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638544.876244] [ndv4:69188:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638544.876250] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638544.881982] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638544.881989] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638544.888987] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638544.889003] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638544.897270] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638544.897275] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638544.901963] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638544.901969] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638544.904237] [ndv4:69188:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638544.924590] [ndv4:69188:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638544.925059] [ndv4:69188:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638544.927852] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc553e0 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638544.927880] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638544.927883] [ndv4:69188:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638544.928192] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638544.928251] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638544.928256] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638544.928299] [ndv4:69188:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638544.930446] [ndv4:69188:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638544.930452] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638544.930721] [ndv4:69188:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638544.930955] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638544.931054] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638544.947557] [ndv4:69188:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638544.947564] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638544.955998] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638544.956004] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638544.963780] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638544.963786] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638544.978201] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638544.978237] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638544.984559] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638544.984565] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638544.987870] [ndv4:69188:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638545.013591] [ndv4:69188:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638545.014078] [ndv4:69188:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638545.015904] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc270c0 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638545.015933] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638545.015937] [ndv4:69188:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638545.016346] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.016864] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.016869] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638545.016913] [ndv4:69188:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638545.018661] [ndv4:69188:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638545.018669] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638545.018972] [ndv4:69188:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638545.019624] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.019699] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.045191] [ndv4:69188:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638545.045198] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638545.060407] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638545.060416] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638545.086354] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638545.086364] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638545.111935] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638545.111946] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638545.118193] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638545.118200] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638545.120265] [ndv4:69188:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638545.147526] [ndv4:69188:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638545.148189] [ndv4:69188:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638545.149863] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1336070 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638545.149889] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638545.149892] [ndv4:69188:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638545.150683] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.151007] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.151013] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638545.151057] [ndv4:69188:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638545.153271] [ndv4:69188:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638545.153278] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638545.153577] [ndv4:69188:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638545.154024] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.154287] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.163013] [ndv4:69188:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638545.163020] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638545.172546] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638545.172554] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638545.179134] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638545.179141] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638545.186042] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638545.186048] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638545.196284] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638545.196291] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638545.196639] [ndv4:69188:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638545.225416] [ndv4:69188:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib4 vendor_id: 0x15b3 device_id: 4124 | |
[1650638545.225976] [ndv4:69188:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib4: disable ODP because it's not supported for DevX QP | |
[1650638545.228567] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc534c0 [id=65 ref 1] uct_ib_async_event_handler() to hash | |
[1650638545.228593] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 65 events 0x1 mode thread_spinlock | |
[1650638545.228597] [ndv4:69188:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib4' (InfiniBand channel adapter) with 1 ports | |
[1650638545.229094] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638545.229527] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638545.229536] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638545.229578] [ndv4:69188:0] ib_md.c:1319 UCX DEBUG mlx5_ib4: using registration cache | |
[1650638545.231260] [ndv4:69188:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638545.231267] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638545.231603] [ndv4:69188:0] ib_md.c:1604 UCX DEBUG mlx5_ib4: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638545.232447] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638545.233083] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638545.238936] [ndv4:69188:0] topo.c:99 UCX DEBUG bus id 0x105000000 doesn't exist. sys_dev = 4 | |
[1650638545.238943] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638545.257856] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638545.257865] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638545.285554] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638545.285563] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638545.301876] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638545.301885] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638545.309508] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638545.309515] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638545.313507] [ndv4:69188:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638545.368979] [ndv4:69188:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib5 vendor_id: 0x15b3 device_id: 4124 | |
[1650638545.369642] [ndv4:69188:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib5: disable ODP because it's not supported for DevX QP | |
[1650638545.371964] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc4d780 [id=67 ref 1] uct_ib_async_event_handler() to hash | |
[1650638545.371995] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 67 events 0x1 mode thread_spinlock | |
[1650638545.371999] [ndv4:69188:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib5' (InfiniBand channel adapter) with 1 ports | |
[1650638545.372728] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638545.372798] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638545.372803] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638545.372847] [ndv4:69188:0] ib_md.c:1319 UCX DEBUG mlx5_ib5: using registration cache | |
[1650638545.374875] [ndv4:69188:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638545.374882] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638545.375185] [ndv4:69188:0] ib_md.c:1604 UCX DEBUG mlx5_ib5: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638545.375234] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638545.375239] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638545.383967] [ndv4:69188:0] topo.c:99 UCX DEBUG bus id 0x106000000 doesn't exist. sys_dev = 5 | |
[1650638545.383975] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638545.390525] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638545.390532] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638545.406364] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638545.406371] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638545.412415] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638545.412421] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638545.429510] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638545.429518] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638545.432088] [ndv4:69188:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638545.441744] [ndv4:69188:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib6 vendor_id: 0x15b3 device_id: 4124 | |
[1650638545.442157] [ndv4:69188:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib6: disable ODP because it's not supported for DevX QP | |
[1650638545.453091] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13376d0 [id=69 ref 1] uct_ib_async_event_handler() to hash | |
[1650638545.453121] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 69 events 0x1 mode thread_spinlock | |
[1650638545.453125] [ndv4:69188:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib6' (InfiniBand channel adapter) with 1 ports | |
[1650638545.453686] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638545.454163] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638545.454168] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638545.454287] [ndv4:69188:0] ib_md.c:1319 UCX DEBUG mlx5_ib6: using registration cache | |
[1650638545.455870] [ndv4:69188:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638545.455891] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638545.456202] [ndv4:69188:0] ib_md.c:1604 UCX DEBUG mlx5_ib6: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638545.457504] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638545.457642] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638545.470284] [ndv4:69188:0] topo.c:99 UCX DEBUG bus id 0x107000000 doesn't exist. sys_dev = 6 | |
[1650638545.470294] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638545.483110] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638545.483119] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638545.493286] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638545.493293] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638545.504532] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638545.504539] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638545.515904] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638545.515912] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638545.516294] [ndv4:69188:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638545.537748] [ndv4:69188:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib7 vendor_id: 0x15b3 device_id: 4124 | |
[1650638545.538246] [ndv4:69188:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib7: disable ODP because it's not supported for DevX QP | |
[1650638545.539067] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc41a90 [id=71 ref 1] uct_ib_async_event_handler() to hash | |
[1650638545.539095] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 71 events 0x1 mode thread_spinlock | |
[1650638545.539098] [ndv4:69188:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib7' (InfiniBand channel adapter) with 1 ports | |
[1650638545.539415] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638545.539426] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638545.539432] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638545.539486] [ndv4:69188:0] ib_md.c:1319 UCX DEBUG mlx5_ib7: using registration cache | |
[1650638545.540951] [ndv4:69188:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638545.540958] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638545.541164] [ndv4:69188:0] ib_md.c:1604 UCX DEBUG mlx5_ib7: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638545.541356] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638545.541362] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638545.550597] [ndv4:69188:0] topo.c:99 UCX DEBUG bus id 0x108000000 doesn't exist. sys_dev = 7 | |
[1650638545.550603] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638545.562827] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638545.562834] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638545.573036] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638545.573043] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638545.583123] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638545.583130] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638545.612110] [ndv4:69188:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638545.612118] [ndv4:69188:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638545.612268] [ndv4:69188:0] ucp_context.c:1556 UCX DEBUG created ucp context 0xc391b0 0xc391b0 [10 mds 42 tls] features 0x1 tl bitmap 0x3ffffffffff 0x0 | |
[1650638545.671264] [ndv4:69188:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 8447 bytes with hugetlb | |
[1650638545.671364] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8368 | |
[1650638545.671390] [ndv4:69188:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 4292720 bytes with hugetlb | |
[1650638545.671400] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool mm_recv_desc: allocated chunk 0x2ae4d2454018 of 4296680 bytes with 512 elements | |
[1650638545.671955] [ndv4:69188:0] mm_iface.c:600 UCX DEBUG created mm iface 0x13b51e0 FIFO id 0xd0037 va 0x2ae4cbc81000 size 12288 (128 x 64 elems) | |
[1650638545.672003] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[0]=0x13b51e0 using sysv/memory on worker 0x1c8f6f0 | |
[1650638545.672195] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.672285] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.672354] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.672361] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.680944] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638545.683311] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638545.683350] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.683354] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.683407] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.684647] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638545.684656] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638545.686329] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x13c0480: created RC QP 0x30d4 on mlx5_ib0:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638545.688722] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[1]=0x13c0480 using rc_verbs/mlx5_ib0:1 on worker 0x1c8f6f0 | |
[1650638545.688797] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.688803] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.688905] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.688910] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.689315] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638545.689610] [ndv4:69188:0] ib_device.c:1394 UCX DEBUG max IB CQE size is 128 | |
[1650638545.690693] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638545.690702] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.690704] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.690754] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.691336] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x1fb6010 of 8176 bytes with 127 elements | |
[1650638545.691630] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638545.691653] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.691703] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638545.691707] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638545.700936] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x131c590 [id=78 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638545.700969] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 78 events 0x1 mode thread_spinlock | |
[1650638545.701621] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[2]=0x13ce010 using rc_mlx5/mlx5_ib0:1 on worker 0x1c8f6f0 | |
[1650638545.701791] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.701797] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.701973] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.701978] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.711285] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638545.721002] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638545.721013] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.721015] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.721029] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.721498] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638545.721507] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.721540] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638545.721543] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638545.721550] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x131c8a0 [id=80 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638545.721571] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 80 events 0x1 mode thread_spinlock | |
[1650638545.722386] [ndv4:69188:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638545.730066] [ndv4:69188:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x1fb8050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x30f7 | |
[1650638545.730375] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.730381] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.730400] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ae4cbc86008 of 151544 bytes with 1052 elements | |
[1650638545.734388] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4d2a00000..0x2ae4d5000000 on mlx5_ib0 lkey 0x80b00 rkey 0x80b00 access 0xf flags 0x3e4 | |
[1650638545.734413] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ae4d2a00018 of 39845864 bytes with 4752 elements | |
[1650638545.734552] [ndv4:69188:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x1fb8050 | |
[1650638545.734583] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[3]=0x1fb8050 using dc_mlx5/mlx5_ib0:1 on worker 0x1c8f6f0 | |
[1650638545.734665] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.734673] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.734794] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.734798] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.735188] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638545.736048] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638545.736444] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x13cb650: created UD QP 0x30dd on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638545.737021] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638545.737103] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.737110] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.737140] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.737145] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.737569] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4cbcab000..0x2ae4cbd30000 on mlx5_ib0 lkey 0x80c00 rkey 0x80c00 access 0xf flags 0x3e4 | |
[1650638545.737577] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4cbcab018 of 544744 bytes with 128 elements | |
[1650638545.737581] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638545.737775] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13cb650: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638545.738075] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13cb650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638545.738297] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13cb650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638545.748820] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13cb650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638545.748942] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13cb650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638545.748958] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13cb650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638545.749194] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13cb650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638545.749334] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13cb650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638545.749652] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638545.757568] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13bcd40 [id=81 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638545.757603] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 81 events 0x5 mode thread_spinlock | |
[1650638545.758144] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[4]=0x13cb650 using ud_verbs/mlx5_ib0:1 on worker 0x1c8f6f0 | |
[1650638545.758380] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.758389] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.758607] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.758613] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.759121] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638545.760038] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638545.760329] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x1caf570: created UD QP 0x30de on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638545.760348] [ndv4:69188:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638545.760893] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638545.768528] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.768538] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.768619] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.768624] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.769057] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4cbd30000..0x2ae4cbdb5000 on mlx5_ib0 lkey 0x80d00 rkey 0x80d00 access 0xf flags 0x3e4 | |
[1650638545.769064] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4cbd30018 of 544744 bytes with 128 elements | |
[1650638545.769068] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638545.769392] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x1caf570: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638545.769523] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x1caf570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638545.769650] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x1caf570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638545.769851] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x1caf570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638545.770074] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x1caf570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638545.770426] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x1caf570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638545.770774] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x1caf570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638545.770969] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x1caf570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638545.770977] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.771006] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638545.771012] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1322b60 [id=82 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638545.771035] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 82 events 0x5 mode thread_spinlock | |
[1650638545.771048] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[5]=0x1caf570 using ud_mlx5/mlx5_ib0:1 on worker 0x1c8f6f0 | |
[1650638545.771306] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.771312] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.771421] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.771428] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.771994] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638545.784988] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638545.784997] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.784999] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.785013] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.786168] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638545.786174] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638545.787604] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x255f050: created RC QP 0x302a on mlx5_ib1:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638545.788655] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[6]=0x255f050 using rc_verbs/mlx5_ib1:1 on worker 0x1c8f6f0 | |
[1650638545.788766] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.788772] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.789095] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.789100] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.789914] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638545.791300] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638545.791310] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.791313] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.791326] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.791823] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x24b8010 of 8176 bytes with 127 elements | |
[1650638545.792084] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638545.792091] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.792126] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638545.792130] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638545.792142] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1325ba0 [id=85 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638545.792161] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 85 events 0x1 mode thread_spinlock | |
[1650638545.792170] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[7]=0x23e5020 using rc_mlx5/mlx5_ib1:1 on worker 0x1c8f6f0 | |
[1650638545.792259] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.792266] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.792361] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.792366] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.792586] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638545.793628] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638545.793644] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.793647] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.793702] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.794064] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638545.794071] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.794103] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638545.794107] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638545.794115] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13ba760 [id=87 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638545.794135] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 87 events 0x1 mode thread_spinlock | |
[1650638545.794822] [ndv4:69188:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638545.802747] [ndv4:69188:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x2704010: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x304d | |
[1650638545.803289] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.803295] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.803303] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ae4cbdb7008 of 151544 bytes with 1052 elements | |
[1650638545.807320] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4d5200000..0x2ae4d7800000 on mlx5_ib1 lkey 0x80700 rkey 0x80700 access 0xf flags 0x3e4 | |
[1650638545.807342] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ae4d5200018 of 39845864 bytes with 4752 elements | |
[1650638545.807477] [ndv4:69188:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x2704010 | |
[1650638545.807512] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[8]=0x2704010 using dc_mlx5/mlx5_ib1:1 on worker 0x1c8f6f0 | |
[1650638545.808007] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.808015] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.808090] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.808094] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.809196] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638545.810408] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638545.810790] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x13d6be0: created UD QP 0x3033 on mlx5_ib1:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638545.811370] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638545.811858] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.811865] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.812071] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.812075] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.812580] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4cbddc000..0x2ae4cbe61000 on mlx5_ib1 lkey 0x80800 rkey 0x80800 access 0xf flags 0x3e4 | |
[1650638545.812587] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4cbddc018 of 544744 bytes with 128 elements | |
[1650638545.812591] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638545.812921] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13d6be0: adding gid fe80::15:5dff:fd34:1c to hash on device mlx5_ib1 port 1 index 0) | |
[1650638545.813546] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13d6be0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 1) | |
[1650638545.813966] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13d6be0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 2) | |
[1650638545.815259] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13d6be0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 3) | |
[1650638545.815472] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13d6be0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 4) | |
[1650638545.815606] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13d6be0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 5) | |
[1650638545.815621] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13d6be0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 6) | |
[1650638545.823696] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13d6be0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 7) | |
[1650638545.824070] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638545.824078] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13d8010 [id=88 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638545.824105] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 88 events 0x5 mode thread_spinlock | |
[1650638545.824115] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[9]=0x13d6be0 using ud_verbs/mlx5_ib1:1 on worker 0x1c8f6f0 | |
[1650638545.824325] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.824331] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.824447] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.824451] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.825051] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638545.826102] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638545.826433] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x24ba050: created UD QP 0x3034 on mlx5_ib1:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638545.826441] [ndv4:69188:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638545.826953] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638545.827036] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.827041] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.827100] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.827105] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.827510] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4cbe61000..0x2ae4cbee6000 on mlx5_ib1 lkey 0x80900 rkey 0x80900 access 0xf flags 0x3e4 | |
[1650638545.827516] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4cbe61018 of 544744 bytes with 128 elements | |
[1650638545.827521] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638545.827561] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x24ba050: adding gid fe80::15:5dff:fd34:1c to hash on device mlx5_ib1 port 1 index 0) | |
[1650638545.827586] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x24ba050: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 1) | |
[1650638545.827606] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x24ba050: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 2) | |
[1650638545.827621] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x24ba050: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 3) | |
[1650638545.827643] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x24ba050: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 4) | |
[1650638545.827661] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x24ba050: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 5) | |
[1650638545.827685] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x24ba050: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 6) | |
[1650638545.827703] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x24ba050: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 7) | |
[1650638545.827709] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.827738] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638545.827743] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x23edc40 [id=89 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638545.827763] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 89 events 0x5 mode thread_spinlock | |
[1650638545.827773] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[10]=0x24ba050 using ud_mlx5/mlx5_ib1:1 on worker 0x1c8f6f0 | |
[1650638545.827786] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.827791] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.827852] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.827856] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.828116] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638545.829770] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638545.829787] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.829790] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.829838] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.831103] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638545.831109] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638545.832469] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x2af7020: created RC QP 0x3036 on mlx5_ib2:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638545.833500] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[11]=0x2af7020 using rc_verbs/mlx5_ib2:1 on worker 0x1c8f6f0 | |
[1650638545.833642] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.833648] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.833865] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.833870] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.834503] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638545.836172] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638545.836185] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.836188] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.836284] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.836802] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x2e18010 of 8176 bytes with 127 elements | |
[1650638545.837058] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638545.837065] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.837099] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib2 length=2048) failed: Invalid argument | |
[1650638545.837110] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638545.837122] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x2570350 [id=92 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638545.837144] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 92 events 0x1 mode thread_spinlock | |
[1650638545.837154] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[12]=0x2c44030 using rc_mlx5/mlx5_ib2:1 on worker 0x1c8f6f0 | |
[1650638545.837333] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.837339] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.845509] [ndv4:69197:0] debug.c:1198 UCX DEBUG using signal stack 0x2afd7ddfc000 size 141824 | |
[1650638545.845599] [ndv4:69197:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638545.845619] [ndv4:69197:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2afd7dc50000 | |
[1650638545.845637] [ndv4:69197:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638545.845646] [ndv4:69197:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638545.845652] [ndv4:69197:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638545.847905] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.847917] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.848301] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638545.848419] [ndv4:69197:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638545.848441] [ndv4:69197:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638545.848480] [ndv4:69197:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638545.848483] [ndv4:69197:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638545.848490] [ndv4:69197:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638545.848497] [ndv4:69197:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638545.848500] [ndv4:69197:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638545.848504] [ndv4:69197:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638545.848506] [ndv4:69197:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638545.848508] [ndv4:69197:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638545.848511] [ndv4:69197:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638545.848513] [ndv4:69197:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638545.848522] [ndv4:69197:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638545.850013] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638545.850038] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.850042] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.850105] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.850671] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638545.850737] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.850776] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib2 length=2048) failed: Invalid argument | |
[1650638545.850780] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638545.850794] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x2c4cfa0 [id=94 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638545.850818] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 94 events 0x1 mode thread_spinlock | |
[1650638545.851523] [ndv4:69188:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638545.854491] [ndv4:69197:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638545.854837] [ndv4:69197:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638545.854859] [ndv4:69197:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638545.855117] [ndv4:69197:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638545.855132] [ndv4:69197:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638545.855143] [ndv4:69197:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638545.855153] [ndv4:69197:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638545.855163] [ndv4:69197:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638545.855173] [ndv4:69197:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638545.855184] [ndv4:69197:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638545.855291] [ndv4:69197:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638545.855657] [ndv4:69197:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638545.859155] [ndv4:69188:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x2e1a050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3059 | |
[1650638545.859454] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.859462] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.859476] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ae4cbee8008 of 151544 bytes with 1052 elements | |
[1650638545.864391] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4d7a00000..0x2ae4da000000 on mlx5_ib2 lkey 0x80700 rkey 0x80700 access 0xf flags 0x3e4 | |
[1650638545.864421] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ae4d7a00018 of 39845864 bytes with 4752 elements | |
[1650638545.864564] [ndv4:69188:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x2e1a050 | |
[1650638545.864629] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[13]=0x2e1a050 using dc_mlx5/mlx5_ib2:1 on worker 0x1c8f6f0 | |
[1650638545.864660] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.864671] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.864767] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.864772] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.865302] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638545.875777] [ndv4:69197:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638545.876175] [ndv4:69197:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638545.876325] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638545.876712] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x13ca770: created UD QP 0x303f on mlx5_ib2:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638545.878039] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638545.878414] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.878420] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.878527] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.878531] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.879033] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4cbf0d000..0x2ae4cbf92000 on mlx5_ib2 lkey 0x80800 rkey 0x80800 access 0xf flags 0x3e4 | |
[1650638545.879040] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4cbf0d018 of 544744 bytes with 128 elements | |
[1650638545.879045] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638545.879401] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13ca770: adding gid fe80::15:5dff:fd34:1d to hash on device mlx5_ib2 port 1 index 0) | |
[1650638545.879759] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13ca770: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 1) | |
[1650638545.880252] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13ca770: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 2) | |
[1650638545.884083] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2029320 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638545.884200] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638545.884370] [ndv4:69197:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638545.885164] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.885177] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.885222] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638545.885288] [ndv4:69197:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638545.885310] [ndv4:69197:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638545.889153] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13ca770: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 3) | |
[1650638545.889751] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13ca770: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 4) | |
[1650638545.890107] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13ca770: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 5) | |
[1650638545.890401] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13ca770: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 6) | |
[1650638545.897868] [ndv4:69197:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638545.897885] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638545.898154] [ndv4:69197:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638545.898308] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638545.898314] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638545.905923] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13ca770: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 7) | |
[1650638545.906323] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638545.906333] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13b6700 [id=95 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638545.906362] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 95 events 0x5 mode thread_spinlock | |
[1650638545.906376] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[14]=0x13ca770 using ud_verbs/mlx5_ib2:1 on worker 0x1c8f6f0 | |
[1650638545.906510] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.906516] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.906608] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.906612] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.906999] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638545.907959] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638545.908347] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x13b3590: created UD QP 0x3040 on mlx5_ib2:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638545.908356] [ndv4:69188:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638545.908939] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638545.909356] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.909362] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.909498] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638545.909503] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638545.909980] [ndv4:69197:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638545.909988] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638545.909885] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4da070000..0x2ae4da0f5000 on mlx5_ib2 lkey 0x80900 rkey 0x80900 access 0xf flags 0x3e4 | |
[1650638545.909891] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4da070018 of 544744 bytes with 128 elements | |
[1650638545.909895] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638545.910621] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13b3590: adding gid fe80::15:5dff:fd34:1d to hash on device mlx5_ib2 port 1 index 0) | |
[1650638545.911579] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13b3590: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 1) | |
[1650638545.911997] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13b3590: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 2) | |
[1650638545.912279] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13b3590: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 3) | |
[1650638545.912723] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13b3590: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 4) | |
[1650638545.912995] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13b3590: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 5) | |
[1650638545.913359] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13b3590: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 6) | |
[1650638545.913665] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x13b3590: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 7) | |
[1650638545.913671] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.913701] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638545.913706] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13d8ee0 [id=96 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638545.913728] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 96 events 0x5 mode thread_spinlock | |
[1650638545.913739] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[15]=0x13b3590 using ud_mlx5/mlx5_ib2:1 on worker 0x1c8f6f0 | |
[1650638545.913934] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.913939] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.914055] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.914060] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.912579] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638545.912586] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638545.915144] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638545.917363] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638545.917380] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.917383] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.917432] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.918681] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638545.918688] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638545.920041] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x32140a0: created RC QP 0x3028 on mlx5_ib3:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638545.921025] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[16]=0x32140a0 using rc_verbs/mlx5_ib3:1 on worker 0x1c8f6f0 | |
[1650638545.921221] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.921228] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.921412] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.921416] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.921968] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638545.923272] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638545.923285] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.923288] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.923340] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.923866] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x3535010 of 8176 bytes with 127 elements | |
[1650638545.924148] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638545.924155] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.924192] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib3 length=2048) failed: Invalid argument | |
[1650638545.924196] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638545.924257] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13c9f00 [id=99 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638545.924278] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 99 events 0x1 mode thread_spinlock | |
[1650638545.924300] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[17]=0x3361030 using rc_mlx5/mlx5_ib3:1 on worker 0x1c8f6f0 | |
[1650638545.924329] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.924334] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.924427] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.924431] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.924666] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638545.925794] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638545.925812] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.925815] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.925870] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.926279] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638545.926286] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.926318] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib3 length=2048) failed: Invalid argument | |
[1650638545.926322] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638545.926329] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x28e0f80 [id=101 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638545.926352] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 101 events 0x1 mode thread_spinlock | |
[1650638545.927026] [ndv4:69188:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638545.931750] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638545.931758] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638545.934669] [ndv4:69188:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x3537050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x304b | |
[1650638545.934927] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.934934] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.934944] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ae4cbf94008 of 151544 bytes with 1052 elements | |
[1650638545.938953] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4da200000..0x2ae4dc800000 on mlx5_ib3 lkey 0x80700 rkey 0x80700 access 0xf flags 0x3e4 | |
[1650638545.938974] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ae4da200018 of 39845864 bytes with 4752 elements | |
[1650638545.939116] [ndv4:69188:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x3537050 | |
[1650638545.939153] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[18]=0x3537050 using dc_mlx5/mlx5_ib3:1 on worker 0x1c8f6f0 | |
[1650638545.939235] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.939244] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.939314] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.939318] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.949152] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638545.950526] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638545.950911] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x3713060: created UD QP 0x3031 on mlx5_ib3:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638545.951422] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638545.951790] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.951797] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.951880] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.951885] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.952258] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638545.952266] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638545.952307] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4dc8f6000..0x2ae4dc97b000 on mlx5_ib3 lkey 0x80800 rkey 0x80800 access 0xf flags 0x3e4 | |
[1650638545.952315] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4dc8f6018 of 544744 bytes with 128 elements | |
[1650638545.952320] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638545.952746] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3713060: adding gid fe80::15:5dff:fd34:1e to hash on device mlx5_ib3 port 1 index 0) | |
[1650638545.953165] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3713060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 1) | |
[1650638545.953508] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3713060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 2) | |
[1650638545.953766] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3713060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 3) | |
[1650638545.954120] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3713060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 4) | |
[1650638545.954523] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3713060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 5) | |
[1650638545.954770] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3713060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 6) | |
[1650638545.955018] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3713060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 7) | |
[1650638545.955315] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638545.955324] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc3f3e0 [id=102 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638545.955353] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 102 events 0x5 mode thread_spinlock | |
[1650638545.955363] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[19]=0x3713060 using ud_verbs/mlx5_ib3:1 on worker 0x1c8f6f0 | |
[1650638545.955476] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.955482] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.955563] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.955568] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.955761] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638545.956720] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638545.957033] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x3831050: created UD QP 0x3032 on mlx5_ib3:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638545.957040] [ndv4:69188:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638545.957547] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638545.957673] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.957678] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.957689] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638545.957693] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638545.958027] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4dc97b000..0x2ae4dca00000 on mlx5_ib3 lkey 0x80900 rkey 0x80900 access 0xf flags 0x3e4 | |
[1650638545.958033] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4dc97b018 of 544744 bytes with 128 elements | |
[1650638545.958037] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638545.959320] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3831050: adding gid fe80::15:5dff:fd34:1e to hash on device mlx5_ib3 port 1 index 0) | |
[1650638545.959779] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3831050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 1) | |
[1650638545.960446] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3831050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 2) | |
[1650638545.960503] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3831050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 3) | |
[1650638545.961028] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3831050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 4) | |
[1650638545.961296] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3831050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 5) | |
[1650638545.961556] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3831050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 6) | |
[1650638545.961923] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3831050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 7) | |
[1650638545.961929] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.961957] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638545.961961] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x28e0b40 [id=103 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638545.961982] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 103 events 0x5 mode thread_spinlock | |
[1650638545.961994] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[20]=0x3831050 using ud_mlx5/mlx5_ib3:1 on worker 0x1c8f6f0 | |
[1650638545.962436] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638545.962443] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638545.962798] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638545.962803] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638545.963801] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638545.965529] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638545.965547] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.965550] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.965601] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.966736] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638545.966742] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638545.967717] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x39310a0: created RC QP 0x3046 on mlx5_ib4:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638545.968704] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[21]=0x39310a0 using rc_verbs/mlx5_ib4:1 on worker 0x1c8f6f0 | |
[1650638545.968869] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638545.968875] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638545.969052] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638545.969057] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638545.969623] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638545.970850] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638545.970864] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.970867] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.970916] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.971456] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x3c52010 of 8176 bytes with 127 elements | |
[1650638545.971714] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638545.971720] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.971756] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib4 length=2048) failed: Invalid argument | |
[1650638545.971760] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638545.971771] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x2ff6930 [id=106 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638545.971792] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 106 events 0x1 mode thread_spinlock | |
[1650638545.971801] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[22]=0x3a7e030 using rc_mlx5/mlx5_ib4:1 on worker 0x1c8f6f0 | |
[1650638545.971890] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638545.971895] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638545.972859] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638545.972866] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638545.973098] [ndv4:69197:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638545.980077] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638545.980085] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638545.980767] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638545.981926] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638545.981942] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638545.981945] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638545.982004] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638545.982443] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638545.982451] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638545.982483] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib4 length=2048) failed: Invalid argument | |
[1650638545.982486] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638545.982496] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x3a86f60 [id=108 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638545.982517] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 108 events 0x1 mode thread_spinlock | |
[1650638545.983228] [ndv4:69188:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638545.983949] [ndv4:69197:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638545.984340] [ndv4:69197:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638545.986192] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x20333c0 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638545.986310] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638545.986315] [ndv4:69197:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638545.986679] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.986688] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638545.986696] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638545.986756] [ndv4:69197:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638545.990875] [ndv4:69188:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x3c54050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3059 | |
[1650638545.995428] [ndv4:69197:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638545.995436] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638545.995702] [ndv4:69197:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638545.995789] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638545.995795] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.000730] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.000743] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.000755] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ae4cbfbb008 of 151544 bytes with 1052 elements | |
[1650638546.004607] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4dcc00000..0x2ae4df200000 on mlx5_ib4 lkey 0x80900 rkey 0x80900 access 0xf flags 0x3e4 | |
[1650638546.004631] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ae4dcc00018 of 39845864 bytes with 4752 elements | |
[1650638546.004774] [ndv4:69188:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x3c54050 | |
[1650638546.004813] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[23]=0x3c54050 using dc_mlx5/mlx5_ib4:1 on worker 0x1c8f6f0 | |
[1650638546.004915] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.004924] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.005028] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.005032] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.005757] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638546.006858] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.007249] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x3e30060: created UD QP 0x304f on mlx5_ib4:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.007781] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.015510] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.015518] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.015689] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.015694] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.016520] [ndv4:69197:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638546.016528] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638546.016074] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4df201000..0x2ae4df286000 on mlx5_ib4 lkey 0x80a00 rkey 0x80a00 access 0xf flags 0x3e4 | |
[1650638546.016080] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4df201018 of 544744 bytes with 128 elements | |
[1650638546.016085] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.016523] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3e30060: adding gid fe80::15:5dff:fd34:1f to hash on device mlx5_ib4 port 1 index 0) | |
[1650638546.016759] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3e30060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 1) | |
[1650638546.016776] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3e30060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 2) | |
[1650638546.016791] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3e30060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 3) | |
[1650638546.017145] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3e30060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 4) | |
[1650638546.017557] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3e30060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 5) | |
[1650638546.017677] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3e30060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 6) | |
[1650638546.018010] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3e30060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 7) | |
[1650638546.018321] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.018330] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x3114fc0 [id=109 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.018358] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 109 events 0x5 mode thread_spinlock | |
[1650638546.018368] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[24]=0x3e30060 using ud_verbs/mlx5_ib4:1 on worker 0x1c8f6f0 | |
[1650638546.018411] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.018416] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.027738] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.027748] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.028180] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638546.029319] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.029659] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x3f4e460: created UD QP 0x3050 on mlx5_ib4:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.029666] [ndv4:69188:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638546.030203] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.030347] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.030354] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.030366] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.030370] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.030702] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4df286000..0x2ae4df30b000 on mlx5_ib4 lkey 0x80b00 rkey 0x80b00 access 0xf flags 0x3e4 | |
[1650638546.030708] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4df286018 of 544744 bytes with 128 elements | |
[1650638546.030712] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.031069] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f4e460: adding gid fe80::15:5dff:fd34:1f to hash on device mlx5_ib4 port 1 index 0) | |
[1650638546.041071] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f4e460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 1) | |
[1650638546.041240] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f4e460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 2) | |
[1650638546.041376] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f4e460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 3) | |
[1650638546.041527] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f4e460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 4) | |
[1650638546.041679] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f4e460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 5) | |
[1650638546.041922] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f4e460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 6) | |
[1650638546.041967] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f4e460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 7) | |
[1650638546.041972] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.042002] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.042006] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x390ce30 [id=110 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.042026] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 110 events 0x5 mode thread_spinlock | |
[1650638546.042034] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[25]=0x3f4e460 using ud_mlx5/mlx5_ib4:1 on worker 0x1c8f6f0 | |
[1650638546.042091] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.042096] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.042187] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.042192] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.042876] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638546.044674] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638546.044691] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.044694] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.044745] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.045937] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.045943] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638546.047524] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638546.047532] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638546.047611] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x404e0a0: created RC QP 0x2b9d on mlx5_ib5:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638546.048565] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[26]=0x404e0a0 using rc_verbs/mlx5_ib5:1 on worker 0x1c8f6f0 | |
[1650638546.048766] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.048771] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.048929] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.048934] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.049510] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638546.050883] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.050896] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.050899] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.050950] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.051480] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x436f010 of 8176 bytes with 127 elements | |
[1650638546.051738] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.051746] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.051780] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib5 length=2048) failed: Invalid argument | |
[1650638546.051783] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.051795] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x3f4e3e0 [id=113 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.051815] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 113 events 0x1 mode thread_spinlock | |
[1650638546.051825] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[27]=0x419b030 using rc_mlx5/mlx5_ib5:1 on worker 0x1c8f6f0 | |
[1650638546.052080] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.052086] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.052422] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.052428] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.052996] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638546.054416] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.054434] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.054437] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.054492] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.054856] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638546.054863] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.054903] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib5 length=2048) failed: Invalid argument | |
[1650638546.054907] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.054913] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x2ad29a0 [id=115 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.054932] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 115 events 0x1 mode thread_spinlock | |
[1650638546.055612] [ndv4:69188:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638546.063252] [ndv4:69188:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x4371050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2bc0 | |
[1650638546.063582] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.063588] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.063596] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ae4e1b0c008 of 151544 bytes with 1052 elements | |
[1650638546.067612] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4df400000..0x2ae4e1a00000 on mlx5_ib5 lkey 0x80700 rkey 0x80700 access 0xf flags 0x3e4 | |
[1650638546.067636] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ae4df400018 of 39845864 bytes with 4752 elements | |
[1650638546.067771] [ndv4:69188:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x4371050 | |
[1650638546.067805] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[28]=0x4371050 using dc_mlx5/mlx5_ib5:1 on worker 0x1c8f6f0 | |
[1650638546.068076] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.068086] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.068365] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.068370] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.069121] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638546.070021] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.070370] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x3141480: created UD QP 0x2ba6 on mlx5_ib5:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.070937] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.074953] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638546.074960] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638546.079781] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.079790] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.079931] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.079937] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.080362] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4e1b31000..0x2ae4e1bb6000 on mlx5_ib5 lkey 0x80800 rkey 0x80800 access 0xf flags 0x3e4 | |
[1650638546.080369] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4e1b31018 of 544744 bytes with 128 elements | |
[1650638546.080373] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.080415] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3141480: adding gid fe80::15:5dff:fd34:20 to hash on device mlx5_ib5 port 1 index 0) | |
[1650638546.080433] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3141480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 1) | |
[1650638546.080448] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3141480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 2) | |
[1650638546.089801] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3141480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 3) | |
[1650638546.090352] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3141480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 4) | |
[1650638546.090908] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3141480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 5) | |
[1650638546.092440] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638546.092447] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638546.093711] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638546.093717] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638546.093997] [ndv4:69197:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.100279] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3141480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 6) | |
[1650638546.100699] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3141480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 7) | |
[1650638546.101030] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.101038] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x2ad27d0 [id=116 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.101064] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 116 events 0x5 mode thread_spinlock | |
[1650638546.101075] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[29]=0x3141480 using ud_verbs/mlx5_ib5:1 on worker 0x1c8f6f0 | |
[1650638546.101256] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.101262] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.101425] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.101430] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.101944] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638546.103060] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.103477] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x466b050: created UD QP 0x2ba7 on mlx5_ib5:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.103485] [ndv4:69188:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638546.104025] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.104270] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.104275] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.104318] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.104322] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.104844] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4e1bb6000..0x2ae4e1c3b000 on mlx5_ib5 lkey 0x80900 rkey 0x80900 access 0xf flags 0x3e4 | |
[1650638546.104851] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4e1bb6018 of 544744 bytes with 128 elements | |
[1650638546.104855] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.105516] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x466b050: adding gid fe80::15:5dff:fd34:20 to hash on device mlx5_ib5 port 1 index 0) | |
[1650638546.105725] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x466b050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 1) | |
[1650638546.105888] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x466b050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 2) | |
[1650638546.106358] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x466b050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 3) | |
[1650638546.106804] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x466b050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 4) | |
[1650638546.107088] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x466b050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 5) | |
[1650638546.112383] [ndv4:69197:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.112736] [ndv4:69197:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638546.114138] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2005060 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.114162] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638546.114167] [ndv4:69197:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638546.114466] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.114478] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.114486] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.114550] [ndv4:69197:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638546.115179] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x466b050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 6) | |
[1650638546.115882] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x466b050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 7) | |
[1650638546.115888] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.115916] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.115921] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x40293a0 [id=117 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.115941] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 117 events 0x5 mode thread_spinlock | |
[1650638546.115951] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[30]=0x466b050 using ud_mlx5/mlx5_ib5:1 on worker 0x1c8f6f0 | |
[1650638546.116037] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.116043] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.116168] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.116172] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.116510] [ndv4:69197:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.116519] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.116793] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638546.116779] [ndv4:69197:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.116988] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.116994] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.118844] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638546.118862] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.118865] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.118914] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.120154] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.120160] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638546.121319] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x476b0a0: created RC QP 0x2b0c on mlx5_ib6:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638546.122351] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[31]=0x476b0a0 using rc_verbs/mlx5_ib6:1 on worker 0x1c8f6f0 | |
[1650638546.122389] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.122396] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.122486] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.122490] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.122831] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638546.124010] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.124025] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.124028] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.124078] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.124617] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x4a8c010 of 8176 bytes with 127 elements | |
[1650638546.124868] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.124875] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.124910] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib6 length=2048) failed: Invalid argument | |
[1650638546.124914] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.124925] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x47462d0 [id=120 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.124946] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 120 events 0x1 mode thread_spinlock | |
[1650638546.124956] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[32]=0x48b8030 using rc_mlx5/mlx5_ib6:1 on worker 0x1c8f6f0 | |
[1650638546.125065] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.125071] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.125202] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.125227] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.126247] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638546.127934] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.127957] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.127962] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.128016] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.128424] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638546.128435] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.128472] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib6 length=2048) failed: Invalid argument | |
[1650638546.128476] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.128487] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x48c0f60 [id=122 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.128514] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 122 events 0x1 mode thread_spinlock | |
[1650638546.129198] [ndv4:69188:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638546.136976] [ndv4:69188:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x4a8e050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2b13 | |
[1650638546.137295] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.137303] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.137315] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ae4e443c008 of 151544 bytes with 1052 elements | |
[1650638546.138748] [ndv4:69197:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638546.138757] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638546.141139] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4e1e00000..0x2ae4e4400000 on mlx5_ib6 lkey 0x80b00 rkey 0x80b00 access 0xf flags 0x3e4 | |
[1650638546.141159] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ae4e1e00018 of 39845864 bytes with 4752 elements | |
[1650638546.141361] [ndv4:69188:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x4a8e050 | |
[1650638546.141397] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[33]=0x4a8e050 using dc_mlx5/mlx5_ib6:1 on worker 0x1c8f6f0 | |
[1650638546.141486] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.141495] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.141559] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.141563] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.141939] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638546.142904] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.143309] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x4c6a060: created UD QP 0x2b18 on mlx5_ib6:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.143847] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.151872] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638546.151880] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638546.155343] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.155351] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.155424] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.155429] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.155824] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4e4461000..0x2ae4e44e6000 on mlx5_ib6 lkey 0x80c00 rkey 0x80c00 access 0xf flags 0x3e4 | |
[1650638546.155830] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4e4461018 of 544744 bytes with 128 elements | |
[1650638546.155834] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.156297] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4c6a060: adding gid fe80::15:5dff:fd34:21 to hash on device mlx5_ib6 port 1 index 0) | |
[1650638546.156692] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4c6a060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 1) | |
[1650638546.156935] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4c6a060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 2) | |
[1650638546.156996] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4c6a060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 3) | |
[1650638546.160738] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638546.160746] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638546.161803] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638546.161807] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638546.164174] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4c6a060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 4) | |
[1650638546.164944] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4c6a060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 5) | |
[1650638546.169893] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638546.169901] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638546.170282] [ndv4:69197:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.180669] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4c6a060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 6) | |
[1650638546.180977] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4c6a060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 7) | |
[1650638546.181334] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.181343] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4c6af20 [id=123 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.181369] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 123 events 0x5 mode thread_spinlock | |
[1650638546.181379] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[34]=0x4c6a060 using ud_verbs/mlx5_ib6:1 on worker 0x1c8f6f0 | |
[1650638546.181513] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.181519] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.181646] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.181651] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.182299] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638546.183698] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.184579] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x3f7b3a0: created UD QP 0x2b19 on mlx5_ib6:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.184586] [ndv4:69188:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638546.185237] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.185436] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.185441] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.185456] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.185460] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.185876] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4e44e6000..0x2ae4e456b000 on mlx5_ib6 lkey 0x80d00 rkey 0x80d00 access 0xf flags 0x3e4 | |
[1650638546.185882] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4e44e6018 of 544744 bytes with 128 elements | |
[1650638546.185886] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.186238] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f7b3a0: adding gid fe80::15:5dff:fd34:21 to hash on device mlx5_ib6 port 1 index 0) | |
[1650638546.186487] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f7b3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 1) | |
[1650638546.186767] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f7b3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 2) | |
[1650638546.186825] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f7b3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 3) | |
[1650638546.187166] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f7b3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 4) | |
[1650638546.187541] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f7b3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 5) | |
[1650638546.187822] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f7b3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 6) | |
[1650638546.187836] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x3f7b3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 7) | |
[1650638546.187842] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.187871] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.187875] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x47460e0 [id=124 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.187893] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 124 events 0x5 mode thread_spinlock | |
[1650638546.187901] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[35]=0x3f7b3a0 using ud_mlx5/mlx5_ib6:1 on worker 0x1c8f6f0 | |
[1650638546.188027] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.188032] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.188155] [ndv4:69197:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.188568] [ndv4:69197:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638546.188380] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.188388] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.189975] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2714020 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.190001] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638546.190005] [ndv4:69197:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638546.196453] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638546.197440] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638546.197452] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638546.197460] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.197523] [ndv4:69197:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638546.198290] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638546.198307] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.198310] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.198361] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.198939] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.198945] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638546.200103] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x4e880a0: created RC QP 0x2af4 on mlx5_ib7:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638546.200980] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[36]=0x4e880a0 using rc_verbs/mlx5_ib7:1 on worker 0x1c8f6f0 | |
[1650638546.201095] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.201101] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.201198] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.201202] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.201454] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638546.209750] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.209765] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.209768] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.209819] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.210287] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x51a9010 of 8176 bytes with 127 elements | |
[1650638546.210541] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.210548] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.210618] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib7 length=2048) failed: Invalid argument | |
[1650638546.210623] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.210635] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4d888c0 [id=127 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.210654] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 127 events 0x1 mode thread_spinlock | |
[1650638546.210667] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[37]=0x4fd5030 using rc_mlx5/mlx5_ib7:1 on worker 0x1c8f6f0 | |
[1650638546.210899] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.210905] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.211200] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.211221] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.211688] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638546.213164] [ndv4:69197:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.213173] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.213540] [ndv4:69197:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.213651] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638546.213658] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638546.221071] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.221088] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.221091] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.221147] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.221559] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638546.221567] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.221636] [ndv4:69188:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib7 length=2048) failed: Invalid argument | |
[1650638546.221640] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.221648] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4e63760 [id=129 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.221666] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 129 events 0x1 mode thread_spinlock | |
[1650638546.224117] [ndv4:69197:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638546.224125] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638546.234025] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638546.234032] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638546.267757] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638546.267764] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638546.289829] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638546.289836] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638546.291563] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638546.291569] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638546.291867] [ndv4:69197:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.309070] [ndv4:68756:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638546.313956] [ndv4:69188:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638546.315996] [ndv4:69197:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib4 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.316463] [ndv4:69197:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib4: disable ODP because it's not supported for DevX QP | |
[1650638546.323644] [ndv4:68756:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x5224050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2afb | |
[1650638546.324256] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.324265] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.324283] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2b13b3ba0008 of 151544 bytes with 1052 elements | |
[1650638546.322417] [ndv4:69188:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x51ab050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2afc | |
[1650638546.322563] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.322570] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.322580] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ae4e6d6c008 of 151544 bytes with 1052 elements | |
[1650638546.326551] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4e4600000..0x2ae4e6c00000 on mlx5_ib7 lkey 0x80400 rkey 0x80400 access 0xf flags 0x3e4 | |
[1650638546.326574] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ae4e4600018 of 39845864 bytes with 4752 elements | |
[1650638546.326708] [ndv4:69188:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x51ab050 | |
[1650638546.326747] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[38]=0x51ab050 using dc_mlx5/mlx5_ib7:1 on worker 0x1c8f6f0 | |
[1650638546.326971] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.326982] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.327156] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.327160] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.327787] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638546.327919] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13b1400000..0x2b13b3a00000 on mlx5_ib7 lkey 0x80500 rkey 0x80500 access 0xf flags 0x3e4 | |
[1650638546.327942] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2b13b1400018 of 39845864 bytes with 4752 elements | |
[1650638546.328082] [ndv4:68756:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x5224050 | |
[1650638546.328157] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[38]=0x5224050 using dc_mlx5/mlx5_ib7:1 on worker 0x1d084d0 | |
[1650638546.328302] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.328314] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.328428] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.328433] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.332316] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x27136d0 [id=65 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.332338] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 65 events 0x1 mode thread_spinlock | |
[1650638546.332342] [ndv4:69197:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib4' (InfiniBand channel adapter) with 1 ports | |
[1650638546.332648] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.332656] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.332662] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.332712] [ndv4:69197:0] ib_md.c:1319 UCX DEBUG mlx5_ib4: using registration cache | |
[1650638546.333944] [ndv4:69197:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.333950] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.334278] [ndv4:69197:0] ib_md.c:1604 UCX DEBUG mlx5_ib4: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.334318] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.334324] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.335600] [ndv4:69197:0] topo.c:99 UCX DEBUG bus id 0x105000000 doesn't exist. sys_dev = 4 | |
[1650638546.335604] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638546.343743] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638546.343752] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638546.343760] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.344189] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x4698570: created UD QP 0x2b09 on mlx5_ib7:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.344771] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.345035] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.345042] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.345413] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.345419] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.345903] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4e6d91000..0x2ae4e6e16000 on mlx5_ib7 lkey 0x80600 rkey 0x80600 access 0xf flags 0x3e4 | |
[1650638546.345909] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4e6d91018 of 544744 bytes with 128 elements | |
[1650638546.345913] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.346497] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4698570: adding gid fe80::15:5dff:fd34:22 to hash on device mlx5_ib7 port 1 index 0) | |
[1650638546.347103] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4698570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 1) | |
[1650638546.345593] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638546.346958] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.347313] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x47114e0: created UD QP 0x2b0a on mlx5_ib7:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.346949] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638546.346955] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638546.348049] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4698570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 2) | |
[1650638546.349191] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4698570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 3) | |
[1650638546.347762] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.348138] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.348147] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.348336] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.348341] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.348701] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13b3bc5000..0x2b13b3c4a000 on mlx5_ib7 lkey 0x80700 rkey 0x80700 access 0xf flags 0x3e4 | |
[1650638546.348707] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13b3bc5018 of 544744 bytes with 128 elements | |
[1650638546.348712] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.349464] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4698570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 4) | |
[1650638546.349507] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4698570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 5) | |
[1650638546.349738] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4698570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 6) | |
[1650638546.349907] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x4698570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 7) | |
[1650638546.350191] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.350200] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4698fc0 [id=130 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.350269] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 130 events 0x5 mode thread_spinlock | |
[1650638546.350283] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[39]=0x4698570 using ud_verbs/mlx5_ib7:1 on worker 0x1c8f6f0 | |
[1650638546.350385] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.350390] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.350461] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.350466] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.350615] [ndv4:69188:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638546.351625] [ndv4:69188:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.351943] [ndv4:69188:0] ib_iface.c:994 UCX DEBUG iface=0x54a5460: created UD QP 0x2b0b on mlx5_ib7:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.351952] [ndv4:69188:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638546.352436] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.352579] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.352585] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.352598] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.352602] [ndv4:69188:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.352935] [ndv4:69188:0] ib_md.c:812 UCX DEBUG registered memory 0x2ae4e6e16000..0x2ae4e6e9b000 on mlx5_ib7 lkey 0x80800 rkey 0x80800 access 0xf flags 0x3e4 | |
[1650638546.352942] [ndv4:69188:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ae4e6e16018 of 544744 bytes with 128 elements | |
[1650638546.352946] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.353302] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x54a5460: adding gid fe80::15:5dff:fd34:22 to hash on device mlx5_ib7 port 1 index 0) | |
[1650638546.353422] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x54a5460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 1) | |
[1650638546.357005] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x47114e0: adding gid fe80::15:5dff:fd34:22 to hash on device mlx5_ib7 port 1 index 0) | |
[1650638546.357439] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x47114e0: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 1) | |
[1650638546.357977] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x47114e0: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 2) | |
[1650638546.361920] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x54a5460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 2) | |
[1650638546.365423] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638546.365432] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638546.365905] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x47114e0: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 3) | |
[1650638546.366550] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x47114e0: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 4) | |
[1650638546.367061] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x47114e0: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 5) | |
[1650638546.367341] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x47114e0: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 6) | |
[1650638546.372593] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x54a5460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 3) | |
[1650638546.372830] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x54a5460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 4) | |
[1650638546.372971] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x54a5460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 5) | |
[1650638546.373118] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x54a5460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 6) | |
[1650638546.373149] [ndv4:69188:0] ud_iface.c:393 UCX DEBUG iface 0x54a5460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 7) | |
[1650638546.373155] [ndv4:69188:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.373185] [ndv4:69188:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.373189] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x54a5f80 [id=131 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.373270] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 131 events 0x5 mode thread_spinlock | |
[1650638546.373281] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[40]=0x54a5460 using ud_mlx5/mlx5_ib7:1 on worker 0x1c8f6f0 | |
[1650638546.373384] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool uct_scopy_iface_tx_mp: align 64, maxelems 4294967295, elemsize 736 | |
[1650638546.373442] [ndv4:69188:0] ucp_worker.c:1159 UCX DEBUG created interface[41]=0x4db5600 using cma/memory on worker 0x1c8f6f0 | |
[1650638546.373448] [ndv4:69188:0] ucp_worker.c:982 UCX DEBUG selected scalable tl bitmap: 0x3ffffffffff 0x0 (42 tls) | |
[1650638546.378434] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x47114e0: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 7) | |
[1650638546.378662] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.378674] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x4711dd0 [id=130 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.378703] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 130 events 0x5 mode thread_spinlock | |
[1650638546.379159] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[39]=0x47114e0 using ud_verbs/mlx5_ib7:1 on worker 0x1d084d0 | |
[1650638546.379260] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.379267] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.379327] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.379332] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.379594] [ndv4:68756:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638546.380485] [ndv4:68756:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.380815] [ndv4:68756:0] ib_iface.c:994 UCX DEBUG iface=0x551e460: created UD QP 0x2b0c on mlx5_ib7:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.380823] [ndv4:68756:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638546.381388] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.381516] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.381522] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.381596] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.381601] [ndv4:68756:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.381962] [ndv4:68756:0] ib_md.c:812 UCX DEBUG registered memory 0x2b13b3c4a000..0x2b13b3ccf000 on mlx5_ib7 lkey 0x80900 rkey 0x80900 access 0xf flags 0x3e4 | |
[1650638546.381968] [ndv4:68756:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b13b3c4a018 of 544744 bytes with 128 elements | |
[1650638546.381972] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.382318] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x551e460: adding gid fe80::15:5dff:fd34:22 to hash on device mlx5_ib7 port 1 index 0) | |
[1650638546.382642] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x551e460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 1) | |
[1650638546.382899] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x551e460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 2) | |
[1650638546.392828] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x551e460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 3) | |
[1650638546.393161] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x551e460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 4) | |
[1650638546.395898] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638546.395907] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638546.396164] [ndv4:69197:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.400794] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x551e460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 5) | |
[1650638546.401140] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x551e460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 6) | |
[1650638546.401927] [ndv4:68756:0] ud_iface.c:393 UCX DEBUG iface 0x551e460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 7) | |
[1650638546.401938] [ndv4:68756:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.401972] [ndv4:68756:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.401976] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x4edc300 [id=131 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.401998] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 131 events 0x5 mode thread_spinlock | |
[1650638546.402009] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[40]=0x551e460 using ud_mlx5/mlx5_ib7:1 on worker 0x1d084d0 | |
[1650638546.402100] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool uct_scopy_iface_tx_mp: align 64, maxelems 4294967295, elemsize 736 | |
[1650638546.402149] [ndv4:68756:0] ucp_worker.c:1159 UCX DEBUG created interface[41]=0x4e2e600 using cma/memory on worker 0x1d084d0 | |
[1650638546.402155] [ndv4:68756:0] ucp_worker.c:982 UCX DEBUG selected scalable tl bitmap: 0x3ffffffffff 0x0 (42 tls) | |
[1650638546.405799] [ndv4:69197:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib5 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.406155] [ndv4:69197:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib5: disable ODP because it's not supported for DevX QP | |
[1650638546.413832] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x201bf70 [id=67 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.413853] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 67 events 0x1 mode thread_spinlock | |
[1650638546.413857] [ndv4:69197:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib5' (InfiniBand channel adapter) with 1 ports | |
[1650638546.414067] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.414079] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.414085] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.414132] [ndv4:69197:0] ib_md.c:1319 UCX DEBUG mlx5_ib5: using registration cache | |
[1650638546.414922] [ndv4:69197:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.414928] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.415198] [ndv4:69197:0] ib_md.c:1604 UCX DEBUG mlx5_ib5: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.415435] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.415445] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.424969] [ndv4:69197:0] topo.c:99 UCX DEBUG bus id 0x106000000 doesn't exist. sys_dev = 5 | |
[1650638546.425003] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638546.466183] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x5387fb0 [id=75 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466263] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 75 events 0x0 mode thread_spinlock | |
[1650638546.466359] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x4711f90 [id=75 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466405] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 75 events 0x0 mode thread_spinlock | |
[1650638546.466602] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x145c290 [id=76 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466644] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 76 events 0x0 mode thread_spinlock | |
[1650638546.466761] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x145c2d0 [id=77 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466780] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 77 events 0x0 mode thread_spinlock | |
[1650638546.466806] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x145c310 [id=79 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466825] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 79 events 0x0 mode thread_spinlock | |
[1650638546.466879] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x145c350 [id=83 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466895] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 83 events 0x0 mode thread_spinlock | |
[1650638546.466911] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x145c390 [id=84 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466938] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 84 events 0x0 mode thread_spinlock | |
[1650638546.466953] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x145c3d0 [id=86 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466972] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 86 events 0x0 mode thread_spinlock | |
[1650638546.467008] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x145c410 [id=90 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467024] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 90 events 0x0 mode thread_spinlock | |
[1650638546.467040] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x145c450 [id=91 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467058] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 91 events 0x0 mode thread_spinlock | |
[1650638546.467078] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x142dd20 [id=93 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467096] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 93 events 0x0 mode thread_spinlock | |
[1650638546.467140] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x4e2ef60 [id=97 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467161] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 97 events 0x0 mode thread_spinlock | |
[1650638546.467181] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x4e2efa0 [id=98 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467201] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 98 events 0x0 mode thread_spinlock | |
[1650638546.467274] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x3affee0 [id=100 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467294] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 100 events 0x0 mode thread_spinlock | |
[1650638546.467331] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x3afff20 [id=104 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467348] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 104 events 0x0 mode thread_spinlock | |
[1650638546.467365] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x3afff60 [id=105 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467379] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 105 events 0x0 mode thread_spinlock | |
[1650638546.466532] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4e63370 [id=76 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466558] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 76 events 0x0 mode thread_spinlock | |
[1650638546.466617] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13b6a70 [id=77 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466634] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 77 events 0x0 mode thread_spinlock | |
[1650638546.466651] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13b6ab0 [id=79 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466671] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 79 events 0x0 mode thread_spinlock | |
[1650638546.466717] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13b6af0 [id=83 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466733] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 83 events 0x0 mode thread_spinlock | |
[1650638546.466747] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13b6b30 [id=84 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466762] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 84 events 0x0 mode thread_spinlock | |
[1650638546.466776] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x13b6b70 [id=86 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466793] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 86 events 0x0 mode thread_spinlock | |
[1650638546.466827] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1cafdc0 [id=90 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466844] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 90 events 0x0 mode thread_spinlock | |
[1650638546.466859] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1cafe00 [id=91 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466873] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 91 events 0x0 mode thread_spinlock | |
[1650638546.466888] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1cafe40 [id=93 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466903] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 93 events 0x0 mode thread_spinlock | |
[1650638546.466937] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4db5f60 [id=97 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466951] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 97 events 0x0 mode thread_spinlock | |
[1650638546.466966] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4db5fa0 [id=98 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.466979] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 98 events 0x0 mode thread_spinlock | |
[1650638546.466993] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc51820 [id=100 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467006] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 100 events 0x0 mode thread_spinlock | |
[1650638546.467041] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc51860 [id=104 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467056] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 104 events 0x0 mode thread_spinlock | |
[1650638546.467070] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc518a0 [id=105 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467083] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 105 events 0x0 mode thread_spinlock | |
[1650638546.467097] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0xc518e0 [id=107 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467109] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 107 events 0x0 mode thread_spinlock | |
[1650638546.467143] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1cafe80 [id=111 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467158] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 111 events 0x0 mode thread_spinlock | |
[1650638546.467581] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1cafec0 [id=112 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467597] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 112 events 0x0 mode thread_spinlock | |
[1650638546.467612] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1caff00 [id=114 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467626] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 114 events 0x0 mode thread_spinlock | |
[1650638546.467666] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1caff40 [id=118 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467682] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 118 events 0x0 mode thread_spinlock | |
[1650638546.467697] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x1caff80 [id=119 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467710] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 119 events 0x0 mode thread_spinlock | |
[1650638546.467725] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4fdde10 [id=121 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467739] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 121 events 0x0 mode thread_spinlock | |
[1650638546.467774] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4fdde50 [id=125 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467787] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 125 events 0x0 mode thread_spinlock | |
[1650638546.467800] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4fdde90 [id=126 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467815] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 126 events 0x0 mode thread_spinlock | |
[1650638546.467828] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x4fdded0 [id=128 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467844] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 128 events 0x0 mode thread_spinlock | |
[1650638546.467393] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x3afffa0 [id=107 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467408] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 107 events 0x0 mode thread_spinlock | |
[1650638546.467444] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x142dd90 [id=111 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467460] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 111 events 0x0 mode thread_spinlock | |
[1650638546.467556] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x142ddd0 [id=112 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467572] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 112 events 0x0 mode thread_spinlock | |
[1650638546.467589] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x142de10 [id=114 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467608] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 114 events 0x0 mode thread_spinlock | |
[1650638546.467647] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x142de50 [id=118 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467663] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 118 events 0x0 mode thread_spinlock | |
[1650638546.467679] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x142de90 [id=119 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467694] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 119 events 0x0 mode thread_spinlock | |
[1650638546.467709] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x142ded0 [id=121 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467723] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 121 events 0x0 mode thread_spinlock | |
[1650638546.467760] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x5056e10 [id=125 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467775] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 125 events 0x0 mode thread_spinlock | |
[1650638546.467790] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x5056e50 [id=126 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467806] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 126 events 0x0 mode thread_spinlock | |
[1650638546.467821] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x5056e90 [id=128 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638546.467836] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 128 events 0x0 mode thread_spinlock | |
[1650638546.470784] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638546.470794] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638546.472824] [ndv4:69188:0] async.c:228 UCX DEBUG added async handler 0x31147e0 [id=132 ref 1] uct_rdmacm_cm_event_handler() to hash | |
[1650638546.472847] [ndv4:69188:0] async.c:506 UCX DEBUG listening to async event fd 132 events 0x1 mode thread_spinlock | |
[1650638546.472932] [ndv4:69188:0] rdmacm_cm.c:922 UCX DEBUG created rdmacm_cm 0x5580050 with event_channel 0x4db5fe0 (fd=132) | |
[1650638546.472964] [ndv4:69188:0] tcp_sockcm.c:186 UCX DEBUG created tcp_sockcm 0x4a6a7e0 | |
[1650638546.473138] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ucp_requests: align 64, maxelems 4294967295, elemsize 440 | |
[1650638546.473144] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ucp_rkeys: align 64, maxelems 4294967295, elemsize 168 | |
[1650638546.473148] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ucp_am_bufs: align 64, maxelems 4294967295, elemsize 8344 | |
[1650638546.473151] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ucp_reg_bufs: align 64, maxelems 4294967295, elemsize 8208 | |
[1650638546.473154] [ndv4:69188:0] mpool.c:88 UCX DEBUG mpool ucp_rndv_frags: align 512, maxelems 4294967295, elemsize 524304 | |
[1650638546.473062] [ndv4:68756:0] async.c:228 UCX DEBUG added async handler 0x318d7e0 [id=132 ref 1] uct_rdmacm_cm_event_handler() to hash | |
[1650638546.473087] [ndv4:68756:0] async.c:506 UCX DEBUG listening to async event fd 132 events 0x1 mode thread_spinlock | |
[1650638546.473109] [ndv4:68756:0] rdmacm_cm.c:922 UCX DEBUG created rdmacm_cm 0x55f9050 with event_channel 0x4e2efe0 (fd=132) | |
[1650638546.473145] [ndv4:68756:0] tcp_sockcm.c:186 UCX DEBUG created tcp_sockcm 0x4ae37e0 | |
[1650638546.473165] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ucp_requests: align 64, maxelems 4294967295, elemsize 440 | |
[1650638546.473168] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ucp_rkeys: align 64, maxelems 4294967295, elemsize 168 | |
[1650638546.473172] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ucp_am_bufs: align 64, maxelems 4294967295, elemsize 8344 | |
[1650638546.473175] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ucp_reg_bufs: align 64, maxelems 4294967295, elemsize 8208 | |
[1650638546.473177] [ndv4:68756:0] mpool.c:88 UCX DEBUG mpool ucp_rndv_frags: align 512, maxelems 4294967295, elemsize 524304 | |
[1650638546.473423] [ndv4:69188:0] parser.c:1893 UCX INFO UCX_* env variables: UCX_TLS=sysv,cma,ib UCX_POSIX_USE_PROC_LINK=n UCX_LOG_LEVEL=debug | |
[1650638546.473444] [ndv4:68756:0] parser.c:1893 UCX INFO UCX_* env variables: UCX_TLS=sysv,cma,ib UCX_POSIX_USE_PROC_LINK=n UCX_LOG_LEVEL=debug | |
[1650638546.480843] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638546.480850] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638546.499582] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638546.499590] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638546.501321] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638546.501327] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638546.501604] [ndv4:69197:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.527308] [ndv4:69197:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib6 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.527664] [ndv4:69197:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib6: disable ODP because it's not supported for DevX QP | |
[1650638546.528127] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2714ec0 [id=69 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.528149] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 69 events 0x1 mode thread_spinlock | |
[1650638546.528152] [ndv4:69197:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib6' (InfiniBand channel adapter) with 1 ports | |
[1650638546.528388] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.528398] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.528403] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.528452] [ndv4:69197:0] ib_md.c:1319 UCX DEBUG mlx5_ib6: using registration cache | |
[1650638546.529048] [ndv4:69197:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.529053] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.529320] [ndv4:69197:0] ib_md.c:1604 UCX DEBUG mlx5_ib6: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.529380] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638546.529386] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638546.539397] [ndv4:69197:0] topo.c:99 UCX DEBUG bus id 0x107000000 doesn't exist. sys_dev = 6 | |
[1650638546.539404] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638546.540304] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638546.540308] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638546.549308] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638546.549315] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638546.567594] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638546.567601] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638546.576885] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638546.576891] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638546.577114] [ndv4:69197:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.586384] [ndv4:69197:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib7 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.586718] [ndv4:69197:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib7: disable ODP because it's not supported for DevX QP | |
[1650638546.587430] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x202f7b0 [id=71 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.587464] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 71 events 0x1 mode thread_spinlock | |
[1650638546.587469] [ndv4:69197:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib7' (InfiniBand channel adapter) with 1 ports | |
[1650638546.587771] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.587782] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.587795] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.587874] [ndv4:69197:0] ib_md.c:1319 UCX DEBUG mlx5_ib7: using registration cache | |
[1650638546.597542] [ndv4:69197:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.597553] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.597814] [ndv4:69197:0] ib_md.c:1604 UCX DEBUG mlx5_ib7: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.597985] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638546.597990] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638546.607306] [ndv4:69197:0] topo.c:99 UCX DEBUG bus id 0x108000000 doesn't exist. sys_dev = 7 | |
[1650638546.607314] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638546.607882] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638546.607885] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638546.636403] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638546.636410] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638546.648221] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638546.648229] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638546.657508] [ndv4:68758:0] debug.c:1198 UCX DEBUG using signal stack 0x2ad1de44a000 size 141824 | |
[1650638546.657583] [ndv4:68758:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638546.657600] [ndv4:68758:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2ad1de29e000 | |
[1650638546.657622] [ndv4:68758:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638546.657630] [ndv4:68758:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638546.657636] [ndv4:68758:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638546.659943] [ndv4:68758:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638546.659964] [ndv4:68758:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638546.659997] [ndv4:68758:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638546.660000] [ndv4:68758:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638546.660006] [ndv4:68758:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638546.660014] [ndv4:68758:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638546.660016] [ndv4:68758:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638546.660021] [ndv4:68758:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638546.660023] [ndv4:68758:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638546.660025] [ndv4:68758:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638546.660028] [ndv4:68758:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638546.660031] [ndv4:68758:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638546.660039] [ndv4:68758:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638546.661888] [ndv4:69197:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638546.661903] [ndv4:69197:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638546.662085] [ndv4:69197:0] ucp_context.c:1556 UCX DEBUG created ucp context 0x2017190 0x2017190 [10 mds 42 tls] features 0x1 tl bitmap 0x3ffffffffff 0x0 | |
[1650638546.665502] [ndv4:68758:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638546.665859] [ndv4:68758:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638546.665882] [ndv4:68758:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638546.666108] [ndv4:68758:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638546.666120] [ndv4:68758:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638546.666130] [ndv4:68758:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638546.666140] [ndv4:68758:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638546.666150] [ndv4:68758:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638546.666161] [ndv4:68758:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638546.666171] [ndv4:68758:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638546.666234] [ndv4:68758:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638546.666589] [ndv4:68758:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638546.698808] [ndv4:68758:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.699432] [ndv4:68758:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638546.707387] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x1a25680 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.707490] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638546.707784] [ndv4:68758:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638546.708068] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.708079] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.708105] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.708170] [ndv4:68758:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638546.708192] [ndv4:68758:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638546.708867] [ndv4:68758:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.708877] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.709187] [ndv4:68758:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.709284] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.709290] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.729378] [ndv4:69197:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 8447 bytes with hugetlb | |
[1650638546.729487] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8368 | |
[1650638546.729517] [ndv4:69197:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 4292720 bytes with hugetlb | |
[1650638546.729527] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool mm_recv_desc: allocated chunk 0x2afd85462018 of 4296680 bytes with 512 elements | |
[1650638546.730099] [ndv4:69197:0] mm_iface.c:600 UCX DEBUG created mm iface 0x27931a0 FIFO id 0xd8006 va 0x2afd7fc8d000 size 12288 (128 x 64 elems) | |
[1650638546.730155] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[0]=0x27931a0 using sysv/memory on worker 0x306d6f0 | |
[1650638546.730655] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.730666] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.730854] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.730860] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.731764] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638546.731241] [ndv4:68758:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638546.731249] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638546.733583] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638546.733641] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.733646] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.733709] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.734407] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.734419] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638546.735836] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x279e440: created RC QP 0x30df on mlx5_ib0:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638546.738660] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[1]=0x279e440 using rc_verbs/mlx5_ib0:1 on worker 0x306d6f0 | |
[1650638546.738746] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.738753] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.738959] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.738964] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.739523] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638546.739579] [ndv4:69197:0] ib_device.c:1394 UCX DEBUG max IB CQE size is 128 | |
[1650638546.740568] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.740580] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.740583] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.740634] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.741012] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x3394010 of 8176 bytes with 127 elements | |
[1650638546.741293] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.741329] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.741381] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638546.741387] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.750642] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2703ae0 [id=78 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.750671] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 78 events 0x1 mode thread_spinlock | |
[1650638546.751240] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[2]=0x27abfd0 using rc_mlx5/mlx5_ib0:1 on worker 0x306d6f0 | |
[1650638546.751264] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.751271] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.751448] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638546.751458] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638546.752321] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638546.752325] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638546.753088] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638546.753092] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638546.763575] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638546.763582] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638546.763824] [ndv4:68758:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.768005] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.768016] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.768824] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638546.771914] [ndv4:68758:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.772286] [ndv4:68758:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638546.772583] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x19f9080 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.772605] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638546.772608] [ndv4:68758:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638546.772794] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.772803] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.772809] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.772849] [ndv4:68758:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638546.773287] [ndv4:68758:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.773292] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.773572] [ndv4:68758:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.773587] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.773592] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.773728] [ndv4:68758:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638546.773732] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638546.773850] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638546.773854] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638546.773953] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638546.773956] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638546.774063] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638546.774066] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638546.774166] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638546.774169] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638546.774406] [ndv4:68758:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.782928] [ndv4:68758:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.783242] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.783252] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.783255] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.783270] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.783343] [ndv4:68758:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638546.783651] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638546.783660] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.783693] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638546.783697] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.783707] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x26fa5b0 [id=80 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.783730] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 80 events 0x1 mode thread_spinlock | |
[1650638546.784454] [ndv4:69197:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638546.791676] [ndv4:69197:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x3396050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3105 | |
[1650638546.793878] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x1a1dd50 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.793903] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638546.793907] [ndv4:68758:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638546.794108] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.794117] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.794122] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.794162] [ndv4:68758:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638546.794899] [ndv4:68758:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.794905] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.795187] [ndv4:68758:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.795261] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.795269] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.800871] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.800881] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.800906] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2afd7fc92008 of 151544 bytes with 1052 elements | |
[1650638546.804918] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd85a00000..0x2afd88000000 on mlx5_ib0 lkey 0x80600 rkey 0x80600 access 0xf flags 0x3e4 | |
[1650638546.804963] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2afd85a00018 of 39845864 bytes with 4752 elements | |
[1650638546.805286] [ndv4:69197:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x3396050 | |
[1650638546.805338] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[3]=0x3396050 using dc_mlx5/mlx5_ib0:1 on worker 0x306d6f0 | |
[1650638546.805406] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.805416] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.805526] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.805531] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.806071] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638546.805538] [ndv4:68758:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638546.805556] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638546.806340] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638546.806345] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638546.806602] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638546.806605] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638546.807296] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638546.807301] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638546.807449] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.807816] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x27a9610: created UD QP 0x30e8 on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.808346] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.808422] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.808431] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.808523] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.808528] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.808867] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd7fcb7000..0x2afd7fd3c000 on mlx5_ib0 lkey 0x80700 rkey 0x80700 access 0xf flags 0x3e4 | |
[1650638546.808881] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd7fcb7018 of 544744 bytes with 128 elements | |
[1650638546.808894] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.809539] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a9610: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638546.809940] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a9610: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638546.810315] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a9610: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638546.810429] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a9610: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638546.810721] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a9610: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638546.811071] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a9610: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638546.818345] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638546.818355] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638546.818653] [ndv4:68758:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.821377] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a9610: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638546.821576] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a9610: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638546.821946] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.829309] [ndv4:68758:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.829685] [ndv4:68758:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638546.829995] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x1a1dc40 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.830025] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638546.830029] [ndv4:68758:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638546.830243] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638546.830256] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638546.830263] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.830317] [ndv4:68758:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638546.830797] [ndv4:68758:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.830803] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.830184] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x279ad00 [id=81 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.830283] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 81 events 0x5 mode thread_spinlock | |
[1650638546.830417] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[4]=0x27a9610 using ud_verbs/mlx5_ib0:1 on worker 0x306d6f0 | |
[1650638546.830453] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.830461] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.830534] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.830538] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.830782] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638546.831065] [ndv4:68758:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.831095] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638546.831102] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638546.831375] [ndv4:68758:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638546.831380] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638546.843926] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.844302] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x308d570: created UD QP 0x30e9 on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.844318] [ndv4:69197:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638546.844416] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638546.844423] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638546.844794] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.845019] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.845026] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.845102] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638546.845107] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638546.845528] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd7fd3c000..0x2afd7fdc1000 on mlx5_ib0 lkey 0x80e00 rkey 0x80e00 access 0xf flags 0x3e4 | |
[1650638546.845539] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd7fd3c018 of 544744 bytes with 128 elements | |
[1650638546.845553] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.845712] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x308d570: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638546.857858] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x308d570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638546.858046] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x308d570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638546.858118] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x308d570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638546.858269] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x308d570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638546.858375] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x308d570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638546.858530] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x308d570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638546.858742] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x308d570: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638546.858760] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.859050] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.859069] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2703420 [id=82 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.859103] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 82 events 0x5 mode thread_spinlock | |
[1650638546.859131] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[5]=0x308d570 using ud_mlx5/mlx5_ib0:1 on worker 0x306d6f0 | |
[1650638546.859294] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.859300] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.859437] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.859442] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.869672] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638546.867826] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638546.867836] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638546.871056] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638546.871062] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638546.876994] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638546.877033] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.877038] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.877110] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.877796] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.877804] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638546.878986] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x393d050: created RC QP 0x3036 on mlx5_ib1:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638546.880068] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[6]=0x393d050 using rc_verbs/mlx5_ib1:1 on worker 0x306d6f0 | |
[1650638546.880176] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.880183] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.880404] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.880410] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.889464] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638546.889472] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638546.889698] [ndv4:68758:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.891097] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638546.892648] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.892659] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.892662] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.892697] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.893079] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x3896010 of 8176 bytes with 127 elements | |
[1650638546.893352] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.893361] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.893398] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638546.893403] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.893417] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2703ee0 [id=85 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.893443] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 85 events 0x1 mode thread_spinlock | |
[1650638546.893452] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[7]=0x37c3020 using rc_mlx5/mlx5_ib1:1 on worker 0x306d6f0 | |
[1650638546.893470] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.893476] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.893720] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.893725] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.894498] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638546.904060] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.904076] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.904079] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.904134] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.904541] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638546.904549] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.904580] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638546.904584] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.904590] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2798720 [id=87 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.904609] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 87 events 0x1 mode thread_spinlock | |
[1650638546.905326] [ndv4:69197:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638546.906945] [ndv4:68758:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib4 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.907438] [ndv4:68758:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib4: disable ODP because it's not supported for DevX QP | |
[1650638546.912183] [ndv4:69197:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x3ae2010: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x305c | |
[1650638546.912663] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.912670] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.912680] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2afd7fdc3008 of 151544 bytes with 1052 elements | |
[1650638546.916516] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd88200000..0x2afd8a800000 on mlx5_ib1 lkey 0x80c00 rkey 0x80c00 access 0xf flags 0x3e4 | |
[1650638546.916530] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2afd88200018 of 39845864 bytes with 4752 elements | |
[1650638546.916710] [ndv4:69197:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x3ae2010 | |
[1650638546.916728] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[8]=0x3ae2010 using dc_mlx5/mlx5_ib1:1 on worker 0x306d6f0 | |
[1650638546.916879] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.916886] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.916947] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.916952] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.917050] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x21076a0 [id=65 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.917076] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 65 events 0x1 mode thread_spinlock | |
[1650638546.917080] [ndv4:68758:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib4' (InfiniBand channel adapter) with 1 ports | |
[1650638546.917388] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.917397] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.917403] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.917444] [ndv4:68758:0] ib_md.c:1319 UCX DEBUG mlx5_ib4: using registration cache | |
[1650638546.918445] [ndv4:68758:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.918451] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.917138] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638546.918366] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.918703] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x39e1020: created UD QP 0x303f on mlx5_ib1:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.919152] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.920499] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.920509] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.920827] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.920832] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.918779] [ndv4:68758:0] ib_md.c:1604 UCX DEBUG mlx5_ib4: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.919005] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638546.919011] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638546.921150] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd7fde8000..0x2afd7fe6d000 on mlx5_ib1 lkey 0x80d00 rkey 0x80d00 access 0xf flags 0x3e4 | |
[1650638546.921159] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd7fde8018 of 544744 bytes with 128 elements | |
[1650638546.921167] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.921792] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x39e1020: adding gid fe80::15:5dff:fd34:1c to hash on device mlx5_ib1 port 1 index 0) | |
[1650638546.922106] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x39e1020: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 1) | |
[1650638546.922684] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x39e1020: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 2) | |
[1650638546.929044] [ndv4:68758:0] topo.c:99 UCX DEBUG bus id 0x105000000 doesn't exist. sys_dev = 4 | |
[1650638546.929052] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638546.931376] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x39e1020: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 3) | |
[1650638546.931623] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x39e1020: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 4) | |
[1650638546.931642] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x39e1020: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 5) | |
[1650638546.931657] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x39e1020: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 6) | |
[1650638546.931731] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x39e1020: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 7) | |
[1650638546.932040] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.932052] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x279c1c0 [id=88 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.932082] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 88 events 0x5 mode thread_spinlock | |
[1650638546.932106] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[9]=0x39e1020 using ud_verbs/mlx5_ib1:1 on worker 0x306d6f0 | |
[1650638546.932154] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.932160] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.932295] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.932300] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.932607] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638546.933426] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638546.933805] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x38983c0: created UD QP 0x3040 on mlx5_ib1:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638546.933814] [ndv4:69197:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638546.934286] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638546.934368] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.934373] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.934387] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638546.934391] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638546.934696] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd7fe6d000..0x2afd7fef2000 on mlx5_ib1 lkey 0x80e00 rkey 0x80e00 access 0xf flags 0x3e4 | |
[1650638546.934707] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd7fe6d018 of 544744 bytes with 128 elements | |
[1650638546.934716] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638546.934988] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x38983c0: adding gid fe80::15:5dff:fd34:1c to hash on device mlx5_ib1 port 1 index 0) | |
[1650638546.935057] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x38983c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 1) | |
[1650638546.935073] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x38983c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 2) | |
[1650638546.935420] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x38983c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 3) | |
[1650638546.935624] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x38983c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 4) | |
[1650638546.943025] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638546.943032] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638546.944712] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638546.944718] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638546.945443] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638546.945447] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638546.946001] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638546.946004] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638546.946308] [ndv4:68758:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638546.946528] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x38983c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 5) | |
[1650638546.947054] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x38983c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 6) | |
[1650638546.947270] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x38983c0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 7) | |
[1650638546.947275] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.947417] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638546.947422] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x3945b90 [id=89 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638546.947443] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 89 events 0x5 mode thread_spinlock | |
[1650638546.947454] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[10]=0x38983c0 using ud_mlx5/mlx5_ib1:1 on worker 0x306d6f0 | |
[1650638546.947580] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.947586] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.947787] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.947791] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.955278] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638546.956783] [ndv4:68758:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib5 vendor_id: 0x15b3 device_id: 4124 | |
[1650638546.957185] [ndv4:68758:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib5: disable ODP because it's not supported for DevX QP | |
[1650638546.957709] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x1a254b0 [id=67 ref 1] uct_ib_async_event_handler() to hash | |
[1650638546.957733] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 67 events 0x1 mode thread_spinlock | |
[1650638546.957736] [ndv4:68758:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib5' (InfiniBand channel adapter) with 1 ports | |
[1650638546.964739] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638546.964756] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.964760] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.964810] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.965457] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.965472] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638546.966857] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x3ed5020: created RC QP 0x3041 on mlx5_ib2:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638546.968019] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[11]=0x3ed5020 using rc_verbs/mlx5_ib2:1 on worker 0x306d6f0 | |
[1650638546.968181] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.968188] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.968658] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.968665] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.966759] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.966770] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.966775] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638546.966818] [ndv4:68758:0] ib_md.c:1319 UCX DEBUG mlx5_ib5: using registration cache | |
[1650638546.968478] [ndv4:68758:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638546.968485] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638546.969432] [ndv4:68758:0] ib_md.c:1604 UCX DEBUG mlx5_ib5: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638546.969776] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638546.969783] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638546.969225] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638546.970292] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.970312] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.970315] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.970364] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.970778] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x41f6010 of 8176 bytes with 127 elements | |
[1650638546.971024] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638546.971033] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.971068] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib2 length=2048) failed: Invalid argument | |
[1650638546.971072] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.971085] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x37cbc80 [id=92 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.971106] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 92 events 0x1 mode thread_spinlock | |
[1650638546.971117] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[12]=0x4022030 using rc_mlx5/mlx5_ib2:1 on worker 0x306d6f0 | |
[1650638546.971306] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.971312] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.971555] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.971560] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.971898] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638546.980866] [ndv4:68758:0] topo.c:99 UCX DEBUG bus id 0x106000000 doesn't exist. sys_dev = 5 | |
[1650638546.980874] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638546.981538] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638546.981543] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638546.982768] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638546.982785] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638546.982788] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638546.982842] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638546.983247] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638546.983255] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638546.983288] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib2 length=2048) failed: Invalid argument | |
[1650638546.983292] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638546.983299] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x27962a0 [id=94 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638546.983319] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 94 events 0x1 mode thread_spinlock | |
[1650638546.984001] [ndv4:69197:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638546.990929] [ndv4:69197:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x41f8050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3067 | |
[1650638546.991362] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.991369] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.991380] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2afd7fef4008 of 151544 bytes with 1052 elements | |
[1650638546.992163] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638546.992171] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638546.993744] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638546.993749] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638546.995373] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd8aa00000..0x2afd8d000000 on mlx5_ib2 lkey 0x80a00 rkey 0x80a00 access 0xf flags 0x3e4 | |
[1650638546.995402] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2afd8aa00018 of 39845864 bytes with 4752 elements | |
[1650638546.995572] [ndv4:69197:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x41f8050 | |
[1650638546.995609] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[13]=0x41f8050 using dc_mlx5/mlx5_ib2:1 on worker 0x306d6f0 | |
[1650638546.995864] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.995876] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.996100] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638546.996106] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638546.997297] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638547.005544] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.005855] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x27a8bf0: created UD QP 0x304a on mlx5_ib2:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.006415] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.006638] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.006648] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.010970] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638547.010977] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638547.011196] [ndv4:68758:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638547.015294] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.015303] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.015691] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd7ff19000..0x2afd7ff9e000 on mlx5_ib2 lkey 0x80b00 rkey 0x80b00 access 0xf flags 0x3e4 | |
[1650638547.015698] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd7ff19018 of 544744 bytes with 128 elements | |
[1650638547.015703] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.016519] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a8bf0: adding gid fe80::15:5dff:fd34:1d to hash on device mlx5_ib2 port 1 index 0) | |
[1650638547.017172] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a8bf0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 1) | |
[1650638547.018121] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a8bf0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 2) | |
[1650638547.018343] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a8bf0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 3) | |
[1650638547.018754] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a8bf0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 4) | |
[1650638547.019607] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a8bf0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 5) | |
[1650638547.020120] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a8bf0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 6) | |
[1650638547.020427] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x27a8bf0: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 7) | |
[1650638547.020716] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.020726] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x279a6e0 [id=95 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.020753] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 95 events 0x5 mode thread_spinlock | |
[1650638547.020777] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[14]=0x27a8bf0 using ud_verbs/mlx5_ib2:1 on worker 0x306d6f0 | |
[1650638547.020802] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.020807] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.021118] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.021124] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.021883] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638547.034573] [ndv4:68758:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib6 vendor_id: 0x15b3 device_id: 4124 | |
[1650638547.035002] [ndv4:68758:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib6: disable ODP because it's not supported for DevX QP | |
[1650638547.036548] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x2107da0 [id=69 ref 1] uct_ib_async_event_handler() to hash | |
[1650638547.036570] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 69 events 0x1 mode thread_spinlock | |
[1650638547.036573] [ndv4:68758:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib6' (InfiniBand channel adapter) with 1 ports | |
[1650638547.036792] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.036802] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.036807] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638547.036847] [ndv4:68758:0] ib_md.c:1319 UCX DEBUG mlx5_ib6: using registration cache | |
[1650638547.038394] [ndv4:68758:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638547.038400] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638547.038682] [ndv4:68758:0] ib_md.c:1604 UCX DEBUG mlx5_ib6: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638547.038785] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.038790] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.040349] [ndv4:68758:0] topo.c:99 UCX DEBUG bus id 0x107000000 doesn't exist. sys_dev = 6 | |
[1650638547.040354] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638547.044960] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.045328] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x2791550: created UD QP 0x304b on mlx5_ib2:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.045337] [ndv4:69197:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.045885] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.046073] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.046080] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.046135] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.046140] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.046539] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd8d07e000..0x2afd8d103000 on mlx5_ib2 lkey 0x80c00 rkey 0x80c00 access 0xf flags 0x3e4 | |
[1650638547.046550] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd8d07e018 of 544744 bytes with 128 elements | |
[1650638547.046559] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.047100] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x2791550: adding gid fe80::15:5dff:fd34:1d to hash on device mlx5_ib2 port 1 index 0) | |
[1650638547.053036] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638547.053042] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638547.060417] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x2791550: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 1) | |
[1650638547.060457] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x2791550: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 2) | |
[1650638547.060475] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x2791550: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 3) | |
[1650638547.060544] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x2791550: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 4) | |
[1650638547.060561] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x2791550: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 5) | |
[1650638547.066171] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638547.066177] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638547.069993] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x2791550: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 6) | |
[1650638547.070117] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x2791550: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 7) | |
[1650638547.070126] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.070276] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.070285] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2792340 [id=96 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.070311] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 96 events 0x5 mode thread_spinlock | |
[1650638547.070326] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[15]=0x2791550 using ud_mlx5/mlx5_ib2:1 on worker 0x306d6f0 | |
[1650638547.070451] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.070457] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.070655] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.070660] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.077245] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638547.077251] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638547.080703] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638547.082533] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.082572] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.082577] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.082665] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.083311] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.083318] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.084767] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x45f20a0: created RC QP 0x3033 on mlx5_ib3:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.085888] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[16]=0x45f20a0 using rc_verbs/mlx5_ib3:1 on worker 0x306d6f0 | |
[1650638547.085960] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.085966] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.086067] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.086072] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.090334] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638547.090341] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638547.090565] [ndv4:68758:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638547.095497] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638547.097360] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.097375] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.097378] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.097428] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.097888] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x4913010 of 8176 bytes with 127 elements | |
[1650638547.098139] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.098148] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.098183] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib3 length=2048) failed: Invalid argument | |
[1650638547.098187] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.098200] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x27b6ca0 [id=99 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.098270] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 99 events 0x1 mode thread_spinlock | |
[1650638547.098295] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[17]=0x473f030 using rc_mlx5/mlx5_ib3:1 on worker 0x306d6f0 | |
[1650638547.098435] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.098442] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.098658] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.098663] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.098863] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638547.099861] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.099875] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.099878] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.099932] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.100278] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.100284] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.100316] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib3 length=2048) failed: Invalid argument | |
[1650638547.100320] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.100329] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x43d4f40 [id=101 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.100351] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 101 events 0x1 mode thread_spinlock | |
[1650638547.101039] [ndv4:69197:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.103549] [ndv4:68758:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib7 vendor_id: 0x15b3 device_id: 4124 | |
[1650638547.103983] [ndv4:68758:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib7: disable ODP because it's not supported for DevX QP | |
[1650638547.104516] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x1a21010 [id=71 ref 1] uct_ib_async_event_handler() to hash | |
[1650638547.104548] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 71 events 0x1 mode thread_spinlock | |
[1650638547.104551] [ndv4:68758:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib7' (InfiniBand channel adapter) with 1 ports | |
[1650638547.104827] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.104837] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.104843] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638547.104887] [ndv4:68758:0] ib_md.c:1319 UCX DEBUG mlx5_ib7: using registration cache | |
[1650638547.107942] [ndv4:69197:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x4915050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3059 | |
[1650638547.108305] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.108312] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.108322] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2afd7ffa0008 of 151544 bytes with 1052 elements | |
[1650638547.112335] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd8d200000..0x2afd8f800000 on mlx5_ib3 lkey 0x80a00 rkey 0x80a00 access 0xf flags 0x3e4 | |
[1650638547.112349] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2afd8d200018 of 39845864 bytes with 4752 elements | |
[1650638547.112519] [ndv4:69197:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x4915050 | |
[1650638547.112539] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[18]=0x4915050 using dc_mlx5/mlx5_ib3:1 on worker 0x306d6f0 | |
[1650638547.112640] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.112647] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.112748] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.112754] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.113228] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638547.123991] [ndv4:68758:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638547.124001] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638547.124313] [ndv4:68758:0] ib_md.c:1604 UCX DEBUG mlx5_ib7: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638547.125571] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.125939] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x4af1060: created UD QP 0x303c on mlx5_ib3:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.126516] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.126719] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.126728] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.126837] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.126842] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.127180] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd8f904000..0x2afd8f989000 on mlx5_ib3 lkey 0x80b00 rkey 0x80b00 access 0xf flags 0x3e4 | |
[1650638547.127186] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd8f904018 of 544744 bytes with 128 elements | |
[1650638547.127191] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.127611] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4af1060: adding gid fe80::15:5dff:fd34:1e to hash on device mlx5_ib3 port 1 index 0) | |
[1650638547.134941] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.134954] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.136928] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4af1060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 1) | |
[1650638547.137114] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4af1060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 2) | |
[1650638547.137525] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4af1060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 3) | |
[1650638547.137571] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4af1060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 4) | |
[1650638547.138009] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4af1060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 5) | |
[1650638547.138616] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4af1060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 6) | |
[1650638547.143917] [ndv4:68758:0] topo.c:99 UCX DEBUG bus id 0x108000000 doesn't exist. sys_dev = 7 | |
[1650638547.143925] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638547.146051] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638547.146060] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638547.146083] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4af1060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 7) | |
[1650638547.146361] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.146369] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x3cbec00 [id=102 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.146398] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 102 events 0x5 mode thread_spinlock | |
[1650638547.146421] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[19]=0x4af1060 using ud_verbs/mlx5_ib3:1 on worker 0x306d6f0 | |
[1650638547.146569] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.146576] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.146764] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.146768] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.147413] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638547.148651] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.149066] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x4c0f050: created UD QP 0x303d on mlx5_ib3:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.149073] [ndv4:69197:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.149583] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.149911] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.149917] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.150014] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.150019] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.150406] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd8f989000..0x2afd8fa0e000 on mlx5_ib3 lkey 0x80c00 rkey 0x80c00 access 0xf flags 0x3e4 | |
[1650638547.150413] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd8f989018 of 544744 bytes with 128 elements | |
[1650638547.150418] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.155575] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638547.155583] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638547.157773] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4c0f050: adding gid fe80::15:5dff:fd34:1e to hash on device mlx5_ib3 port 1 index 0) | |
[1650638547.158076] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4c0f050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 1) | |
[1650638547.158352] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4c0f050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 2) | |
[1650638547.158699] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4c0f050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 3) | |
[1650638547.165988] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638547.165996] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638547.167726] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4c0f050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 4) | |
[1650638547.167875] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4c0f050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 5) | |
[1650638547.176911] [ndv4:68758:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638547.176919] [ndv4:68758:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638547.177008] [ndv4:68758:0] ucp_context.c:1556 UCX DEBUG created ucp context 0x1a0b0c0 0x1a0b0c0 [10 mds 42 tls] features 0x1 tl bitmap 0x3ffffffffff 0x0 | |
[1650638547.177442] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4c0f050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 6) | |
[1650638547.185868] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x4c0f050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 7) | |
[1650638547.185877] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.185912] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.185917] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x3cbeb40 [id=103 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.185937] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 103 events 0x5 mode thread_spinlock | |
[1650638547.185947] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[20]=0x4c0f050 using ud_mlx5/mlx5_ib3:1 on worker 0x306d6f0 | |
[1650638547.185990] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.185996] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.186098] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.186103] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.187194] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638547.201310] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.201331] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.201334] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.201385] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.201995] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.202002] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.203295] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x4d0f0a0: created RC QP 0x3052 on mlx5_ib4:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.204423] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[21]=0x4d0f0a0 using rc_verbs/mlx5_ib4:1 on worker 0x306d6f0 | |
[1650638547.204710] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.204717] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.204888] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.204893] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.205779] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638547.207881] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.207894] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.207896] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.207947] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.208429] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x5030010 of 8176 bytes with 127 elements | |
[1650638547.208659] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.208666] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.208702] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib4 length=2048) failed: Invalid argument | |
[1650638547.208706] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.208718] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x43d4630 [id=106 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.208746] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 106 events 0x1 mode thread_spinlock | |
[1650638547.208756] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[22]=0x4e5c030 using rc_mlx5/mlx5_ib4:1 on worker 0x306d6f0 | |
[1650638547.208883] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.208889] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.208960] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.208966] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.219556] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638547.220918] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.220934] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.220937] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.220992] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.221360] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.221367] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.221402] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib4 length=2048) failed: Invalid argument | |
[1650638547.221406] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.221416] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x4e64f60 [id=108 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.221438] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 108 events 0x1 mode thread_spinlock | |
[1650638547.222117] [ndv4:69197:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.229088] [ndv4:69197:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x5032050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3068 | |
[1650638547.229358] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.229365] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.229374] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2afd7ffc7008 of 151544 bytes with 1052 elements | |
[1650638547.233219] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd8fc00000..0x2afd92200000 on mlx5_ib4 lkey 0x80e00 rkey 0x80e00 access 0xf flags 0x3e4 | |
[1650638547.233234] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2afd8fc00018 of 39845864 bytes with 4752 elements | |
[1650638547.233404] [ndv4:69197:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x5032050 | |
[1650638547.233420] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[23]=0x5032050 using dc_mlx5/mlx5_ib4:1 on worker 0x306d6f0 | |
[1650638547.233495] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.233502] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.233557] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.233562] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.233917] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638547.234031] [ndv4:68758:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 8447 bytes with hugetlb | |
[1650638547.234124] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8368 | |
[1650638547.234153] [ndv4:68758:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 4292720 bytes with hugetlb | |
[1650638547.234163] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool mm_recv_desc: allocated chunk 0x2ad1e5841018 of 4296680 bytes with 512 elements | |
[1650638547.234742] [ndv4:68758:0] mm_iface.c:600 UCX DEBUG created mm iface 0x2187280 FIFO id 0xd800f va 0x2ad1dfeff000 size 12288 (128 x 64 elems) | |
[1650638547.234792] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[0]=0x2187280 using sysv/memory on worker 0x2a61550 | |
[1650638547.234842] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.234850] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.234965] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.234970] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.235493] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638547.236970] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.237014] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.237017] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.237068] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.237640] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.237655] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.238775] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x2192490: created RC QP 0x30ea on mlx5_ib0:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.247009] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.247383] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x520e060: created UD QP 0x305e on mlx5_ib4:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.247969] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.248322] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.248333] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.248371] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.248376] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.248742] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd9220f000..0x2afd92294000 on mlx5_ib4 lkey 0x80f00 rkey 0x80f00 access 0xf flags 0x3e4 | |
[1650638547.248753] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd9220f018 of 544744 bytes with 128 elements | |
[1650638547.248761] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.250685] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[1]=0x2192490 using rc_verbs/mlx5_ib0:1 on worker 0x2a61550 | |
[1650638547.250996] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.251004] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.251272] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.251278] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.252084] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638547.252610] [ndv4:68758:0] ib_device.c:1394 UCX DEBUG max IB CQE size is 128 | |
[1650638547.253699] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.253709] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.253712] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.253761] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.254309] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x2d88010 of 8176 bytes with 127 elements | |
[1650638547.254599] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.254624] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.254672] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638547.254676] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.264366] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x20f5b60 [id=78 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.264398] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 78 events 0x1 mode thread_spinlock | |
[1650638547.264779] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x520e060: adding gid fe80::15:5dff:fd34:1f to hash on device mlx5_ib4 port 1 index 0) | |
[1650638547.264857] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[2]=0x21a0010 using rc_mlx5/mlx5_ib0:1 on worker 0x2a61550 | |
[1650638547.264917] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.264924] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.265022] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.265028] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.265540] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638547.273463] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x520e060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 1) | |
[1650638547.273712] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x520e060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 2) | |
[1650638547.273817] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x520e060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 3) | |
[1650638547.274028] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x520e060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 4) | |
[1650638547.274286] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x520e060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 5) | |
[1650638547.274575] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x520e060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 6) | |
[1650638547.274625] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x520e060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 7) | |
[1650638547.274873] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.274887] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.274891] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.274905] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.275303] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.275314] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.275348] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638547.275352] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.275361] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x20f4b60 [id=80 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.275382] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 80 events 0x1 mode thread_spinlock | |
[1650638547.274989] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.275003] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x44f2fc0 [id=109 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.275033] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 109 events 0x5 mode thread_spinlock | |
[1650638547.275065] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[24]=0x520e060 using ud_verbs/mlx5_ib4:1 on worker 0x306d6f0 | |
[1650638547.275176] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.275183] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.275327] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.275332] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.275546] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638547.276103] [ndv4:68758:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.276358] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.276716] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x532c460: created UD QP 0x305f on mlx5_ib4:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.276727] [ndv4:69197:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.277271] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.277556] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.277563] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.277652] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.277657] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.278012] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd92294000..0x2afd92319000 on mlx5_ib4 lkey 0x81000 rkey 0x81000 access 0xf flags 0x3e4 | |
[1650638547.278021] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd92294018 of 544744 bytes with 128 elements | |
[1650638547.278029] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.278382] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x532c460: adding gid fe80::15:5dff:fd34:1f to hash on device mlx5_ib4 port 1 index 0) | |
[1650638547.278711] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x532c460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 1) | |
[1650638547.278929] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x532c460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 2) | |
[1650638547.279175] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x532c460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 3) | |
[1650638547.279429] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x532c460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 4) | |
[1650638547.279613] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x532c460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 5) | |
[1650638547.279743] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x532c460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 6) | |
[1650638547.279833] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x532c460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 7) | |
[1650638547.279843] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.279932] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.279938] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x4ceae30 [id=110 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.279965] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 110 events 0x5 mode thread_spinlock | |
[1650638547.279976] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[25]=0x532c460 using ud_mlx5/mlx5_ib4:1 on worker 0x306d6f0 | |
[1650638547.280042] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.280049] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.280149] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.280154] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.280884] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638547.282391] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.282413] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.282416] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.282472] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.283103] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.283109] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.284525] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x542c0a0: created RC QP 0x2ba8 on mlx5_ib5:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.285637] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[26]=0x542c0a0 using rc_verbs/mlx5_ib5:1 on worker 0x306d6f0 | |
[1650638547.285677] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.285684] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.285900] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.285905] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.284183] [ndv4:68758:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x2d8a050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3113 | |
[1650638547.284353] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.284361] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.284388] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ad1dff04008 of 151544 bytes with 1052 elements | |
[1650638547.288324] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1e5e00000..0x2ad1e8400000 on mlx5_ib0 lkey 0x80f00 rkey 0x80f00 access 0xf flags 0x3e4 | |
[1650638547.288347] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ad1e5e00018 of 39845864 bytes with 4752 elements | |
[1650638547.288491] [ndv4:68758:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x2d8a050 | |
[1650638547.288528] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[3]=0x2d8a050 using dc_mlx5/mlx5_ib0:1 on worker 0x2a61550 | |
[1650638547.288965] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.288976] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.289434] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.289439] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.290096] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638547.291170] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.291613] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x219d650: created UD QP 0x30f3 on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.292233] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.292378] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.292384] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.292551] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.292556] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.292956] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1dff29000..0x2ad1dffae000 on mlx5_ib0 lkey 0x81000 rkey 0x81000 access 0xf flags 0x3e4 | |
[1650638547.292962] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1dff29018 of 544744 bytes with 128 elements | |
[1650638547.292966] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.296494] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638547.298025] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.298043] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.298046] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.298101] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.298567] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x574d010 of 8176 bytes with 127 elements | |
[1650638547.298846] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.298852] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.298894] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib5 length=2048) failed: Invalid argument | |
[1650638547.298898] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.298909] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x532c3e0 [id=113 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.298933] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 113 events 0x1 mode thread_spinlock | |
[1650638547.298944] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[27]=0x5579030 using rc_mlx5/mlx5_ib5:1 on worker 0x306d6f0 | |
[1650638547.298977] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.298985] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.299089] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.299094] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.299385] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638547.300513] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.300531] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.300534] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.300590] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.300926] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.300933] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.300968] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib5 length=2048) failed: Invalid argument | |
[1650638547.300972] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.300981] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x3eb09a0 [id=115 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.301005] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 115 events 0x1 mode thread_spinlock | |
[1650638547.301645] [ndv4:69197:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.304135] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219d650: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638547.304243] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219d650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638547.304484] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219d650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638547.304655] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219d650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638547.304870] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219d650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638547.305406] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219d650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638547.305919] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219d650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638547.308600] [ndv4:69197:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x574f050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2bce | |
[1650638547.308841] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.308849] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.308860] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2afd94b1a008 of 151544 bytes with 1052 elements | |
[1650638547.312786] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd92400000..0x2afd94a00000 on mlx5_ib5 lkey 0x80a00 rkey 0x80a00 access 0xf flags 0x3e4 | |
[1650638547.312801] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2afd92400018 of 39845864 bytes with 4752 elements | |
[1650638547.312969] [ndv4:69197:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x574f050 | |
[1650638547.312988] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[28]=0x574f050 using dc_mlx5/mlx5_ib5:1 on worker 0x306d6f0 | |
[1650638547.313010] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.313017] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.313071] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.313076] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.313334] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638547.314183] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.314526] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x451f480: created UD QP 0x2bb1 on mlx5_ib5:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.314995] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.315055] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.315064] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.315085] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.315090] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.315436] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd94b3f000..0x2afd94bc4000 on mlx5_ib5 lkey 0x80b00 rkey 0x80b00 access 0xf flags 0x3e4 | |
[1650638547.315445] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd94b3f018 of 544744 bytes with 128 elements | |
[1650638547.315452] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.315511] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x451f480: adding gid fe80::15:5dff:fd34:20 to hash on device mlx5_ib5 port 1 index 0) | |
[1650638547.315542] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x451f480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 1) | |
[1650638547.315568] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x451f480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 2) | |
[1650638547.315590] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x451f480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 3) | |
[1650638547.315616] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x451f480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 4) | |
[1650638547.315641] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x451f480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 5) | |
[1650638547.315670] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x451f480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 6) | |
[1650638547.315695] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x451f480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 7) | |
[1650638547.315957] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.315967] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x3eb07d0 [id=116 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.315998] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 116 events 0x5 mode thread_spinlock | |
[1650638547.316024] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[29]=0x451f480 using ud_verbs/mlx5_ib5:1 on worker 0x306d6f0 | |
[1650638547.316046] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.316052] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.316125] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.316129] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.316304] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638547.316921] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.317275] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x5a49050: created UD QP 0x2bb2 on mlx5_ib5:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.317283] [ndv4:69197:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.317671] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.317754] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.317761] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.317791] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.317795] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.318064] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd94bc4000..0x2afd94c49000 on mlx5_ib5 lkey 0x80c00 rkey 0x80c00 access 0xf flags 0x3e4 | |
[1650638547.318072] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd94bc4018 of 544744 bytes with 128 elements | |
[1650638547.318080] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.318152] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a49050: adding gid fe80::15:5dff:fd34:20 to hash on device mlx5_ib5 port 1 index 0) | |
[1650638547.324713] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219d650: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638547.325075] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.332622] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a49050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 1) | |
[1650638547.332937] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x20f6a70 [id=81 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.332974] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 81 events 0x5 mode thread_spinlock | |
[1650638547.333119] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[4]=0x219d650 using ud_verbs/mlx5_ib0:1 on worker 0x2a61550 | |
[1650638547.333146] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.333153] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.333416] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.333421] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.334063] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638547.334986] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.335391] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x21a8980: created UD QP 0x30f5 on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.335409] [ndv4:68758:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.336023] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.336140] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.336146] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.336185] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.336189] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.336665] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1e845b000..0x2ad1e84e0000 on mlx5_ib0 lkey 0x81100 rkey 0x81100 access 0xf flags 0x3e4 | |
[1650638547.336671] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1e845b018 of 544744 bytes with 128 elements | |
[1650638547.336675] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.337245] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x21a8980: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638547.337523] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x21a8980: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638547.337856] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x21a8980: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638547.338006] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x21a8980: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638547.338026] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x21a8980: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638547.338040] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x21a8980: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638547.338272] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x21a8980: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638547.338501] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x21a8980: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638547.338511] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.338542] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.338547] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x20f7ba0 [id=82 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.338568] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 82 events 0x5 mode thread_spinlock | |
[1650638547.338582] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[5]=0x21a8980 using ud_mlx5/mlx5_ib0:1 on worker 0x2a61550 | |
[1650638547.338664] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.338670] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.338791] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.338796] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.341902] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a49050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 2) | |
[1650638547.351658] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a49050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 3) | |
[1650638547.351874] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a49050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 4) | |
[1650638547.352316] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a49050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 5) | |
[1650638547.357289] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638547.358368] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.358378] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.358381] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.358397] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.358877] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.358882] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.359814] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x3331050: created RC QP 0x3042 on mlx5_ib1:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.361018] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[6]=0x3331050 using rc_verbs/mlx5_ib1:1 on worker 0x2a61550 | |
[1650638547.361097] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.361102] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.361182] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.361187] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.361544] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638547.362937] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.362945] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.362948] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.362964] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.363489] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x328a010 of 8176 bytes with 127 elements | |
[1650638547.363741] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.363747] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.363785] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638547.363789] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.363800] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x20f85e0 [id=85 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.363821] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 85 events 0x1 mode thread_spinlock | |
[1650638547.363830] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[7]=0x31b7020 using rc_mlx5/mlx5_ib1:1 on worker 0x2a61550 | |
[1650638547.363904] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.363909] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.364071] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.364076] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.364611] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638547.364677] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a49050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 6) | |
[1650638547.364913] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a49050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 7) | |
[1650638547.364931] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.365026] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.365034] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x54073a0 [id=117 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.365062] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 117 events 0x5 mode thread_spinlock | |
[1650638547.365084] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[30]=0x5a49050 using ud_mlx5/mlx5_ib5:1 on worker 0x306d6f0 | |
[1650638547.365413] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.365420] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.365481] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.365486] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.366121] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.366135] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.366138] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.366194] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.366614] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.366620] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.366656] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638547.366660] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.366666] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x1a27080 [id=87 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.366687] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 87 events 0x1 mode thread_spinlock | |
[1650638547.367294] [ndv4:68758:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.366279] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638547.368347] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.368367] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.368370] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.368420] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.369042] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.369049] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.370518] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x5b490a0: created RC QP 0x2b1a on mlx5_ib6:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.371647] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[31]=0x5b490a0 using rc_verbs/mlx5_ib6:1 on worker 0x306d6f0 | |
[1650638547.371771] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.371780] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.371909] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.371914] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.372064] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638547.373277] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.373297] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.373300] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.373352] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.373746] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x5e6a010 of 8176 bytes with 127 elements | |
[1650638547.374147] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.374156] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.374304] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib6 length=2048) failed: Invalid argument | |
[1650638547.374311] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.374324] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x5b242d0 [id=120 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.374347] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 120 events 0x1 mode thread_spinlock | |
[1650638547.374363] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[32]=0x5c96030 using rc_mlx5/mlx5_ib6:1 on worker 0x306d6f0 | |
[1650638547.374511] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.374517] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.374717] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.374722] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.375472] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638547.374928] [ndv4:68758:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x34d6010: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x306b | |
[1650638547.375033] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.375039] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.375048] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ad1dffb0008 of 151544 bytes with 1052 elements | |
[1650638547.377671] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.377692] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.377696] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.377755] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.378167] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.378176] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.378275] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib6 length=2048) failed: Invalid argument | |
[1650638547.378281] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.378292] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x5c9ef60 [id=122 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.378318] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 122 events 0x1 mode thread_spinlock | |
[1650638547.379040] [ndv4:69197:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.378995] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1e8600000..0x2ad1eac00000 on mlx5_ib1 lkey 0x81100 rkey 0x81100 access 0xf flags 0x3e4 | |
[1650638547.379013] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ad1e8600018 of 39845864 bytes with 4752 elements | |
[1650638547.379156] [ndv4:68758:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x34d6010 | |
[1650638547.379190] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[8]=0x34d6010 using dc_mlx5/mlx5_ib1:1 on worker 0x2a61550 | |
[1650638547.379382] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.379391] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.379588] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.379593] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.386200] [ndv4:69197:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x5e6c050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2b21 | |
[1650638547.386387] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.386395] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.386405] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2afd9744a008 of 151544 bytes with 1052 elements | |
[1650638547.390001] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638547.390374] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd94e00000..0x2afd97400000 on mlx5_ib6 lkey 0x80900 rkey 0x80900 access 0xf flags 0x3e4 | |
[1650638547.390394] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2afd94e00018 of 39845864 bytes with 4752 elements | |
[1650638547.390567] [ndv4:69197:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x5e6c050 | |
[1650638547.390597] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[33]=0x5e6c050 using dc_mlx5/mlx5_ib6:1 on worker 0x306d6f0 | |
[1650638547.391171] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.391666] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x219c9d0: created UD QP 0x304e on mlx5_ib1:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.392288] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.399588] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.399603] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.399728] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.399732] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.400160] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638547.402749] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.403165] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x6048060: created UD QP 0x2b26 on mlx5_ib6:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.403898] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.404321] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.404332] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.404375] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.404380] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.404741] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd9746f000..0x2afd974f4000 on mlx5_ib6 lkey 0x80a00 rkey 0x80a00 access 0xf flags 0x3e4 | |
[1650638547.404750] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd9746f018 of 544744 bytes with 128 elements | |
[1650638547.404759] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.404876] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6048060: adding gid fe80::15:5dff:fd34:21 to hash on device mlx5_ib6 port 1 index 0) | |
[1650638547.406083] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6048060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 1) | |
[1650638547.410336] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.410346] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.410496] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.410501] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.410968] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1eace1000..0x2ad1ead66000 on mlx5_ib1 lkey 0x81200 rkey 0x81200 access 0xf flags 0x3e4 | |
[1650638547.410975] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1eace1018 of 544744 bytes with 128 elements | |
[1650638547.410979] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.411479] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219c9d0: adding gid fe80::15:5dff:fd34:1c to hash on device mlx5_ib1 port 1 index 0) | |
[1650638547.411697] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219c9d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 1) | |
[1650638547.411782] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219c9d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 2) | |
[1650638547.411866] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219c9d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 3) | |
[1650638547.411900] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219c9d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 4) | |
[1650638547.411935] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219c9d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 5) | |
[1650638547.411962] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219c9d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 6) | |
[1650638547.411988] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x219c9d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 7) | |
[1650638547.412319] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.412328] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x219c5f0 [id=88 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.412353] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 88 events 0x5 mode thread_spinlock | |
[1650638547.412364] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[9]=0x219c9d0 using ud_verbs/mlx5_ib1:1 on worker 0x2a61550 | |
[1650638547.412379] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.412385] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.412429] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.412434] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.412622] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638547.413396] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.413790] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x2a813d0: created UD QP 0x304f on mlx5_ib1:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.413798] [ndv4:68758:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.414450] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.414483] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.414489] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.414511] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.414515] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.414839] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1ead66000..0x2ad1eadeb000 on mlx5_ib1 lkey 0x81300 rkey 0x81300 access 0xf flags 0x3e4 | |
[1650638547.414846] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1ead66018 of 544744 bytes with 128 elements | |
[1650638547.414850] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.414940] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2a813d0: adding gid fe80::15:5dff:fd34:1c to hash on device mlx5_ib1 port 1 index 0) | |
[1650638547.415008] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2a813d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 1) | |
[1650638547.415039] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2a813d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 2) | |
[1650638547.415062] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2a813d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 3) | |
[1650638547.415087] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2a813d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 4) | |
[1650638547.415110] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2a813d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 5) | |
[1650638547.415134] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2a813d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 6) | |
[1650638547.415159] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2a813d0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 7) | |
[1650638547.415165] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.415195] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.415199] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x2a81ed0 [id=89 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.415262] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 89 events 0x5 mode thread_spinlock | |
[1650638547.415271] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[10]=0x2a813d0 using ud_mlx5/mlx5_ib1:1 on worker 0x2a61550 | |
[1650638547.415283] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.415287] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.415341] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.415344] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.415755] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638547.416984] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.417000] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.417003] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.417051] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.417558] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.417564] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.418585] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x38c9020: created RC QP 0x304c on mlx5_ib2:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.419537] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6048060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 2) | |
[1650638547.419564] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6048060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 3) | |
[1650638547.419708] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6048060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 4) | |
[1650638547.420012] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6048060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 5) | |
[1650638547.420285] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6048060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 6) | |
[1650638547.419794] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[11]=0x38c9020 using rc_verbs/mlx5_ib2:1 on worker 0x2a61550 | |
[1650638547.419848] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.419854] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.420013] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.420019] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.420738] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638547.431119] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6048060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 7) | |
[1650638547.431452] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.431466] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x6048f20 [id=123 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.431496] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 123 events 0x5 mode thread_spinlock | |
[1650638547.431522] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[34]=0x6048060 using ud_verbs/mlx5_ib6:1 on worker 0x306d6f0 | |
[1650638547.431729] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.431736] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.431869] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.431874] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.432764] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638547.442733] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.442753] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.442757] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.442808] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.444535] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x3bea010 of 8176 bytes with 127 elements | |
[1650638547.444539] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.444899] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x53593a0: created UD QP 0x2b27 on mlx5_ib6:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.444908] [ndv4:69197:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.444849] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.444860] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.444949] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib2 length=2048) failed: Invalid argument | |
[1650638547.444954] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.444969] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x3339b90 [id=92 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.444992] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 92 events 0x1 mode thread_spinlock | |
[1650638547.445013] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[12]=0x3a16030 using rc_mlx5/mlx5_ib2:1 on worker 0x2a61550 | |
[1650638547.445113] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.445120] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.445268] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.445274] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.445432] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.445590] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.445599] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.445672] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.445677] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.446174] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd974f4000..0x2afd97579000 on mlx5_ib6 lkey 0x80e00 rkey 0x80e00 access 0xf flags 0x3e4 | |
[1650638547.446182] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd974f4018 of 544744 bytes with 128 elements | |
[1650638547.446186] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.446343] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x53593a0: adding gid fe80::15:5dff:fd34:21 to hash on device mlx5_ib6 port 1 index 0) | |
[1650638547.446571] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x53593a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 1) | |
[1650638547.446645] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x53593a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 2) | |
[1650638547.455748] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638547.456525] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x53593a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 3) | |
[1650638547.456868] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x53593a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 4) | |
[1650638547.457119] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x53593a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 5) | |
[1650638547.457433] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x53593a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 6) | |
[1650638547.457453] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x53593a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 7) | |
[1650638547.457459] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.457489] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.457494] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x5b240e0 [id=124 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.457515] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 124 events 0x5 mode thread_spinlock | |
[1650638547.457525] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[35]=0x53593a0 using ud_mlx5/mlx5_ib6:1 on worker 0x306d6f0 | |
[1650638547.457626] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.457633] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.457692] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.457697] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.458365] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638547.465611] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.465636] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.465640] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.465703] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.466159] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.466169] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.466235] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib2 length=2048) failed: Invalid argument | |
[1650638547.466240] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.466252] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x3a1efc0 [id=94 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.466277] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 94 events 0x1 mode thread_spinlock | |
[1650638547.467035] [ndv4:68758:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.467042] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.467066] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.467070] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.467125] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.467956] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.467964] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.469444] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x62660a0: created RC QP 0x2b0d on mlx5_ib7:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.470668] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[36]=0x62660a0 using rc_verbs/mlx5_ib7:1 on worker 0x306d6f0 | |
[1650638547.470786] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.470793] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.471011] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.471017] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.471403] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638547.472533] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.472546] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.472550] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.472597] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.473018] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x6587010 of 8176 bytes with 127 elements | |
[1650638547.473334] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.473344] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.473384] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib7 length=2048) failed: Invalid argument | |
[1650638547.473388] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.473401] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x61668c0 [id=127 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.473430] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 127 events 0x1 mode thread_spinlock | |
[1650638547.473440] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[37]=0x63b3030 using rc_mlx5/mlx5_ib7:1 on worker 0x306d6f0 | |
[1650638547.473468] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.473475] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.473545] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.473550] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.473756] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638547.475130] [ndv4:68758:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x3bec050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3075 | |
[1650638547.475439] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.475449] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.475459] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ad1dffd7008 of 151544 bytes with 1052 elements | |
[1650638547.474980] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.474995] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.474998] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.475056] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.475436] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.475443] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.475476] [ndv4:69197:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib7 length=2048) failed: Invalid argument | |
[1650638547.475480] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.475487] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x6241760 [id=129 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.475510] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 129 events 0x1 mode thread_spinlock | |
[1650638547.476146] [ndv4:69197:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.479245] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1eae00000..0x2ad1ed400000 on mlx5_ib2 lkey 0x80d00 rkey 0x80d00 access 0xf flags 0x3e4 | |
[1650638547.479262] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ad1eae00018 of 39845864 bytes with 4752 elements | |
[1650638547.479398] [ndv4:68758:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x3bec050 | |
[1650638547.479431] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[13]=0x3bec050 using dc_mlx5/mlx5_ib2:1 on worker 0x2a61550 | |
[1650638547.479482] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.479492] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.479582] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.479588] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.483385] [ndv4:69197:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x6589050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2b14 | |
[1650638547.483671] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.483680] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.483689] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2afd99d7a008 of 151544 bytes with 1052 elements | |
[1650638547.487563] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd97600000..0x2afd99c00000 on mlx5_ib7 lkey 0x80a00 rkey 0x80a00 access 0xf flags 0x3e4 | |
[1650638547.487578] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2afd97600018 of 39845864 bytes with 4752 elements | |
[1650638547.487743] [ndv4:69197:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x6589050 | |
[1650638547.487765] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[38]=0x6589050 using dc_mlx5/mlx5_ib7:1 on worker 0x306d6f0 | |
[1650638547.487810] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.487819] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.487868] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.487874] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.488301] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638547.489157] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.489504] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x5a76570: created UD QP 0x2b19 on mlx5_ib7:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.489976] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.490190] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.490201] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.490277] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.490282] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.490624] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd99d9f000..0x2afd99e24000 on mlx5_ib7 lkey 0x80b00 rkey 0x80b00 access 0xf flags 0x3e4 | |
[1650638547.490635] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd99d9f018 of 544744 bytes with 128 elements | |
[1650638547.490643] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.490767] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a76570: adding gid fe80::15:5dff:fd34:22 to hash on device mlx5_ib7 port 1 index 0) | |
[1650638547.490983] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a76570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 1) | |
[1650638547.491007] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a76570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 2) | |
[1650638547.491023] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a76570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 3) | |
[1650638547.491255] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a76570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 4) | |
[1650638547.491274] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a76570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 5) | |
[1650638547.491531] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a76570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 6) | |
[1650638547.491552] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x5a76570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 7) | |
[1650638547.490952] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638547.492057] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.492435] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x2f55500: created UD QP 0x3055 on mlx5_ib2:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.491799] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.491810] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x5a76fc0 [id=130 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.491844] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 130 events 0x5 mode thread_spinlock | |
[1650638547.491875] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[39]=0x5a76570 using ud_verbs/mlx5_ib7:1 on worker 0x306d6f0 | |
[1650638547.492200] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.492253] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.492381] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.492387] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.492857] [ndv4:69197:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638547.493724] [ndv4:69197:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.493015] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.493450] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.493458] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.493621] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.493626] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.494029] [ndv4:69197:0] ib_iface.c:994 UCX DEBUG iface=0x6883460: created UD QP 0x2b1a on mlx5_ib7:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.494038] [ndv4:69197:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.494521] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.494968] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.494975] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.494994] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.494999] [ndv4:69197:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.494140] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1ed5ec000..0x2ad1ed671000 on mlx5_ib2 lkey 0x80e00 rkey 0x80e00 access 0xf flags 0x3e4 | |
[1650638547.494147] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1ed5ec018 of 544744 bytes with 128 elements | |
[1650638547.494151] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.495006] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2f55500: adding gid fe80::15:5dff:fd34:1d to hash on device mlx5_ib2 port 1 index 0) | |
[1650638547.496157] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2f55500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 1) | |
[1650638547.496838] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2f55500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 2) | |
[1650638547.497528] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2f55500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 3) | |
[1650638547.497821] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2f55500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 4) | |
[1650638547.498165] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2f55500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 5) | |
[1650638547.495323] [ndv4:69197:0] ib_md.c:812 UCX DEBUG registered memory 0x2afd99e24000..0x2afd99ea9000 on mlx5_ib7 lkey 0x80c00 rkey 0x80c00 access 0xf flags 0x3e4 | |
[1650638547.495333] [ndv4:69197:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2afd99e24018 of 544744 bytes with 128 elements | |
[1650638547.495341] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.496315] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6883460: adding gid fe80::15:5dff:fd34:22 to hash on device mlx5_ib7 port 1 index 0) | |
[1650638547.497062] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6883460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 1) | |
[1650638547.498646] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2f55500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 6) | |
[1650638547.499008] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2f55500: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 7) | |
[1650638547.499324] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.499333] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x21901c0 [id=95 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.499363] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 95 events 0x5 mode thread_spinlock | |
[1650638547.499373] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[14]=0x2f55500 using ud_verbs/mlx5_ib2:1 on worker 0x2a61550 | |
[1650638547.499517] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.499524] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.499645] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.499651] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.500173] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib2:1 | |
[1650638547.501302] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.501664] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x2185600: created UD QP 0x3057 on mlx5_ib2:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.501672] [ndv4:68758:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.502276] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.502524] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.502530] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.510688] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.510698] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.511227] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1ed671000..0x2ad1ed6f6000 on mlx5_ib2 lkey 0x80f00 rkey 0x80f00 access 0xf flags 0x3e4 | |
[1650638547.511233] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1ed671018 of 544744 bytes with 128 elements | |
[1650638547.511237] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.511275] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2185600: adding gid fe80::15:5dff:fd34:1d to hash on device mlx5_ib2 port 1 index 0) | |
[1650638547.511739] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2185600: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 1) | |
[1650638547.512101] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2185600: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 2) | |
[1650638547.512119] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2185600: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 3) | |
[1650638547.512767] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2185600: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 4) | |
[1650638547.513167] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2185600: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 5) | |
[1650638547.513599] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2185600: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 6) | |
[1650638547.513860] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x2185600: adding gid fe80:: to hash on device mlx5_ib2 port 1 index 7) | |
[1650638547.513867] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib2:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.513896] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.513901] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x328c960 [id=96 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.513924] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 96 events 0x5 mode thread_spinlock | |
[1650638547.513938] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[15]=0x2185600 using ud_mlx5/mlx5_ib2:1 on worker 0x2a61550 | |
[1650638547.514042] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.514047] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.514145] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.514151] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.514706] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6883460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 2) | |
[1650638547.515070] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6883460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 3) | |
[1650638547.515397] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6883460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 4) | |
[1650638547.525012] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638547.525993] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6883460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 5) | |
[1650638547.526029] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6883460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 6) | |
[1650638547.526131] [ndv4:69197:0] ud_iface.c:393 UCX DEBUG iface 0x6883460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 7) | |
[1650638547.526139] [ndv4:69197:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.526273] [ndv4:69197:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.526280] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x6883f80 [id=131 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.526302] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 131 events 0x5 mode thread_spinlock | |
[1650638547.526314] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[40]=0x6883460 using ud_mlx5/mlx5_ib7:1 on worker 0x306d6f0 | |
[1650638547.526378] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool uct_scopy_iface_tx_mp: align 64, maxelems 4294967295, elemsize 736 | |
[1650638547.526440] [ndv4:69197:0] ucp_worker.c:1159 UCX DEBUG created interface[41]=0x6193600 using cma/memory on worker 0x306d6f0 | |
[1650638547.526447] [ndv4:69197:0] ucp_worker.c:982 UCX DEBUG selected scalable tl bitmap: 0x3ffffffffff 0x0 (42 tls) | |
[1650638547.527533] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.527553] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.527556] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.527606] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.529467] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.529474] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.530733] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x3fe60a0: created RC QP 0x303e on mlx5_ib3:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.532003] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[16]=0x3fe60a0 using rc_verbs/mlx5_ib3:1 on worker 0x2a61550 | |
[1650638547.532039] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.532045] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.532118] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.532123] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.532555] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638547.533959] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.533971] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.533974] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.534022] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.534527] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x4307010 of 8176 bytes with 127 elements | |
[1650638547.534820] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.534827] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.534862] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib3 length=2048) failed: Invalid argument | |
[1650638547.534865] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.534876] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x21aaf80 [id=99 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.534905] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 99 events 0x1 mode thread_spinlock | |
[1650638547.534925] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[17]=0x4133030 using rc_mlx5/mlx5_ib3:1 on worker 0x2a61550 | |
[1650638547.534971] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.534976] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.535172] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.535177] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.535473] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638547.536771] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.536788] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.536791] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.536846] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.537329] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.537336] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.537368] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib3 length=2048) failed: Invalid argument | |
[1650638547.537372] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.537380] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x413bfc0 [id=101 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.537403] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 101 events 0x1 mode thread_spinlock | |
[1650638547.538143] [ndv4:68758:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.539080] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x6765fb0 [id=75 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539112] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 75 events 0x0 mode thread_spinlock | |
[1650638547.539182] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x6241370 [id=76 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539198] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 76 events 0x0 mode thread_spinlock | |
[1650638547.539425] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2794a30 [id=77 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539446] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 77 events 0x0 mode thread_spinlock | |
[1650638547.539465] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2794a70 [id=79 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539511] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 79 events 0x0 mode thread_spinlock | |
[1650638547.539562] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2794ab0 [id=83 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539579] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 83 events 0x0 mode thread_spinlock | |
[1650638547.539595] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2794af0 [id=84 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539600] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 84 events 0x0 mode thread_spinlock | |
[1650638547.539614] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x2794b30 [id=86 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539619] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 86 events 0x0 mode thread_spinlock | |
[1650638547.539653] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x308ddc0 [id=90 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539658] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 90 events 0x0 mode thread_spinlock | |
[1650638547.539674] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x308de00 [id=91 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539678] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 91 events 0x0 mode thread_spinlock | |
[1650638547.539695] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x308de40 [id=93 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539700] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 93 events 0x0 mode thread_spinlock | |
[1650638547.539734] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x6193f60 [id=97 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539758] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 97 events 0x0 mode thread_spinlock | |
[1650638547.539774] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x6193fa0 [id=98 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539792] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 98 events 0x0 mode thread_spinlock | |
[1650638547.539808] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x202f820 [id=100 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539826] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 100 events 0x0 mode thread_spinlock | |
[1650638547.539862] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x202f860 [id=104 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539880] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 104 events 0x0 mode thread_spinlock | |
[1650638547.539895] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x202f8a0 [id=105 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.539931] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 105 events 0x0 mode thread_spinlock | |
[1650638547.539945] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x202f8e0 [id=107 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.540006] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 107 events 0x0 mode thread_spinlock | |
[1650638547.540041] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x308de80 [id=111 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.540067] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 111 events 0x0 mode thread_spinlock | |
[1650638547.540084] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x308dec0 [id=112 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.540113] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 112 events 0x0 mode thread_spinlock | |
[1650638547.540265] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x308df00 [id=114 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.540294] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 114 events 0x0 mode thread_spinlock | |
[1650638547.540331] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x308df40 [id=118 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.540348] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 118 events 0x0 mode thread_spinlock | |
[1650638547.540364] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x308df80 [id=119 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.540379] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 119 events 0x0 mode thread_spinlock | |
[1650638547.540393] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x63bbe10 [id=121 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.540410] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 121 events 0x0 mode thread_spinlock | |
[1650638547.540447] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x63bbe50 [id=125 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.540464] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 125 events 0x0 mode thread_spinlock | |
[1650638547.540480] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x63bbe90 [id=126 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.540501] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 126 events 0x0 mode thread_spinlock | |
[1650638547.540515] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x63bbed0 [id=128 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638547.540531] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 128 events 0x0 mode thread_spinlock | |
[1650638547.545376] [ndv4:69197:0] async.c:228 UCX DEBUG added async handler 0x44f27e0 [id=132 ref 1] uct_rdmacm_cm_event_handler() to hash | |
[1650638547.545402] [ndv4:69197:0] async.c:506 UCX DEBUG listening to async event fd 132 events 0x1 mode thread_spinlock | |
[1650638547.545504] [ndv4:69197:0] rdmacm_cm.c:922 UCX DEBUG created rdmacm_cm 0x695e050 with event_channel 0x6193fe0 (fd=132) | |
[1650638547.545540] [ndv4:69197:0] tcp_sockcm.c:186 UCX DEBUG created tcp_sockcm 0x5e487e0 | |
[1650638547.545558] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ucp_requests: align 64, maxelems 4294967295, elemsize 440 | |
[1650638547.545561] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ucp_rkeys: align 64, maxelems 4294967295, elemsize 168 | |
[1650638547.545566] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ucp_am_bufs: align 64, maxelems 4294967295, elemsize 8344 | |
[1650638547.545569] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ucp_reg_bufs: align 64, maxelems 4294967295, elemsize 8208 | |
[1650638547.545571] [ndv4:69197:0] mpool.c:88 UCX DEBUG mpool ucp_rndv_frags: align 512, maxelems 4294967295, elemsize 524304 | |
[1650638547.545820] [ndv4:69197:0] parser.c:1893 UCX INFO UCX_* env variables: UCX_TLS=sysv,cma,ib UCX_POSIX_USE_PROC_LINK=n UCX_LOG_LEVEL=debug | |
[1650638547.546243] [ndv4:68758:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x4309050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3067 | |
[1650638547.546674] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.546681] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.546690] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ad1efef7008 of 151544 bytes with 1052 elements | |
[1650638547.550612] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1ed800000..0x2ad1efe00000 on mlx5_ib3 lkey 0x80d00 rkey 0x80d00 access 0xf flags 0x3e4 | |
[1650638547.550631] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ad1ed800018 of 39845864 bytes with 4752 elements | |
[1650638547.550768] [ndv4:68758:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x4309050 | |
[1650638547.550800] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[18]=0x4309050 using dc_mlx5/mlx5_ib3:1 on worker 0x2a61550 | |
[1650638547.550952] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.550960] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.551079] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.551093] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.551639] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638547.552815] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.554508] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x44e5060: created UD QP 0x3047 on mlx5_ib3:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.555421] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.555660] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.555668] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.555826] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.555831] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.556320] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1eff1c000..0x2ad1effa1000 on mlx5_ib3 lkey 0x80e00 rkey 0x80e00 access 0xf flags 0x3e4 | |
[1650638547.556326] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1eff1c018 of 544744 bytes with 128 elements | |
[1650638547.556331] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.556513] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x44e5060: adding gid fe80::15:5dff:fd34:1e to hash on device mlx5_ib3 port 1 index 0) | |
[1650638547.556952] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x44e5060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 1) | |
[1650638547.567037] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x44e5060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 2) | |
[1650638547.567247] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x44e5060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 3) | |
[1650638547.567266] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x44e5060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 4) | |
[1650638547.567281] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x44e5060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 5) | |
[1650638547.587752] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x44e5060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 6) | |
[1650638547.587790] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x44e5060: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 7) | |
[1650638547.588769] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.588778] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x44e5e60 [id=102 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.588806] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 102 events 0x5 mode thread_spinlock | |
[1650638547.588816] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[19]=0x44e5060 using ud_verbs/mlx5_ib3:1 on worker 0x2a61550 | |
[1650638547.588852] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.588858] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.588916] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.588922] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.591348] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib3:1 | |
[1650638547.600535] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.601417] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x4603050: created UD QP 0x3049 on mlx5_ib3:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.601429] [ndv4:68758:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.602320] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.602399] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.602406] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.602440] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.602445] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.602913] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1effa1000..0x2ad1f0026000 on mlx5_ib3 lkey 0x80f00 rkey 0x80f00 access 0xf flags 0x3e4 | |
[1650638547.602919] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1effa1018 of 544744 bytes with 128 elements | |
[1650638547.602924] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.602972] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4603050: adding gid fe80::15:5dff:fd34:1e to hash on device mlx5_ib3 port 1 index 0) | |
[1650638547.602998] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4603050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 1) | |
[1650638547.603015] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4603050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 2) | |
[1650638547.603040] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4603050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 3) | |
[1650638547.603062] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4603050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 4) | |
[1650638547.603084] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4603050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 5) | |
[1650638547.603102] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4603050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 6) | |
[1650638547.603126] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4603050: adding gid fe80:: to hash on device mlx5_ib3 port 1 index 7) | |
[1650638547.603137] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib3:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.603176] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.603184] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x44e5f30 [id=103 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.603241] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 103 events 0x5 mode thread_spinlock | |
[1650638547.603386] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[20]=0x4603050 using ud_mlx5/mlx5_ib3:1 on worker 0x2a61550 | |
[1650638547.603445] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.603451] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.603558] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.603563] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.603644] [ndv4:68766:0] debug.c:1198 UCX DEBUG using signal stack 0x2b8e797ee000 size 141824 | |
[1650638547.603724] [ndv4:68766:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638547.603745] [ndv4:68766:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2b8e7965c000 | |
[1650638547.603767] [ndv4:68766:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638547.603777] [ndv4:68766:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638547.603783] [ndv4:68766:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638547.604000] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638547.605493] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.605511] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.605515] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.605566] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.606110] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.606116] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.607572] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x47030a0: created RC QP 0x3060 on mlx5_ib4:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.608140] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[21]=0x47030a0 using rc_verbs/mlx5_ib4:1 on worker 0x2a61550 | |
[1650638547.608157] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.608163] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.608306] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.608312] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.608072] [ndv4:68766:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638547.608091] [ndv4:68766:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638547.608125] [ndv4:68766:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638547.608128] [ndv4:68766:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638547.608135] [ndv4:68766:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638547.608141] [ndv4:68766:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638547.608145] [ndv4:68766:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638547.608149] [ndv4:68766:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638547.608151] [ndv4:68766:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638547.608154] [ndv4:68766:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638547.608156] [ndv4:68766:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638547.608158] [ndv4:68766:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638547.608166] [ndv4:68766:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638547.608981] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638547.610307] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.610326] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.610329] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.610382] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.611035] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x4a24010 of 8176 bytes with 127 elements | |
[1650638547.611307] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.611318] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.611356] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib4 length=2048) failed: Invalid argument | |
[1650638547.611360] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.611372] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x46defc0 [id=106 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.611392] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 106 events 0x1 mode thread_spinlock | |
[1650638547.611403] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[22]=0x4850030 using rc_mlx5/mlx5_ib4:1 on worker 0x2a61550 | |
[1650638547.611525] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.611531] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.611642] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.611646] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.611841] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638547.612867] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.612882] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.612885] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.612939] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.613467] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.613474] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.613506] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib4 length=2048) failed: Invalid argument | |
[1650638547.613510] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.613519] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x3fc1bf0 [id=108 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.613540] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 108 events 0x1 mode thread_spinlock | |
[1650638547.614182] [ndv4:68758:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.614157] [ndv4:68766:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638547.614532] [ndv4:68766:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638547.614554] [ndv4:68766:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638547.614674] [ndv4:68766:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638547.614687] [ndv4:68766:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638547.614698] [ndv4:68766:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638547.614709] [ndv4:68766:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638547.614720] [ndv4:68766:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638547.614731] [ndv4:68766:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638547.614742] [ndv4:68766:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638547.614776] [ndv4:68766:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638547.615139] [ndv4:68766:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638547.622112] [ndv4:68758:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x4a26050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3076 | |
[1650638547.622181] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.622188] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.622198] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ad1f2827008 of 151544 bytes with 1052 elements | |
[1650638547.624181] [ndv4:68766:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638547.624620] [ndv4:68766:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638547.626524] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1f0200000..0x2ad1f2800000 on mlx5_ib4 lkey 0x80c00 rkey 0x80c00 access 0xf flags 0x3e4 | |
[1650638547.626549] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ad1f0200018 of 39845864 bytes with 4752 elements | |
[1650638547.626901] [ndv4:68758:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x4a26050 | |
[1650638547.626938] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[23]=0x4a26050 using dc_mlx5/mlx5_ib4:1 on worker 0x2a61550 | |
[1650638547.627017] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.627027] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.627112] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.627116] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.627554] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638547.628549] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.628912] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x4c02060: created UD QP 0x306c on mlx5_ib4:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.629468] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.632151] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x1c15610 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638547.632287] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638547.632404] [ndv4:68766:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638547.632655] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.632665] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.632689] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638547.632753] [ndv4:68766:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638547.632774] [ndv4:68766:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638547.642808] [ndv4:68766:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638547.642821] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638547.643126] [ndv4:68766:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638547.643358] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.643365] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.646915] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.646925] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.654516] [ndv4:68766:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638547.654524] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638547.656094] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.656103] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.656646] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1f284c000..0x2ad1f28d1000 on mlx5_ib4 lkey 0x80d00 rkey 0x80d00 access 0xf flags 0x3e4 | |
[1650638547.656653] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1f284c018 of 544744 bytes with 128 elements | |
[1650638547.656658] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.657119] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4c02060: adding gid fe80::15:5dff:fd34:1f to hash on device mlx5_ib4 port 1 index 0) | |
[1650638547.657397] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4c02060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 1) | |
[1650638547.657605] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4c02060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 2) | |
[1650638547.664669] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638547.664676] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638547.665497] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638547.665501] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638547.666487] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4c02060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 3) | |
[1650638547.666845] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4c02060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 4) | |
[1650638547.680782] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638547.680788] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638547.681457] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638547.681461] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638547.681686] [ndv4:68766:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638547.681764] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4c02060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 5) | |
[1650638547.682102] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4c02060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 6) | |
[1650638547.682265] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4c02060: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 7) | |
[1650638547.682621] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.682630] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x3ee6fc0 [id=109 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.682652] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 109 events 0x5 mode thread_spinlock | |
[1650638547.682665] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[24]=0x4c02060 using ud_verbs/mlx5_ib4:1 on worker 0x2a61550 | |
[1650638547.682676] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.682680] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.682726] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.682731] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.682902] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib4:1 | |
[1650638547.683977] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.684417] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x4d20460: created UD QP 0x306d on mlx5_ib4:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.684426] [ndv4:68758:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.685137] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.685200] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.685205] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.685244] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.685249] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.685633] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1f28d1000..0x2ad1f2956000 on mlx5_ib4 lkey 0x81100 rkey 0x81100 access 0xf flags 0x3e4 | |
[1650638547.685639] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1f28d1018 of 544744 bytes with 128 elements | |
[1650638547.685643] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.685763] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d20460: adding gid fe80::15:5dff:fd34:1f to hash on device mlx5_ib4 port 1 index 0) | |
[1650638547.699371] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d20460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 1) | |
[1650638547.699477] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d20460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 2) | |
[1650638547.699596] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d20460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 3) | |
[1650638547.699739] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d20460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 4) | |
[1650638547.700250] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d20460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 5) | |
[1650638547.704682] [ndv4:68766:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638547.706533] [ndv4:68766:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638547.706797] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x1be8ff0 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638547.706820] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638547.706823] [ndv4:68766:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638547.707032] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.707040] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.707044] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638547.707084] [ndv4:68766:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638547.712935] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d20460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 6) | |
[1650638547.713607] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d20460: adding gid fe80:: to hash on device mlx5_ib4 port 1 index 7) | |
[1650638547.713616] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib4:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.713646] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.713650] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x3ee6ec0 [id=110 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.713670] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 110 events 0x5 mode thread_spinlock | |
[1650638547.713679] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[25]=0x4d20460 using ud_mlx5/mlx5_ib4:1 on worker 0x2a61550 | |
[1650638547.713748] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.713754] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.713870] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.713875] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.714448] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638547.714655] [ndv4:68766:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638547.714662] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638547.714937] [ndv4:68766:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638547.714985] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.714992] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.715628] [ndv4:68766:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638547.715633] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638547.715993] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638547.715998] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638547.716397] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638547.716401] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638547.715792] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.715807] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.715810] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.715858] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.716393] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.716399] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.717477] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x4e200a0: created RC QP 0x2bb4 on mlx5_ib5:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.718491] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[26]=0x4e200a0 using rc_verbs/mlx5_ib5:1 on worker 0x2a61550 | |
[1650638547.718626] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.718632] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.755223] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.755231] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.755895] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638547.757718] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.757731] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.757734] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.757783] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.758308] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x5141010 of 8176 bytes with 127 elements | |
[1650638547.758607] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.758614] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.758650] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib5 length=2048) failed: Invalid argument | |
[1650638547.758654] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.758665] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x4e28db0 [id=113 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.758685] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 113 events 0x1 mode thread_spinlock | |
[1650638547.758694] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[27]=0x4f6d030 using rc_mlx5/mlx5_ib5:1 on worker 0x2a61550 | |
[1650638547.758807] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.758812] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.759031] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.759036] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.760076] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638547.761478] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.761492] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.761495] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.761552] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.761970] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.761976] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.762008] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib5 length=2048) failed: Invalid argument | |
[1650638547.762012] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.762019] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x38a4970 [id=115 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.762040] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 115 events 0x1 mode thread_spinlock | |
[1650638547.762753] [ndv4:68758:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.766032] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638547.766039] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638547.766578] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638547.766582] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638547.766797] [ndv4:68766:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638547.770564] [ndv4:68758:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x5143050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2bdd | |
[1650638547.770848] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.770854] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.770862] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ad1f5159008 of 151544 bytes with 1052 elements | |
[1650638547.774784] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1f2a00000..0x2ad1f5000000 on mlx5_ib5 lkey 0x80f00 rkey 0x80f00 access 0xf flags 0x3e4 | |
[1650638547.774804] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ad1f2a00018 of 39845864 bytes with 4752 elements | |
[1650638547.774940] [ndv4:68758:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x5143050 | |
[1650638547.774970] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[28]=0x5143050 using dc_mlx5/mlx5_ib5:1 on worker 0x2a61550 | |
[1650638547.775462] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.775472] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.775643] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.775648] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.786600] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638547.786803] [ndv4:68766:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638547.787279] [ndv4:68766:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638547.787729] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x1c0dce0 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638547.787753] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638547.787757] [ndv4:68766:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638547.787776] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.788163] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x3f13480: created UD QP 0x2bbe on mlx5_ib5:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.788802] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.788933] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.788940] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.788984] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.788989] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.788055] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.788063] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.788069] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638547.788110] [ndv4:68766:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638547.789571] [ndv4:68766:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638547.789578] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638547.789848] [ndv4:68766:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638547.789975] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638547.789981] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638547.790979] [ndv4:68766:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638547.790983] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638547.789344] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1f517e000..0x2ad1f5203000 on mlx5_ib5 lkey 0x81000 rkey 0x81000 access 0xf flags 0x3e4 | |
[1650638547.789350] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1f517e018 of 544744 bytes with 128 elements | |
[1650638547.789355] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.789386] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x3f13480: adding gid fe80::15:5dff:fd34:20 to hash on device mlx5_ib5 port 1 index 0) | |
[1650638547.789677] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x3f13480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 1) | |
[1650638547.789987] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x3f13480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 2) | |
[1650638547.804134] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x3f13480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 3) | |
[1650638547.804440] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x3f13480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 4) | |
[1650638547.804999] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638547.805006] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638547.812660] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x3f13480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 5) | |
[1650638547.812765] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x3f13480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 6) | |
[1650638547.812780] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x3f13480: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 7) | |
[1650638547.813126] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.813135] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x531f630 [id=116 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.813160] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 116 events 0x5 mode thread_spinlock | |
[1650638547.813170] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[29]=0x3f13480 using ud_verbs/mlx5_ib5:1 on worker 0x2a61550 | |
[1650638547.813339] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.813346] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.813541] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.813546] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.814149] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib5:1 | |
[1650638547.814663] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638547.814670] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638547.815712] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.816077] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x543d050: created UD QP 0x2bc1 on mlx5_ib5:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.816084] [ndv4:68758:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.816672] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.816903] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.816910] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.817011] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638547.817016] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638547.817431] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1f5203000..0x2ad1f5288000 on mlx5_ib5 lkey 0x81100 rkey 0x81100 access 0xf flags 0x3e4 | |
[1650638547.817443] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1f5203018 of 544744 bytes with 128 elements | |
[1650638547.817448] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.818386] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x543d050: adding gid fe80::15:5dff:fd34:20 to hash on device mlx5_ib5 port 1 index 0) | |
[1650638547.818715] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x543d050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 1) | |
[1650638547.818977] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x543d050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 2) | |
[1650638547.817343] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638547.817350] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638547.819310] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x543d050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 3) | |
[1650638547.819329] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638547.819335] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638547.819550] [ndv4:68766:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638547.829932] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x543d050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 4) | |
[1650638547.830740] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x543d050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 5) | |
[1650638547.831608] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x543d050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 6) | |
[1650638547.832169] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x543d050: adding gid fe80:: to hash on device mlx5_ib5 port 1 index 7) | |
[1650638547.832175] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib5:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.832224] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.832229] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x4dfb3a0 [id=117 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.832281] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 117 events 0x5 mode thread_spinlock | |
[1650638547.832291] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[30]=0x543d050 using ud_mlx5/mlx5_ib5:1 on worker 0x2a61550 | |
[1650638547.832529] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.832536] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.832689] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.832694] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.841382] [ndv4:68766:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638547.841772] [ndv4:68766:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638547.841242] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638547.843302] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.843322] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.843326] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.843376] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.843932] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.843939] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.845244] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x553d0a0: created RC QP 0x2b28 on mlx5_ib6:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.846300] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[31]=0x553d0a0 using rc_verbs/mlx5_ib6:1 on worker 0x2a61550 | |
[1650638547.846522] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.846528] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.846590] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.846594] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.847451] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638547.849194] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.849229] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.849232] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.849282] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.849807] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x585e010 of 8176 bytes with 127 elements | |
[1650638547.850090] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.850098] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.850132] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib6 length=2048) failed: Invalid argument | |
[1650638547.850137] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.850148] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x55182d0 [id=120 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.850168] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 120 events 0x1 mode thread_spinlock | |
[1650638547.850178] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[32]=0x568a030 using rc_mlx5/mlx5_ib6:1 on worker 0x2a61550 | |
[1650638547.850382] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.850389] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.850523] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.850528] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.850735] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638547.851623] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x1c15730 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638547.851646] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638547.851649] [ndv4:68766:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638547.851706] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.851721] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.851725] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.851778] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.852133] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.852140] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.852171] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib6 length=2048) failed: Invalid argument | |
[1650638547.852176] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.852183] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5692f60 [id=122 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.852201] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 122 events 0x1 mode thread_spinlock | |
[1650638547.852884] [ndv4:68758:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.852005] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.852013] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.852018] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638547.852061] [ndv4:68766:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638547.860761] [ndv4:68758:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x5860050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2b2f | |
[1650638547.860815] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.860821] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.860830] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ad1f7a8b008 of 151544 bytes with 1052 elements | |
[1650638547.864768] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1f5400000..0x2ad1f7a00000 on mlx5_ib6 lkey 0x80f00 rkey 0x80f00 access 0xf flags 0x3e4 | |
[1650638547.864786] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ad1f5400018 of 39845864 bytes with 4752 elements | |
[1650638547.864923] [ndv4:68758:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x5860050 | |
[1650638547.864953] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[33]=0x5860050 using dc_mlx5/mlx5_ib6:1 on worker 0x2a61550 | |
[1650638547.864995] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.865003] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.865070] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.865074] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.865326] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638547.866135] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.866533] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x5a3c060: created UD QP 0x2b34 on mlx5_ib6:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.867091] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.867147] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.867154] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.867175] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.867180] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.867580] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1f7ab0000..0x2ad1f7b35000 on mlx5_ib6 lkey 0x81000 rkey 0x81000 access 0xf flags 0x3e4 | |
[1650638547.867586] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1f7ab0018 of 544744 bytes with 128 elements | |
[1650638547.867591] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.867648] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x5a3c060: adding gid fe80::15:5dff:fd34:21 to hash on device mlx5_ib6 port 1 index 0) | |
[1650638547.867701] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x5a3c060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 1) | |
[1650638547.867747] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x5a3c060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 2) | |
[1650638547.867775] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x5a3c060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 3) | |
[1650638547.867794] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x5a3c060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 4) | |
[1650638547.867818] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x5a3c060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 5) | |
[1650638547.867844] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x5a3c060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 6) | |
[1650638547.867873] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x5a3c060: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 7) | |
[1650638547.868129] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.868139] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5a3cf20 [id=123 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.868167] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 123 events 0x5 mode thread_spinlock | |
[1650638547.868179] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[34]=0x5a3c060 using ud_verbs/mlx5_ib6:1 on worker 0x2a61550 | |
[1650638547.868193] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.868199] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.868262] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.868267] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.868415] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib6:1 | |
[1650638547.869133] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.869476] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x4d4d3a0: created UD QP 0x2b35 on mlx5_ib6:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.869484] [ndv4:68758:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.869983] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.870017] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.870023] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.870037] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638547.870042] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638547.870378] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1f7b35000..0x2ad1f7bba000 on mlx5_ib6 lkey 0x81100 rkey 0x81100 access 0xf flags 0x3e4 | |
[1650638547.870385] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1f7b35018 of 544744 bytes with 128 elements | |
[1650638547.870389] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.870617] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d4d3a0: adding gid fe80::15:5dff:fd34:21 to hash on device mlx5_ib6 port 1 index 0) | |
[1650638547.870814] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d4d3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 1) | |
[1650638547.870944] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d4d3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 2) | |
[1650638547.882488] [ndv4:68766:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638547.882499] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638547.883008] [ndv4:68766:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638547.884461] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d4d3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 3) | |
[1650638547.885118] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d4d3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 4) | |
[1650638547.885622] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d4d3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 5) | |
[1650638547.885979] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d4d3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 6) | |
[1650638547.886751] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x4d4d3a0: adding gid fe80:: to hash on device mlx5_ib6 port 1 index 7) | |
[1650638547.886759] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib6:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.886791] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.886795] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x55180e0 [id=124 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.886816] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 124 events 0x5 mode thread_spinlock | |
[1650638547.887052] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[35]=0x4d4d3a0 using ud_mlx5/mlx5_ib6:1 on worker 0x2a61550 | |
[1650638547.887111] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.887117] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.887355] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.887362] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.887935] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638547.890018] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638547.890040] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.890043] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.890108] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.890768] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.890774] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638547.892071] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x5c5a0a0: created RC QP 0x2b1b on mlx5_ib7:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638547.892076] [ndv4:69236:0] debug.c:1198 UCX DEBUG using signal stack 0x2abade93e000 size 141824 | |
[1650638547.892156] [ndv4:69236:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638547.892179] [ndv4:69236:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2abade792000 | |
[1650638547.892196] [ndv4:69236:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638547.892290] [ndv4:69236:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638547.892299] [ndv4:69236:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638547.893334] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638547.893347] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638547.893529] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[36]=0x5c5a0a0 using rc_verbs/mlx5_ib7:1 on worker 0x2a61550 | |
[1650638547.893575] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.893581] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.893824] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.893830] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.894748] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638547.894834] [ndv4:69236:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638547.894853] [ndv4:69236:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638547.894887] [ndv4:69236:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638547.894891] [ndv4:69236:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638547.894898] [ndv4:69236:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638547.894904] [ndv4:69236:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638547.894907] [ndv4:69236:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638547.894911] [ndv4:69236:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638547.894914] [ndv4:69236:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638547.894916] [ndv4:69236:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638547.894918] [ndv4:69236:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638547.894921] [ndv4:69236:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638547.894928] [ndv4:69236:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638547.895703] [ndv4:68766:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638547.895713] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638547.897011] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638547.897017] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638547.896852] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.896872] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.896875] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.896925] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.897500] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x5f7b010 of 8176 bytes with 127 elements | |
[1650638547.897784] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638547.897802] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.897846] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib7 length=2048) failed: Invalid argument | |
[1650638547.897850] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.897867] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5b5a8c0 [id=127 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.897891] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 127 events 0x1 mode thread_spinlock | |
[1650638547.897903] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[37]=0x5da7030 using rc_mlx5/mlx5_ib7:1 on worker 0x2a61550 | |
[1650638547.900722] [ndv4:69236:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638547.901076] [ndv4:69236:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638547.901096] [ndv4:69236:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638547.905921] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.905931] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.906196] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.906201] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.907038] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638547.907337] [ndv4:69236:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638547.907358] [ndv4:69236:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638547.907370] [ndv4:69236:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638547.907380] [ndv4:69236:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638547.907391] [ndv4:69236:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638547.907401] [ndv4:69236:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638547.907413] [ndv4:69236:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638547.907447] [ndv4:69236:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638547.907827] [ndv4:69236:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638547.910080] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638547.910098] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638547.910101] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638547.910182] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638547.910709] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638547.910717] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638547.910744] [ndv4:68758:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib7 length=2048) failed: Invalid argument | |
[1650638547.910748] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638547.910756] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5c35760 [id=129 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638547.910777] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 129 events 0x1 mode thread_spinlock | |
[1650638547.911451] [ndv4:68758:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638547.919725] [ndv4:68758:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x5f7d050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x2b22 | |
[1650638547.919793] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.919800] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.919812] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2ad1fa3bd008 of 151544 bytes with 1052 elements | |
[1650638547.923766] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1f7c00000..0x2ad1fa200000 on mlx5_ib7 lkey 0x80d00 rkey 0x80d00 access 0xf flags 0x3e4 | |
[1650638547.923785] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2ad1f7c00018 of 39845864 bytes with 4752 elements | |
[1650638547.923923] [ndv4:68758:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x5f7d050 | |
[1650638547.923957] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[38]=0x5f7d050 using dc_mlx5/mlx5_ib7:1 on worker 0x2a61550 | |
[1650638547.924033] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.924042] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.924335] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.924340] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.924740] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638547.925998] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.926415] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x546a570: created UD QP 0x2b27 on mlx5_ib7:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.926971] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.927344] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.927352] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.927453] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.927458] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.927824] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1fa3e2000..0x2ad1fa467000 on mlx5_ib7 lkey 0x80e00 rkey 0x80e00 access 0xf flags 0x3e4 | |
[1650638547.927830] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1fa3e2018 of 544744 bytes with 128 elements | |
[1650638547.927835] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.928202] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x546a570: adding gid fe80::15:5dff:fd34:22 to hash on device mlx5_ib7 port 1 index 0) | |
[1650638547.928446] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x546a570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 1) | |
[1650638547.929284] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x546a570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 2) | |
[1650638547.930052] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x546a570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 3) | |
[1650638547.930595] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x546a570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 4) | |
[1650638547.930322] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638547.930329] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638547.932916] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638547.932922] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638547.934009] [ndv4:69236:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638547.934493] [ndv4:69236:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638547.934267] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638547.934272] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638547.934538] [ndv4:68766:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638547.940482] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x546a570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 5) | |
[1650638547.942423] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x14a7610 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638547.942526] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638547.942977] [ndv4:69236:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638547.943331] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.943346] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.943393] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638547.943479] [ndv4:69236:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638547.943512] [ndv4:69236:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638547.946220] [ndv4:69236:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638547.946233] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638547.946553] [ndv4:69236:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638547.946857] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638547.946863] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638547.947932] [ndv4:69236:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638547.947939] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638547.954172] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x546a570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 6) | |
[1650638547.955230] [ndv4:68766:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib4 vendor_id: 0x15b3 device_id: 4124 | |
[1650638547.955715] [ndv4:68766:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib4: disable ODP because it's not supported for DevX QP | |
[1650638547.956090] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x22f7610 [id=65 ref 1] uct_ib_async_event_handler() to hash | |
[1650638547.956115] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 65 events 0x1 mode thread_spinlock | |
[1650638547.956119] [ndv4:68766:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib4' (InfiniBand channel adapter) with 1 ports | |
[1650638547.957529] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638547.957536] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638547.957867] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638547.957871] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638547.958376] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638547.958381] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638547.958454] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638547.958457] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638547.958767] [ndv4:69236:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638547.956413] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.956422] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.956433] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638547.956490] [ndv4:68766:0] ib_md.c:1319 UCX DEBUG mlx5_ib4: using registration cache | |
[1650638547.963283] [ndv4:68766:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638547.963290] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638547.963580] [ndv4:68766:0] ib_md.c:1604 UCX DEBUG mlx5_ib4: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638547.963595] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638547.963601] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638547.963740] [ndv4:68766:0] topo.c:99 UCX DEBUG bus id 0x105000000 doesn't exist. sys_dev = 4 | |
[1650638547.963744] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638547.963849] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638547.963853] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638547.963970] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638547.963973] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638547.964080] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638547.964083] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638547.964188] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638547.964191] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638547.964433] [ndv4:68766:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638547.966658] [ndv4:69236:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638547.967003] [ndv4:69236:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638547.969341] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x147aff0 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638547.969363] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638547.969367] [ndv4:69236:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638547.969680] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.969689] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.969694] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638547.969735] [ndv4:69236:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638547.972622] [ndv4:69236:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638547.972630] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638547.972898] [ndv4:69236:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638547.972925] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638547.972930] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638547.973017] [ndv4:69236:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638547.973021] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638547.974294] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x546a570: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 7) | |
[1650638547.974674] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638547.974685] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x546afc0 [id=130 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638547.974710] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 130 events 0x5 mode thread_spinlock | |
[1650638547.974732] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[39]=0x546a570 using ud_verbs/mlx5_ib7:1 on worker 0x2a61550 | |
[1650638547.974754] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.974760] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.974938] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.974944] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.975504] [ndv4:68758:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib7:1 | |
[1650638547.976314] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638547.976321] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638547.976389] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638547.976392] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638547.977729] [ndv4:68758:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638547.978234] [ndv4:68758:0] ib_iface.c:994 UCX DEBUG iface=0x6277460: created UD QP 0x2b28 on mlx5_ib7:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638547.978244] [ndv4:68758:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638547.979052] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638547.979431] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.979437] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.979448] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638547.979452] [ndv4:68758:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638547.979863] [ndv4:68758:0] ib_md.c:812 UCX DEBUG registered memory 0x2ad1fa467000..0x2ad1fa4ec000 on mlx5_ib7 lkey 0x80f00 rkey 0x80f00 access 0xf flags 0x3e4 | |
[1650638547.979869] [ndv4:68758:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2ad1fa467018 of 544744 bytes with 128 elements | |
[1650638547.979873] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638547.980557] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x6277460: adding gid fe80::15:5dff:fd34:22 to hash on device mlx5_ib7 port 1 index 0) | |
[1650638547.981002] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x6277460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 1) | |
[1650638547.990029] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x6277460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 2) | |
[1650638547.996035] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638547.996045] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638547.996975] [ndv4:68755:0] debug.c:1198 UCX DEBUG using signal stack 0x2af633c84000 size 141824 | |
[1650638547.997057] [ndv4:68755:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638547.997078] [ndv4:68755:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2af633af2000 | |
[1650638547.997097] [ndv4:68755:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638547.997106] [ndv4:68755:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638547.997111] [ndv4:68755:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638547.998897] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638547.998904] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638547.999406] [ndv4:69236:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638547.999644] [ndv4:68755:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638547.999666] [ndv4:68755:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638547.999700] [ndv4:68755:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638547.999703] [ndv4:68755:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638547.999709] [ndv4:68755:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638547.999716] [ndv4:68755:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638547.999719] [ndv4:68755:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638547.999723] [ndv4:68755:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638547.999726] [ndv4:68755:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638547.999728] [ndv4:68755:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638547.999731] [ndv4:68755:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638547.999733] [ndv4:68755:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638547.999741] [ndv4:68755:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.005750] [ndv4:68755:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.006153] [ndv4:68755:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.006178] [ndv4:68755:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.006486] [ndv4:68755:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.006499] [ndv4:68755:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.006510] [ndv4:68755:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.006520] [ndv4:68755:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.006556] [ndv4:68755:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.006566] [ndv4:68755:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.006577] [ndv4:68755:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.006613] [ndv4:68755:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.007058] [ndv4:68755:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.008692] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x6277460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 3) | |
[1650638548.008937] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x6277460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 4) | |
[1650638548.012546] [ndv4:68766:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib5 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.013060] [ndv4:68766:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib5: disable ODP because it's not supported for DevX QP | |
[1650638548.013552] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x1c11420 [id=67 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.013579] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 67 events 0x1 mode thread_spinlock | |
[1650638548.013582] [ndv4:68766:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib5' (InfiniBand channel adapter) with 1 ports | |
[1650638548.013895] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638548.013906] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638548.013921] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.013991] [ndv4:68766:0] ib_md.c:1319 UCX DEBUG mlx5_ib5: using registration cache | |
[1650638548.015052] [ndv4:68766:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.015058] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.014691] [ndv4:68755:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.015035] [ndv4:68755:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.015359] [ndv4:68766:0] ib_md.c:1604 UCX DEBUG mlx5_ib5: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.015487] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638548.015493] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638548.017022] [ndv4:68766:0] topo.c:99 UCX DEBUG bus id 0x106000000 doesn't exist. sys_dev = 5 | |
[1650638548.017032] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.023048] [ndv4:68755:0] async.c:228 UCX DEBUG added async handler 0xcaa2c0 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.023152] [ndv4:68755:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.023530] [ndv4:68755:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.023749] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.023764] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.023790] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.023848] [ndv4:68755:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.023870] [ndv4:68755:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.026401] [ndv4:68755:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.026415] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.026647] [ndv4:68755:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.026668] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.026674] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.026862] [ndv4:68755:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.026867] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.026991] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.026995] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.027168] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.027174] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.027258] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.027262] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.027436] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.027439] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.027725] [ndv4:68755:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.030142] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x6277460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 5) | |
[1650638548.030336] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x6277460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 6) | |
[1650638548.030553] [ndv4:68758:0] ud_iface.c:393 UCX DEBUG iface 0x6277460: adding gid fe80:: to hash on device mlx5_ib7 port 1 index 7) | |
[1650638548.030578] [ndv4:68758:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib7:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.030625] [ndv4:68758:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638548.030634] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x6277f80 [id=131 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638548.030660] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 131 events 0x5 mode thread_spinlock | |
[1650638548.030709] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[40]=0x6277460 using ud_mlx5/mlx5_ib7:1 on worker 0x2a61550 | |
[1650638548.030814] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool uct_scopy_iface_tx_mp: align 64, maxelems 4294967295, elemsize 736 | |
[1650638548.030871] [ndv4:68758:0] ucp_worker.c:1159 UCX DEBUG created interface[41]=0x5b87600 using cma/memory on worker 0x2a61550 | |
[1650638548.030877] [ndv4:68758:0] ucp_worker.c:982 UCX DEBUG selected scalable tl bitmap: 0x3ffffffffff 0x0 (42 tls) | |
[1650638548.035842] [ndv4:68755:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.036174] [ndv4:68755:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.036767] [ndv4:69236:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.037295] [ndv4:69236:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638548.037705] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x149fce0 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.037730] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638548.037733] [ndv4:69236:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638548.037947] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.037962] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.037974] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.038029] [ndv4:69236:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638548.039511] [ndv4:69236:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.039518] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.039776] [ndv4:69236:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.039955] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.039965] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.040231] [ndv4:69236:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638548.040236] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.040302] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.040305] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.040364] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.040367] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.040424] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.040427] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.040484] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.040487] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.040881] [ndv4:69236:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.043443] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x6159fb0 [id=75 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.043471] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 75 events 0x0 mode thread_spinlock | |
[1650638548.043603] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5c35370 [id=76 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.043627] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 76 events 0x0 mode thread_spinlock | |
[1650638548.043689] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x2188510 [id=77 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.043709] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 77 events 0x0 mode thread_spinlock | |
[1650638548.043751] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x2188550 [id=79 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.043767] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 79 events 0x0 mode thread_spinlock | |
[1650638548.043815] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x2188590 [id=83 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.043832] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 83 events 0x0 mode thread_spinlock | |
[1650638548.043847] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x21885d0 [id=84 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.043861] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 84 events 0x0 mode thread_spinlock | |
[1650638548.043876] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x2188610 [id=86 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.043887] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 86 events 0x0 mode thread_spinlock | |
[1650638548.043922] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x219ad00 [id=90 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.043939] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 90 events 0x0 mode thread_spinlock | |
[1650638548.043954] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x219ad40 [id=91 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.043971] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 91 events 0x0 mode thread_spinlock | |
[1650638548.043987] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x219ad80 [id=93 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044006] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 93 events 0x0 mode thread_spinlock | |
[1650638548.044040] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5b87f60 [id=97 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044060] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 97 events 0x0 mode thread_spinlock | |
[1650638548.044074] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5b87fa0 [id=98 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044088] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 98 events 0x0 mode thread_spinlock | |
[1650638548.044101] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x3fee910 [id=100 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044115] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 100 events 0x0 mode thread_spinlock | |
[1650638548.044149] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x3fee950 [id=104 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044165] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 104 events 0x0 mode thread_spinlock | |
[1650638548.044181] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x3fee990 [id=105 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044195] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 105 events 0x0 mode thread_spinlock | |
[1650638548.044224] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x3fee9d0 [id=107 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044267] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 107 events 0x0 mode thread_spinlock | |
[1650638548.044302] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x219adc0 [id=111 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044321] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 111 events 0x0 mode thread_spinlock | |
[1650638548.044413] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x219ae00 [id=112 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044437] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 112 events 0x0 mode thread_spinlock | |
[1650638548.044452] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x219ae40 [id=114 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044467] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 114 events 0x0 mode thread_spinlock | |
[1650638548.044501] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x219ae80 [id=118 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044519] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 118 events 0x0 mode thread_spinlock | |
[1650638548.044533] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x219aec0 [id=119 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044556] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 119 events 0x0 mode thread_spinlock | |
[1650638548.044569] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5dafe10 [id=121 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044588] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 121 events 0x0 mode thread_spinlock | |
[1650638548.044622] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5dafe50 [id=125 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044643] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 125 events 0x0 mode thread_spinlock | |
[1650638548.044657] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5dafe90 [id=126 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044670] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 126 events 0x0 mode thread_spinlock | |
[1650638548.044683] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x5dafed0 [id=128 ref 1] ucp_worker_iface_async_fd_event() to hash | |
[1650638548.044702] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 128 events 0x0 mode thread_spinlock | |
[1650638548.048129] [ndv4:68755:0] async.c:228 UCX DEBUG added async handler 0xcaa440 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.048157] [ndv4:68755:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.048161] [ndv4:68755:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.048506] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.048517] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.048525] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.048582] [ndv4:68755:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.049335] [ndv4:68758:0] async.c:228 UCX DEBUG added async handler 0x3ee67e0 [id=132 ref 1] uct_rdmacm_cm_event_handler() to hash | |
[1650638548.049359] [ndv4:68758:0] async.c:506 UCX DEBUG listening to async event fd 132 events 0x1 mode thread_spinlock | |
[1650638548.049451] [ndv4:68758:0] rdmacm_cm.c:922 UCX DEBUG created rdmacm_cm 0x6352050 with event_channel 0x5b87fe0 (fd=132) | |
[1650638548.049482] [ndv4:68758:0] tcp_sockcm.c:186 UCX DEBUG created tcp_sockcm 0x583c7e0 | |
[1650638548.049498] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ucp_requests: align 64, maxelems 4294967295, elemsize 440 | |
[1650638548.049500] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ucp_rkeys: align 64, maxelems 4294967295, elemsize 168 | |
[1650638548.049504] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ucp_am_bufs: align 64, maxelems 4294967295, elemsize 8344 | |
[1650638548.049507] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ucp_reg_bufs: align 64, maxelems 4294967295, elemsize 8208 | |
[1650638548.049510] [ndv4:68758:0] mpool.c:88 UCX DEBUG mpool ucp_rndv_frags: align 512, maxelems 4294967295, elemsize 524304 | |
[1650638548.049734] [ndv4:68758:0] parser.c:1893 UCX INFO UCX_* env variables: UCX_TLS=sysv,cma,ib UCX_POSIX_USE_PROC_LINK=n UCX_LOG_LEVEL=debug | |
[1650638548.048622] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.048629] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.050636] [ndv4:68755:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.050643] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.050843] [ndv4:68755:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.050914] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.050920] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.051272] [ndv4:68755:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638548.051278] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.051884] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.051888] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.052662] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.052668] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.053203] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.053225] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.063634] [ndv4:69236:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.066953] [ndv4:69236:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638548.068452] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x14a7730 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.068474] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638548.068477] [ndv4:69236:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638548.068517] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.068525] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.068763] [ndv4:68755:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.070288] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.070297] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.070302] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.070346] [ndv4:69236:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638548.074687] [ndv4:68755:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.075024] [ndv4:68755:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638548.075621] [ndv4:68755:0] async.c:228 UCX DEBUG added async handler 0xcaad60 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.075644] [ndv4:68755:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638548.075647] [ndv4:68755:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638548.075817] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.075828] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.075836] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.075924] [ndv4:68755:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638548.075957] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.075965] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.076128] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.076133] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.076452] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.076456] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.076728] [ndv4:68766:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.080441] [ndv4:69236:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.080449] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.081403] [ndv4:69236:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.081488] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.081495] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.090421] [ndv4:68754:0] debug.c:1198 UCX DEBUG using signal stack 0x2afb59707000 size 141824 | |
[1650638548.090505] [ndv4:68754:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.090525] [ndv4:68754:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2afb5955b000 | |
[1650638548.090548] [ndv4:68754:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.090557] [ndv4:68754:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.090563] [ndv4:68754:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.091830] [ndv4:68755:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.091846] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.092103] [ndv4:68755:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.092138] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.092144] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.091617] [ndv4:69236:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638548.091628] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.093593] [ndv4:68754:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.093612] [ndv4:68754:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.093650] [ndv4:68754:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.093653] [ndv4:68754:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.093659] [ndv4:68754:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.093666] [ndv4:68754:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.093669] [ndv4:68754:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.093673] [ndv4:68754:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.093675] [ndv4:68754:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.093678] [ndv4:68754:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.093680] [ndv4:68754:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.093682] [ndv4:68754:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.093691] [ndv4:68754:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.095197] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.095241] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.096194] [ndv4:68766:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib6 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.096907] [ndv4:68766:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib6: disable ODP because it's not supported for DevX QP | |
[1650638548.099983] [ndv4:68754:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.100380] [ndv4:68754:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.100402] [ndv4:68754:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.100756] [ndv4:68754:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.100769] [ndv4:68754:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.100780] [ndv4:68754:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.100790] [ndv4:68754:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.100802] [ndv4:68754:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.100820] [ndv4:68754:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.100832] [ndv4:68754:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.100886] [ndv4:68754:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.101317] [ndv4:68754:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.108171] [ndv4:68754:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.108560] [ndv4:68754:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.115971] [ndv4:68754:0] async.c:228 UCX DEBUG added async handler 0x212e660 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.116073] [ndv4:68754:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.116508] [ndv4:68754:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.116905] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.116916] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.116942] [ndv4:68754:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.117003] [ndv4:68754:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.117025] [ndv4:68754:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.117992] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.118004] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.120749] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x1bfff60 [id=69 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.120776] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 69 events 0x1 mode thread_spinlock | |
[1650638548.120782] [ndv4:68766:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib6' (InfiniBand channel adapter) with 1 ports | |
[1650638548.121110] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638548.121119] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638548.121134] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.121197] [ndv4:68766:0] ib_md.c:1319 UCX DEBUG mlx5_ib6: using registration cache | |
[1650638548.121003] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.121011] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.121648] [ndv4:68755:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638548.121660] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.123616] [ndv4:68754:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.123633] [ndv4:68754:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.123900] [ndv4:68754:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.124030] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.124036] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.132599] [ndv4:68766:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.132615] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.132916] [ndv4:68766:0] ib_md.c:1604 UCX DEBUG mlx5_ib6: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.133034] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638548.133041] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638548.133632] [ndv4:68766:0] topo.c:99 UCX DEBUG bus id 0x107000000 doesn't exist. sys_dev = 6 | |
[1650638548.133637] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.141382] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.141393] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.141717] [ndv4:69236:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.142412] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.142420] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.142794] [ndv4:68754:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.142815] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.147109] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.147116] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.148148] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.148157] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.152478] [ndv4:70439:0] debug.c:1198 UCX DEBUG using signal stack 0x2ab4a4343000 size 141824 | |
[1650638548.152558] [ndv4:70439:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.152579] [ndv4:70439:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2ab4a4197000 | |
[1650638548.152603] [ndv4:70439:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.152613] [ndv4:70439:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.152620] [ndv4:70439:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.154229] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.154239] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.154493] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.154497] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.154891] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.154894] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.155170] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.155173] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.155696] [ndv4:68766:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.155512] [ndv4:70439:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.155532] [ndv4:70439:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.155570] [ndv4:70439:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.155574] [ndv4:70439:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.155581] [ndv4:70439:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.155588] [ndv4:70439:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.155591] [ndv4:70439:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.155596] [ndv4:70439:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.155599] [ndv4:70439:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.155601] [ndv4:70439:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.155604] [ndv4:70439:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.155606] [ndv4:70439:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.155614] [ndv4:70439:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.163347] [ndv4:70439:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.164075] [ndv4:70439:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.164106] [ndv4:70439:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.164245] [ndv4:70439:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.164257] [ndv4:70439:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.164268] [ndv4:70439:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.164278] [ndv4:70439:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.164289] [ndv4:70439:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.164300] [ndv4:70439:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.164311] [ndv4:70439:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.164346] [ndv4:70439:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.164730] [ndv4:70439:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.164388] [ndv4:68766:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib7 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.164764] [ndv4:68766:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib7: disable ODP because it's not supported for DevX QP | |
[1650638548.164700] [ndv4:70055:0] debug.c:1198 UCX DEBUG using signal stack 0x2b312a516000 size 141824 | |
[1650638548.164779] [ndv4:70055:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.164799] [ndv4:70055:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2b312a36a000 | |
[1650638548.164821] [ndv4:70055:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.164830] [ndv4:70055:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.164836] [ndv4:70055:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.165076] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x1c0ff90 [id=71 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.165113] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 71 events 0x1 mode thread_spinlock | |
[1650638548.165116] [ndv4:68766:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib7' (InfiniBand channel adapter) with 1 ports | |
[1650638548.165347] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638548.165354] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638548.165369] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.165428] [ndv4:68766:0] ib_md.c:1319 UCX DEBUG mlx5_ib7: using registration cache | |
[1650638548.167404] [ndv4:70055:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.167422] [ndv4:70055:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.167461] [ndv4:70055:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.167465] [ndv4:70055:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.167472] [ndv4:70055:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.167479] [ndv4:70055:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.167481] [ndv4:70055:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.167485] [ndv4:70055:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.167488] [ndv4:70055:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.167491] [ndv4:70055:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.167493] [ndv4:70055:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.167495] [ndv4:70055:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.167503] [ndv4:70055:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.170664] [ndv4:68766:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.170677] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.171501] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.171511] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.170956] [ndv4:68766:0] ib_md.c:1604 UCX DEBUG mlx5_ib7: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.170980] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638548.170985] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638548.173146] [ndv4:70055:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.173508] [ndv4:70055:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.173529] [ndv4:70055:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.173885] [ndv4:70055:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.173898] [ndv4:70055:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.173909] [ndv4:70055:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.173919] [ndv4:70055:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.173930] [ndv4:70055:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.173940] [ndv4:70055:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.173951] [ndv4:70055:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.173990] [ndv4:70055:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.172685] [ndv4:68766:0] topo.c:99 UCX DEBUG bus id 0x108000000 doesn't exist. sys_dev = 7 | |
[1650638548.172691] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.173595] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.173601] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.174417] [ndv4:70055:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.177183] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.177198] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.177855] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.177860] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.180630] [ndv4:70055:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.180985] [ndv4:70055:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.184785] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.184794] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.185055] [ndv4:68754:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.185019] [ndv4:70439:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.185436] [ndv4:70439:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.187920] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.187930] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.188377] [ndv4:70055:0] async.c:228 UCX DEBUG added async handler 0x2358640 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.188491] [ndv4:70055:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.188510] [ndv4:70055:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.189973] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.189979] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.189284] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.189294] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.189318] [ndv4:70055:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.189379] [ndv4:70055:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.189399] [ndv4:70055:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.189653] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.189664] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.189981] [ndv4:68755:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.190885] [ndv4:68766:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.190891] [ndv4:68766:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.191007] [ndv4:68766:0] ucp_context.c:1556 UCX DEBUG created ucp context 0x1bfb130 0x1bfb130 [10 mds 42 tls] features 0x1 tl bitmap 0x3ffffffffff 0x0 | |
[1650638548.195105] [ndv4:69236:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib4 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.195747] [ndv4:69236:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib4: disable ODP because it's not supported for DevX QP | |
[1650638548.196446] [ndv4:70055:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.196463] [ndv4:70055:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.196750] [ndv4:70055:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.196867] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.196874] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.196603] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x1b89610 [id=65 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.196653] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 65 events 0x1 mode thread_spinlock | |
[1650638548.196658] [ndv4:69236:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib4' (InfiniBand channel adapter) with 1 ports | |
[1650638548.197135] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638548.197146] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638548.197165] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.197293] [ndv4:69236:0] ib_md.c:1319 UCX DEBUG mlx5_ib4: using registration cache | |
[1650638548.199998] [ndv4:70055:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.200008] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.203041] [ndv4:68754:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.203547] [ndv4:68754:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.205602] [ndv4:68754:0] async.c:228 UCX DEBUG added async handler 0x21020b0 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.205643] [ndv4:68754:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.205647] [ndv4:68754:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.206072] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.206083] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.206099] [ndv4:68754:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.206372] [ndv4:68754:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.206900] [ndv4:70439:0] async.c:228 UCX DEBUG added async handler 0x1c41340 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.207012] [ndv4:70439:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.207038] [ndv4:70439:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.208200] [ndv4:68755:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.208559] [ndv4:68755:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638548.210587] [ndv4:68755:0] async.c:228 UCX DEBUG added async handler 0xcaa590 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.210627] [ndv4:68755:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638548.210633] [ndv4:68755:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638548.211349] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.211360] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.211375] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.211462] [ndv4:68755:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638548.207387] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.207397] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.207425] [ndv4:70439:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.207495] [ndv4:70439:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.207517] [ndv4:70439:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.213144] [ndv4:70439:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.213163] [ndv4:70439:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.213501] [ndv4:70439:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.213703] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.213710] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.213171] [ndv4:69236:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.213241] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.213607] [ndv4:69236:0] ib_md.c:1604 UCX DEBUG mlx5_ib4: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.213836] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638548.213844] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638548.209187] [ndv4:69075:0] debug.c:1198 UCX DEBUG using signal stack 0x2b4f90d00000 size 141824 | |
[1650638548.209297] [ndv4:69075:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.209318] [ndv4:69075:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2b4f90b54000 | |
[1650638548.209342] [ndv4:69075:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.209351] [ndv4:69075:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.209357] [ndv4:69075:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.211906] [ndv4:69075:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.211926] [ndv4:69075:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.211958] [ndv4:69075:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.211961] [ndv4:69075:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.211968] [ndv4:69075:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.211975] [ndv4:69075:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.211978] [ndv4:69075:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.211982] [ndv4:69075:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.211985] [ndv4:69075:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.211987] [ndv4:69075:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.211989] [ndv4:69075:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.211991] [ndv4:69075:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.212000] [ndv4:69075:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.215227] [ndv4:70439:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.215236] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.215290] [ndv4:69236:0] topo.c:99 UCX DEBUG bus id 0x105000000 doesn't exist. sys_dev = 4 | |
[1650638548.215297] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.214600] [ndv4:68755:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.214613] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.215602] [ndv4:68755:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.215638] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.215645] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.215809] [ndv4:68755:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638548.215814] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.215947] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.215951] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.216094] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.216097] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.215517] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.215525] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.215883] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.215890] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.216070] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.216074] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.216275] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.216279] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.216589] [ndv4:69236:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.216302] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.216308] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.216371] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.216374] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.216703] [ndv4:68755:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.218040] [ndv4:69075:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.218410] [ndv4:69075:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.218431] [ndv4:69075:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.218580] [ndv4:69075:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.218593] [ndv4:69075:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.218604] [ndv4:69075:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.218615] [ndv4:69075:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.218625] [ndv4:69075:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.218636] [ndv4:69075:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.218646] [ndv4:69075:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.218680] [ndv4:69075:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.219036] [ndv4:69075:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.223150] [ndv4:68755:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib4 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.223543] [ndv4:68755:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib4: disable ODP because it's not supported for DevX QP | |
[1650638548.223802] [ndv4:68755:0] async.c:228 UCX DEBUG added async handler 0x1395cf0 [id=65 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.223829] [ndv4:68755:0] async.c:506 UCX DEBUG listening to async event fd 65 events 0x1 mode thread_spinlock | |
[1650638548.223834] [ndv4:68755:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib4' (InfiniBand channel adapter) with 1 ports | |
[1650638548.223985] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638548.223995] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638548.224002] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.224053] [ndv4:68755:0] ib_md.c:1319 UCX DEBUG mlx5_ib4: using registration cache | |
[1650638548.227294] [ndv4:69236:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib5 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.227700] [ndv4:69236:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib5: disable ODP because it's not supported for DevX QP | |
[1650638548.227996] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x14a3420 [id=67 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.228032] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 67 events 0x1 mode thread_spinlock | |
[1650638548.228035] [ndv4:69236:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib5' (InfiniBand channel adapter) with 1 ports | |
[1650638548.228292] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638548.228303] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638548.228318] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.228416] [ndv4:69236:0] ib_md.c:1319 UCX DEBUG mlx5_ib5: using registration cache | |
[1650638548.238127] [ndv4:68764:0] debug.c:1198 UCX DEBUG using signal stack 0x2ab717dec000 size 141824 | |
[1650638548.238276] [ndv4:68764:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.238299] [ndv4:68764:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2ab717c40000 | |
[1650638548.238320] [ndv4:68764:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.238330] [ndv4:68764:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.238336] [ndv4:68764:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.241097] [ndv4:68764:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.241116] [ndv4:68764:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.241150] [ndv4:68764:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.241154] [ndv4:68764:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.241162] [ndv4:68764:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.241170] [ndv4:68764:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.241173] [ndv4:68764:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.241177] [ndv4:68764:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.241180] [ndv4:68764:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.241182] [ndv4:68764:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.241185] [ndv4:68764:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.241187] [ndv4:68764:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.241196] [ndv4:68764:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.247758] [ndv4:68764:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.248102] [ndv4:68764:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.248124] [ndv4:68764:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.248280] [ndv4:68764:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.248293] [ndv4:68764:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.248304] [ndv4:68764:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.248316] [ndv4:68764:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.248327] [ndv4:68764:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.248339] [ndv4:68764:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.248351] [ndv4:68764:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.248389] [ndv4:68764:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.248776] [ndv4:68764:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.238960] [ndv4:70462:0] debug.c:1198 UCX DEBUG using signal stack 0x2b87e10ae000 size 141824 | |
[1650638548.239040] [ndv4:70462:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.239061] [ndv4:70462:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2b87e0f1c000 | |
[1650638548.239079] [ndv4:70462:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.239088] [ndv4:70462:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.239094] [ndv4:70462:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.241846] [ndv4:70462:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.241864] [ndv4:70462:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.241899] [ndv4:70462:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.241902] [ndv4:70462:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.241909] [ndv4:70462:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.241916] [ndv4:70462:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.241918] [ndv4:70462:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.241923] [ndv4:70462:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.241925] [ndv4:70462:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.241927] [ndv4:70462:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.241930] [ndv4:70462:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.241932] [ndv4:70462:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.241940] [ndv4:70462:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.247768] [ndv4:70462:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.248120] [ndv4:70462:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.248141] [ndv4:70462:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.248293] [ndv4:70462:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.248306] [ndv4:70462:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.248317] [ndv4:70462:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.248328] [ndv4:70462:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.248340] [ndv4:70462:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.248352] [ndv4:70462:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.248365] [ndv4:70462:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.248400] [ndv4:70462:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.248798] [ndv4:70462:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.255416] [ndv4:68764:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.255405] [ndv4:70462:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.255827] [ndv4:68764:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.255816] [ndv4:70462:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.263713] [ndv4:68764:0] async.c:228 UCX DEBUG added async handler 0x25402c0 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.263808] [ndv4:68764:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.263621] [ndv4:70462:0] async.c:228 UCX DEBUG added async handler 0xa96680 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.263720] [ndv4:70462:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.264139] [ndv4:68764:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.264120] [ndv4:70462:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.264351] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.264361] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.264384] [ndv4:68764:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.264447] [ndv4:68764:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.264468] [ndv4:68764:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.264343] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.264352] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.264377] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.264442] [ndv4:70462:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.264464] [ndv4:70462:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.269595] [ndv4:69236:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.269624] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.269967] [ndv4:69236:0] ib_md.c:1604 UCX DEBUG mlx5_ib5: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.270446] [ndv4:70462:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.271175] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.271251] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638548.271264] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638548.271673] [ndv4:69236:0] topo.c:99 UCX DEBUG bus id 0x106000000 doesn't exist. sys_dev = 5 | |
[1650638548.271682] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.271787] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.271792] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.271472] [ndv4:70462:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.271491] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.271497] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.271522] [ndv4:68764:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.271546] [ndv4:68764:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.271817] [ndv4:68764:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.272069] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.272078] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.272096] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.272104] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.272278] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.272283] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.272354] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.272358] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.274448] [ndv4:68764:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.274461] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.274436] [ndv4:70462:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.274451] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.274712] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.274716] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.274826] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.274830] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.274888] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.274891] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.274956] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.274960] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.275198] [ndv4:70462:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.274784] [ndv4:69236:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.278851] [ndv4:70038:0] debug.c:1198 UCX DEBUG using signal stack 0x2aec0006d000 size 141824 | |
[1650638548.278929] [ndv4:70038:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.278948] [ndv4:70038:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2aebffec1000 | |
[1650638548.278968] [ndv4:70038:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.278978] [ndv4:70038:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.278984] [ndv4:70038:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.281574] [ndv4:70038:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.281593] [ndv4:70038:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.281626] [ndv4:70038:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.281629] [ndv4:70038:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.281636] [ndv4:70038:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.281642] [ndv4:70038:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.281645] [ndv4:70038:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.281649] [ndv4:70038:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.281651] [ndv4:70038:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.281654] [ndv4:70038:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.281656] [ndv4:70038:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.281658] [ndv4:70038:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.281665] [ndv4:70038:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.282504] [ndv4:69236:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib6 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.282492] [ndv4:70462:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.282878] [ndv4:70462:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.282910] [ndv4:69236:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib6: disable ODP because it's not supported for DevX QP | |
[1650638548.283202] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x1491f60 [id=69 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.283281] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 69 events 0x1 mode thread_spinlock | |
[1650638548.283285] [ndv4:69236:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib6' (InfiniBand channel adapter) with 1 ports | |
[1650638548.283135] [ndv4:70462:0] async.c:228 UCX DEBUG added async handler 0xa6a080 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.283161] [ndv4:70462:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.283164] [ndv4:70462:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.283391] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.283399] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.283409] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.283464] [ndv4:70462:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.287348] [ndv4:70038:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.287680] [ndv4:70038:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.287701] [ndv4:70038:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.287803] [ndv4:70038:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.287814] [ndv4:70038:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.287825] [ndv4:70038:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.287835] [ndv4:70038:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.287847] [ndv4:70038:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.287857] [ndv4:70038:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.287869] [ndv4:70038:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.287903] [ndv4:70038:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.288319] [ndv4:70038:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.283470] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638548.283482] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638548.283502] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.283588] [ndv4:69236:0] ib_md.c:1319 UCX DEBUG mlx5_ib6: using registration cache | |
[1650638548.287756] [ndv4:69236:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.287764] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.287982] [ndv4:69236:0] ib_md.c:1604 UCX DEBUG mlx5_ib6: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.287997] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638548.288002] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638548.288331] [ndv4:69236:0] topo.c:99 UCX DEBUG bus id 0x107000000 doesn't exist. sys_dev = 6 | |
[1650638548.288340] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.288404] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.288408] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.288496] [ndv4:70462:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.288504] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.288804] [ndv4:70462:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.288819] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.288824] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.288915] [ndv4:70462:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638548.288918] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.288980] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.288983] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.289041] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.289044] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.289104] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.289107] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.289166] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.289168] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.289415] [ndv4:70462:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.288591] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.288599] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.288663] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.288667] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.288729] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.288732] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.288987] [ndv4:69236:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.294796] [ndv4:70038:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.295166] [ndv4:70038:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.297200] [ndv4:69236:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib7 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.297180] [ndv4:70462:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.297607] [ndv4:70462:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638548.297629] [ndv4:69236:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib7: disable ODP because it's not supported for DevX QP | |
[1650638548.297877] [ndv4:70462:0] async.c:228 UCX DEBUG added async handler 0xa8ed50 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.297900] [ndv4:70462:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638548.297904] [ndv4:70462:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638548.297901] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x14a1f90 [id=71 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.297929] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 71 events 0x1 mode thread_spinlock | |
[1650638548.297933] [ndv4:69236:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib7' (InfiniBand channel adapter) with 1 ports | |
[1650638548.298099] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.298108] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.298114] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.298154] [ndv4:70462:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638548.306951] [ndv4:70462:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.306958] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.307179] [ndv4:70462:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.307202] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.307222] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.307530] [ndv4:70462:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638548.307534] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.307685] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.307689] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.307853] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.307856] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.308007] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.308009] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.308153] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.308156] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.298121] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638548.298132] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638548.298139] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.298190] [ndv4:69236:0] ib_md.c:1319 UCX DEBUG mlx5_ib7: using registration cache | |
[1650638548.305467] [ndv4:69236:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.305475] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.305720] [ndv4:69236:0] ib_md.c:1604 UCX DEBUG mlx5_ib7: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.305736] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638548.305741] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638548.306172] [ndv4:69236:0] topo.c:99 UCX DEBUG bus id 0x108000000 doesn't exist. sys_dev = 7 | |
[1650638548.306179] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.306354] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.306358] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.306613] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.306619] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.306805] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.306809] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.306969] [ndv4:69236:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.306973] [ndv4:69236:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.307098] [ndv4:69236:0] ucp_context.c:1556 UCX DEBUG created ucp context 0x148d130 0x148d130 [10 mds 42 tls] features 0x1 tl bitmap 0x3ffffffffff 0x0 | |
[1650638548.302544] [ndv4:70038:0] async.c:228 UCX DEBUG added async handler 0x14f52c0 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.302633] [ndv4:70038:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.302725] [ndv4:70038:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.302912] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.302919] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.302944] [ndv4:70038:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.303003] [ndv4:70038:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.303023] [ndv4:70038:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.308447] [ndv4:70462:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.310293] [ndv4:70038:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.310306] [ndv4:70038:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.310539] [ndv4:70038:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.310555] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.310560] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.310721] [ndv4:70038:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.310726] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.310892] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.310896] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.311010] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.311013] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.311073] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.311075] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.311134] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.311136] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.311393] [ndv4:70038:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.311689] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.311708] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.312123] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.312128] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.312650] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.312654] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.312940] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.312943] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.313291] [ndv4:70055:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.318046] [ndv4:68754:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.318066] [ndv4:68754:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.318316] [ndv4:68754:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.318504] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.318511] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.319104] [ndv4:69075:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.319575] [ndv4:69075:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.320136] [ndv4:68755:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.320158] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.320724] [ndv4:68755:0] ib_md.c:1604 UCX DEBUG mlx5_ib4: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.320807] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638548.320815] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638548.321011] [ndv4:68754:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638548.321019] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.322701] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.322709] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.325425] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.325431] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.326471] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.326482] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.327331] [ndv4:70055:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.327679] [ndv4:70055:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.328020] [ndv4:70055:0] async.c:228 UCX DEBUG added async handler 0x232bfd0 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.328046] [ndv4:70055:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.328050] [ndv4:70055:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.328336] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.328346] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.328356] [ndv4:70055:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.328410] [ndv4:70055:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.330267] [ndv4:70462:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.331198] [ndv4:70462:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638548.332560] [ndv4:70055:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.332574] [ndv4:70055:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.332851] [ndv4:70055:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.332932] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.332937] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.333576] [ndv4:68755:0] topo.c:99 UCX DEBUG bus id 0x105000000 doesn't exist. sys_dev = 4 | |
[1650638548.333585] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.335555] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.335562] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.335372] [ndv4:70055:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638548.335379] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.336159] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.336165] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.336303] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.336310] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.337027] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.337031] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.336825] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.336832] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.338317] [ndv4:69075:0] async.c:228 UCX DEBUG added async handler 0xad95f0 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.338419] [ndv4:69075:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.338516] [ndv4:69075:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.338670] [ndv4:70038:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.339354] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.339362] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.339410] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.339416] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.339379] [ndv4:70038:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.339274] [ndv4:69075:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.339284] [ndv4:69075:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.339316] [ndv4:69075:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.339387] [ndv4:69075:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.339409] [ndv4:69075:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.340440] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.340446] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.340680] [ndv4:70055:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.340589] [ndv4:70462:0] async.c:228 UCX DEBUG added async handler 0xa8ec40 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.340616] [ndv4:70462:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638548.340620] [ndv4:70462:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638548.341092] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.341098] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.341505] [ndv4:68755:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.341044] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.341054] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.341059] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.341107] [ndv4:70462:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638548.341698] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.341710] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.341880] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.341884] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.344029] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.344040] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.345805] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.345811] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.346046] [ndv4:68764:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.347932] [ndv4:68766:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 8447 bytes with hugetlb | |
[1650638548.348021] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8368 | |
[1650638548.348047] [ndv4:68766:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 4292720 bytes with hugetlb | |
[1650638548.348058] [ndv4:68766:0] mpool.c:205 UCX DEBUG mpool mm_recv_desc: allocated chunk 0x2b8e80c25018 of 4296680 bytes with 512 elements | |
[1650638548.348686] [ndv4:68766:0] mm_iface.c:600 UCX DEBUG created mm iface 0x23771f0 FIFO id 0xe000c va 0x2b8e7beb3000 size 12288 (128 x 64 elems) | |
[1650638548.348739] [ndv4:68766:0] ucp_worker.c:1159 UCX DEBUG created interface[0]=0x23771f0 using sysv/memory on worker 0x2c514d0 | |
[1650638548.348896] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.348904] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.349138] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.349143] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.350078] [ndv4:70038:0] async.c:228 UCX DEBUG added async handler 0x14f5440 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.350099] [ndv4:70038:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.350103] [ndv4:70038:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.350862] [ndv4:70462:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.350871] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.351149] [ndv4:70462:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.351505] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.351512] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.350354] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.350363] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.350368] [ndv4:70038:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.350411] [ndv4:70038:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.350176] [ndv4:68766:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638548.351842] [ndv4:68766:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638548.351884] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.351888] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.351942] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.350991] [ndv4:69075:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.351007] [ndv4:69075:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.351307] [ndv4:69075:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.351691] [ndv4:69075:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.351698] [ndv4:69075:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.352574] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638548.352584] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638548.353934] [ndv4:68766:0] ib_iface.c:994 UCX DEBUG iface=0x2382400: created RC QP 0x30f9 on mlx5_ib0:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638548.354760] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.354773] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.355524] [ndv4:68764:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.355911] [ndv4:68766:0] ucp_worker.c:1159 UCX DEBUG created interface[1]=0x2382400 using rc_verbs/mlx5_ib0:1 on worker 0x2c514d0 | |
[1650638548.356191] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.356199] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.355888] [ndv4:68764:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.356534] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.356541] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.357318] [ndv4:68766:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638548.357356] [ndv4:68766:0] ib_device.c:1394 UCX DEBUG max IB CQE size is 128 | |
[1650638548.357002] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.357009] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.357434] [ndv4:68754:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.358377] [ndv4:68766:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638548.358386] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.358389] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.358441] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.358020] [ndv4:70055:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.358513] [ndv4:70055:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638548.358301] [ndv4:68764:0] async.c:228 UCX DEBUG added async handler 0x2540440 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.358337] [ndv4:68764:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.358340] [ndv4:68764:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.358628] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.358638] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.358646] [ndv4:68764:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.358691] [ndv4:68764:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.358990] [ndv4:68766:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x2f78010 of 8176 bytes with 127 elements | |
[1650638548.359314] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638548.359343] [ndv4:68766:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.359393] [ndv4:68766:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638548.359399] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638548.360323] [ndv4:68755:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib5 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.360428] [ndv4:70055:0] async.c:228 UCX DEBUG added async handler 0x2350d10 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.360458] [ndv4:70055:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638548.360467] [ndv4:70055:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638548.361818] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.361843] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.361914] [ndv4:68755:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib5: disable ODP because it's not supported for DevX QP | |
[1650638548.362845] [ndv4:70462:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638548.362855] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.363610] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.363616] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.363711] [ndv4:68755:0] async.c:228 UCX DEBUG added async handler 0xc9cfa0 [id=67 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.363745] [ndv4:68755:0] async.c:506 UCX DEBUG listening to async event fd 67 events 0x1 mode thread_spinlock | |
[1650638548.363751] [ndv4:68755:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib5' (InfiniBand channel adapter) with 1 ports | |
[1650638548.364437] [ndv4:68764:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.364460] [ndv4:68764:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.364721] [ndv4:68764:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.364860] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.364869] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.363981] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638548.363994] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638548.364002] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.364080] [ndv4:68755:0] ib_md.c:1319 UCX DEBUG mlx5_ib5: using registration cache | |
[1650638548.366276] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.366283] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.367291] [ndv4:68764:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638548.367298] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.367654] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.367657] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.368130] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.368133] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.368600] [ndv4:68755:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.368608] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.368867] [ndv4:68755:0] ib_md.c:1604 UCX DEBUG mlx5_ib5: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.368889] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638548.368895] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638548.369651] [ndv4:70038:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.369665] [ndv4:70038:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.369845] [ndv4:70038:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.369991] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.369998] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.371130] [ndv4:70038:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638548.371135] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.373738] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.373744] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.373816] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.373819] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.378620] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.378625] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.378694] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.378697] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.378938] [ndv4:70038:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.369788] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.369801] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.369808] [ndv4:70055:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.369853] [ndv4:70055:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638548.379702] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.379709] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.377200] [ndv4:69236:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 8447 bytes with hugetlb | |
[1650638548.377329] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8368 | |
[1650638548.377357] [ndv4:69236:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 4292720 bytes with hugetlb | |
[1650638548.377368] [ndv4:69236:0] mpool.c:205 UCX DEBUG mpool mm_recv_desc: allocated chunk 0x2abae5c64018 of 4296680 bytes with 512 elements | |
[1650638548.378033] [ndv4:69236:0] mm_iface.c:600 UCX DEBUG created mm iface 0x1c091f0 FIFO id 0xe0014 va 0x2abadffd6000 size 12288 (128 x 64 elems) | |
[1650638548.378103] [ndv4:69236:0] ucp_worker.c:1159 UCX DEBUG created interface[0]=0x1c091f0 using sysv/memory on worker 0x24e34d0 | |
[1650638548.378276] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.378284] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.378925] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.378930] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.379947] [ndv4:69236:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638548.381377] [ndv4:69236:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638548.381424] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.381427] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.381482] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.381867] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638548.381961] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638548.371608] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.371616] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.371896] [ndv4:70439:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.381079] [ndv4:70439:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.381462] [ndv4:70439:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.378782] [ndv4:68754:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.379289] [ndv4:68754:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638548.371305] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x22e5820 [id=78 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638548.371342] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 78 events 0x1 mode thread_spinlock | |
[1650638548.371413] [ndv4:68766:0] ucp_worker.c:1159 UCX DEBUG created interface[2]=0x238ff80 using rc_mlx5/mlx5_ib0:1 on worker 0x2c514d0 | |
[1650638548.371699] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.371706] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.371788] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.371793] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.371968] [ndv4:68766:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638548.374545] [ndv4:68766:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638548.374555] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.374558] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.374572] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.375005] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638548.375014] [ndv4:68766:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.375048] [ndv4:68766:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638548.375052] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638548.375060] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x22e7440 [id=80 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638548.375079] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 80 events 0x1 mode thread_spinlock | |
[1650638548.375913] [ndv4:68766:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638548.382606] [ndv4:69236:0] ib_iface.c:994 UCX DEBUG iface=0x1c14400: created RC QP 0x30ff on mlx5_ib0:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638548.385252] [ndv4:70055:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.385266] [ndv4:70055:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.385494] [ndv4:70055:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.382713] [ndv4:69075:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.382723] [ndv4:69075:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.382822] [ndv4:70439:0] async.c:228 UCX DEBUG added async handler 0x1c4b3e0 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.382850] [ndv4:70439:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.382854] [ndv4:70439:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.383288] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.383297] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.383306] [ndv4:70439:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.383364] [ndv4:70439:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.385278] [ndv4:68766:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x2f7a050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x3122 | |
[1650638548.385869] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.385878] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.385903] [ndv4:68766:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2b8e7beb8008 of 151544 bytes with 1052 elements | |
[1650638548.385697] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.385706] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.390663] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.390753] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.390992] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.390995] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.390645] [ndv4:68766:0] ib_md.c:812 UCX DEBUG registered memory 0x2b8e81200000..0x2b8e83800000 on mlx5_ib0 lkey 0x81400 rkey 0x81400 access 0xf flags 0x3e4 | |
[1650638548.390763] [ndv4:68766:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2b8e81200018 of 39845864 bytes with 4752 elements | |
[1650638548.390907] [ndv4:68766:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x2f7a050 | |
[1650638548.391123] [ndv4:68766:0] ucp_worker.c:1159 UCX DEBUG created interface[3]=0x2f7a050 using dc_mlx5/mlx5_ib0:1 on worker 0x2c514d0 | |
[1650638548.391392] [ndv4:70462:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.391439] [ndv4:68946:0] debug.c:1198 UCX DEBUG using signal stack 0x2ad22f670000 size 141824 | |
[1650638548.391517] [ndv4:68946:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.391537] [ndv4:68946:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2ad22f4c4000 | |
[1650638548.391555] [ndv4:68946:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.391564] [ndv4:68946:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.391570] [ndv4:68946:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.391550] [ndv4:70439:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.391570] [ndv4:70439:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.391827] [ndv4:70439:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.391864] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.391873] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.393327] [ndv4:68754:0] async.c:228 UCX DEBUG added async handler 0x2126c20 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.393372] [ndv4:68754:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638548.393378] [ndv4:68754:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638548.393734] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.393743] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.393755] [ndv4:68754:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.393848] [ndv4:68754:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638548.393459] [ndv4:68755:0] topo.c:99 UCX DEBUG bus id 0x106000000 doesn't exist. sys_dev = 5 | |
[1650638548.393471] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.394403] [ndv4:68946:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.394422] [ndv4:68946:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.394460] [ndv4:68946:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.394463] [ndv4:68946:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.394470] [ndv4:68946:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.394477] [ndv4:68946:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.394480] [ndv4:68946:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.394485] [ndv4:68946:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.394487] [ndv4:68946:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.394490] [ndv4:68946:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.394492] [ndv4:68946:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.394495] [ndv4:68946:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.394503] [ndv4:68946:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.397304] [ndv4:69236:0] ucp_worker.c:1159 UCX DEBUG created interface[1]=0x1c14400 using rc_verbs/mlx5_ib0:1 on worker 0x24e34d0 | |
[1650638548.397855] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.397864] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.397240] [ndv4:69075:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.397253] [ndv4:69075:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.398377] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.398386] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.399363] [ndv4:69236:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638548.400177] [ndv4:70055:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638548.400192] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.400573] [ndv4:69236:0] ib_device.c:1394 UCX DEBUG max IB CQE size is 128 | |
[1650638548.400708] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.400722] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.401038] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.401046] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.400793] [ndv4:68754:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.400806] [ndv4:68754:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.401129] [ndv4:68754:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.401154] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.401159] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.402088] [ndv4:70038:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.402655] [ndv4:70038:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638548.402250] [ndv4:68946:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.402597] [ndv4:68946:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.402620] [ndv4:68946:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.401774] [ndv4:69236:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638548.401790] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.401794] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.401849] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.402310] [ndv4:69236:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x280a010 of 8176 bytes with 127 elements | |
[1650638548.402554] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638548.402578] [ndv4:69236:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.402626] [ndv4:69236:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638548.402634] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638548.401381] [ndv4:70462:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib4 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.401882] [ndv4:70462:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib4: disable ODP because it's not supported for DevX QP | |
[1650638548.402319] [ndv4:70462:0] async.c:228 UCX DEBUG added async handler 0x11786a0 [id=65 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.402344] [ndv4:70462:0] async.c:506 UCX DEBUG listening to async event fd 65 events 0x1 mode thread_spinlock | |
[1650638548.402348] [ndv4:70462:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib4' (InfiniBand channel adapter) with 1 ports | |
[1650638548.402600] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638548.402608] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638548.402617] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.402671] [ndv4:70462:0] ib_md.c:1319 UCX DEBUG mlx5_ib4: using registration cache | |
[1650638548.402669] [ndv4:68766:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638548.402914] [ndv4:70439:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638548.402922] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.403329] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.403336] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.403851] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.403855] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.404122] [ndv4:68946:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.404150] [ndv4:68946:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.404161] [ndv4:68946:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.404176] [ndv4:68946:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.404000] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.404010] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.404495] [ndv4:68946:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.404510] [ndv4:68946:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.404522] [ndv4:68946:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.404731] [ndv4:68946:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.404851] [ndv4:70038:0] async.c:228 UCX DEBUG added async handler 0x14f5d60 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.404887] [ndv4:70038:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638548.405043] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.405050] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.405055] [ndv4:70038:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638548.405080] [ndv4:68766:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638548.406047] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.406062] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.406080] [ndv4:70038:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.406169] [ndv4:70038:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638548.406689] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.406698] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.406958] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.406961] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.406066] [ndv4:68946:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.405801] [ndv4:68766:0] ib_iface.c:994 UCX DEBUG iface=0x238d5c0: created UD QP 0x3106 on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638548.406812] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638548.407376] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.407384] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.407827] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.407832] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.405969] [ndv4:70462:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.406053] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.406317] [ndv4:70462:0] ib_md.c:1604 UCX DEBUG mlx5_ib4: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.406350] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638548.406356] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638548.407694] [ndv4:68754:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638548.407704] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.408276] [ndv4:68766:0] ib_md.c:812 UCX DEBUG registered memory 0x2b8e7bedd000..0x2b8e7bf62000 on mlx5_ib0 lkey 0x81500 rkey 0x81500 access 0xf flags 0x3e4 | |
[1650638548.408283] [ndv4:68766:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b8e7bedd018 of 544744 bytes with 128 elements | |
[1650638548.408288] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638548.408961] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238d5c0: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638548.409436] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.409448] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.409881] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238d5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638548.410395] [ndv4:70038:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.410425] [ndv4:70038:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.410640] [ndv4:70038:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.410752] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.410761] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.410504] [ndv4:69048:0] debug.c:1198 UCX DEBUG using signal stack 0x2b565b69f000 size 141824 | |
[1650638548.410598] [ndv4:69048:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.410617] [ndv4:69048:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2b565b4f3000 | |
[1650638548.410641] [ndv4:69048:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.410649] [ndv4:69048:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.410656] [ndv4:69048:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.410973] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.410981] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.411282] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.411294] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.411616] [ndv4:68764:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.412248] [ndv4:70038:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638548.412254] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.414055] [ndv4:68946:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.413282] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.413295] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.413605] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.413609] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.413897] [ndv4:70055:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.413602] [ndv4:69048:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.413630] [ndv4:69048:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.413669] [ndv4:69048:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.413672] [ndv4:69048:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.413681] [ndv4:69048:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.413689] [ndv4:69048:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.413692] [ndv4:69048:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.413698] [ndv4:69048:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.413700] [ndv4:69048:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.413702] [ndv4:69048:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.413705] [ndv4:69048:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.413707] [ndv4:69048:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.413717] [ndv4:69048:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.414958] [ndv4:68946:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.415845] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x1b77820 [id=78 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638548.415875] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 78 events 0x1 mode thread_spinlock | |
[1650638548.415947] [ndv4:69236:0] ucp_worker.c:1159 UCX DEBUG created interface[2]=0x1c21f80 using rc_mlx5/mlx5_ib0:1 on worker 0x24e34d0 | |
[1650638548.416003] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.416010] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.418034] [ndv4:69075:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.418048] [ndv4:69075:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.419988] [ndv4:68764:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.420406] [ndv4:68764:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638548.421175] [ndv4:68764:0] async.c:228 UCX DEBUG added async handler 0x2540d60 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.421276] [ndv4:68764:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638548.421281] [ndv4:68764:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638548.422517] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238d5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638548.421655] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.421666] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.421676] [ndv4:68764:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.421739] [ndv4:68764:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638548.422094] [ndv4:69048:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.422794] [ndv4:69048:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.422827] [ndv4:69048:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.423657] [ndv4:69048:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.423670] [ndv4:69048:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.423680] [ndv4:69048:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.423691] [ndv4:69048:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.423703] [ndv4:69048:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.423715] [ndv4:69048:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.423725] [ndv4:69048:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.423760] [ndv4:69048:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.424151] [ndv4:69048:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.423492] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238d5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638548.423521] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238d5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638548.423536] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238d5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638548.423550] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238d5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638548.423565] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238d5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638548.423979] [ndv4:68766:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638548.423738] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.423751] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.424826] [ndv4:68764:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.424841] [ndv4:68764:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.425034] [ndv4:68764:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.425135] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.425151] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.425966] [ndv4:68946:0] async.c:228 UCX DEBUG added async handler 0x22b9390 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.426090] [ndv4:68946:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.426355] [ndv4:68762:0] debug.c:1198 UCX DEBUG using signal stack 0x2b6485377000 size 141824 | |
[1650638548.426431] [ndv4:68762:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.426453] [ndv4:68762:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2b64851e5000 | |
[1650638548.426476] [ndv4:68762:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.426486] [ndv4:68762:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.426493] [ndv4:68762:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.426601] [ndv4:68946:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.426905] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.426916] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.426955] [ndv4:68946:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.427033] [ndv4:68946:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.427059] [ndv4:68946:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.427383] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.427396] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.428612] [ndv4:68946:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.428627] [ndv4:68946:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.428855] [ndv4:68946:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.429392] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.429406] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.429611] [ndv4:68762:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.429628] [ndv4:68762:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.429663] [ndv4:68762:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.429667] [ndv4:68762:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.429674] [ndv4:68762:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.429681] [ndv4:68762:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.429684] [ndv4:68762:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.429688] [ndv4:68762:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.429691] [ndv4:68762:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.429694] [ndv4:68762:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.429696] [ndv4:68762:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.429699] [ndv4:68762:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.429705] [ndv4:68762:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.430614] [ndv4:69236:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638548.431195] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.431231] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.431571] [ndv4:70439:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.431724] [ndv4:69236:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638548.431745] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.431748] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.431768] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.432086] [ndv4:70462:0] topo.c:99 UCX DEBUG bus id 0x105000000 doesn't exist. sys_dev = 4 | |
[1650638548.432099] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.432423] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638548.432442] [ndv4:69236:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.432480] [ndv4:69236:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib0 length=2048) failed: Invalid argument | |
[1650638548.432485] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638548.432498] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x1b79440 [id=80 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638548.432521] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 80 events 0x1 mode thread_spinlock | |
[1650638548.432837] [ndv4:69048:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.433098] [ndv4:69236:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638548.433316] [ndv4:69048:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.433736] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x22e68f0 [id=81 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638548.433773] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 81 events 0x5 mode thread_spinlock | |
[1650638548.433838] [ndv4:68766:0] ucp_worker.c:1159 UCX DEBUG created interface[4]=0x238d5c0 using ud_verbs/mlx5_ib0:1 on worker 0x2c514d0 | |
[1650638548.433949] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.433956] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.434006] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.434011] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.434178] [ndv4:68766:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638548.434596] [ndv4:70055:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.434888] [ndv4:68766:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638548.435136] [ndv4:70055:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638548.435341] [ndv4:68766:0] ib_iface.c:994 UCX DEBUG iface=0x23988f0: created UD QP 0x3108 on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638548.435356] [ndv4:68766:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638548.436548] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638548.436697] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.436703] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.437014] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.437019] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.436493] [ndv4:68762:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.436768] [ndv4:68762:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.436788] [ndv4:68762:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.436860] [ndv4:68762:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.436872] [ndv4:68762:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.436882] [ndv4:68762:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.436891] [ndv4:68762:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.436901] [ndv4:68762:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.436912] [ndv4:68762:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.436922] [ndv4:68762:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.436956] [ndv4:68762:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.437486] [ndv4:68766:0] ib_md.c:812 UCX DEBUG registered memory 0x2b8e7bf62000..0x2b8e7bfe7000 on mlx5_ib0 lkey 0x81600 rkey 0x81600 access 0xf flags 0x3e4 | |
[1650638548.437492] [ndv4:68766:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b8e7bf62018 of 544744 bytes with 128 elements | |
[1650638548.437497] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638548.437770] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x23988f0: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638548.438189] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x23988f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638548.438279] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x23988f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638548.437976] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.437993] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.438466] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.438472] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.439099] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.439103] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.437391] [ndv4:68762:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.438823] [ndv4:70439:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.439192] [ndv4:70439:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638548.438456] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.438469] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.439374] [ndv4:68946:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.439379] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.439685] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.439689] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.440948] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.440954] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.441608] [ndv4:69236:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x280c050: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x312f | |
[1650638548.442128] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.442135] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.442165] [ndv4:69236:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2abadffdb008 of 151544 bytes with 1052 elements | |
[1650638548.441381] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.441388] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.442413] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.442418] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.442691] [ndv4:68946:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.442547] [ndv4:69048:0] async.c:228 UCX DEBUG added async handler 0x2541340 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.442644] [ndv4:69048:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.442659] [ndv4:69048:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.443321] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.443331] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.443934] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.443938] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.444877] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.444882] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.443858] [ndv4:68762:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.444257] [ndv4:68762:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.443053] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.443063] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.443087] [ndv4:69048:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.443153] [ndv4:69048:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.443176] [ndv4:69048:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.447505] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x23988f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638548.447710] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x23988f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638548.447872] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x23988f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638548.448275] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x23988f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638548.448598] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x23988f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638548.448613] [ndv4:68766:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.448652] [ndv4:68766:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638548.448659] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x22e7a90 [id=82 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638548.448686] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 82 events 0x5 mode thread_spinlock | |
[1650638548.448707] [ndv4:68766:0] ucp_worker.c:1159 UCX DEBUG created interface[5]=0x23988f0 using ud_mlx5/mlx5_ib0:1 on worker 0x2c514d0 | |
[1650638548.448726] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.448732] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.448790] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.448794] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.446123] [ndv4:69236:0] ib_md.c:812 UCX DEBUG registered memory 0x2abae6200000..0x2abae8800000 on mlx5_ib0 lkey 0x81700 rkey 0x81700 access 0xf flags 0x3e4 | |
[1650638548.446144] [ndv4:69236:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2abae6200018 of 39845864 bytes with 4752 elements | |
[1650638548.446306] [ndv4:69236:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x280c050 | |
[1650638548.446369] [ndv4:69236:0] ucp_worker.c:1159 UCX DEBUG created interface[3]=0x280c050 using dc_mlx5/mlx5_ib0:1 on worker 0x24e34d0 | |
[1650638548.447036] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.447045] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.447184] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.447189] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.447997] [ndv4:69236:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638548.449071] [ndv4:69236:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638548.447983] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.447989] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.448275] [ndv4:70462:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.448404] [ndv4:69048:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.448417] [ndv4:69048:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.448607] [ndv4:69048:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.448625] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.448630] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.446116] [ndv4:70055:0] async.c:228 UCX DEBUG added async handler 0x2350420 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.446148] [ndv4:70055:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638548.446153] [ndv4:70055:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638548.446369] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.446378] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.446392] [ndv4:70055:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.446450] [ndv4:70055:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638548.445096] [ndv4:68764:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638548.445107] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.447700] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.447707] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.449493] [ndv4:69236:0] ib_iface.c:994 UCX DEBUG iface=0x1c1f5c0: created UD QP 0x3111 on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638548.450182] [ndv4:68766:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638548.450017] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638548.450052] [ndv4:70439:0] async.c:228 UCX DEBUG added async handler 0x1c1d0c0 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.450088] [ndv4:70439:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638548.450092] [ndv4:70439:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638548.450350] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.450358] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.450366] [ndv4:70439:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.450420] [ndv4:70439:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638548.451502] [ndv4:68766:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638548.451511] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.451515] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.451529] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.451160] [ndv4:69075:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.451178] [ndv4:69075:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.452076] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638548.452083] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638548.452244] [ndv4:69075:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.452251] [ndv4:69075:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.452523] [ndv4:69075:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.452507] [ndv4:70055:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.452518] [ndv4:70055:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.452592] [ndv4:68946:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.453196] [ndv4:68946:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.452777] [ndv4:70055:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.452970] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.452976] [ndv4:70055:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.453071] [ndv4:70055:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638548.453075] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.453469] [ndv4:68766:0] ib_iface.c:994 UCX DEBUG iface=0x3521050: created RC QP 0x3051 on mlx5_ib1:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638548.452804] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.452815] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.453175] [ndv4:68755:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.456624] [ndv4:70462:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib5 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.456978] [ndv4:70462:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib5: disable ODP because it's not supported for DevX QP | |
[1650638548.455229] [ndv4:70439:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.455236] [ndv4:70439:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.455450] [ndv4:70439:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.455587] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.455593] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.456686] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.456695] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.456767] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.456770] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.454484] [ndv4:68766:0] ucp_worker.c:1159 UCX DEBUG created interface[6]=0x3521050 using rc_verbs/mlx5_ib1:1 on worker 0x2c514d0 | |
[1650638548.454641] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.454648] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.454846] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.454852] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.454517] [ndv4:68946:0] async.c:228 UCX DEBUG added async handler 0x22a21b0 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.454548] [ndv4:68946:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.454552] [ndv4:68946:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.454713] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.454722] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.454728] [ndv4:68946:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.454790] [ndv4:68946:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.456282] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.456290] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.456991] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.457001] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.457425] [ndv4:70462:0] async.c:228 UCX DEBUG added async handler 0xa964b0 [id=67 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.457456] [ndv4:70462:0] async.c:506 UCX DEBUG listening to async event fd 67 events 0x1 mode thread_spinlock | |
[1650638548.457460] [ndv4:70462:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib5' (InfiniBand channel adapter) with 1 ports | |
[1650638548.457750] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638548.457759] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638548.457769] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.457822] [ndv4:70462:0] ib_md.c:1319 UCX DEBUG mlx5_ib5: using registration cache | |
[1650638548.457690] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.457696] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.457964] [ndv4:68764:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.459827] [ndv4:68946:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.459841] [ndv4:68946:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.460139] [ndv4:68946:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.460364] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.460371] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.458754] [ndv4:68754:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.458761] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.459020] [ndv4:68754:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.458825] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.458833] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.459827] [ndv4:69236:0] ib_md.c:812 UCX DEBUG registered memory 0x2abae887e000..0x2abae8903000 on mlx5_ib0 lkey 0x81800 rkey 0x81800 access 0xf flags 0x3e4 | |
[1650638548.459837] [ndv4:69236:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2abae887e018 of 544744 bytes with 128 elements | |
[1650638548.459847] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638548.460384] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c1f5c0: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638548.458810] [ndv4:69048:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.458819] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.460762] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.460771] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.461362] [ndv4:68755:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib6 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.461707] [ndv4:68755:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib6: disable ODP because it's not supported for DevX QP | |
[1650638548.462571] [ndv4:68755:0] async.c:228 UCX DEBUG added async handler 0x1395d70 [id=69 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.462604] [ndv4:68755:0] async.c:506 UCX DEBUG listening to async event fd 69 events 0x1 mode thread_spinlock | |
[1650638548.462608] [ndv4:68755:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib6' (InfiniBand channel adapter) with 1 ports | |
[1650638548.463018] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638548.463029] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638548.463040] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.463103] [ndv4:68755:0] ib_md.c:1319 UCX DEBUG mlx5_ib6: using registration cache | |
[1650638548.462448] [ndv4:68766:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638548.463706] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.463713] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.465956] [ndv4:68762:0] async.c:228 UCX DEBUG added async handler 0x1f04640 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.466035] [ndv4:68762:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.466051] [ndv4:68762:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.463821] [ndv4:70439:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638548.463830] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.465062] [ndv4:70038:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.465072] [ndv4:70038:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.465378] [ndv4:70038:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.467465] [ndv4:68764:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.467872] [ndv4:68764:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638548.467786] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c1f5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638548.468441] [ndv4:68755:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.468449] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.468693] [ndv4:68755:0] ib_md.c:1604 UCX DEBUG mlx5_ib6: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.468714] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638548.468719] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638548.468977] [ndv4:70462:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.468989] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.469269] [ndv4:70462:0] ib_md.c:1604 UCX DEBUG mlx5_ib5: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.469521] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib5: cuda GPUDirect RDMA is enabled | |
[1650638548.469527] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib5: rocm GPUDirect RDMA is disabled | |
[1650638548.469554] [ndv4:68766:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638548.469570] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.469573] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.469591] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.470004] [ndv4:68766:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x347a010 of 8176 bytes with 127 elements | |
[1650638548.470304] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638548.470314] [ndv4:68766:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.470346] [ndv4:68766:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638548.470350] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638548.470362] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x22e6960 [id=85 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638548.470386] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 85 events 0x1 mode thread_spinlock | |
[1650638548.470398] [ndv4:68766:0] ucp_worker.c:1159 UCX DEBUG created interface[7]=0x33a7020 using rc_mlx5/mlx5_ib1:1 on worker 0x2c514d0 | |
[1650638548.470681] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.470687] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.470765] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.470770] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.469935] [ndv4:68764:0] async.c:228 UCX DEBUG added async handler 0x2540590 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.469972] [ndv4:68764:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638548.469975] [ndv4:68764:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638548.470323] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.470332] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.470338] [ndv4:68764:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.470394] [ndv4:68764:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638548.472310] [ndv4:68755:0] topo.c:99 UCX DEBUG bus id 0x107000000 doesn't exist. sys_dev = 6 | |
[1650638548.472317] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.471969] [ndv4:68766:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638548.473285] [ndv4:68766:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638548.473298] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.473301] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.473354] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.473542] [ndv4:68762:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.473555] [ndv4:68762:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.473580] [ndv4:68762:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.473639] [ndv4:68762:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.473658] [ndv4:68762:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.473776] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638548.473784] [ndv4:68766:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.473816] [ndv4:68766:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638548.473820] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638548.473828] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x3529f90 [id=87 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638548.473848] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 87 events 0x1 mode thread_spinlock | |
[1650638548.474583] [ndv4:68766:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638548.475234] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.475243] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.474129] [ndv4:69075:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.474644] [ndv4:69075:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.475811] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c1f5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638548.476657] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c1f5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638548.475995] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.476004] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.474037] [ndv4:68754:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.474413] [ndv4:68754:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638548.476009] [ndv4:68754:0] async.c:228 UCX DEBUG added async handler 0x2126b10 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.476038] [ndv4:68754:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638548.476042] [ndv4:68754:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638548.476187] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.476195] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.476202] [ndv4:68754:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.476276] [ndv4:68754:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638548.473930] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.473937] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.476555] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.476561] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.476951] [ndv4:68946:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638548.476960] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.479086] [ndv4:68762:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.479104] [ndv4:68762:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.479349] [ndv4:68762:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.479819] [ndv4:68762:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.479826] [ndv4:68762:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.479079] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.479085] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.479179] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.479186] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.480510] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.480519] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.481606] [ndv4:68754:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.481613] [ndv4:68754:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.481852] [ndv4:68754:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.482281] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.482287] [ndv4:68754:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.482362] [ndv4:69075:0] async.c:228 UCX DEBUG added async handler 0xaacf80 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.482391] [ndv4:69075:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.482395] [ndv4:69075:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.482731] [ndv4:69075:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.482740] [ndv4:69075:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.482751] [ndv4:69075:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.482807] [ndv4:69075:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.482263] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c1f5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638548.483992] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c1f5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638548.484873] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c1f5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638548.482787] [ndv4:68766:0] dc_mlx5.c:1327 UCX DEBUG dc iface 0x36c6010: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x307a | |
[1650638548.483010] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.483016] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.483026] [ndv4:68766:0] mpool.c:205 UCX DEBUG mpool rcache_mp: allocated chunk 0x2b8e86040008 of 151544 bytes with 1052 elements | |
[1650638548.484150] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.484158] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.484259] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.484264] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.485750] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.485756] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.485817] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.485820] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.486141] [ndv4:68946:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.484476] [ndv4:68754:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638548.484487] [ndv4:68754:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.485241] [ndv4:70462:0] topo.c:99 UCX DEBUG bus id 0x106000000 doesn't exist. sys_dev = 5 | |
[1650638548.485250] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.485547] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.485565] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.486046] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c1f5c0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638548.486397] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.486403] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.486575] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.486579] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.486582] [ndv4:69236:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638548.486968] [ndv4:68766:0] ib_md.c:812 UCX DEBUG registered memory 0x2b8e83a00000..0x2b8e86000000 on mlx5_ib1 lkey 0x81400 rkey 0x81400 access 0xf flags 0x3e4 | |
[1650638548.486984] [ndv4:68766:0] mpool.c:205 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x2b8e83a00018 of 39845864 bytes with 4752 elements | |
[1650638548.487121] [ndv4:68766:0] dc_mlx5.c:1346 UCX DEBUG created dc iface 0x36c6010 | |
[1650638548.487154] [ndv4:68766:0] ucp_worker.c:1159 UCX DEBUG created interface[8]=0x36c6010 using dc_mlx5/mlx5_ib1:1 on worker 0x2c514d0 | |
[1650638548.486708] [ndv4:68764:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.486722] [ndv4:68764:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.486910] [ndv4:68764:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.486941] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.486948] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.487343] [ndv4:68764:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638548.487349] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.488626] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.488634] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.489283] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.489289] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.487800] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.487826] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.488161] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.488168] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.489483] [ndv4:68766:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638548.488599] [ndv4:70038:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.489012] [ndv4:70038:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638548.490173] [ndv4:70038:0] async.c:228 UCX DEBUG added async handler 0x14f5590 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.490227] [ndv4:70038:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638548.490230] [ndv4:70038:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638548.490499] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.490508] [ndv4:70038:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.490517] [ndv4:70038:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.490568] [ndv4:70038:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638548.490448] [ndv4:68762:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.490458] [ndv4:68762:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.490844] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.490853] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.491098] [ndv4:69048:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.491023] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.491029] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.491279] [ndv4:68764:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.491284] [ndv4:68764:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.491516] [ndv4:68764:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.492499] [ndv4:68946:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.492862] [ndv4:68946:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638548.493125] [ndv4:68946:0] async.c:228 UCX DEBUG added async handler 0x22b1a60 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.493150] [ndv4:68946:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638548.493154] [ndv4:68946:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638548.497031] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x1b788f0 [id=81 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638548.497075] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 81 events 0x5 mode thread_spinlock | |
[1650638548.497361] [ndv4:69236:0] ucp_worker.c:1159 UCX DEBUG created interface[4]=0x1c1f5c0 using ud_verbs/mlx5_ib0:1 on worker 0x24e34d0 | |
[1650638548.497391] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.497399] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.497460] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.497464] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.497674] [ndv4:69236:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib0:1 | |
[1650638548.498493] [ndv4:69236:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638548.498971] [ndv4:69236:0] ib_iface.c:994 UCX DEBUG iface=0x1c2a8f0: created UD QP 0x3114 on mlx5_ib0:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638548.498987] [ndv4:69236:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post | |
[1650638548.499682] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638548.499726] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.499731] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.499744] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.499748] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.500229] [ndv4:69236:0] ib_md.c:812 UCX DEBUG registered memory 0x2abae8903000..0x2abae8988000 on mlx5_ib0 lkey 0x81900 rkey 0x81900 access 0xf flags 0x3e4 | |
[1650638548.500235] [ndv4:69236:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2abae8903018 of 544744 bytes with 128 elements | |
[1650638548.500240] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638548.500281] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c2a8f0: adding gid fe80::15:5dff:fd34:1b to hash on device mlx5_ib0 port 1 index 0) | |
[1650638548.500315] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c2a8f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 1) | |
[1650638548.500346] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c2a8f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 2) | |
[1650638548.500364] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c2a8f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 3) | |
[1650638548.500378] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c2a8f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 4) | |
[1650638548.500393] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c2a8f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 5) | |
[1650638548.500418] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c2a8f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 6) | |
[1650638548.500431] [ndv4:69236:0] ud_iface.c:393 UCX DEBUG iface 0x1c2a8f0: adding gid fe80:: to hash on device mlx5_ib0 port 1 index 7) | |
[1650638548.500443] [ndv4:69236:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib0:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.500537] [ndv4:69236:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638548.500547] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x1b7 | |
[1650638548.493321] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.493331] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.493336] [ndv4:68946:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.493379] [ndv4:68946:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: using registration cache | |
[1650638548.498489] [ndv4:68764:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib4 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.498865] [ndv4:68764:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib4: disable ODP because it's not supported for DevX QP | |
[1650638548.499117] [ndv4:68764:0] async.c:228 UCX DEBUG added async handler 0x2c2bcf0 [id=65 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.499145] [ndv4:68764:0] async.c:506 UCX DEBUG listening to async event fd 65 events 0x1 mode thread_spinlock | |
[1650638548.499149] [ndv4:68764:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib4' (InfiniBand channel adapter) with 1 ports | |
[1650638548.499314] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638548.499321] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638548.499334] [ndv4:68764:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.499392] [ndv4:68764:0] ib_md.c:1319 UCX DEBUG mlx5_ib4: using registration cache | |
[1650638548.569385] [ndv4:68764:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.569414] [ndv4:68764:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.569753] [ndv4:68764:0] ib_md.c:1604 UCX DEBUG mlx5_ib4: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.570062] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638548.570070] [ndv4:68764:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638548.499967] [ndv4:69919:0] debug.c:1198 UCX DEBUG using signal stack 0x2b889d013000 size 141824 | |
[1650638548.500045] [ndv4:69919:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.500064] [ndv4:69919:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2b889ce67000 | |
[1650638548.500086] [ndv4:69919:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.500094] [ndv4:69919:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.500100] [ndv4:69919:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.502483] [ndv4:69919:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.502502] [ndv4:69919:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.502534] [ndv4:69919:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.502537] [ndv4:69919:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.502543] [ndv4:69919:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.502549] [ndv4:69919:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.502552] [ndv4:69919:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.502556] [ndv4:69919:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.502559] [ndv4:69919:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.502561] [ndv4:69919:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.502563] [ndv4:69919:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.502565] [ndv4:69919:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.502573] [ndv4:69919:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.507732] [ndv4:69919:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.508085] [ndv4:69919:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.508109] [ndv4:69919:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.508229] [ndv4:69919:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.508243] [ndv4:69919:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.508253] [ndv4:69919:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.508264] [ndv4:69919:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.508274] [ndv4:69919:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.508285] [ndv4:69919:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.508296] [ndv4:69919:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.508331] [ndv4:69919:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.508692] [ndv4:69919:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.514540] [ndv4:69919:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.514840] [ndv4:69919:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.522394] [ndv4:69919:0] asy | |
[1650638548.525155] [ndv4:70448:0] debug.c:1198 UCX DEBUG using signal stack 0x2b091fc2a000 size 141824 | |
[1650638548.525269] [ndv4:70448:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.525291] [ndv4:70448:0] init.c:113 UCX DEBUG /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 loaded at 0x2b091fa98000 | |
[1650638548.525312] [ndv4:70448:0] init.c:115 UCX DEBUG cmd line: osu_scatter | |
[1650638548.525321] [ndv4:70448:0] module.c:69 UCX DEBUG ucs library path: /cvmfs/ndv4.azure/CentOS-HPC/7.9.2021052401/EasyBuild/software/UCX/1.11.2-GCCcore-11.2.0/lib64/libucs.so.0 | |
[1650638548.525327] [ndv4:70448:0] module.c:253 UCX DEBUG loading modules for ucs | |
[1650638548.527637] [ndv4:70448:0] cpu.c:231 UCX DEBUG CPU does not support invariant TSC, using fallback timer | |
[1650638548.527658] [ndv4:70448:0] time.c:22 UCX DEBUG measured arch clock speed: 1000000.00 Hz | |
[1650638548.527693] [ndv4:70448:0] ucp_context.c:1361 UCX DEBUG estimated number of endpoints is 96 | |
[1650638548.527696] [ndv4:70448:0] ucp_context.c:1368 UCX DEBUG estimated number of endpoints per node is 96 | |
[1650638548.527703] [ndv4:70448:0] ucp_context.c:1375 UCX DEBUG estimated bcopy bandwidth is 5251268608.000000 | |
[1650638548.527710] [ndv4:70448:0] ucp_context.c:1427 UCX DEBUG allocation method[0] is md 'sysv' | |
[1650638548.527712] [ndv4:70448:0] ucp_context.c:1427 UCX DEBUG allocation method[1] is md 'posix' | |
[1650638548.527717] [ndv4:70448:0] ucp_context.c:1439 UCX DEBUG allocation method[2] is 'huge' | |
[1650638548.527719] [ndv4:70448:0] ucp_context.c:1439 UCX DEBUG allocation method[3] is 'thp' | |
[1650638548.527721] [ndv4:70448:0] ucp_context.c:1427 UCX DEBUG allocation method[4] is md '*' | |
[1650638548.527724] [ndv4:70448:0] ucp_context.c:1439 UCX DEBUG allocation method[5] is 'mmap' | |
[1650638548.527726] [ndv4:70448:0] ucp_context.c:1439 UCX DEBUG allocation method[6] is 'heap' | |
[1650638548.527734] [ndv4:70448:0] module.c:253 UCX DEBUG loading modules for uct | |
[1650638548.532720] [ndv4:70448:0] module.c:253 UCX DEBUG loading modules for uct_ib | |
[1650638548.533059] [ndv4:70448:0] ucp_context.c:1117 UCX DEBUG closing md posix because it has no selected transport resources | |
[1650638548.533079] [ndv4:70448:0] ucp_context.c:1117 UCX DEBUG closing md self because it has no selected transport resources | |
[1650638548.533244] [ndv4:70448:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib4) failed: Cannot assign requested address | |
[1650638548.533261] [ndv4:70448:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib3) failed: Cannot assign requested address | |
[1650638548.533274] [ndv4:70448:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib2) failed: Cannot assign requested address | |
[1650638548.533286] [ndv4:70448:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib7) failed: Cannot assign requested address | |
[1650638548.533298] [ndv4:70448:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib6) failed: Cannot assign requested address | |
[1650638548.533310] [ndv4:70448:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib5) failed: Cannot assign requested address | |
[1650638548.533322] [ndv4:70448:0] sock.c:88 UCX DEBUG ioctl(req=35093, ifr_name=ib1) failed: Cannot assign requested address | |
[1650638548.533355] [ndv4:70448:0] ucp_context.c:1117 UCX DEBUG closing md tcp because it has no selected transport resources | |
[1650638548.533709] [ndv4:70448:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: Success, continuing, but fork may be unsafe. | |
[1650638548.572886] [ndv4:70448:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib0 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.573229] [ndv4:70448:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib0: disable ODP because it's not supported for DevX QP | |
[1650638548.580187] [ndv4:70448:0] asy
9a90 [id=82 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638548.500567] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 82 events 0x5 mode thread_spinlock | |
[1650638548.500581] [ndv4:69236:0] ucp_worker.c:1159 UCX DEBUG created interface[5]=0x1c2a8f0 using ud_mlx5/mlx5_ib0:1 on worker 0x24e34d0 | |
[1650638548.500593] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.500599] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.500656] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.500660] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.500832] [ndv4:69236:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638548.501496] [ndv4:69236:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 91 data_sz 8256 | |
[1650638548.501505] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.501508] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.501525] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.501861] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638548.501866] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 192 | |
[1650638548.502498] [ndv4:69236:0] ib_iface.c:994 UCX DEBUG iface=0x2db3050: created RC QP 0x305d on mlx5_ib1:1 TX wr:409 sge:4 inl:124 resp:64 RX wr:0 sge:0 resp:64 | |
[1650638548.503626] [ndv4:69236:0] ucp_worker.c:1159 UCX DEBUG created interface[6]=0x2db3050 using rc_verbs/mlx5_ib1:1 on worker 0x24e34d0 | |
[1650638548.503646] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.503651] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.503719] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.503723] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.503901] [ndv4:69236:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638548.504988] [ndv4:69236:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638548.504998] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.505001] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.505017] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.505427] [ndv4:69236:0] mpool.c:205 UCX DEBUG mpool devx dbrec: allocated chunk 0x2d0c010 of 8176 bytes with 127 elements | |
[1650638548.505632] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64 | |
[1650638548.505641] [ndv4:69236:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.505678] [ndv4:69236:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638548.505682] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638548.505694] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x1b78960 [id=85 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638548.505717] [ndv4:69236:0] async.c:506 UCX DEBUG listening
[1650638548.513875] [ndv4:68946:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.513896] [ndv4:68946:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.514072] [ndv4:68946:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.514103] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.514109] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.514200] [ndv4:68946:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638548.514244] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.514307] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.514311] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.514459] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.514462] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.514559] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.514562] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.514621] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.514623] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.514873] [ndv4:68946:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.520880] [ndv4:68946:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.521260] [ndv4:68946:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638548.521871] [ndv4:68946:0] async.c:228 UCX DEBUG added async handler 0x22ba680 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.521904] [ndv4:68946:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638548.521907] [ndv4:68946:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638548.522041] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.522049] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.522063] [ndv4:68946:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.522227] [ndv4:68946:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638548.498009] [ndv4:69048:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.498390] [ndv4:69048:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.498651] [ndv4:69048:0] async.c:228 UCX DEBUG added async handler 0x254b3e0 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.498676] [ndv4:69048:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.498680] [ndv4:69048:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.498836] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.498843] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.498855] [ndv4:69048:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.498915] [ndv4:69048:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.499703] [ndv4:69048:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.499710] [ndv4:69048:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.499916] [ndv4:69048:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.499937] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.499942] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.500028] [ndv4:69048:0] topo.c:99 UCX DEBUG bus id 0x102000000 doesn't exist. sys_dev = 1 | |
[1650638548.500032] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.500092] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.500095] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.500154] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.500156] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.500232] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.500236] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.500344] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x102000000 exists. sys_dev = 1 | |
[1650638548.500347] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib1 bus id 258:0:0.0 sys_dev 1 | |
[1650638548.500594] [ndv4:69048:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.506978] [ndv4:69048:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib2 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.507382] [ndv4:69048:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib2: disable ODP because it's not supported for DevX QP | |
[1650638548.507622] [ndv4:69048:0] async.c:228 UCX DEBUG added async handler 0x251d0c0 [id=61 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.507645] [ndv4:69048:0] async.c:506 UCX DEBUG listening to async event fd 61 events 0x1 mode thread_spinlock | |
[1650638548.507648] [ndv4:69048:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib2' (InfiniBand channel adapter) with 1 ports | |
[1650638548.507875] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.507883] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.507887] [ndv4:69048:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.507929] [ndv4:69048:0] ib_md.c:1319 UCX DEBUG mlx5_ib2: usin
to async event fd 85 events 0x1 mode thread_spinlock | |
[1650638548.505726] [ndv4:69236:0] ucp_worker.c:1159 UCX DEBUG created interface[7]=0x2c39020 using rc_mlx5/mlx5_ib1:1 on worker 0x24e34d0 | |
[1650638548.505739] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.505743] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.505812] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.505816] [ndv4:69236:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.505968] [ndv4:69236:0] ib_iface.c:857 UCX DEBUG using pkey[0] 0x8019 on mlx5_ib1:1 | |
[1650638548.507019] [ndv4:69236:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 90 data_sz 8256 | |
[1650638548.507038] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8356 | |
[1650638548.507042] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8320 | |
[1650638548.507094] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 48 | |
[1650638548.586545] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112 | |
[1650638548.586595] [ndv4:69236:0] ib_mlx5.c:858 UCX DEBUG SL=0 (AR support - unknown) was selected on mlx5_ib1:1, SLs with AR support = { <none> }, SLs without AR support = { <none> } | |
[1650638548.586642] [ndv4:69236:0] rc_mlx5_common.c:727 UCX DEBUG ibv_alloc_dm(dev=mlx5_ib1 length=2048) failed: Invalid argument | |
[1650638548.586651] [ndv4:69236:0] mpool.c:88 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 72 | |
[1650638548.586665] [ndv4:69236:0] async.c:228 UCX DEBUG added async handler 0x2dbbf90 [id=87 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash | |
[1650638548.586693] [ndv4:69236:0] async.c:506 UCX DEBUG listening to async event fd 87 events 0x1 mode thread_spinlock | |
[1650638548.587458] [ndv4:69236:0] dc_mlx5.c:823 UCX DEBUG creating dci pool 0 with 8 QPs | |
[1650638548.573643] [ndv4:69075:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.573679] [ndv4:69075:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.573987] [ndv4:69075:0] ib_md.c:1604 UCX DEBUG mlx5_ib1: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.574043] [ndv4:69075:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.574056] [ndv4:69075:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.570689] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.570702] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.570912] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x107000000 exists. sys_dev = 6 | |
[1650638548.570916] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib6 bus id 263:0:0.0 sys_dev 6 | |
[1650638548.571196] [ndv4:68755:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.579372] [ndv4:68755:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib7 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.579758] [ndv4:68755:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib7: disable ODP because it's not supported for DevX QP | |
[1650638548.580016] [ndv4:68755:0] async.c:228 UCX DEBUG added async handler 0xcaadb0 [id=71 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.580048] [ndv4:68755:0] async.c:506 UCX DEBUG listening to async event fd 71 events 0x1 mode thread_spinlock | |
[1650638548.580052] [ndv4:68755:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib7' (InfiniBand channel adapter) with 1 ports | |
[1650638548.580246] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638548.580258] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638548.580271] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.580341] [ndv4:68755:0] ib_md.c:1319 UCX DEBUG mlx5_ib7: using registration cache | |
[1650638548.588731] [ndv4:68755:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.588834] [ndv4:68755:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
nc.c:228 UCX DEBUG added async handler 0x11dc340 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.522486] [ndv4:69919:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.522575] [ndv4:69919:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.522776] [ndv4:69919:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.522784] [ndv4:69919:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.522813] [ndv4:69919:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.522900] [ndv4:69919:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.522925] [ndv4:69919:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.566126] [ndv4:69919:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.566146] [ndv4:69919:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.566377] [ndv4:69919:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.566401] [ndv4:69919:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.566407] [ndv4:69919:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.567049] [ndv4:69919:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.567054] [ndv4:69919:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.567978] [ndv4:69919:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.567983] [ndv4:69919:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.569007] [ndv4:69919:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.569013] [ndv4:69919:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.569280] [ndv4:69919:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.569285] [ndv4:69919:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.569489] [ndv4:69919:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.569493] [ndv4:69919:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.569786] [ndv4:69919:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.574068] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.574081] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.575038] [ndv4:70462:0] topo.c:91 UCX DEBUG bus id 0x106000000 exists. sys_dev = 5 | |
[1650638548.575044] [ndv4:70462:0] ib_device.c:1124 UCX DEBUG mlx5_ib5 bus id 262:0:0.0 sys_dev 5 | |
[1650638548.575350] [ndv4:70462:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.582784] [ndv4:70462:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib6 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.583140] [ndv4:70462:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib6: disable ODP because it's not supported for DevX QP | |
[1650638548.583413] [ndv4:70462:0] async.c:228 UCX DEBUG added async handler 0x1178da0 [id=69 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.583437] [ndv4:70462:0] async.c:506 UCX DEBUG listening to async event fd 69 events 0x1 mode thread_spinlock | |
[1650638548.583441] [ndv4:70462:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib6' (InfiniBand channel adapter) with 1 ports | |
[1650638548.583619] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib6: cuda GPUDirect RDMA is enabled | |
[1650638548.583628] [ndv4:70462:0] ib_md.c:296 UCX DEBUG mlx5_ib6: rocm GPUDirect RDMA is disabled | |
[1650638548.583640] [ndv4:70462:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.583695] [ndv4:70462:0] ib_md.c:1319 UCX DEBUG mlx5_ib6: using registration cache | |
[1650638548.565841] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.565852] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.566358] [ndv4:70055:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.566362] [ndv4:70055:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.566620] [ndv4:70055:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
nc.c:228 UCX DEBUG added async handler 0x136e690 [id=54 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.580292] [ndv4:70448:0] async.c:506 UCX DEBUG listening to async event fd 54 events 0x1 mode thread_spinlock | |
[1650638548.580782] [ndv4:70448:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib0' (InfiniBand channel adapter) with 1 ports | |
[1650638548.580991] [ndv4:70448:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.580999] [ndv4:70448:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.581021] [ndv4:70448:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.581081] [ndv4:70448:0] module.c:253 UCX DEBUG loading modules for ucm | |
[1650638548.581102] [ndv4:70448:0] ib_md.c:1319 UCX DEBUG mlx5_ib0: using registration cache | |
[1650638548.582800] [ndv4:70448:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.582813] [ndv4:70448:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.582976] [ndv4:70448:0] ib_md.c:1604 UCX DEBUG mlx5_ib0: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.582998] [ndv4:70448:0] ib_md.c:296 UCX DEBUG mlx5_ib0: cuda GPUDirect RDMA is enabled | |
[1650638548.583004] [ndv4:70448:0] ib_md.c:296 UCX DEBUG mlx5_ib0: rocm GPUDirect RDMA is disabled | |
[1650638548.583221] [ndv4:70448:0] topo.c:99 UCX DEBUG bus id 0x101000000 doesn't exist. sys_dev = 0 | |
[1650638548.583226] [ndv4:70448:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.583335] [ndv4:70448:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.583339] [ndv4:70448:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.583556] [ndv4:70448:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.583561] [ndv4:70448:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.583690] [ndv4:70448:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.583693] [ndv4:70448:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.583855] [ndv4:70448:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.583858] [ndv4:70448:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.584085] [ndv4:70448:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
g registration cache | |
[1650638548.512587] [ndv4:69048:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.512593] [ndv4:69048:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.512830] [ndv4:69048:0] ib_md.c:1604 UCX DEBUG mlx5_ib2: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.512841] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib2: cuda GPUDirect RDMA is enabled | |
[1650638548.512859] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib2: rocm GPUDirect RDMA is disabled | |
[1650638548.512935] [ndv4:69048:0] topo.c:99 UCX DEBUG bus id 0x103000000 doesn't exist. sys_dev = 2 | |
[1650638548.512938] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.512998] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.513001] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.513059] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.513062] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.513119] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.513122] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.513179] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.513181] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.513425] [ndv4:69048:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.520414] [ndv4:69048:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.520761] [ndv4:69048:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638548.521007] [ndv4:69048:0] async.c:228 UCX DEBUG added async handler 0x2c2c070 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.521029] [ndv4:69048:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638548.521033] [ndv4:69048:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638548.521194] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.521201] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.521224] [ndv4:69048:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.521264] [ndv4:69048:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638548.523417] [ndv4:69048:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.523430] [ndv4:69048:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.523624] [ndv4:69048:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.523651] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.523656] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.523750] [ndv4:69048:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638548.523754] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.523837] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.523840] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[16506
[1650638548.568825] [ndv4:70439:0] topo.c:91 UCX DEBUG bus id 0x103000000 exists. sys_dev = 2 | |
[1650638548.568838] [ndv4:70439:0] ib_device.c:1124 UCX DEBUG mlx5_ib2 bus id 259:0:0.0 sys_dev 2 | |
[1650638548.569194] [ndv4:70439:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.576781] [ndv4:70439:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib3 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.577132] [ndv4:70439:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib3: disable ODP because it's not supported for DevX QP | |
[1650638548.577428] [ndv4:70439:0] async.c:228 UCX DEBUG added async handler 0x232c070 [id=63 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.577458] [ndv4:70439:0] async.c:506 UCX DEBUG listening to async event fd 63 events 0x1 mode thread_spinlock | |
[1650638548.577462] [ndv4:70439:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib3' (InfiniBand channel adapter) with 1 ports | |
[1650638548.577607] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.577615] [ndv4:70439:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.577625] [ndv4:70439:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.577689] [ndv4:70439:0] ib_md.c:1319 UCX DEBUG mlx5_ib3: using registration cache | |
[1650638548.587010] [ndv4:68766:0] ib_iface.c:1469 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 92 hdr_ofs 44 data_sz 4096 | |
[1650638548.587879] [ndv4:68766:0] ib_iface.c:994 UCX DEBUG iface=0x238c6e0: created UD QP 0x305e on mlx5_ib1:1 TX wr:341 sge:5 inl:124 resp:0 RX wr:4096 sge:1 resp:0 | |
[1650638548.589172] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4196 | |
[1650638548.589291] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.589298] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.589521] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.589527] [ndv4:68766:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.590173] [ndv4:68766:0] ib_md.c:812 UCX DEBUG registered memory 0x2b8e86065000..0x2b8e860ea000 on mlx5_ib1 lkey 0x81500 rkey 0x81500 access 0xf flags 0x3e4 | |
[1650638548.590180] [ndv4:68766:0] mpool.c:205 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x2b8e86065018 of 544744 bytes with 128 elements | |
[1650638548.590184] [ndv4:68766:0] mpool.c:88 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168 | |
[1650638548.590379] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238c6e0: adding gid fe80::15:5dff:fd34:1c to hash on device mlx5_ib1 port 1 index 0) | |
[1650638548.590520] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238c6e0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 1) | |
[1650638548.590609] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238c6e0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 2) | |
[1650638548.590696] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238c6e0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 3) | |
[1650638548.590809] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238c6e0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 4) | |
[1650638548.590889] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238c6e0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 5) | |
[1650638548.590935] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238c6e0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 6) | |
[1650638548.590991] [ndv4:68766:0] ud_iface.c:393 UCX DEBUG iface 0x238c6e0: adding gid fe80:: to hash on device mlx5_ib1 port 1 index 7) | |
[1650638548.586647] [ndv4:68946:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.586664] [ndv4:68946:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.586905] [ndv4:68946:0] ib_md.c:1604 UCX DEBUG mlx5_ib3: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.587124] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib3: cuda GPUDirect RDMA is enabled | |
[1650638548.587134] [ndv4:68946:0] ib_md.c:296 UCX DEBUG mlx5_ib3: rocm GPUDirect RDMA is disabled | |
[1650638548.587490] [ndv4:68946:0] topo.c:99 UCX DEBUG bus id 0x104000000 doesn't exist. sys_dev = 3 | |
[1650638548.587499] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.587679] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.587683] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.587988] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.587993] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.588126] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.588129] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.588275] [ndv4:68946:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.588278] [ndv4:68946:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.589062] [ndv4:68946:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
38548.524022] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.524027] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.524106] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.524109] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.524176] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x104000000 exists. sys_dev = 3 | |
[1650638548.524179] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib3 bus id 260:0:0.0 sys_dev 3 | |
[1650638548.524612] [ndv4:69048:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.531061] [ndv4:69048:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib4 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.531406] [ndv4:69048:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib4: disable ODP because it's not supported for DevX QP | |
[1650638548.531659] [ndv4:69048:0] async.c:228 UCX DEBUG added async handler 0x25494c0 [id=65 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.531683] [ndv4:69048:0] async.c:506 UCX DEBUG listening to async event fd 65 events 0x1 mode thread_spinlock | |
[1650638548.531686] [ndv4:69048:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib4' (InfiniBand channel adapter) with 1 ports | |
[1650638548.531830] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638548.531838] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638548.531844] [ndv4:69048:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.589547] [ndv4:69048:0] ib_md.c:1319 UCX DEBUG mlx5_ib4: using registration cache | |
[1650638548.589070] [ndv4:68755:0] ib_md.c:1604 UCX DEBUG mlx5_ib7: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.589110] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib7: cuda GPUDirect RDMA is enabled | |
[1650638548.589117] [ndv4:68755:0] ib_md.c:296 UCX DEBUG mlx5_ib7: rocm GPUDirect RDMA is disabled | |
[1650638548.589362] [ndv4:68755:0] topo.c:99 UCX DEBUG bus id 0x108000000 doesn't exist. sys_dev = 7 | |
[1650638548.589367] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.589552] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.589556] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.589729] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.589733] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.590355] [ndv4:68755:0] topo.c:91 UCX DEBUG bus id 0x108000000 exists. sys_dev = 7 | |
[1650638548.590366] [ndv4:68755:0] ib_device.c:1124 UCX DEBUG mlx5_ib7 bus id 264:0:0.0 sys_dev 7 | |
[1650638548.591038] [ndv4:68762:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.591059] [ndv4:68762:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.591152] [ndv4:68762:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.591155] [ndv4:68762:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.591276] [ndv4:68762:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.591285] [ndv4:68762:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.591364] [ndv4:68762:0] topo.c:91 UCX DEBUG bus id 0x101000000 exists. sys_dev = 0 | |
[1650638548.591368] [ndv4:68762:0] ib_device.c:1124 UCX DEBUG mlx5_ib0 bus id 257:0:0.0 sys_dev 0 | |
[1650638548.591706] [ndv4:68762:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.591068] [ndv4:70448:0] ib_device.c:542 UCX DEBUG PF: mlx5_ib1 vendor_id: 0x15b3 device_id: 4124 | |
[1650638548.591489] [ndv4:70448:0] ib_mlx5dv_md.c:489 UCX DEBUG mlx5_ib1: disable ODP because it's not supported for DevX QP | |
[1650638548.591735] [ndv4:70448:0] async.c:228 UCX DEBUG added async handler 0x1366fa0 [id=59 ref 1] uct_ib_async_event_handler() to hash | |
[1650638548.591769] [ndv4:70448:0] async.c:506 UCX DEBUG listening to async event fd 59 events 0x1 mode thread_spinlock | |
[1650638548.591772] [ndv4:70448:0] ib_device.c:655 UCX DEBUG initialized device 'mlx5_ib1' (InfiniBand channel adapter) with 1 ports | |
[1650638548.591579] [ndv4:69048:0] ib_md.c:1499 UCX DEBUG incorrect format of current_link_speed file: expected: <double> GT/s, actual: Unknown speed | |
[1650638548.591596] [ndv4:69048:0] mpool.c:88 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40 | |
[1650638548.591803] [ndv4:69048:0] ib_md.c:1604 UCX DEBUG mlx5_ib4: md open by 'uct_ib_mlx5_devx_md_ops' is successful | |
[1650638548.591829] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib4: cuda GPUDirect RDMA is enabled | |
[1650638548.591835] [ndv4:69048:0] ib_md.c:296 UCX DEBUG mlx5_ib4: rocm GPUDirect RDMA is disabled | |
[1650638548.591928] [ndv4:69048:0] topo.c:99 UCX DEBUG bus id 0x105000000 doesn't exist. sys_dev = 4 | |
[1650638548.591932] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.592046] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.592050] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.592108] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.592111] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.592200] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.592235] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.592299] [ndv4:69048:0] topo.c:91 UCX DEBUG bus id 0x105000000 exists. sys_dev = 4 | |
[1650638548.592302] [ndv4:69048:0] ib_device.c:1124 UCX DEBUG mlx5_ib4 bus id 261:0:0.0 sys_dev 4 | |
[1650638548.592539] [ndv4:69048:0] ib_md.c:1592 UCX DEBUG ibv_fork_init() failed: No such file or directory, continuing, but fork may be unsafe. | |
[1650638548.591997] [ndv4:70448:0] ib_md.c:296 UCX DEBUG mlx5_ib1: cuda GPUDirect RDMA is enabled | |
[1650638548.592007] [ndv4:70448:0] ib_md.c:296 UCX DEBUG mlx5_ib1: rocm GPUDirect RDMA is disabled | |
[1650638548.592022] [ndv4:70448:0] mpool.c:88 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144 | |
[1650638548.592113] [ndv4:70448:0] ib_md.c:1319 UCX DEBUG mlx5_ib1: using registration cache | |
[1650638548.591320] [ndv4:68766:0] timer_wheel.c:40 UCX DEBUG high res timer created log=12 resolution=4096.000000 usec wanted: 2500.000000 usec | |
[1650638548.591335] [ndv4:68766:0] async.c:228 UCX DEBUG added async handler 0x237c350 [id=88 ref 1] uct_ud_iface_async_handler() to hash | |
[1650638548.591367] [ndv4:68766:0] async.c:506 UCX DEBUG listening to async event fd 88 events 0x5 mode thread_spinlock | |
[1650638548.591400] [ndv4:68766:0] ucp_worker.c:1159 UCX DEBUG created interface[9]=0x238c6e0 using ud_verbs/mlx5_ib1:1 |