Trouble verifying btl for tcp and RoCE
Hi,
I’m trying to do some RoCE benchmarking on a cluster with Mellanox HCAs:
02:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
MLNX_OFED_LINUX-4.4-2.0.7.0
I’m finding it quite challenging to work out which btl is actually being used based on Open MPI’s debug output. I’m using Open MPI 4.0.0 (along with a handful of older releases). For example, here’s the command line I use to run a 16-node HPL test, trying to ensure that internode communication goes over a RoCE-capable btl rather than tcp:
/home/jpkenny/install/openmpi-4.0.0-carnac/bin/mpirun --mca btl_base_verbose 100 --mca btl ^tcp -n 64 -N 4 -hostfile hosts.txt ./xhpl
Among the interesting debug messages I see are messages of the form:
[en257.eth:118902] openib BTL: rdmacm CPC unavailable for use on mlx5_0:1; skipped
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: en254
Local device: mlx5_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
[en262.eth:103810] select: init of component openib returned failure
[en264.eth:171198] select: init of component openib returned failure
[en264.eth:171198] mca: base: close: component openib closed
[en264.eth:171198] mca: base: close: unloading component openib
[en264.eth:171198] select: initializing btl component uct
[en264.eth:171198] select: init of component uct returned failure
[en264.eth:171198] mca: base: close: component uct closed
[en264.eth:171198] mca: base: close: unloading component uct
So it looks to me like the openib and uct btls both fail to initialize, yet when I read out the RDMA counters with ethtool, the bulk of the traffic appears to be going over RDMA somehow (eth2 is the MT27800):
ib counters before:
rx_vport_rdma_unicast_packets: 115943830
rx_vport_rdma_unicast_bytes: 195602189248
tx_vport_rdma_unicast_packets: 273170117
tx_vport_rdma_unicast_bytes: 374057100818
eth0 counters before:
RX packets 87474728 bytes 43335706060 (40.3 GiB)
TX packets 61137838 bytes 71187999781 (66.2 GiB)
eth2 counters before:
RX packets 49490077 bytes 81084834515 (75.5 GiB)
TX packets 532970764 bytes 1742134134428 (1.5 TiB)

ib counters after:
rx_vport_rdma_unicast_packets: 117188033
rx_vport_rdma_unicast_bytes: 200088022302
tx_vport_rdma_unicast_packets: 274456328
tx_vport_rdma_unicast_bytes: 378587627052
eth0 counters after:
RX packets 87481208 bytes 43336915153 (40.3 GiB)
TX packets 61143485 bytes 71189606766 (66.3 GiB)
eth2 counters after:
RX packets 49490077 bytes 81084834515 (75.5 GiB)
TX packets 532970764 bytes 1742134134428 (1.5 TiB)
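To make the before/after comparison less error-prone, the deltas can be computed rather than read off by eye. The following is only a sketch: the function name counter_delta and the snapshot file names are hypothetical, and it assumes each snapshot was saved as "name: value" lines, the layout of the rdma counters shown above.

```shell
# Print "name: delta" for every counter present in both snapshots.
# $1: snapshot taken before the run, $2: snapshot taken after it.
# Each snapshot is assumed to hold lines of the form "counter_name: value".
counter_delta() {
    awk -F':' '
        # Normalize every line: strip whitespace from the name, force a number.
        { name = $1; gsub(/[ \t]/, "", name); value = $2 + 0 }
        # First file: remember the baseline value for each counter.
        NR == FNR { base[name] = value; next }
        # Second file: report the difference against the baseline.
        (name in base) { printf "%s: %.0f\n", name, value - base[name] }
    ' "$1" "$2"
}

# Hypothetical usage around a run (file names are placeholders):
#   ethtool -S eth2 | grep rdma > before.txt
#   mpirun ... ./xhpl
#   ethtool -S eth2 | grep rdma > after.txt
#   counter_delta before.txt after.txt
```

Applied to the numbers above, the rdma byte counters grow by roughly 4.5 GB in each direction during the run, while the eth2 interface counters do not change at all.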
Yet, looking at the debug output after xhpl runs, I only see vader and self getting unloaded. The evidence suggests that there is no working internode btl, yet the job runs properly and it looks like RDMA transfers are occurring. I see equally perplexing behavior when I exclude openib and uct and expect to run over tcp. What’s actually going on here?
I’ll attach output from ompi_info along with the debug output that I’m referring to. I tried to include a compressed config.log, but the message was too big.
Thanks,
Joe