@hiroyuki-sato
Last active August 29, 2015 14:26
iSER and SRP over RoCE questions.

Goal

  • Build a robust iSER or SRP over RoCE environment.

Environment

  • OS: CentOS 7.1 (7.1.1503)
  • IB/RoCE: inbox driver
  • SCST: 6427 (trunk)

Questions

  • What configuration should I check? For example:
    • Should I switch from the inbox driver to MLNX_OFED?
    • Should I update the HCA firmware to the latest version? (This is a Supermicro server with a built-in Mellanox HCA.)
    • Anything else?

Basic info

Configuration (same on both machines)

cat /etc/modprobe.d/mlx4.conf 
options mlx4_en pfctx=3 pfcrx=3

MTU -> 4200
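
The `pfctx`/`pfcrx` values are per-priority bitmasks in mlx4_en, where bit N enables PFC for 802.1p priority N. The value `3` therefore means priorities 0 and 1. Below is a minimal Python sketch that decodes such a mask and estimates the largest IB MTU that fits inside a given Ethernet MTU. The 88-byte RoCE header overhead is an assumption modeled on a worst-case RoCE v1 header stack (GRH 40 + BTH 12 + XRC ETH 4 + AtomicETH 28 + ICRC 4). The exact value the kernel subtracts may differ between versions.

```python
from typing import List

# Valid IB MTU values, largest first.
IB_MTUS = (4096, 2048, 1024, 512, 256)

# ASSUMPTION: worst-case RoCE v1 header overhead in bytes
# (GRH 40 + BTH 12 + XRC ETH 4 + AtomicETH 28 + ICRC 4).
ROCE_OVERHEAD = 88


def pfc_priorities(mask: int) -> List[int]:
    """Return the 802.1p priorities (0-7) whose bit is set in a pfctx/pfcrx mask."""
    return [prio for prio in range(8) if mask & (1 << prio)]


def roce_ib_mtu(eth_mtu: int, overhead: int = ROCE_OVERHEAD) -> int:
    """Largest IB MTU whose payload still fits in the Ethernet MTU after headers."""
    room = eth_mtu - overhead
    for mtu in IB_MTUS:
        if room >= mtu:
            return mtu
    raise ValueError(f"Ethernet MTU {eth_mtu} is too small for RoCE")


if __name__ == "__main__":
    print("PFC priorities:", pfc_priorities(3))  # pfctx=3 pfcrx=3
    print("IB MTU for Ethernet MTU 4200:", roce_ib_mtu(4200))
```

Under these assumptions, an Ethernet MTU of 4200 leaves room for a 4096-byte IB MTU, which agrees with `active_mtu: 4096` in `ibv_devinfo`. An Ethernet MTU of exactly 4096 would only allow 2048. PFC set on the host only helps if RoCE traffic actually carries priority 0 or 1 and the switch has PFC enabled for the same priority. The sketch checks neither.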

ibv_devinfo 
hca_id:	mlx4_0
	transport:			InfiniBand (0)
	fw_ver:				2.30.8000
	node_guid:			0025:90ff:ffdf:82b8
	sys_image_guid:			0025:90ff:ffdf:82bb
	vendor_id:			0x02c9
	vendor_part_id:			4099
	hw_ver:				0x0
	board_id:			SM_2241000001000
	phys_port_cnt:			1
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		4096 (5)
			sm_lid:			0
			port_lid:		0
			port_lmc:		0x00
			link_layer:		Ethernet
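
For the firmware question, it helps to pull the relevant fields out of `ibv_devinfo` the same way on both machines. Below is a minimal Python sketch that assumes the `key: value` layout shown above. It only reports `fw_ver` and does not know which firmware release is current. `SAMPLE` is a trimmed copy of the output above, used when the tool is not installed. With more than one port, later ports overwrite earlier ones, so a dual-port HCA would need per-port parsing.

```python
import subprocess
from typing import Dict, List

# Trimmed copy of the ibv_devinfo output above (fallback when the tool is absent).
SAMPLE = """\
hca_id:\tmlx4_0
\ttransport:\t\t\tInfiniBand (0)
\tfw_ver:\t\t\t\t2.30.8000
\tnode_guid:\t\t\t0025:90ff:ffdf:82b8
\tphys_port_cnt:\t\t\t1
\t\tport:\t1
\t\t\tstate:\t\t\tPORT_ACTIVE (4)
\t\t\tmax_mtu:\t\t4096 (5)
\t\t\tactive_mtu:\t\t4096 (5)
\t\t\tlink_layer:\t\tEthernet
"""


def parse_devinfo(text: str) -> Dict[str, str]:
    """Collect 'key: value' pairs; later ports overwrite earlier ones."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            fields[key] = value.strip()
    return fields


def roce_port_problems(fields: Dict[str, str], want_mtu: int = 4096) -> List[str]:
    """Return human-readable problems; an empty list means the basic checks pass."""
    problems = []
    if fields.get("link_layer") != "Ethernet":
        problems.append("link_layer is not Ethernet (not a RoCE port)")
    if not fields.get("state", "").startswith("PORT_ACTIVE"):
        problems.append("port is not PORT_ACTIVE")
    active = int(fields.get("active_mtu", "0").split()[0])
    if active < want_mtu:
        problems.append(f"active_mtu {active} < {want_mtu}")
    return problems


def read_devinfo() -> str:
    """Run ibv_devinfo if it is available, otherwise fall back to SAMPLE."""
    try:
        result = subprocess.run(["ibv_devinfo"], capture_output=True,
                                text=True, check=True)
        return result.stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return SAMPLE


if __name__ == "__main__":
    info = parse_devinfo(read_devinfo())
    print("fw_ver:", info.get("fw_ver"))
    print(roce_port_problems(info) or "basic RoCE port checks passed")
```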

Problem

  • Both iSER and SRP connections fail with the errors below.
  • Plain iSCSI (without RDMA) seems to work well, so the cabling is probably not the problem.

iSER

Target kernel log

Jul 28 20:44:59 srp_target kernel: isert_cm_evt:TIMEWAIT_EXIT(15) status:0 portal:ffff881050d95380 cm_id:ffff8808386f7c00
Jul 28 20:44:59 srp_target kernel: isert_conn_free conn:ffff880851b186c0
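
The number in `TIMEWAIT_EXIT(15)` is an `enum rdma_cm_event_type` value. Below is a small Python sketch that extracts the sequence of isert CM events from a kernel log, which makes it easy to check whether `ESTABLISHED` ever appears before the teardown. The regular expression only assumes the line format shown above.

```python
import re
from typing import List, Tuple

# enum rdma_cm_event_type (<rdma/rdma_cm.h>) in declaration order: index == code.
RDMA_CM_EVENTS = (
    "ADDR_RESOLVED", "ADDR_ERROR", "ROUTE_RESOLVED", "ROUTE_ERROR",
    "CONNECT_REQUEST", "CONNECT_RESPONSE", "CONNECT_ERROR", "UNREACHABLE",
    "REJECTED", "ESTABLISHED", "DISCONNECTED", "DEVICE_REMOVAL",
    "MULTICAST_JOIN", "MULTICAST_ERROR", "ADDR_CHANGE", "TIMEWAIT_EXIT",
)

# Matches the isert_cm_evt format seen in the log above (assumed stable).
CM_EVT_RE = re.compile(r"isert_cm_evt:(\w+)\((\d+)\) status:(-?\d+)")


def cm_events(lines: List[str]) -> List[Tuple[str, int, int]]:
    """Return (event name, event code, status) for each isert_cm_evt line."""
    events = []
    for line in lines:
        m = CM_EVT_RE.search(line)
        if not m:
            continue
        name, code, status = m.group(1), int(m.group(2)), int(m.group(3))
        # Sanity check: the logged name should agree with the enum table.
        if code < len(RDMA_CM_EVENTS) and RDMA_CM_EVENTS[code] != name:
            raise ValueError(f"event name/code mismatch: {name}({code})")
        events.append((name, code, status))
    return events


def connection_came_up(events: List[Tuple[str, int, int]]) -> bool:
    """True if an ESTABLISHED event was logged at all."""
    return any(name == "ESTABLISHED" for name, _, _ in events)
```

On CentOS 7 the kernel messages normally end up in `/var/log/messages`, so `cm_events(open("/var/log/messages").readlines())` would list every isert CM event in order.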

Target: SCST configuration

HANDLER vdisk_nullio {
  DEVICE disk_null
}

HANDLER vdisk_blockio {
  DEVICE SAS_DISK1 {
    filename /dev/disk/by-id/scsi-3600605b009e4f4e01d4241c13e907c3f
    t10_dev_id sasdisk1
  }
}
 
TARGET_DRIVER iscsi {
  enabled 1
  TARGET iqn.2006-10.tgt {
    allowed_portal 192.168.5.10
    QueuedCommands 128
    LUN 0 SAS_DISK1
    LUN 1 disk_null
    enabled 1
  }
}

Initiator

iscsiadm -m discovery --op=new --op=delete --type sendtargets --portal 192.168.5.10:3260 -I iser
iscsiadm -m node -l
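
The bare `iscsiadm -m node -l` logs in to every recorded node. If node records from the earlier plain-iSCSI (TCP) tests still exist, those may be used as well. Below is a small Python sketch that builds the same two commands but pins the login to one portal and the `iser` iface. `iser_login_cmds` is a hypothetical helper, and the commands are only printed, not run.

```python
import shlex
from typing import List


def iser_login_cmds(portal_ip: str, port: int = 3260,
                    iface: str = "iser") -> List[List[str]]:
    """Build the discovery and login commands, both restricted to one iface."""
    portal = f"{portal_ip}:{port}"
    discover = ["iscsiadm", "-m", "discovery", "--op=new", "--op=delete",
                "--type", "sendtargets", "--portal", portal, "-I", iface]
    # Restrict the login to this portal and iface instead of all node records.
    login = ["iscsiadm", "-m", "node", "--portal", portal, "-I", iface,
             "--login"]
    return [discover, login]


if __name__ == "__main__":
    for cmd in iser_login_cmds("192.168.5.10"):
        print(shlex.join(cmd))  # to execute: subprocess.run(cmd, check=True)
```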

SRP

Target kernel log

Jul 28 15:30:52 srp_target kernel: ib_srpt: receiving failed for idx 59 with status 5
Jul 28 15:30:52 srp_target kernel: ib_srpt: receiving failed for idx 60 with status 5
Jul 28 15:30:52 srp_target kernel: ib_srpt: receiving failed for idx 61 with status 5
...
Jul 28 15:30:54 srp_target kernel: ib_srpt: Received CM TimeWait exit for ch 192.168.5.10-615.
Jul 28 15:30:54 srp_target kernel: ib_srpt: Received CM TimeWait exit for ch 192.168.5.10-616.
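
The numeric status in the ib_srpt messages is an `enum ib_wc_status` value. Status 5 is `IB_WC_WR_FLUSH_ERR`, which is reported for work requests that were flushed after the queue pair had already entered the error state. Flush errors are therefore usually a consequence of a failure rather than its cause, and the first non-flush status in the log is the more interesting one. Below is a Python sketch that tallies the statuses, assuming only the message format shown above.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# enum ib_wc_status (<rdma/ib_verbs.h>) in declaration order: index == code.
IB_WC_STATUS = (
    "SUCCESS", "LOC_LEN_ERR", "LOC_QP_OP_ERR", "LOC_EEC_OP_ERR",
    "LOC_PROT_ERR", "WR_FLUSH_ERR", "MW_BIND_ERR", "BAD_RESP_ERR",
    "LOC_ACCESS_ERR", "REM_INV_REQ_ERR", "REM_ACCESS_ERR", "REM_OP_ERR",
    "RETRY_EXC_ERR", "RNR_RETRY_EXC_ERR", "LOC_RDD_VIOL_ERR",
    "REM_INV_RD_REQ_ERR", "REM_ABORT_ERR", "INV_EECN_ERR",
    "INV_EEC_STATE_ERR", "FATAL_ERR", "RESP_TIMEOUT_ERR", "GENERAL_ERR",
)

# Matches lines like "ib_srpt: receiving failed for idx 59 with status 5".
FAIL_RE = re.compile(r"ib_srpt: (\w+) failed for idx (\d+) with status (\d+)")


def status_name(code: int) -> str:
    """Map a numeric completion status to its enum name."""
    if 0 <= code < len(IB_WC_STATUS):
        return IB_WC_STATUS[code]
    return f"UNKNOWN({code})"


def summarize(lines: Iterable[str]) -> Tuple[Counter, Optional[str]]:
    """Count completion statuses and return the first non-flush failure line."""
    counts: Counter = Counter()
    first_non_flush = None
    for line in lines:
        m = FAIL_RE.search(line)
        if not m:
            continue
        name = status_name(int(m.group(3)))
        counts[name] += 1
        if first_non_flush is None and name != "WR_FLUSH_ERR":
            first_non_flush = line.strip()
    return counts, first_non_flush
```

In the excerpt above, every failure is `WR_FLUSH_ERR`. If the full log shows nothing else, the trigger is more likely to be in the messages just before the first flush than in the flushed receives themselves.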

Target: ib_srpt module options and SCST configuration

cat /etc/modprobe.d/ib_srpt.conf
options ib_srpt rdma_cm_port=5000

HANDLER vdisk_blockio {
  DEVICE SAS_DISK1 {
    filename /dev/disk/by-id/scsi-3600605b009e4f4e01d4241c13e907c3f
    t10_dev_id sasdisk1
  }
}
 
TARGET_DRIVER ib_srpt {
  TARGET fe80:0000:0000:0000:0225:90ff:fedf:82b9 {
    enabled 1
    LUN 0 SAS_DISK1
    nv_cache 1
  }
}

Initiator

echo dest=192.168.5.10:5000,id_ext=002590ffffdf82b8,ioc_guid=002590ffffdf82b8 > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
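
The identifiers in the SRP setup are related. `id_ext` and `ioc_guid` in the `add_target` line are the `node_guid` reported by `ibv_devinfo` with the colons removed. The SCST `TARGET` name has the form of a link-local port GID, which on RoCE is typically derived from the port MAC address using the modified EUI-64 scheme. Below is a Python sketch of both conversions. The MAC `00:25:90:df:82:b9` is inferred from that GID and is not stated in the post. `srp_add_target_line` is a hypothetical helper.

```python
def guid_to_id_ext(node_guid: str) -> str:
    """'0025:90ff:ffdf:82b8' -> '002590ffffdf82b8' (16 hex digits, no colons)."""
    digits = node_guid.replace(":", "").lower()
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 64-bit GUID: {node_guid!r}")
    return digits


def mac_to_link_local_gid(mac: str) -> str:
    """Modified EUI-64: flip the U/L bit and insert ff:fe in the middle of the MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    octets[0] ^= 0x02  # universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    groups = [f"{eui64[i]:02x}{eui64[i + 1]:02x}" for i in range(0, 8, 2)]
    return "fe80:0000:0000:0000:" + ":".join(groups)


def srp_add_target_line(dest_ip: str, port: int, node_guid: str) -> str:
    """Hypothetical helper: build the string written to add_target."""
    guid = guid_to_id_ext(node_guid)
    return f"dest={dest_ip}:{port},id_ext={guid},ioc_guid={guid}"


if __name__ == "__main__":
    # ASSUMPTION: MAC inferred from the SCST TARGET name, not given in the post.
    print(mac_to_link_local_gid("00:25:90:df:82:b9"))
    print(srp_add_target_line("192.168.5.10", 5000, "0025:90ff:ffdf:82b8"))
```

The port in `dest=` has to match `rdma_cm_port=5000` in `ib_srpt.conf`.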

Some test results

udaddy (fails)

Server

udaddy

Client

udaddy -s server_ip
udaddy: starting client
udaddy: connecting
udaddy: failure creating address handle
test complete
return status -1

rdma_server/client

rdma_server

rdma_server
rdma_server: start
rdma_server: end 0

rdma_client

rdma_client -s 192.168.5.10
rdma_client: start
rdma_client: end 0

ib_send_bw

Server

ib_send_bw -d mlx4_0 -i 1 -F --report_gbits

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF		Device         : mlx4_0
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 RX depth        : 512
 CQ Moderation   : 100
 Mtu             : 2048[B]
 Link type       : Ethernet
 Gid index       : 0
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x00eb PSN 0xd32c4e
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:30:210:10
 remote address: LID 0000 QPN 0x00a6 PSN 0x57cfea
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:30:210:20
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 2100.082000 != 1942.417000
Test integrity may be harmed !
 65536      1000             0.00               31.59  		   0.060259
---------------------------------------------------------------------------------------

Client

ib_send_bw -d mlx4_0 -i 1 -F --report_gbits 192.168.5.10
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF		Device         : mlx4_0
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 2048[B]
 Link type       : Ethernet
 Gid index       : 0
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x00a6 PSN 0x57cfea
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:30:210:20
 remote address: LID 0000 QPN 0x00eb PSN 0xd32c4e
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:30:210:10
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 2098.687000 != 1583.039000
Test integrity may be harmed !
 65536      1000             37.41              31.45  		   0.059977
---------------------------------------------------------------------------------------
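
The GID lines are easier to read once decoded. The `255:255` bytes show that perftest printed each GID byte in decimal. A GID of the form `::ffff:a.b.c.d` is an IPv4-mapped address. Below is a small Python sketch that decodes such a GID and cross-checks `BW average` against `#bytes` × `MsgRate`.

```python
from typing import Optional


def perftest_gid_to_ipv4(gid: str) -> Optional[str]:
    """Decode a GID printed as 16 decimal bytes separated by ':'.

    Returns the IPv4 address for an IPv4-mapped GID (::ffff:a.b.c.d),
    otherwise None.
    """
    octets = [int(part) for part in gid.split(":")]
    if len(octets) != 16 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a 16-byte GID: {gid!r}")
    if octets[:10] == [0] * 10 and octets[10:12] == [255, 255]:
        return ".".join(str(o) for o in octets[12:])
    return None


def bw_gbits(msg_bytes: int, msg_rate_mpps: float) -> float:
    """Bandwidth implied by message size and message rate, in Gb/s."""
    return msg_bytes * 8 * msg_rate_mpps * 1e6 / 1e9


if __name__ == "__main__":
    for side, gid in (
        ("server", "00:00:00:00:00:00:00:00:00:00:255:255:172:30:210:10"),
        ("client", "00:00:00:00:00:00:00:00:00:00:255:255:172:30:210:20"),
    ):
        print(side, "GID index 0 ->", perftest_gid_to_ipv4(gid))
    print(f"{bw_gbits(65536, 0.060259):.2f} Gb/s")
```

Decoded this way, GID index 0 corresponds to 172.30.210.10 (server) and 172.30.210.20 (client), not the 192.168.5.x addresses used as the iSER portal and the SRP `dest=`. The 192.168.5.10 argument to ib_send_bw is only used for the out-of-band exchange (`Data ex. method : Ethernet`, `rdma_cm QPs : OFF`). The bandwidth check gives 31.59 Gb/s, matching the server's reported average.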

rping

Server

rping -s -C 10 -v
server ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
server ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
server ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
server ping data: rdma-ping-3: DEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu
server ping data: rdma-ping-4: EFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv
server ping data: rdma-ping-5: FGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvw
server ping data: rdma-ping-6: GHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwx
server ping data: rdma-ping-7: HIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxy
server ping data: rdma-ping-8: IJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
server ping data: rdma-ping-9: JKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyzA
server DISCONNECT EVENT...
wait for RDMA_READ_ADV state 10

Client

rping -c -a 192.168.5.10 -C 10 -v
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
ping data: rdma-ping-3: DEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu
ping data: rdma-ping-4: EFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv
ping data: rdma-ping-5: FGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvw
ping data: rdma-ping-6: GHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwx
ping data: rdma-ping-7: HIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxy
ping data: rdma-ping-8: IJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
ping data: rdma-ping-9: JKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyzA
client DISCONNECT EVENT...

ucmatose

Server

ucmatose
cmatose: starting server
initiating data transfers
completing sends
receiving data transfers
data transfers complete
cmatose: disconnecting
disconnected
test complete
return status 0

Client

ucmatose -s 192.168.5.10
cmatose: starting client
cmatose: connecting
receiving data transfers
sending replies
data transfers complete
test complete
return status 0