Assume the NVIDIA driver and CUDA are already installed successfully.
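A quick way to confirm this assumption is to check that the driver and toolkit command-line tools are on PATH. This is a minimal sketch: `require_cmd` is a hypothetical helper name, and it assumes `nvidia-smi` (installed with the driver) and `nvcc` (installed with the CUDA toolkit, often under /usr/local/cuda/bin) are the tools to look for.

```shell
# Print whether a command is on PATH; return non-zero if it is missing.
# require_cmd is a helper defined here for illustration only.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

missing=0
for tool in nvidia-smi nvcc; do
    require_cmd "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
    echo "NVIDIA driver and CUDA tools look present"
fi
```

If `nvcc` is reported missing even though CUDA is installed, the toolkit's bin directory may simply not be on PATH.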
- Install Dependencies
sudo yum install -y tcl tk
- Install MLNX_OFED v2.1-x.x.x or later
Download: www.mellanox.com -> Products -> Software -> InfiniBand/VPI Drivers -> Linux SW/Drivers
tar xvzf MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.3-x86_64.tgz
cd MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.3-x86_64
sudo ./mlnxofedinstall
Reboot the system for the changes to take effect.
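After the reboot, the installed MLNX_OFED version can be compared against the v2.1 minimum stated above. The following is a sketch under assumptions: `ofed_at_least_2_1` is a hypothetical helper, and it assumes `ofed_info -s` prints a string of the form `MLNX_OFED_LINUX-4.1-1.0.2.0:`; adjust the parsing if your output differs.

```shell
# Succeed if an MLNX_OFED version string is 2.1 or later.
# Assumed input format: "MLNX_OFED_LINUX-<major>.<minor>-<rest>".
ofed_at_least_2_1() {
    ver=${1#MLNX_OFED_LINUX-}   # strip the product prefix
    major=${ver%%.*}            # text before the first dot
    rest=${ver#*.}
    minor=${rest%%-*}           # text between the first dot and the next dash
    [ "$major" -gt 2 ] 2>/dev/null ||
        { [ "$major" -eq 2 ] 2>/dev/null && [ "$minor" -ge 1 ] 2>/dev/null; }
}

if command -v ofed_info >/dev/null 2>&1; then
    installed=$(ofed_info -s)
    if ofed_at_least_2_1 "$installed"; then
        echo "MLNX_OFED version OK: $installed"
    else
        echo "MLNX_OFED too old or unrecognized: $installed" >&2
    fi
else
    echo "ofed_info not found; MLNX_OFED may not be installed" >&2
fi
```

A string that does not match the assumed format is treated as unrecognized rather than as a pass.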
- Install the plugin module to enable GPUDirect RDMA
Download: www.mellanox.com -> Products -> Software -> InfiniBand/VPI Drivers -> GPUDirect RDMA
tar xvzf nvidia-peer-memory_1.0.5.tar.gz
cd nvidia-peer-memory-1.0
./build_module.sh
sudo rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-5.src.rpm
sudo rpm -ivh /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-5.x86_64.rpm
- Validate installation
sudo service nv_peer_mem start
sudo service nv_peer_mem status
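In addition to the service status, it can help to confirm that the kernel module itself is loaded. This is a minimal sketch: `module_loaded` is a hypothetical helper that scans `lsmod`-style output on stdin, and it assumes the module is named `nv_peer_mem`, matching the service name used above.

```shell
# Succeed if the named kernel module appears in lsmod-style output on stdin.
# Matches the first column exactly, so similarly named modules do not count.
module_loaded() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}

if lsmod | module_loaded nv_peer_mem; then
    echo "nv_peer_mem is loaded"
else
    echo "nv_peer_mem is NOT loaded" >&2
fi
```

Matching on the first column avoids false positives from modules that merely list `nv_peer_mem` in their "Used by" column.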
References: