@sean-smith
Last active April 22, 2024 23:22

Install AWS OFI NCCL

  1. Change into the shared directory:
cd /fsx
  1. Create a script install-nccl-aws-ofi.sh to install AWS OFI NCCL:
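#!/bin/bash
# Build and install the AWS OFI NCCL plugin (v1.8.1-aws) into /opt/aws-ofi-nccl.
set -e  # stop at the first failing step

# Download and unpack the release tarball.
wget https://github.com/aws/aws-ofi-nccl/releases/download/v1.8.1-aws/aws-ofi-nccl-1.8.1-aws.tar.gz
tar -xzf aws-ofi-nccl-1.8.1-aws.tar.gz
cd aws-ofi-nccl-1.8.1-aws
./autogen.sh
# Configure against the EFA libfabric, Open MPI, and CUDA installations.
./configure --enable-platform-aws --with-libfabric=/opt/amazon/efa --with-mpi=/opt/amazon/openmpi --with-cuda=/usr/local/cuda --prefix=/opt/aws-ofi-nccl
make
sudo make install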
  1. Run the script:
bash install-nccl-aws-ofi.sh
  1. Check the installed version:
$ strings /opt/aws-ofi-nccl/lib/libnccl-net.so | grep Initializing
NET/OFI Initializing aws-ofi-nccl GitHub-dev
  1. Install on all compute nodes (this example assumes 4 nodes; set -N to your node count):
cd /fsx/aws-ofi-nccl-1.8.1-aws
srun -N 4 sudo make install
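
Once the install has run everywhere, it can help to confirm that each node actually has the plugin. The following is a minimal sketch, not part of the original gist: the function names plugin_banner and check_all_nodes are hypothetical, the library path assumes the --prefix used above, and grep -a stands in for strings in the local check so that it does not depend on binutils.

```shell
#!/bin/bash
# Hypothetical helpers for verifying the AWS OFI NCCL install.

# Print the version banner embedded in the plugin library.
# The default path assumes --prefix=/opt/aws-ofi-nccl; pass another path to override.
plugin_banner() {
    local lib="${1:-/opt/aws-ofi-nccl/lib/libnccl-net.so}"
    if [ ! -f "$lib" ]; then
        echo "plugin not found: $lib" >&2
        return 1
    fi
    # -a treats the binary as text; -o prints only the matching part.
    grep -a -o 'NET/OFI Initializing aws-ofi-nccl [[:graph:]]*' "$lib" | head -n 1
}

# Run the same check once per node through Slurm.
# The first argument is the node count (defaults to 4, as in the step above).
check_all_nodes() {
    local nodes="${1:-4}"
    srun -N "$nodes" --ntasks-per-node=1 \
        bash -c 'echo "$(hostname): $(strings /opt/aws-ofi-nccl/lib/libnccl-net.so | grep Initializing)"'
}
```

After sourcing the file, running plugin_banner on the head node and then check_all_nodes 4 should print one banner line per host; a node that prints only its hostname is missing the library.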