This tutorial assumes you are launching a distributed training job on 2 nodes using DeepSpeed with the OpenMPI launcher.
- First, DeepSpeed needs passwordless SSH access to every node, the MASTER included:
# generate a public/private SSH key pair; leave the passphrase empty when prompted
ssh-keygen -t rsa
# copy the public key 'id_rsa.pub' to the MASTER and the SLAVE