sean-smith / slurm-mps-prolog.md
Last active August 27, 2024 05:40
Start CUDA MPS Server on each node

👾 Slurm CUDA MPS Prolog

The following Slurm Prolog starts the CUDA MPS server on each compute node before the job is started.

cat << 'EOF' > /opt/slurm/etc/prolog.sh
#!/bin/sh

# start the CUDA MPS daemon
nvidia-cuda-mps-control -d
EOF
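
The snippet is cut off here; to finish the setup you'd typically make the script executable and point Slurm at it. A minimal sketch, assuming slurm.conf lives at /opt/slurm/etc/slurm.conf:

chmod +x /opt/slurm/etc/prolog.sh
echo "Prolog=/opt/slurm/etc/prolog.sh" >> /opt/slurm/etc/slurm.conf
scontrol reconfigure   # push the new Prolog setting out to slurmd on the compute nodes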

Mount FSx for NetApp ONTAP with AWS ParallelCluster

FSx for NetApp ONTAP is a multi-protocol filesystem: it mounts on Windows via SMB, on Linux via NFS, and on macOS as well. This allows cluster users to bridge their Windows and Linux machines with the same filesystem, potentially running both Windows and Linux machines in a post-processing workflow.

Pros

  • Multi-Protocol
  • Hybrid support
  • Multi-AZ (for High Availability)
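
For example, an ONTAP volume can be NFS-mounted on a Linux compute node roughly like this (a sketch; the SVM DNS name and the /vol1 junction path are placeholders, not values from this gist):

sudo mkdir -p /fsx
sudo mount -t nfs svm-0123456789abcdef0.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com:/vol1 /fsx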

Set up the Mountpoint CSI driver

First, set up Mountpoint following the instructions in the docs.

Steps to set up NVMe with Mountpoint:

Next we'll tell S3 Mountpoint to cache on the 28 TB of local NVMe storage available on each P5 instance.

  1. Mount the NVMe disks as a single volume; this needs to be done on each P5 instance (see the sketch below):
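
A sketch of that step, assuming the eight local NVMe devices enumerate as /dev/nvme1n1 through /dev/nvme8n1 and the cache should live at /scratch (device names vary, so check lsblk first):

# stripe the instance-store NVMe devices into one RAID 0 array
sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=8 \
    /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 \
    /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /scratch
sudo mount /dev/md0 /scratch

With the array mounted, Mountpoint can be told to cache there via its --cache flag (my-bucket is a placeholder):

mount-s3 --cache /scratch my-bucket /mnt/s3
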
sean-smith / bad-gpu-pc.md
Created May 2, 2024 16:23
Diagnose GPU Failures

Diagnose GPU Failures on ParallelCluster

To diagnose a node with a bad GPU, ip-10-1-69-242, on ParallelCluster, do the following:

  1. Run the NVIDIA reset command, where 0 is the device index (as shown by nvidia-smi) of the GPU you want to reset:
srun -w ip-10-1-69-242 sudo nvidia-smi --gpu-reset -i 0
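
If the reset doesn't help, a common follow-up (not part of the original gist) is to check that node's kernel log for NVIDIA Xid errors:

srun -w ip-10-1-69-242 bash -c 'sudo dmesg -T | grep -i xid'
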
sean-smith / resize-ebs.md
Created April 25, 2024 19:42
Resize EBS Volume

Run out of EBS space on an EC2 instance?

  1. Make sure the instance has the arn:aws:iam::aws:policy/AmazonEC2FullAccess policy attached.

  2. Create a script called resize.sh with the following contents:

#!/bin/bash

# Specify the desired volume size in GiB as a command line argument. If not specified, default to 20 GiB.
SIZE=${1:-20}
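
The gist is cut off here; a minimal sketch of how such a script might continue, assuming IMDSv2, a default AWS CLI region, and an ext4 root partition on /dev/nvme0n1p1:

# look up this instance and its root EBS volume via instance metadata
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
VOLUME_ID=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID \
    --query "Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId" --output text)

# grow the volume, then the partition, then the filesystem
aws ec2 modify-volume --volume-id $VOLUME_ID --size $SIZE
sleep 30   # give the volume modification time to take effect
sudo growpart /dev/nvme0n1 1
sudo resize2fs /dev/nvme0n1p1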

Install AWS OFI NCCL

  1. Change into the shared directory:
cd /fsx
  2. Create a script install-nccl-aws-ofi.sh to install AWS OFI NCCL (a sketch of its contents follows below):
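
The script body is cut off in this snippet; a sketch of a typical build, assuming EFA's libfabric under /opt/amazon/efa and CUDA under /usr/local/cuda:

#!/bin/bash
# build the AWS OFI NCCL plugin from source into /fsx/aws-ofi-nccl
git clone https://github.com/aws/aws-ofi-nccl.git
cd aws-ofi-nccl
./autogen.sh
./configure --prefix=/fsx/aws-ofi-nccl \
            --with-libfabric=/opt/amazon/efa \
            --with-cuda=/usr/local/cuda
make -j $(nproc)
make install
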
sean-smith / install_nccl.md
Last active April 22, 2024 23:20
Install NCCL

Install NCCL on a Cluster

To install NCCL on the cluster, we'll install it on every node under the /opt/nccl directory. To do this, we'll create a script and then run it on all nodes using the srun command.

  1. Create a script ./install-nccl.sh and make it executable with chmod +x install-nccl.sh (a sketch of the full script follows below):
#!/bin/bash

# install nccl
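
The rest of the script is cut off in this snippet; a sketch of a typical NCCL source build into /opt/nccl (the CUDA path is an assumption):

git clone https://github.com/NVIDIA/nccl.git /tmp/nccl
cd /tmp/nccl
make -j src.build CUDA_HOME=/usr/local/cuda
sudo mkdir -p /opt/nccl
sudo cp -r build/include build/lib /opt/nccl/

It can then be run on all nodes with something like srun -N 2 --ntasks-per-node=1 ./install-nccl.sh (node count is an example).
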
sean-smith / instance-id-slurm.md
Last active March 13, 2024 18:29
Get instance ID to hostname mapping from a Slurm job.

Slurm Get Instance ID to Hostname

Update: you only need the following:

mpirun -N 1 -n 2 bash -c 'echo $(hostname): $(cat /sys/devices/virtual/dmi/id/board_asset_tag | tr -d " ")'
  1. Create a file get-instance-id.sh:
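
The file contents are cut off in this snippet; a minimal sketch that queries instance metadata (IMDSv1 for brevity):

#!/bin/bash
# print "hostname: instance-id" for the node this task runs on
echo "$(hostname): $(curl -s http://169.254.169.254/latest/meta-data/instance-id)"

Then run it across nodes, e.g. srun -N 2 --ntasks-per-node=1 ./get-instance-id.sh (node count is an example).
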
sean-smith / torch_distributed.py
Created March 4, 2024 21:03
This is a fork of Meta's torch_distributed.py that works on SageMaker HyperPod
#!/usr/bin/env python
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
#
import os
import sys
#!/bin/bash
# run as root, then validate with:
# chronyc sources -v
# chronyc tracking
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html#configure-time-sync
apt install -y chrony
sed -i '/\# See http:\/\/www.pool.ntp.org\/join.html for more information./a server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4\npool time.aws.com iburst' /etc/chrony/chrony.conf
systemctl enable --now chrony
/etc/init.d/chrony restart