
@JohnGiorgi
Last active November 10, 2023 22:44
Helpful tips/tricks for using Compute Canada resources

Using Compute Canada Resources

Basics

You ssh into the systems using:

ssh <username>@<login node>

E.g., to log in to Niagara as user example:

ssh example@niagara.scinet.utoronto.ca

It is convenient to set up SSH keys for passwordless login. On your local machine:

$ ssh-keygen
# for <username>@<host>
$ ssh-copy-id example@niagara.scinet.utoronto.ca

You only need to do this once.
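To avoid typing the full login node every time, you can also add a host alias to your local ~/.ssh/config. The helper below is a minimal sketch; the function name add_ssh_alias and the alias niagara are hypothetical choices, not part of any Compute Canada tooling.

```bash
# Hypothetical helper: add "Host NAME" to an SSH client config file so that
# "ssh NAME" expands to USER@HOSTNAME. Usage: add_ssh_alias CONFIG NAME HOSTNAME USER
add_ssh_alias() {
    local config=$1 name=$2 hostname=$3 user=$4
    mkdir -p "$(dirname "$config")"
    # Do nothing if an entry for this alias already exists
    if [[ -f $config ]] && grep -qx "Host $name" "$config"; then
        return 0
    fi
    # Leading newline keeps the new entry separate from any existing content
    printf '\nHost %s\n    HostName %s\n    User %s\n' "$name" "$hostname" "$user" >> "$config"
    # ssh rejects a config file that other users can write to
    chmod 600 "$config"
}
```

For example, add_ssh_alias ~/.ssh/config niagara niagara.scinet.utoronto.ca example lets you log in with just ssh niagara, using the key you set up above.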

Useful Links

Clusters

Cedar

Graham

Niagara

  • Login node: niagara.scinet.utoronto.ca
  • Quickstart

Béluga

Workflows

Simple Python Program

The following commands set up a Python virtual environment and install Python packages into it. Note that you should create the virtual environment in your $HOME directory.

# First, login to the host machine (ssh <username>@<login node>)
$ ssh example@niagara.scinet.utoronto.ca

# Load the version of Python you want. You can check which versions are available with: module avail python
~ $ module load python/3.7.0

# Create the virtual environment at path ENV
~ $ ENV=path/to/env/to/create
~ $ virtualenv --no-download --python=python3.7 $ENV
# Activate it
~ $ source $ENV/bin/activate

# Make sure pip is up to date
(ENV) ~ $ pip install --upgrade pip

# Next, install any requirements with pip install, e.g.,
(ENV) ~ $ pip install numpy --no-index

Whenever possible, install the Python wheels that Compute Canada provides for your packages with pip install <package_name> --no-index. See the Compute Canada documentation for a list of available wheels. If no wheel is available for your package, you can request that it be added by emailing support@computecanada.ca.
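If your dependencies are listed in a requirements file, a sketch like the following installs each one with --no-index and prints the packages that could not be installed, i.e. candidates for a wheel request. The function name is hypothetical, and it assumes a simple requirements.txt with one package specifier per line. Because it installs packages one at a time, pip cannot resolve them jointly as it would with pip install --no-index -r requirements.txt.

```bash
# Hypothetical helper: try to install every package in a requirements file from
# the Compute Canada wheels (--no-index) and print the ones that failed.
# A failure usually means no wheel is available, but could also be a version
# that does not exist or a dependency conflict. Run it with your virtualenv active.
install_or_report_missing() {
    local req_file=$1 pkg
    local -a missing=()
    while IFS= read -r pkg || [[ -n $pkg ]]; do
        pkg=${pkg%%#*}              # drop comments
        pkg=${pkg//[[:space:]]/}    # drop whitespace
        [[ -z $pkg ]] && continue   # skip blank lines
        if ! pip install --no-index "$pkg" >/dev/null 2>&1; then
            missing+=("$pkg")
        fi
    done < "$req_file"
    if (( ${#missing[@]} > 0 )); then
        printf '%s\n' "${missing[@]}"
    fi
}
```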

In your job scripts (e.g., train.sh), make sure you call module load python/<VERSION> and source <ENV>/bin/activate. Example:

#!/bin/bash
# The following three commands allow us to take advantage of whole-node
# scheduling
#SBATCH --nodes=1
#SBATCH --cpus-per-task=80
#SBATCH --mem=0
# Wall time
#SBATCH --time=12:00:00
#SBATCH --job-name=example
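# Note: #SBATCH lines are not processed by the shell, so variables such as
# $SCRATCH are not expanded; use a literal path (%j is replaced by the job ID)
# to a directory that already exists
#SBATCH --output=/path/to/scratch/output/example_jobid_%j.txt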
# Emails me when job starts, ends or fails
#SBATCH --mail-user=example@gmail.com
#SBATCH --mail-type=ALL

ENV=path/to/my/env

# load any required modules
module load python/3.7.0
# activate the virtual environment
source $ENV/bin/activate

# run a training session
srun python example.py
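Because a job can wait in the queue for hours, it can pay to fail fast when the virtual environment path is wrong. The function below is a hypothetical helper (not a Compute Canada command) that you could paste near the top of a job script and call in place of the plain source line, e.g. activate_env "$ENV" || exit 1.

```bash
# Hypothetical helper: activate a virtualenv, or stop with a clear message
# instead of letting the job continue with the wrong Python.
activate_env() {
    local env_dir=$1
    if [[ ! -f $env_dir/bin/activate ]]; then
        echo "error: no virtualenv found at '$env_dir'" >&2
        return 1
    fi
    source "$env_dir/bin/activate"
}
```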

Submit jobs with sbatch <jobscript.sh>, e.g., sbatch train.sh.
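By default sbatch prints a human-readable message. With its --parsable flag it prints only the job ID (followed by ;cluster on multi-cluster setups), which is handy for scripting. submit_job is a hypothetical wrapper name used for illustration.

```bash
# Hypothetical wrapper: submit a job script and print only its numeric job ID.
# sbatch --parsable prints "jobid" or "jobid;cluster"; keep the part before ";".
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    echo "${out%%;*}"
}
```

For example, JOBID=$(submit_job train.sh) stores the ID for later use with squeue or scancel, and sbatch --dependency=afterok:$JOBID eval.sh would start a (hypothetical) eval.sh only after train.sh finishes successfully.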

Interactive jobs

Before submitting a job to the queue, it's useful to test that your submission script works. You can do this by starting an interactive job.

For example, to start an interactive job with 48 CPUs for 30 minutes:

salloc --time=00:30:00 --nodes=1 --mem=0 --cpus-per-task=48

In general, the arguments to salloc are the same as those you supply in your job scripts.
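Since salloc takes the same resource options, you can reuse the ones from your job script instead of retyping them. A minimal sketch, assuming your script uses the long --option=value form: the hypothetical function below extracts only --time, --nodes, --mem, --cpus-per-task, --gres and --account and ignores everything else (such as --job-name and --output, as well as short forms like -t).

```bash
# Hypothetical helper: print the resource requests found in a job script's
# #SBATCH lines, one per line, so they can be passed on to salloc.
salloc_args_from() {
    sed -n -E 's/^#SBATCH[[:space:]]+(--(time|nodes|mem|cpus-per-task|gres|account)=[^[:space:]]+).*/\1/p' "$1"
}
```

Usage: salloc $(salloc_args_from train.sh), then run the body of your script (e.g. bash train.sh) inside the allocation.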

Alternatively, on Niagara, you can request an interactive job on 1 node with:

debugjob 1

Monitoring jobs

To check the status of all your submitted jobs:

squeue -u $USER
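In scripts, you may want to block until a job has left the queue. A hedged sketch: wait_for_job is a hypothetical name, and it assumes that squeue, restricted to the pending, running and completing states, prints nothing for the job once it has finished (check man squeue on your cluster).

```bash
# Hypothetical helper: poll squeue until JOBID is no longer pending/running.
# Usage: wait_for_job JOBID [SECONDS_BETWEEN_CHECKS]
wait_for_job() {
    local jobid=$1 interval=${2:-60}
    # -h: no header; -j: only this job; -t: only these states.
    # Errors (e.g. an unknown job ID) are silenced and also end the loop.
    while squeue -h -j "$jobid" -t PENDING,RUNNING,COMPLETING 2>/dev/null | grep -q .; do
        sleep "$interval"
    done
}
```

For example: wait_for_job "$JOBID" 120 && echo "job $JOBID left the queue".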

To cancel a job (-i asks for confirmation first):

scancel -i JOBID

A useful command for monitoring the CPU and GPU usage of a running job:

srun --jobid JOBID --pty tmux new-session -d 'htop -u $USER' \; split-window -h 'watch nvidia-smi' \; attach

@ccoulombe

The pip argument has a typo. --no_index -> --no-index
https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-no-index

@JohnGiorgi
Author

Whoops, thanks! Fixed.
