

@martinsotir
Last active October 18, 2023 14:11

Using SLURM cluster GPUs on Google Colab

This document describes how to start a Jupyter session on a GPU node of a SLURM cluster and, optionally, how to use it from Google Colab.

Using Google Colab is optional and can pose serious security risks. Please read the Google local runtime documentation carefully and ask your system administrator for permission before connecting Google Colab to a local server.

1. First time setup

Start a tmux session on the login node:

ssh <username>@<slurm-login-node>
tmux new -s notebook

Request a GPU node in an interactive session:

srun -c 4 --gres=gpu:1 --pty bash
nvidia-smi
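The same interactive request is used in steps 1 and 2, so it can be wrapped in a small function. This is a sketch under assumptions: `gpu_session` and the `DRY_RUN` switch are hypothetical names, and `--time` (a standard srun option) is an addition whose 4-hour default is arbitrary. With a wall-clock limit, SLURM ends the job and frees the GPU even if you forget to stop it; your cluster may impose its own limits.

```shell
# Hypothetical wrapper around the interactive GPU request used in steps 1 and 2.
# Set DRY_RUN=1 to print the srun command instead of running it.
gpu_session() {
  local cpus="${1:-4}"         # CPU cores (same as -c 4 above)
  local gpus="${2:-1}"         # number of GPUs (same as --gres=gpu:1)
  local limit="${3:-04:00:00}" # wall-clock limit; SLURM ends the job when it expires
  ${DRY_RUN:+echo} srun -c "$cpus" --gres="gpu:${gpus}" --time="$limit" --pty bash
}
```

For example, `gpu_session 4 1 02:00:00` requests 4 cores and one GPU for two hours, while `DRY_RUN=1 gpu_session` only prints the command.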

Install Miniconda and a fastai conda environment (about 10 min on slow disks):

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
eval "$($HOME/miniconda3/bin/conda shell.bash hook)"
conda init bash
conda config --set auto_activate_base false
conda create -n fastai -c fastai python=3.7 fastai jupyter -y
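The installer's `-p` prefix and the path in the shell-hook line must point to the same directory; otherwise the hook line fails and `conda` is never set up in the current shell. A small guard makes such a mismatch fail loudly. This is a sketch: `conda_hook` is a hypothetical name, and its default prefix mirrors the `$HOME/miniconda3` path used in the `eval` line above.

```shell
# Hypothetical guard: emit conda's bash hook only if conda exists under the
# given prefix (default mirrors the path in the eval line above).
conda_hook() {
  local prefix="${1:-$HOME/miniconda3}"
  if [ ! -x "$prefix/bin/conda" ]; then
    echo "conda not found under $prefix; check the installer's -p path" >&2
    return 1
  fi
  "$prefix/bin/conda" shell.bash hook
}
```

Usage: `hook="$(conda_hook)" && eval "$hook"` evaluates the hook only when conda was actually found.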

Install jupyter_http_over_ws (optional, only needed for Google Colab compatibility):

conda activate fastai
pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws

Clone the fastai course-v3 repository:

cd ~
mkdir -p fastai
cd fastai
git clone https://github.com/fastai/course-v3.git

Exit the interactive session:

exit
squeue -u <username>
# Check that all interactive sessions have terminated

2. Start a Jupyter notebook

Request a GPU node in an interactive session:

srun -c 4 --gres=gpu:1 --pty bash
nvidia-smi

Take note of the machine hostname:

hostname
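If the hostname gets lost, it can usually be recovered from the login node with `squeue`, whose `-o '%i %j %N'` format prints job ID, job name and node list. A minimal sketch, assuming the interactive job kept srun's default job name (the command it runs, `bash` here); `notebook_node` is a hypothetical helper that filters that output.

```shell
# Hypothetical helper: print the node of the first job named "bash" found in
# `squeue -h -u <username> -o '%i %j %N'` output read from stdin.
notebook_node() {
  awk '$2 == "bash" { print $3; exit }'
}
# Usage on the login node:
#   squeue -h -u "$USER" -o '%i %j %N' | notebook_node
```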

Activate the conda env and start the Jupyter notebook server:

conda activate fastai

cd ~/fastai/course-v3

# When using Colab (warning: potentially unsafe; Google and the notebooks must be trusted):
jupyter notebook \
  --NotebookApp.allow_origin='https://colab.research.google.com' \
  --ip 0.0.0.0 \
  --port=8888 \
  --NotebookApp.port_retries=0

# Without Colab (safe if the local machine is trusted):
jupyter notebook --port=8888 --ip 0.0.0.0
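The two jupyter invocations above can be folded into one function. This is a hypothetical wrapper (`start_notebook` and `DRY_RUN` are not part of Jupyter). It adds `--no-browser`, a real Jupyter flag that skips trying to open a browser on a headless node, and keeps `--NotebookApp.port_retries=0` for the Colab case so that Jupyter fails instead of silently moving to another port when 8888 is taken.

```shell
# Hypothetical wrapper: start_notebook [colab] [port]
# Set DRY_RUN=1 to print the jupyter command instead of running it.
start_notebook() {
  local mode="${1:-}"
  local port="${2:-8888}"
  # Options shared by both variants.
  set -- --ip 0.0.0.0 --port="$port" --no-browser
  if [ "$mode" = "colab" ]; then
    # Accept cross-origin requests from Colab only, and never change port.
    set -- "$@" --NotebookApp.allow_origin='https://colab.research.google.com' \
      --NotebookApp.port_retries=0
  fi
  ${DRY_RUN:+echo} jupyter notebook "$@"
}
```

For example, `start_notebook colab` starts the Colab-compatible server, and `start_notebook` alone starts the plain one.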

Copy the displayed connection token.
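Jupyter prints the token as part of a URL of the form `http://<host>:8888/?token=<token>`, both in its startup log and in the output of `jupyter notebook list`. If that output has been saved to a file (for example by appending `2>&1 | tee ~/jupyter.log` to the command; the log path is a hypothetical choice), a sketch like the following pulls the token out. `extract_token` is a hypothetical name.

```shell
# Hypothetical helper: print the first token found in Jupyter output on stdin,
# e.g. a saved server log or the output of `jupyter notebook list`.
extract_token() {
  sed -n 's/.*[?&]token=\([0-9A-Za-z]*\).*/\1/p' | head -n 1
}
# Usage:
#   extract_token < ~/jupyter.log
```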

Detach from the tmux session (without stopping the Jupyter server) with Ctrl+b followed by d, then exit ssh.

Open a new terminal on your local machine and forward the notebook port to localhost:

ssh -L 8888:localhost:8888 <username>@<job-hostname>
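On many clusters the compute nodes cannot be reached directly from outside. In that case the tunnel can hop through the login node with OpenSSH's `-J` (ProxyJump, OpenSSH 7.3 and later), while `-N` keeps the connection open for port forwarding only. A sketch, assuming your site lets you ssh from the login node to a node where your job is running; `notebook_tunnel` and `DRY_RUN` are hypothetical names.

```shell
# Hypothetical helper: forward a local port to the compute node through the
# login node. Set DRY_RUN=1 to print the ssh command instead of running it.
notebook_tunnel() {
  local user="$1" login="$2" node="$3" port="${4:-8888}"
  ${DRY_RUN:+echo} ssh -N -J "${user}@${login}" \
    -L "${port}:localhost:${port}" "${user}@${node}"
}
# Usage:
#   notebook_tunnel <username> <slurm-login-node> <job-hostname>
```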

Open http://localhost:8888 in your browser and paste the connection token (important: this step is required even if Colab is used!).

3. (Optional) Connect from Colab

Reminder: connecting Google Colab to a local server can pose serious security risks. Read the Google local runtime documentation carefully and get your system administrator's permission first.

See step 4 of the Google local runtime documentation: https://research.google.com/colaboratory/local-runtimes.html

4. Stop the job and release the GPU

Important: terminate the Jupyter server job whenever the GPU will be left unused for a prolonged period.

Connect to the login node and reattach the tmux session:

ssh <username>@<slurm-login-node>
tmux a -t notebook

Stop Jupyter by pressing Ctrl+C twice, then type exit to terminate the job.

From the login node, check that all interactive sessions have terminated:

squeue -u <username>

To manually kill a job, use:

scancel <job-id>
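To release GPUs in bulk, `scancel -u <username>` cancels all of your jobs. To cancel only the interactive sessions, a sketch like the one below filters `squeue -h -u <username> -o '%i %j'` output. It assumes those jobs kept srun's default name `bash`; `interactive_job_ids` is a hypothetical helper.

```shell
# Hypothetical helper: read `squeue -h -u <username> -o '%i %j'` output from
# stdin and print the IDs of jobs named "bash", one per line.
interactive_job_ids() {
  awk '$2 == "bash" { print $1 }'
}
# Usage on the login node (cancels every matching job; xargs -r is GNU):
#   squeue -h -u "$USER" -o '%i %j' | interactive_job_ids | xargs -r scancel
```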