@mengyuest
Last active October 5, 2022 21:03
REALM Manual for MIT SuperCloud

Reference: https://supercloud.mit.edu/

Directories

Reference: https://supercloud.mit.edu/best-practices-and-performance-tips

  • User directory: /home/gridsan/USERNAME
  • Aim for fewer than ~1000 files per directory
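
To check the file-count guideline above, a small helper can list the directories that exceed a limit. This is a minimal sketch, not part of the SuperCloud tooling: the function name dirs_over_limit and its arguments are hypothetical, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find both do). File names containing newlines would be miscounted.

```shell
# Hypothetical helper: print "COUNT<TAB>DIRECTORY" for every directory under
# ROOT (default: current directory) that directly contains more than LIMIT
# entries (default: 1000).
dirs_over_limit() {
    root=${1:-.}
    limit=${2:-1000}
    find "$root" -type d | while IFS= read -r dir; do
        # Count only the direct children of this directory.
        n=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l)
        n=$((n))  # strip any padding that wc adds on some systems
        if [ "$n" -gt "$limit" ]; then
            printf '%s\t%s\n' "$n" "$dir"
        fi
    done
}
```

For example, dirs_over_limit /home/gridsan/USERNAME 1000 would walk the user directory. On a large tree the walk itself touches every directory, so it is better run occasionally than routinely.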

Environment Setup

Reference: https://supercloud.mit.edu/software-and-package-management

Submit an Interactive Job in MIT SuperCloud

Reference: https://supercloud.mit.edu/submitting-jobs#interactive

  1. Log in to the SuperCloud login node: ssh USERNAME@txe1-login.mit.edu
  2. Submit an interactive job (with one GPU) using: LLsub -i -s 20 -g volta:1
  3. Once a node is available, you are logged in to it automatically. If you exit that shell, the interactive job ends and the node is released.

Submit a Serial Job in MIT SuperCloud

Reference: https://supercloud.mit.edu/submitting-jobs#serial

  1. Log in to the SuperCloud login node: ssh USERNAME@txe1-login.mit.edu
  2. Write the following to a file called myScript.sh:
#!/bin/bash
#SBATCH -J USERNAME
#SBATCH -o %j.stdout
#SBATCH -e %j.stderr
#SBATCH -c 20
#SBATCH --gres=gpu:volta:1
#SBATCH --time=24:00:00

# Write your commands below.
# Source conda.sh first so that "conda activate" works in the batch shell.
source /state/partition1/llgrid/pkg/anaconda/anaconda3-2022a/etc/profile.d/conda.sh
conda activate nn_pde_new
cd /home/gridsan/ymeng/mit/hybrid_clf/DeepRL_Algorithms
python global_patch.py --algor sac --exp_name bp --gpus 0 --env_id BpEnv-v0 --random_seed 20221007
  3. Submit the job: LLsub myScript.sh (or: sbatch myScript.sh)
  4. Check the job status using: LLstat
  5. If LLstat shows the job with "NODELIST=d-14-7-2", you can log in to that compute node using: ssh d-14-7-2
  6. To stop a job manually, run on the login node: LLkill JOBID
  7. View the job's stdout (or stderr) file using: cat JOBID.stdout (or: cat JOBID.stderr). These files are written to the directory the job was submitted from.
  8. Alternatively, append the following functions to your .bashrc file and run source ~/.bashrc. You can then submit a command directly, for example: run_job python global_patch.py --algor sac --exp_name bp --gpus 0 --env_id BpEnv-v0 --random_seed 20221007, and view its output with cato JOBID (stdout) or cate JOBID (stderr).
function run_job {
echo "#!/bin/bash
#SBATCH -J USERNAME
#SBATCH -o %j.stdout
#SBATCH -e %j.stderr
#SBATCH -c 20
#SBATCH --gres=gpu:volta:1
#SBATCH --time=24:00:00

# write your commands here
source /state/partition1/llgrid/pkg/anaconda/anaconda3-2022a/etc/profile.d/conda.sh
conda activate nn_pde_new
cd /home/gridsan/ymeng/mit/hybrid_clf/DeepRL_Algorithms
# Command passed to run_job (its arguments are joined with spaces)
$*
" > ~/tmp.slurm
mkdir -p ~/.lsf/
cd ~/.lsf/
sbatch ~/tmp.slurm
cd -
}

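# Print the stdout file of a job submitted with run_job: cato JOBID
function cato {
cat ~/.lsf/"$1".stdout
}

# Print the stderr file of a job submitted with run_job: cate JOBID
function cate {
cat ~/.lsf/"$1".stderr
}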
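
The cato and cate helpers above need the job ID. As a minimal sketch, the ID can be captured at submission time by parsing the confirmation line that sbatch prints ("Submitted batch job JOBID"). The function name job_id_from_sbatch is hypothetical, and the sketch assumes that confirmation format.

```shell
# Hypothetical helper: read sbatch's output on stdin and print only the
# numeric job ID from a line of the form "Submitted batch job 12345".
# Prints nothing if no such line is present.
job_id_from_sbatch() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}
```

A possible use is jobid=$(sbatch myScript.sh | job_id_from_sbatch), followed by cat "$jobid.stdout" once the job has started writing output.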