Running stuff on the Princeton clusters.

##Feynman: feynman.princeton.edu

Feynman-hepheno has somewhere between 12 and 16 nodes (unclear).

  • Three areas -- home (~), group (/group/hepheno/) and group storage (/mnt/hepheno/)
  • Best to work from your user directory in /group/hepheno since ~ has very little storage space
  • The contents of /group/hepheno/group-setup.sh are automatically sourced on cd into the group directory /group/hepheno/:
    • Activates a python virtual environment (ML: chmod others into the venv so they can pip install etc?)
    • Initiates MPI and sets a few environment variables
  • Different compiler versions (Intel/GNU) as well as OpenMPI/Intel MPI versions can be activated with module load <module>, and the available modules can be listed with module avail (see the short example after the env snippet below).
  • The Fermi data is stored in /mnt/hepheno/CTBCORE/ and /mnt/hepheno/FermiData
  • /group/hepheno/heptools/ contains relevant programs (MultiNest, Pythia, etc.) as well as the NPTF code
  • Typical env:
# env.sh
# Feynman installation

export work_dir=$'/group/hepheno/heptools/test_code'
export psf_dir=$'/mnt/hepheno/CTBCORE/psf_data'
export maps_dir=$'/mnt/hepheno/CTBCORE/'
export fermi_data_dir=$'/mnt/hepheno/FermiData/'
export PYTHONPATH=$PYTHONPATH:/group/hepheno/heptools/Fermi-NPTF-edep-nrodd
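
As a quick example of the module commands mentioned above (the specific module names here are guesses; check module avail for what is actually installed on Feynman):

module avail            # list available compiler/MPI modules
module load intel       # e.g. switch to the Intel compilers
module load intel-mpi   # and a matching Intel MPI build
module list             # confirm what is currently loaded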

###Submitting jobs

  • PBS scheduler. Typical batch file:
#!/bin/sh
#PBS -j oe                                # merge stderr into stdout
#PBS -o /home/smsharma/scratch/job.out    # output file
#PBS -l nodes=1:ppn=1 -l cput=200:59:00   # 1 node, 1 core, CPU-time limit
#PBS -q hepheno                           # queue
#PBS -m abe                               # email on abort/begin/end
#PBS -k oe                                # keep stdout/stderr files
#PBS -M smsharma@princeton.edu            # email address

export PYTHONPATH=/group/hepheno/heptools/Fermi-NPTF-edep-nrodd/
export PATH=/opt/intel/impi/5.0.3.048/lib64/:$PATH
source /group/hepheno/heptools/test_code/env.sh
source /opt/intel/impi/5.0.3.048/bin64/mpivars.sh
cd /group/hepheno/venv
source bin/activate
cd /group/hepheno/heptools/test_code
./shell_scripts/run_nptf_edep_minuit.sh
  • To submit a job: qsub -l mem=4gb submit.batch
  • Standard output and error are stored in your home directory while the job runs
  • Useful commands:
    • qstat [-u <username>] to view jobs
    • qdel <jobid> to cancel a job, or a script to cancel them all at once (a minimal sketch follows)
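
Assuming the standard Torque/PBS utilities are available, a minimal cancel-all one-liner:

qselect -u $USER | xargs qdel    # collect all of your job IDs and feed them to qdel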

##Tiger/Della: tiger.princeton.edu and della.princeton.edu

Tiger has 16 cores/node and della has 20 cores/node. There are four partitions on tiger: all, nongpu, gpu and serial.

|             | all  | gpu  | nongpu | serial |
|-------------|------|------|--------|--------|
| priority    | 4000 | 5000 | 1      | 5000   |
| total nodes | 544  | 50   | 594    | 4      |

It seems that the gpu nodes all have very high priority (probably because the GPU resources are much scarcer than the CPU nodes).
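
To check the current partition layout yourself (standard Slurm commands; the output format string is just one choice):

sinfo -o "%P %a %D %l"     # partition, availability, node count, time limit
scontrol show partition    # full partition details, including priority
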
  • Home dir ~ (negligible storage space), /scratch/gpfs/ (temp work, short-lived branches etc), /tigress/ (storage area, everything else)
  • Both tiger and della use common storage space /tigress/ so probably best to work from there
  • Fermi data in /tigress/smsharma/public
  • Software in /tigress/smsharma/public/heptools (including NPTF code)
  • There is no group setup script, so run the following before using the tools:
cd /tigress/smsharma/public/heptools/MultiNest/
export LD_LIBRARY_PATH=$(pwd)/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
cd /tigress/smsharma/public/heptools
source venv/bin/activate
export PYTHONPATH=$PYTHONPATH:/tigress/smsharma/public/heptools/Fermi-NPTF-edep
module load openmpi
  • Typical env:
# env.sh
# Tigress installation

export work_dir=$(pwd)
export psf_dir=$'/tigress/smsharma/public/CTBCORE/psf_data'
export maps_dir=$'/tigress/smsharma/public/CTBCORE/'
export fermi_data_dir=$'/tigress/smsharma/public/FermiData/'
export PYTHONPATH=$PYTHONPATH:/tigress/smsharma/public/heptools/Fermi-NPTF-edep

###Submitting jobs

    [jiange@tiger1 ~]$ srun -N3 -l -t 10:0:0 --gres=gpu:2 /bin/hostname
    0: tiger-r12n12
    1: tiger-r12n13
    2: tiger-r12n14

The --gres argument is a list of name[[:type]:count], denoting consumable resources. The specified resources will be allocated to the job on each node.
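
A few variants of the name[[:type]:count] syntax (the GPU type string below is only illustrative; check what the cluster actually reports):

srun -N1 --gres=gpu:2 /bin/hostname        # two GPUs per node, any type
srun -N1 --gres=gpu:k20:1 /bin/hostname    # one GPU of a specific type (type name is a guess)
#SBATCH --gres=gpu:1                       # the same request inside a batch script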

Typical batch file:

#!/bin/bash
#SBATCH -N 1   # node count
#SBATCH --ntasks-per-node=12
#SBATCH -t 4:00:00
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=smsharma@princeton.edu

cd /tigress/smsharma/public/heptools/MultiNest/
export LD_LIBRARY_PATH=$(pwd)/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
cd /tigress/smsharma/public/heptools
source venv/bin/activate
export PYTHONPATH=$PYTHONPATH:/tigress/smsharma/public/heptools/Fermi-NPTF-edep
module load openmpi

cd /tigress/smsharma/public/heptools/test_code
source env_example.sh
./shell_scripts/run_nptf_edep_high_lat.sh
  • To submit a job: sbatch submit.batch
  • Standard output and error are written to the directory the job was submitted from.
  • Annoyingly, mpirun/mpiexec does not work properly on Della, so use srun in the run.sh files instead (see the sketch after this list).
  • Useful commands:
    • squeue [-u <username>] to view jobs
    • scancel -u <username> and scancel <jobid> to cancel jobs
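
A sketch of the mpirun-to-srun swap mentioned in the list above, for a hypothetical run.sh (the script and executable names are placeholders):

# inside run.sh -- instead of, e.g.:
#   mpirun -np 12 python run_nptf.py
# launch through Slurm, which picks up the task count from the batch allocation:
srun -n $SLURM_NTASKS python run_nptf.py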

##Miscellaneous

Most of this is just random stuff that I find useful while running on the cluster

###Editing locally

If you want to use Sublime Text, either use rsub or Sublime FTP

###tmux and screen

  • Install a local version of tmux using this script
  • tmux new-session -s <name> to start new session
  • tmux attach -t <name> to attach named session
  • Ctrl-B D to detach session
  • tmux attach to attach session
  • Ctrl-B $ to rename session

###Kill all processes started by you

kill -9 -1            # send SIGKILL to every process you own on this node
pkill -9 -u <user>    # by username instead (pkill, since kill itself has no -u option)

###Running an ipython notebook remotely

  1. On Feynman, type: ipython notebook --no-browser --port=7000
  2. On your computer, type: ssh -N -f -L localhost:7000:localhost:7000 <username>@feynman.princeton.edu
  3. On your web browser, go to: http://localhost:7000
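
If the backgrounded tunnel needs to be shut down later, one option is to match it by command line (the pattern is just the command from step 2):

pkill -f "ssh -N -f -L localhost:7000"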

###Python packages needed on fresh install

pip install numpy matplotlib scipy healpy pandas Cython mpmath pymultinest mpi4py numexpr
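
The same list can also be kept in a requirements file in the repo (the filename here is hypothetical), so a fresh virtualenv can be rebuilt in one step:

pip freeze > requirements.txt       # snapshot the current venv
pip install -r requirements.txt     # rebuild it elsewhere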

###To follow a job as it's running

tail -f ~/HL_Sim1_NPTF_0-15_np3.batch.o288395
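
The file name above is the Feynman/PBS-style output; on Tiger/Della, Slurm writes slurm-<jobid>.out in the submission directory by default, so the equivalent is:

tail -f slurm-288395.out    # the job ID here is just an example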