@smsharma
Last active August 12, 2019 18:40
Random notes on cluster-related stuff for Princeton clusters.

Useful tips and tricks for running on Princeton's computing clusters. (Adapted from https://gist.github.com/smsharma/c579c6563eed954f2283)

Feynman

feynman.princeton.edu

Feynman-hepheno has 3 nodes -- node022 with 16 cores and 128GB memory, and node040-node041 with 28 cores and 256GB memory.

  • Three areas -- home (~), group (/group/hepheno/) and group storage (/mnt/hepheno/)
  • Best to work from your user directory in /group/hepheno since ~ has very little storage space -- add cd /group/hepheno/ to your ~/.bashrc!
  • The contents of /group/hepheno/group-setup.sh are automatically sourced on cd into the group directory /group/hepheno/:
    • Activates a python virtual environment (see below)
    • Initiates MPI and sets a few environment variables
  • Different compiler versions (intel/gnu) as well as OpenMPI/Intel MPI versions can be activated using module load <module> and available modules seen using module avail.
  • Typical env:

Submitting jobs

  • SLURM scheduler. Typical batch file:
#!/bin/bash
#SBATCH -N 1   # node count
#SBATCH --ntasks-per-node=12
#SBATCH -t 4:00:00
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=smsharma@princeton.edu
#SBATCH -p hepheno

cd /hepheno/heptools/test_code
mpiexec.hydra -n 12 run.py
  • To submit job, sbatch submit.batch
  • Standard output and error in same directory where job is run from.
  • Useful commands:
    • squeue [-u <username>] to view jobs
    • scancel -u <username> and scancel <jobid> to cancel jobs
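In the batch file above the task count appears twice (--ntasks-per-node and mpiexec.hydra -n), and the two need to agree. One way to keep them in sync is to generate the batch file from a small shell function. This is only a sketch: make_batch is a hypothetical helper name, and the partition, directory and run command are copied from the example above rather than being anything the scheduler requires.

```shell
#!/bin/bash
# Hypothetical helper: write a SLURM batch file in which the task count is
# given once, so --ntasks-per-node and mpiexec.hydra -n cannot drift apart.
# Arguments: <ntasks> <walltime> <output file>
make_batch() {
    local ntasks="$1" walltime="$2" outfile="$3"
    cat > "$outfile" <<EOF
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=${ntasks}
#SBATCH -t ${walltime}
#SBATCH -p hepheno

cd /hepheno/heptools/test_code
mpiexec.hydra -n ${ntasks} run.py
EOF
}
```

For example, make_batch 12 4:00:00 submit.batch followed by sbatch submit.batch.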

Data/code:

  • The Fermi data is stored in /mnt/hepheno/CTBCORE/ and /mnt/hepheno/FermiData
  • /group/hepheno/heptools/ contains relevant programs (MultiNest, pythia etc) as well as the NPTF code (likely outdated)

Tiger/Della/Perseus

[tiger/della/perseus].princeton.edu

Tiger has 16 cores/node and della has 20 cores/node.

  • Home dir ~ (negligible storage space), /scratch/gpfs/ (temp work, short-lived branches etc), /tigress/ (storage area, everything else)
  • Both tiger and della use common storage space /tigress/ so probably best to work from there
  • Fermi data in /tigress/smsharma/public
  • Software in /tigress/smsharma/public/heptools (including NPTF code)
  • No group setup so need to run the following before using:
cd /tigress/smsharma/public/heptools/MultiNest/
export LD_LIBRARY_PATH=$(pwd)/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
cd /tigress/smsharma/public/heptools
source venv/bin/activate
export PYTHONPATH=$PYTHONPATH:/tigress/smsharma/public/heptools/Fermi-NPTF-edep
module load openmpi
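Since there is no group setup on these clusters, the commands above could be collected into one file and sourced at login. The following is a minimal sketch under a few assumptions: the file name ~/tigress-setup.sh is hypothetical, the HEPTOOLS variable is introduced only for readability, and the existence checks are additions so that the file can be sourced harmlessly on a machine where /tigress or the module command is missing.

```shell
# Sketch of a sourceable setup file (hypothetical name: ~/tigress-setup.sh).
HEPTOOLS=/tigress/smsharma/public/heptools

# Same exports as above, written with absolute paths instead of cd + $(pwd).
export LD_LIBRARY_PATH="$HEPTOOLS/MultiNest/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$HEPTOOLS/Fermi-NPTF-edep"

# Activate the virtual environment only if it exists on this machine.
if [ -f "$HEPTOOLS/venv/bin/activate" ]; then
    source "$HEPTOOLS/venv/bin/activate"
fi

# Load OpenMPI only where the module command is available.
if command -v module >/dev/null 2>&1; then
    module load openmpi || echo "openmpi module not available" >&2
fi
```

With that in place, a line such as source ~/tigress-setup.sh in ~/.bashrc would replace the manual steps.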

Miscellaneous

Most of this is just random stuff that I find useful while running on the cluster

Editing locally:

If you want to use Sublime Text, either use rsub or Sublime FTP

Instructions for rsub:

  • Put this script somewhere on the cluster and name the file rsub. chmod +x the file. Change the port setting to the desired one (here, port=52697)
  • Create an alias rsub pointing to that file
  • Install rsub locally in Sublime via the ST package manager. Open the package settings and set the port appropriately ("port": 52697).
  • Add the following lines in ~/.ssh/config, for each cluster being used:
Host feynman.princeton.edu
    RemoteForward 52697 127.0.0.1:52697
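When several clusters are in use, the stanza above has to be repeated once per host with the same port. A short loop can print the stanzas instead of typing them by hand. This is a sketch only; rsub_ssh_stanzas is a hypothetical helper name, and it prints to standard output so the result can be inspected before anything is appended to ~/.ssh/config.

```shell
# Hypothetical helper: print one RemoteForward stanza per cluster host.
# Arguments: <port> <host> [<host> ...]
rsub_ssh_stanzas() {
    local port="$1"
    shift
    local host
    for host in "$@"; do
        printf 'Host %s\n    RemoteForward %s 127.0.0.1:%s\n\n' "$host" "$port" "$port"
    done
}
```

For example, rsub_ssh_stanzas 52697 feynman.princeton.edu della.princeton.edu >> ~/.ssh/config, after checking the output once without the redirect.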

Staying logged in to a session

tmux allows multiple persistent terminal sessions

  • Install local version of tmux using this script
  • tmux new-session -s <name> to start new session
  • tmux attach -t <name> to attach named session
  • Ctrl-B D to detach session
  • tmux attach (with no -t) to attach the most recently used session
  • Ctrl-B $ to rename session
  • tmux kill-session -t <name> to kill session
  • tmux ls to view active sessions
  • set-window-option -g aggressive-resize to properly resize over different sessions
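The new-session and attach commands above can be combined so that one command either reattaches to a named session or creates it. A minimal sketch, assuming tmux is on your PATH; the function name tm and the default session name main are hypothetical choices, not part of tmux.

```shell
# Hypothetical helper: attach to a named tmux session, creating it if needed.
# Usage: tm [<name>]   (defaults to a session called "main")
tm() {
    local name="${1:-main}"
    if tmux has-session -t "$name" 2>/dev/null; then
        tmux attach -t "$name"
    else
        tmux new-session -s "$name"
    fi
}
```

Putting this in ~/.bashrc on the cluster means tm work does the right thing whether or not the session already exists.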

Here's a nice cheatsheet.

To get proper mouse scrolling in iTerm, edit ~/.tmux.conf (might not work):

set-option -g mouse on

Kill all processes started by you

kill -9 -1
pkill -u <user>

Running a Jupyter notebook remotely

For feynman:

  1. Go to the root folder from where you can navigate to your notebooks
  2. On Feynman, type: jupyter notebook --no-browser --port=7000. NB: Replace 7000 with something unique 7xxx.
  3. On your computer, type: ssh -N -f -L localhost:7000:localhost:7000 <username>@feynman.princeton.edu
  4. On your web browser, go to: http://localhost:7000
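The port has to be the same in steps 2, 3 and 4, and should differ between users. One possible way to pick a 7xxx port is to derive it from your numeric user ID on the cluster. This is a sketch, not a guarantee of uniqueness (two users whose IDs share the last three digits would collide); notebook_port and notebook_cmds are hypothetical helper names.

```shell
# Hypothetical helper: map a numeric user ID to a port in the range 7000-7999.
notebook_port() {
    echo $(( 7000 + $1 % 1000 ))
}

# Hypothetical helper: print the cluster-side and local commands for one port.
# Arguments: <port> <username>
notebook_cmds() {
    local port="$1" user="$2"
    echo "jupyter notebook --no-browser --port=${port}"
    echo "ssh -N -f -L localhost:${port}:localhost:${port} ${user}@feynman.princeton.edu"
}
```

Run notebook_port "$(id -u)" on Feynman (your local user ID is generally different), then reuse that number on your own machine.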

To follow a job as it's running

tail -f ~/HL_Sim1_NPTF_0-15_np3.batch.o288395
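The job ID in the file name changes with every submission, so it can be convenient to follow whichever output file was written most recently. A minimal sketch, assuming output files follow a naming pattern like the one above; latest_match is a hypothetical helper name, and parsing ls output like this is only safe for file names without newlines.

```shell
# Hypothetical helper: print the most recently modified file among the arguments.
latest_match() {
    # With no arguments ls would list the current directory, so bail out early.
    [ "$#" -gt 0 ] || return 1
    # ls -t sorts newest first; keep only the first entry.
    ls -t -- "$@" 2>/dev/null | head -n 1
}
```

For example: tail -f "$(latest_match ~/*.batch.o*)".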

Finding number of available processors (cores)

getconf _NPROCESSORS_ONLN
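Note that getconf reports the cores that are online on the machine, which inside a batch job may be more than the job was actually given. A possible refinement, assuming SLURM sets SLURM_CPUS_ON_NODE in the job environment (worth confirming with env | grep SLURM inside a job), is to prefer that value when it exists. available_cores is a hypothetical helper name.

```shell
# Hypothetical helper: prefer the scheduler's CPU count when inside a SLURM job,
# otherwise fall back to the number of online processors on this machine.
available_cores() {
    if [ -n "${SLURM_CPUS_ON_NODE:-}" ]; then
        echo "$SLURM_CPUS_ON_NODE"
    else
        getconf _NPROCESSORS_ONLN
    fi
}
```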

Set up aliases on local machine

So you don't have to type the ssh command each time when logging in, append these to your ~/.bashrc or ~/.bash_profile

# Aliases for ssh-ing into clusters
alias feynman="ssh -Y smsharma@feynman.princeton.edu"
alias della="ssh -Y smsharma@della.princeton.edu"
alias tiger="ssh -Y smsharma@tiger.princeton.edu"
alias perseus="ssh -Y smsharma@perseus.princeton.edu"
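The four aliases differ only in the host name, so they can also be generated in a loop, which leaves a single place to change the username. A small sketch; the variable name CLUSTER_USER is a hypothetical choice.

```shell
# Define one ssh alias per cluster; replace the value with your own username.
CLUSTER_USER=smsharma
for cluster in feynman della tiger perseus; do
    alias "$cluster=ssh -Y ${CLUSTER_USER}@${cluster}.princeton.edu"
done
unset cluster
```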

Set up ssh public key authentication

So you don't have to type a password each time when logging in. For Linux and Mac OS X (sorry, Windows users).

  • Create a private/public keypair on your local machine (if you already have a keypair, you can skip this step): ssh-keygen -t rsa. Enter a passphrase and specify the output file path as prompted.
  • Copy the public key to the cluster:
    • Linux: ssh-copy-id -i ~/.ssh/id_rsa.pub <username>@feynman.princeton.edu
    • OS X: cat ~/.ssh/id_rsa.pub | ssh <username>@feynman.princeton.edu "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"
  • Enter passwords as prompted, and you should be good to go!
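If the cluster still asks for a password after the steps above, a common cause is permissions: with its default settings, sshd tends to ignore keys when ~/.ssh or authorized_keys is writable by anyone other than you. The sketch below is meant to be run on the cluster. It appends a public key only if it is not already present and tightens the permissions. install_pubkey is a hypothetical helper name, and the optional second argument exists only so the function can be tried on a scratch directory first.

```shell
# Hypothetical helper: add a public key line to authorized_keys exactly once.
# Arguments: <public key line> [<ssh directory>, default ~/.ssh]
install_pubkey() {
    local pubkey="$1" sshdir="${2:-$HOME/.ssh}"
    mkdir -p "$sshdir" && chmod 700 "$sshdir"
    touch "$sshdir/authorized_keys" && chmod 600 "$sshdir/authorized_keys"
    # -x matches whole lines, -F treats the key as a fixed string (no regex).
    grep -qxF -- "$pubkey" "$sshdir/authorized_keys" ||
        printf '%s\n' "$pubkey" >> "$sshdir/authorized_keys"
}
```

Calling it twice with the same key leaves a single entry, so it is safe to rerun.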

Installing stuff on a new machine

Making a virtual environment with Anaconda

  • Download the appropriate version with wget from here.
  • Simply do e.g. bash Anaconda2-4.3.0-Linux-x86_64.sh. When prompted for the install location, make sure to choose somewhere nice and spacious rather than your home dir.
  • Conda is now installed and it should also have added conda to your PATH (otherwise do something like export PATH="/group/hepheno/heptools/anaconda3/bin:$PATH"). This installs the usual stuff like [num/sci/astro]py etc.
  • Create a new venv with conda create -n venv_py27 python=2.7 anaconda, where you can see the available Python versions with conda search "^python$".
  • Activate your beautiful venv with source activate venv_py27. You can now pip install into it and do whatever you please.

TODO: working with multiple Python versions in venvs for development. TODO: pipenv

Making a virtual environment (OLD: recommend using Conda as above)

curl -O https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.9.tar.gz # Check for latest version
tar xzf virtualenv-1.9.tar.gz
python virtualenv-1.9/virtualenv.py venv
source venv/bin/activate

Can use a different python executable with virtualenv -p /usr/bin/python2.6 <path/to/new/virtualenv/> (in case the default one is old, for example)

autoenv automatically sources virtual environments on login -- useful!

Installing PolSpice

  • Download and untar PolSpice
wget ftp://ftp.iap.fr/pub/from_users/hivon/PolSpice/PolSpice_v03-01-06.tar.gz
tar -zxvf PolSpice_v03-01-06.tar.gz
cd PolSpice[..]
  • Make sure $HEALPIX is set to your root Healpix directory and edit the Makefile in src. My header:
#
# Makefile to customize for your system
#
# system variable HEALPIX must be defined for compilation
#  and at run time
#
FC = gfortran                   # F90 compiler
FCFLAGS =  -fopenmp -lgomp              # compiler options
FITSLIB = /group/hepheno/heptools/cfitsio/              # cfitsio library location (directory)
SUFF   = F90            # Healpix f90 include and lib suffix
HPXINC = $(HEALPIX)/include$(SUFF)  # Healpix f90 include files location
HPXLIB = $(HEALPIX)/lib$(SUFF)      # Healpix f90 library location
#
INC = -I. -I$(HPXINC)
LIBS = -L$(HPXLIB) -lhealpix -L$(FITSLIB) -lcfitsio
#
  • Add /PolSpice/src to your $PATH so that spice is available
  • See example in CAPS.ipynb in the Rockstar directory. The other file you'll need is spicepy.py from the same directory

Installing MultiNest

cd /group/hepheno/smsharma/ #cd /group/hepheno/heptools/

git clone http://github.com/JohannesBuchner/MultiNest

source /opt/intel/impi/5.0.3.048/bin64/mpivars.sh
cd MultiNest/build
cmake -DCMAKE_Fortran_COMPILER=gfortran ..
make
make install #optional, probably don't need it / might not work

# Need to do this every time -- replace $(pwd) with install dir and add to group-setup.sh or ~/.bashrc.
export LD_LIBRARY_PATH=$(pwd)/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
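The ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} part of that export adds the colon separator only when the variable already has a value. A plain $(pwd)/lib:$LD_LIBRARY_PATH would leave a trailing colon when the variable is empty, and an empty entry in the search path is treated as the current directory. The sketch below shows the same expansion in isolation; prepend_path and the /opt/mn/lib directory are hypothetical, used only for illustration.

```shell
# Hypothetical helper: prepend a directory to a colon-separated path value.
# Arguments: <directory> <current value, possibly empty>
prepend_path() {
    # ${2:+:$2} expands to ":<value>" if $2 is non-empty, and to nothing otherwise.
    echo "$1${2:+:$2}"
}

prepend_path /opt/mn/lib ""          # prints /opt/mn/lib
prepend_path /opt/mn/lib /usr/lib    # prints /opt/mn/lib:/usr/lib
```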

Python packages needed on fresh install

pip install numpy matplotlib scipy healpy pandas Cython mpmath pymultinest mpi4py numexpr astropy ipython jupyter ipyparallel joblib tqdm iminuit 
pip install scikit-learn keras tensorflow

If astropy and numpy happen to not play nice, pinning the following versions has worked:

pip install numpy==1.9.1
pip install astropy==1.0.3

If getting errors about .so files not being found (libraries)

Try installing OpenBLAS

git clone https://github.com/xianyi/OpenBLAS
cd OpenBLAS
make FC=gfortran

To get tqdm/tqdm_notebook working in JupyterLab

  • Install nodejs:
conda install -c conda-forge nodejs 
  • Install the JupyterLab extension with widgets as in here.

Using git

  • To clone repository: git clone <remote link>
  • To pull updates from remote: git pull
  • To add all local changes (from the folder you're in): git add -A .
  • To remove files from local: git rm <files>
  • To commit changes: git commit -m "<Descriptive name of commit>"
  • To push changes to remote: git push
