@Atlas7 /qsub-option-2.md Secret
Last active Aug 7, 2017

How to submit a job to Colfax HPC Cluster Nodes - Option 1 (via shell script)

Intel Colfax Cluster - Notes - Index Page


Say we have a directory structure like this:

|- /home/u4443/deepdive/lec-01
  |- hello-knl-cluster.sh

(Assume our current working directory is the one that contains the shell script.)

The shell script hello-knl-cluster.sh contains the job that we would like to run on the HPC cluster nodes. It looks like this:

#PBS -l nodes=4:knl

echo "Yo. This job is running on compute node "`hostname`
echo "Yo. This job has the following nodes reserved in the cluster:"
cat $PBS_NODEFILE
echo "Yo. This list comes from the file $PBS_NODEFILE"
echo "Yo. Should be a knl Knights Landing node. lscpu..."
lscpu
echo ""
echo "Yo. Job starts here..."
# navigate to directory where the script lives
cd
cd deepdive/lec-01
./hello
echo "Yo. Job ends here..."

# remember to have some space at bottom

A quick explanation of the script:

  • #PBS -l nodes=4:knl: requests 4 Knights Landing (knl) nodes for the job.
  • $PBS_NODEFILE holds the path to a file that lists the nodes reserved for the job. (See the Colfax doc for more info.)
  • We change to the directory where the script resides and run ./hello from there. This works because the remote HPC cluster node can see our user home file system.
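
The node list can also be inspected from inside the job script. The following is a minimal sketch, assuming a Torque/PBS environment that sets PBS_NODEFILE for the job; the helper name count_nodes is hypothetical and not part of the original script.

```shell
# Hypothetical helper: count the distinct hosts listed in a PBS node file.
# $1 is the path to the node file (inside a job this would be "$PBS_NODEFILE").
# sort -u removes duplicates, since a host may be listed more than once.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Only report when running inside a job, where the scheduler sets PBS_NODEFILE.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Nodes reserved: $(count_nodes "$PBS_NODEFILE")"
fi
```

Outside of a job the variable is unset, so the guard keeps the snippet from failing when the script is run directly on the login node.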

We submit the script to run on the HPC Nodes like this:

[u4443@c001 lec-01]$ qsub hello-knl-cluster.sh
21082.c001

qsub prints the job ID, which we need to look up the job's status and output files. (To check job status, run qstat.)
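
Because qsub writes the job ID to standard output, it can be captured in a variable instead of copied by hand. This is a rough sketch rather than a required step; the helper name job_number is hypothetical, and the commented-out lines assume a working PBS/Torque setup.

```shell
# Hypothetical helper: strip the server suffix from a job ID,
# e.g. "21082.c001" becomes "21082".
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Usage on the cluster (commented out because it needs a PBS/Torque server):
# job_id=$(qsub hello-knl-cluster.sh)   # capture the ID that qsub prints
# qstat "$job_id"                       # show the status of this job only
# job_number "$job_id"                  # numeric part of the job ID
```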

Once the job has finished, view its output with cat hello-knl-cluster.sh.o21082:

[u4443@c001 lec-01]$ cat hello-knl-cluster.sh.o21082

########################################################################
# Colfax Cluster - https://colfaxresearch.com/
#      Date:           Sat Aug  5 15:01:57 PDT 2017
#    Job ID:           21082.c001
#      User:           u4443
# Resources:           neednodes=4:knl,nodes=4:knl,walltime=24:00:00
########################################################################

Yo. This job is running on compute node c001-n029
Yo. This job has the following nodes reserved in the cluster:
c001-n029
c001-n030
c001-n031
c001-n032
Yo. This list comes from the file /var/spool/torque/aux//21082.c001
Yo. Should be a knl Knights Landing node. lscpu...
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                256
On-line CPU(s) list:   0-255
Thread(s) per core:    4
Core(s) per socket:    64
Socket(s):             1
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 87
Model name:            Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
Stepping:              1
CPU MHz:               1213.621
BogoMIPS:              2594.02
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
NUMA node0 CPU(s):     0-255
NUMA node1 CPU(s):

Yo. Job starts here...
Hello World!
Yo. Job ends here...

########################################################################
# Colfax Cluster
# End of output for job 21082.c001
# Date: Sat Aug  5 15:01:58 PDT 2017
########################################################################

[u4443@c001 lec-01]$

To check for errors, view the job's standard error file:

cat hello-knl-cluster.sh.e21082
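
When the error file is expected to be empty, a quick size test avoids opening it at all. A small sketch, with check_stderr as a hypothetical helper name:

```shell
# Hypothetical helper: report whether a job's stderr file has any content.
# $1 is the path to the error file, e.g. hello-knl-cluster.sh.e21082.
# [ -s FILE ] is true only when FILE exists and is not empty.
check_stderr() {
    if [ -s "$1" ]; then
        echo "error output found in $1"
    else
        echo "no error output found"
    fi
}

# Usage, assuming the file is in the current directory:
# check_stderr hello-knl-cluster.sh.e21082
```

Note that a missing file is reported the same way as an empty one, so this only makes sense after the job has finished.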

Note that if the script has this line at the top:

#PBS -N yoyoyo

then the output and error files are named yoyoyo.o21082 and yoyoyo.e21082 instead.
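
The naming pattern seen in these notes is the job name, then .o or .e, then the numeric part of the job ID. A minimal sketch that reproduces it, assuming the default naming is in effect; job_output_files is a hypothetical helper, not a PBS command.

```shell
# Hypothetical helper: print the stdout and stderr file names for a job.
# $1 is the job name (the script name, or the value given to "#PBS -N").
# $2 is the job ID as printed by qsub, e.g. "21082.c001".
job_output_files() {
    name=$1
    number=${2%%.*}    # keep only the part before the first dot
    printf '%s.o%s\n%s.e%s\n' "$name" "$number" "$name" "$number"
}

job_output_files hello-knl-cluster.sh 21082.c001
job_output_files yoyoyo 21082.c001
```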

Reference


Intel Colfax Cluster - Notes - Index Page
