Usually, we can SSH into the login node (c001-001), but not into the cluster nodes (the ones that run the jobs after we've submitted them).
In Deep Dive Lecture 1, Andrey showed a pretty cool (and potentially handy) trick that lets us "peek" into a cluster node and even SSH into it.
Submit a sleep job to a cluster node. For example:
[userxxx@c001] $ echo sleep 600 | qsub -l nodes=1:knl:flat
21083.c001
While the job is running on the remote node, we can obtain that node's name:
[userxxx@c001] $ qstat -f 21083 | grep exec_host
exec_host = c001-n029/0
... and SSH into it:
[userxxx@c001] $ ssh c001-n029
Notice that the prompt now shows the remote node's name (c001-n029) instead of the login node's.
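The steps so far (submit, look up the node, SSH) can be strung together with two small shell helpers. This is only a sketch and not something shown in the lecture: the function names `job_number` and `exec_host_of` are hypothetical, and the parsing assumes `qstat -f` prints an `exec_host = node/core` line like the one in the example above.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of qsub/qstat.

# Strip the server suffix from a job ID: 21083.c001 -> 21083
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Read `qstat -f` output on stdin and print the first node named in exec_host.
# Assumes a line of the form "exec_host = c001-n029/0".
exec_host_of() {
    awk -F' = ' '/exec_host/ { split($2, parts, "/"); print parts[1]; exit }'
}
```

With these defined, `jobid=$(echo sleep 600 | qsub -l nodes=1:knl:flat)` followed by `ssh "$(qstat -f "$(job_number "$jobid")" | exec_host_of)"` should land on the node, provided the job has already started. If the output has no `exec_host` line yet, `exec_host_of` prints nothing.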
Once we are on the remote node, we can run some queries, such as numactl -H.
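If only the output is needed, a query can also be passed to ssh as a remote command instead of being typed into an interactive session. The sketch below assumes the first line of `numactl -H` output has the usual form `available: N nodes (...)`; the function name `numa_node_count` is hypothetical.

```shell
# Hypothetical helper: read `numactl -H` output on stdin and print the
# number of NUMA nodes, taken from the "available: N nodes (...)" line.
numa_node_count() {
    awk '$1 == "available:" { print $2; exit }'
}
```

For example, `ssh c001-n029 numactl -H | numa_node_count` would run the query on the remote node and do the parsing back on the login node.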
When we are done, just log out (or hit Ctrl+D):
[userxxx@c001-n029 ~]$ logout
Notice that we are now back on the login node: the prompt shows c001 again.
- Colfax Deepdive Video Series: Session 01 - Intel Architecture and Modern Code. From around 84:50.