@vivek-bala
Last active September 5, 2018 15:31
> Ask for 1 node, use alloc_flags to set gpumps and the smt level
Cmd:
```
jd.executable = 'jsrun'
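# jsrun flags below (assumed meaning, consistent with the output): -n 1 resource set,
# -a 6 MPI tasks per resource set, -c 1 physical core and -g 1 GPU per resource set,
# -bpacked:4 packs each task onto 4 hardware threads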
jd.arguments = ['-n','1','-a','6','-c','1','-g','1','-bpacked:4','/ccs/proj/csc190/Hello_jsrun/jsrun_layout','|','sort','&>','/ccs/proj/csc190/Hello_jsrun/op.txt']
jd.total_cpu_count = 42
jd.alloc_flags = ['gpumps','smt4']
```
Output:
```
########################################################################
*** MPI Ranks: 6, OpenMP Threads: 4, GPUs per Resource Set: 1 ***
========================================================================
MPI Rank 000, OMP_thread 00 on HWThread 001 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 000, OMP_thread 01 on HWThread 003 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 000, OMP_thread 02 on HWThread 000 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 000, OMP_thread 03 on HWThread 002 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 001, OMP_thread 00 on HWThread 000 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 001, OMP_thread 01 on HWThread 001 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 001, OMP_thread 02 on HWThread 002 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 001, OMP_thread 03 on HWThread 002 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 00 on HWThread 001 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 01 on HWThread 000 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 02 on HWThread 002 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 03 on HWThread 002 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 003, OMP_thread 00 on HWThread 001 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 003, OMP_thread 01 on HWThread 000 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 003, OMP_thread 02 on HWThread 003 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 003, OMP_thread 03 on HWThread 002 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 004, OMP_thread 00 on HWThread 000 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 004, OMP_thread 01 on HWThread 002 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 004, OMP_thread 02 on HWThread 003 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 004, OMP_thread 03 on HWThread 002 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 005, OMP_thread 00 on HWThread 003 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 005, OMP_thread 01 on HWThread 001 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 005, OMP_thread 02 on HWThread 002 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 005, OMP_thread 03 on HWThread 001 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
```
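For reference, the jd.arguments list above is just the tokenised shell command line. Here is a minimal sketch (not part of the original gist) that reconstructs it by joining jd.executable and jd.arguments; since '|', 'sort' and '&>' appear as separate tokens, the command is presumably handed to a shell rather than exec'd directly:
```
jd_executable = 'jsrun'
jd_arguments  = ['-n', '1', '-a', '6', '-c', '1', '-g', '1', '-bpacked:4',
                 '/ccs/proj/csc190/Hello_jsrun/jsrun_layout',
                 '|', 'sort', '&>', '/ccs/proj/csc190/Hello_jsrun/op.txt']

# Joining the tokens gives the command line that produced the output above:
# jsrun -n 1 -a 6 -c 1 -g 1 -bpacked:4 /ccs/proj/csc190/Hello_jsrun/jsrun_layout | sort &> /ccs/proj/csc190/Hello_jsrun/op.txt
print(' '.join([jd_executable] + jd_arguments))
```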
> Ask for 2 nodes, use alloc_flags to set gpumps and the smt level
Cmd:
```
jd.executable = 'jsrun'
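# As above, but -n 2 resource sets with -r 1 resource set per host, so the
# 12 ranks are spread 6 per node (a13n10 and a13n11 in the output below)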
jd.arguments = ['-n','2','-r','1','-a','6','-c','1','-g','1','-bpacked:4','/ccs/proj/csc190/Hello_jsrun/jsrun_layout','|','sort','&>','/ccs/proj/csc190/Hello_jsrun/op.txt']
jd.total_cpu_count = 84
jd.alloc_flags = ['gpumps','smt4']
```
Output:
```
########################################################################
*** MPI Ranks: 12, OpenMP Threads: 4, GPUs per Resource Set: 1 ***
========================================================================
MPI Rank 000, OMP_thread 00 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 000, OMP_thread 01 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 000, OMP_thread 02 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 000, OMP_thread 03 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 001, OMP_thread 00 on HWThread 001 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 001, OMP_thread 01 on HWThread 001 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 001, OMP_thread 02 on HWThread 000 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 001, OMP_thread 03 on HWThread 003 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 00 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 01 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 02 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 03 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 003, OMP_thread 00 on HWThread 002 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 003, OMP_thread 01 on HWThread 002 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 003, OMP_thread 02 on HWThread 003 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 003, OMP_thread 03 on HWThread 000 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 004, OMP_thread 00 on HWThread 001 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 004, OMP_thread 01 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 004, OMP_thread 02 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 004, OMP_thread 03 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 005, OMP_thread 00 on HWThread 000 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 005, OMP_thread 01 on HWThread 000 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 005, OMP_thread 02 on HWThread 002 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 005, OMP_thread 03 on HWThread 003 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 006, OMP_thread 00 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 006, OMP_thread 01 on HWThread 001 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 006, OMP_thread 02 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 006, OMP_thread 03 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 007, OMP_thread 00 on HWThread 002 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 007, OMP_thread 01 on HWThread 001 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 007, OMP_thread 02 on HWThread 003 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 007, OMP_thread 03 on HWThread 000 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 008, OMP_thread 00 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 008, OMP_thread 01 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 008, OMP_thread 02 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 008, OMP_thread 03 on HWThread 001 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 009, OMP_thread 00 on HWThread 003 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 009, OMP_thread 01 on HWThread 001 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 009, OMP_thread 02 on HWThread 000 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 009, OMP_thread 03 on HWThread 000 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 010, OMP_thread 00 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 010, OMP_thread 01 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 010, OMP_thread 02 on HWThread 001 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 010, OMP_thread 03 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 011, OMP_thread 00 on HWThread 001 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 011, OMP_thread 01 on HWThread 001 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 011, OMP_thread 02 on HWThread 003 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
MPI Rank 011, OMP_thread 03 on HWThread 000 of Node a13n11 - RT_GPU_id 0 : GPU_id 0
```
> Ask for 1 node, do not use alloc_flags to set gpumps or the smt level. Without gpumps only one of the six MPI ranks acquires the GPU; the other five fail with CUDA errors.
Cmd:
```
jd.executable = 'jsrun'
jd.arguments = ['-n','1','-r','1','-a','6','-c','1','-g','1','-bpacked:4','/ccs/proj/csc190/Hello_jsrun/jsrun_layout','|','sort','&>','/ccs/proj/csc190/Hello_jsrun/op.txt']
jd.total_cpu_count = 42
#jd.alloc_flags = ['gpumps','smt4']
```
Output:
```
*** MPI Ranks: 6, OpenMP Threads: 4, GPUs per Resource Set: 1 ***
========================================================================
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
########################################################################
MPI Rank 000, OMP_thread 00 on HWThread 001 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 000, OMP_thread 01 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 000, OMP_thread 02 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 000, OMP_thread 03 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
```
> Ask for 2 nodes, do not use alloc_flags to set gpumps or the smt level. As before, only one MPI rank per node acquires the GPU (ranks 002 and 007 below); the remaining ten ranks fail with CUDA errors.
Cmd:
```
jd.executable = 'jsrun'
jd.arguments = ['-n','2','-r','1','-a','6','-c','1','-g','1','-bpacked:4','/ccs/proj/csc190/Hello_jsrun/jsrun_layout','|','sort','&>','/ccs/proj/csc190/Hello_jsrun/op.txt']
jd.total_cpu_count = 84
#jd.alloc_flags = ['gpumps','smt4']
```
Output:
```
*** MPI Ranks: 12, OpenMP Threads: 4, GPUs per Resource Set: 1 ***
========================================================================
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
########################################################################
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
CUDA Error - cudaDeviceGetPCIBusId: all CUDA-capable devices are busy or unavailable
MPI Rank 002, OMP_thread 00 on HWThread 003 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 01 on HWThread 001 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 02 on HWThread 000 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 03 on HWThread 002 of Node a01n01 - RT_GPU_id 0 : GPU_id 0
MPI Rank 007, OMP_thread 00 on HWThread 000 of Node a01n02 - RT_GPU_id 0 : GPU_id 0
MPI Rank 007, OMP_thread 01 on HWThread 001 of Node a01n02 - RT_GPU_id 0 : GPU_id 0
MPI Rank 007, OMP_thread 02 on HWThread 002 of Node a01n02 - RT_GPU_id 0 : GPU_id 0
MPI Rank 007, OMP_thread 03 on HWThread 003 of Node a01n02 - RT_GPU_id 0 : GPU_id 0
```
> Ask for 1 node, use alloc_flags to set gpumps but not the smt level
Note: the default smt level is 4, so I believe we can get smt level 2 implicitly by packing only two hardware threads per rank; note the -bpacked:2 argument (see the sketch after the output below).
Cmd:
```
jd.executable = 'jsrun'
jd.arguments = ['-n','1','-r','1','-a','6','-c','1','-g','1','-bpacked:2','/ccs/proj/csc190/Hello_jsrun/jsrun_layout','|','sort','&>','/ccs/proj/csc190/Hello_jsrun/op.txt']
jd.total_cpu_count = 42
jd.alloc_flags = ['gpumps']
```
Output:
```
########################################################################
*** MPI Ranks: 6, OpenMP Threads: 2, GPUs per Resource Set: 1 ***
========================================================================
MPI Rank 000, OMP_thread 00 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 000, OMP_thread 01 on HWThread 001 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 001, OMP_thread 00 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 001, OMP_thread 01 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 00 on HWThread 001 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 002, OMP_thread 01 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 003, OMP_thread 00 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 003, OMP_thread 01 on HWThread 000 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 004, OMP_thread 00 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 004, OMP_thread 01 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 005, OMP_thread 00 on HWThread 002 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
MPI Rank 005, OMP_thread 01 on HWThread 003 of Node a13n10 - RT_GPU_id 0 : GPU_id 0
```
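A small sanity check of the note above (a hypothetical sketch; the 4-threads-per-core figure is the default smt level mentioned in the note, and the per-rank thread counts come from the Hello_jsrun output):
```
# Assumption: the node stays at the default smt level of 4, i.e. 4 hardware
# threads per physical core, since no smt flag was passed via alloc_flags.
default_smt_level   = 4
hw_threads_per_core = default_smt_level

# '-bpacked:N' packs each rank onto N hardware threads, and the Hello_jsrun
# output reports that many OpenMP threads per rank (4 earlier, 2 in this run).
bpacked = 2
assert bpacked <= hw_threads_per_core   # only 2 of the 4 SMT4 threads per core are used
print('OpenMP threads per rank: %d (smt%d-like usage on an smt%d core)'
      % (bpacked, bpacked, default_smt_level))
```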