
@Atlas7
Last active August 7, 2017 12:29
Intel Colfax Cluster - How to visualize Knights Landing (knl) NUMA Nodes and High Bandwidth Memory modes



Borrowing slides from the Colfax Research How-to Deep Dive Series, the following diagram shows the bootable Knights Landing (KNL) processor memory organization:

[Figure: knl-numa-1.png, KNL processor memory organization]

And the following diagram shows the High Bandwidth Memory Modes (Flat / Cache / Hybrid):

[Figure: knl-numa-2.png, High Bandwidth Memory modes (Flat / Cache / Hybrid)]

At the time of writing, all the KNL nodes that I could "see" are configured in "Flat" mode.

Visualize Flat mode NUMA nodes

To visualize the flat-mode NUMA node configuration on one of our cluster nodes, we can submit numactl -H as a job:

[u4443@c001 lec-02]$ echo numactl -H | qsub -l nodes=1:knl:flat -N knl-flat
21101.c001

This runs numactl -H on one of our KNL nodes that has a flat-mode high bandwidth memory configuration, and writes the results to knl-flat.o21101 (standard output) and knl-flat.e21101 (standard error). The file names are built from the job name given with -N (knl-flat) and the job number returned by qsub (21101 in our case).
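If you submit this kind of job often, the file-naming pattern can be scripted. The sketch below assumes the <name>.o<number> / <name>.e<number> pattern observed above holds for every job; pbs_output_files is a hypothetical helper name, and the submit-and-wait part only runs where qsub exists:

```shell
#!/bin/bash
# Hypothetical helper: derive the stdout/stderr file names of a job from its
# name (-N) and the job ID printed by qsub, assuming the
# <name>.o<number> / <name>.e<number> pattern seen on the Colfax cluster.
pbs_output_files() {
  local name=$1 job_id=$2
  local num=${job_id%%.*}   # strip the server suffix: "21101.c001" -> "21101"
  echo "${name}.o${num} ${name}.e${num}"
}

# Submit and wait for the result, but only where qsub is actually available.
if command -v qsub >/dev/null 2>&1; then
  job_id=$(echo numactl -H | qsub -l nodes=1:knl:flat -N knl-flat) || exit 1
  read -r out_file err_file <<< "$(pbs_output_files knl-flat "$job_id")"
  # The output file appears once the job has finished; poll every 5 seconds.
  while [ ! -f "$out_file" ]; do sleep 5; done
  cat "$out_file"
  echo "(stderr, if any, is in $err_file)"
fi
```

For example, pbs_output_files knl-flat 21101.c001 prints knl-flat.o21101 knl-flat.e21101. The polling loop is deliberately simple; a job that never produces output would keep it waiting.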

Let's take a look at knl-flat.o21101

[u4443@c001 lec-02]$ cat knl-flat.o21101

########################################################################
# Colfax Cluster - https://colfaxresearch.com/
#      Date:           Mon Aug  7 04:39:35 PDT 2017
#    Job ID:           21101.c001
#      User:           u4443
# Resources:           neednodes=1:knl:flat,nodes=1:knl:flat,walltime=24:00:00
########################################################################

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
node 0 size: 98207 MB
node 0 free: 94842 MB
node 1 cpus:
node 1 size: 16384 MB
node 1 free: 15843 MB
node distances:
node   0   1
  0:  10  31
  1:  31  10

########################################################################
# Colfax Cluster
# End of output for job 21101.c001
# Date: Mon Aug  7 04:39:36 PDT 2017
########################################################################

This output tells us a few things:

  • We have NUMA node 0 and NUMA node 1
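  • NUMA node 0 can "see" 256 logical CPUs (64 cores × 4 hardware threads per core) and ~96 GB of DDR4 RAM (memory).
  • NUMA node 1 can "see" 0 CPUs and ~16 GB of MCDRAM (the on-package high bandwidth memory).
  • We would normally do the processing on NUMA node 0, where all the CPUs are. But we can also take advantage of the high bandwidth MCDRAM on NUMA node 1, for example by launching a program with numactl --membind=1 (allocate only from MCDRAM) or numactl --preferred=1 (prefer MCDRAM, fall back to DDR4 when it fills up).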
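Reading the numactl -H output by eye works, but the same summary can be extracted with awk. This is a minimal sketch (summarize_numa is a hypothetical helper name) that reads the output on stdin, so it works on a saved job file such as knl-flat.o21101 (the banner lines are ignored). Flagging a CPU-less node as MCDRAM is a heuristic that assumes a KNL node in flat mode:

```shell
#!/bin/bash
# Summarize `numactl -H` output: logical CPU count and memory per NUMA node.
# Reads from stdin; lines that do not start with "node N ..." are ignored.
summarize_numa() {
  awk '
    BEGIN { maxn = -1 }
    # "node 0 cpus: 0 1 2 ..." -> every field after the 3rd is a CPU id
    /^node [0-9]+ cpus:/ { cpus[$2] = NF - 3; if ($2 + 0 > maxn) maxn = $2 + 0 }
    # "node 0 size: 98207 MB" and "node 0 free: 94842 MB" (values in MB)
    /^node [0-9]+ size:/ { size[$2] = $4 }
    /^node [0-9]+ free:/ { avail[$2] = $4 }
    END {
      for (n = 0; n <= maxn; n++) {
        # Heuristic: on a flat-mode KNL node, a memory-only node is the MCDRAM.
        note = (cpus[n] == 0) ? "  <- no CPUs (likely MCDRAM in flat mode)" : ""
        printf "node %d: %d cpus, %.1f GiB total, %.1f GiB free%s\n",
               n, cpus[n], size[n] / 1024, avail[n] / 1024, note
      }
    }'
}

# Example: summarize the saved output of the job above, if it is present.
if [ -f knl-flat.o21101 ]; then
  summarize_numa < knl-flat.o21101
fi
```

Applied to knl-flat.o21101 above, this should print node 0 with 256 cpus and 95.9 GiB total, and node 1 with 0 cpus and 16.0 GiB total, flagged as the likely MCDRAM node.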

Visualize Cache and Hybrid mode NUMA nodes

At the time of writing, all the KNL nodes are configured in flat mode, so requesting a node in cache or hybrid mode fails with a qsub submit error:

Any KNL nodes in Cache Mode? (answer is no)

[u4443@c001 lec-02]$ echo numactl -H | qsub -l nodes=1:knl:cache -N knl-cache
qsub: submit error (Job exceeds queue resource limits MSG=cannot locate feasible nodes (nodes file is empty, all systems are busy, or no nodes have the requested feature))

Any KNL nodes in Hybrid Mode? (answer is no)

[u4443@c001 lec-02]$ echo numactl -H | qsub -l nodes=1:knl:hybrid -N knl-hybrid
qsub: submit error (Job exceeds queue resource limits MSG=cannot locate feasible nodes (nodes file is empty, all systems are busy, or no nodes have the requested feature))
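The two checks above can be folded into one loop over the memory modes. This is a sketch under two assumptions: the node features are named flat, cache and hybrid as in the commands above, and a failed submission means no node can serve that mode (qsub may also fail for other reasons, such as queue limits). probe_knl_modes is a hypothetical helper name, and every successful probe submits a real job:

```shell
#!/bin/bash
# Try to submit `numactl -H` to one KNL node per requested memory mode and
# report which submissions the scheduler accepted.
probe_knl_modes() {
  local mode job
  for mode in "$@"; do
    # On success qsub prints the job ID; its error message is discarded here.
    if job=$(echo numactl -H | qsub -l "nodes=1:knl:${mode}" -N "knl-${mode}" 2>/dev/null); then
      echo "${mode}: submitted ${job}"
    else
      echo "${mode}: submit failed"
    fi
  done
}

# Only probe when a real scheduler is available.
if command -v qsub >/dev/null 2>&1; then
  probe_knl_modes flat cache hybrid
fi
```

With the cluster configured as described here, we would expect one submitted job for flat and a failed submission for cache and hybrid.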

That's it!
