@kazuki
Last active March 29, 2018 11:16
Intel MLC results: EPYC 7601 and Xeon E5-2699A v4 (2-socket)
$ numactl -H
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
node 0 size: 64350 MB
node 0 free: 63414 MB
node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
node 1 size: 64508 MB
node 1 free: 62869 MB
node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
node 2 size: 64487 MB
node 2 free: 63231 MB
node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
node 3 size: 64508 MB
node 3 free: 63633 MB
node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103
node 4 size: 64508 MB
node 4 free: 64405 MB
node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111
node 5 size: 64508 MB
node 5 free: 64397 MB
node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119
node 6 size: 64508 MB
node 6 free: 64412 MB
node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127
node 7 size: 64506 MB
node 7 free: 64337 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 16 16 16 32 32 32 32
1: 16 10 16 16 32 32 32 32
2: 16 16 10 16 32 32 32 32
3: 16 16 16 10 32 32 32 32
4: 32 32 32 32 10 16 16 16
5: 32 32 32 32 16 10 16 16
6: 32 32 32 32 16 16 10 16
7: 32 32 32 32 16 16 16 10
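The distance matrix above falls into two groups of four nodes: distance 16 inside a group and 32 across groups. A minimal Python sketch of that grouping follows. It assumes that a distance of 16 or less means "same socket", which is read off this particular output rather than stated by numactl, and `parse_distances` and `group_by_distance` are hypothetical helper names.

```python
# Distance rows copied from the `numactl -H` output above.
DISTANCES = """\
0: 10 16 16 16 32 32 32 32
1: 16 10 16 16 32 32 32 32
2: 16 16 10 16 32 32 32 32
3: 16 16 16 10 32 32 32 32
4: 32 32 32 32 10 16 16 16
5: 32 32 32 32 16 10 16 16
6: 32 32 32 32 16 16 10 16
7: 32 32 32 32 16 16 16 10
"""


def parse_distances(text):
    """Return the distance matrix as a list of integer rows."""
    rows = []
    for line in text.strip().splitlines():
        _, values = line.split(":")
        rows.append([int(v) for v in values.split()])
    return rows


def group_by_distance(matrix, threshold=16):
    """Group nodes whose mutual distance is <= threshold.

    The threshold of 16 is an assumption taken from this output.
    """
    groups = []
    for node, row in enumerate(matrix):
        for group in groups:
            if all(row[other] <= threshold for other in group):
                group.append(node)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([node])
    return groups


matrix = parse_distances(DISTANCES)
groups = group_by_distance(matrix)
print(groups)
```

On this machine the two resulting groups correspond to nodes 0-3 and nodes 4-7.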
Intel(R) Memory Latency Checker - v3.5
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1 2 3 4 5 6 7
0 96.8 161.0 157.6 158.5 284.6 285.3 234.6 278.4
1 162.8 97.9 157.1 158.2 284.2 284.9 279.3 232.7
2 158.3 158.3 96.8 163.1 232.5 291.2 276.3 276.1
3 159.3 160.0 164.2 100.0 291.8 234.6 277.4 280.1
4 285.5 285.8 232.8 279.1 98.3 161.6 158.7 158.1
5 288.0 284.9 281.7 233.2 163.2 95.7 157.8 158.7
6 231.7 291.5 276.5 276.3 158.8 157.1 98.3 162.4
7 290.4 233.1 276.2 276.3 157.8 158.7 162.5 94.6
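In the idle-latency matrix above, every row has one cross-socket entry near 232 ns while the other three cross-socket entries sit around 276-292 ns, a difference the numactl distance value of 32 does not show. The sketch below picks out that nearest cross-socket node for each row and averages the local (diagonal) latencies. It assumes nodes 0-3 and 4-7 form the two sockets, as the distance matrix suggests; `NODES_PER_SOCKET` and `socket_of` are hypothetical names.

```python
# Idle latencies in ns, copied from the 8-node MLC matrix above.
LATENCY_NS = [
    [96.8, 161.0, 157.6, 158.5, 284.6, 285.3, 234.6, 278.4],
    [162.8, 97.9, 157.1, 158.2, 284.2, 284.9, 279.3, 232.7],
    [158.3, 158.3, 96.8, 163.1, 232.5, 291.2, 276.3, 276.1],
    [159.3, 160.0, 164.2, 100.0, 291.8, 234.6, 277.4, 280.1],
    [285.5, 285.8, 232.8, 279.1, 98.3, 161.6, 158.7, 158.1],
    [288.0, 284.9, 281.7, 233.2, 163.2, 95.7, 157.8, 158.7],
    [231.7, 291.5, 276.5, 276.3, 158.8, 157.1, 98.3, 162.4],
    [290.4, 233.1, 276.2, 276.3, 157.8, 158.7, 162.5, 94.6],
]

# Assumption: nodes 0-3 are one socket and nodes 4-7 the other.
NODES_PER_SOCKET = 4


def socket_of(node):
    """Map a NUMA node number to its assumed socket index."""
    return node // NODES_PER_SOCKET


# Mean of the diagonal entries (node accessing its own memory).
local = [LATENCY_NS[n][n] for n in range(len(LATENCY_NS))]
mean_local = sum(local) / len(local)

# For each row, the cross-socket column with the lowest latency.
nearest_remote = {}
for src, row in enumerate(LATENCY_NS):
    candidates = [dst for dst in range(len(row))
                  if socket_of(dst) != socket_of(src)]
    nearest_remote[src] = min(candidates, key=lambda dst: row[dst])

print(f"mean local latency: {mean_local:.1f} ns")
print("nearest cross-socket node per row:", nearest_remote)
```

The pairing comes out symmetric (0 with 6, 1 with 7, 2 with 4, 3 with 5), which is what the raw numbers show; the sketch makes no claim about the underlying link topology.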
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 237103.0
3:1 Reads-Writes : 230847.0
2:1 Reads-Writes : 228098.1
1:1 Reads-Writes : 220235.5
Stream-triad like: 230403.4
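MLC reports bandwidth in decimal megabytes per second (1 MB/sec = 1,000,000 Bytes/sec), so the figures differ from binary GiB/s values. A small conversion sketch using the "ALL Reads" figure above; the function names are hypothetical.

```python
BYTES_PER_MB = 1_000_000      # MLC's stated definition of 1 MB
BYTES_PER_GB = 1_000_000_000  # decimal gigabyte
BYTES_PER_GIB = 2 ** 30       # binary gibibyte


def mb_to_gb(mb_per_sec):
    """Convert MLC's decimal MB/sec to decimal GB/sec."""
    return mb_per_sec * BYTES_PER_MB / BYTES_PER_GB


def mb_to_gib(mb_per_sec):
    """Convert MLC's decimal MB/sec to binary GiB/sec."""
    return mb_per_sec * BYTES_PER_MB / BYTES_PER_GIB


all_reads = 237103.0  # "ALL Reads" from the 8-node run above
gb = mb_to_gb(all_reads)
gib = mb_to_gib(all_reads)
print(f"{gb:.1f} GB/s, {gib:.1f} GiB/s")
```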
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Numa node
Numa node 0 1 2 3 4 5 6 7
0 29822.1 15949.8 16609.4 16624.9 6823.9 6766.7 6737.7 6830.2
1 15947.2 29843.4 16612.6 16628.3 6759.4 6824.2 6810.4 6737.4
2 16466.4 16635.8 29836.8 15954.8 6623.2 6644.3 6705.6 6655.8
3 16729.1 16623.8 15932.2 29806.4 6693.9 6687.0 6656.8 6707.8
4 6643.2 6754.0 6682.3 6686.1 29765.4 15919.3 16591.5 16589.8
5 6648.5 6696.5 6686.7 6631.4 15910.3 29749.0 16585.4 16583.1
6 6730.5 6685.6 6687.3 6760.4 16580.1 16580.3 29757.9 15902.1
7 6818.5 6737.4 6756.2 6756.2 16700.6 16591.4 15904.2 29731.8
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 685.04 236717.7
00002 687.39 236726.6
00008 585.08 236831.7
00015 576.12 236806.0
00050 521.12 237088.6
00100 453.45 237450.7
00200 132.94 180840.5
00300 118.52 127634.0
00400 115.79 99413.8
00500 108.75 81134.4
00700 108.76 59480.2
01000 103.24 42566.6
01300 105.60 33252.4
01700 101.79 25729.2
02500 104.89 17837.1
03500 101.05 12963.3
05000 104.36 9294.2
09000 101.25 5466.8
20000 102.96 2802.1
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 27.8
Local Socket L2->L2 HITM latency 40.4
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
Reader Numa Node
Writer Numa Node 0 1 2 3 4 5 6 7
0 - 178.6 174.9 184.1 323.9 319.8 260.5 324.5
1 185.2 - 179.3 176.6 307.3 318.5 313.3 261.3
2 197.1 182.0 - 186.6 259.4 296.5 306.9 300.8
3 188.9 187.8 180.7 - 301.1 261.3 311.7 304.3
4 311.6 313.7 257.5 315.3 - 189.9 192.1 176.2
5 328.3 308.5 310.0 250.5 192.3 - 175.6 177.4
6 252.2 303.4 308.3 289.6 189.0 187.8 - 180.9
7 299.1 264.1 297.3 310.0 176.6 189.6 194.6 -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
Reader Numa Node
Writer Numa Node 0 1 2 3 4 5 6 7
0 - 176.7 173.5 174.2 286.0 291.5 234.0 284.5
1 179.0 - 172.3 171.5 282.2 283.3 286.6 226.7
2 173.5 175.5 - 175.7 227.9 265.3 272.9 268.4
3 176.8 170.7 177.7 - 270.6 228.0 267.9 272.4
4 285.8 278.8 228.3 296.3 - 174.0 172.8 174.5
5 285.1 281.9 288.0 233.4 179.7 - 174.8 175.4
6 233.6 267.7 263.9 272.4 172.2 174.8 - 175.8
7 270.8 232.7 263.9 269.4 173.8 174.7 177.7 -
Intel(R) Memory Latency Checker - v3.5
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1
0 91.4 143.8
1 143.1 89.9
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 92982.4
3:1 Reads-Writes : 87096.0
2:1 Reads-Writes : 85082.7
1:1 Reads-Writes : 77652.6
Stream-triad like: 79605.5
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Numa node
Numa node 0 1
0 50831.9 29864.2
1 29881.0 41283.8
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 355.06 92313.2
00002 354.89 92314.0
00008 352.08 92451.0
00015 349.02 92501.0
00050 336.29 92386.4
00100 329.89 91949.3
00200 307.02 90831.7
00300 231.75 88141.0
00400 139.41 76314.0
00500 120.42 63515.8
00700 108.69 46090.9
01000 101.96 32756.3
01300 98.79 25442.2
01700 95.82 19707.5
02500 93.22 13637.3
03500 91.59 9963.2
05000 90.48 7197.2
09000 89.68 4321.1
20000 89.21 2342.3
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 38.4
Local Socket L2->L2 HITM latency 42.7
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 104.3
1 104.4 -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
Reader Numa Node
Writer Numa Node 0 1
0 - 105.7
1 105.8 -