@nepeat
Last active February 11, 2019 14:53
########################################################################
This is the DARPA/DOE HPC Challenge Benchmark version 1.5.0 October 2012
Produced by Jack Dongarra and Piotr Luszczek
Innovative Computing Laboratory
University of Tennessee Knoxville and Oak Ridge National Laboratory
See the source files for authors of specific codes.
Compiled on Apr 15 2018 at 23:42:37
Current time (1549892996) is Mon Feb 11 05:49:56 2019
Hostname: 'cluster-node-b8ca3a5d8eb4'
########################################################################
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 1000
NB : 80
PMAP : Row-major process mapping
P : 2
Q : 2
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
Begin of MPIRandomAccess section.
Running on 144 processors
Total Main table size = 2^25 = 33554432 words
PE Main table size = (2^25)/144 = 233017 words/PE MAX
Default number of updates (RECOMMENDED) = 134217728
Number of updates EXECUTED = 63363888 (for a TIME BOUND of 60.00 secs)
CPU time used = 0.876077 seconds
Real time used = 12.666014 seconds
0.005002670 Billion(10^9) Updates per second [GUP/s]
0.000034741 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 0.331181 seconds
Verification: Real time used = 5.330198 seconds
Found 0 errors in 33554432 locations (passed).
Current time (1549893015) is Mon Feb 11 05:50:15 2019
End of MPIRandomAccess section.
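
Note: the GUP/s figures in the section above appear to follow directly from the reported update count and real (wall-clock) time. The sketch below is a cross-check under that assumption, not the benchmark's own code; the variable and function names are illustrative only.

```python
# Cross-check of the MPIRandomAccess rates, using values copied from the log above.
# Assumption: GUP/s = executed updates / real time / 1e9, and the per-PE figure
# is the total divided by the number of processes.
updates_executed = 63_363_888   # "Number of updates EXECUTED"
real_time_s = 12.666014         # "Real time used"
processes = 144                 # "Running on 144 processors"


def gups(updates: int, seconds: float) -> float:
    """Giga-updates per second for a given update count and elapsed time."""
    return updates / seconds / 1e9


total_gups = gups(updates_executed, real_time_s)
per_pe_gups = total_gups / processes

print(f"total:  {total_gups:.9f} GUP/s")
print(f"per PE: {per_pe_gups:.9f} GUP/s")
```

Both results should agree with the logged 0.005002670 and 0.000034741 GUP/s to within rounding.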
Begin of StarRandomAccess section.
Main table size = 2^17 = 131072 words
Number of updates = 524288
CPU time used = 0.003990 seconds
Real time used = 0.004061 seconds
0.129104067 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 131072 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.094468
Average GUP/s 0.123951
Maximum GUP/s 0.140644
Current time (1549893015) is Mon Feb 11 05:50:15 2019
End of StarRandomAccess section.
Begin of SingleRandomAccess section.
Node(s) with error 0
Node selected 93
Single GUP/s 0.164553
Current time (1549893015) is Mon Feb 11 05:50:15 2019
End of SingleRandomAccess section.
Begin of MPIRandomAccess_LCG section.
Running on 144 processors
Total Main table size = 2^25 = 33554432 words
PE Main table size = (2^25)/144 = 233017 words/PE MAX
Default number of updates (RECOMMENDED) = 134217728
Number of updates EXECUTED = 134217728 (for a TIME BOUND of 60.00 secs)
CPU time used = 1.953834 seconds
Real time used = 26.733578 seconds
0.005020567 Billion(10^9) Updates per second [GUP/s]
0.000034865 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 0.171181 seconds
Verification: Real time used = 1.843976 seconds
Found 0 errors in 33554432 locations (passed).
Current time (1549893044) is Mon Feb 11 05:50:44 2019
End of MPIRandomAccess_LCG section.
Begin of StarRandomAccess_LCG section.
Main table size = 2^17 = 131072 words
Number of updates = 524288
CPU time used = 0.003994 seconds
Real time used = 0.004164 seconds
0.125911002 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 131072 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.086703
Average GUP/s 0.124973
Maximum GUP/s 0.150594
Current time (1549893044) is Mon Feb 11 05:50:44 2019
End of StarRandomAccess_LCG section.
Begin of SingleRandomAccess_LCG section.
Node(s) with error 0
Node selected 6
Single GUP/s 0.196226
Current time (1549893044) is Mon Feb 11 05:50:44 2019
End of SingleRandomAccess_LCG section.
Begin of PTRANS section.
M: 500
N: 500
MB: 80
NB: 80
P: 2
Q: 2
TIME M N MB NB P Q TIME CHECK GB/s RESID
---- ----- ----- --- --- --- --- -------- ------ -------- -----
WALL 500 500 80 80 2 2 0.01 PASSED 0.187 0.00
CPU 500 500 80 80 2 2 0.00 PASSED 20.202 0.00
WALL 500 500 80 80 2 2 0.01 PASSED 0.187 0.00
CPU 500 500 80 80 2 2 0.00 PASSED 37.037 0.00
WALL 500 500 80 80 2 2 0.01 PASSED 0.187 0.00
CPU 500 500 80 80 2 2 0.00 PASSED 0.496 0.00
WALL 500 500 80 80 2 2 0.01 PASSED 0.187 0.00
CPU 500 500 80 80 2 2 0.00 PASSED 200.000 0.00
WALL 500 500 80 80 2 2 0.01 PASSED 0.187 0.00
CPU 500 500 80 80 2 2 0.00 PASSED 200.000 0.00
Finished 5 tests, with the following results:
5 tests completed and passed residual checks.
0 tests completed and failed residual checks.
0 tests skipped because of illegal input values.
END OF TESTS.
Current time (1549893044) is Mon Feb 11 05:50:44 2019
End of PTRANS section.
Begin of StarDGEMM section.
Scaled residual: 0.0311043
Node(s) with error 0
Minimum Gflop/s 3.081478
Average Gflop/s 3.553794
Maximum Gflop/s 4.168029
Current time (1549893044) is Mon Feb 11 05:50:44 2019
End of StarDGEMM section.
Begin of SingleDGEMM section.
Node(s) with error 0
Node selected 6
Single DGEMM Gflop/s 5.228502
Current time (1549893044) is Mon Feb 11 05:50:44 2019
End of SingleDGEMM section.
Begin of StarSTREAM section.
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 83333, Offset = 0
Total memory required = 0.0019 GiB.
Each test is run 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
The SCALAR value used for this run is 0.420000
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 90 microseconds.
(= 90 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
VERBOSE: total setup time for rank 0 = 0.003450 seconds
-------------------------------------------------------------
Function Rate (GB/s) Avg time Min time Max time
Copy: 3.3657 0.0028 0.0004 0.0103
Scale: 2.9268 0.0027 0.0005 0.0136
Add: 1.2950 0.0039 0.0015 0.0106
Triad: 0.9305 0.0031 0.0021 0.0037
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
Node(s) with error 0
Minimum Copy GB/s 3.365664
Average Copy GB/s 3.365664
Maximum Copy GB/s 3.365664
Minimum Scale GB/s 2.926757
Average Scale GB/s 2.926757
Maximum Scale GB/s 2.926757
Minimum Add GB/s 1.295010
Average Add GB/s 1.295010
Maximum Add GB/s 1.295010
Minimum Triad GB/s 0.930509
Average Triad GB/s 0.930509
Maximum Triad GB/s 0.930509
Current time (1549893044) is Mon Feb 11 05:50:44 2019
End of StarSTREAM section.
Begin of SingleSTREAM section.
Node(s) with error 0
Node selected 6
Single STREAM Copy GB/s 13.774334
Single STREAM Scale GB/s 12.127558
Single STREAM Add GB/s 15.341597
Single STREAM Triad GB/s 13.797711
Current time (1549893044) is Mon Feb 11 05:50:44 2019
End of SingleSTREAM section.
Begin of MPIFFT section.
Number of nodes: 128
Vector size: 2097152
Generation time: 0.001
Tuning: 0.001
Computing: 0.176
Inverse FFT: 0.179
max(|x-x0|): 1.616e-15
Gflop/s: 1.253
Current time (1549893045) is Mon Feb 11 05:50:45 2019
End of MPIFFT section.
Begin of StarFFT section.
Vector size: 32768
Generation time: 0.002
Tuning: 0.000
Computing: 0.003
Inverse FFT: 0.002
max(|x-x0|): 1.171e-15
Node(s) with error 0
Minimum Gflop/s 0.855966
Average Gflop/s 1.101746
Maximum Gflop/s 1.344019
Current time (1549893045) is Mon Feb 11 05:50:45 2019
End of StarFFT section.
Begin of SingleFFT section.
Node(s) with error 0
Node selected 57
Single FFT Gflop/s 1.583789
Current time (1549893045) is Mon Feb 11 05:50:45 2019
End of SingleFFT section.
Begin of LatencyBandwidth section.
------------------------------------------------------------------
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany
Details - level 2
-----------------
MPI_Wtime granularity.
Max. MPI_Wtick is 0.000000 sec
wtick is set to 0.000001 sec
Message Length: 8
Latency min / avg / max: 0.070444 / 0.070444 / 0.070444 msecs
Bandwidth min / avg / max: 0.114 / 0.114 / 0.114 MByte/s
MPI_Wtime granularity is ok.
message size: 8
max time : 10.000000 secs
latency for msg: 0.070444 msecs
estimation for ping pong: 6.339932 msecs
max number of ping pong pairs = 1577
max client pings = max server pongs = 39
stride for latency = 5
Message Length: 8
Latency min / avg / max: 0.027379 / 0.060138 / 0.081365 msecs
Bandwidth min / avg / max: 0.098 / 0.146 / 0.292 MByte/s
Message Length: 2000000
Latency min / avg / max: 17.293045 / 17.293045 / 17.293045 msecs
Bandwidth min / avg / max: 115.653 / 115.653 / 115.653 MByte/s
MPI_Wtime granularity is ok.
message size: 2000000
max time : 30.000000 secs
latency for msg: 17.293045 msecs
estimation for ping pong: 138.344356 msecs
max number of ping pong pairs = 216
max client pings = max server pongs = 14
stride for latency = 11
Message Length: 2000000
Latency min / avg / max: 0.310803 / 15.433052 / 17.406084 msecs
Bandwidth min / avg / max: 114.902 / 741.344 / 6434.934 MByte/s
Message Size: 8 Byte
Natural Order Latency: 0.050991 msec
Natural Order Bandwidth: 0.156889 MB/s
Avg Random Order Latency: 0.090596 msec
Avg Random Order Bandwidth: 0.088304 MB/s
Message Size: 2000000 Byte
Natural Order Latency: 21.680526 msec
Natural Order Bandwidth: 92.248685 MB/s
Avg Random Order Latency: 467.755682 msec
Avg Random Order Bandwidth: 4.275736 MB/s
Execution time (wall clock) = 197.447 sec on 144 processes
- for cross ping_pong latency = 4.813 sec
- for cross ping_pong bandwidth = 22.733 sec
- for ring latency = 1.177 sec
- for ring bandwidth = 168.723 sec
------------------------------------------------------------------
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany
Major Benchmark results:
------------------------
Max Ping Pong Latency: 0.081365 msecs
Randomly Ordered Ring Latency: 0.090596 msecs
Min Ping Pong Bandwidth: 114.902356 MB/s
Naturally Ordered Ring Bandwidth: 92.248685 MB/s
Randomly Ordered Ring Bandwidth: 4.275736 MB/s
------------------------------------------------------------------
Detailed benchmark results:
Ping Pong:
Latency min / avg / max: 0.027379 / 0.060138 / 0.081365 msecs
Bandwidth min / avg / max: 114.902 / 741.344 / 6434.934 MByte/s
Ring:
On naturally ordered ring: latency= 0.050991 msec, bandwidth= 92.248685 MB/s
On randomly ordered ring: latency= 0.090596 msec, bandwidth= 4.275736 MB/s
------------------------------------------------------------------
Benchmark conditions:
The latency measurements were done with 8 bytes
The bandwidth measurements were done with 2000000 bytes
The ring communication was done in both directions on 144 processes
The Ping Pong measurements were done on
- 841 pairs of processes for latency benchmarking, and
- 182 pairs of processes for bandwidth benchmarking,
out of 144*(144-1) = 20592 possible combinations on 144 processes.
(1 MB/s = 10**6 byte/sec)
------------------------------------------------------------------
Current time (1549893242) is Mon Feb 11 05:54:02 2019
End of LatencyBandwidth section.
Begin of HPL section.
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 1000
NB : 80
PMAP : Row-major process mapping
P : 2
Q : 2
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11C2R4 1000 80 2 2 0.06 1.059e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0058297 ...... PASSED
================================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
Current time (1549893242) is Mon Feb 11 05:54:02 2019
End of HPL section.
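
Note: the HPL result above can be re-derived from the norms that this run reports in its Summary section (HPL_RnormI, HPL_AnormI, HPL_XnormI, HPL_BnormI, HPL_eps, HPL_time). The residual formula is the one printed in the log. The operation count 2/3 N^3 + 3/2 N^2 is an assumption: it is the conventional count for an LU-based solve and is not stated in this output. Function names are illustrative.

```python
# Re-derive the HPL scaled residual and Gflops for the N=1000 run.
# All inputs are copied from this run's HPL and Summary sections.
n = 1000                  # HPL_N
hpl_time_s = 0.0631221    # HPL_time
eps = 1.11022e-16         # HPL_eps
r_norm_inf = 1.9309e-12   # HPL_RnormI, i.e. ||Ax-b||_oo
a_norm_inf = 262.773      # HPL_AnormI
x_norm_inf = 11.3513      # HPL_XnormI
b_norm_inf = 0.499776     # HPL_BnormI


def scaled_residual(r, a, x, b, eps, n):
    """||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N), as printed in the log."""
    return r / (eps * (a * x + b) * n)


def lu_solve_flops(n):
    """Assumed operation count for the solve: 2/3 N^3 + 3/2 N^2."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


residual = scaled_residual(r_norm_inf, a_norm_inf, x_norm_inf, b_norm_inf, eps, n)
gflops = lu_solve_flops(n) / hpl_time_s / 1e9

print(f"scaled residual: {residual:.7f} (passes if < 16.0)")
print(f"rate: {gflops:.3f} Gflop/s")
```

Under these assumptions the residual lands close to the logged 0.0058297 and the rate close to the logged 1.059e+01 Gflops.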
Begin of Summary section.
VersionMajor=1
VersionMinor=5
VersionMicro=0
VersionRelease=f
LANG=C
Success=1
sizeof_char=1
sizeof_short=2
sizeof_int=4
sizeof_long=8
sizeof_void_ptr=8
sizeof_size_t=8
sizeof_float=4
sizeof_double=8
sizeof_s64Int=8
sizeof_u64Int=8
sizeof_struct_double_double=16
CommWorldProcs=144
MPI_Wtick=1.000000e-09
HPL_Tflops=0.0105853
HPL_time=0.0631221
HPL_eps=1.11022e-16
HPL_RnormI=1.9309e-12
HPL_Anorm1=263.865
HPL_AnormI=262.773
HPL_Xnorm1=2619.63
HPL_XnormI=11.3513
HPL_BnormI=0.499776
HPL_N=1000
HPL_NB=80
HPL_nprow=2
HPL_npcol=2
HPL_depth=1
HPL_nbdiv=2
HPL_nbmin=4
HPL_cpfact=R
HPL_crfact=C
HPL_ctop=1
HPL_order=R
HPL_dMACH_EPS=1.110223e-16
HPL_dMACH_SFMIN=2.225074e-308
HPL_dMACH_BASE=2.000000e+00
HPL_dMACH_PREC=2.220446e-16
HPL_dMACH_MLEN=5.300000e+01
HPL_dMACH_RND=1.000000e+00
HPL_dMACH_EMIN=-1.021000e+03
HPL_dMACH_RMIN=2.225074e-308
HPL_dMACH_EMAX=1.024000e+03
HPL_dMACH_RMAX=1.797693e+308
HPL_sMACH_EPS=5.960464e-08
HPL_sMACH_SFMIN=1.175494e-38
HPL_sMACH_BASE=2.000000e+00
HPL_sMACH_PREC=1.192093e-07
HPL_sMACH_MLEN=2.400000e+01
HPL_sMACH_RND=1.000000e+00
HPL_sMACH_EMIN=-1.250000e+02
HPL_sMACH_RMIN=1.175494e-38
HPL_sMACH_EMAX=1.280000e+02
HPL_sMACH_RMAX=3.402823e+38
dweps=1.110223e-16
sweps=5.960464e-08
HPLMaxProcs=4
HPLMinProcs=4
DGEMM_N=288
StarDGEMM_Gflops=3.55379
SingleDGEMM_Gflops=5.2285
PTRANS_GBs=0.186679
PTRANS_time=0.00822854
PTRANS_residual=1
PTRANS_n=500
PTRANS_nb=80
PTRANS_nprow=2
PTRANS_npcol=2
MPIRandomAccess_LCG_N=33554432
MPIRandomAccess_LCG_time=26.7336
MPIRandomAccess_LCG_CheckTime=1.84398
MPIRandomAccess_LCG_Errors=0
MPIRandomAccess_LCG_ErrorsFraction=0
MPIRandomAccess_LCG_ExeUpdates=134217728
MPIRandomAccess_LCG_GUPs=0.00502057
MPIRandomAccess_LCG_TimeBound=60
MPIRandomAccess_LCG_Algorithm=0
MPIRandomAccess_N=33554432
MPIRandomAccess_time=12.666
MPIRandomAccess_CheckTime=5.3302
MPIRandomAccess_Errors=0
MPIRandomAccess_ErrorsFraction=0
MPIRandomAccess_ExeUpdates=63363888
MPIRandomAccess_GUPs=0.00500267
MPIRandomAccess_TimeBound=60
MPIRandomAccess_Algorithm=0
RandomAccess_LCG_N=131072
StarRandomAccess_LCG_GUPs=0.124973
SingleRandomAccess_LCG_GUPs=0.196226
RandomAccess_N=131072
StarRandomAccess_GUPs=0.123951
SingleRandomAccess_GUPs=0.164553
STREAM_VectorSize=83333
STREAM_Threads=1
StarSTREAM_Copy=3.36566
StarSTREAM_Scale=2.92676
StarSTREAM_Add=1.29501
StarSTREAM_Triad=0.930509
SingleSTREAM_Copy=13.7743
SingleSTREAM_Scale=12.1276
SingleSTREAM_Add=15.3416
SingleSTREAM_Triad=13.7977
FFT_N=32768
StarFFT_Gflops=1.10175
SingleFFT_Gflops=1.58379
MPIFFT_N=2097152
MPIFFT_Gflops=1.25313
MPIFFT_maxErr=1.61598e-15
MPIFFT_Procs=128
MaxPingPongLatency_usec=81.3647
RandomlyOrderedRingLatency_usec=90.596
MinPingPongBandwidth_GBytes=0.114902
NaturallyOrderedRingBandwidth_GBytes=0.0922487
RandomlyOrderedRingBandwidth_GBytes=0.00427574
MinPingPongLatency_usec=27.3792
AvgPingPongLatency_usec=60.1384
MaxPingPongBandwidth_GBytes=6.43493
AvgPingPongBandwidth_GBytes=0.741344
NaturallyOrderedRingLatency_usec=50.9914
FFTEnblk=16
FFTEnp=8
FFTEl2size=1048576
M_OPENMP=-1
omp_get_num_threads=0
omp_get_max_threads=0
omp_get_num_procs=0
MemProc=-1
MemSpec=-1
MemVal=-1
MPIFFT_time0=5.69e-07
MPIFFT_time1=0.0504214
MPIFFT_time2=0.00119853
MPIFFT_time3=0.0650489
MPIFFT_time4=0.00204202
MPIFFT_time5=0.0569445
MPIFFT_time6=2.26e-07
CPS_HPCC_FFT_235=0
CPS_HPCC_FFTW_ESTIMATE=0
CPS_HPCC_MEMALLCTR=0
CPS_HPL_USE_GETPROCESSTIMES=0
CPS_RA_SANDIA_NOPT=0
CPS_RA_SANDIA_OPT2=0
CPS_USING_FFTW=0
End of Summary section.
########################################################################
End of HPC Challenge tests.
Current time (1549893242) is Mon Feb 11 05:54:02 2019
########################################################################
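
Note: the Summary section of each run is a flat list of key=value lines, which makes it the easiest part of this output to process by script. The following is a minimal sketch of one way to read it, assuming the section is always delimited by the "Begin of Summary section." and "End of Summary section." lines seen above. The function name `parse_summary` and the short sample string are illustrative; the sample lines are copied from the first run.

```python
def parse_summary(text):
    """Collect key=value lines found between the Summary section markers.

    Values that parse as numbers are stored as float; anything else
    (for example VersionRelease=f) is kept as a string.
    """
    values = {}
    inside = False
    for line in text.splitlines():
        line = line.strip()
        if line == "Begin of Summary section.":
            inside = True
        elif line == "End of Summary section.":
            inside = False
        elif inside and "=" in line:
            key, _, raw = line.partition("=")
            try:
                values[key] = float(raw)
            except ValueError:
                values[key] = raw
    return values


# A few lines copied from the first run's Summary section.
sample = """Begin of Summary section.
VersionRelease=f
HPL_Tflops=0.0105853
HPL_N=1000
StarSTREAM_Triad=0.930509
End of Summary section.
"""

summary = parse_summary(sample)
print(summary["HPL_Tflops"] * 1000, "Gflop/s")
```

If a file holds several runs back to back, as this one does, later Summary sections would overwrite earlier keys in this sketch; splitting the file on the banner lines first would keep the runs separate.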
########################################################################
This is the DARPA/DOE HPC Challenge Benchmark version 1.5.0 October 2012
Produced by Jack Dongarra and Piotr Luszczek
Innovative Computing Laboratory
University of Tennessee Knoxville and Oak Ridge National Laboratory
See the source files for authors of specific codes.
Compiled on Apr 15 2018 at 23:42:37
Current time (1549894268) is Mon Feb 11 06:11:08 2019
Hostname: 'cluster-node-b8ca3a5d8eb4'
########################################################################
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 150816
NB : 90
PMAP : Row-major process mapping
P : 6
Q : 24
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
Begin of MPIRandomAccess section.
Running on 144 processors
Total Main table size = 2^34 = 17179869184 words
PE Main table size = (2^34)/144 = 119304648 words/PE MAX
Default number of updates (RECOMMENDED) = 68719476736
Number of updates EXECUTED = 295034400 (for a TIME BOUND of 60.00 secs)
CPU time used = 4.449711 seconds
Real time used = 60.040803 seconds
0.004913898 Billion(10^9) Updates per second [GUP/s]
0.000034124 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 2.232378 seconds
Verification: Real time used = 16.678935 seconds
Found 0 errors in 17179869184 locations (passed).
Current time (1549894490) is Mon Feb 11 06:14:50 2019
End of MPIRandomAccess section.
Begin of StarRandomAccess section.
Main table size = 2^27 = 134217728 words
Number of updates = 536870912
CPU time used = 91.685604 seconds
Real time used = 92.752443 seconds
0.005788213 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 134217728 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.002890
Average GUP/s 0.005730
Maximum GUP/s 0.008963
Current time (1549894770) is Mon Feb 11 06:19:30 2019
End of StarRandomAccess section.
Begin of SingleRandomAccess section.
Node(s) with error 0
Node selected 14
Single GUP/s 0.039375
Current time (1549894796) is Mon Feb 11 06:19:56 2019
End of SingleRandomAccess section.
Begin of MPIRandomAccess_LCG section.
Running on 144 processors
Total Main table size = 2^34 = 17179869184 words
PE Main table size = (2^34)/144 = 119304648 words/PE MAX
Default number of updates (RECOMMENDED) = 68719476736
Number of updates EXECUTED = 306349776 (for a TIME BOUND of 60.00 secs)
CPU time used = 4.706499 seconds
Real time used = 62.048918 seconds
0.004937230 Billion(10^9) Updates per second [GUP/s]
0.000034286 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 1.789492 seconds
Verification: Real time used = 5.887406 seconds
Found 9 errors in 17179869184 locations (passed).
Current time (1549895004) is Mon Feb 11 06:23:24 2019
End of MPIRandomAccess_LCG section.
Begin of StarRandomAccess_LCG section.
Main table size = 2^27 = 134217728 words
Number of updates = 536870912
CPU time used = 91.709646 seconds
Real time used = 93.146353 seconds
0.005763735 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 134217728 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.002855
Average GUP/s 0.005756
Maximum GUP/s 0.008747
Current time (1549895289) is Mon Feb 11 06:28:09 2019
End of StarRandomAccess_LCG section.
Begin of SingleRandomAccess_LCG section.
Node(s) with error 0
Node selected 15
Single GUP/s 0.042624
Current time (1549895313) is Mon Feb 11 06:28:33 2019
End of SingleRandomAccess_LCG section.
Begin of PTRANS section.
M: 75408
N: 75408
MB: 90
NB: 90
P: 6
Q: 24
TIME M N MB NB P Q TIME CHECK GB/s RESID
---- ----- ----- --- --- --- --- -------- ------ -------- -----
WALL 75408 75408 90 90 6 24 74.93 PASSED 0.607 0.00
CPU 75408 75408 90 90 6 24 5.53 PASSED 8.232 0.00
WALL 75408 75408 90 90 6 24 70.27 PASSED 0.607 0.00
CPU 75408 75408 90 90 6 24 5.27 PASSED 8.635 0.00
WALL 75408 75408 90 90 6 24 68.15 PASSED 0.607 0.00
CPU 75408 75408 90 90 6 24 4.99 PASSED 9.108 0.00
WALL 75408 75408 90 90 6 24 68.89 PASSED 0.607 0.00
CPU 75408 75408 90 90 6 24 5.16 PASSED 8.814 0.00
WALL 75408 75408 90 90 6 24 71.02 PASSED 0.607 0.00
CPU 75408 75408 90 90 6 24 5.37 PASSED 8.472 0.00
Finished 5 tests, with the following results:
5 tests completed and passed residual checks.
0 tests completed and failed residual checks.
0 tests skipped because of illegal input values.
END OF TESTS.
Current time (1549895708) is Mon Feb 11 06:35:08 2019
End of PTRANS section.
Begin of StarDGEMM section.
Scaled residual: 0.00704105
Node(s) with error 0
Minimum Gflop/s 4.165420
Average Gflop/s 4.250504
Maximum Gflop/s 4.325118
Current time (1549895902) is Mon Feb 11 06:38:22 2019
End of StarDGEMM section.
Begin of SingleDGEMM section.
Node(s) with error 0
Node selected 113
Single DGEMM Gflop/s 5.126427
Current time (1549896059) is Mon Feb 11 06:40:59 2019
End of SingleDGEMM section.
Begin of StarSTREAM section.
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 52651541, Offset = 0
Total memory required = 1.1769 GiB.
Each test is run 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
The SCALAR value used for this run is 0.420000
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 678661 microseconds.
(= 678661 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
VERBOSE: total setup time for rank 0 = 6.446717 seconds
-------------------------------------------------------------
Function Rate (GB/s) Avg time Min time Max time
Copy: 0.4922 1.7273 1.7116 1.7387
Scale: 0.4975 1.7277 1.6933 1.7373
Add: 0.5564 2.2834 2.2713 2.2871
Triad: 0.5575 2.2725 2.2665 2.2785
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
Node(s) with error 0
Minimum Copy GB/s 0.492195
Average Copy GB/s 0.492195
Maximum Copy GB/s 0.492195
Minimum Scale GB/s 0.497493
Average Scale GB/s 0.497493
Maximum Scale GB/s 0.497493
Minimum Add GB/s 0.556361
Average Add GB/s 0.556361
Maximum Add GB/s 0.556361
Minimum Triad GB/s 0.557535
Average Triad GB/s 0.557535
Maximum Triad GB/s 0.557535
Current time (1549896148) is Mon Feb 11 06:42:28 2019
End of StarSTREAM section.
Begin of SingleSTREAM section.
Node(s) with error 0
Node selected 137
Single STREAM Copy GB/s 5.633149
Single STREAM Scale GB/s 6.144910
Single STREAM Add GB/s 6.983099
Single STREAM Triad GB/s 6.968478
Current time (1549896155) is Mon Feb 11 06:42:35 2019
End of SingleSTREAM section.
Begin of MPIFFT section.
Number of nodes: 128
Vector size: 2147483648
Generation time: 1.120
Tuning: 1.105
Computing: 167.914
Inverse FFT: 169.693
max(|x-x0|): 2.784e-15
Gflop/s: 1.982
Current time (1549896498) is Mon Feb 11 06:48:18 2019
End of MPIFFT section.
Begin of StarFFT section.
Vector size: 33554432
Generation time: 2.382
Tuning: 0.002
Computing: 11.768
Inverse FFT: 12.463
max(|x-x0|): 2.156e-15
Node(s) with error 0
Minimum Gflop/s 0.211622
Average Gflop/s 0.369285
Maximum Gflop/s 0.500625
Current time (1549896534) is Mon Feb 11 06:48:54 2019
End of StarFFT section.
Begin of SingleFFT section.
Node(s) with error 0
Node selected 40
Single FFT Gflop/s 1.491395
Current time (1549896543) is Mon Feb 11 06:49:03 2019
End of SingleFFT section.
Begin of LatencyBandwidth section.
------------------------------------------------------------------
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany
Details - level 2
-----------------
MPI_Wtime granularity.
Max. MPI_Wtick is 0.000000 sec
wtick is set to 0.000001 sec
Message Length: 8
Latency min / avg / max: 0.069041 / 0.069041 / 0.069041 msecs
Bandwidth min / avg / max: 0.116 / 0.116 / 0.116 MByte/s
MPI_Wtime granularity is ok.
message size: 8
max time : 10.000000 secs
latency for msg: 0.069041 msecs
estimation for ping pong: 6.213718 msecs
max number of ping pong pairs = 1609
max client pings = max server pongs = 40
stride for latency = 5
Message Length: 8
Latency min / avg / max: 0.027834 / 0.059848 / 0.079293 msecs
Bandwidth min / avg / max: 0.101 / 0.147 / 0.287 MByte/s
Message Length: 2000000
Latency min / avg / max: 17.283452 / 17.283452 / 17.283452 msecs
Bandwidth min / avg / max: 115.718 / 115.718 / 115.718 MByte/s
MPI_Wtime granularity is ok.
message size: 2000000
max time : 30.000000 secs
latency for msg: 17.283452 msecs
estimation for ping pong: 138.267616 msecs
max number of ping pong pairs = 216
max client pings = max server pongs = 14
stride for latency = 11
Message Length: 2000000
Latency min / avg / max: 0.247290 / 15.422812 / 17.470225 msecs
Bandwidth min / avg / max: 114.480 / 946.376 / 8087.670 MByte/s
Message Size: 8 Byte
Natural Order Latency: 0.052405 msec
Natural Order Bandwidth: 0.152658 MB/s
Avg Random Order Latency: 0.091965 msec
Avg Random Order Bandwidth: 0.086990 MB/s
Message Size: 2000000 Byte
Natural Order Latency: 24.652331 msec
Natural Order Bandwidth: 81.128232 MB/s
Avg Random Order Latency: 473.559513 msec
Avg Random Order Bandwidth: 4.223334 MB/s
Execution time (wall clock) = 192.404 sec on 144 processes
- for cross ping_pong latency = 4.791 sec
- for cross ping_pong bandwidth = 22.488 sec
- for ring latency = 1.287 sec
- for ring bandwidth = 163.838 sec
------------------------------------------------------------------
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany
Major Benchmark results:
------------------------
Max Ping Pong Latency: 0.079293 msecs
Randomly Ordered Ring Latency: 0.091965 msecs
Min Ping Pong Bandwidth: 114.480498 MB/s
Naturally Ordered Ring Bandwidth: 81.128232 MB/s
Randomly Ordered Ring Bandwidth: 4.223334 MB/s
------------------------------------------------------------------
Detailed benchmark results:
Ping Pong:
Latency min / avg / max: 0.027834 / 0.059848 / 0.079293 msecs
Bandwidth min / avg / max: 114.480 / 946.376 / 8087.670 MByte/s
Ring:
On naturally ordered ring: latency= 0.052405 msec, bandwidth= 81.128232 MB/s
On randomly ordered ring: latency= 0.091965 msec, bandwidth= 4.223334 MB/s
------------------------------------------------------------------
Benchmark conditions:
The latency measurements were done with 8 bytes
The bandwidth measurements were done with 2000000 bytes
The ring communication was done in both directions on 144 processes
The Ping Pong measurements were done on
- 841 pairs of processes for latency benchmarking, and
- 182 pairs of processes for bandwidth benchmarking,
out of 144*(144-1) = 20592 possible combinations on 144 processes.
(1 MB/s = 10**6 byte/sec)
------------------------------------------------------------------
Current time (1549896736) is Mon Feb 11 06:52:16 2019
End of LatencyBandwidth section.
Begin of HPL section.
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 150816
NB : 90
PMAP : Row-major process mapping
P : 6
Q : 24
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0