Last active
February 11, 2019 14:53
######################################################################## | |
This is the DARPA/DOE HPC Challenge Benchmark version 1.5.0 October 2012 | |
Produced by Jack Dongarra and Piotr Luszczek | |
Innovative Computing Laboratory | |
University of Tennessee Knoxville and Oak Ridge National Laboratory | |
See the source files for authors of specific codes. | |
Compiled on Apr 15 2018 at 23:42:37 | |
Current time (1549892996) is Mon Feb 11 05:49:56 2019 | |
Hostname: 'cluster-node-b8ca3a5d8eb4' | |
######################################################################## | |
================================================================================ | |
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008 | |
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK | |
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK | |
Modified by Julien Langou, University of Colorado Denver | |
================================================================================ | |
An explanation of the input/output parameters follows: | |
T/V : Wall time / encoded variant. | |
N : The order of the coefficient matrix A. | |
NB : The partitioning blocking factor. | |
P : The number of process rows. | |
Q : The number of process columns. | |
Time : Time in seconds to solve the linear system. | |
Gflops : Rate of execution for solving the linear system. | |
The following parameter values will be used: | |
N : 1000 | |
NB : 80 | |
PMAP : Row-major process mapping | |
P : 2 | |
Q : 2 | |
PFACT : Right | |
NBMIN : 4 | |
NDIV : 2 | |
RFACT : Crout | |
BCAST : 1ringM | |
DEPTH : 1 | |
SWAP : Mix (threshold = 64) | |
L1 : transposed form | |
U : transposed form | |
EQUIL : yes | |
ALIGN : 8 double precision words | |
-------------------------------------------------------------------------------- | |
- The matrix A is randomly generated for each test. | |
- The following scaled residual check will be computed: | |
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N ) | |
- The relative machine precision (eps) is taken to be 1.110223e-16 | |
- Computational tests pass if scaled residuals are less than 16.0 | |
Begin of MPIRandomAccess section. | |
Running on 144 processors | |
Total Main table size = 2^25 = 33554432 words | |
PE Main table size = (2^25)/144 = 233017 words/PE MAX | |
Default number of updates (RECOMMENDED) = 134217728 | |
Number of updates EXECUTED = 63363888 (for a TIME BOUND of 60.00 secs) | |
CPU time used = 0.876077 seconds | |
Real time used = 12.666014 seconds | |
0.005002670 Billion(10^9) Updates per second [GUP/s] | |
0.000034741 Billion(10^9) Updates/PE per second [GUP/s] | |
Verification: CPU time used = 0.331181 seconds | |
Verification: Real time used = 5.330198 seconds | |
Found 0 errors in 33554432 locations (passed). | |
Current time (1549893015) is Mon Feb 11 05:50:15 2019 | |
End of MPIRandomAccess section. | |
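
Note on the MPIRandomAccess figures above: the reported GUP/s values appear to follow from the executed update count and the real (wall-clock) time. The following is a minimal Python sketch of that arithmetic, assuming the rate is simply updates divided by real time and by 10^9. The function name `gups` is made up for this illustration and is not part of HPCC.

```python
def gups(updates: int, real_time_s: float) -> float:
    """Billions (10^9) of updates per second."""
    return updates / real_time_s / 1e9


# Values copied from the MPIRandomAccess section of this run.
updates_executed = 63363888
real_time_s = 12.666014
processes = 144

total = gups(updates_executed, real_time_s)   # log reports 0.005002670
per_pe = total / processes                    # log reports 0.000034741

print(f"{total:.9f} GUP/s total")
print(f"{per_pe:.9f} GUP/s per PE")
```

The CPU time (0.876077 seconds) is not used in this calculation; only the real time reproduces the reported rate.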
Begin of StarRandomAccess section. | |
Main table size = 2^17 = 131072 words | |
Number of updates = 524288 | |
CPU time used = 0.003990 seconds | |
Real time used = 0.004061 seconds | |
0.129104067 Billion(10^9) Updates per second [GUP/s] | |
Found 0 errors in 131072 locations (passed). | |
Node(s) with error 0 | |
Minimum GUP/s 0.094468 | |
Average GUP/s 0.123951 | |
Maximum GUP/s 0.140644 | |
Current time (1549893015) is Mon Feb 11 05:50:15 2019 | |
End of StarRandomAccess section. | |
Begin of SingleRandomAccess section. | |
Node(s) with error 0 | |
Node selected 93 | |
Single GUP/s 0.164553 | |
Current time (1549893015) is Mon Feb 11 05:50:15 2019 | |
End of SingleRandomAccess section. | |
Begin of MPIRandomAccess_LCG section. | |
Running on 144 processors | |
Total Main table size = 2^25 = 33554432 words | |
PE Main table size = (2^25)/144 = 233017 words/PE MAX | |
Default number of updates (RECOMMENDED) = 134217728 | |
Number of updates EXECUTED = 134217728 (for a TIME BOUND of 60.00 secs) | |
CPU time used = 1.953834 seconds | |
Real time used = 26.733578 seconds | |
0.005020567 Billion(10^9) Updates per second [GUP/s] | |
0.000034865 Billion(10^9) Updates/PE per second [GUP/s] | |
Verification: CPU time used = 0.171181 seconds | |
Verification: Real time used = 1.843976 seconds | |
Found 0 errors in 33554432 locations (passed). | |
Current time (1549893044) is Mon Feb 11 05:50:44 2019 | |
End of MPIRandomAccess_LCG section. | |
Begin of StarRandomAccess_LCG section. | |
Main table size = 2^17 = 131072 words | |
Number of updates = 524288 | |
CPU time used = 0.003994 seconds | |
Real time used = 0.004164 seconds | |
0.125911002 Billion(10^9) Updates per second [GUP/s] | |
Found 0 errors in 131072 locations (passed). | |
Node(s) with error 0 | |
Minimum GUP/s 0.086703 | |
Average GUP/s 0.124973 | |
Maximum GUP/s 0.150594 | |
Current time (1549893044) is Mon Feb 11 05:50:44 2019 | |
End of StarRandomAccess_LCG section. | |
Begin of SingleRandomAccess_LCG section. | |
Node(s) with error 0 | |
Node selected 6 | |
Single GUP/s 0.196226 | |
Current time (1549893044) is Mon Feb 11 05:50:44 2019 | |
End of SingleRandomAccess_LCG section. | |
Begin of PTRANS section. | |
M: 500 | |
N: 500 | |
MB: 80 | |
NB: 80 | |
P: 2 | |
Q: 2 | |
TIME     M     N  MB  NB   P   Q     TIME  CHECK     GB/s RESID
---- ----- ----- --- --- --- --- -------- ------ -------- -----
WALL   500   500  80  80   2   2     0.01 PASSED    0.187  0.00
CPU    500   500  80  80   2   2     0.00 PASSED   20.202  0.00
WALL   500   500  80  80   2   2     0.01 PASSED    0.187  0.00
CPU    500   500  80  80   2   2     0.00 PASSED   37.037  0.00
WALL   500   500  80  80   2   2     0.01 PASSED    0.187  0.00
CPU    500   500  80  80   2   2     0.00 PASSED    0.496  0.00
WALL   500   500  80  80   2   2     0.01 PASSED    0.187  0.00
CPU    500   500  80  80   2   2     0.00 PASSED  200.000  0.00
WALL   500   500  80  80   2   2     0.01 PASSED    0.187  0.00
CPU    500   500  80  80   2   2     0.00 PASSED  200.000  0.00
Finished 5 tests, with the following results: | |
5 tests completed and passed residual checks. | |
0 tests completed and failed residual checks. | |
0 tests skipped because of illegal input values. | |
END OF TESTS. | |
Current time (1549893044) is Mon Feb 11 05:50:44 2019 | |
End of PTRANS section. | |
Begin of StarDGEMM section. | |
Scaled residual: 0.0311043 | |
Node(s) with error 0 | |
Minimum Gflop/s 3.081478 | |
Average Gflop/s 3.553794 | |
Maximum Gflop/s 4.168029 | |
Current time (1549893044) is Mon Feb 11 05:50:44 2019 | |
End of StarDGEMM section. | |
Begin of SingleDGEMM section. | |
Node(s) with error 0 | |
Node selected 6 | |
Single DGEMM Gflop/s 5.228502 | |
Current time (1549893044) is Mon Feb 11 05:50:44 2019 | |
End of SingleDGEMM section. | |
Begin of StarSTREAM section. | |
------------------------------------------------------------- | |
This system uses 8 bytes per DOUBLE PRECISION word. | |
------------------------------------------------------------- | |
Array size = 83333, Offset = 0 | |
Total memory required = 0.0019 GiB. | |
Each test is run 10 times. | |
The *best* time for each kernel (excluding the first iteration) | |
will be used to compute the reported bandwidth. | |
The SCALAR value used for this run is 0.420000 | |
------------------------------------------------------------- | |
Your clock granularity/precision appears to be 1 microseconds. | |
Each test below will take on the order of 90 microseconds. | |
(= 90 clock ticks) | |
Increase the size of the arrays if this shows that | |
you are not getting at least 20 clock ticks per test. | |
------------------------------------------------------------- | |
WARNING -- The above is only a rough guideline. | |
For best results, please be sure you know the | |
precision of your system timer. | |
------------------------------------------------------------- | |
VERBOSE: total setup time for rank 0 = 0.003450 seconds | |
------------------------------------------------------------- | |
Function      Rate (GB/s)   Avg time     Min time     Max time
Copy:            3.3657       0.0028       0.0004       0.0103
Scale:           2.9268       0.0027       0.0005       0.0136
Add:             1.2950       0.0039       0.0015       0.0106
Triad:           0.9305       0.0031       0.0021       0.0037
------------------------------------------------------------- | |
Solution Validates: avg error less than 1.000000e-13 on all three arrays | |
------------------------------------------------------------- | |
Node(s) with error 0 | |
Minimum Copy GB/s 3.365664 | |
Average Copy GB/s 3.365664 | |
Maximum Copy GB/s 3.365664 | |
Minimum Scale GB/s 2.926757 | |
Average Scale GB/s 2.926757 | |
Maximum Scale GB/s 2.926757 | |
Minimum Add GB/s 1.295010 | |
Average Add GB/s 1.295010 | |
Maximum Add GB/s 1.295010 | |
Minimum Triad GB/s 0.930509 | |
Average Triad GB/s 0.930509 | |
Maximum Triad GB/s 0.930509 | |
Current time (1549893044) is Mon Feb 11 05:50:44 2019 | |
End of StarSTREAM section. | |
Begin of SingleSTREAM section. | |
Node(s) with error 0 | |
Node selected 6 | |
Single STREAM Copy GB/s 13.774334 | |
Single STREAM Scale GB/s 12.127558 | |
Single STREAM Add GB/s 15.341597 | |
Single STREAM Triad GB/s 13.797711 | |
Current time (1549893044) is Mon Feb 11 05:50:44 2019 | |
End of SingleSTREAM section. | |
Begin of MPIFFT section. | |
Number of nodes: 128 | |
Vector size: 2097152 | |
Generation time: 0.001 | |
Tuning: 0.001 | |
Computing: 0.176 | |
Inverse FFT: 0.179 | |
max(|x-x0|): 1.616e-15 | |
Gflop/s: 1.253 | |
Current time (1549893045) is Mon Feb 11 05:50:45 2019 | |
End of MPIFFT section. | |
Begin of StarFFT section. | |
Vector size: 32768 | |
Generation time: 0.002 | |
Tuning: 0.000 | |
Computing: 0.003 | |
Inverse FFT: 0.002 | |
max(|x-x0|): 1.171e-15 | |
Node(s) with error 0 | |
Minimum Gflop/s 0.855966 | |
Average Gflop/s 1.101746 | |
Maximum Gflop/s 1.344019 | |
Current time (1549893045) is Mon Feb 11 05:50:45 2019 | |
End of StarFFT section. | |
Begin of SingleFFT section. | |
Node(s) with error 0 | |
Node selected 57 | |
Single FFT Gflop/s 1.583789 | |
Current time (1549893045) is Mon Feb 11 05:50:45 2019 | |
End of SingleFFT section. | |
Begin of LatencyBandwidth section. | |
------------------------------------------------------------------ | |
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart | |
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany | |
Details - level 2 | |
----------------- | |
MPI_Wtime granularity. | |
Max. MPI_Wtick is 0.000000 sec | |
wtick is set to 0.000001 sec | |
Message Length: 8 | |
Latency min / avg / max: 0.070444 / 0.070444 / 0.070444 msecs | |
Bandwidth min / avg / max: 0.114 / 0.114 / 0.114 MByte/s | |
MPI_Wtime granularity is ok. | |
message size: 8 | |
max time : 10.000000 secs | |
latency for msg: 0.070444 msecs | |
estimation for ping pong: 6.339932 msecs | |
max number of ping pong pairs = 1577 | |
max client pings = max server pongs = 39 | |
stride for latency = 5 | |
Message Length: 8 | |
Latency min / avg / max: 0.027379 / 0.060138 / 0.081365 msecs | |
Bandwidth min / avg / max: 0.098 / 0.146 / 0.292 MByte/s | |
Message Length: 2000000 | |
Latency min / avg / max: 17.293045 / 17.293045 / 17.293045 msecs | |
Bandwidth min / avg / max: 115.653 / 115.653 / 115.653 MByte/s | |
MPI_Wtime granularity is ok. | |
message size: 2000000 | |
max time : 30.000000 secs | |
latency for msg: 17.293045 msecs | |
estimation for ping pong: 138.344356 msecs | |
max number of ping pong pairs = 216 | |
max client pings = max server pongs = 14 | |
stride for latency = 11 | |
Message Length: 2000000 | |
Latency min / avg / max: 0.310803 / 15.433052 / 17.406084 msecs | |
Bandwidth min / avg / max: 114.902 / 741.344 / 6434.934 MByte/s | |
Message Size: 8 Byte | |
Natural Order Latency: 0.050991 msec | |
Natural Order Bandwidth: 0.156889 MB/s | |
Avg Random Order Latency: 0.090596 msec | |
Avg Random Order Bandwidth: 0.088304 MB/s | |
Message Size: 2000000 Byte | |
Natural Order Latency: 21.680526 msec | |
Natural Order Bandwidth: 92.248685 MB/s | |
Avg Random Order Latency: 467.755682 msec | |
Avg Random Order Bandwidth: 4.275736 MB/s | |
Execution time (wall clock) = 197.447 sec on 144 processes | |
- for cross ping_pong latency = 4.813 sec | |
- for cross ping_pong bandwidth = 22.733 sec | |
- for ring latency = 1.177 sec | |
- for ring bandwidth = 168.723 sec | |
------------------------------------------------------------------ | |
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart | |
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany | |
Major Benchmark results: | |
------------------------ | |
Max Ping Pong Latency: 0.081365 msecs | |
Randomly Ordered Ring Latency: 0.090596 msecs | |
Min Ping Pong Bandwidth: 114.902356 MB/s | |
Naturally Ordered Ring Bandwidth: 92.248685 MB/s | |
Randomly Ordered Ring Bandwidth: 4.275736 MB/s | |
------------------------------------------------------------------ | |
Detailed benchmark results: | |
Ping Pong: | |
Latency min / avg / max: 0.027379 / 0.060138 / 0.081365 msecs | |
Bandwidth min / avg / max: 114.902 / 741.344 / 6434.934 MByte/s | |
Ring: | |
On naturally ordered ring: latency= 0.050991 msec, bandwidth= 92.248685 MB/s | |
On randomly ordered ring: latency= 0.090596 msec, bandwidth= 4.275736 MB/s | |
------------------------------------------------------------------ | |
Benchmark conditions: | |
The latency measurements were done with 8 bytes | |
The bandwidth measurements were done with 2000000 bytes | |
The ring communication was done in both directions on 144 processes | |
The Ping Pong measurements were done on | |
- 841 pairs of processes for latency benchmarking, and | |
- 182 pairs of processes for bandwidth benchmarking, | |
out of 144*(144-1) = 20592 possible combinations on 144 processes. | |
(1 MB/s = 10**6 byte/sec) | |
------------------------------------------------------------------ | |
Current time (1549893242) is Mon Feb 11 05:54:02 2019 | |
End of LatencyBandwidth section. | |
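
Note on the LatencyBandwidth figures above: each bandwidth value is consistent with the message size divided by the measured latency, using the log's own convention of 1 MB/s = 10**6 byte/sec. The following Python sketch checks a few of the reported pairs under that assumption. The helper name `bandwidth_mb_per_s` is hypothetical and is not taken from the benchmark source.

```python
def bandwidth_mb_per_s(message_bytes: int, latency_ms: float) -> float:
    """Bandwidth in MB/s, where 1 MB/s = 10**6 byte/sec."""
    return message_bytes / (latency_ms / 1000.0) / 1e6


# First probe with 8-byte messages (log reports 0.114 MByte/s).
small = bandwidth_mb_per_s(8, 0.070444)

# First probe with 2,000,000-byte messages (log reports 115.653 MByte/s).
large = bandwidth_mb_per_s(2_000_000, 17.293045)

# Naturally ordered ring, 2,000,000 bytes (log reports 92.248685 MB/s).
ring = bandwidth_mb_per_s(2_000_000, 21.680526)

# Ordered process pairs available for ping-pong on 144 processes.
pairs = 144 * (144 - 1)

print(f"{small:.3f} {large:.3f} {ring:.3f} {pairs}")
```

Only 841 pairs for latency and 182 pairs for bandwidth were actually measured out of the possible combinations, so the min/avg/max values are samples rather than an exhaustive sweep.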
Begin of HPL section. | |
================================================================================ | |
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008 | |
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK | |
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK | |
Modified by Julien Langou, University of Colorado Denver | |
================================================================================ | |
An explanation of the input/output parameters follows: | |
T/V : Wall time / encoded variant. | |
N : The order of the coefficient matrix A. | |
NB : The partitioning blocking factor. | |
P : The number of process rows. | |
Q : The number of process columns. | |
Time : Time in seconds to solve the linear system. | |
Gflops : Rate of execution for solving the linear system. | |
The following parameter values will be used: | |
N : 1000 | |
NB : 80 | |
PMAP : Row-major process mapping | |
P : 2 | |
Q : 2 | |
PFACT : Right | |
NBMIN : 4 | |
NDIV : 2 | |
RFACT : Crout | |
BCAST : 1ringM | |
DEPTH : 1 | |
SWAP : Mix (threshold = 64) | |
L1 : transposed form | |
U : transposed form | |
EQUIL : yes | |
ALIGN : 8 double precision words | |
-------------------------------------------------------------------------------- | |
- The matrix A is randomly generated for each test. | |
- The following scaled residual check will be computed: | |
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N ) | |
- The relative machine precision (eps) is taken to be 1.110223e-16 | |
- Computational tests pass if scaled residuals are less than 16.0 | |
================================================================================ | |
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4        1000    80     2     2               0.06              1.059e+01
-------------------------------------------------------------------------------- | |
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0058297 ...... PASSED | |
================================================================================ | |
Finished 1 tests with the following results: | |
1 tests completed and passed residual checks, | |
0 tests completed and failed residual checks, | |
0 tests skipped because of illegal input values. | |
-------------------------------------------------------------------------------- | |
End of Tests. | |
================================================================================ | |
Current time (1549893242) is Mon Feb 11 05:54:02 2019 | |
End of HPL section. | |
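
Note on the HPL result above: both the Gflops figure and the scaled residual can be rechecked from values this run prints in its Summary section (HPL_time, HPL_RnormI, HPL_AnormI, HPL_XnormI, HPL_BnormI, HPL_eps). The following Python sketch does so, assuming the conventional LU operation count of 2/3 N^3 + 3/2 N^2 and the residual formula printed in the HPL header. The function names are illustrative only.

```python
def hpl_flops(n: int) -> float:
    """Operation count assumed here for an N x N solve: 2/3 N^3 + 3/2 N^2."""
    return (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2


def scaled_residual(r_inf, a_inf, x_inf, b_inf, eps, n):
    """||Ax-b||_oo / ( eps * ( ||x||_oo * ||A||_oo + ||b||_oo ) * N )"""
    return r_inf / (eps * (x_inf * a_inf + b_inf) * n)


n = 1000

# HPL_time=0.0631221 from the Summary section; log reports 1.059e+01 Gflops.
gflops = hpl_flops(n) / 0.0631221 / 1e9

# HPL_RnormI, HPL_AnormI, HPL_XnormI, HPL_BnormI and eps from the same run;
# log reports a scaled residual of 0.0058297.
resid = scaled_residual(1.9309e-12, 262.773, 11.3513, 0.499776,
                        1.110223e-16, n)

print(f"{gflops:.4f} Gflops, residual {resid:.7f}, passed={resid < 16.0}")
```

The pass threshold of 16.0 comes from the header line "Computational tests pass if scaled residuals are less than 16.0".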
Begin of Summary section. | |
VersionMajor=1 | |
VersionMinor=5 | |
VersionMicro=0 | |
VersionRelease=f | |
LANG=C | |
Success=1 | |
sizeof_char=1 | |
sizeof_short=2 | |
sizeof_int=4 | |
sizeof_long=8 | |
sizeof_void_ptr=8 | |
sizeof_size_t=8 | |
sizeof_float=4 | |
sizeof_double=8 | |
sizeof_s64Int=8 | |
sizeof_u64Int=8 | |
sizeof_struct_double_double=16 | |
CommWorldProcs=144 | |
MPI_Wtick=1.000000e-09 | |
HPL_Tflops=0.0105853 | |
HPL_time=0.0631221 | |
HPL_eps=1.11022e-16 | |
HPL_RnormI=1.9309e-12 | |
HPL_Anorm1=263.865 | |
HPL_AnormI=262.773 | |
HPL_Xnorm1=2619.63 | |
HPL_XnormI=11.3513 | |
HPL_BnormI=0.499776 | |
HPL_N=1000 | |
HPL_NB=80 | |
HPL_nprow=2 | |
HPL_npcol=2 | |
HPL_depth=1 | |
HPL_nbdiv=2 | |
HPL_nbmin=4 | |
HPL_cpfact=R | |
HPL_crfact=C | |
HPL_ctop=1 | |
HPL_order=R | |
HPL_dMACH_EPS=1.110223e-16 | |
HPL_dMACH_SFMIN=2.225074e-308 | |
HPL_dMACH_BASE=2.000000e+00 | |
HPL_dMACH_PREC=2.220446e-16 | |
HPL_dMACH_MLEN=5.300000e+01 | |
HPL_dMACH_RND=1.000000e+00 | |
HPL_dMACH_EMIN=-1.021000e+03 | |
HPL_dMACH_RMIN=2.225074e-308 | |
HPL_dMACH_EMAX=1.024000e+03 | |
HPL_dMACH_RMAX=1.797693e+308 | |
HPL_sMACH_EPS=5.960464e-08 | |
HPL_sMACH_SFMIN=1.175494e-38 | |
HPL_sMACH_BASE=2.000000e+00 | |
HPL_sMACH_PREC=1.192093e-07 | |
HPL_sMACH_MLEN=2.400000e+01 | |
HPL_sMACH_RND=1.000000e+00 | |
HPL_sMACH_EMIN=-1.250000e+02 | |
HPL_sMACH_RMIN=1.175494e-38 | |
HPL_sMACH_EMAX=1.280000e+02 | |
HPL_sMACH_RMAX=3.402823e+38 | |
dweps=1.110223e-16 | |
sweps=5.960464e-08 | |
HPLMaxProcs=4 | |
HPLMinProcs=4 | |
DGEMM_N=288 | |
StarDGEMM_Gflops=3.55379 | |
SingleDGEMM_Gflops=5.2285 | |
PTRANS_GBs=0.186679 | |
PTRANS_time=0.00822854 | |
PTRANS_residual=1 | |
PTRANS_n=500 | |
PTRANS_nb=80 | |
PTRANS_nprow=2 | |
PTRANS_npcol=2 | |
MPIRandomAccess_LCG_N=33554432 | |
MPIRandomAccess_LCG_time=26.7336 | |
MPIRandomAccess_LCG_CheckTime=1.84398 | |
MPIRandomAccess_LCG_Errors=0 | |
MPIRandomAccess_LCG_ErrorsFraction=0 | |
MPIRandomAccess_LCG_ExeUpdates=134217728 | |
MPIRandomAccess_LCG_GUPs=0.00502057 | |
MPIRandomAccess_LCG_TimeBound=60 | |
MPIRandomAccess_LCG_Algorithm=0 | |
MPIRandomAccess_N=33554432 | |
MPIRandomAccess_time=12.666 | |
MPIRandomAccess_CheckTime=5.3302 | |
MPIRandomAccess_Errors=0 | |
MPIRandomAccess_ErrorsFraction=0 | |
MPIRandomAccess_ExeUpdates=63363888 | |
MPIRandomAccess_GUPs=0.00500267 | |
MPIRandomAccess_TimeBound=60 | |
MPIRandomAccess_Algorithm=0 | |
RandomAccess_LCG_N=131072 | |
StarRandomAccess_LCG_GUPs=0.124973 | |
SingleRandomAccess_LCG_GUPs=0.196226 | |
RandomAccess_N=131072 | |
StarRandomAccess_GUPs=0.123951 | |
SingleRandomAccess_GUPs=0.164553 | |
STREAM_VectorSize=83333 | |
STREAM_Threads=1 | |
StarSTREAM_Copy=3.36566 | |
StarSTREAM_Scale=2.92676 | |
StarSTREAM_Add=1.29501 | |
StarSTREAM_Triad=0.930509 | |
SingleSTREAM_Copy=13.7743 | |
SingleSTREAM_Scale=12.1276 | |
SingleSTREAM_Add=15.3416 | |
SingleSTREAM_Triad=13.7977 | |
FFT_N=32768 | |
StarFFT_Gflops=1.10175 | |
SingleFFT_Gflops=1.58379 | |
MPIFFT_N=2097152 | |
MPIFFT_Gflops=1.25313 | |
MPIFFT_maxErr=1.61598e-15 | |
MPIFFT_Procs=128 | |
MaxPingPongLatency_usec=81.3647 | |
RandomlyOrderedRingLatency_usec=90.596 | |
MinPingPongBandwidth_GBytes=0.114902 | |
NaturallyOrderedRingBandwidth_GBytes=0.0922487 | |
RandomlyOrderedRingBandwidth_GBytes=0.00427574 | |
MinPingPongLatency_usec=27.3792 | |
AvgPingPongLatency_usec=60.1384 | |
MaxPingPongBandwidth_GBytes=6.43493 | |
AvgPingPongBandwidth_GBytes=0.741344 | |
NaturallyOrderedRingLatency_usec=50.9914 | |
FFTEnblk=16 | |
FFTEnp=8 | |
FFTEl2size=1048576 | |
M_OPENMP=-1 | |
omp_get_num_threads=0 | |
omp_get_max_threads=0 | |
omp_get_num_procs=0 | |
MemProc=-1 | |
MemSpec=-1 | |
MemVal=-1 | |
MPIFFT_time0=5.69e-07 | |
MPIFFT_time1=0.0504214 | |
MPIFFT_time2=0.00119853 | |
MPIFFT_time3=0.0650489 | |
MPIFFT_time4=0.00204202 | |
MPIFFT_time5=0.0569445 | |
MPIFFT_time6=2.26e-07 | |
CPS_HPCC_FFT_235=0 | |
CPS_HPCC_FFTW_ESTIMATE=0 | |
CPS_HPCC_MEMALLCTR=0 | |
CPS_HPL_USE_GETPROCESSTIMES=0 | |
CPS_RA_SANDIA_NOPT=0 | |
CPS_RA_SANDIA_OPT2=0 | |
CPS_USING_FFTW=0 | |
End of Summary section. | |
######################################################################## | |
End of HPC Challenge tests. | |
Current time (1549893242) is Mon Feb 11 05:54:02 2019 | |
######################################################################## |
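
Note on the Summary section above: it is a flat list of `key=value` lines between the "Begin of Summary section." and "End of Summary section." markers, which makes it the easiest part of the output to process by script. The following is a minimal Python sketch of such a parser. The function name `parse_summary` and the choice to convert values to int or float where possible are assumptions of this sketch, not something HPCC provides.

```python
def parse_summary(lines):
    """Collect key=value pairs found between the Summary section markers."""
    values = {}
    in_summary = False
    for raw in lines:
        # Drop surrounding whitespace and any trailing "|" table residue.
        line = raw.strip(" |\t\r\n")
        if line == "Begin of Summary section.":
            in_summary = True
            continue
        if line == "End of Summary section.":
            break
        if not in_summary or "=" not in line:
            continue
        key, _, text = line.partition("=")
        # Prefer int, then float, and fall back to the raw string.
        for convert in (int, float):
            try:
                values[key] = convert(text)
                break
            except ValueError:
                pass
        else:
            values[key] = text
    return values


sample = [
    "Hostname: 'cluster-node-b8ca3a5d8eb4' | |",
    "Begin of Summary section. | |",
    "VersionRelease=f | |",
    "CommWorldProcs=144 | |",
    "HPL_Tflops=0.0105853 | |",
    "End of Summary section. | |",
]
parsed = parse_summary(sample)
print(parsed)
```

Reading a saved log with `open(path).read().splitlines()` and passing the result to `parse_summary` would give one dictionary per run, which can then be compared across runs.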
######################################################################## | |
This is the DARPA/DOE HPC Challenge Benchmark version 1.5.0 October 2012 | |
Produced by Jack Dongarra and Piotr Luszczek | |
Innovative Computing Laboratory | |
University of Tennessee Knoxville and Oak Ridge National Laboratory | |
See the source files for authors of specific codes. | |
Compiled on Apr 15 2018 at 23:42:37 | |
Current time (1549894268) is Mon Feb 11 06:11:08 2019 | |
Hostname: 'cluster-node-b8ca3a5d8eb4' | |
######################################################################## | |
================================================================================ | |
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008 | |
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK | |
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK | |
Modified by Julien Langou, University of Colorado Denver | |
================================================================================ | |
An explanation of the input/output parameters follows: | |
T/V : Wall time / encoded variant. | |
N : The order of the coefficient matrix A. | |
NB : The partitioning blocking factor. | |
P : The number of process rows. | |
Q : The number of process columns. | |
Time : Time in seconds to solve the linear system. | |
Gflops : Rate of execution for solving the linear system. | |
The following parameter values will be used: | |
N : 150816 | |
NB : 90 | |
PMAP : Row-major process mapping | |
P : 6 | |
Q : 24 | |
PFACT : Right | |
NBMIN : 4 | |
NDIV : 2 | |
RFACT : Crout | |
BCAST : 1ringM | |
DEPTH : 1 | |
SWAP : Mix (threshold = 64) | |
L1 : transposed form | |
U : transposed form | |
EQUIL : yes | |
ALIGN : 8 double precision words | |
-------------------------------------------------------------------------------- | |
- The matrix A is randomly generated for each test. | |
- The following scaled residual check will be computed: | |
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N ) | |
- The relative machine precision (eps) is taken to be 1.110223e-16 | |
- Computational tests pass if scaled residuals are less than 16.0 | |
Begin of MPIRandomAccess section. | |
Running on 144 processors | |
Total Main table size = 2^34 = 17179869184 words | |
PE Main table size = (2^34)/144 = 119304648 words/PE MAX | |
Default number of updates (RECOMMENDED) = 68719476736 | |
Number of updates EXECUTED = 295034400 (for a TIME BOUND of 60.00 secs) | |
CPU time used = 4.449711 seconds | |
Real time used = 60.040803 seconds | |
0.004913898 Billion(10^9) Updates per second [GUP/s] | |
0.000034124 Billion(10^9) Updates/PE per second [GUP/s] | |
Verification: CPU time used = 2.232378 seconds | |
Verification: Real time used = 16.678935 seconds | |
Found 0 errors in 17179869184 locations (passed). | |
Current time (1549894490) is Mon Feb 11 06:14:50 2019 | |
End of MPIRandomAccess section. | |
Begin of StarRandomAccess section. | |
Main table size = 2^27 = 134217728 words | |
Number of updates = 536870912 | |
CPU time used = 91.685604 seconds | |
Real time used = 92.752443 seconds | |
0.005788213 Billion(10^9) Updates per second [GUP/s] | |
Found 0 errors in 134217728 locations (passed). | |
Node(s) with error 0 | |
Minimum GUP/s 0.002890 | |
Average GUP/s 0.005730 | |
Maximum GUP/s 0.008963 | |
Current time (1549894770) is Mon Feb 11 06:19:30 2019 | |
End of StarRandomAccess section. | |
Begin of SingleRandomAccess section. | |
Node(s) with error 0 | |
Node selected 14 | |
Single GUP/s 0.039375 | |
Current time (1549894796) is Mon Feb 11 06:19:56 2019 | |
End of SingleRandomAccess section. | |
Begin of MPIRandomAccess_LCG section. | |
Running on 144 processors | |
Total Main table size = 2^34 = 17179869184 words | |
PE Main table size = (2^34)/144 = 119304648 words/PE MAX | |
Default number of updates (RECOMMENDED) = 68719476736 | |
Number of updates EXECUTED = 306349776 (for a TIME BOUND of 60.00 secs) | |
CPU time used = 4.706499 seconds | |
Real time used = 62.048918 seconds | |
0.004937230 Billion(10^9) Updates per second [GUP/s] | |
0.000034286 Billion(10^9) Updates/PE per second [GUP/s] | |
Verification: CPU time used = 1.789492 seconds | |
Verification: Real time used = 5.887406 seconds | |
Found 9 errors in 17179869184 locations (passed). | |
Current time (1549895004) is Mon Feb 11 06:23:24 2019 | |
End of MPIRandomAccess_LCG section. | |
Begin of StarRandomAccess_LCG section. | |
Main table size = 2^27 = 134217728 words | |
Number of updates = 536870912 | |
CPU time used = 91.709646 seconds | |
Real time used = 93.146353 seconds | |
0.005763735 Billion(10^9) Updates per second [GUP/s] | |
Found 0 errors in 134217728 locations (passed). | |
Node(s) with error 0 | |
Minimum GUP/s 0.002855 | |
Average GUP/s 0.005756 | |
Maximum GUP/s 0.008747 | |
Current time (1549895289) is Mon Feb 11 06:28:09 2019 | |
End of StarRandomAccess_LCG section. | |
Begin of SingleRandomAccess_LCG section. | |
Node(s) with error 0 | |
Node selected 15 | |
Single GUP/s 0.042624 | |
Current time (1549895313) is Mon Feb 11 06:28:33 2019 | |
End of SingleRandomAccess_LCG section. | |
Begin of PTRANS section. | |
M: 75408 | |
N: 75408 | |
MB: 90 | |
NB: 90 | |
P: 6 | |
Q: 24 | |
TIME     M     N  MB  NB   P   Q     TIME  CHECK     GB/s RESID
---- ----- ----- --- --- --- --- -------- ------ -------- -----
WALL 75408 75408  90  90   6  24    74.93 PASSED    0.607  0.00
CPU  75408 75408  90  90   6  24     5.53 PASSED    8.232  0.00
WALL 75408 75408  90  90   6  24    70.27 PASSED    0.607  0.00
CPU  75408 75408  90  90   6  24     5.27 PASSED    8.635  0.00
WALL 75408 75408  90  90   6  24    68.15 PASSED    0.607  0.00
CPU  75408 75408  90  90   6  24     4.99 PASSED    9.108  0.00
WALL 75408 75408  90  90   6  24    68.89 PASSED    0.607  0.00
CPU  75408 75408  90  90   6  24     5.16 PASSED    8.814  0.00
WALL 75408 75408  90  90   6  24    71.02 PASSED    0.607  0.00
CPU  75408 75408  90  90   6  24     5.37 PASSED    8.472  0.00
Finished 5 tests, with the following results: | |
5 tests completed and passed residual checks. | |
0 tests completed and failed residual checks. | |
0 tests skipped because of illegal input values. | |
END OF TESTS. | |
Current time (1549895708) is Mon Feb 11 06:35:08 2019 | |
End of PTRANS section. | |
Begin of StarDGEMM section. | |
Scaled residual: 0.00704105 | |
Node(s) with error 0 | |
Minimum Gflop/s 4.165420 | |
Average Gflop/s 4.250504 | |
Maximum Gflop/s 4.325118 | |
Current time (1549895902) is Mon Feb 11 06:38:22 2019 | |
End of StarDGEMM section. | |
Begin of SingleDGEMM section. | |
Node(s) with error 0 | |
Node selected 113 | |
Single DGEMM Gflop/s 5.126427 | |
Current time (1549896059) is Mon Feb 11 06:40:59 2019 | |
End of SingleDGEMM section. | |
Begin of StarSTREAM section. | |
------------------------------------------------------------- | |
This system uses 8 bytes per DOUBLE PRECISION word. | |
------------------------------------------------------------- | |
Array size = 52651541, Offset = 0 | |
Total memory required = 1.1769 GiB. | |
Each test is run 10 times. | |
The *best* time for each kernel (excluding the first iteration) | |
will be used to compute the reported bandwidth. | |
The SCALAR value used for this run is 0.420000 | |
------------------------------------------------------------- | |
Your clock granularity/precision appears to be 1 microseconds. | |
Each test below will take on the order of 678661 microseconds. | |
(= 678661 clock ticks) | |
Increase the size of the arrays if this shows that | |
you are not getting at least 20 clock ticks per test. | |
------------------------------------------------------------- | |
WARNING -- The above is only a rough guideline. | |
For best results, please be sure you know the | |
precision of your system timer. | |
------------------------------------------------------------- | |
VERBOSE: total setup time for rank 0 = 6.446717 seconds | |
------------------------------------------------------------- | |
Function      Rate (GB/s)   Avg time     Min time     Max time
Copy:            0.4922       1.7273       1.7116       1.7387
Scale:           0.4975       1.7277       1.6933       1.7373
Add:             0.5564       2.2834       2.2713       2.2871
Triad:           0.5575       2.2725       2.2665       2.2785
------------------------------------------------------------- | |
Solution Validates: avg error less than 1.000000e-13 on all three arrays | |
------------------------------------------------------------- | |
Node(s) with error 0 | |
Minimum Copy GB/s 0.492195 | |
Average Copy GB/s 0.492195 | |
Maximum Copy GB/s 0.492195 | |
Minimum Scale GB/s 0.497493 | |
Average Scale GB/s 0.497493 | |
Maximum Scale GB/s 0.497493 | |
Minimum Add GB/s 0.556361 | |
Average Add GB/s 0.556361 | |
Maximum Add GB/s 0.556361 | |
Minimum Triad GB/s 0.557535 | |
Average Triad GB/s 0.557535 | |
Maximum Triad GB/s 0.557535 | |
Current time (1549896148) is Mon Feb 11 06:42:28 2019 | |
End of StarSTREAM section. | |
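
Note on the StarSTREAM figures above: the reported rates are consistent with the usual STREAM accounting, in which Copy and Scale touch two arrays per iteration and Add and Triad touch three, divided by the best (minimum) time. The following Python sketch reproduces the four rates under that assumption, with 1 GB taken as 10^9 bytes. The names used here are illustrative and are not taken from the STREAM source.

```python
BYTES_PER_WORD = 8  # "This system uses 8 bytes per DOUBLE PRECISION word."

# Arrays read or written per element by each kernel (assumed accounting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_gb_per_s(kernel: str, array_size: int, min_time_s: float) -> float:
    """Bytes moved by one pass of the kernel, divided by its best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * array_size
    return bytes_moved / min_time_s / 1e9


# Array size and "Min time" column from the StarSTREAM section of this run.
array_size = 52651541
min_times = {"Copy": 1.7116, "Scale": 1.6933, "Add": 2.2713, "Triad": 2.2665}

rates = {k: stream_rate_gb_per_s(k, array_size, t) for k, t in min_times.items()}
for kernel, rate in rates.items():
    print(f"{kernel}: {rate:.4f} GB/s")
```

Small differences in the last digits against the log are expected, because the printed minimum times are rounded to four decimal places.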
Begin of SingleSTREAM section. | |
Node(s) with error 0 | |
Node selected 137 | |
Single STREAM Copy GB/s 5.633149 | |
Single STREAM Scale GB/s 6.144910 | |
Single STREAM Add GB/s 6.983099 | |
Single STREAM Triad GB/s 6.968478 | |
Current time (1549896155) is Mon Feb 11 06:42:35 2019 | |
End of SingleSTREAM section. | |
Begin of MPIFFT section. | |
Number of nodes: 128 | |
Vector size: 2147483648 | |
Generation time: 1.120 | |
Tuning: 1.105 | |
Computing: 167.914 | |
Inverse FFT: 169.693 | |
max(|x-x0|): 2.784e-15 | |
Gflop/s: 1.982 | |
Current time (1549896498) is Mon Feb 11 06:48:18 2019 | |
End of MPIFFT section. | |
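
Note on the MPIFFT figure above: the reported Gflop/s is consistent with the common FFT operation estimate of 5 N log2(N) divided by the "Computing" time. The following Python sketch applies that estimate to this run. Treating the "Computing" line as the timing that feeds the rate is an assumption of this sketch, and `fft_gflops` is a made-up helper name.

```python
import math


def fft_gflops(n: int, seconds: float) -> float:
    """Estimated rate using 5 * N * log2(N) floating-point operations."""
    return 5.0 * n * math.log2(n) / seconds / 1e9


# Vector size and "Computing" time from the MPIFFT section of this run;
# the log reports 1.982 Gflop/s.
rate = fft_gflops(2147483648, 167.914)
print(f"{rate:.3f} Gflop/s")
```

The inverse FFT time (169.693) is reported separately and is not part of this estimate.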
Begin of StarFFT section. | |
Vector size: 33554432 | |
Generation time: 2.382 | |
Tuning: 0.002 | |
Computing: 11.768 | |
Inverse FFT: 12.463 | |
max(|x-x0|): 2.156e-15 | |
Node(s) with error 0 | |
Minimum Gflop/s 0.211622 | |
Average Gflop/s 0.369285 | |
Maximum Gflop/s 0.500625 | |
Current time (1549896534) is Mon Feb 11 06:48:54 2019 | |
End of StarFFT section. | |
Begin of SingleFFT section. | |
Node(s) with error 0 | |
Node selected 40 | |
Single FFT Gflop/s 1.491395 | |
Current time (1549896543) is Mon Feb 11 06:49:03 2019 | |
End of SingleFFT section. | |
Begin of LatencyBandwidth section. | |
------------------------------------------------------------------ | |
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart | |
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany | |
Details - level 2 | |
----------------- | |
MPI_Wtime granularity. | |
Max. MPI_Wtick is 0.000000 sec | |
wtick is set to 0.000001 sec | |
Message Length: 8 | |
Latency min / avg / max: 0.069041 / 0.069041 / 0.069041 msecs | |
Bandwidth min / avg / max: 0.116 / 0.116 / 0.116 MByte/s | |
MPI_Wtime granularity is ok. | |
message size: 8 | |
max time : 10.000000 secs | |
latency for msg: 0.069041 msecs | |
estimation for ping pong: 6.213718 msecs | |
max number of ping pong pairs = 1609 | |
max client pings = max server pongs = 40 | |
stride for latency = 5 | |
Message Length: 8 | |
Latency min / avg / max: 0.027834 / 0.059848 / 0.079293 msecs | |
Bandwidth min / avg / max: 0.101 / 0.147 / 0.287 MByte/s | |
Message Length: 2000000 | |
Latency min / avg / max: 17.283452 / 17.283452 / 17.283452 msecs | |
Bandwidth min / avg / max: 115.718 / 115.718 / 115.718 MByte/s | |
MPI_Wtime granularity is ok. | |
message size: 2000000 | |
max time : 30.000000 secs | |
latency for msg: 17.283452 msecs | |
estimation for ping pong: 138.267616 msecs | |
max number of ping pong pairs = 216 | |
max client pings = max server pongs = 14 | |
stride for latency = 11 | |
Message Length: 2000000 | |
Latency min / avg / max: 0.247290 / 15.422812 / 17.470225 msecs | |
Bandwidth min / avg / max: 114.480 / 946.376 / 8087.670 MByte/s | |
Message Size: 8 Byte | |
Natural Order Latency: 0.052405 msec | |
Natural Order Bandwidth: 0.152658 MB/s | |
Avg Random Order Latency: 0.091965 msec | |
Avg Random Order Bandwidth: 0.086990 MB/s | |
Message Size: 2000000 Byte | |
Natural Order Latency: 24.652331 msec | |
Natural Order Bandwidth: 81.128232 MB/s | |
Avg Random Order Latency: 473.559513 msec | |
Avg Random Order Bandwidth: 4.223334 MB/s | |
Execution time (wall clock) = 192.404 sec on 144 processes | |
- for cross ping_pong latency = 4.791 sec | |
- for cross ping_pong bandwidth = 22.488 sec | |
- for ring latency = 1.287 sec | |
- for ring bandwidth = 163.838 sec | |
------------------------------------------------------------------ | |
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart | |
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany | |
Major Benchmark results: | |
------------------------ | |
Max Ping Pong Latency: 0.079293 msecs | |
Randomly Ordered Ring Latency: 0.091965 msecs | |
Min Ping Pong Bandwidth: 114.480498 MB/s | |
Naturally Ordered Ring Bandwidth: 81.128232 MB/s | |
Randomly Ordered Ring Bandwidth: 4.223334 MB/s | |
------------------------------------------------------------------ | |
Detailed benchmark results: | |
Ping Pong: | |
Latency min / avg / max: 0.027834 / 0.059848 / 0.079293 msecs | |
Bandwidth min / avg / max: 114.480 / 946.376 / 8087.670 MByte/s | |
Ring: | |
On naturally ordered ring: latency= 0.052405 msec, bandwidth= 81.128232 MB/s | |
On randomly ordered ring: latency= 0.091965 msec, bandwidth= 4.223334 MB/s | |
------------------------------------------------------------------ | |
Benchmark conditions: | |
The latency measurements were done with 8 bytes | |
The bandwidth measurements were done with 2000000 bytes | |
The ring communication was done in both directions on 144 processes | |
The Ping Pong measurements were done on | |
- 841 pairs of processes for latency benchmarking, and | |
- 182 pairs of processes for bandwidth benchmarking, | |
out of 144*(144-1) = 20592 possible combinations on 144 processes. | |
(1 MB/s = 10**6 byte/sec) | |
------------------------------------------------------------------ | |
Current time (1549896736) is Mon Feb 11 06:52:16 2019 | |
End of LatencyBandwidth section. | |
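The latency and bandwidth figures in this section are tied together by simple arithmetic: bandwidth is the message size divided by the transfer time, with 1 MB/s = 10**6 byte/sec as stated under "Benchmark conditions". A minimal sketch that cross-checks a few values from this log; the function names are illustrative and not part of the benchmark.

```python
def bandwidth_mb_per_s(message_bytes, time_msecs):
    """Bandwidth in MB/s (1 MB/s = 10**6 byte/sec) for one message of the given size."""
    return message_bytes / (time_msecs * 1e-3) / 1e6

def ordered_pairs(processes):
    """Number of ordered process pairs, the "possible combinations" in the log."""
    return processes * (processes - 1)

# First 2000000-byte ping pong: the log reports 17.283452 msecs and 115.718 MByte/s.
print(round(bandwidth_mb_per_s(2000000, 17.283452), 3))

# Naturally ordered ring: the log reports 24.652331 msec and 81.128232 MB/s.
print(round(bandwidth_mb_per_s(2000000, 24.652331), 3))

# Randomly ordered ring: the log reports 473.559513 msec and 4.223334 MB/s.
print(round(bandwidth_mb_per_s(2000000, 473.559513), 3))

# The log reports 144*(144-1) = 20592 possible combinations.
print(ordered_pairs(144))
```

The same relation explains the very low 8-byte bandwidth figures: at that message size the transfer time is dominated by latency rather than by link throughput.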
Begin of HPL section. | |
================================================================================ | |
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008 | |
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK | |
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK | |
Modified by Julien Langou, University of Colorado Denver | |
================================================================================ | |
An explanation of the input/output parameters follows: | |
T/V : Wall time / encoded variant. | |
N : The order of the coefficient matrix A. | |
NB : The partitioning blocking factor. | |
P : The number of process rows. | |
Q : The number of process columns. | |
Time : Time in seconds to solve the linear system. | |
Gflops : Rate of execution for solving the linear system. | |
The following parameter values will be used: | |
N : 150816 | |
NB : 90 | |
PMAP : Row-major process mapping | |
P : 6 | |
Q : 24 | |
PFACT : Right | |
NBMIN : 4 | |
NDIV : 2 | |
RFACT : Crout | |
BCAST : 1ringM | |
DEPTH : 1 | |
SWAP : Mix (threshold = 64) | |
L1 : transposed form | |
U : transposed form | |
EQUIL : yes | |
ALIGN : 8 double precision words | |
-------------------------------------------------------------------------------- | |
- The matrix A is randomly generated for each test. | |
- The following scaled residual check will be computed: | |
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N ) | |
- The relative machine precision (eps) is taken to be 1.110223e-16 | |
- Computational tests pass if scaled residuals are less than 16.0 | |
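The scaled residual check described above can be sketched in a few lines. This is a pure-Python illustration on a tiny 2x2 system, not HPL's implementation; the helper names and the example matrix are hypothetical. The default eps is half the double-precision machine epsilon, which matches the 1.110223e-16 value printed in the log.

```python
import sys

def inf_norm_vec(v):
    """Infinity norm of a vector: largest absolute entry."""
    return max(abs(x) for x in v)

def inf_norm_mat(a):
    """Infinity norm of a matrix: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)

def scaled_residual(a, x, b, eps=sys.float_info.epsilon / 2):
    """||Ax-b||_oo / ( eps * ( ||x||_oo * ||A||_oo + ||b||_oo ) * N )"""
    n = len(b)
    r = [sum(a[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    denom = eps * (inf_norm_vec(x) * inf_norm_mat(a) + inf_norm_vec(b)) * n
    return inf_norm_vec(r) / denom

# Hypothetical example system whose exact solution is x = (0.1, 0.6).
a = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
x = [0.1, 0.6]

# A correct solution passes the threshold of 16.0; a wrong one fails it.
print(scaled_residual(a, x, b) < 16.0)
print(scaled_residual(a, [0.0, 0.0], b) < 16.0)
```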