@shibacow
Last active February 3, 2020 20:23
root@****:~/prog/hpl-2.0_FERMI_v15/bin/CUDA# more HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
7 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
65536 73728 60000 40000 50000 60000 39007 39000 20960 364160 359424 276480 138240 115200 23040 354432 236160 95040 9600 20737 16129 16128 Ns
3 # of NBs
2048 1536 1024 512 384 640 768 896 960 1024 1152 1280 384 640 960 768 640 256 960 512 768 1152 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
1 Ps
1 Qs
16.0 threshold
1 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
2 8 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 2 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 0 DEPTHs (>=0)
1 SWAP (0=bin-exch,1=long,2=mix)
192 swapping threshold
1 L1 in (0=transposed,1=no-transposed) form
1 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
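HPL factors a dense N x N matrix of double-precision values, so the choice of N is bounded by available memory. The following is a rough sketch of that arithmetic, not part of the original run; the N values are taken from the result tables in this gist, and the helper name `hpl_matrix_gib` is hypothetical. It counts only the matrix itself and ignores workspace and panel buffers.

```python
# Rough memory estimate for the HPL matrix: N x N double-precision values.
# Assumption: workspace and panel buffers are ignored, so the real footprint is higher.
BYTES_PER_DOUBLE = 8


def hpl_matrix_gib(n: int) -> float:
    """Return the size of an N x N float64 matrix in GiB."""
    return n * n * BYTES_PER_DOUBLE / 2**30


# N values that appear in the result tables of this gist.
for n in (25000, 50000, 65536, 73728):
    print(f"N={n:6d}: {hpl_matrix_gib(n):6.2f} GiB")
```

Under this estimate, N=65536 already corresponds to 32 GiB for the matrix alone.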
Linpack NVIDIA Tesla V100
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 25000 768 1 1 112.10 9.293e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044867 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 25000 1024 1 1 110.46 9.431e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0042883 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 25000 1280 1 1 113.27 9.198e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0041744 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 25000 1536 1 1 112.07 9.295e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0039500 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 30000 768 1 1 189.78 9.485e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0043544 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 30000 1024 1 1 187.45 9.603e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0042670 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 30000 1280 1 1 190.92 9.429e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0047732 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 30000 1536 1 1 278.20 6.471e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044749 ...... PASSED
================================================================================
Finished 8 tests with the following results:
8 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
10.0.0.0/16
54.244.36.217
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 50000 768 1 1 99.20 8.401e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0047704 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 50000 1024 1 1 80.09 1.041e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0039858 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 50000 1280 1 1 84.64 9.846e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0037889 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 50000 1536 1 1 79.82 1.044e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0041055 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 60000 768 1 1 131.37 1.096e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0046351 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 60000 1024 1 1 126.63 1.137e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0039147 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 60000 1280 1 1 132.89 1.084e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0047641 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 60000 1536 1 1 126.48 1.139e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0048663 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 73728 2048 1 1 321.99 8.298e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= -nan ...... FAILED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 73728 1536 1 1 236.61 1.129e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0042714 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 73728 1024 1 1 236.96 1.128e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044989 ...... PASSED
================================================================================
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 65536 1536 1 1 144.04 1.303e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044428 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 65536 1024 1 1 143.62 1.307e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044281 ...... PASSED
================================================================================
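The Gflops column in the tables above can be cross-checked from N and Time using the standard LU operation count, 2/3·N³ + 3/2·N². This is a minimal sketch for verification only; the function name `hpl_gflops` is hypothetical, and the two sample rows are copied from the logs above.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflops from the LU operation count 2/3*N^3 + 3/2*N^2."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# (N, time in seconds, Gflops as printed in the log)
rows = [
    (25000, 112.10, 9.293e+01),
    (65536, 143.62, 1.307e+03),
]
for n, t, reported in rows:
    print(f"N={n}: computed {hpl_gflops(n, t):.1f}, reported {reported:.1f}")
```

For both rows the computed value agrees with the logged figure to the four significant digits HPL prints.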
ubuntu@*********:/usr/local/cuda-9.0/samples/5_Simulations/nbody$ ./nbody -fp64 -benchmark -numbodies=2048000 -device=0
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Double precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Tesla V100-SXM2-16GB
> Compute 7.0 CUDA device: [Tesla V100-SXM2-16GB]
number of bodies = 2048000
2048000 bodies, total time for 10 iterations: 224622.891 ms
= 186.726 billion interactions per second
= 5601.794 double-precision GFLOP/s at 30 flops per interaction
ubuntu@***********:/usr/local/cuda-9.0/samples/5_Simulations/nbody$ ./nbody -benchmark -numbodies=4096000 -device=0
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Tesla V100-SXM2-16GB
> Compute 7.0 CUDA device: [Tesla V100-SXM2-16GB]
number of bodies = 4096000
4096000 bodies, total time for 10 iterations: 296521.750 ms
= 565.801 billion interactions per second
= 11316.010 single-precision GFLOP/s at 20 flops per interaction
ubuntu@***********:/usr/local/cuda-9.0/samples/5_Simulations/nbody$ ./nbody -benchmark -numbodies=8192000 -device=0
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Tesla V100-SXM2-16GB
> Compute 7.0 CUDA device: [Tesla V100-SXM2-16GB]
number of bodies = 8192000
8192000 bodies, total time for 10 iterations: 1170863.375 ms
= 573.157 billion interactions per second
= 11463.142 single-precision GFLOP/s at 20 flops per interaction
ubuntu@*********:/usr/local/cuda-9.0/samples/5_Simulations/nbody$ ./nbody -fp64 -benchmark -numbodies=8192000 -device=0
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Double precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Tesla V100-SXM2-16GB
> Compute 7.0 CUDA device: [Tesla V100-SXM2-16GB]
number of bodies = 8192000
8192000 bodies, total time for 10 iterations: 3595757.750 ms
= 186.633 billion interactions per second
= 5599.003 double-precision GFLOP/s at 30 flops per interaction
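The nbody figures follow from simple arithmetic: an all-pairs step evaluates bodies² interactions, and the sample multiplies the interaction rate by a fixed flop count (20 for single precision, 30 for double, as printed in the logs). A small sketch that reproduces the logged numbers; the function name `nbody_rates` is hypothetical and the inputs are copied from the runs above.

```python
def nbody_rates(bodies: int, iterations: int, total_ms: float,
                flops_per_interaction: int) -> tuple[float, float]:
    """Return (billion interactions/s, GFLOP/s) for an all-pairs benchmark run."""
    interactions = bodies * bodies * iterations
    per_second = interactions / (total_ms / 1000.0)
    return per_second / 1e9, per_second * flops_per_interaction / 1e9


# Values copied from the logs: fp64 run with 2048000 bodies, fp32 run with 4096000 bodies.
print(nbody_rates(2048000, 10, 224622.891, 30))
print(nbody_rates(4096000, 10, 296521.750, 20))
```

Because the GFLOP/s value is a nominal flop count times the interaction rate, it is not directly comparable to the HPL Gflops figures, which come from a measured factorization.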
@victoryang00

thanks