srun -n 4 ./ex18 -mat_type aijcusparse -info :pc -use_gpu_aware_mpi 1 -log_view -log_view_gpu_time
srun: Job 25663298 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Step created for StepId=25663298.1
Norm of error 7.48331 iterations 0
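The options on the srun line above can also be seeded into the PETSc options database from C before PetscInitialize() parses the command line. A minimal sketch, assuming only the option names shown above (the rest of the program is illustrative, not ex18's actual source):

    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      /* PetscOptionsSetValue may be called before PetscInitialize(), so use
         plain error checks here; these mirror the srun flags above. */
      if (PetscOptionsSetValue(NULL, "-mat_type", "aijcusparse")) return 1;
      if (PetscOptionsSetValue(NULL, "-log_view", NULL)) return 1;
      if (PetscOptionsSetValue(NULL, "-log_view_gpu_time", NULL)) return 1;
      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      /* ... assemble and solve as ex18 does ... */
      PetscCall(PetscFinalize());
      return 0;
    }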
****************************************************************************************************************************************************************
***                                 WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document                                 ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

##########################################################
#                                                        #
#                       WARNING!!!                       #
#                                                        #
#  This code was run with -log_view_gpu_time             #
#  This provides accurate timing within the GPU kernels  #
#  but can slow down the entire computation by a         #
#  measurable amount. For fastest runs we recommend      #
#  not using this option.                                #
#                                                        #
##########################################################
/global/cfs/cdirs/m4267/petsc/petsc_main/src/ksp/ksp/tutorials/./ex18 on a pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5 named nid001276 with 4 processors, by gbisht Thu May 16 13:37:24 2024
Using Petsc Development GIT revision: v3.20.2-300-gfc2888174f5 GIT Date: 2023-12-13 19:44:14 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           1.152e+00     1.000   1.152e+00
Objects:              0.000e+00     0.000   0.000e+00
Flops:                3.880e+02     1.037   3.810e+02  1.524e+03
Flops/sec:            3.369e+02     1.037   3.308e+02  1.323e+03
MPI Msg Count:        6.000e+00     2.000   4.500e+00  1.800e+01
MPI Msg Len (bytes):  2.400e+02     2.000   4.000e+01  7.200e+02
MPI Reductions:       3.100e+01     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
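As a worked check of this convention against the event table below: VecAXPY logs 2.80e+01 = 28 flops per rank, consistent with a local vector length of N = 14 (2N = 28), and the two VecNorm calls log 5.40e+01 = 2 x 27 = 2 x (2N - 1) flops. A minimal sketch that reproduces the VecAXPY count under that assumed 14-per-rank layout (not ex18's actual code):

    #include <petscvec.h>

    int main(int argc, char **argv)
    {
      Vec x, y;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      /* 14 local entries on each of 4 ranks, as the flop counts suggest */
      PetscCall(VecCreateMPI(PETSC_COMM_WORLD, 14, PETSC_DETERMINE, &x));
      PetscCall(VecDuplicate(x, &y));
      PetscCall(VecSet(x, 1.0));
      PetscCall(VecSet(y, 2.0));
      PetscCall(VecAXPY(y, 3.0, x)); /* y <- 3*x + y : 2N = 28 flops per rank */
      PetscCall(VecDestroy(&x));
      PetscCall(VecDestroy(&y));
      PetscCall(PetscFinalize()); /* with -log_view, VecAXPY reports 2.8e+01 flops */
      return 0;
    }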
Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg     %Total    Count   %Total
 0:      Main Stage: 1.1456e+00  99.5%  1.5240e+03 100.0%  6.000e+00  33.3%  5.600e+01  46.7%  7.000e+00  22.6%
 1:        Assembly: 6.1253e-03   0.5%  0.0000e+00   0.0%  1.200e+01  66.7%  3.200e+01  53.3%  6.000e+00  19.4%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
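The separate 'Assembly' stage in this summary comes from the stage API named in the legend above. A minimal sketch of that pattern, with an illustrative stage name (not ex18's actual source):

    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      PetscLogStage assembly; /* events pushed under this stage are reported separately */

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(PetscLogStageRegister("Assembly", &assembly));
      PetscCall(PetscLogStagePush(assembly));
      /* ... matrix assembly; MatAssemblyBegin/End land in the "Assembly" stage ... */
      PetscCall(PetscLogStagePop());
      PetscCall(PetscFinalize());
      return 0;
    }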
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult 1 1.0 4.1501e-02 1.5 1.18e+02 1.1 6.0e+00 5.6e+01 0.0e+00 3 29 33 47 0 3 29 100 100 0 0 6 3 1.03e-03 0 0.00e+00 100
MatSolve 1 1.0 3.1753e-03 4.8 9.00e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 24 0 0 0 0 24 0 0 0 0 0 1 1.12e-04 0 0.00e+00 100
MatLUFactorNum 1 1.0 4.5170e-03 2.3 9.80e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 0 0 0 0.00e+00 0 0.00e+00 100
MatILUFactorSym 1 1.0 3.4756e-03 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatGetRowIJ 1 1.0 3.2460e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatGetOrdering 1 1.0 2.0309e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatCUSPARSCopyTo 2 1.0 8.3599e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 2 9.18e-04 0 0.00e+00 0
DCtxCreate 1 1.0 3.0450e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxDestroy 2 1.0 2.6482e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetUp 1 1.0 3.7270e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetDevice 1 1.0 9.8200e-07 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSync 4 1.0 3.2682e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecNorm 2 1.0 1.5186e-02 236.6 5.40e+01 1.0 0.0e+00 0.0e+00 2.0e+00 1 14 0 0 6 1 14 0 0 29 0 0 0 0.00e+00 0 0.00e+00 0
VecCopy 1 1.0 3.1560e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 2 1.0 6.9740e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 1 1.0 8.4360e-06 1.1 2.80e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 13 0 0 0.00e+00 0 0.00e+00 0
VecScatterBegin 1 1.0 3.1233e-02 3.8 0.00e+00 0.0 6.0e+00 5.6e+01 0.0e+00 2 0 33 47 0 2 0 100 100 0 0 0 0 0.00e+00 0 0.00e+00 0
VecScatterEnd 1 1.0 2.0152e-02 3152.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecNormalize 1 1.0 1.5185e-02 291.4 2.70e+01 1.0 0.0e+00 0.0e+00 1.0e+00 1 7 0 0 3 1 7 0 0 14 0 0 0 0.00e+00 0 0.00e+00 0
VecCUDACopyTo 2 1.0 1.4880e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 2.24e-04 0 0.00e+00 0
VecCUDACopyFrom 1 1.0 1.2564e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 1 1.12e-04 0
SFPack 1 1.0 7.3200e-07 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFUnpack 1 1.0 5.4100e-07 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSetUp 2 1.0 1.3477e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 1 1.0 1.7390e-02 7.4 1.17e+02 1.0 0.0e+00 0.0e+00 1.0e+00 1 31 0 0 3 1 31 0 0 14 0 0 1 1.12e-04 1 1.12e-04 77
PCSetUp 2 1.0 1.0112e-02 2.3 9.80e+01 1.0 0.0e+00 0.0e+00 0.0e+00 1 26 0 0 0 1 26 0 0 0 0 0 0 0.00e+00 0 0.00e+00 100
PCSetUpOnBlocks 1 1.0 8.0946e-03 2.4 9.80e+01 1.0 0.0e+00 0.0e+00 0.0e+00 1 26 0 0 0 1 26 0 0 0 0 0 0 0.00e+00 0 0.00e+00 100
PCApply 1 1.0 3.2507e-03 4.5 9.00e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 24 0 0 0 0 24 0 0 0 0 0 1 1.12e-04 1 1.12e-04 100
--- Event Stage 1: Assembly

BuildTwoSided 2 1.0 1.5671e-02 20.2 0.00e+00 0.0 6.0e+00 8.0e+00 2.0e+00 0 0 33 7 6 90 0 50 12 33 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSidedF 1 1.0 1.5617e-02 21.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 3 89 0 0 0 17 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyBegin 1 1.0 1.5644e-02 21.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 3 89 0 0 0 17 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyEnd 1 1.0 6.1206e-04 1.0 0.00e+00 0.0 1.2e+01 3.2e+01 5.0e+00 0 0 67 53 16 10 0 100 100 83 0 0 0 0.00e+00 0 0.00e+00 0
DCtxCreate 1 1.0 2.2353e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetUp 1 1.0 8.0115e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetDevice 1 1.0 6.4335e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetGraph 1 1.0 4.2980e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetUp 1 1.0 1.0140e-04 1.1 0.00e+00 0.0 1.2e+01 3.2e+01 1.0e+00 0 0 67 53 3 2 0 100 100 17 0 0 0 0.00e+00 0 0.00e+00 0
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

          Container     1              4
             Matrix     5              5
 PetscDeviceContext     1              0
             Vector    10             11
          Index Set     2              2
  Star Forest Graph     2              3
      Krylov Solver     2              2
     Preconditioner     2              2
   Distributed Mesh     1              1
    Discrete System     1              1
          Weak Form     1              1
             Viewer     1              0

--- Event Stage 1: Assembly

          Container     3              0
 PetscDeviceContext     1              0
             Vector     2              1
          Index Set     2              2
  Star Forest Graph     1              0

--- Event Stage 2: PCMPI
========================================================================================================================
Average time to get PetscTime(): 3.41e-08
Average time for MPI_Barrier(): 3.623e-06
Average time for zero size MPI_Send(): 1.2409e-05
#PETSc Option Table entries:
-info :pc # (source: command line)
-log_view # (source: command line)
-log_view_gpu_time # (source: command line)
-mat_type aijcusparse # (source: command line)
-use_gpu_aware_mpi 1 # (source: command line)
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64-bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --CFLAGS=" -g " --CXXFLAGS=" -g " --CUDAFLAGS=" -g -Xcompiler -rdynamic " --with-fortran-bindings=1 --COPTFLAGS=" -O" --CXXOPTFLAGS=" -O" --FOPTFLAGS=" -O" --with-mpiexec="srun -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels --download-kokkos-cmake-arguments=-DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF --with-kokkos-kernels-tpl=0 --with-make-np=8 --with-64-bit-indices=1 --with-netcdf-dir=/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1 --with-pnetcdf-dir=/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1 --download-hdf5=1 --with-cuda-dir=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7 --with-cuda-arch=80 --download-parmetis --download-metis --download-muparser --download-zlib --download-scalapack --download-sowing --download-triangle --download-exodusii --download-libceed --download-cgns-commit=HEAD --with-debugging=0 PETSC_ARCH=pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5
-----------------------------------------
Libraries compiled on 2024-05-16 17:06:00 on login11
Machine characteristics: Linux-5.14.21-150400.24.81_12.0.87-cray_shasta_c-x86_64-with-glibc2.31
Using PETSc directory: /global/cfs/cdirs/m4267/petsc/petsc_main
Using PETSc arch: pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5
-----------------------------------------
Using C compiler: cc -g -fPIC -O
Using Fortran compiler: ftn -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O
-----------------------------------------
Using include paths: -I/global/cfs/cdirs/m4267/petsc/petsc_main/include -I/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/include -I/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/include -I/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/math_libs/11.7/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -L/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -lpetsc -Wl,-rpath,/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -L/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -Wl,-rpath,/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/lib -L/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/lib -Wl,-rpath,/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/lib -L/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/lib -Wl,-rpath,/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64/stubs -lscalapack -lkokkoskernels -lkokkoscontainers -lkokkoscore -lkokkossimd -lparmetis -lmetis -lexoIIv2for32 -lexodus -lmuparser -lnetcdf -lpnetcdf -lhdf5_hl -lhdf5 -ltriangle -lz -lceed -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lstdc++ -lquadmath |
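The report above was produced by passing -log_view; the same summary can also be requested programmatically. A minimal sketch, assuming the default logging backend:

    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(PetscLogDefaultBegin()); /* start collecting event/stage data */
      /* ... the computation to be profiled ... */
      PetscCall(PetscLogView(PETSC_VIEWER_STDOUT_WORLD)); /* print a summary like the one above */
      PetscCall(PetscFinalize());
      return 0;
    }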