srun -n 4 ./ex18 -mat_type aijcusparse -info :pc -use_gpu_aware_mpi 1 -log_view -log_view_gpu_time
srun: Job 25663298 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Step created for StepId=25663298.1
Norm of error 7.48331 iterations 0
****************************************************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
****************************************************************************************************************************************************************
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
##########################################################
#                                                        #
#                       WARNING!!!                       #
#                                                        #
#   This code was run with -log_view_gpu_time            #
#   This provides accurate timing within the GPU kernels #
#   but can slow down the entire computation by a        #
#   measurable amount. For fastest runs we recommend     #
#   not using this option.                               #
#                                                        #
##########################################################
/global/cfs/cdirs/m4267/petsc/petsc_main/src/ksp/ksp/tutorials/./ex18 on a pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5 named nid001276 with 4 processors, by gbisht Thu May 16 13:37:24 2024
Using Petsc Development GIT revision: v3.20.2-300-gfc2888174f5 GIT Date: 2023-12-13 19:44:14 +0000
                         Max       Max/Min     Avg       Total
Time (sec):           1.152e+00     1.000   1.152e+00
Objects:              0.000e+00     0.000   0.000e+00
Flops:                3.880e+02     1.037   3.810e+02  1.524e+03
Flops/sec:            3.369e+02     1.037   3.308e+02  1.323e+03
MPI Msg Count:        6.000e+00     2.000   4.500e+00  1.800e+01
MPI Msg Len (bytes):  2.400e+02     2.000   4.000e+01  7.200e+02
MPI Reductions:       3.100e+01     1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg       %Total     Count   %Total
 0:  Main Stage:     1.1456e+00  99.5%  1.5240e+03 100.0%  6.000e+00  33.3%  5.600e+01    46.7%   7.000e+00  22.6%
 1:    Assembly:     6.1253e-03   0.5%  0.0000e+00   0.0%  1.200e+01  66.7%  3.200e+01    53.3%   6.000e+00  19.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
CpuToGpu Count: total number of CPU to GPU copies per processor
CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
GpuToCpu Count: total number of GPU to CPU copies per processor
GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1 1.0 4.1501e-02 1.5 1.18e+02 1.1 6.0e+00 5.6e+01 0.0e+00 3 29 33 47 0 3 29 100 100 0 0 6 3 1.03e-03 0 0.00e+00 100
MatSolve 1 1.0 3.1753e-03 4.8 9.00e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 24 0 0 0 0 24 0 0 0 0 0 1 1.12e-04 0 0.00e+00 100
MatLUFactorNum 1 1.0 4.5170e-03 2.3 9.80e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 0 0 0 0.00e+00 0 0.00e+00 100
MatILUFactorSym 1 1.0 3.4756e-03 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatGetRowIJ 1 1.0 3.2460e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatGetOrdering 1 1.0 2.0309e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatCUSPARSCopyTo 2 1.0 8.3599e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 2 9.18e-04 0 0.00e+00 0
DCtxCreate 1 1.0 3.0450e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxDestroy 2 1.0 2.6482e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetUp 1 1.0 3.7270e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetDevice 1 1.0 9.8200e-07 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSync 4 1.0 3.2682e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecNorm 2 1.0 1.5186e-02 236.6 5.40e+01 1.0 0.0e+00 0.0e+00 2.0e+00 1 14 0 0 6 1 14 0 0 29 0 0 0 0.00e+00 0 0.00e+00 0
VecCopy 1 1.0 3.1560e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 2 1.0 6.9740e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 1 1.0 8.4360e-06 1.1 2.80e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 13 0 0 0.00e+00 0 0.00e+00 0
VecScatterBegin 1 1.0 3.1233e-02 3.8 0.00e+00 0.0 6.0e+00 5.6e+01 0.0e+00 2 0 33 47 0 2 0 100 100 0 0 0 0 0.00e+00 0 0.00e+00 0
VecScatterEnd 1 1.0 2.0152e-02 3152.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecNormalize 1 1.0 1.5185e-02 291.4 2.70e+01 1.0 0.0e+00 0.0e+00 1.0e+00 1 7 0 0 3 1 7 0 0 14 0 0 0 0.00e+00 0 0.00e+00 0
VecCUDACopyTo 2 1.0 1.4880e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 2.24e-04 0 0.00e+00 0
VecCUDACopyFrom 1 1.0 1.2564e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 1 1.12e-04 0
SFPack 1 1.0 7.3200e-07 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFUnpack 1 1.0 5.4100e-07 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSetUp 2 1.0 1.3477e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 1 1.0 1.7390e-02 7.4 1.17e+02 1.0 0.0e+00 0.0e+00 1.0e+00 1 31 0 0 3 1 31 0 0 14 0 0 1 1.12e-04 1 1.12e-04 77
PCSetUp 2 1.0 1.0112e-02 2.3 9.80e+01 1.0 0.0e+00 0.0e+00 0.0e+00 1 26 0 0 0 1 26 0 0 0 0 0 0 0.00e+00 0 0.00e+00 100
PCSetUpOnBlocks 1 1.0 8.0946e-03 2.4 9.80e+01 1.0 0.0e+00 0.0e+00 0.0e+00 1 26 0 0 0 1 26 0 0 0 0 0 0 0.00e+00 0 0.00e+00 100
PCApply 1 1.0 3.2507e-03 4.5 9.00e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 24 0 0 0 0 24 0 0 0 0 0 1 1.12e-04 1 1.12e-04 100
--- Event Stage 1: Assembly
BuildTwoSided 2 1.0 1.5671e-02 20.2 0.00e+00 0.0 6.0e+00 8.0e+00 2.0e+00 0 0 33 7 6 90 0 50 12 33 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSidedF 1 1.0 1.5617e-02 21.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 3 89 0 0 0 17 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyBegin 1 1.0 1.5644e-02 21.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 3 89 0 0 0 17 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyEnd 1 1.0 6.1206e-04 1.0 0.00e+00 0.0 1.2e+01 3.2e+01 5.0e+00 0 0 67 53 16 10 0 100 100 83 0 0 0 0.00e+00 0 0.00e+00 0
DCtxCreate 1 1.0 2.2353e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetUp 1 1.0 8.0115e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetDevice 1 1.0 6.4335e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetGraph 1 1.0 4.2980e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetUp 1 1.0 1.0140e-04 1.1 0.00e+00 0.0 1.2e+01 3.2e+01 1.0e+00 0 0 67 53 3 2 0 100 100 17 0 0 0 0.00e+00 0 0.00e+00 0
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Object Type Creations Destructions. Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 1 4
Matrix 5 5
PetscDeviceContext 1 0
Vector 10 11
Index Set 2 2
Star Forest Graph 2 3
Krylov Solver 2 2
Preconditioner 2 2
Distributed Mesh 1 1
Discrete System 1 1
Weak Form 1 1
Viewer 1 0
--- Event Stage 1: Assembly
Container 3 0
PetscDeviceContext 1 0
Vector 2 1
Index Set 2 2
Star Forest Graph 1 0
--- Event Stage 2: PCMPI
========================================================================================================================
Average time to get PetscTime(): 3.41e-08
Average time for MPI_Barrier(): 3.623e-06
Average time for zero size MPI_Send(): 1.2409e-05
#PETSc Option Table entries:
-info :pc # (source: command line)
-log_view # (source: command line)
-log_view_gpu_time # (source: command line)
-mat_type aijcusparse # (source: command line)
-use_gpu_aware_mpi 1 # (source: command line)
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64-bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --CFLAGS=" -g " --CXXFLAGS=" -g " --CUDAFLAGS=" -g -Xcompiler -rdynamic " --with-fortran-bindings=1 --COPTFLAGS=" -O" --CXXOPTFLAGS=" -O" --FOPTFLAGS=" -O" --with-mpiexec="srun -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels --download-kokkos-cmake-arguments=-DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF --with-kokkos-kernels-tpl=0 --with-make-np=8 --with-64-bit-indices=1 --with-netcdf-dir=/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1 --with-pnetcdf-dir=/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1 --download-hdf5=1 --with-cuda-dir=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7 --with-cuda-arch=80 --download-parmetis --download-metis --download-muparser --download-zlib --download-scalapack --download-sowing --download-triangle --download-exodusii --download-libceed --download-cgns-commit=HEAD --with-debugging=0 PETSC_ARCH=pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5
-----------------------------------------
Libraries compiled on 2024-05-16 17:06:00 on login11
Machine characteristics: Linux-5.14.21-150400.24.81_12.0.87-cray_shasta_c-x86_64-with-glibc2.31
Using PETSc directory: /global/cfs/cdirs/m4267/petsc/petsc_main
Using PETSc arch: pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5
-----------------------------------------
Using C compiler: cc -g -fPIC -O
Using Fortran compiler: ftn -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O
-----------------------------------------
Using include paths: -I/global/cfs/cdirs/m4267/petsc/petsc_main/include -I/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/include -I/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/include -I/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/math_libs/11.7/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -L/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -lpetsc -Wl,-rpath,/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -L/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -Wl,-rpath,/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/lib -L/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/lib -Wl,-rpath,/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/lib -L/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/lib -Wl,-rpath,/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64/stubs -lscalapack -lkokkoskernels -lkokkoscontainers -lkokkoscore -lkokkossimd -lparmetis -lmetis -lexoIIv2for32 -lexodus -lmuparser -lnetcdf -lpnetcdf -lhdf5_hl -lhdf5 -ltriangle -lz -lceed -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lstdc++ -lquadmath
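
Note: the "Main Stage" and "Assembly" stages in the tables above come from PETSc's stage logging, which the phase summary says is controlled with PetscLogStagePush()/PetscLogStagePop(). As a minimal sketch (not the actual ex18 source; the stage name, placement, and surrounding code are illustrative assumptions), a user-defined stage such as "Assembly" is typically registered and bracketed like this:

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscLogStage assembly_stage; /* handle for a user-defined logging stage */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Register a named stage; it shows up under -log_view as "Assembly" (illustrative name) */
  PetscCall(PetscLogStageRegister("Assembly", &assembly_stage));

  /* Events logged between Push and Pop are attributed to this stage */
  PetscCall(PetscLogStagePush(assembly_stage));
  /* ... matrix assembly (e.g. MatSetValues/MatAssemblyBegin/MatAssemblyEnd) would go here ... */
  PetscCall(PetscLogStagePop());

  PetscCall(PetscFinalize());
  return 0;
}

Per the warning at the top of the log, dropping -log_view_gpu_time from the srun command should give the fastest wall-clock runs while still producing these summary tables.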