srun -G4 -N 1 -n 4 ../../rdycore ex2b_ic_file.yaml -use_gpu_aware_mpi 1 -log_view -log_view_gpu_time -ceed /gpu/cuda -dm_vec_type cuda
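This run uses one GPU node with 4 MPI ranks and 4 GPUs: -ceed /gpu/cuda selects the libCEED CUDA backend, -dm_vec_type cuda puts the PETSc vectors on the device, -use_gpu_aware_mpi 1 enables GPU-aware MPI, and -log_view -log_view_gpu_time produce the performance summary at the end of the log. As a rough sketch only (not RDycore source; the option handling shown here is assumed), a resource string such as /gpu/cuda is the argument that libCEED's CeedInit consumes, for example after being read from the PETSc options database:

    /* Illustrative only -- not RDycore source.  Read a libCEED resource
       string (e.g. "/gpu/cuda") from the PETSc options database and use it
       to initialize a Ceed object. */
    #include <petscsys.h>
    #include <ceed.h>

    int main(int argc, char **argv)
    {
      char      resource[PETSC_MAX_PATH_LEN] = "/cpu/self"; /* fallback backend */
      PetscBool set;
      Ceed      ceed;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(PetscOptionsGetString(NULL, NULL, "-ceed", resource, sizeof resource, &set));
      CeedInit(resource, &ceed);      /* "/gpu/cuda" in the run above */
      /* ... set up CeedOperators and run ... */
      CeedDestroy(&ceed);
      PetscCall(PetscFinalize());
      return 0;
    }

The log from the run and the PETSc performance summary follow.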
DETAIL: ==========================================================
DETAIL: RDycore (input read from ex2b_ic_file.yaml)
DETAIL: ==========================================================
DETAIL: Physics:
DETAIL: Flow:
DETAIL: Bed friction: disabled
DETAIL: Sediment model: disabled
DETAIL: Salinity model: disabled
DETAIL: Numerics:
DETAIL: Spatial discretization: finite volume (FV)
DETAIL: Temporal discretization: forward euler
DETAIL: Riemann solver: roe
DETAIL: Time:
DETAIL: Final time: 0.005 hours
DETAIL: Logging:
DETAIL: Primary log file: <stdout>
DETAIL: Checkpoint:
DETAIL: (disabled)
DETAIL: Restart:
DETAIL: (disabled)
DETAIL: Advancing from t = 0. to 0.005...
DETAIL: Step 0: writing XDMF HDF5 output at t = 0. hr to output/ex2b_ic_file-0.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 0: writing XDMF XMF output at t = 0. hr to output/ex2b_ic_file-0000.xmf
DETAIL: Step 100: writing XDMF HDF5 output at t = 0.0005 hr to output/ex2b_ic_file-100.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 100: writing XDMF XMF output at t = 0.0005 hr to output/ex2b_ic_file-0100.xmf
DETAIL: Step 200: writing XDMF HDF5 output at t = 0.001 hr to output/ex2b_ic_file-200.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 200: writing XDMF XMF output at t = 0.001 hr to output/ex2b_ic_file-0200.xmf
DETAIL: Step 300: writing XDMF HDF5 output at t = 0.0015 hr to output/ex2b_ic_file-300.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 300: writing XDMF XMF output at t = 0.0015 hr to output/ex2b_ic_file-0300.xmf
DETAIL: Step 400: writing XDMF HDF5 output at t = 0.002 hr to output/ex2b_ic_file-400.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 400: writing XDMF XMF output at t = 0.002 hr to output/ex2b_ic_file-0400.xmf
DETAIL: Step 500: writing XDMF HDF5 output at t = 0.0025 hr to output/ex2b_ic_file-500.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 500: writing XDMF XMF output at t = 0.0025 hr to output/ex2b_ic_file-0500.xmf
DETAIL: Step 600: writing XDMF HDF5 output at t = 0.003 hr to output/ex2b_ic_file-600.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 600: writing XDMF XMF output at t = 0.003 hr to output/ex2b_ic_file-0600.xmf
DETAIL: Step 700: writing XDMF HDF5 output at t = 0.0035 hr to output/ex2b_ic_file-700.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 700: writing XDMF XMF output at t = 0.0035 hr to output/ex2b_ic_file-0700.xmf
DETAIL: Step 800: writing XDMF HDF5 output at t = 0.004 hr to output/ex2b_ic_file-800.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 800: writing XDMF XMF output at t = 0.004 hr to output/ex2b_ic_file-0800.xmf
DETAIL: Step 900: writing XDMF HDF5 output at t = 0.0045 hr to output/ex2b_ic_file-900.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 900: writing XDMF XMF output at t = 0.0045 hr to output/ex2b_ic_file-0900.xmf
DETAIL: Step 1000: writing XDMF HDF5 output at t = 0.005 hr to output/ex2b_ic_file-1000.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 1000: writing XDMF XMF output at t = 0.005 hr to output/ex2b_ic_file-1000.xmf
****************************************************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
****************************************************************************************************************************************************************
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
##########################################################
#                                                        #
#                       WARNING!!!                       #
#                                                        #
#   This code was run with -log_view_gpu_time            #
#   This provides accurate timing within the GPU kernels #
#   but can slow down the entire computation by a        #
#   measurable amount. For fastest runs we recommend     #
#   not using this option.                               #
#                                                        #
##########################################################
/global/cfs/cdirs/m4267/gbisht/rdycore/build-pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/driver/tests/swe_roe/../../rdycore on a pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5 named nid001008 with 4 processors, by gbisht Thu May 16 14:03:07 2024
Using Petsc Development GIT revision: v3.20.2-300-gfc2888174f5 GIT Date: 2023-12-13 19:44:14 +0000
Max Max/Min Avg Total
Time (sec): 8.328e+00 1.000 8.328e+00
Objects: 0.000e+00 0.000 0.000e+00
Flops: 9.248e+04 1.108 8.796e+04 3.518e+05
Flops/sec: 1.110e+04 1.108 1.056e+04 4.225e+04
MPI Msg Count: 4.370e+03 2.009 3.257e+03 1.303e+04
MPI Msg Len (bytes): 4.296e+05 1.452 1.089e+02 1.419e+06
MPI Reductions: 3.920e+02 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 4.4508e+00 53.4% 9.8310e+03 2.8% 9.940e+02 7.6% 1.629e+02 11.4% 2.640e+02 67.3%
2: RDyAdvance solve: 3.2634e+00 39.2% 3.4200e+05 97.2% 1.203e+04 92.4% 1.044e+02 88.6% 1.100e+02 28.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
CpuToGpu Count: total number of CPU to GPU copies per processor
CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
GpuToCpu Count: total number of GPU to CPU copies per processor
GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
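The "RDyAdvance solve" stage reported below is a user-defined logging stage; as the legend above notes, stages are delimited with PetscLogStagePush() and PetscLogStagePop(). A minimal sketch of that pattern (variable names assumed, surrounding solver code elided, not RDycore source):

    PetscLogStage solve_stage;
    PetscCall(PetscLogStageRegister("RDyAdvance solve", &solve_stage));
    PetscCall(PetscLogStagePush(solve_stage));
    /* ... time stepping, vector ops, libCEED operator applications ... */
    PetscCall(PetscLogStagePop());

The per-event timings for each stage follow.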
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 1 1.0 6.2462e-04 72.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSided 35 1.0 1.2587e+00 457.3 0.00e+00 0.0 1.6e+02 1.1e+01 3.5e+01 4 0 1 0 9 7 0 16 1 13 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSidedF 3 1.0 1.0890e-03 17.7 0.00e+00 0.0 8.1e+01 7.1e+01 3.0e+00 0 0 1 0 1 0 0 8 4 1 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexCrFromFile 1 1.0 1.1356e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexCrFromOpts 1 1.0 1.1404e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0
Mesh Partition 2 1.0 6.5219e-02 1.0 0.00e+00 0.0 9.3e+01 1.7e+02 2.5e+01 1 0 1 1 6 1 0 9 10 9 0 0 0 0.00e+00 0 0.00e+00 0
Mesh Migration 2 1.0 1.3705e+00 1.0 0.00e+00 0.0 3.0e+02 1.8e+02 6.7e+01 16 0 2 4 17 31 0 30 33 25 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartSelf 1 1.0 6.5496e-05 11.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartLblInv 2 1.0 2.1289e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 2 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartLblSF 2 1.0 1.4219e-04 1.1 0.00e+00 0.0 2.4e+01 1.1e+02 2.0e+00 0 0 0 0 1 0 0 2 2 1 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartStrtSF 2 1.0 4.3987e-02 1.0 0.00e+00 0.0 1.2e+01 4.3e+02 0.0e+00 1 0 0 0 0 1 0 1 3 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPointSF 2 1.0 2.1904e-04 1.1 0.00e+00 0.0 1.5e+01 6.3e+02 0.0e+00 0 0 0 1 0 0 0 2 6 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexInterp 17 1.0 8.7852e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistribute 1 1.0 1.4344e+00 1.0 0.00e+00 0.0 9.0e+01 4.8e+02 4.2e+01 17 0 1 3 11 32 0 9 26 16 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistCones 2 1.0 1.8811e-04 1.0 0.00e+00 0.0 6.4e+01 2.9e+02 4.0e+00 0 0 0 1 1 0 0 6 12 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistLabels 2 1.0 3.6834e-04 1.0 0.00e+00 0.0 1.1e+02 1.5e+02 3.4e+01 0 0 1 1 9 0 0 11 10 13 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistOvrlp 1 1.0 1.7269e-03 1.0 0.00e+00 0.0 3.1e+02 1.1e+02 5.6e+01 0 0 2 3 14 0 0 31 22 21 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistField 3 1.0 1.3695e+00 1.0 0.00e+00 0.0 8.7e+01 1.6e+02 8.0e+00 16 0 1 1 2 31 0 9 9 3 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexGToNBegin 4 1.0 1.0629e-04 1.8 0.00e+00 0.0 2.4e+01 1.3e+02 2.0e+00 0 0 0 0 1 0 0 2 2 1 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexGToNEnd 4 1.0 1.8004e-05 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexNToGBegin 3 1.0 3.2829e-04 3.3 0.00e+00 0.0 2.1e+01 1.4e+02 2.0e+00 0 0 0 0 1 0 0 2 2 1 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexNToGEnd 3 1.0 3.2617e-04 72.0 7.70e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 1 0 0 0.00e+00 0 0.00e+00 0
DMPlexStratify 22 1.0 4.9743e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexSymmetrize 22 1.0 2.5620e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetGraph 46 1.0 4.1692e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetUp 32 1.0 1.2591e+00 638.6 0.00e+00 0.0 2.7e+02 1.1e+02 3.2e+01 4 0 2 2 8 7 0 27 19 12 0 0 0 0.00e+00 0 0.00e+00 0
SFBcastBegin 104 1.0 7.2839e-02 1.1 0.00e+00 0.0 5.5e+02 1.7e+02 0.0e+00 1 0 4 7 0 2 0 55 59 0 0 0 2 2.08e-04 0 0.00e+00 0
SFBcastEnd 104 1.0 6.8898e-03 11.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFReduceBegin 16 1.0 3.0451e-04 3.4 0.00e+00 0.0 6.0e+01 4.5e+02 0.0e+00 0 0 0 2 0 0 0 6 16 0 0 0 0 0.00e+00 0 0.00e+00 0
SFReduceEnd 16 1.0 5.2511e-04 1.9 7.70e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFFetchOpBegin 1 1.0 5.0700e-06 5.1 0.00e+00 0.0 3.0e+00 7.2e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFFetchOpEnd 1 1.0 2.0467e-04 24.3 0.00e+00 0.0 3.0e+00 7.2e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFCreateEmbed 8 1.0 1.1424e-04 1.3 0.00e+00 0.0 2.4e+01 8.0e+01 0.0e+00 0 0 0 0 0 0 0 2 1 0 0 0 0 0.00e+00 0 0.00e+00 0
SFDistSection 17 1.0 7.1412e-04 1.1 0.00e+00 0.0 2.2e+02 1.8e+02 2.1e+01 0 0 2 3 5 0 0 23 25 8 0 0 0 0.00e+00 0 0.00e+00 0
SFSectionSF 13 1.0 1.2583e+00 4649.8 0.00e+00 0.0 1.2e+02 1.1e+02 1.3e+01 4 0 1 1 3 7 0 12 8 5 0 0 0 0.00e+00 0 0.00e+00 0
SFPack 121 1.0 2.9587e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 2.08e-04 0 0.00e+00 0
SFUnpack 122 1.0 6.5939e-05 3.6 7.70e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 4 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyBegin 24 1.0 2.5960e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyEnd 24 1.0 2.1020e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatSetPreallCOO 16 1.0 5.9223e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatSetValuesCOO 16 1.0 2.9146e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecCopy 4 1.0 6.5236e-05 17.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1 2.64e-04 0 0.00e+00 0
VecSet 15 1.0 5.1213e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAssemblyBegin 3 1.0 1.1075e-03 14.1 0.00e+00 0.0 8.1e+01 7.1e+01 3.0e+00 0 0 1 0 1 0 0 8 4 1 0 0 0 0.00e+00 0 0.00e+00 0
VecAssemblyEnd 3 1.0 4.1913e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecLoad 2 1.0 6.1732e-03 1.0 0.00e+00 0.0 1.2e+01 2.0e+00 1.4e+01 0 0 0 0 4 0 0 1 0 5 0 0 0 0.00e+00 0 0.00e+00 0
VecCUDACopyTo 3 1.0 2.2604e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 3 6.16e-04 0 0.00e+00 0
VecCUDACopyFrom 4 1.0 5.1449e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 4 1.33e-03 0
DualSpaceSetUp 4 1.0 3.6423e-03 1.0 7.20e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 3 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
FESetUp 4 1.0 4.6242e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
cuBLAS Init 1 1.0 6.2119e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 14 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxCreate 2 1.0 1.9657e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxDestroy 2 1.0 3.9486e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetUp 2 1.0 8.4092e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetDevice 2 1.0 6.8342e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSync 28 1.6 3.2716e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
--- Event Stage 2: RDyAdvance solve
DMPlexGToNBegin 11 1.0 2.9155e-04 1.3 0.00e+00 0.0 3.3e+01 2.6e+02 0.0e+00 0 0 0 1 0 0 0 0 1 0 0 0 0 0.00e+00 10 2.64e-03 0
DMPlexGToNEnd 11 1.0 9.8997e-05 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFBcastBegin 1011 1.0 5.7417e-02 1.7 0.00e+00 0.0 6.0e+03 1.0e+02 0.0e+00 1 0 46 45 0 1 0 50 50 0 0 0 0 0.00e+00 0 0.00e+00 0
SFBcastEnd 1011 1.0 1.3355e-01 26.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFReduceBegin 1000 1.0 4.4428e-02 2.0 0.00e+00 0.0 6.0e+03 1.0e+02 0.0e+00 0 0 46 44 0 1 0 50 50 0 0 0 0 0.00e+00 0 0.00e+00 0
SFReduceEnd 1000 1.0 1.4207e-01 15.8 2.40e+04 1.6 0.0e+00 0.0e+00 0.0e+00 1 22 0 0 0 1 23 0 0 0 1 0 0 0.00e+00 0 0.00e+00 100
SFPack 2011 1.0 1.0423e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFUnpack 2011 1.0 6.6657e-03 1.1 2.40e+04 1.6 0.0e+00 0.0e+00 0.0e+00 0 22 0 0 0 0 23 0 0 0 12 0 0 0.00e+00 0 0.00e+00 100
VecView 88 1.0 2.0139e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 24 0 0 0 0 62 0 0 0 0 0 0 0 0.00e+00 1 2.64e-04 0
VecCopy 3000 1.0 6.4235e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 1011 1.0 8.6357e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAYPX 1000 1.0 2.6836e-02 1.0 6.60e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 75 0 0 0 1 77 0 0 0 10 31 0 0.00e+00 0 0.00e+00 100
VecCUDACopyTo 1 1.0 8.3160e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1 2.64e-04 0 0.00e+00 0
VecCUDACopyFrom 13 1.3 1.9998e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 11 2.90e-03 0
DCtxSync 5027 1.0 1.1747e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
CeedOperatorApp 2000 1.0 4.6794e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 14 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
TSStep 1000 1.0 8.2401e-01 1.2 9.00e+04 1.1 1.2e+04 1.0e+02 0.0e+00 10 97 92 88 0 24 100 100 99 0 0 1 1 2.64e-04 0 0.00e+00 100
TSFunctionEval 1000 1.0 7.7976e-01 1.2 2.40e+04 1.6 1.2e+04 1.0e+02 0.0e+00 9 22 92 88 0 23 23 100 99 0 0 0 1 2.64e-04 0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Object Type Creations Destructions. Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 49 49
Distributed Mesh 61 61
DM Label 154 154
Quadrature 110 110
Index Set 554 554
IS L to G Mapping 148 148
Section 245 245
Star Forest Graph 162 162
Discrete System 91 91
Weak Form 91 91
GraphPartitioner 23 23
Matrix 94 94
Vector 56 57
Linear Space 8 8
Dual Space 28 28
FE Space 4 4
PetscDeviceContext 2 0
Viewer 5 5
TSAdapt 1 1
TS 1 1
DMTS 1 1
SNES 1 1
DMSNES 3 3
SNESLineSearch 1 1
Krylov Solver 1 1
Preconditioner 1 1
Viewer 1 0
--- Event Stage 1: PCMPI
--- Event Stage 2: RDyAdvance solve
Vector 1045 1044
Viewer 22 22
========================================================================================================================
Average time to get PetscTime(): 3.1e-08
Average time for MPI_Barrier(): 4.2142e-06
Average time for zero size MPI_Send(): 2.525e-06
#PETSc Option Table entries:
-ceed /gpu/cuda # (source: command line)
-dist_dm_distribute_save_sf true # (source: code)
-dm_plex_filename DamBreak_grid5x10.exo # (source: code)
-dm_vec_type cuda # (source: command line)
-log_view # (source: command line)
-log_view_gpu_time # (source: command line)
-use_gpu_aware_mpi 1 # (source: command line)
#End of PETSc Option Table entries
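Entries marked "(source: code)" were inserted into the options database programmatically rather than given on the command line. A hedged sketch of how that is typically done (the option names and values are just the ones echoed above; the call site is assumed, not RDycore source):

    /* Illustrative: options tagged "(source: code)" are usually set like this
       before the objects that consume them are configured. */
    PetscCall(PetscOptionsSetValue(NULL, "-dm_plex_filename", "DamBreak_grid5x10.exo"));
    PetscCall(PetscOptionsSetValue(NULL, "-dist_dm_distribute_save_sf", "true"));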
Compiled without FORTRAN kernels
Compiled with 64-bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --CFLAGS=" -g " --CXXFLAGS=" -g " --CUDAFLAGS=" -g -Xcompiler -rdynamic " --with-fortran-bindings=1 --COPTFLAGS=" -O" --CXXOPTFLAGS=" -O" --FOPTFLAGS=" -O" --with-mpiexec="srun -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels --download-kokkos-cmake-arguments=-DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF --with-kokkos-kernels-tpl=0 --with-make-np=8 --with-64-bit-indices=1 --with-netcdf-dir=/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1 --with-pnetcdf-dir=/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1 --download-hdf5=1 --with-cuda-dir=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7 --with-cuda-arch=80 --download-parmetis --download-metis --download-muparser --download-zlib --download-scalapack --download-sowing --download-triangle --download-exodusii --download-libceed --download-cgns-commit=HEAD --with-debugging=0 PETSC_ARCH=pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5
-----------------------------------------
Libraries compiled on 2024-05-16 17:06:00 on login11
Machine characteristics: Linux-5.14.21-150400.24.81_12.0.87-cray_shasta_c-x86_64-with-glibc2.31
Using PETSc directory: /global/cfs/cdirs/m4267/petsc/petsc_main
Using PETSc arch: pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5
-----------------------------------------
Using C compiler: cc -g -fPIC -O
Using Fortran compiler: ftn -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O
-----------------------------------------
Using include paths: -I/global/cfs/cdirs/m4267/petsc/petsc_main/include -I/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/include -I/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/include -I/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/math_libs/11.7/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -L/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -lpetsc -Wl,-rpath,/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -L/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -Wl,-rpath,/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/lib -L/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/lib -Wl,-rpath,/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/lib -L/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/lib -Wl,-rpath,/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64/stubs -lscalapack -lkokkoskernels -lkokkoscontainers -lkokkoscore -lkokkossimd -lparmetis -lmetis -lexoIIv2for32 -lexodus -lmuparser -lnetcdf -lpnetcdf -lhdf5_hl -lhdf5 -ltriangle -lz -lceed -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lstdc++ -lquadmath
-----------------------------------------
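For reference, the 1000 TSStep / TSFunctionEval events in the "RDyAdvance solve" stage come from the explicit forward Euler integration reported at the top of the log (1000 steps to t = 0.005 hr). A generic PETSc sketch of that setup, with the RHS callback, step size, and final time left as placeholders (none of this is taken from RDycore):

    /* Illustrative forward-Euler time stepping with PETSc TS.  RHSFunction,
       dt, and final_time are placeholders, not values extracted from this run. */
    #include <petscts.h>

    extern PetscErrorCode RHSFunction(TS, PetscReal, Vec, Vec, void *);  /* user-supplied */

    PetscErrorCode AdvanceExplicit(MPI_Comm comm, Vec X, PetscReal dt, PetscReal final_time)
    {
      TS ts;

      PetscFunctionBeginUser;
      PetscCall(TSCreate(comm, &ts));
      PetscCall(TSSetType(ts, TSEULER));                /* "forward euler" in the log header */
      PetscCall(TSSetRHSFunction(ts, NULL, RHSFunction, NULL));
      PetscCall(TSSetTimeStep(ts, dt));
      PetscCall(TSSetMaxTime(ts, final_time));
      PetscCall(TSSetExactFinalTime(ts, TS_EXACTFINALTIME_MATCHSTEP));
      PetscCall(TSSetFromOptions(ts));                  /* honors -ts_* command-line options */
      PetscCall(TSSolve(ts, X));
      PetscCall(TSDestroy(&ts));
      PetscFunctionReturn(PETSC_SUCCESS);
    }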