srun -G4 -N 1 -n 4 ../../rdycore ex2b_ic_file.yaml -use_gpu_aware_mpi 0 -log_view -log_view_gpu_time -ceed /gpu/cuda -dm_vec_type cuda
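For reference, the same run could be wrapped in a Slurm batch script along the following lines (a minimal sketch; the account, queue, and walltime values are placeholders, not taken from this run):

  #!/bin/bash
  #SBATCH -N 1                  # one node, as in the interactive run
  #SBATCH --gpus-per-node=4     # four GPUs, matching srun -G4
  #SBATCH -A <account>          # placeholder project account
  #SBATCH -q <queue>            # placeholder queue
  #SBATCH -t 00:10:00           # placeholder walltime

  # 4 MPI ranks, libCEED CUDA backend, CUDA vectors, GPU-aware MPI disabled,
  # PETSc logging with GPU kernel timing
  srun -G4 -N 1 -n 4 ../../rdycore ex2b_ic_file.yaml \
      -use_gpu_aware_mpi 0 -log_view -log_view_gpu_time \
      -ceed /gpu/cuda -dm_vec_type cuda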
DETAIL: ==========================================================
DETAIL: RDycore (input read from ex2b_ic_file.yaml)
DETAIL: ==========================================================
DETAIL: Physics:
DETAIL:   Flow:
DETAIL:     Bed friction: disabled
DETAIL:   Sediment model: disabled
DETAIL:   Salinity model: disabled
DETAIL: Numerics:
DETAIL:   Spatial discretization: finite volume (FV)
DETAIL:   Temporal discretization: forward euler
DETAIL:   Riemann solver: roe
DETAIL: Time:
DETAIL:   Final time: 0.005 hours
DETAIL: Logging:
DETAIL:   Primary log file: <stdout>
DETAIL: Checkpoint:
DETAIL:   (disabled)
DETAIL: Restart:
DETAIL:   (disabled)
DETAIL: Advancing from t = 0. to 0.005...
DETAIL: Step 0: writing XDMF HDF5 output at t = 0. hr to output/ex2b_ic_file-0.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
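This MPIIO warning repeats at every HDF5 write below. Per the intro_mpi man page it cites, the DVS stripe-width cap can be raised through the MPICH_MPIIO_DVS_MAXNODES environment variable before launching the run, e.g. (a sketch; the value 24 simply matches the width requested here, and whether a wider stripe actually helps depends on the file-system mount):

  export MPICH_MPIIO_DVS_MAXNODES=24
  srun -G4 -N 1 -n 4 ../../rdycore ex2b_ic_file.yaml -use_gpu_aware_mpi 0 -log_view -log_view_gpu_time -ceed /gpu/cuda -dm_vec_type cuda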
DETAIL: Step 0: writing XDMF XMF output at t = 0. hr to output/ex2b_ic_file-0000.xmf
DETAIL: Step 100: writing XDMF HDF5 output at t = 0.0005 hr to output/ex2b_ic_file-100.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 100: writing XDMF XMF output at t = 0.0005 hr to output/ex2b_ic_file-0100.xmf
DETAIL: Step 200: writing XDMF HDF5 output at t = 0.001 hr to output/ex2b_ic_file-200.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 200: writing XDMF XMF output at t = 0.001 hr to output/ex2b_ic_file-0200.xmf
DETAIL: Step 300: writing XDMF HDF5 output at t = 0.0015 hr to output/ex2b_ic_file-300.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 300: writing XDMF XMF output at t = 0.0015 hr to output/ex2b_ic_file-0300.xmf
DETAIL: Step 400: writing XDMF HDF5 output at t = 0.002 hr to output/ex2b_ic_file-400.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 400: writing XDMF XMF output at t = 0.002 hr to output/ex2b_ic_file-0400.xmf
DETAIL: Step 500: writing XDMF HDF5 output at t = 0.0025 hr to output/ex2b_ic_file-500.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 500: writing XDMF XMF output at t = 0.0025 hr to output/ex2b_ic_file-0500.xmf
DETAIL: Step 600: writing XDMF HDF5 output at t = 0.003 hr to output/ex2b_ic_file-600.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 600: writing XDMF XMF output at t = 0.003 hr to output/ex2b_ic_file-0600.xmf
DETAIL: Step 700: writing XDMF HDF5 output at t = 0.0035 hr to output/ex2b_ic_file-700.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 700: writing XDMF XMF output at t = 0.0035 hr to output/ex2b_ic_file-0700.xmf
DETAIL: Step 800: writing XDMF HDF5 output at t = 0.004 hr to output/ex2b_ic_file-800.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 800: writing XDMF XMF output at t = 0.004 hr to output/ex2b_ic_file-0800.xmf
DETAIL: Step 900: writing XDMF HDF5 output at t = 0.0045 hr to output/ex2b_ic_file-900.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 900: writing XDMF XMF output at t = 0.0045 hr to output/ex2b_ic_file-0900.xmf
DETAIL: Step 1000: writing XDMF HDF5 output at t = 0.005 hr to output/ex2b_ic_file-1000.h5
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page.
DETAIL: Step 1000: writing XDMF XMF output at t = 0.005 hr to output/ex2b_ic_file-1000.xmf
****************************************************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
****************************************************************************************************************************************************************
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
##########################################################
#                                                        #
#                       WARNING!!!                       #
#                                                        #
#   This code was run with -log_view_gpu_time            #
#   This provides accurate timing within the GPU kernels #
#   but can slow down the entire computation by a        #
#   measurable amount. For fastest runs we recommend     #
#   not using this option.                               #
#                                                        #
##########################################################
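As the banner says, -log_view_gpu_time buys accurate per-kernel GPU timings at the cost of some overall speed. For runs meant for wall-clock comparison, the same command can simply be repeated without that flag:

  srun -G4 -N 1 -n 4 ../../rdycore ex2b_ic_file.yaml -use_gpu_aware_mpi 0 -log_view -ceed /gpu/cuda -dm_vec_type cuda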
/global/cfs/cdirs/m4267/gbisht/rdycore/build-pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/driver/tests/swe_roe/../../rdycore on a pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5 named nid001008 with 4 processors, by gbisht Thu May 16 14:07:17 2024
Using Petsc Development GIT revision: v3.20.2-300-gfc2888174f5 GIT Date: 2023-12-13 19:44:14 +0000
                            Max       Max/Min     Avg       Total
Time (sec):              7.575e+00     1.000   7.575e+00
Objects:                 0.000e+00     0.000   0.000e+00
Flops:                   9.248e+04     1.108   8.796e+04  3.518e+05
Flops/sec:               1.221e+04     1.108   1.161e+04  4.645e+04
MPI Msg Count:           4.370e+03     2.009   3.257e+03  1.303e+04
MPI Msg Len (bytes):     4.296e+05     1.452   1.089e+02  1.419e+06
MPI Reductions:          3.920e+02     1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
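As a concrete tie-in to the numbers below: VecAYPX (y = x + beta*y), like VecAXPY, does one multiply and one add per vector entry, i.e. 2N flops per call. The 6.60e+04 flops reported for the 1000 VecAYPX calls in the RDyAdvance stage therefore correspond to a local vector of roughly N = 66000 / (1000 * 2) = 33 entries per rank (the flop ratio for that row is 1.0, so all four ranks report about the same count).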
Summary of Stages:      ----- Time ------   ----- Flop ------    --- Messages ---    -- Message Lengths --   -- Reductions --
                           Avg     %Total      Avg     %Total     Count   %Total       Avg        %Total      Count   %Total
 0:        Main Stage:  3.7435e+00  49.4%   9.8310e+03   2.8%   9.940e+02   7.6%   1.629e+02       11.4%   2.640e+02  67.3%
 2:  RDyAdvance solve:  3.2185e+00  42.5%   3.4200e+05  97.2%   1.203e+04  92.4%   1.044e+02       88.6%   1.100e+02  28.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
CpuToGpu Count: total number of CPU to GPU copies per processor
CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
GpuToCpu Count: total number of GPU to CPU copies per processor
GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
GPU %F: percent flops on GPU in this event
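A worked check of the Total Mflop/s formula against the VecAYPX row in stage 2 below: each of the four ranks reports about 6.60e+04 flops (flop ratio 1.0), so the sum over processors is roughly 2.6e+05; dividing by the max time of 2.6781e-02 s and converting flop/s to Mflop/s gives about 10, which is the value printed in that row's Total Mflop/s column.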
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 1 1.0 6.3476e-04 112.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSided 35 1.0 1.3420e+00 1729.8 0.00e+00 0.0 1.6e+02 1.1e+01 3.5e+01 4 0 1 0 9 9 0 16 1 13 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSidedF 3 1.0 7.4845e-05 1.3 0.00e+00 0.0 8.1e+01 7.1e+01 3.0e+00 0 0 1 0 1 0 0 8 4 1 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexCrFromFile 1 1.0 1.3262e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexCrFromOpts 1 1.0 1.3312e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0
Mesh Partition 2 1.0 1.9862e-03 1.2 0.00e+00 0.0 9.3e+01 1.7e+02 2.5e+01 0 0 1 1 6 0 0 9 10 9 0 0 0 0.00e+00 0 0.00e+00 0
Mesh Migration 2 1.0 1.3525e+00 1.0 0.00e+00 0.0 3.0e+02 1.8e+02 6.7e+01 18 0 2 4 17 36 0 30 33 25 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartSelf 1 1.0 7.3823e-05 13.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartLblInv 2 1.0 2.1545e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 2 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartLblSF 2 1.0 1.4288e-04 1.2 0.00e+00 0.0 2.4e+01 1.1e+02 2.0e+00 0 0 0 0 1 0 0 2 2 1 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartStrtSF 2 1.0 1.0393e-04 1.1 0.00e+00 0.0 1.2e+01 4.3e+02 0.0e+00 0 0 0 0 0 0 0 1 3 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPointSF 2 1.0 2.0609e-04 1.1 0.00e+00 0.0 1.5e+01 6.3e+02 0.0e+00 0 0 0 1 0 0 0 2 6 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexInterp 17 1.0 8.8179e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistribute 1 1.0 1.3535e+00 1.0 0.00e+00 0.0 9.0e+01 4.8e+02 4.2e+01 18 0 1 3 11 36 0 9 26 16 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistCones 2 1.0 1.7515e-04 1.0 0.00e+00 0.0 6.4e+01 2.9e+02 4.0e+00 0 0 0 1 1 0 0 6 12 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistLabels 2 1.0 3.4997e-04 1.0 0.00e+00 0.0 1.1e+02 1.5e+02 3.4e+01 0 0 1 1 9 0 0 11 10 13 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistOvrlp 1 1.0 1.3654e-03 1.0 0.00e+00 0.0 3.1e+02 1.1e+02 5.6e+01 0 0 2 3 14 0 0 31 22 21 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistField 3 1.0 1.3515e+00 1.0 0.00e+00 0.0 8.7e+01 1.6e+02 8.0e+00 18 0 1 1 2 36 0 9 9 3 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexGToNBegin 4 1.0 8.5284e-05 1.5 0.00e+00 0.0 2.4e+01 1.3e+02 2.0e+00 0 0 0 0 1 0 0 2 2 1 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexGToNEnd 4 1.0 1.7975e-05 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexNToGBegin 3 1.0 3.2183e-04 3.7 0.00e+00 0.0 2.1e+01 1.4e+02 2.0e+00 0 0 0 0 1 0 0 2 2 1 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexNToGEnd 3 1.0 3.2636e-04 52.4 7.70e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 1 0 0 0.00e+00 0 0.00e+00 0
DMPlexStratify 22 1.0 4.8805e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexSymmetrize 22 1.0 2.9021e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetGraph 46 1.0 3.5877e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetUp 32 1.0 1.3423e+00 1312.7 0.00e+00 0.0 2.7e+02 1.1e+02 3.2e+01 4 0 2 2 8 9 0 27 19 12 0 0 0 0.00e+00 0 0.00e+00 0
SFBcastBegin 104 1.0 1.5670e-02 1.3 0.00e+00 0.0 5.5e+02 1.7e+02 0.0e+00 0 0 4 7 0 0 0 55 59 0 0 0 2 2.08e-04 6 7.28e-04 0
SFBcastEnd 104 1.0 3.9223e-03 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 6 7.28e-04 0 0.00e+00 0
SFReduceBegin 16 1.0 1.3290e-04 1.4 0.00e+00 0.0 6.0e+01 4.5e+02 0.0e+00 0 0 0 2 0 0 0 6 16 0 0 0 0 0.00e+00 0 0.00e+00 0
SFReduceEnd 16 1.0 5.2206e-04 2.0 7.70e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFFetchOpBegin 1 1.0 5.0690e-06 4.7 0.00e+00 0.0 3.0e+00 7.2e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFFetchOpEnd 1 1.0 4.1470e-05 4.9 0.00e+00 0.0 3.0e+00 7.2e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFCreateEmbed 8 1.0 1.1387e-04 1.3 0.00e+00 0.0 2.4e+01 8.0e+01 0.0e+00 0 0 0 0 0 0 0 2 1 0 0 0 0 0.00e+00 0 0.00e+00 0
SFDistSection 17 1.0 6.9383e-04 1.1 0.00e+00 0.0 2.2e+02 1.8e+02 2.1e+01 0 0 2 3 5 0 0 23 25 8 0 0 0 0.00e+00 0 0.00e+00 0
SFSectionSF 13 1.0 1.3416e+00 4809.8 0.00e+00 0.0 1.2e+02 1.1e+02 1.3e+01 4 0 1 1 3 9 0 12 8 5 0 0 0 0.00e+00 0 0.00e+00 0
SFPack 121 1.0 2.7338e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 2.08e-04 0 0.00e+00 0
SFUnpack 122 1.0 6.6949e-05 3.8 7.70e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 3 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyBegin 24 1.0 2.4680e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyEnd 24 1.0 2.1028e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatSetPreallCOO 16 1.0 6.3122e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatSetValuesCOO 16 1.0 2.8285e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecCopy 4 1.0 6.5717e-05 15.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1 2.64e-04 0 0.00e+00 0
VecSet 15 1.0 4.7812e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAssemblyBegin 3 1.0 9.1377e-05 1.2 0.00e+00 0.0 8.1e+01 7.1e+01 3.0e+00 0 0 1 0 1 0 0 8 4 1 0 0 0 0.00e+00 0 0.00e+00 0
VecAssemblyEnd 3 1.0 4.5498e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecLoad 2 1.0 4.8150e-03 1.0 0.00e+00 0.0 1.2e+01 2.0e+00 1.4e+01 0 0 0 0 4 0 0 1 0 5 0 0 0 0.00e+00 0 0.00e+00 0
VecCUDACopyTo 3 1.0 3.3434e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 3 6.16e-04 0 0.00e+00 0
VecCUDACopyFrom 4 1.0 4.9968e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 4 1.33e-03 0
DualSpaceSetUp 4 1.0 3.5681e-03 1.0 7.20e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 3 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
FESetUp 4 1.0 4.3868e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
cuBLAS Init 1 1.0 6.1798e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 16 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxCreate 2 1.0 1.9779e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxDestroy 2 1.0 3.9696e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetUp 2 1.0 8.8883e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSetDevice 2 1.0 1.2130e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DCtxSync 28 1.6 3.0938e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
--- Event Stage 2: RDyAdvance solve
DMPlexGToNBegin 11 1.0 2.8097e-04 1.3 0.00e+00 0.0 3.3e+01 2.6e+02 0.0e+00 0 0 0 1 0 0 0 0 1 0 0 0 0 0.00e+00 10 2.64e-03 0
DMPlexGToNEnd 11 1.0 1.2229e-04 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFBcastBegin 1011 1.0 4.1643e-02 1.1 0.00e+00 0.0 6.0e+03 1.0e+02 0.0e+00 1 0 46 45 0 1 0 50 50 0 0 0 0 0.00e+00 2000 3.12e-01 0
SFBcastEnd 1011 1.0 8.3698e-02 13.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 2000 3.12e-01 0 0.00e+00 0
SFReduceBegin 1000 1.0 3.1020e-02 1.2 0.00e+00 0.0 6.0e+03 1.0e+02 0.0e+00 0 0 46 44 0 1 0 50 50 0 0 0 0 0.00e+00 2000 3.12e-01 0
SFReduceEnd 1000 1.0 7.5066e-02 6.2 2.40e+04 1.6 0.0e+00 0.0e+00 0.0e+00 0 22 0 0 0 1 23 0 0 0 1 0 2000 3.12e-01 0 0.00e+00 100
SFPack 2011 1.0 1.0348e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFUnpack 2011 1.0 6.3307e-03 1.0 2.40e+04 1.6 0.0e+00 0.0e+00 0.0e+00 0 22 0 0 0 0 23 0 0 0 12 0 0 0.00e+00 0 0.00e+00 100
VecView 88 1.0 1.8562e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 24 0 0 0 0 58 0 0 0 0 0 0 0 0.00e+00 1 2.64e-04 0
VecCopy 3000 1.0 6.4494e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 1011 1.0 8.5404e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAYPX 1000 1.0 2.6781e-02 1.0 6.60e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 75 0 0 0 1 77 0 0 0 10 31 0 0.00e+00 0 0.00e+00 100
VecCUDACopyTo 1 1.0 7.2540e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1 2.64e-04 0 0.00e+00 0
VecCUDACopyFrom 13 1.3 1.8317e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 11 2.90e-03 0
DCtxSync 5027 1.0 1.1488e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
CeedOperatorApp 2000 1.0 4.6278e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 14 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
TSStep 1000 1.0 7.5857e-01 1.1 9.00e+04 1.1 1.2e+04 1.0e+02 0.0e+00 10 97 92 88 0 23 100 100 99 0 0 1 4001 6.24e-01 4000 6.24e-01 100
TSFunctionEval 1000 1.0 7.1254e-01 1.1 2.40e+04 1.6 1.2e+04 1.0e+02 0.0e+00 9 22 92 88 0 21 23 100 99 0 0 0 4001 6.24e-01 4000 6.24e-01 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Object Type Creations Destructions. Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 49 49
Distributed Mesh 61 61
DM Label 154 154
Quadrature 110 110
Index Set 554 554
IS L to G Mapping 148 148
Section 245 245
Star Forest Graph 162 162
Discrete System 91 91
Weak Form 91 91
GraphPartitioner 23 23
Matrix 94 94
Vector 56 57
Linear Space 8 8
Dual Space 28 28
FE Space 4 4
PetscDeviceContext 2 0
Viewer 5 5
TSAdapt 1 1
TS 1 1
DMTS 1 1
SNES 1 1
DMSNES 3 3
SNESLineSearch 1 1
Krylov Solver 1 1
Preconditioner 1 1
Viewer 1 0
--- Event Stage 1: PCMPI
--- Event Stage 2: RDyAdvance solve
Vector 1045 1044
Viewer 22 22
========================================================================================================================
Average time to get PetscTime(): 3.71e-08
Average time for MPI_Barrier(): 5.0938e-06
Average time for zero size MPI_Send(): 2.3045e-06
#PETSc Option Table entries:
-ceed /gpu/cuda # (source: command line)
-dist_dm_distribute_save_sf true # (source: code)
-dm_plex_filename DamBreak_grid5x10.exo # (source: code)
-dm_vec_type cuda # (source: command line)
-log_view # (source: command line)
-log_view_gpu_time # (source: command line)
-use_gpu_aware_mpi 0 # (source: command line)
#End of PETSc Option Table entries
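The command-line entries above could also be collected in a PETSc options file and passed with -options_file, which keeps the srun invocation short (a sketch; the file name is hypothetical, and the options sourced from code are intentionally left out):

  # rdycore_gpu.opts  (hypothetical file; use as: ../../rdycore ex2b_ic_file.yaml -options_file rdycore_gpu.opts)
  -ceed /gpu/cuda
  -dm_vec_type cuda
  -use_gpu_aware_mpi 0
  -log_view
  -log_view_gpu_time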
Compiled without FORTRAN kernels
Compiled with 64-bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --CFLAGS=" -g " --CXXFLAGS=" -g " --CUDAFLAGS=" -g -Xcompiler -rdynamic " --with-fortran-bindings=1 --COPTFLAGS=" -O" --CXXOPTFLAGS=" -O" --FOPTFLAGS=" -O" --with-mpiexec="srun -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels --download-kokkos-cmake-arguments=-DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF --with-kokkos-kernels-tpl=0 --with-make-np=8 --with-64-bit-indices=1 --with-netcdf-dir=/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1 --with-pnetcdf-dir=/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1 --download-hdf5=1 --with-cuda-dir=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7 --with-cuda-arch=80 --download-parmetis --download-metis --download-muparser --download-zlib --download-scalapack --download-sowing --download-triangle --download-exodusii --download-libceed --download-cgns-commit=HEAD --with-debugging=0 PETSC_ARCH=pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5
-----------------------------------------
Libraries compiled on 2024-05-16 17:06:00 on login11
Machine characteristics: Linux-5.14.21-150400.24.81_12.0.87-cray_shasta_c-x86_64-with-glibc2.31
Using PETSc directory: /global/cfs/cdirs/m4267/petsc/petsc_main
Using PETSc arch: pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5
-----------------------------------------
Using C compiler: cc -g -fPIC -O
Using Fortran compiler: ftn -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O
-----------------------------------------
Using include paths: -I/global/cfs/cdirs/m4267/petsc/petsc_main/include -I/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/include -I/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/include -I/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/math_libs/11.7/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -L/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -lpetsc -Wl,-rpath,/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -L/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -Wl,-rpath,/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/lib -L/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/lib -Wl,-rpath,/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/lib -L/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/lib -Wl,-rpath,/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64/stubs -lscalapack -lkokkoskernels -lkokkoscontainers -lkokkoscore -lkokkossimd -lparmetis -lmetis -lexoIIv2for32 -lexodus -lmuparser -lnetcdf -lpnetcdf -lhdf5_hl -lhdf5 -ltriangle -lz -lceed -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lstdc++ -lquadmath
-----------------------------------------