Created
May 16, 2024 21:08
srun -G4 -N 1 -n 4 ../../rdycore ex2b_ic_file.yaml -use_gpu_aware_mpi 0 -log_view -log_view_gpu_time -ceed /gpu/cuda -dm_vec_type cuda | |
DETAIL: ========================================================== | |
DETAIL: RDycore (input read from ex2b_ic_file.yaml) | |
DETAIL: ========================================================== | |
DETAIL: Physics: | |
DETAIL: Flow: | |
DETAIL: Bed friction: disabled | |
DETAIL: Sediment model: disabled | |
DETAIL: Salinity model: disabled | |
DETAIL: Numerics: | |
DETAIL: Spatial discretization: finite volume (FV) | |
DETAIL: Temporal discretization: forward euler | |
DETAIL: Riemann solver: roe | |
DETAIL: Time: | |
DETAIL: Final time: 0.005 hours | |
DETAIL: Logging: | |
DETAIL: Primary log file: <stdout> | |
DETAIL: Checkpoint: | |
DETAIL: (disabled) | |
DETAIL: Restart: | |
DETAIL: (disabled) | |
DETAIL: Advancing from t = 0. to 0.005... | |
DETAIL: Step 0: writing XDMF HDF5 output at t = 0. hr to output/ex2b_ic_file-0.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 0: writing XDMF XMF output at t = 0. hr to output/ex2b_ic_file-0000.xmf | |
DETAIL: Step 100: writing XDMF HDF5 output at t = 0.0005 hr to output/ex2b_ic_file-100.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 100: writing XDMF XMF output at t = 0.0005 hr to output/ex2b_ic_file-0100.xmf | |
DETAIL: Step 200: writing XDMF HDF5 output at t = 0.001 hr to output/ex2b_ic_file-200.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 200: writing XDMF XMF output at t = 0.001 hr to output/ex2b_ic_file-0200.xmf | |
DETAIL: Step 300: writing XDMF HDF5 output at t = 0.0015 hr to output/ex2b_ic_file-300.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 300: writing XDMF XMF output at t = 0.0015 hr to output/ex2b_ic_file-0300.xmf | |
DETAIL: Step 400: writing XDMF HDF5 output at t = 0.002 hr to output/ex2b_ic_file-400.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 400: writing XDMF XMF output at t = 0.002 hr to output/ex2b_ic_file-0400.xmf | |
DETAIL: Step 500: writing XDMF HDF5 output at t = 0.0025 hr to output/ex2b_ic_file-500.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 500: writing XDMF XMF output at t = 0.0025 hr to output/ex2b_ic_file-0500.xmf | |
DETAIL: Step 600: writing XDMF HDF5 output at t = 0.003 hr to output/ex2b_ic_file-600.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 600: writing XDMF XMF output at t = 0.003 hr to output/ex2b_ic_file-0600.xmf | |
DETAIL: Step 700: writing XDMF HDF5 output at t = 0.0035 hr to output/ex2b_ic_file-700.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 700: writing XDMF XMF output at t = 0.0035 hr to output/ex2b_ic_file-0700.xmf | |
DETAIL: Step 800: writing XDMF HDF5 output at t = 0.004 hr to output/ex2b_ic_file-800.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 800: writing XDMF XMF output at t = 0.004 hr to output/ex2b_ic_file-0800.xmf | |
DETAIL: Step 900: writing XDMF HDF5 output at t = 0.0045 hr to output/ex2b_ic_file-900.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 900: writing XDMF XMF output at t = 0.0045 hr to output/ex2b_ic_file-0900.xmf | |
DETAIL: Step 1000: writing XDMF HDF5 output at t = 0.005 hr to output/ex2b_ic_file-1000.h5 | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
MPIIO WARNING: DVS stripe width of 24 was requested but DVS set it to 1 | |
See MPICH_MPIIO_DVS_MAXNODES in the intro_mpi man page. | |
DETAIL: Step 1000: writing XDMF XMF output at t = 0.005 hr to output/ex2b_ic_file-1000.xmf | |
**************************************************************************************************************************************************************** | |
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** | |
**************************************************************************************************************************************************************** | |
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ | |
########################################################## | |
# # | |
# WARNING!!! # | |
# # | |
# This code was run with -log_view_gpu_time # | |
# This provides accurate timing within the GPU kernels # | |
# but can slow down the entire computation by a # | |
# measurable amount. For fastest runs we recommend # | |
# not using this option. # | |
# # | |
########################################################## | |
/global/cfs/cdirs/m4267/gbisht/rdycore/build-pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/driver/tests/swe_roe/../../rdycore on a pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5 named nid001008 with 4 processors, by gbisht Thu May 16 14:07:17 2024 | |
Using Petsc Development GIT revision: v3.20.2-300-gfc2888174f5 GIT Date: 2023-12-13 19:44:14 +0000 | |
Max Max/Min Avg Total | |
Time (sec): 7.575e+00 1.000 7.575e+00 | |
Objects: 0.000e+00 0.000 0.000e+00 | |
Flops: 9.248e+04 1.108 8.796e+04 3.518e+05 | |
Flops/sec: 1.221e+04 1.108 1.161e+04 4.645e+04 | |
MPI Msg Count: 4.370e+03 2.009 3.257e+03 1.303e+04 | |
MPI Msg Len (bytes): 4.296e+05 1.452 1.089e+02 1.419e+06 | |
MPI Reductions: 3.920e+02 1.000 | |
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) | |
e.g., VecAXPY() for real vectors of length N --> 2N flops | |
and VecAXPY() for complex vectors of length N --> 8N flops | |
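The flop counting convention above can be sketched in a few lines. This is only an illustration of the stated rule (2N flops for real vectors, 8N for complex); the function name `axpy_flops` is hypothetical and is not part of PETSc.

```python
def axpy_flops(n, complex_scalars=False):
    """Flops charged to one VecAXPY() on vectors of length n.

    Follows the convention printed in the log: 2N flops for real
    vectors (one multiply and one add per entry) and 8N for complex.
    """
    per_entry = 8 if complex_scalars else 2
    return per_entry * n


real_cost = axpy_flops(100)
complex_cost = axpy_flops(100, complex_scalars=True)
print(real_cost, complex_cost)
```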
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- | |
Avg %Total Avg %Total Count %Total Avg %Total Count %Total | |
0: Main Stage: 3.7435e+00 49.4% 9.8310e+03 2.8% 9.940e+02 7.6% 1.629e+02 11.4% 2.640e+02 67.3% | |
2: RDyAdvance solve: 3.2185e+00 42.5% 3.4200e+05 97.2% 1.203e+04 92.4% 1.044e+02 88.6% 1.100e+02 28.1% | |
------------------------------------------------------------------------------------------------------------------------ | |
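The %Total time column in the stage summary can be reproduced from the numbers printed above. The sketch below assumes that %Total is the stage's average time divided by the overall run time (7.575e+00 s); the helper name `percent_of_total` is hypothetical.

```python
def percent_of_total(stage_value, total_value):
    """Share of a stage in the whole run, in percent."""
    return 100.0 * stage_value / total_value


# Values copied from the summary and stage tables in this log.
total_time = 7.575e+00
stage_times = {
    "Main Stage": 3.7435e+00,
    "RDyAdvance solve": 3.2185e+00,
}

shares = {name: percent_of_total(t, total_time) for name, t in stage_times.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Under that assumption the two listed stages come to about 92% of the wall time; the log does not show where the remainder was spent.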
See the 'Profiling' chapter of the users' manual for details on interpreting output. | |
Phase summary info: | |
Count: number of times phase was executed | |
Time and Flop: Max - maximum over all processors | |
Ratio - ratio of maximum to minimum over all processors | |
Mess: number of messages sent | |
AvgLen: average message length (bytes) | |
Reduct: number of global reductions | |
Global: entire computation | |
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). | |
%T - percent time in this phase %F - percent flop in this phase | |
%M - percent messages in this phase %L - percent message lengths in this phase | |
%R - percent reductions in this phase | |
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) | |
GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) | |
CpuToGpu Count: total number of CPU to GPU copies per processor | |
CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) | |
GpuToCpu Count: total number of GPU to CPU copies per processor | |
GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) | |
GPU %F: percent flops on GPU in this event | |
------------------------------------------------------------------------------------------------------------------------ | |
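The "Total Mflop/s" definition in the legend can be checked against the run-wide totals. This sketch uses 1e-6 as the scale factor, since one Mflop is 10^6 flop; reading the legend's `10e-6` that way is an assumption. The function name `mflops` is hypothetical.

```python
def mflops(total_flop, max_time_sec):
    """Mflop/s: flop summed over all processes divided by the max time."""
    return 1e-6 * total_flop / max_time_sec


# Totals from the performance summary in this log:
# Flops (Total) = 3.518e+05, Time (Max) = 7.575e+00 s.
rate = mflops(3.518e+05, 7.575e+00)
print(f"{rate:.4f} Mflop/s")
```

The result is roughly 0.046 Mflop/s, consistent with the 4.645e+04 total Flops/sec reported in the summary.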
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU | |
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F | |
--------------------------------------------------------------------------------------------------------------------------------------------------------------- | |
--- Event Stage 0: Main Stage | |
PetscBarrier 1 1.0 6.3476e-04 112.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
BuildTwoSided 35 1.0 1.3420e+00 1729.8 0.00e+00 0.0 1.6e+02 1.1e+01 3.5e+01 4 0 1 0 9 9 0 16 1 13 0 0 0 0.00e+00 0 0.00e+00 0 | |
BuildTwoSidedF 3 1.0 7.4845e-05 1.3 0.00e+00 0.0 8.1e+01 7.1e+01 3.0e+00 0 0 1 0 1 0 0 8 4 1 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexCrFromFile 1 1.0 1.3262e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexCrFromOpts 1 1.0 1.3312e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0 | |
Mesh Partition 2 1.0 1.9862e-03 1.2 0.00e+00 0.0 9.3e+01 1.7e+02 2.5e+01 0 0 1 1 6 0 0 9 10 9 0 0 0 0.00e+00 0 0.00e+00 0 | |
Mesh Migration 2 1.0 1.3525e+00 1.0 0.00e+00 0.0 3.0e+02 1.8e+02 6.7e+01 18 0 2 4 17 36 0 30 33 25 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexPartSelf 1 1.0 7.3823e-05 13.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexPartLblInv 2 1.0 2.1545e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 2 2 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexPartLblSF 2 1.0 1.4288e-04 1.2 0.00e+00 0.0 2.4e+01 1.1e+02 2.0e+00 0 0 0 0 1 0 0 2 2 1 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexPartStrtSF 2 1.0 1.0393e-04 1.1 0.00e+00 0.0 1.2e+01 4.3e+02 0.0e+00 0 0 0 0 0 0 0 1 3 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexPointSF 2 1.0 2.0609e-04 1.1 0.00e+00 0.0 1.5e+01 6.3e+02 0.0e+00 0 0 0 1 0 0 0 2 6 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexInterp 17 1.0 8.8179e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexDistribute 1 1.0 1.3535e+00 1.0 0.00e+00 0.0 9.0e+01 4.8e+02 4.2e+01 18 0 1 3 11 36 0 9 26 16 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexDistCones 2 1.0 1.7515e-04 1.0 0.00e+00 0.0 6.4e+01 2.9e+02 4.0e+00 0 0 0 1 1 0 0 6 12 2 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexDistLabels 2 1.0 3.4997e-04 1.0 0.00e+00 0.0 1.1e+02 1.5e+02 3.4e+01 0 0 1 1 9 0 0 11 10 13 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexDistOvrlp 1 1.0 1.3654e-03 1.0 0.00e+00 0.0 3.1e+02 1.1e+02 5.6e+01 0 0 2 3 14 0 0 31 22 21 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexDistField 3 1.0 1.3515e+00 1.0 0.00e+00 0.0 8.7e+01 1.6e+02 8.0e+00 18 0 1 1 2 36 0 9 9 3 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexGToNBegin 4 1.0 8.5284e-05 1.5 0.00e+00 0.0 2.4e+01 1.3e+02 2.0e+00 0 0 0 0 1 0 0 2 2 1 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexGToNEnd 4 1.0 1.7975e-05 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexNToGBegin 3 1.0 3.2183e-04 3.7 0.00e+00 0.0 2.1e+01 1.4e+02 2.0e+00 0 0 0 0 1 0 0 2 2 1 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexNToGEnd 3 1.0 3.2636e-04 52.4 7.70e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 1 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexStratify 22 1.0 4.8805e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0 | |
DMPlexSymmetrize 22 1.0 2.9021e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFSetGraph 46 1.0 3.5877e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFSetUp 32 1.0 1.3423e+00 1312.7 0.00e+00 0.0 2.7e+02 1.1e+02 3.2e+01 4 0 2 2 8 9 0 27 19 12 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFBcastBegin 104 1.0 1.5670e-02 1.3 0.00e+00 0.0 5.5e+02 1.7e+02 0.0e+00 0 0 4 7 0 0 0 55 59 0 0 0 2 2.08e-04 6 7.28e-04 0 | |
SFBcastEnd 104 1.0 3.9223e-03 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 6 7.28e-04 0 0.00e+00 0 | |
SFReduceBegin 16 1.0 1.3290e-04 1.4 0.00e+00 0.0 6.0e+01 4.5e+02 0.0e+00 0 0 0 2 0 0 0 6 16 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFReduceEnd 16 1.0 5.2206e-04 2.0 7.70e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFFetchOpBegin 1 1.0 5.0690e-06 4.7 0.00e+00 0.0 3.0e+00 7.2e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFFetchOpEnd 1 1.0 4.1470e-05 4.9 0.00e+00 0.0 3.0e+00 7.2e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFCreateEmbed 8 1.0 1.1387e-04 1.3 0.00e+00 0.0 2.4e+01 8.0e+01 0.0e+00 0 0 0 0 0 0 0 2 1 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFDistSection 17 1.0 6.9383e-04 1.1 0.00e+00 0.0 2.2e+02 1.8e+02 2.1e+01 0 0 2 3 5 0 0 23 25 8 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFSectionSF 13 1.0 1.3416e+00 4809.8 0.00e+00 0.0 1.2e+02 1.1e+02 1.3e+01 4 0 1 1 3 9 0 12 8 5 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFPack 121 1.0 2.7338e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 2.08e-04 0 0.00e+00 0 | |
SFUnpack 122 1.0 6.6949e-05 3.8 7.70e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 3 0 0 0.00e+00 0 0.00e+00 0 | |
MatAssemblyBegin 24 1.0 2.4680e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
MatAssemblyEnd 24 1.0 2.1028e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
MatSetPreallCOO 16 1.0 6.3122e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
MatSetValuesCOO 16 1.0 2.8285e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
VecCopy 4 1.0 6.5717e-05 15.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1 2.64e-04 0 0.00e+00 0 | |
VecSet 15 1.0 4.7812e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
VecAssemblyBegin 3 1.0 9.1377e-05 1.2 0.00e+00 0.0 8.1e+01 7.1e+01 3.0e+00 0 0 1 0 1 0 0 8 4 1 0 0 0 0.00e+00 0 0.00e+00 0 | |
VecAssemblyEnd 3 1.0 4.5498e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
VecLoad 2 1.0 4.8150e-03 1.0 0.00e+00 0.0 1.2e+01 2.0e+00 1.4e+01 0 0 0 0 4 0 0 1 0 5 0 0 0 0.00e+00 0 0.00e+00 0 | |
VecCUDACopyTo 3 1.0 3.3434e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 3 6.16e-04 0 0.00e+00 0 | |
VecCUDACopyFrom 4 1.0 4.9968e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 4 1.33e-03 0 | |
DualSpaceSetUp 4 1.0 3.5681e-03 1.0 7.20e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 3 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
FESetUp 4 1.0 4.3868e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
cuBLAS Init 1 1.0 6.1798e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 16 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
DCtxCreate 2 1.0 1.9779e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
DCtxDestroy 2 1.0 3.9696e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
DCtxSetUp 2 1.0 8.8883e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
DCtxSetDevice 2 1.0 1.2130e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
DCtxSync 28 1.6 3.0938e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
--- Event Stage 2: RDyAdvance solve | |
DMPlexGToNBegin 11 1.0 2.8097e-04 1.3 0.00e+00 0.0 3.3e+01 2.6e+02 0.0e+00 0 0 0 1 0 0 0 0 1 0 0 0 0 0.00e+00 10 2.64e-03 0 | |
DMPlexGToNEnd 11 1.0 1.2229e-04 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFBcastBegin 1011 1.0 4.1643e-02 1.1 0.00e+00 0.0 6.0e+03 1.0e+02 0.0e+00 1 0 46 45 0 1 0 50 50 0 0 0 0 0.00e+00 2000 3.12e-01 0 | |
SFBcastEnd 1011 1.0 8.3698e-02 13.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 2000 3.12e-01 0 0.00e+00 0 | |
SFReduceBegin 1000 1.0 3.1020e-02 1.2 0.00e+00 0.0 6.0e+03 1.0e+02 0.0e+00 0 0 46 44 0 1 0 50 50 0 0 0 0 0.00e+00 2000 3.12e-01 0 | |
SFReduceEnd 1000 1.0 7.5066e-02 6.2 2.40e+04 1.6 0.0e+00 0.0e+00 0.0e+00 0 22 0 0 0 1 23 0 0 0 1 0 2000 3.12e-01 0 0.00e+00 100 | |
SFPack 2011 1.0 1.0348e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
SFUnpack 2011 1.0 6.3307e-03 1.0 2.40e+04 1.6 0.0e+00 0.0e+00 0.0e+00 0 22 0 0 0 0 23 0 0 0 12 0 0 0.00e+00 0 0.00e+00 100 | |
VecView 88 1.0 1.8562e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 24 0 0 0 0 58 0 0 0 0 0 0 0 0.00e+00 1 2.64e-04 0 | |
VecCopy 3000 1.0 6.4494e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
VecSet 1011 1.0 8.5404e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
VecAYPX 1000 1.0 2.6781e-02 1.0 6.60e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 75 0 0 0 1 77 0 0 0 10 31 0 0.00e+00 0 0.00e+00 100 | |
VecCUDACopyTo 1 1.0 7.2540e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 1 2.64e-04 0 0.00e+00 0 | |
VecCUDACopyFrom 13 1.3 1.8317e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 11 2.90e-03 0 | |
DCtxSync 5027 1.0 1.1488e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
CeedOperatorApp 2000 1.0 4.6278e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 14 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 | |
TSStep 1000 1.0 7.5857e-01 1.1 9.00e+04 1.1 1.2e+04 1.0e+02 0.0e+00 10 97 92 88 0 23 100 100 99 0 0 1 4001 6.24e-01 4000 6.24e-01 100 | |
TSFunctionEval 1000 1.0 7.1254e-01 1.1 2.40e+04 1.6 1.2e+04 1.0e+02 0.0e+00 9 22 92 88 0 21 23 100 99 0 0 0 4001 6.24e-01 4000 6.24e-01 100 | |
--------------------------------------------------------------------------------------------------------------------------------------------------------------- | |
Object Type Creations Destructions. Reports information only for process 0. | |
--- Event Stage 0: Main Stage | |
Container 49 49 | |
Distributed Mesh 61 61 | |
DM Label 154 154 | |
Quadrature 110 110 | |
Index Set 554 554 | |
IS L to G Mapping 148 148 | |
Section 245 245 | |
Star Forest Graph 162 162 | |
Discrete System 91 91 | |
Weak Form 91 91 | |
GraphPartitioner 23 23 | |
Matrix 94 94 | |
Vector 56 57 | |
Linear Space 8 8 | |
Dual Space 28 28 | |
FE Space 4 4 | |
PetscDeviceContext 2 0 | |
Viewer 5 5 | |
TSAdapt 1 1 | |
TS 1 1 | |
DMTS 1 1 | |
SNES 1 1 | |
DMSNES 3 3 | |
SNESLineSearch 1 1 | |
Krylov Solver 1 1 | |
Preconditioner 1 1 | |
Viewer 1 0 | |
--- Event Stage 1: PCMPI | |
--- Event Stage 2: RDyAdvance solve | |
Vector 1045 1044 | |
Viewer 22 22 | |
======================================================================================================================== | |
Average time to get PetscTime(): 3.71e-08 | |
Average time for MPI_Barrier(): 5.0938e-06 | |
Average time for zero size MPI_Send(): 2.3045e-06 | |
#PETSc Option Table entries: | |
-ceed /gpu/cuda # (source: command line) | |
-dist_dm_distribute_save_sf true # (source: code) | |
-dm_plex_filename DamBreak_grid5x10.exo # (source: code) | |
-dm_vec_type cuda # (source: command line) | |
-log_view # (source: command line) | |
-log_view_gpu_time # (source: command line) | |
-use_gpu_aware_mpi 0 # (source: command line) | |
#End of PETSc Option Table entries | |
Compiled without FORTRAN kernels | |
Compiled with 64-bit PetscInt | |
Compiled with full precision matrices (default) | |
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 | |
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --CFLAGS=" -g " --CXXFLAGS=" -g " --CUDAFLAGS=" -g -Xcompiler -rdynamic " --with-fortran-bindings=1 --COPTFLAGS=" -O" --CXXOPTFLAGS=" -O" --FOPTFLAGS=" -O" --with-mpiexec="srun -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels --download-kokkos-cmake-arguments=-DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF --with-kokkos-kernels-tpl=0 --with-make-np=8 --with-64-bit-indices=1 --with-netcdf-dir=/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1 --with-pnetcdf-dir=/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1 --download-hdf5=1 --with-cuda-dir=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7 --with-cuda-arch=80 --download-parmetis --download-metis --download-muparser --download-zlib --download-scalapack --download-sowing --download-triangle --download-exodusii --download-libceed --download-cgns-commit=HEAD --with-debugging=0 PETSC_ARCH=pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5 | |
----------------------------------------- | |
Libraries compiled on 2024-05-16 17:06:00 on login11 | |
Machine characteristics: Linux-5.14.21-150400.24.81_12.0.87-cray_shasta_c-x86_64-with-glibc2.31 | |
Using PETSc directory: /global/cfs/cdirs/m4267/petsc/petsc_main | |
Using PETSc arch: pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5 | |
----------------------------------------- | |
Using C compiler: cc -g -fPIC -O | |
Using Fortran compiler: ftn -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O | |
----------------------------------------- | |
Using include paths: -I/global/cfs/cdirs/m4267/petsc/petsc_main/include -I/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/include -I/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/include -I/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/math_libs/11.7/include | |
----------------------------------------- | |
Using C linker: cc | |
Using Fortran linker: ftn | |
Using libraries: -Wl,-rpath,/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -L/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -lpetsc -Wl,-rpath,/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -L/global/cfs/cdirs/m4267/petsc/petsc_main/pm-gpu-hdf5_1_14_3-opt-64bit-gcc-11-2-0-fc2888174f5/lib -Wl,-rpath,/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/lib -L/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/gnu/9.1/lib -Wl,-rpath,/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/lib -L/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1/lib -Wl,-rpath,/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/lib64/stubs -lscalapack -lkokkoskernels -lkokkoscontainers -lkokkoscore -lkokkossimd -lparmetis -lmetis -lexoIIv2for32 -lexodus -lmuparser -lnetcdf -lpnetcdf -lhdf5_hl -lhdf5 -ltriangle -lz -lceed -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lstdc++ -lquadmath | |
----------------------------------------- |