@goyalankit
Last active August 29, 2015 14:21
Loop in function ljForce.omp_fn.1 in ljForce.c:191 (96.79% of the total runtime)
===============================================================================
ratio to total instrns % 0..........25..........50..........75..........100
- floating point 68.0 **********************************
- data accesses 29.9 ***************
* GFLOPS (% max) 12.5 ******
- packed 0.0
- scalar 12.5 ******
-------------------------------------------------------------------------------
performance assessment LCPI good.......okay........fair........poor........bad
* overall 0.68 >>>>>>>>>>>>>>
* data accesses 1.07 >>>>>>>>>>>>>>>>>>>>>
- L1d hits 1.05 >>>>>>>>>>>>>>>>>>>>>
- L2d hits 0.01
- L3d hits 0.00
- LLC misses 0.02
* instruction accesses 0.01
- L1i hits 0.00
- L2i hits 0.00
- L2i misses 0.01
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.18 >>>>
- correctly predicted 0.08 >>
- mispredicted 0.10 >>
* floating-point instr 1.32 >>>>>>>>>>>>>>>>>>>>>>>>>>
- slow FP instr 0.14 >>>
- fast FP instr 1.19 >>>>>>>>>>>>>>>>>>>>>>>>
===============================================================================
c557-301.stampede(6)$ perfexpert_analyzer -t 0.05 -i /work/02681/ankitg/downloads/codedesign-benchmarks/lulesh/.perfexpert-temp.MPOYpp/1/database/experiment.xml
[analyzer] WARNING: unknown sorting order (none), hotspots are not sorted
-------------------------------------------------------------------------------
Total running time for lulesh2.0 is 39.15 seconds between all 16 cores
The wall-clock time for lulesh2.0 is approximately 2.45 seconds
Module lulesh2.0 takes 96.46% of the total runtime
Module libm-2.12.so takes 3.37% of the total runtime
Module libgomp.so.1.0.0 takes 0.00% of the total runtime
Module libc-2.12.so takes 0.00% of the total runtime
-------------------------------------------------------------------------------
Function CalcElemFBHourglassForce in line 705 of lulesh.cc (6.38% of the total runtime)
===============================================================================
The performance of this code section is good!
-------------------------------------------------------------------------------
ratio to total instrns % 0..........25..........50..........75..........100
- floating point 46.2 ***********************
- data accesses 49.5 *************************
* GFLOPS (% max) 13.1 *******
- packed 0.0
- scalar 13.1 *******
-------------------------------------------------------------------------------
performance assessment LCPI good.......okay........fair........poor........bad
* overall 0.44 >>>>>>>>>
* data accesses 1.77 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- L1d hits 1.73 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- L2d hits 0.02
- L3d hits 0.01
- LLC misses 0.01
* instruction accesses 0.00
- L1i hits 0.00
- L2i hits 0.00
- L2i misses 0.00
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.00
- correctly predicted 0.00
- mispredicted 0.00
* floating-point instr 0.85 >>>>>>>>>>>>>>>>>
- slow FP instr 0.05 >
- fast FP instr 0.80 >>>>>>>>>>>>>>>>
===============================================================================
Loop in function _ZL28CalcFBHourglassForceForElemsR6DomainPdS1_S1_S1_S1_S1_S1_dii.omp_fn.6 in lulesh.cc:822 (9.25% of the total runtime)
===============================================================================
ratio to total instrns % 0..........25..........50..........75..........100
- floating point 100.0 **************************************************
- data accesses 38.6 *******************
* GFLOPS (% max) 25.2 *************
- packed 12.5 ******
- scalar 12.7 ******
-------------------------------------------------------------------------------
performance assessment LCPI good.......okay........fair........poor........bad
* overall 0.51 >>>>>>>>>>
* data accesses 1.50 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- L1d hits 1.35 >>>>>>>>>>>>>>>>>>>>>>>>>>>
- L2d hits 0.12 >>
- L3d hits 0.02
- LLC misses 0.01
* instruction accesses 0.02
- L1i hits 0.00
- L2i hits 0.00
- L2i misses 0.02
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.01
- correctly predicted 0.01
- mispredicted 0.00
* floating-point instr 1.39 >>>>>>>>>>>>>>>>>>>>>>>>>>>>
- slow FP instr 0.03 >
- fast FP instr 1.36 >>>>>>>>>>>>>>>>>>>>>>>>>>>
===============================================================================
Loop in function _ZL18CalcEnergyForElemsPdS_S_S_S_S_S_S_S_S_S_S_S_dddddS_S_ddiPi.omp_fn.19 in lulesh.cc:2106 (5.13% of the total runtime)
===============================================================================
ratio to total instrns % 0..........25..........50..........75..........100
- floating point 54.2 ***************************
- data accesses 49.7 *************************
* GFLOPS (% max) 11.6 ******
- packed 0.0
- scalar 11.6 ******
-------------------------------------------------------------------------------
performance assessment LCPI good.......okay........fair........poor........bad
* overall 0.58 >>>>>>>>>>>>
* data accesses 2.08 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- L1d hits 1.74 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- L2d hits 0.29 >>>>>>
- L3d hits 0.05 >
- LLC misses 0.00
* instruction accesses 0.03 >
- L1i hits 0.00
- L2i hits 0.00
- L2i misses 0.03 >
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.09 >>
- correctly predicted 0.08 >>
- mispredicted 0.01
* floating-point instr 1.66 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- slow FP instr 0.71 >>>>>>>>>>>>>>
- fast FP instr 0.94 >>>>>>>>>>>>>>>>>>>
===============================================================================
Loop in function _ZL20CalcPressureForElemsPdS_S_S_S_S_dddiPi.omp_fn.24 in lulesh.cc:2060 (5.31% of the total runtime)
===============================================================================
The performance of this code section is good!
-------------------------------------------------------------------------------
ratio to total instrns % 0..........25..........50..........75..........100
- floating point 44.4 **********************
- data accesses 28.9 **************
* GFLOPS (% max) 10.9 *****
- packed 0.0
- scalar 10.9 *****
-------------------------------------------------------------------------------
performance assessment LCPI good.......okay........fair........poor........bad
* overall 0.51 >>>>>>>>>>
* data accesses 1.43 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- L1d hits 1.01 >>>>>>>>>>>>>>>>>>>>
- L2d hits 0.33 >>>>>>>
- L3d hits 0.09 >>
- LLC misses 0.00
* instruction accesses 0.01
- L1i hits 0.00
- L2i hits 0.00
- L2i misses 0.01
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.22 >>>>
- correctly predicted 0.20 >>>>
- mispredicted 0.02
* floating-point instr 0.77 >>>>>>>>>>>>>>>
- slow FP instr 0.00
- fast FP instr 0.77 >>>>>>>>>>>>>>>
===============================================================================
Loop in function egb..1 in ipo_out.c:0 (82.68% of the total runtime)
===============================================================================
ratio to total instrns % 0..........25..........50..........75..........100
- floating point 100.0 **************************************************
- data accesses 34.9 *****************
* GFLOPS (% max) 12.9 ******
- packed 0.3
- scalar 12.6 ******
-------------------------------------------------------------------------------
performance assessment LCPI good.......okay........fair........poor........bad
* overall 1.14 >>>>>>>>>>>>>>>>>>>>>>>
* data accesses 1.24 >>>>>>>>>>>>>>>>>>>>>>>>>
- L1d hits 1.22 >>>>>>>>>>>>>>>>>>>>>>>>
- L2d hits 0.01
- L3d hits 0.01
- LLC misses 0.00
* instruction accesses 0.02
- L1i hits 0.00
- L2i hits 0.00
- L2i misses 0.02
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.11 >>
- correctly predicted 0.10 >>
- mispredicted 0.01
* floating-point instr 2.82 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+
- slow FP instr 0.78 >>>>>>>>>>>>>>>>
- fast FP instr 2.04 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
===============================================================================
Loop in function nbond in ipo_out.c:0 (9.43% of the total runtime)
===============================================================================
ratio to total instrns % 0..........25..........50..........75..........100
- floating point 64.0 ********************************
- data accesses 29.2 ***************
* GFLOPS (% max) 12.2 ******
- packed 0.0
- scalar 12.2 ******
-------------------------------------------------------------------------------
performance assessment LCPI good.......okay........fair........poor........bad
* overall 0.66 >>>>>>>>>>>>>
* data accesses 1.04 >>>>>>>>>>>>>>>>>>>>>
- L1d hits 1.02 >>>>>>>>>>>>>>>>>>>>
- L2d hits 0.02
- L3d hits 0.01
- LLC misses 0.00
* instruction accesses 0.02
- L1i hits 0.00
- L2i hits 0.00
- L2i misses 0.02
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.08 >>
- correctly predicted 0.07 >
- mispredicted 0.01
* floating-point instr 1.65 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- slow FP instr 0.53 >>>>>>>>>>>
- fast FP instr 1.12 >>>>>>>>>>>>>>>>>>>>>>
===============================================================================
[analyzer] WARNING: unknown sorting order (none), hotspots are not sorted
-------------------------------------------------------------------------------
Total running time for nabmd is 241.39 seconds between all 16 cores
The wall-clock time for nabmd is approximately 15.09 seconds
Module nabmd takes 99.97% of the total runtime
Module libimf.so takes 0.02% of the total runtime
Module libiomp5.so takes 0.00% of the total runtime
Module libintlc.so.5 takes 0.00% of the total runtime
Module libpthread-2.12.so takes 0.00% of the total runtime
Module libc-2.12.so takes 0.00% of the total runtime
-------------------------------------------------------------------------------
Loop in function nblist in ipo_out.c:0 (5.76% of the total runtime)
===============================================================================
ratio to total instrns % 0..........25..........50..........75..........100
- floating point 100.0 **************************************************
- data accesses 15.6 ********
* GFLOPS (% max) 12.6 ******
- packed 0.0
- scalar 12.6 ******
-------------------------------------------------------------------------------
performance assessment LCPI good.......okay........fair........poor........bad
* overall 1.04 >>>>>>>>>>>>>>>>>>>>>
* data accesses 0.56 >>>>>>>>>>>
- L1d hits 0.55 >>>>>>>>>>>
- L2d hits 0.01
- L3d hits 0.00
- LLC misses 0.00
* instruction accesses 0.00
- L1i hits 0.00
- L2i hits 0.00
- L2i misses 0.00
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.80 >>>>>>>>>>>>>>>>
- correctly predicted 0.22 >>>>
- mispredicted 0.59 >>>>>>>>>>>>
* floating-point instr 1.83 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- slow FP instr 0.00
- fast FP instr 1.83 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
===============================================================================
Loop in function egb..1 in ipo_out.c:0 (82.00% of the total runtime)
===============================================================================
ratio to total instrns % 0..........25..........50..........75..........100
- floating point 100.0 **************************************************
- data accesses 35.3 ******************
* GFLOPS (% max) 12.8 ******
- packed 0.3
- scalar 12.5 ******
-------------------------------------------------------------------------------
performance assessment LCPI good.......okay........fair........poor........bad
* overall 1.14 >>>>>>>>>>>>>>>>>>>>>>>
* data accesses 1.30 >>>>>>>>>>>>>>>>>>>>>>>>>>
- L1d hits 1.23 >>>>>>>>>>>>>>>>>>>>>>>>>
- L2d hits 0.06 >
- L3d hits 0.00
- LLC misses 0.00
* instruction accesses 0.01
- L1i hits 0.00
- L2i hits 0.00
- L2i misses 0.01
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.11 >>
- correctly predicted 0.10 >>
- mispredicted 0.01
* floating-point instr 2.79 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+
- slow FP instr 0.78 >>>>>>>>>>>>>>>>
- fast FP instr 2.01 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
===============================================================================
Loop in function nbond in ipo_out.c:0 (9.07% of the total runtime)
===============================================================================
ratio to total instrns % 0..........25..........50..........75..........100
- floating point 64.8 ********************************
- data accesses 28.6 **************
* GFLOPS (% max) 12.5 ******
- packed 0.0
- scalar 12.5 ******
-------------------------------------------------------------------------------
performance assessment LCPI good.......okay........fair........poor........bad
* overall 0.65 >>>>>>>>>>>>>
* data accesses 1.05 >>>>>>>>>>>>>>>>>>>>>
- L1d hits 1.00 >>>>>>>>>>>>>>>>>>>>
- L2d hits 0.04 >
- L3d hits 0.01
- LLC misses 0.00
* instruction accesses 0.01
- L1i hits 0.00
- L2i hits 0.00
- L2i misses 0.01
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.07 >
- correctly predicted 0.07 >
- mispredicted 0.00
* floating-point instr 1.66 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- slow FP instr 0.54 >>>>>>>>>>>
- fast FP instr 1.13 >>>>>>>>>>>>>>>>>>>>>>>
===============================================================================