Last active
August 29, 2015 14:21
Loop in function ljForce.omp_fn.1 in ljForce.c:191 (96.79% of the total runtime) | |
=============================================================================== | |
ratio to total instrns % 0..........25..........50..........75..........100 | |
- floating point 68.0 ********************************** | |
- data accesses 29.9 *************** | |
* GFLOPS (% max) 12.5 ****** | |
- packed 0.0 | |
- scalar 12.5 ****** | |
------------------------------------------------------------------------------- | |
performance assessment LCPI good.......okay........fair........poor........bad | |
* overall 0.68 >>>>>>>>>>>>>> | |
* data accesses 1.07 >>>>>>>>>>>>>>>>>>>>> | |
- L1d hits 1.05 >>>>>>>>>>>>>>>>>>>>> | |
- L2d hits 0.01 | |
- L3d hits 0.00 | |
- LLC misses 0.02 | |
* instruction accesses 0.01 | |
- L1i hits 0.00 | |
- L2i hits 0.00 | |
- L2i misses 0.01 | |
* data TLB 0.00 | |
* instruction TLB 0.00 | |
* branch instructions 0.18 >>>> | |
- correctly predicted 0.08 >> | |
- mispredicted 0.10 >> | |
* floating-point instr 1.32 >>>>>>>>>>>>>>>>>>>>>>>>>> | |
- slow FP instr 0.14 >>> | |
- fast FP instr 1.19 >>>>>>>>>>>>>>>>>>>>>>>> | |
=============================================================================== |
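A report like the one above is easiest to read by asking which starred LCPI category is largest. The following is a minimal sketch, not part of PerfExpert: the names `lcpi_categories` and `dominant_category` are hypothetical helpers, and the excerpt is copied from the ljForce.c:191 report with the page's trailing `| |` residue dropped.

```python
import re

# Starred lines copied from the ljForce.c:191 report above
# (trailing "| |" page residue removed by hand).
REPORT = """\
* GFLOPS (% max) 12.5 ******
* overall 0.68 >>>>>>>>>>>>>>
* data accesses 1.07 >>>>>>>>>>>>>>>>>>>>>
* instruction accesses 0.01
* data TLB 0.00
* instruction TLB 0.00
* branch instructions 0.18 >>>>
* floating-point instr 1.32 >>>>>>>>>>>>>>>>>>>>>>>>>>
"""

# "* <name> <number>" followed by whitespace (the bar) or end of line.
LINE = re.compile(r"^\*\s+(.+?)\s+(\d+\.\d+)(?:\s|$)")

# Starred lines that are not per-category LCPI contributions.
SKIP = {"overall", "GFLOPS (% max)"}


def lcpi_categories(text):
    """Map each starred LCPI category name to its reported value."""
    cats = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(1) not in SKIP:
            cats[m.group(1)] = float(m.group(2))
    return cats


def dominant_category(text):
    """Return (name, LCPI) of the category with the largest LCPI."""
    cats = lcpi_categories(text)
    name = max(cats, key=cats.get)
    return name, cats[name]


print(dominant_category(REPORT))
```

For this loop the sketch reports `floating-point instr` at 1.32, which matches the longest bar in the assessment.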
c557-301.stampede(6)$ perfexpert_analyzer -t 0.05 -i /work/02681/ankitg/downloads/codedesign-benchmarks/lulesh/.perfexpert-temp.MPOYpp/1/database/experiment.xml | |
[analyzer] WARNING: unknown sorting order (none), hotspots are not sorted | |
------------------------------------------------------------------------------- | |
Total running time for lulesh2.0 is 39.15 seconds between all 16 cores | |
The wall-clock time for lulesh2.0 is approximately 2.45 seconds | |
Module lulesh2.0 takes 96.46% of the total runtime | |
Module libm-2.12.so takes 3.37% of the total runtime | |
Module libgomp.so.1.0.0 takes 0.00% of the total runtime | |
Module libc-2.12.so takes 0.00% of the total runtime | |
------------------------------------------------------------------------------- | |
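The two timing lines in the summary above are consistent with the wall-clock figure being the summed per-core time divided by the core count. This is an inference from the printed numbers, not a documented PerfExpert formula; `wall_clock_estimate` is a hypothetical helper used only for the arithmetic.

```python
def wall_clock_estimate(total_core_seconds, cores):
    """Approximate wall-clock time, assuming the total is summed over all cores."""
    return total_core_seconds / cores


# lulesh2.0: 39.15 s across 16 cores, reported wall clock about 2.45 s
print(round(wall_clock_estimate(39.15, 16), 2))

# nabmd: 241.39 s across 16 cores, reported wall clock about 15.09 s
print(round(wall_clock_estimate(241.39, 16), 2))
```

Both results round to the wall-clock values printed by the analyzer (2.45 and 15.09 seconds).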
Function CalcElemFBHourglassForce in line 705 of lulesh.cc (6.38% of the total runtime) | |
=============================================================================== | |
The performance of this code section is good! | |
------------------------------------------------------------------------------- | |
ratio to total instrns % 0..........25..........50..........75..........100 | |
- floating point 46.2 *********************** | |
- data accesses 49.5 ************************* | |
* GFLOPS (% max) 13.1 ******* | |
- packed 0.0 | |
- scalar 13.1 ******* | |
------------------------------------------------------------------------------- | |
performance assessment LCPI good.......okay........fair........poor........bad | |
* overall 0.44 >>>>>>>>> | |
* data accesses 1.77 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- L1d hits 1.73 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- L2d hits 0.02 | |
- L3d hits 0.01 | |
- LLC misses 0.01 | |
* instruction accesses 0.00 | |
- L1i hits 0.00 | |
- L2i hits 0.00 | |
- L2i misses 0.00 | |
* data TLB 0.00 | |
* instruction TLB 0.00 | |
* branch instructions 0.00 | |
- correctly predicted 0.00 | |
- mispredicted 0.00 | |
* floating-point instr 0.85 >>>>>>>>>>>>>>>>> | |
- slow FP instr 0.05 > | |
- fast FP instr 0.80 >>>>>>>>>>>>>>>> | |
=============================================================================== | |
Loop in function _ZL28CalcFBHourglassForceForElemsR6DomainPdS1_S1_S1_S1_S1_S1_dii.omp_fn.6 in lulesh.cc:822 (9.25% of the total runtime) | |
=============================================================================== | |
ratio to total instrns % 0..........25..........50..........75..........100 | |
- floating point 100.0 ************************************************** | |
- data accesses 38.6 ******************* | |
* GFLOPS (% max) 25.2 ************* | |
- packed 12.5 ****** | |
- scalar 12.7 ****** | |
------------------------------------------------------------------------------- | |
performance assessment LCPI good.......okay........fair........poor........bad | |
* overall 0.51 >>>>>>>>>> | |
* data accesses 1.50 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- L1d hits 1.35 >>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- L2d hits 0.12 >> | |
- L3d hits 0.02 | |
- LLC misses 0.01 | |
* instruction accesses 0.02 | |
- L1i hits 0.00 | |
- L2i hits 0.00 | |
- L2i misses 0.02 | |
* data TLB 0.00 | |
* instruction TLB 0.00 | |
* branch instructions 0.01 | |
- correctly predicted 0.01 | |
- mispredicted 0.00 | |
* floating-point instr 1.39 >>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- slow FP instr 0.03 > | |
- fast FP instr 1.36 >>>>>>>>>>>>>>>>>>>>>>>>>>> | |
=============================================================================== | |
Loop in function _ZL18CalcEnergyForElemsPdS_S_S_S_S_S_S_S_S_S_S_S_dddddS_S_ddiPi.omp_fn.19 in lulesh.cc:2106 (5.13% of the total runtime) | |
=============================================================================== | |
ratio to total instrns % 0..........25..........50..........75..........100 | |
- floating point 54.2 *************************** | |
- data accesses 49.7 ************************* | |
* GFLOPS (% max) 11.6 ****** | |
- packed 0.0 | |
- scalar 11.6 ****** | |
------------------------------------------------------------------------------- | |
performance assessment LCPI good.......okay........fair........poor........bad | |
* overall 0.58 >>>>>>>>>>>> | |
* data accesses 2.08 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- L1d hits 1.74 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- L2d hits 0.29 >>>>>> | |
- L3d hits 0.05 > | |
- LLC misses 0.00 | |
* instruction accesses 0.03 > | |
- L1i hits 0.00 | |
- L2i hits 0.00 | |
- L2i misses 0.03 > | |
* data TLB 0.00 | |
* instruction TLB 0.00 | |
* branch instructions 0.09 >> | |
- correctly predicted 0.08 >> | |
- mispredicted 0.01 | |
* floating-point instr 1.66 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- slow FP instr 0.71 >>>>>>>>>>>>>> | |
- fast FP instr 0.94 >>>>>>>>>>>>>>>>>>> | |
=============================================================================== | |
Loop in function _ZL20CalcPressureForElemsPdS_S_S_S_S_dddiPi.omp_fn.24 in lulesh.cc:2060 (5.31% of the total runtime) | |
=============================================================================== | |
The performance of this code section is good! | |
------------------------------------------------------------------------------- | |
ratio to total instrns % 0..........25..........50..........75..........100 | |
- floating point 44.4 ********************** | |
- data accesses 28.9 ************** | |
* GFLOPS (% max) 10.9 ***** | |
- packed 0.0 | |
- scalar 10.9 ***** | |
------------------------------------------------------------------------------- | |
performance assessment LCPI good.......okay........fair........poor........bad | |
* overall 0.51 >>>>>>>>>> | |
* data accesses 1.43 >>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- L1d hits 1.01 >>>>>>>>>>>>>>>>>>>> | |
- L2d hits 0.33 >>>>>>> | |
- L3d hits 0.09 >> | |
- LLC misses 0.00 | |
* instruction accesses 0.01 | |
- L1i hits 0.00 | |
- L2i hits 0.00 | |
- L2i misses 0.01 | |
* data TLB 0.00 | |
* instruction TLB 0.00 | |
* branch instructions 0.22 >>>> | |
- correctly predicted 0.20 >>>> | |
- mispredicted 0.02 | |
* floating-point instr 0.77 >>>>>>>>>>>>>>> | |
- slow FP instr 0.00 | |
- fast FP instr 0.77 >>>>>>>>>>>>>>> | |
=============================================================================== |
Loop in function egb..1 in ipo_out.c:0 (82.68% of the total runtime) | |
=============================================================================== | |
ratio to total instrns % 0..........25..........50..........75..........100 | |
- floating point 100.0 ************************************************** | |
- data accesses 34.9 ***************** | |
* GFLOPS (% max) 12.9 ****** | |
- packed 0.3 | |
- scalar 12.6 ****** | |
------------------------------------------------------------------------------- | |
performance assessment LCPI good.......okay........fair........poor........bad | |
* overall 1.14 >>>>>>>>>>>>>>>>>>>>>>> | |
* data accesses 1.24 >>>>>>>>>>>>>>>>>>>>>>>>> | |
- L1d hits 1.22 >>>>>>>>>>>>>>>>>>>>>>>> | |
- L2d hits 0.01 | |
- L3d hits 0.01 | |
- LLC misses 0.00 | |
* instruction accesses 0.02 | |
- L1i hits 0.00 | |
- L2i hits 0.00 | |
- L2i misses 0.02 | |
* data TLB 0.00 | |
* instruction TLB 0.00 | |
* branch instructions 0.11 >> | |
- correctly predicted 0.10 >> | |
- mispredicted 0.01 | |
* floating-point instr 2.82 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+ | |
- slow FP instr 0.78 >>>>>>>>>>>>>>>> | |
- fast FP instr 2.04 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
=============================================================================== | |
Loop in function nbond in ipo_out.c:0 (9.43% of the total runtime) | |
=============================================================================== | |
ratio to total instrns % 0..........25..........50..........75..........100 | |
- floating point 64.0 ******************************** | |
- data accesses 29.2 *************** | |
* GFLOPS (% max) 12.2 ****** | |
- packed 0.0 | |
- scalar 12.2 ****** | |
------------------------------------------------------------------------------- | |
performance assessment LCPI good.......okay........fair........poor........bad | |
* overall 0.66 >>>>>>>>>>>>> | |
* data accesses 1.04 >>>>>>>>>>>>>>>>>>>>> | |
- L1d hits 1.02 >>>>>>>>>>>>>>>>>>>> | |
- L2d hits 0.02 | |
- L3d hits 0.01 | |
- LLC misses 0.00 | |
* instruction accesses 0.02 | |
- L1i hits 0.00 | |
- L2i hits 0.00 | |
- L2i misses 0.02 | |
* data TLB 0.00 | |
* instruction TLB 0.00 | |
* branch instructions 0.08 >> | |
- correctly predicted 0.07 > | |
- mispredicted 0.01 | |
* floating-point instr 1.65 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- slow FP instr 0.53 >>>>>>>>>>> | |
- fast FP instr 1.12 >>>>>>>>>>>>>>>>>>>>>> | |
=============================================================================== |
[analyzer] WARNING: unknown sorting order (none), hotspots are not sorted | |
------------------------------------------------------------------------------- | |
Total running time for nabmd is 241.39 seconds between all 16 cores | |
The wall-clock time for nabmd is approximately 15.09 seconds | |
Module nabmd takes 99.97% of the total runtime | |
Module libimf.so takes 0.02% of the total runtime | |
Module libiomp5.so takes 0.00% of the total runtime | |
Module libintlc.so.5 takes 0.00% of the total runtime | |
Module libpthread-2.12.so takes 0.00% of the total runtime | |
Module libc-2.12.so takes 0.00% of the total runtime | |
------------------------------------------------------------------------------- | |
Loop in function nblist in ipo_out.c:0 (5.76% of the total runtime) | |
=============================================================================== | |
ratio to total instrns % 0..........25..........50..........75..........100 | |
- floating point 100.0 ************************************************** | |
- data accesses 15.6 ******** | |
* GFLOPS (% max) 12.6 ****** | |
- packed 0.0 | |
- scalar 12.6 ****** | |
------------------------------------------------------------------------------- | |
performance assessment LCPI good.......okay........fair........poor........bad | |
* overall 1.04 >>>>>>>>>>>>>>>>>>>>> | |
* data accesses 0.56 >>>>>>>>>>> | |
- L1d hits 0.55 >>>>>>>>>>> | |
- L2d hits 0.01 | |
- L3d hits 0.00 | |
- LLC misses 0.00 | |
* instruction accesses 0.00 | |
- L1i hits 0.00 | |
- L2i hits 0.00 | |
- L2i misses 0.00 | |
* data TLB 0.00 | |
* instruction TLB 0.00 | |
* branch instructions 0.80 >>>>>>>>>>>>>>>> | |
- correctly predicted 0.22 >>>> | |
- mispredicted 0.59 >>>>>>>>>>>> | |
* floating-point instr 1.83 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- slow FP instr 0.00 | |
- fast FP instr 1.83 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
=============================================================================== | |
Loop in function egb..1 in ipo_out.c:0 (82.00% of the total runtime) | |
=============================================================================== | |
ratio to total instrns % 0..........25..........50..........75..........100 | |
- floating point 100.0 ************************************************** | |
- data accesses 35.3 ****************** | |
* GFLOPS (% max) 12.8 ****** | |
- packed 0.3 | |
- scalar 12.5 ****** | |
------------------------------------------------------------------------------- | |
performance assessment LCPI good.......okay........fair........poor........bad | |
* overall 1.14 >>>>>>>>>>>>>>>>>>>>>>> | |
* data accesses 1.30 >>>>>>>>>>>>>>>>>>>>>>>>>> | |
- L1d hits 1.23 >>>>>>>>>>>>>>>>>>>>>>>>> | |
- L2d hits 0.06 > | |
- L3d hits 0.00 | |
- LLC misses 0.00 | |
* instruction accesses 0.01 | |
- L1i hits 0.00 | |
- L2i hits 0.00 | |
- L2i misses 0.01 | |
* data TLB 0.00 | |
* instruction TLB 0.00 | |
* branch instructions 0.11 >> | |
- correctly predicted 0.10 >> | |
- mispredicted 0.01 | |
* floating-point instr 2.79 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+ | |
- slow FP instr 0.78 >>>>>>>>>>>>>>>> | |
- fast FP instr 2.01 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
=============================================================================== | |
Loop in function nbond in ipo_out.c:0 (9.07% of the total runtime) | |
=============================================================================== | |
ratio to total instrns % 0..........25..........50..........75..........100 | |
- floating point 64.8 ******************************** | |
- data accesses 28.6 ************** | |
* GFLOPS (% max) 12.5 ****** | |
- packed 0.0 | |
- scalar 12.5 ****** | |
------------------------------------------------------------------------------- | |
performance assessment LCPI good.......okay........fair........poor........bad | |
* overall 0.65 >>>>>>>>>>>>> | |
* data accesses 1.05 >>>>>>>>>>>>>>>>>>>>> | |
- L1d hits 1.00 >>>>>>>>>>>>>>>>>>>> | |
- L2d hits 0.04 > | |
- L3d hits 0.01 | |
- LLC misses 0.00 | |
* instruction accesses 0.01 | |
- L1i hits 0.00 | |
- L2i hits 0.00 | |
- L2i misses 0.01 | |
* data TLB 0.00 | |
* instruction TLB 0.00 | |
* branch instructions 0.07 > | |
- correctly predicted 0.07 > | |
- mispredicted 0.00 | |
* floating-point instr 1.66 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |
- slow FP instr 0.54 >>>>>>>>>>> | |
- fast FP instr 1.13 >>>>>>>>>>>>>>>>>>>>>>> | |
=============================================================================== |
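In the reports above, each starred LCPI category appears to equal the sum of its indented sub-entries, up to two-decimal rounding. The sketch below checks that reading against a few values copied from the nabmd report; the additive interpretation is an assumption drawn from the numbers, and `is_consistent` is a hypothetical helper.

```python
# (parent LCPI, [sub-entry LCPIs]) copied from the nabmd report above.
REPORTED = {
    "nblist / branch instructions": (0.80, [0.22, 0.59]),
    "egb..1 / floating-point instr": (2.79, [0.78, 2.01]),
    "egb..1 / data accesses": (1.30, [1.23, 0.06, 0.00, 0.00]),
    "nbond / floating-point instr": (1.66, [0.54, 1.13]),
}


def rounding_gap(parent, children):
    """Absolute difference between a parent value and the sum of its sub-entries."""
    return abs(parent - sum(children))


def is_consistent(parent, children, digits=2):
    """True if the gap is explainable by rounding each printed value to `digits` places."""
    # Every printed number (parent and each child) may be off by half a unit
    # in the last printed digit.
    tolerance = 0.5 * 10 ** -digits * (len(children) + 1)
    return rounding_gap(parent, children) <= tolerance + 1e-9


for name, (parent, children) in REPORTED.items():
    print(name, is_consistent(parent, children))
```

A category that failed this check would suggest either a transcription error or a sub-entry that the report does not list.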