goyalankit/CoMD_perf_output

## CoMD_perf_output
Loop in function ljForce.omp_fn.1 in ljForce.c:191 (96.79% of the total runtime)
===============================================================================
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point       68.0 **********************************
 - data accesses        29.9 ***************
* GFLOPS (% max)        12.5 ******
 - packed                0.0
 - scalar               12.5 ******
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               0.68 >>>>>>>>>>>>>>
* data accesses         1.07 >>>>>>>>>>>>>>>>>>>>>
 - L1d hits             1.05 >>>>>>>>>>>>>>>>>>>>>
 - L2d hits             0.01
 - L3d hits             0.00
 - LLC misses           0.02
* instruction accesses  0.01
 - L1i hits             0.00
 - L2i hits             0.00
 - L2i misses           0.01
* data TLB              0.00
* instruction TLB       0.00
* branch instructions   0.18 >>>>
 - correctly predicted  0.08 >>
 - mispredicted         0.10 >>
* floating-point instr  1.32 >>>>>>>>>>>>>>>>>>>>>>>>>>
 - slow FP instr        0.14 >>>
 - fast FP instr        1.19 >>>>>>>>>>>>>>>>>>>>>>>>
===============================================================================
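
The metric rows in these reports follow a regular shape: a `*` or `-` marker, a label, a number, and an optional bar. The following is a minimal sketch of how such rows could be pulled into Python for comparison across benchmarks. The function name `parse_metrics` and the regular expression are illustrative assumptions, not part of PerfExpert itself, and the sample is an excerpt from the CoMD block above.

```python
from __future__ import annotations

import re

# Assumed row shape: "*" or "-" marker, label, number, optional bar made of
# "*", ">" or "+". Header and separator lines do not match and are skipped.
METRIC_ROW = re.compile(
    r"^\s*[*-]\s+(?P<label>.+?)\s+(?P<value>\d+(?:\.\d+)?)\s*[*>+]*\s*$"
)


def parse_metrics(report: str) -> list[tuple[str, float]]:
    """Return (label, value) pairs for every metric row, in report order.

    A list is used instead of a dict because some labels (for example
    "data accesses") appear in both the ratio and the LCPI sections.
    """
    rows = []
    for line in report.splitlines():
        match = METRIC_ROW.match(line)
        if match:
            rows.append((match.group("label"), float(match.group("value"))))
    return rows


sample = """\
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point       68.0 **********************************
 - data accesses        29.9 ***************
* GFLOPS (% max)        12.5 ******
 - packed                0.0
 - scalar               12.5 ******
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               0.68 >>>>>>>>>>>>>>
"""

for label, value in parse_metrics(sample):
    print(f"{label}: {value}")
```

The same function should work unchanged on the other blocks in this document, since they share the row layout.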

## lulesh_perfexper_output
c557-301.stampede(6)$ perfexpert_analyzer -t 0.05 -i /work/02681/ankitg/downloads/codedesign-benchmarks/lulesh/.perfexpert-temp.MPOYpp/1/database/experiment.xml
[analyzer]    WARNING: unknown sorting order (none), hotspots are not sorted
-------------------------------------------------------------------------------
Total running time for lulesh2.0 is 39.15 seconds between all 16 cores
The wall-clock time for lulesh2.0 is approximately 2.45 seconds

Module lulesh2.0 takes 96.46% of the total runtime
Module libm-2.12.so takes 3.37% of the total runtime
Module libgomp.so.1.0.0 takes 0.00% of the total runtime
Module libc-2.12.so takes 0.00% of the total runtime
-------------------------------------------------------------------------------
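
The wall-clock figure in the summary above looks consistent with the total running time divided by the number of cores. This is an inference from the reported numbers, not a documented formula, and the helper name `wall_clock_estimate` is hypothetical. A quick check using the two summaries in this document:

```python
def wall_clock_estimate(total_core_seconds: float, cores: int) -> float:
    """Average time per core, rounded to two decimals like the report."""
    return round(total_core_seconds / cores, 2)


# Figures copied from the lulesh2.0 and nabmd analyzer summaries.
lulesh = wall_clock_estimate(39.15, 16)
nabmd = wall_clock_estimate(241.39, 16)
print(lulesh, nabmd)  # prints "2.45 15.09"
```

Both results match the "approximately" values that the analyzer prints.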

Function CalcElemFBHourglassForce in line 705 of lulesh.cc (6.38% of the total runtime)
===============================================================================
The performance of this code section is good!
-------------------------------------------------------------------------------
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point       46.2 ***********************
 - data accesses        49.5 *************************
* GFLOPS (% max)        13.1 *******
 - packed                0.0
 - scalar               13.1 *******
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               0.44 >>>>>>>>>
* data accesses         1.77 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - L1d hits             1.73 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - L2d hits             0.02
 - L3d hits             0.01
 - LLC misses           0.01
* instruction accesses  0.00
 - L1i hits             0.00
 - L2i hits             0.00
 - L2i misses           0.00
* data TLB              0.00
* instruction TLB       0.00
* branch instructions   0.00
 - correctly predicted  0.00
 - mispredicted         0.00
* floating-point instr  0.85 >>>>>>>>>>>>>>>>>
 - slow FP instr        0.05 >
 - fast FP instr        0.80 >>>>>>>>>>>>>>>>
===============================================================================

Loop in function _ZL28CalcFBHourglassForceForElemsR6DomainPdS1_S1_S1_S1_S1_S1_dii.omp_fn.6 in lulesh.cc:822 (9.25% of the total runtime)
===============================================================================
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point      100.0 **************************************************
 - data accesses        38.6 *******************
* GFLOPS (% max)        25.2 *************
 - packed               12.5 ******
 - scalar               12.7 ******
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               0.51 >>>>>>>>>>
* data accesses         1.50 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - L1d hits             1.35 >>>>>>>>>>>>>>>>>>>>>>>>>>>
 - L2d hits             0.12 >>
 - L3d hits             0.02
 - LLC misses           0.01
* instruction accesses  0.02
 - L1i hits             0.00
 - L2i hits             0.00
 - L2i misses           0.02
* data TLB              0.00
* instruction TLB       0.00
* branch instructions   0.01
 - correctly predicted  0.01
 - mispredicted         0.00
* floating-point instr  1.39 >>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - slow FP instr        0.03 >
 - fast FP instr        1.36 >>>>>>>>>>>>>>>>>>>>>>>>>>>
===============================================================================
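
In the block above, each `-` row appears to be a component of the `*` row that precedes it, and the components add up to the parent value within rounding. This is an observation from the printed numbers, not a property confirmed by the tool's documentation. A small sketch that checks it, using a hypothetical helper `sums_to_parent` and an assumed tolerance of 0.02 to absorb two-decimal rounding:

```python
from __future__ import annotations


def sums_to_parent(parent: float, parts: list[float], tol: float = 0.02) -> bool:
    """True when the component rows add up to the parent row within tol."""
    return abs(sum(parts) - parent) <= tol


# Values copied from the CalcFBHourglassForceForElems loop block
# (lulesh.cc:822): parent value first, then its component rows.
checks = {
    "GFLOPS (% max)": (25.2, [12.5, 12.7]),
    "data accesses": (1.50, [1.35, 0.12, 0.02, 0.01]),
    "instruction accesses": (0.02, [0.00, 0.00, 0.02]),
    "branch instructions": (0.01, [0.01, 0.00]),
    "floating-point instr": (1.39, [0.03, 1.36]),
}

for label, (parent, parts) in checks.items():
    print(label, sums_to_parent(parent, parts))
```

Other blocks show small differences, such as 0.22 + 0.59 against a reported 0.80 for branch instructions in `nblist`, which is why the comparison uses a tolerance instead of exact equality.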

Loop in function _ZL18CalcEnergyForElemsPdS_S_S_S_S_S_S_S_S_S_S_S_dddddS_S_ddiPi.omp_fn.19 in lulesh.cc:2106 (5.13% of the total runtime)
===============================================================================
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point       54.2 ***************************
 - data accesses        49.7 *************************
* GFLOPS (% max)        11.6 ******
 - packed                0.0
 - scalar               11.6 ******
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               0.58 >>>>>>>>>>>>
* data accesses         2.08 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - L1d hits             1.74 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - L2d hits             0.29 >>>>>>
 - L3d hits             0.05 >
 - LLC misses           0.00
* instruction accesses  0.03 >
 - L1i hits             0.00
 - L2i hits             0.00
 - L2i misses           0.03 >
* data TLB              0.00
* instruction TLB       0.00
* branch instructions   0.09 >>
 - correctly predicted  0.08 >>
 - mispredicted         0.01
* floating-point instr  1.66 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - slow FP instr        0.71 >>>>>>>>>>>>>>
 - fast FP instr        0.94 >>>>>>>>>>>>>>>>>>>
===============================================================================

Loop in function _ZL20CalcPressureForElemsPdS_S_S_S_S_dddiPi.omp_fn.24 in lulesh.cc:2060 (5.31% of the total runtime)
===============================================================================
The performance of this code section is good!
-------------------------------------------------------------------------------
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point       44.4 **********************
 - data accesses        28.9 **************
* GFLOPS (% max)        10.9 *****
 - packed                0.0
 - scalar               10.9 *****
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               0.51 >>>>>>>>>>
* data accesses         1.43 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - L1d hits             1.01 >>>>>>>>>>>>>>>>>>>>
 - L2d hits             0.33 >>>>>>>
 - L3d hits             0.09 >>
 - LLC misses           0.00
* instruction accesses  0.01
 - L1i hits             0.00
 - L2i hits             0.00
 - L2i misses           0.01
* data TLB              0.00
* instruction TLB       0.00
* branch instructions   0.22 >>>>
 - correctly predicted  0.20 >>>>
 - mispredicted         0.02
* floating-point instr  0.77 >>>>>>>>>>>>>>>
 - slow FP instr        0.00
 - fast FP instr        0.77 >>>>>>>>>>>>>>>
===============================================================================

## nab_aminos
Loop in function egb..1 in ipo_out.c:0 (82.68% of the total runtime)
===============================================================================
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point      100.0 **************************************************
 - data accesses        34.9 *****************
* GFLOPS (% max)        12.9 ******
 - packed                0.3
 - scalar               12.6 ******
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               1.14 >>>>>>>>>>>>>>>>>>>>>>>
* data accesses         1.24 >>>>>>>>>>>>>>>>>>>>>>>>>
 - L1d hits             1.22 >>>>>>>>>>>>>>>>>>>>>>>>
 - L2d hits             0.01
 - L3d hits             0.01
 - LLC misses           0.00
* instruction accesses  0.02
 - L1i hits             0.00
 - L2i hits             0.00
 - L2i misses           0.02
* data TLB              0.00
* instruction TLB       0.00
* branch instructions   0.11 >>
 - correctly predicted  0.10 >>
 - mispredicted         0.01
* floating-point instr  2.82 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+
 - slow FP instr        0.78 >>>>>>>>>>>>>>>>
 - fast FP instr        2.04 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
===============================================================================

Loop in function nbond in ipo_out.c:0 (9.43% of the total runtime)
===============================================================================
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point       64.0 ********************************
 - data accesses        29.2 ***************
* GFLOPS (% max)        12.2 ******
 - packed                0.0
 - scalar               12.2 ******
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               0.66 >>>>>>>>>>>>>
* data accesses         1.04 >>>>>>>>>>>>>>>>>>>>>
 - L1d hits             1.02 >>>>>>>>>>>>>>>>>>>>
 - L2d hits             0.02
 - L3d hits             0.01
 - LLC misses           0.00
* instruction accesses  0.02
 - L1i hits             0.00
 - L2i hits             0.00
 - L2i misses           0.02
* data TLB              0.00
* instruction TLB       0.00
* branch instructions   0.08 >>
 - correctly predicted  0.07 >
 - mispredicted         0.01
* floating-point instr  1.65 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - slow FP instr        0.53 >>>>>>>>>>>
 - fast FP instr        1.12 >>>>>>>>>>>>>>>>>>>>>>
===============================================================================

## nabmd_gcn4
[analyzer]    WARNING: unknown sorting order (none), hotspots are not sorted
-------------------------------------------------------------------------------
Total running time for nabmd is 241.39 seconds between all 16 cores
The wall-clock time for nabmd is approximately 15.09 seconds

Module nabmd takes 99.97% of the total runtime
Module libimf.so takes 0.02% of the total runtime
Module libiomp5.so takes 0.00% of the total runtime
Module libintlc.so.5 takes 0.00% of the total runtime
Module libpthread-2.12.so takes 0.00% of the total runtime
Module libc-2.12.so takes 0.00% of the total runtime
-------------------------------------------------------------------------------

Loop in function nblist in ipo_out.c:0 (5.76% of the total runtime)
===============================================================================
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point      100.0 **************************************************
 - data accesses        15.6 ********
* GFLOPS (% max)        12.6 ******
 - packed                0.0
 - scalar               12.6 ******
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               1.04 >>>>>>>>>>>>>>>>>>>>>
* data accesses         0.56 >>>>>>>>>>>
 - L1d hits             0.55 >>>>>>>>>>>
 - L2d hits             0.01
 - L3d hits             0.00
 - LLC misses           0.00
* instruction accesses  0.00
 - L1i hits             0.00
 - L2i hits             0.00
 - L2i misses           0.00
* data TLB              0.00
* instruction TLB       0.00
* branch instructions   0.80 >>>>>>>>>>>>>>>>
 - correctly predicted  0.22 >>>>
 - mispredicted         0.59 >>>>>>>>>>>>
* floating-point instr  1.83 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - slow FP instr        0.00
 - fast FP instr        1.83 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
===============================================================================

Loop in function egb..1 in ipo_out.c:0 (82.00% of the total runtime)
===============================================================================
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point      100.0 **************************************************
 - data accesses        35.3 ******************
* GFLOPS (% max)        12.8 ******
 - packed                0.3
 - scalar               12.5 ******
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               1.14 >>>>>>>>>>>>>>>>>>>>>>>
* data accesses         1.30 >>>>>>>>>>>>>>>>>>>>>>>>>>
 - L1d hits             1.23 >>>>>>>>>>>>>>>>>>>>>>>>>
 - L2d hits             0.06 >
 - L3d hits             0.00
 - LLC misses           0.00
* instruction accesses  0.01
 - L1i hits             0.00
 - L2i hits             0.00
 - L2i misses           0.01
* data TLB              0.00
* instruction TLB       0.00
* branch instructions   0.11 >>
 - correctly predicted  0.10 >>
 - mispredicted         0.01
* floating-point instr  2.79 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+
 - slow FP instr        0.78 >>>>>>>>>>>>>>>>
 - fast FP instr        2.01 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
===============================================================================

Loop in function nbond in ipo_out.c:0 (9.07% of the total runtime)
===============================================================================
ratio to total instrns    %  0..........25..........50..........75..........100
 - floating point       64.8 ********************************
 - data accesses        28.6 **************
* GFLOPS (% max)        12.5 ******
 - packed                0.0
 - scalar               12.5 ******
-------------------------------------------------------------------------------
performance assessment  LCPI good.......okay........fair........poor........bad
* overall               0.65 >>>>>>>>>>>>>
* data accesses         1.05 >>>>>>>>>>>>>>>>>>>>>
 - L1d hits             1.00 >>>>>>>>>>>>>>>>>>>>
 - L2d hits             0.04 >
 - L3d hits             0.01
 - LLC misses           0.00
* instruction accesses  0.01
 - L1i hits             0.00
 - L2i hits             0.00
 - L2i misses           0.01
* data TLB              0.00
* instruction TLB       0.00
* branch instructions   0.07 >
 - correctly predicted  0.07 >
 - mispredicted         0.00
* floating-point instr  1.66 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 - slow FP instr        0.54 >>>>>>>>>>>
 - fast FP instr        1.13 >>>>>>>>>>>>>>>>>>>>>>>
===============================================================================