goyalankit/results.md

## results.md

      
Simple loop with a single variable being written + vectorization.

for (i = 0; i < n; i++) {
  c[i] = a[i] + b[i];
}


Same compiler (Rose compiler, no vectorization): the Valgrind and macpo traces were exactly the same.


ICC (vectorization on): the code was vectorized. The traces were the same.


Simple loop with multiple variables being written + vectorization.

int n = 10;
for (i = 0; i < n; i++) {
  c[i] = a[i] + b[i];
  a[i] = c[i] + b[i];
}


Same compiler (Rose compiler): the traces were the same. The code was not vectorized.

Levenshtein distance: a=0, b=0, c=0


ICC (vectorization on): the traces do not match.

Levenshtein distance: a=18, b=10, c=10


ICC (vectorization explicitly off): the traces were different.

Levenshtein distance: a=0, b=10, c=10 (macpo reads b twice and also reads c)
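
The per-variable distances above compare two access sequences for the same array. A minimal sketch of that comparison, assuming each trace is encoded as a string with one character per access ('R' for a read, 'W' for a write); the encoding and the name `trace_levenshtein` are made up for this illustration and are not taken from macpo or Valgrind:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Levenshtein (edit) distance between two access traces.
 * Assumed encoding: one character per access, 'R' = read, 'W' = write.
 * Uses a single rolling row of the dynamic-programming table. */
size_t trace_levenshtein(const char *x, const char *y)
{
    size_t n = strlen(x), m = strlen(y);
    size_t *row = malloc((m + 1) * sizeof *row);
    assert(row != NULL);                 /* sketch-level error handling */

    for (size_t j = 0; j <= m; j++)
        row[j] = j;                      /* distance from the empty prefix */

    for (size_t i = 1; i <= n; i++) {
        size_t diag = row[0];            /* D[i-1][j-1] */
        row[0] = i;
        for (size_t j = 1; j <= m; j++) {
            size_t up = row[j];          /* D[i-1][j] */
            size_t best = diag + (x[i - 1] == y[j - 1] ? 0 : 1);
            if (up + 1 < best)         best = up + 1;         /* deletion  */
            if (row[j - 1] + 1 < best) best = row[j - 1] + 1; /* insertion */
            row[j] = best;
            diag = up;
        }
    }
    size_t d = row[m];
    free(row);
    return d;
}
```

Under this encoding, ten reads of b against twenty reads of b gives a distance of 10, and ten writes of c against ten write-then-read pairs also gives 10, which is consistent with the b=10, c=10 figures for the ICC build with vectorization off.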


Without vectorization:
Variable b is read twice in the macpo trace.
Variable c is never read in the ICC trace.

ICC (Valgrind trace) with vectorization, side by side with the macpo trace:

icc   macpo
40    60
Ra0   Ra0
Ra1   Rb0
Ra2   Wc0
Rb0   Rc0
Rb1   Rb0
Rb2   Wa0
Wc0   Ra1
Wc1   Rb1
Wc2   Wc1
Wa0   Rc1
Wa1   Rb1
Wa2   Wa1
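
If the 40 and 60 in the header row are the total access counts for n = 10 (an assumption, the notes do not say), they follow from counting accesses per iteration: six in source order, four when c[i] and the second b[i] are served from registers. A small sketch of that arithmetic; the struct and function names are hypothetical:

```c
#include <assert.h>

struct access_counts { int reads; int writes; };

/* Source order: every array reference becomes one memory access,
 * i.e. R a, R b, W c, then R c, R b, W a. */
struct access_counts count_source_order(int n)
{
    assert(n >= 0);
    struct access_counts t = {0, 0};
    for (int i = 0; i < n; i++) {
        t.reads += 2; t.writes += 1;   /* c[i] = a[i] + b[i] */
        t.reads += 2; t.writes += 1;   /* a[i] = c[i] + b[i] */
    }
    return t;
}

/* With register reuse: c[i] is never read back and b[i] is read once,
 * leaving R a, R b, W c, W a. */
struct access_counts count_register_reuse(int n)
{
    assert(n >= 0);
    struct access_counts t = {0, 0};
    for (int i = 0; i < n; i++) {
        t.reads += 2;                  /* a[i], b[i] */
        t.writes += 2;                 /* c[i], a[i] */
    }
    return t;
}

int total_accesses(struct access_counts t)
{
    return t.reads + t.writes;
}
```

For n = 10 this gives 60 accesses in source order and 40 with register reuse.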

n = 10:
All "a" values are loaded in batches of 2.
All "b" values are loaded in batches of 2.
All "c" values are written in batches of 2.
All "a" values are written in batches of 2.

The compiler is vectorizing here: each load is 16 bytes, i.e. two double values.
I think the compiler keeps the results in registers and then writes them all at once.
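
To make the register guess concrete, here is a hand-written scalar sketch of the same loop with the intermediate values kept in locals. It is not ICC's actual output (which also batches two doubles per instruction); it only shows why c would never be read and b would be read once per element. The function names are hypothetical, and `restrict` is an added assumption that the arrays do not overlap:

```c
#include <assert.h>
#include <stddef.h>

/* Source-level form: six array references per iteration. */
void update_naive(double *a, const double *b, double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
        a[i] = c[i] + b[i];
    }
}

/* Same result with values held in locals: four memory accesses per
 * iteration. restrict is the caller's promise that a, b and c do not
 * overlap, which is what lets b[i] be reused after the store to c[i]. */
void update_register_reuse(double *restrict a, const double *restrict b,
                           double *restrict c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        const double ai = a[i];        /* R a */
        const double bi = b[i];        /* R b, reused below */
        const double ci = ai + bi;
        c[i] = ci;                     /* W c, never read back */
        a[i] = ci + bi;                /* W a */
    }
}
```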

n = 20:
4 "a" values are loaded in batches of 2.
4 "b" values are loaded in batches of 2.
4 "c" values are written in batches of 2.
4 "a" values are written in batches of 2.

---------------------------------------------------------------

ICC Vectorization OFF

 R:b:0
 R:b:1
 R:b:2
 R:b:3
 
 R:a:0
 W:c:0
 W:a:0
 
 R:a:1
 W:c:1
 W:a:1
 
 R:a:2
 W:c:2
 W:a:2


Simple loop with a single variable being written + loop invariant.

for (i = 0; i < n; i++) {
  a[i] = b[0];
}


Rose compiler: the traces were the same.


ICC (with vectorization): the traces were different. The code was vectorized. Loop-invariant optimization was not performed.

Levenshtein distance: b=9, a=0 (b was read 10 times). No loop-invariant optimization.
ICC uses a total of 5 instructions for the writes because of vectorization.


ICC (without vectorization): the traces were different. Loop-invariant optimization was not performed.

Levenshtein distance: b=9, a=0 (b was read 10 times). No loop-invariant optimization.
ICC uses a total of 5 instructions for the writes because of vectorization.
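
For reference, this is roughly what loop-invariant hoisting of b[0] would look like if done by hand. It is a sketch with hypothetical function names, not compiler output; the hoisted form reads b once instead of once per iteration, and it is only valid when a and b do not overlap (expressed here with `restrict`):

```c
#include <assert.h>
#include <stddef.h>

/* As written in the notes: b[0] is referenced in every iteration. */
void fill_reload(double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = b[0];
}

/* Hoisted by hand: b[0] is read once, before the loop.
 * The n == 0 guard keeps the behavior identical (b is not read at all). */
void fill_hoisted(double *restrict a, const double *restrict b, size_t n)
{
    assert(a != NULL && b != NULL);
    if (n == 0)
        return;
    const double b0 = b[0];
    for (size_t i = 0; i < n; i++)
        a[i] = b0;
}
```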


Matrix-matrix multiplication

int n = 100;
for (i = 0; i < n; i++) {
  for (k = 0; k < n; k++) {
    for (j = 0; j < n; j++) {
      c[i][j] += a[i][k] * b[k][j];
    }
  }
}


Rose compiler: the Valgrind and macpo traces were exactly the same.
Note: each macpo RW record is counted as an R followed by a W.


ICC (without vectorization): the traces were not the same.

Levenshtein distance: b=200, a=100, c=1800


ICC: 2100 R/W operations; macpo: 4000 RW operations. Note: each macpo RW record is counted as an R followed by a W.
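
The "RW -> R, W" normalization can be sketched as a small expansion pass over a trace. The encoding is an assumption made for this example ('R' = read, 'W' = write, 'B' = a combined read-write record such as the one for `c[i][j] +=`), as is the function name:

```c
#include <assert.h>
#include <stddef.h>

/* Expands every combined read-write record 'B' into 'R' followed by 'W'
 * and copies all other records unchanged.
 * Returns the number of characters written (excluding the terminating
 * NUL), or (size_t)-1 if out (capacity cap) is too small. */
size_t split_read_write(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    if (cap == 0)
        return (size_t)-1;

    size_t len = 0;
    for (; *in != '\0'; in++) {
        size_t need = (*in == 'B') ? 2 : 1;
        if (len + need + 1 > cap)      /* keep room for the NUL */
            return (size_t)-1;
        if (*in == 'B') {
            out[len++] = 'R';
            out[len++] = 'W';
        } else {
            out[len++] = *in;
        }
    }
    out[len] = '\0';
    return len;
}
```

With this encoding, one inner iteration of the matrix loop recorded as "RRB" (read a, read b, read-write c) expands to "RRRW", i.e. four operations.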


## zmacpo_metrics.md

      
for (i = 0; i < n; i++) {
  c[i] = a[i] + b[i];
}
Sampling: Disabled; Vectorization: OFF
Compiler: ICC; Trace: Valgrind; 

[macpo] Reuse distances:
var: a: 2 (8 times) 0 (7 times) inf. (2 times).
var: b: 2 (8 times) inf. (3 times) 3 (3 times).
var: c: 2 (10 times) 1 (5 times) inf. (3 times).

[macpo] Cache conflicts:
var: a, conflict ratio: 0%.
var: b, conflict ratio: 0%.
var: c, conflict ratio: 0%.

[macpo] Analyzing records for stride values.
var: a: 1 (19 times).
var: b: 1 (19 times).
var: c: 1 (19 times).

Sampling: Disabled; Vectorization: OFF
Compiler: Rose; Trace: Valgrind;

[macpo] Reuse distances:
var: a: 2 (16 times) inf. (3 times) 8 (1 times).
var: b: 2 (16 times) inf. (2 times) 37 (1 times).
var: c: 2 (16 times) inf. (3 times) 8 (1 times).

[macpo] Cache conflicts:
var: a, conflict ratio: 0%.
var: b, conflict ratio: 0%.
var: c, conflict ratio: 0%.

[macpo] Analyzing records for stride values.
var: a: 1 (19 times).
var: b: 1 (19 times).
var: c: 1 (19 times).
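
For reading the "Reuse distances" lines: macpo's exact definition and granularity are not stated in these notes, so the following is only the textbook definition, i.e. the number of distinct other addresses touched between two consecutive accesses to the same address, with "inf." for a first access. A deliberately simple sketch with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

#define REUSE_INF (-1L)   /* marks a first access (no earlier use) */

/* Textbook reuse distance for each access in addr[0..n-1].
 * out[i] = number of distinct addresses seen strictly between access i
 * and the previous access to the same address, or REUSE_INF if none.
 * Simple nested loops; fine for short traces only. */
void reuse_distances(const unsigned long *addr, size_t n, long *out)
{
    assert(addr != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = REUSE_INF;
        for (size_t j = i; j-- > 0; ) {          /* scan backwards */
            if (addr[j] != addr[i])
                continue;
            long distinct = 0;
            for (size_t k = j + 1; k < i; k++) { /* accesses in between */
                int seen = 0;
                for (size_t m = j + 1; m < k; m++)
                    if (addr[m] == addr[k]) { seen = 1; break; }
                if (!seen)
                    distinct++;
            }
            out[i] = distinct;
            break;                               /* nearest previous use */
        }
    }
}
```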


int n = 10;
for (i = 0; i < n; i++) {
  c[i] = a[i] + b[i];
  a[i] = c[i] + b[i];
}
Sampling: Disabled
Compiler: ICC
Trace: Valgrind
Vectorization: OFF

[macpo] Analyzing logs created from the binary /work/0268 at Tue Mar 18 19:57:00 2014

[macpo] Analyzing records for latency.

[macpo] Reuse distances:
var: b: 0 (13 times) inf. (2 times) 16 (2 times).
var: a: 0 (14 times) 1 (8 times) 2 (8 times).
var: c: 2 (13 times) inf. (3 times) 10 (2 times).

[macpo] Cache conflicts:
var: b, conflict ratio: 0%.
var: a, conflict ratio: 0%.
var: c, conflict ratio: 0%.

[macpo] Analyzing records for stride values.
var: b: 1 (19 times).
var: a: 0 (20 times) 1 (19 times).
var: c: 1 (19 times).

[macpo] Analyzing records for vector stride values.
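
One way to read a stride line such as "a: 0 (20 times) 1 (19 times)" is as a histogram of differences between consecutive accesses to the same variable. The sketch below assumes strides are measured in element indices, which these notes do not confirm; `count_stride` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often `stride` occurs as the difference between
 * consecutive entries of index[0..n-1] (element indices of successive
 * accesses to one variable; the unit is an assumption). */
size_t count_stride(const long *index, size_t n, long stride)
{
    assert(index != NULL);
    size_t hits = 0;
    for (size_t i = 1; i < n; i++)
        if (index[i] - index[i - 1] == stride)
            hits++;
    return hits;
}
```

For a variable that is read and then written at the same index in each of 20 iterations (indices 0, 0, 1, 1, ..., 19, 19), this yields stride 0 twenty times and stride 1 nineteen times, the same shape as the line for a.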

Sampling: Disabled
Compiler: Rose
Trace: Valgrind

[macpo] Analyzing logs created from the binary /work/0268 at Tue Mar 18 20:03:35 2014

[macpo] Analyzing records for latency.

[macpo] Reuse distances:
var: b: 0 (10 times) 6 (6 times) inf. (2 times).
var: a: 0 (27 times) 6 (4 times) inf. (3 times).
var: c: 0 (10 times) 6 (7 times) inf. (3 times).

[macpo] Cache conflicts:
var: b, conflict ratio: 0%.
var: a, conflict ratio: 0%.
var: c, conflict ratio: 0%.

[macpo] Analyzing records for stride values.
var: b: 0 (10 times) 2 (9 times).
var: a: 0 (30 times) 2 (9 times).
var: c: 0 (10 times) 2 (9 times).

[macpo] Analyzing records for vector stride values.


 [macpo] Analyzing records for latency.

 [macpo] Reuse distances:
 var: c: 3 (20 times) 1 (17 times) inf. (3 times).
 var: a: 1 (20 times) 5 (10 times) 9 (5 times).
 var: b: 2 (32 times) 8 (5 times) inf. (2 times).

 [macpo] Cache conflicts:
 var: c, conflict ratio: 0%.
 var: a, conflict ratio: 0%.
 var: b, conflict ratio: 0%.

 [macpo] Analyzing records for stride values.
 var: c: 0 (20 times) 1 (19 times) 127 (1 times).
 var: a: 0 (20 times) 1 (19 times).
 var: b: 0 (20 times) 1 (19 times).

 [macpo] Analyzing records for vector stride values.


for (i = 0; i < n; i++) {
  a[i] = b[0];
}


Sampling: Disabled
Compiler: ICC
Trace: Valgrind

[macpo] Analyzing logs created from the binary /work/0268 at Tue Mar 18 20:14:39 2014

[macpo] Analyzing records for latency.

[macpo] Reuse distances:
var: b: inf. (1 times).
var: a: 0 (17 times) inf. (3 times).

[macpo] Cache conflicts:
var: b, conflict ratio: 0%.
var: a, conflict ratio: 0%.

[macpo] Analyzing records for stride values.
var: a: 1 (19 times).

[macpo] Analyzing records for vector stride values.

Sampling: Disabled
Compiler: Rose
Trace: Valgrind

[macpo] Analyzing logs created from the binary /work/0268 at Tue Mar 18 20:15:39 2014

[macpo] Analyzing records for latency.

[macpo] Reuse distances:
var: b: 1 (10 times) 3 (9 times) inf. (1 times).
var: a: 1 (17 times) inf. (3 times).

[macpo] Cache conflicts:
var: b, conflict ratio: 0%.
var: a, conflict ratio: 0%.

[macpo] Analyzing records for stride values.
var: b: 0 (19 times).
var: a: 1 (19 times).

[macpo] Analyzing records for vector stride values.