@abadams
Created February 16, 2018 22:30
Halide:
1.42 │ e0: vmovup -0x60(%rdi),%ymm0
4.50 │ vmovup -0x40(%rdi),%ymm6
4.97 │ vmovup -0x20(%rdi),%ymm7
2.54 │ vmovup (%rdi),%ymm8
30.71 │ vfmadd (%rsi),%ymm5,%ymm8
7.82 │ vfmadd -0x20(%rsi),%ymm5,%ymm7
22.69 │ vfmadd -0x40(%rsi),%ymm5,%ymm6
5.42 │ vfmadd -0x60(%rsi),%ymm5,%ymm0
2.74 │ vmovup %ymm0,-0x60(%rsi)
2.40 │ vmovup %ymm6,-0x40(%rsi)
2.15 │ vmovup %ymm7,-0x20(%rsi)
1.03 │ vmovup %ymm8,(%rsi)
2.96 │ sub $0xffffffffffffff80,%rsi
5.06 │ sub $0xffffffffffffff80,%rdi
0.08 │ add $0xffffffffffffffff,%rbp
2.07 │ ↑ jne e0
Openblas:
1.25 │10: vmovup (%rdx,%rax,4),%ymm12
21.01 │ vmovup 0x20(%rdx,%rax,4),%ymm13
3.29 │ vmovup 0x40(%rdx,%rax,4),%ymm14
20.35 │ vmovup 0x60(%rdx,%rax,4),%ymm15
8.39 │ vfmadd (%rsi,%rax,4),%ymm0,%ymm12
8.46 │ vfmadd 0x20(%rsi,%rax,4),%ymm0,%ymm13
9.95 │ vfmadd 0x40(%rsi,%rax,4),%ymm0,%ymm14
11.35 │ vfmadd 0x60(%rsi,%rax,4),%ymm0,%ymm15
0.09 │ vmovup %ymm12,(%rdx,%rax,4)
2.95 │ vmovup %ymm13,0x20(%rdx,%rax,4)
0.13 │ vmovup %ymm14,0x40(%rdx,%rax,4)
7.65 │ vmovup %ymm15,0x60(%rdx,%rax,4)
0.01 │ add $0x20,%rax
0.06 │ sub $0x20,%rdi
4.65 │ ↑ jne 10
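
Both listings look like the inner loop of a single-precision AXPY-style update: four 32-byte loads, four fused multiply-adds against a broadcast scalar, and four stores, advancing 32 floats (0x80 bytes) per iteration. The mnemonics are truncated by the profiler, so the exact operand roles are an assumption here. Under the assumption that the loop computes `y[i] += a * x[i]`, a scalar reference sketch in C could look like the following; the name `saxpy_ref` and its parameters are hypothetical and do not come from either library.

```c
#include <stddef.h>

/* Hypothetical scalar reference, assuming the profiled loops compute
   y[i] += a * x[i] on 32-bit floats. Not taken from Halide or OpenBLAS. */
void saxpy_ref(size_t n, float a, const float *x, float *y) {
    size_t i = 0;

    /* Main loop: 32 floats per iteration, the same stride as the
       four 8-float ymm registers used in each listing. */
    for (; i + 32 <= n; i += 32) {
        for (size_t j = 0; j < 32; j++) {
            y[i + j] += a * x[i + j];
        }
    }

    /* Scalar tail for lengths that are not a multiple of 32. */
    for (; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

The two listings differ mainly in addressing: the Halide loop bumps two pointers and a separate counter, while the OpenBLAS loop uses one shared index register with scaled addressing.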