@zingaburga
zingaburga / bench.sh
Created Aug 18, 2022
Benchmarking masked AVX-512 memory ops
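#!/bin/sh
# R14 is assumed to point into a memory area set up by nanoBench.
# ANDN RBX, RAX, R14 with RAX = 0x3f clears the low 6 bits of R14, so RBX is 64-byte aligned.
echo "== Aligned load"
./nanoBench.sh -asm_init "MOV RAX, 0x3f; ANDN RBX, RAX, R14" -asm "VMOVDQU8 zmm0, [rbx]" -config configs/cfg_AlderLakeP_common.txt
# SUB RBX, 8 moves the pointer 8 bytes below the boundary, so the 64-byte load straddles two cache lines.
echo "== Unaligned load"
./nanoBench.sh -asm_init "MOV RAX, 0x3f; ANDN RBX, RAX, R14; SUB RBX, 8" -asm "VMOVDQU8 zmm0, [rbx]" -config configs/cfg_AlderLakeP_common.txt
# KXORQ k1, k1, k1 zeroes k1, so no byte lanes are selected by the masked load.
echo "== Unaligned load (8b, 0 mask)"
./nanoBench.sh -asm_init "MOV RAX, 0x3f; ANDN RBX, RAX, R14; SUB RBX, 8; KXORQ k1, k1, k1" -asm "VMOVDQU8 zmm0{k1}{z}, [rbx]" -config configs/cfg_AlderLakeP_common.txt
# KXNORQ k1, k1, k1 sets all 64 bits of k1, so every byte lane is selected.
echo "== Unaligned load (8b, -1 mask)"
./nanoBench.sh -asm_init "MOV RAX, 0x3f; ANDN RBX, RAX, R14; SUB RBX, 8; KXNORQ k1, k1, k1" -asm "VMOVDQU8 zmm0{k1}{z}, [rbx]" -config configs/cfg_AlderLakeP_common.txt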
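
The address arithmetic in the `-asm_init` strings can be checked with a small stand-alone sketch. It is not part of the gist: the starting address is a made-up value standing in for whatever R14 holds at run time, and `crosses_line` is a hypothetical helper. It assumes 64-byte cache lines, as the `0x3f` mask in the benchmark suggests.

```shell
#!/bin/sh
# Hypothetical example address; in the benchmark, R14 is supplied by nanoBench.
addr=$((0x12345678))

# ANDN RBX, RAX, R14 with RAX = 0x3f computes (~0x3f) & R14:
# clearing the low 6 bits rounds down to a 64-byte boundary.
aligned=$(( addr & ~0x3f ))

# SUB RBX, 8 moves the pointer 8 bytes below that boundary.
unaligned=$(( aligned - 8 ))

# A 64-byte load touches bytes [start, start + 63].
# It crosses a cache line when the first and last byte fall in different 64-byte blocks.
crosses_line() {
    first=$(( $1 / 64 ))
    last=$(( ($1 + 63) / 64 ))
    [ "$first" -ne "$last" ]
}

printf 'aligned   = 0x%x\n' "$aligned"
printf 'unaligned = 0x%x\n' "$unaligned"

for start in "$aligned" "$unaligned"; do
    if crosses_line "$start"; then
        printf '0x%x: 64-byte load crosses a cache line\n' "$start"
    else
        printf '0x%x: 64-byte load stays within one cache line\n' "$start"
    fi
done
```

Only the low 6 bits of the address matter here, so any other starting value gives the same aligned versus line-crossing split.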
zingaburga / sve2.md
Last active Jan 26, 2023
ARM’s Scalable Vector Extensions: A Critical Look at SVE2 For Integer Workloads

Scalable Vector Extensions (SVE) is ARM’s latest SIMD extension to its instruction set, announced back in 2016. A follow-up extension, SVE2, was announced in 2019 and is designed to incorporate all functionality from ARM’s current primary SIMD extension, NEON (aka ASIMD).

Despite being announced 5 years ago, there is currently no generally available CPU which supports any form of SVE (which excludes the [Fugaku supercomputer](https://www.fujitsu.com/global/about/innovation/