@damageboy
Created October 25, 2022 06:24
CPUID highest leaf : [1bh]
Running as root : [YES]
MSR reads supported : [YES]
CPU pinning enabled : [YES]
CPU supports zeroupper: [YES]
CPU supports AVX2 : [YES]
CPU supports AVX-512F : [YES]
CPU supports AVX-512VL: [YES]
CPU supports AVX-512BW: [YES]
CPU supports AVX-512CD: [YES]
cpuid = eax = 2, ebx = 146, ecx = 38400000, edx = 0
tsc_freq = 2803.2 MHz (from cpuid leaf 0x15)
CPU brand string: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
4 available CPUs: [0, 1, 2, 3]
4 physical cores: [0, 1, 2, 3]
Will test up to 4 CPUs
Cores | ID | Description | OVRLP3 | Mops | A/M-ratio | A/M-MHz | M/tsc-ratio
1 | pause_only | pause instruction | 1.000 | 1649 | 0.86 | 2404 | 1.00
1 | ucomis_clean | scalar ucomis (w/ vzeroupper) | 1.000 | 1164 | 1.67 | 4686 | 1.00
1 | ucomis_dirty | scalar ucomis (no vzeroupper) | 1.000 | 1163 | 1.67 | 4668 | 1.00
1 | scalar_iadd | Scalar integer adds | 1.000 | 4689 | 1.67 | 4684 | 1.00
1 | avx128_iadd | 128-bit integer serial adds | 1.000 | 4682 | 1.57 | 4400 | 1.00
1 | avx256_iadd | 256-bit integer serial adds | 1.000 | 4095 | 1.55 | 4353 | 1.00
1 | avx512_iadd | 512-bit integer serial adds | 1.000 | 4485 | 1.55 | 4355 | 1.00
1 | avx128_iadd16 | 128-bit integer serial adds zmm16 | 1.000 | 4689 | 1.68 | 4705 | 1.00
1 | avx256_iadd16 | 256-bit integer serial adds zmm16 | 1.000 | 4689 | 1.68 | 4705 | 1.00
1 | avx512_iadd16 | 512-bit integer serial adds zmm16 | 1.000 | 4490 | 1.61 | 4505 | 1.00
1 | avx128_iadd_t | 128-bit integer parallel adds | 1.000 | 14071 | 1.68 | 4705 | 1.00
1 | avx256_iadd_t | 256-bit integer parallel adds | 1.000 | 14073 | 1.68 | 4705 | 1.00
1 | avx128_xor_zero | 128-bit zeroing xor | 1.000 | 23218 | 1.68 | 4705 | 1.00
1 | avx256_xor_zero | 256-bit zeroing xor | 1.000 | 23213 | 1.68 | 4705 | 1.00
1 | avx512_xor_zero | 512-bit zeroing xord | 1.000 | 22227 | 1.61 | 4505 | 1.00
1 | avx128_mov_sparse | 128-bit reg-reg mov | 1.000 | 4689 | 1.68 | 4705 | 1.00
1 | avx256_mov_sparse | 256-bit reg-reg mov | 1.000 | 4689 | 1.68 | 4705 | 1.00
1 | avx512_mov_sparse | 512-bit reg-reg mov | 1.000 | 4489 | 1.61 | 4505 | 1.00
1 | avx128_merge_sparse | 128-bit reg-reg merge mov | 1.000 | 4689 | 1.68 | 4705 | 1.00
1 | avx256_merge_sparse | 256-bit reg-reg merge mov | 1.000 | 4690 | 1.68 | 4705 | 1.00
1 | avx512_merge_sparse | 512-bit reg-reg merge mov | 1.000 | 4490 | 1.61 | 4505 | 1.00
1 | avx128_vshift | 128-bit variable shift (vpsrlvd) | 1.000 | 4689 | 1.68 | 4705 | 1.00
1 | avx256_vshift | 256-bit variable shift (vpsrlvd) | 1.000 | 4689 | 1.68 | 4705 | 1.00
1 | avx512_vshift | 512-bit variable shift (vpsrlvd) | 1.000 | 4491 | 1.61 | 4501 | 1.00
1 | avx128_vshift_t | 128-bit variable shift (vpsrlvd) | 1.000 | 9383 | 1.68 | 4705 | 1.00
1 | avx256_vshift_t | 256-bit variable shift (vpsrlvd) | 1.000 | 9375 | 1.66 | 4652 | 1.00
1 | avx512_vshift_t | 512-bit variable shift (vpsrlvd) | 1.000 | 4489 | 1.60 | 4490 | 1.00
1 | avx128_vlzcnt | 128-bit lzcnt (vplzcntd) | 1.000 | 1172 | 1.68 | 4705 | 1.00
1 | avx256_vlzcnt | 256-bit lzcnt (vplzcntd) | 1.000 | 1172 | 1.68 | 4699 | 1.00
1 | avx512_vlzcnt | 512-bit lzcnt (vplzcntd) | 1.000 | 1122 | 1.60 | 4497 | 1.00
1 | avx128_vlzcnt_t | 128-bit lzcnt (vplzcntd) | 1.000 | 9353 | 1.58 | 4415 | 1.00
1 | avx256_vlzcnt_t | 256-bit lzcnt (vplzcntd) | 1.000 | 8181 | 1.46 | 4105 | 1.00
1 | avx512_vlzcnt_t | 512-bit lzcnt (vplzcntd) | 1.000 | 4490 | 1.61 | 4505 | 1.00
1 | avx128_imul | 128-bit integer muls (vpmuldq) | 1.000 | 938 | 1.67 | 4695 | 1.00
1 | avx256_imul | 256-bit integer muls (vpmuldq) | 1.000 | 938 | 1.68 | 4700 | 1.00
1 | avx512_imul | 512-bit integer muls (vpmuldq) | 1.000 | 898 | 1.61 | 4503 | 1.00
1 | avx128_fma_sparse | 128-bit 64-bit sparse FMAs | 1.000 | 4690 | 1.68 | 4705 | 1.00
1 | avx256_fma_sparse | 256-bit 64-bit sparse FMAs | 1.000 | 4688 | 1.67 | 4692 | 1.00
1 | avx512_fma_sparse | 512-bit 64-bit sparse FMAs | 1.000 | 4490 | 1.61 | 4505 | 1.00
1 | avx128_fma | 128-bit serial DP FMAs | 1.000 | 1172 | 1.68 | 4705 | 1.00
1 | avx256_fma | 256-bit serial DP FMAs | 1.000 | 1172 | 1.68 | 4705 | 1.00
1 | avx512_fma | 512-bit serial DP FMAs | 1.000 | 1122 | 1.61 | 4505 | 1.00
1 | avx128_fma_t | 128-bit parallel DP FMAs | 1.000 | 9370 | 1.68 | 4705 | 1.00
1 | avx256_fma_t | 256-bit parallel DP FMAs | 1.000 | 9375 | 1.68 | 4705 | 1.00
1 | avx512_fma_t | 512-bit parallel DP FMAs | 1.000 | 4489 | 1.61 | 4505 | 1.00
1 | avx512_vpermw | 512-bit serial WORD permute | 1.000 | 1122 | 1.61 | 4505 | 1.00
1 | avx512_vpermw_t | 512-bit parallel WORD permute | 1.000 | 4090 | 1.46 | 4105 | 1.00
1 | avx512_vpermd | 512-bit serial DWORD permute | 1.000 | 1363 | 1.46 | 4105 | 1.00
1 | avx512_vpermd_t | 512-bit parallel DWORD permute | 1.000 | 4490 | 1.61 | 4505 | 1.00
Cores | ID | Description | OVRLP3 | Mops | A/M-ratio | A/M-MHz | M/tsc-ratio
2 | pause_only | pause instruction | 1.000 | 1649, 1649 | 0.86, 0.86 | 2403, 2403 | 1.00, 1.00
2 | ucomis_clean | scalar ucomis (w/ vzeroupper) | 1.000 | 1163, 1163 | 1.60, 1.60 | 4487, 4488 | 1.00, 1.00
2 | ucomis_dirty | scalar ucomis (no vzeroupper) | 1.000 | 1163, 1163 | 1.67, 1.67 | 4689, 4689 | 1.00, 1.00
2 | scalar_iadd | Scalar integer adds | 1.000 | 4688, 4688 | 1.67, 1.67 | 4681, 4681 | 1.00, 1.00
2 | avx128_iadd | 128-bit integer serial adds | 1.000 | 4689, 4689 | 1.67, 1.67 | 4683, 4683 | 1.00, 1.00
2 | avx256_iadd | 256-bit integer serial adds | 1.000 | 4689, 4689 | 1.67, 1.67 | 4682, 4682 | 1.00, 1.00
2 | avx512_iadd | 512-bit integer serial adds | 1.000 | 4489, 4489 | 1.60, 1.60 | 4490, 4490 | 1.00, 1.00
2 | avx128_iadd16 | 128-bit integer serial adds zmm16 | 1.000 | 4688, 4689 | 1.66, 1.67 | 4643, 4684 | 1.00, 1.00
2 | avx256_iadd16 | 256-bit integer serial adds zmm16 | 1.000 | 4689, 4688 | 1.67, 1.67 | 4681, 4681 | 1.00, 1.00
2 | avx512_iadd16 | 512-bit integer serial adds zmm16 | 1.000 | 4490, 4489 | 1.60, 1.60 | 4496, 4496 | 1.00, 1.00
2 | avx128_iadd_t | 128-bit integer parallel adds | 1.000 | 14027, 14065 | 1.63, 1.68 | 4560, 4705 | 1.00, 1.00
2 | avx256_iadd_t | 256-bit integer parallel adds | 1.000 | 13826, 14015 | 1.63, 1.63 | 4566, 4567 | 1.00, 1.00
2 | avx128_xor_zero | 128-bit zeroing xor | 1.000 | 23213, 23213 | 1.68, 1.68 | 4705, 4705 | 1.00, 1.00
2 | avx256_xor_zero | 256-bit zeroing xor | 1.000 | 23143, 23143 | 1.57, 1.57 | 4396, 4395 | 1.00, 1.00
2 | avx512_xor_zero | 512-bit zeroing xord | 1.000 | 22227, 22227 | 1.61, 1.61 | 4505, 4505 | 1.00, 1.00
2 | avx128_mov_sparse | 128-bit reg-reg mov | 1.000 | 4689, 4689 | 1.67, 1.67 | 4682, 4682 | 1.00, 1.00
2 | avx256_mov_sparse | 256-bit reg-reg mov | 1.000 | 4690, 4689 | 1.67, 1.67 | 4693, 4693 | 1.00, 1.00
2 | avx512_mov_sparse | 512-bit reg-reg mov | 1.000 | 4490, 4489 | 1.60, 1.60 | 4496, 4496 | 1.00, 1.00
2 | avx128_merge_sparse | 128-bit reg-reg merge mov | 1.000 | 4689, 4689 | 1.67, 1.67 | 4684, 4684 | 1.00, 1.00
2 | avx256_merge_sparse | 256-bit reg-reg merge mov | 1.000 | 4690, 4690 | 1.67, 1.67 | 4677, 4678 | 1.00, 1.00
2 | avx512_merge_sparse | 512-bit reg-reg merge mov | 1.000 | 4490, 4490 | 1.60, 1.60 | 4490, 4490 | 1.00, 1.00
2 | avx128_vshift | 128-bit variable shift (vpsrlvd) | 1.000 | 4689, 4689 | 1.67, 1.67 | 4693, 4693 | 1.00, 1.00
2 | avx256_vshift | 256-bit variable shift (vpsrlvd) | 1.000 | 4689, 4689 | 1.67, 1.67 | 4684, 4684 | 1.00, 1.00
2 | avx512_vshift | 512-bit variable shift (vpsrlvd) | 1.000 | 4091, 4091 | 1.46, 1.46 | 4105, 4105 | 1.00, 1.00
2 | avx128_vshift_t | 128-bit variable shift (vpsrlvd) | 1.000 | 9384, 9386 | 1.67, 1.67 | 4684, 4684 | 1.00, 1.00
2 | avx256_vshift_t | 256-bit variable shift (vpsrlvd) | 1.000 | 8179, 8181 | 1.46, 1.46 | 4105, 4105 | 1.00, 1.00
2 | avx512_vshift_t | 512-bit variable shift (vpsrlvd) | 1.000 | 4490, 4490 | 1.60, 1.60 | 4497, 4497 | 1.00, 1.00
2 | avx128_vlzcnt | 128-bit lzcnt (vplzcntd) | 1.000 | 1172, 1172 | 1.67, 1.67 | 4671, 4673 | 1.00, 1.00
2 | avx256_vlzcnt | 256-bit lzcnt (vplzcntd) | 1.000 | 1172, 1172 | 1.61, 1.61 | 4514, 4514 | 1.00, 1.00
2 | avx512_vlzcnt | 512-bit lzcnt (vplzcntd) | 1.000 | 1122, 1122 | 1.60, 1.60 | 4490, 4490 | 1.00, 1.00
2 | avx128_vlzcnt_t | 128-bit lzcnt (vplzcntd) | 1.000 | 9360, 9352 | 1.66, 1.63 | 4659, 4560 | 1.00, 1.00
2 | avx256_vlzcnt_t | 256-bit lzcnt (vplzcntd) | 1.000 | 9176, 9183 | 1.63, 1.63 | 4562, 4563 | 1.00, 1.00
2 | avx512_vlzcnt_t | 512-bit lzcnt (vplzcntd) | 1.000 | 4487, 4486 | 1.59, 1.59 | 4464, 4465 | 1.00, 1.00
2 | avx128_imul | 128-bit integer muls (vpmuldq) | 1.000 | 938, 938 | 1.67, 1.67 | 4685, 4685 | 1.00, 1.00
2 | avx256_imul | 256-bit integer muls (vpmuldq) | 1.000 | 938, 938 | 1.65, 1.65 | 4611, 4611 | 1.00, 1.00
2 | avx512_imul | 512-bit integer muls (vpmuldq) | 1.000 | 898, 898 | 1.58, 1.58 | 4432, 4432 | 1.00, 1.00
2 | avx128_fma_sparse | 128-bit 64-bit sparse FMAs | 1.000 | 4690, 4689 | 1.67, 1.67 | 4685, 4686 | 1.00, 1.00
2 | avx256_fma_sparse | 256-bit 64-bit sparse FMAs | 1.000 | 4688, 4689 | 1.67, 1.67 | 4683, 4683 | 1.00, 1.00
2 | avx512_fma_sparse | 512-bit 64-bit sparse FMAs | 1.000 | 4487, 4488 | 1.60, 1.60 | 4473, 4473 | 1.00, 1.00
2 | avx128_fma | 128-bit serial DP FMAs | 1.000 | 1172, 1172 | 1.66, 1.67 | 4663, 4683 | 1.00, 1.00
2 | avx256_fma | 256-bit serial DP FMAs | 1.000 | 1172, 1172 | 1.67, 1.67 | 4686, 4686 | 1.00, 1.00
2 | avx512_fma | 512-bit serial DP FMAs | 1.000 | 1122, 1122 | 1.59, 1.59 | 4454, 4454 | 1.00, 1.00
2 | avx128_fma_t | 128-bit parallel DP FMAs | 1.000 | 9376, 9375 | 1.67, 1.67 | 4683, 4683 | 1.00, 1.00
2 | avx256_fma_t | 256-bit parallel DP FMAs | 1.000 | 9198, 9200 | 1.65, 1.65 | 4615, 4613 | 1.00, 1.00
2 | avx512_fma_t | 512-bit parallel DP FMAs | 1.000 | 4390, 4390 | 1.57, 1.57 | 4393, 4393 | 1.00, 1.00
2 | avx512_vpermw | 512-bit serial WORD permute | 1.000 | 1122, 1122 | 1.60, 1.60 | 4491, 4491 | 1.00, 1.00
2 | avx512_vpermw_t | 512-bit parallel WORD permute | 1.000 | 4094, 4093 | 1.50, 1.50 | 4214, 4211 | 1.00, 1.00
2 | avx512_vpermd | 512-bit serial DWORD permute | 1.000 | 1497, 1497 | 1.60, 1.60 | 4488, 4488 | 1.00, 1.00
2 | avx512_vpermd_t | 512-bit parallel DWORD permute | 1.000 | 4090, 4090 | 1.46, 1.46 | 4105, 4105 | 1.00, 1.00
Cores | ID | Description | OVRLP3 | Mops | A/M-ratio | A/M-MHz | M/tsc-ratio
3 | pause_only | pause instruction | 1.000 | 1649, 1649, 1649 | 0.86, 0.86, 0.86 | 2403, 2403, 2403 | 1.00, 1.00, 1.00
3 | ucomis_clean | scalar ucomis (w/ vzeroupper) | 1.000 | 1015, 1015, 1015 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | ucomis_dirty | scalar ucomis (no vzeroupper) | 1.000 | 1015, 1015, 1015 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | scalar_iadd | Scalar integer adds | 1.000 | 4091, 4090, 4090 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_iadd | 128-bit integer serial adds | 1.000 | 4090, 4091, 4090 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_iadd | 256-bit integer serial adds | 1.000 | 4090, 4090, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_iadd | 512-bit integer serial adds | 1.000 | 4091, 4090, 4090 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_iadd16 | 128-bit integer serial adds zmm16 | 1.000 | 4091, 4091, 4090 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_iadd16 | 256-bit integer serial adds zmm16 | 1.000 | 4090, 4091, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_iadd16 | 512-bit integer serial adds zmm16 | 1.000 | 4091, 4090, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_iadd_t | 128-bit integer parallel adds | 1.000 | 12273, 12268, 12274 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_iadd_t | 256-bit integer parallel adds | 1.000 | 12273, 12273, 12274 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_xor_zero | 128-bit zeroing xor | 1.000 | 20251, 20255, 20255 | 1.46, 1.46, 1.46 | 4104, 4105, 4104 | 1.00, 1.00, 1.00
3 | avx256_xor_zero | 256-bit zeroing xor | 1.000 | 20251, 20251, 20247 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_xor_zero | 512-bit zeroing xord | 1.000 | 20251, 20251, 20251 | 1.46, 1.46, 1.46 | 4105, 4104, 4105 | 1.00, 1.00, 1.00
3 | avx128_mov_sparse | 128-bit reg-reg mov | 1.000 | 4091, 4091, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_mov_sparse | 256-bit reg-reg mov | 1.000 | 4090, 4091, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_mov_sparse | 512-bit reg-reg mov | 1.000 | 4090, 4091, 4090 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_merge_sparse | 128-bit reg-reg merge mov | 1.000 | 4091, 4091, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_merge_sparse | 256-bit reg-reg merge mov | 1.000 | 4091, 4090, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_merge_sparse | 512-bit reg-reg merge mov | 1.000 | 4091, 4091, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_vshift | 128-bit variable shift (vpsrlvd) | 1.000 | 4090, 4091, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_vshift | 256-bit variable shift (vpsrlvd) | 1.000 | 4090, 4091, 4090 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_vshift | 512-bit variable shift (vpsrlvd) | 1.000 | 4091, 4091, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_vshift_t | 128-bit variable shift (vpsrlvd) | 1.000 | 8181, 8184, 8183 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_vshift_t | 256-bit variable shift (vpsrlvd) | 1.000 | 8181, 8183, 8182 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_vshift_t | 512-bit variable shift (vpsrlvd) | 1.000 | 4091, 4091, 4090 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_vlzcnt | 128-bit lzcnt (vplzcntd) | 1.000 | 1023, 1023, 1023 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_vlzcnt | 256-bit lzcnt (vplzcntd) | 1.000 | 1023, 1023, 1023 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_vlzcnt | 512-bit lzcnt (vplzcntd) | 1.000 | 1023, 1023, 1023 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_vlzcnt_t | 128-bit lzcnt (vplzcntd) | 1.000 | 8182, 8182, 8183 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_vlzcnt_t | 256-bit lzcnt (vplzcntd) | 1.000 | 8181, 8182, 8181 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_vlzcnt_t | 512-bit lzcnt (vplzcntd) | 1.000 | 4091, 4090, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_imul | 128-bit integer muls (vpmuldq) | 1.000 | 818, 818, 818 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_imul | 256-bit integer muls (vpmuldq) | 1.000 | 818, 818, 818 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_imul | 512-bit integer muls (vpmuldq) | 1.000 | 818, 818, 818 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_fma_sparse | 128-bit 64-bit sparse FMAs | 1.000 | 4090, 4090, 4090 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_fma_sparse | 256-bit 64-bit sparse FMAs | 1.000 | 4090, 4090, 4090 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_fma_sparse | 512-bit 64-bit sparse FMAs | 1.000 | 4091, 4091, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_fma | 128-bit serial DP FMAs | 1.000 | 1023, 1023, 1023 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_fma | 256-bit serial DP FMAs | 1.000 | 1023, 1023, 1023 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_fma | 512-bit serial DP FMAs | 1.000 | 1023, 1023, 1023 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx128_fma_t | 128-bit parallel DP FMAs | 1.000 | 8181, 8181, 8180 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx256_fma_t | 256-bit parallel DP FMAs | 1.000 | 8179, 8181, 8180 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_fma_t | 512-bit parallel DP FMAs | 1.000 | 4091, 4091, 4090 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_vpermw | 512-bit serial WORD permute | 1.000 | 1023, 1023, 1023 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_vpermw_t | 512-bit parallel WORD permute | 1.000 | 4091, 4091, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_vpermd | 512-bit serial DWORD permute | 1.000 | 1363, 1363, 1363 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
3 | avx512_vpermd_t | 512-bit parallel DWORD permute | 1.000 | 4090, 4091, 4091 | 1.46, 1.46, 1.46 | 4105, 4105, 4105 | 1.00, 1.00, 1.00
Cores | ID | Description | OVRLP3 | Mops | A/M-ratio | A/M-MHz | M/tsc-ratio
4 | pause_only | pause instruction | 1.000 | 1649, 1649, 1649, 1649 | 0.86, 0.86, 0.86, 0.86 | 2403, 2403, 2403, 2403 | 1.00, 1.00, 1.00, 1.00
4 | ucomis_clean | scalar ucomis (w/ vzeroupper) | 1.000 | 1015, 1015, 1015, 1015 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | ucomis_dirty | scalar ucomis (no vzeroupper) | 1.000 | 1015, 1015, 1015, 1015 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | scalar_iadd | Scalar integer adds | 1.000 | 4091, 4091, 4091, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx128_iadd | 128-bit integer serial adds | 1.000 | 4091, 4090, 4090, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_iadd | 256-bit integer serial adds | 1.000 | 4091, 4091, 4091, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_iadd | 512-bit integer serial adds | 1.000 | 4090, 4090, 4091, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx128_iadd16 | 128-bit integer serial adds zmm16 | 1.000 | 4091, 4091, 4091, 4090 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_iadd16 | 256-bit integer serial adds zmm16 | 1.000 | 4091, 4090, 4090, 4090 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_iadd16 | 512-bit integer serial adds zmm16 | 1.000 | 4091, 4091, 4091, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx128_iadd_t | 128-bit integer parallel adds | 1.000 | 12274, 12274, 12271, 12274 | 1.46, 1.46, 1.46, 1.46 | 4105, 4104, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_iadd_t | 256-bit integer parallel adds | 1.000 | 12276, 12273, 12274, 12273 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx128_xor_zero | 128-bit zeroing xor | 1.000 | 20255, 20251, 20251, 20251 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_xor_zero | 256-bit zeroing xor | 1.000 | 20251, 20251, 20247, 20251 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4104 | 1.00, 1.00, 1.00, 1.00
4 | avx512_xor_zero | 512-bit zeroing xord | 1.000 | 20255, 20255, 20251, 20255 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx128_mov_sparse | 128-bit reg-reg mov | 1.000 | 4091, 4091, 4090, 4090 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_mov_sparse | 256-bit reg-reg mov | 1.000 | 4090, 4090, 4091, 4090 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_mov_sparse | 512-bit reg-reg mov | 1.000 | 4091, 4091, 4091, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx128_merge_sparse | 128-bit reg-reg merge mov | 1.000 | 4091, 4090, 4091, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_merge_sparse | 256-bit reg-reg merge mov | 1.000 | 4090, 4091, 4091, 4090 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_merge_sparse | 512-bit reg-reg merge mov | 1.000 | 4090, 4090, 4090, 4090 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx128_vshift | 128-bit variable shift (vpsrlvd) | 1.000 | 4090, 4091, 4091, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_vshift | 256-bit variable shift (vpsrlvd) | 1.000 | 4090, 4091, 4091, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_vshift | 512-bit variable shift (vpsrlvd) | 1.000 | 4091, 4091, 4091, 4090 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx128_vshift_t | 128-bit variable shift (vpsrlvd) | 1.000 | 8184, 8181, 8181, 8179 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_vshift_t | 256-bit variable shift (vpsrlvd) | 1.000 | 8182, 8181, 8182, 8181 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_vshift_t | 512-bit variable shift (vpsrlvd) | 1.000 | 4091, 4090, 4091, 4090 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx128_vlzcnt | 128-bit lzcnt (vplzcntd) | 1.000 | 1023, 1023, 1023, 1023 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_vlzcnt | 256-bit lzcnt (vplzcntd) | 1.000 | 1023, 1023, 1023, 1023 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_vlzcnt | 512-bit lzcnt (vplzcntd) | 1.000 | 998, 998, 998, 998 | 1.43, 1.43, 1.43, 1.43 | 4006, 4006, 4006, 4006 | 1.00, 1.00, 1.00, 1.00
4 | avx128_vlzcnt_t | 128-bit lzcnt (vplzcntd) | 1.000 | 8182, 8181, 8181, 8183 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_vlzcnt_t | 256-bit lzcnt (vplzcntd) | 1.000 | 8182, 8184, 8182, 8183 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_vlzcnt_t | 512-bit lzcnt (vplzcntd) | 1.000 | 3889, 3889, 3889, 3889 | 1.38, 1.38, 1.38, 1.38 | 3881, 3881, 3881, 3881 | 1.00, 1.00, 1.00, 1.00
4 | avx128_imul | 128-bit integer muls (vpmuldq) | 1.000 | 818, 818, 818, 818 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_imul | 256-bit integer muls (vpmuldq) | 1.000 | 818, 818, 818, 818 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_imul | 512-bit integer muls (vpmuldq) | 1.000 | 798, 798, 798, 798 | 1.43, 1.43, 1.43, 1.43 | 4014, 4014, 4014, 4014 | 1.00, 1.00, 1.00, 1.00
4 | avx128_fma_sparse | 128-bit 64-bit sparse FMAs | 1.000 | 4090, 4091, 4091, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_fma_sparse | 256-bit 64-bit sparse FMAs | 1.000 | 4091, 4091, 4091, 4091 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_fma_sparse | 512-bit 64-bit sparse FMAs | 1.000 | 3989, 3989, 3989, 3989 | 1.42, 1.42, 1.42, 1.42 | 3978, 3978, 3978, 3978 | 1.00, 1.00, 1.00, 1.00
4 | avx128_fma | 128-bit serial DP FMAs | 1.000 | 1023, 1023, 1023, 1023 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_fma | 256-bit serial DP FMAs | 1.000 | 1023, 1023, 1023, 1023 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_fma | 512-bit serial DP FMAs | 1.000 | 997, 997, 997, 997 | 1.42, 1.42, 1.42, 1.42 | 3967, 3967, 3967, 3968 | 1.00, 1.00, 1.00, 1.00
4 | avx128_fma_t | 128-bit parallel DP FMAs | 1.000 | 8181, 8179, 8180, 8181 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx256_fma_t | 256-bit parallel DP FMAs | 1.000 | 8182, 8181, 8182, 8182 | 1.46, 1.46, 1.46, 1.46 | 4105, 4105, 4105, 4105 | 1.00, 1.00, 1.00, 1.00
4 | avx512_fma_t | 512-bit parallel DP FMAs | 1.000 | 3791, 3791, 3792, 3791 | 1.36, 1.36, 1.36, 1.36 | 3804, 3804, 3804, 3804 | 1.00, 1.00, 1.00, 1.00
4 | avx512_vpermw | 512-bit serial WORD permute | 1.000 | 973, 973, 973, 973 | 1.39, 1.39, 1.39, 1.39 | 3892, 3892, 3892, 3892 | 1.00, 1.00, 1.00, 1.00
4 | avx512_vpermw_t | 512-bit parallel WORD permute | 1.000 | 3691, 3692, 3692, 3691 | 1.32, 1.32, 1.32, 1.32 | 3704, 3704, 3704, 3704 | 1.00, 1.00, 1.00, 1.00
4 | avx512_vpermd | 512-bit serial DWORD permute | 1.000 | 1297, 1297, 1297, 1297 | 1.40, 1.40, 1.40, 1.40 | 3928, 3928, 3928, 3928 | 1.00, 1.00, 1.00, 1.00
4 | avx512_vpermd_t | 512-bit parallel DWORD permute | 1.000 | 3792, 3792, 3792, 3792 | 1.36, 1.36, 1.36, 1.36 | 3816, 3816, 3816, 3816 | 1.00, 1.00, 1.00, 1.00