@AngryLoki
Created January 18, 2024 21:43
openimageio-2.5.5.0-r1.ebuild unit_simd test failure
156/168 Testing: unit_simd
156/168 Test: unit_simd
Command: "/var/tmp/portage/media-libs/openimageio-2.5.5.0-r1/work/OpenImageIO-2.5.5.0_build/bin/simd_test"
Directory: /var/tmp/portage/media-libs/openimageio-2.5.5.0-r1/work/OpenImageIO-2.5.5.0_build/src/libutil
"unit_simd" start time: Jan 18 21:38 UTC
Output:
----------------------------------------------------------
OIIO SIMD support is: sse2,sse3,ssse3,sse41,sse42,avx,avx2,avx512f,avx512dq,avx512ifma,avx512cd,avx512bw,avx512vl,fma,f16c
Hardware SIMD support is: sse2,sse3,ssse3,sse41,sse42,avx,avx2,avx512f,avx512dq,avx512ifma,avx512cd,avx512bw,avx512vl,fma,f16c,popcnt,rdrand
null benchmark 4: 19120.5 Mvals/sec, (19120.5 Mcalls/sec)
null benchmark 8: 18975.3 Mvals/sec, (18975.3 Mcalls/sec)
vfloat4
load/store vfloat4
partial load 1 : 101 0 0 0
partial store 1 : 1 0 0 0
partial load 2 : 101 102 0 0
partial store 2 : 1 2 0 0
partial load 3 : 101 102 103 0
partial store 3 : 1 2 3 0
partial load 4 : 101 102 103 104
partial store 4 : 1 2 3 4
load scalar: 18458.7 Mvals/sec, (4614.7 Mcalls/sec)
load vec: 18475.8 Mvals/sec, (4618.9 Mcalls/sec)
store vec: 18484.3 Mvals/sec, (4621.1 Mcalls/sec)
load 4 comps: 18484.3 Mvals/sec, (4621.1 Mcalls/sec)
load 3 comps: 12330.5 Mvals/sec, (4110.2 Mcalls/sec)
load 2 comps: 7270.1 Mvals/sec, (3635.0 Mcalls/sec)
load 1 comps: 2815.3 Mvals/sec, (2815.3 Mcalls/sec)
store 4 comps: 12507.8 Mvals/sec, (3127.0 Mcalls/sec)
store 3 comps: 8136.7 Mvals/sec, (2712.2 Mcalls/sec)
store 2 comps: 10443.9 Mvals/sec, (5221.9 Mcalls/sec)
store 1 comps: 5524.9 Mvals/sec, (5524.9 Mcalls/sec)
load/store with conversion vfloat4
load from unsigned short[]: 18082.4 Mvals/sec, (4520.6 Mcalls/sec)
load from short[]: 16870.5 Mvals/sec, (4217.6 Mcalls/sec)
load from unsigned char[]: 16757.4 Mvals/sec, (4189.4 Mcalls/sec)
load from char[]: 16743.4 Mvals/sec, (4185.9 Mcalls/sec)
load from half[]: 16827.9 Mvals/sec, (4207.0 Mcalls/sec)
store to half[]: 86393.1 Mvals/sec, (21598.3 Mcalls/sec)
masked loadstore vfloat4
masked load with int mask: 16820.9 Mvals/sec, (4205.2 Mcalls/sec)
masked load with bool mask: 16806.7 Mvals/sec, (4201.7 Mcalls/sec)
masked store with int mask: 21322.0 Mvals/sec, (21322.0 Mcalls/sec)
masked store with bool mask: 21598.3 Mvals/sec, (21598.3 Mcalls/sec)
scatter & gather vfloat4
gather: 1902.3 Mvals/sec, (475.6 Mcalls/sec)
gather_mask: 1909.3 Mvals/sec, (477.3 Mcalls/sec)
scatter: 3370.1 Mvals/sec, (842.5 Mcalls/sec)
scatter_mask: 4857.9 Mvals/sec, (1214.5 Mcalls/sec)
component_access vfloat4
operator[i]: 22075.1 Mvals/sec, (22075.1 Mcalls/sec)
operator[2]: 21505.4 Mvals/sec, (21505.4 Mcalls/sec)
operator[0]: 21598.3 Mvals/sec, (21598.3 Mcalls/sec)
extract<2> : 21505.4 Mvals/sec, (21505.4 Mcalls/sec)
extract<0> : 21692.0 Mvals/sec, (21692.0 Mcalls/sec)
insert<2> : 4212.3 Mvals/sec, (4212.3 Mcalls/sec)
arithmetic vfloat4
operator+: 16863.4 Mvals/sec, (4215.9 Mcalls/sec)
operator-: 16884.8 Mvals/sec, (4221.2 Mcalls/sec)
operator- (neg): 16638.9 Mvals/sec, (4159.7 Mcalls/sec)
operator*: 16856.3 Mvals/sec, (4214.1 Mcalls/sec)
operator* (scalar): 16884.8 Mvals/sec, (4221.2 Mcalls/sec)
operator/: 16913.3 Mvals/sec, (4228.3 Mcalls/sec)
abs: 16870.5 Mvals/sec, (4217.6 Mcalls/sec)
reduce_add: 16715.4 Mvals/sec, (4178.9 Mcalls/sec)
reference: add scalar: 21505.4 Mvals/sec, (21505.4 Mcalls/sec)
reference: mul scalar: 22026.4 Mvals/sec, (22026.4 Mcalls/sec)
reference: div scalar: 21881.8 Mvals/sec, (21881.8 Mcalls/sec)
comparisons vfloat4
operator< : 17035.8 Mvals/sec, (4258.9 Mcalls/sec)
operator> : 17035.8 Mvals/sec, (4258.9 Mcalls/sec)
operator<=: 16842.1 Mvals/sec, (4210.5 Mcalls/sec)
operator>=: 16813.8 Mvals/sec, (4203.4 Mcalls/sec)
operator==: 16799.7 Mvals/sec, (4199.9 Mcalls/sec)
operator!=: 16813.8 Mvals/sec, (4203.4 Mcalls/sec)
shuffle vfloat4
shuffle<...> : 16820.9 Mvals/sec, (4205.2 Mcalls/sec)
shuffle<0> : 16877.6 Mvals/sec, (4219.4 Mcalls/sec)
shuffle<1> : 16877.6 Mvals/sec, (4219.4 Mcalls/sec)
shuffle<2> : 16813.8 Mvals/sec, (4203.4 Mcalls/sec)
shuffle<3> : 16877.6 Mvals/sec, (4219.4 Mcalls/sec)
swizzle vfloat4
blend vfloat4
blend: 16870.5 Mvals/sec, (4217.6 Mcalls/sec)
blend0: 16792.6 Mvals/sec, (4198.2 Mcalls/sec)
blend0not: 16849.2 Mvals/sec, (4212.3 Mcalls/sec)
transpose vfloat4
before transpose:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
after transpose:
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
vectorops vfloat4
vdot: 16856.3 Mvals/sec, (4214.1 Mcalls/sec)
dot: 4208.8 Mvals/sec, (4208.8 Mcalls/sec)
vdot3: 16949.2 Mvals/sec, (4237.3 Mcalls/sec)
dot3: 4224.8 Mvals/sec, (4224.8 Mcalls/sec)
fused vfloat4
madd old *+: 16835.0 Mvals/sec, (4208.8 Mcalls/sec)
madd fused: 16827.9 Mvals/sec, (4207.0 Mcalls/sec)
msub old *-: 16842.1 Mvals/sec, (4210.5 Mcalls/sec)
msub fused: 16842.1 Mvals/sec, (4210.5 Mcalls/sec)
nmadd old (-*)+: 16899.0 Mvals/sec, (4224.8 Mcalls/sec)
nmadd fused: 16820.9 Mvals/sec, (4205.2 Mcalls/sec)
nmsub old -(*+): 16632.0 Mvals/sec, (4158.0 Mcalls/sec)
nmsub fused: 16842.1 Mvals/sec, (4210.5 Mcalls/sec)
mathfuncs vfloat4
simd abs: 16877.6 Mvals/sec, (4219.4 Mcalls/sec)
simd sign: 16849.2 Mvals/sec, (4212.3 Mcalls/sec)
simd ceil: 16785.6 Mvals/sec, (4196.4 Mcalls/sec)
simd floor: 16778.5 Mvals/sec, (4194.6 Mcalls/sec)
simd round: 16806.7 Mvals/sec, (4201.7 Mcalls/sec)
simd operator/: 16877.6 Mvals/sec, (4219.4 Mcalls/sec)
simd safe_div: 16977.9 Mvals/sec, (4244.5 Mcalls/sec)
simd rcp_fast: 17064.8 Mvals/sec, (4266.2 Mcalls/sec)
float ifloor: 21598.3 Mvals/sec, (21598.3 Mcalls/sec)
simd ifloor: 16625.1 Mvals/sec, (4156.3 Mcalls/sec)
float floorfrac: 21551.7 Mvals/sec, (21551.7 Mcalls/sec)
simd floorfrac: 6711.4 Mvals/sec, (1677.9 Mcalls/sec)
float expf: 21739.1 Mvals/sec, (21739.1 Mcalls/sec)
float fast_exp: 21505.4 Mvals/sec, (21505.4 Mcalls/sec)
simd exp: 6877.6 Mvals/sec, (1719.4 Mcalls/sec)
simd fast_exp: 9799.1 Mvals/sec, (2449.8 Mcalls/sec)
float logf: 21551.7 Mvals/sec, (21551.7 Mcalls/sec)
fast_log: 21459.2 Mvals/sec, (21459.2 Mcalls/sec)
simd log: 6944.4 Mvals/sec, (1736.1 Mcalls/sec)
simd fast_log: 9622.3 Mvals/sec, (2405.6 Mcalls/sec)
float powf: 21786.5 Mvals/sec, (21786.5 Mcalls/sec)
simd fast_pow_pos: 5914.5 Mvals/sec, (1478.6 Mcalls/sec)
float sqrt: 455.7 Mvals/sec, (455.7 Mcalls/sec)
simd sqrt: 16792.6 Mvals/sec, (4198.2 Mcalls/sec)
float rsqrt: 21186.4 Mvals/sec, (21186.4 Mcalls/sec)
simd rsqrt: 16820.9 Mvals/sec, (4205.2 Mcalls/sec)
simd rsqrt_fast: 16764.5 Mvals/sec, (4191.1 Mcalls/sec)
vfloat3
load/store vfloat3
partial load 1 : 101 0 0
partial store 1 : 1 0 0
partial load 2 : 101 102 0
partial store 2 : 1 2 0
partial load 3 : 101 102 103
partial store 3 : 1 2 3
load scalar: 11843.2 Mvals/sec, (3947.7 Mcalls/sec)
load vec: 11957.0 Mvals/sec, (3985.7 Mcalls/sec)
store vec: 8246.3 Mvals/sec, (2748.8 Mcalls/sec)
load 3 comps: 11900.0 Mvals/sec, (3966.7 Mcalls/sec)
load 2 comps: 8438.8 Mvals/sec, (4219.4 Mcalls/sec)
load 1 comps: 4163.2 Mvals/sec, (4163.2 Mcalls/sec)
store 3 comps: 8255.4 Mvals/sec, (2751.8 Mcalls/sec)
store 2 comps: 11129.7 Mvals/sec, (5564.8 Mcalls/sec)
store 1 comps: 5540.2 Mvals/sec, (5540.2 Mcalls/sec)
load/store with conversion vfloat3
load from unsigned short[]: 12605.0 Mvals/sec, (4201.7 Mcalls/sec)
load from short[]: 12589.2 Mvals/sec, (4196.4 Mcalls/sec)
load from unsigned char[]: 12578.6 Mvals/sec, (4192.9 Mcalls/sec)
load from char[]: 12589.2 Mvals/sec, (4196.4 Mcalls/sec)
load from half[]: 12610.3 Mvals/sec, (4203.4 Mcalls/sec)
store to half[]: 62761.5 Mvals/sec, (20920.5 Mcalls/sec)
component_access vfloat3
operator[i]: 21881.8 Mvals/sec, (21881.8 Mcalls/sec)
operator[2]: 21598.3 Mvals/sec, (21598.3 Mcalls/sec)
operator[0]: 21739.1 Mvals/sec, (21739.1 Mcalls/sec)
extract<2> : 21276.6 Mvals/sec, (21276.6 Mcalls/sec)
extract<0> : 21505.4 Mvals/sec, (21505.4 Mcalls/sec)
insert<2> : 4192.9 Mvals/sec, (4192.9 Mcalls/sec)
arithmetic vfloat3
operator+: 12547.1 Mvals/sec, (4182.4 Mcalls/sec)
operator-: 12594.5 Mvals/sec, (4198.2 Mcalls/sec)
operator- (neg): 12474.0 Mvals/sec, (4158.0 Mcalls/sec)
operator*: 12573.3 Mvals/sec, (4191.1 Mcalls/sec)
operator* (scalar): 12728.0 Mvals/sec, (4242.7 Mcalls/sec)
operator/: 12605.0 Mvals/sec, (4201.7 Mcalls/sec)
abs: 12631.6 Mvals/sec, (4210.5 Mcalls/sec)
reduce_add: 12599.7 Mvals/sec, (4199.9 Mcalls/sec)
add Imath::V3f: 8241.8 Mvals/sec, (2747.3 Mcalls/sec)
add Imath::V3f with simd: 8187.8 Mvals/sec, (2729.3 Mcalls/sec)
sub Imath::V3f: 8178.8 Mvals/sec, (2726.3 Mcalls/sec)
mul Imath::V3f: 7317.1 Mvals/sec, (2439.0 Mcalls/sec)
div Imath::V3f: 8230.5 Mvals/sec, (2743.5 Mcalls/sec)
reference: add scalar: 22471.9 Mvals/sec, (22471.9 Mcalls/sec)
reference: mul scalar: 21413.3 Mvals/sec, (21413.3 Mcalls/sec)
reference: div scalar: 20876.8 Mvals/sec, (20876.8 Mcalls/sec)
vectorops vfloat3
vdot: 12610.3 Mvals/sec, (4203.4 Mcalls/sec)
dot: 4180.6 Mvals/sec, (4180.6 Mcalls/sec)
dot vfloat3: 4205.2 Mvals/sec, (4205.2 Mcalls/sec)
dot Imath::V3f: 21598.3 Mvals/sec, (21598.3 Mcalls/sec)
dot Imath::V3f with simd: 4196.4 Mvals/sec, (4196.4 Mcalls/sec)
normalize Imath: 2757.9 Mvals/sec, (2757.9 Mcalls/sec)
normalize Imath with simd: 2101.3 Mvals/sec, (2101.3 Mcalls/sec)
normalize Imath with simd fast: 2101.7 Mvals/sec, (2101.7 Mcalls/sec)
normalize simd: 12647.6 Mvals/sec, (4215.9 Mcalls/sec)
normalize simd fast: 12610.3 Mvals/sec, (4203.4 Mcalls/sec)
fused vfloat3
madd old *+: 12663.6 Mvals/sec, (4221.2 Mcalls/sec)
madd fused: 16827.9 Mvals/sec, (4207.0 Mcalls/sec)
msub old *-: 12668.9 Mvals/sec, (4223.0 Mcalls/sec)
msub fused: 16835.0 Mvals/sec, (4208.8 Mcalls/sec)
nmadd old (-*)+: 12663.6 Mvals/sec, (4221.2 Mcalls/sec)
nmadd fused: 16884.8 Mvals/sec, (4221.2 Mcalls/sec)
nmsub old -(*+): 12594.5 Mvals/sec, (4198.2 Mcalls/sec)
nmsub fused: 16849.2 Mvals/sec, (4212.3 Mcalls/sec)
vfloat8
load/store vfloat8
partial load 1 : 101 0 0 0 0 0 0 0
partial store 1 : 1 0 0 0 0 0 0 0
partial load 2 : 101 102 0 0 0 0 0 0
partial store 2 : 1 2 0 0 0 0 0 0
partial load 3 : 101 102 103 0 0 0 0 0
partial store 3 : 1 2 3 0 0 0 0 0
partial load 4 : 101 102 103 104 0 0 0 0
partial store 4 : 1 2 3 4 0 0 0 0
partial load 5 : 101 102 103 104 105 0 0 0
partial store 5 : 1 2 3 4 5 0 0 0
partial load 6 : 101 102 103 104 105 106 0 0
partial store 6 : 1 2 3 4 5 6 0 0
partial load 7 : 101 102 103 104 105 106 107 0
partial store 7 : 1 2 3 4 5 6 7 0
partial load 8 : 101 102 103 104 105 106 107 108
partial store 8 : 1 2 3 4 5 6 7 8
load scalar: 31274.4 Mvals/sec, (3909.3 Mcalls/sec)
load vec: 30983.7 Mvals/sec, (3873.0 Mcalls/sec)
store vec: 31311.2 Mvals/sec, (3913.9 Mcalls/sec)
load 8 comps: 24442.4 Mvals/sec, (3055.3 Mcalls/sec)
load 7 comps: 17148.5 Mvals/sec, (2449.8 Mcalls/sec)
load 6 comps: 17331.0 Mvals/sec, (2888.5 Mcalls/sec)
load 5 comps: 14560.3 Mvals/sec, (2912.1 Mcalls/sec)
load 4 comps: 14367.8 Mvals/sec, (3592.0 Mcalls/sec)
load 3 comps: 10567.1 Mvals/sec, (3522.4 Mcalls/sec)
load 2 comps: 7165.9 Mvals/sec, (3582.9 Mcalls/sec)
load 1 comps: 3562.5 Mvals/sec, (3562.5 Mcalls/sec)
store 8 comps: 16827.9 Mvals/sec, (2103.5 Mcalls/sec)
store 7 comps: 12708.8 Mvals/sec, (1815.5 Mcalls/sec)
store 6 comps: 16560.9 Mvals/sec, (2760.1 Mcalls/sec)
store 5 comps: 13736.3 Mvals/sec, (2747.3 Mcalls/sec)
store 4 comps: 16877.6 Mvals/sec, (4219.4 Mcalls/sec)
store 3 comps: 8273.6 Mvals/sec, (2757.9 Mcalls/sec)
store 2 comps: 10952.9 Mvals/sec, (5476.5 Mcalls/sec)
store 1 comps: 5482.5 Mvals/sec, (5482.5 Mcalls/sec)
load/store with conversion vfloat8
load from unsigned short[]: 31384.9 Mvals/sec, (3923.1 Mcalls/sec)
load from short[]: 31372.5 Mvals/sec, (3921.6 Mcalls/sec)
load from unsigned char[]: 31446.5 Mvals/sec, (3930.8 Mcalls/sec)
load from char[]: 31397.2 Mvals/sec, (3924.6 Mcalls/sec)
load from half[]: 30995.7 Mvals/sec, (3874.5 Mcalls/sec)
store to half[]: 174291.9 Mvals/sec, (21786.5 Mcalls/sec)
masked loadstore vfloat8
masked load with int mask: 31384.9 Mvals/sec, (3923.1 Mcalls/sec)
masked load with bool mask: 31348.0 Mvals/sec, (3918.5 Mcalls/sec)
masked store with int mask: 21834.1 Mvals/sec, (21834.1 Mcalls/sec)
masked store with bool mask: 21739.1 Mvals/sec, (21739.1 Mcalls/sec)
scatter & gather vfloat8
gather: 2347.4 Mvals/sec, (293.4 Mcalls/sec)
gather_mask: 920.7 Mvals/sec, (115.1 Mcalls/sec)
scatter: 2091.6 Mvals/sec, (261.4 Mcalls/sec)
scatter_mask: 2072.6 Mvals/sec, (259.1 Mcalls/sec)
component_access vfloat8
operator[i]: 21645.0 Mvals/sec, (21645.0 Mcalls/sec)
operator[2]: 21645.0 Mvals/sec, (21645.0 Mcalls/sec)
operator[0]: 21929.8 Mvals/sec, (21929.8 Mcalls/sec)
extract<2> : 21322.0 Mvals/sec, (21322.0 Mcalls/sec)
extract<0> : 21322.0 Mvals/sec, (21322.0 Mcalls/sec)
insert<2> : 3468.6 Mvals/sec, (3468.6 Mcalls/sec)
arithmetic vfloat8
operator+: 31176.9 Mvals/sec, (3897.1 Mcalls/sec)
operator-: 31164.8 Mvals/sec, (3895.6 Mcalls/sec)
operator- (neg): 31620.6 Mvals/sec, (3952.6 Mcalls/sec)
operator*: 31360.3 Mvals/sec, (3920.0 Mcalls/sec)
operator* (scalar): 31311.2 Mvals/sec, (3913.9 Mcalls/sec)
operator/: 31274.4 Mvals/sec, (3909.3 Mcalls/sec)
abs: 31104.2 Mvals/sec, (3888.0 Mcalls/sec)
reduce_add: 31152.6 Mvals/sec, (3894.1 Mcalls/sec)
reference: add scalar: 22522.5 Mvals/sec, (22522.5 Mcalls/sec)
reference: mul scalar: 21881.8 Mvals/sec, (21881.8 Mcalls/sec)
reference: div scalar: 21739.1 Mvals/sec, (21739.1 Mcalls/sec)
comparisons vfloat8
operator< : 31262.2 Mvals/sec, (3907.8 Mcalls/sec)
operator> : 31250.0 Mvals/sec, (3906.2 Mcalls/sec)
operator<=: 31164.8 Mvals/sec, (3895.6 Mcalls/sec)
operator>=: 31225.6 Mvals/sec, (3903.2 Mcalls/sec)
operator==: 31201.2 Mvals/sec, (3900.2 Mcalls/sec)
operator!=: 31250.0 Mvals/sec, (3906.2 Mcalls/sec)
shuffle vfloat8
shuffle<...> : 31189.1 Mvals/sec, (3898.6 Mcalls/sec)
shuffle<0> : 31140.5 Mvals/sec, (3892.6 Mcalls/sec)
shuffle<1> : 30983.7 Mvals/sec, (3873.0 Mcalls/sec)
shuffle<2> : 31250.0 Mvals/sec, (3906.2 Mcalls/sec)
shuffle<3> : 31152.6 Mvals/sec, (3894.1 Mcalls/sec)
shuffle<4> : 31116.3 Mvals/sec, (3889.5 Mcalls/sec)
shuffle<5> : 31116.3 Mvals/sec, (3889.5 Mcalls/sec)
shuffle<6> : 31189.1 Mvals/sec, (3898.6 Mcalls/sec)
shuffle<7> : 31007.8 Mvals/sec, (3876.0 Mcalls/sec)
blend vfloat8
blend: 31152.6 Mvals/sec, (3894.1 Mcalls/sec)
blend0: 31237.8 Mvals/sec, (3904.7 Mcalls/sec)
blend0not: 31116.3 Mvals/sec, (3889.5 Mcalls/sec)
fused vfloat8
madd old *+: 31080.0 Mvals/sec, (3885.0 Mcalls/sec)
madd fused: 31250.0 Mvals/sec, (3906.2 Mcalls/sec)
msub old *-: 31055.9 Mvals/sec, (3882.0 Mcalls/sec)
msub fused: 31237.8 Mvals/sec, (3904.7 Mcalls/sec)
nmadd old (-*)+: 31286.7 Mvals/sec, (3910.8 Mcalls/sec)
nmadd fused: 31323.4 Mvals/sec, (3915.4 Mcalls/sec)
nmsub old -(*+): 31225.6 Mvals/sec, (3903.2 Mcalls/sec)
nmsub fused: 31152.6 Mvals/sec, (3894.1 Mcalls/sec)
mathfuncs vfloat8
simd abs: 31360.3 Mvals/sec, (3920.0 Mcalls/sec)
simd sign: 31434.2 Mvals/sec, (3929.3 Mcalls/sec)
simd ceil: 31225.6 Mvals/sec, (3903.2 Mcalls/sec)
simd floor: 30959.8 Mvals/sec, (3870.0 Mcalls/sec)
simd round: 30971.7 Mvals/sec, (3871.5 Mcalls/sec)
simd operator/: 31274.4 Mvals/sec, (3909.3 Mcalls/sec)
simd safe_div: 31213.4 Mvals/sec, (3901.7 Mcalls/sec)
simd rcp_fast: 30674.8 Mvals/sec, (3834.4 Mcalls/sec)
float ifloor: 21739.1 Mvals/sec, (21739.1 Mcalls/sec)
simd ifloor: 30840.4 Mvals/sec, (3855.1 Mcalls/sec)
float floorfrac: 21459.2 Mvals/sec, (21459.2 Mcalls/sec)
simd floorfrac: 12899.1 Mvals/sec, (1612.4 Mcalls/sec)
float expf: 21186.4 Mvals/sec, (21186.4 Mcalls/sec)
float fast_exp: 21276.6 Mvals/sec, (21276.6 Mcalls/sec)
simd exp: 13331.1 Mvals/sec, (1666.4 Mcalls/sec)
simd fast_exp: 11453.1 Mvals/sec, (1431.6 Mcalls/sec)
float logf: 20920.5 Mvals/sec, (20920.5 Mcalls/sec)
fast_log: 20746.9 Mvals/sec, (20746.9 Mcalls/sec)
simd log: 13402.6 Mvals/sec, (1675.3 Mcalls/sec)
simd fast_log: 18148.8 Mvals/sec, (2268.6 Mcalls/sec)
float powf: 6844.6 Mvals/sec, (6844.6 Mcalls/sec)
simd fast_pow_pos: 7258.1 Mvals/sec, (907.3 Mcalls/sec)
float sqrt: 452.3 Mvals/sec, (452.3 Mcalls/sec)
simd sqrt: 30557.7 Mvals/sec, (3819.7 Mcalls/sec)
float rsqrt: 20491.8 Mvals/sec, (20491.8 Mcalls/sec)
simd rsqrt: 30511.1 Mvals/sec, (3813.9 Mcalls/sec)
simd rsqrt_fast: 30326.0 Mvals/sec, (3790.8 Mcalls/sec)
vfloat16
load/store vfloat16
partial load 1 : 101 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
partial store 1 : 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
partial load 2 : 101 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0
partial store 2 : 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
partial load 3 : 101 102 103 0 0 0 0 0 0 0 0 0 0 0 0 0
partial store 3 : 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0
partial load 4 : 101 102 103 104 0 0 0 0 0 0 0 0 0 0 0 0
partial store 4 : 1 2 3 4 0 0 0 0 0 0 0 0 0 0 0 0
partial load 5 : 101 102 103 104 105 0 0 0 0 0 0 0 0 0 0 0
partial store 5 : 1 2 3 4 5 0 0 0 0 0 0 0 0 0 0 0
partial load 6 : 101 102 103 104 105 106 0 0 0 0 0 0 0 0 0 0
partial store 6 : 1 2 3 4 5 6 0 0 0 0 0 0 0 0 0 0
partial load 7 : 101 102 103 104 105 106 107 0 0 0 0 0 0 0 0 0
partial store 7 : 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 0
partial load 8 : 101 102 103 104 105 106 107 108 0 0 0 0 0 0 0 0
partial store 8 : 1 2 3 4 5 6 7 8 0 0 0 0 0 0 0 0
partial load 9 : 101 102 103 104 105 106 107 108 109 0 0 0 0 0 0 0
partial store 9 : 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0
partial load 10 : 101 102 103 104 105 106 107 108 109 110 0 0 0 0 0 0
partial store 10 : 1 2 3 4 5 6 7 8 9 10 0 0 0 0 0 0
partial load 11 : 101 102 103 104 105 106 107 108 109 110 111 0 0 0 0 0
partial store 11 : 1 2 3 4 5 6 7 8 9 10 11 0 0 0 0 0
partial load 12 : 101 102 103 104 105 106 107 108 109 110 111 112 0 0 0 0
partial store 12 : 1 2 3 4 5 6 7 8 9 10 11 12 0 0 0 0
partial load 13 : 101 102 103 104 105 106 107 108 109 110 111 112 113 0 0 0
partial store 13 : 1 2 3 4 5 6 7 8 9 10 11 12 13 0 0 0
partial load 14 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 0 0
partial store 14 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 0
partial load 15 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 0
partial store 15 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0
partial load 16 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116
partial store 16 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
load scalar: 29138.6 Mvals/sec, (1821.2 Mcalls/sec)
load vec: 28622.5 Mvals/sec, (1788.9 Mcalls/sec)
store vec: 28891.3 Mvals/sec, (1805.7 Mcalls/sec)
load 16 comps: 28663.6 Mvals/sec, (1791.5 Mcalls/sec)
load 13 comps: 22640.2 Mvals/sec, (1741.6 Mcalls/sec)
load 9 comps: 15600.6 Mvals/sec, (1733.4 Mcalls/sec)
load 8 comps: 14558.7 Mvals/sec, (1819.8 Mcalls/sec)
load 7 comps: 12108.6 Mvals/sec, (1729.8 Mcalls/sec)
load 6 comps: 4904.8 Mvals/sec, (817.5 Mcalls/sec)
load 5 comps: 8735.2 Mvals/sec, (1747.0 Mcalls/sec)
load 4 comps: 7320.6 Mvals/sec, (1830.2 Mcalls/sec)
load 3 comps: 5217.4 Mvals/sec, (1739.1 Mcalls/sec)
load 2 comps: 3653.0 Mvals/sec, (1826.5 Mcalls/sec)
load 1 comps: 1822.5 Mvals/sec, (1822.5 Mcalls/sec)
store 16 comps: 28725.3 Mvals/sec, (1795.3 Mcalls/sec)
store 13 comps: 21262.7 Mvals/sec, (1635.6 Mcalls/sec)
store 9 comps: 16501.7 Mvals/sec, (1833.5 Mcalls/sec)
store 8 comps: 16750.4 Mvals/sec, (2093.8 Mcalls/sec)
store 7 comps: 12547.1 Mvals/sec, (1792.4 Mcalls/sec)
store 6 comps: 16344.3 Mvals/sec, (2724.1 Mcalls/sec)
store 5 comps: 13605.4 Mvals/sec, (2721.1 Mcalls/sec)
store 4 comps: 16743.4 Mvals/sec, (4185.9 Mcalls/sec)
store 3 comps: 8156.6 Mvals/sec, (2718.9 Mcalls/sec)
store 2 comps: 10875.5 Mvals/sec, (5437.7 Mcalls/sec)
store 1 comps: 5423.0 Mvals/sec, (5423.0 Mcalls/sec)
load/store with conversion vfloat16
load from unsigned short[]: 28760.9 Mvals/sec, (1797.6 Mcalls/sec)
load from short[]: 29133.3 Mvals/sec, (1820.8 Mcalls/sec)
load from unsigned char[]: 28808.1 Mvals/sec, (1800.5 Mcalls/sec)
load from char[]: 28802.9 Mvals/sec, (1800.2 Mcalls/sec)
load from half[]: 28818.4 Mvals/sec, (1801.2 Mcalls/sec)
store to half[]: 347826.1 Mvals/sec, (21739.1 Mcalls/sec)
masked loadstore vfloat16
masked load with int mask: 28648.2 Mvals/sec, (1790.5 Mcalls/sec)
masked load with bool mask: 28551.0 Mvals/sec, (1784.4 Mcalls/sec)
masked store with int mask: 21186.4 Mvals/sec, (21186.4 Mcalls/sec)
masked store with bool mask: 21097.0 Mvals/sec, (21097.0 Mcalls/sec)
scatter & gather vfloat16
gather: 2462.0 Mvals/sec, (153.9 Mcalls/sec)
gather_mask: 2269.8 Mvals/sec, (141.9 Mcalls/sec)
scatter: 2041.6 Mvals/sec, (127.6 Mcalls/sec)
scatter_mask: 2309.1 Mvals/sec, (144.3 Mcalls/sec)
component_access vfloat16
operator[i]: 4450.4 Mvals/sec, (4450.4 Mcalls/sec)
operator[2]: 4428.7 Mvals/sec, (4428.7 Mcalls/sec)
operator[0]: 4446.4 Mvals/sec, (4446.4 Mcalls/sec)
extract<2> : 4420.9 Mvals/sec, (4420.9 Mcalls/sec)
extract<0> : 4071.7 Mvals/sec, (4071.7 Mcalls/sec)
insert<2> : 1623.6 Mvals/sec, (1623.6 Mcalls/sec)
arithmetic vfloat16
operator+: 29038.1 Mvals/sec, (1814.9 Mcalls/sec)
operator-: 29043.4 Mvals/sec, (1815.2 Mcalls/sec)
operator- (neg): 29017.0 Mvals/sec, (1813.6 Mcalls/sec)
operator*: 29261.2 Mvals/sec, (1828.8 Mcalls/sec)
operator* (scalar): 29059.2 Mvals/sec, (1816.2 Mcalls/sec)
operator/: 29064.5 Mvals/sec, (1816.5 Mcalls/sec)
abs: 29027.6 Mvals/sec, (1814.2 Mcalls/sec)
reduce_add: 29133.3 Mvals/sec, (1820.8 Mcalls/sec)
reference: add scalar: 4222.8 Mvals/sec, (4222.8 Mcalls/sec)
reference: mul scalar: 4215.9 Mvals/sec, (4215.9 Mcalls/sec)
reference: div scalar: 4214.1 Mvals/sec, (4214.1 Mcalls/sec)
comparisons vfloat16
operator< : 350109.4 Mvals/sec, (21881.8 Mcalls/sec)
operator> : 345572.3 Mvals/sec, (21598.3 Mcalls/sec)
operator<=: 345572.3 Mvals/sec, (21598.3 Mcalls/sec)
operator>=: 348583.9 Mvals/sec, (21786.5 Mcalls/sec)
operator==: 347826.1 Mvals/sec, (21739.1 Mcalls/sec)
operator!=: 348583.9 Mvals/sec, (21786.5 Mcalls/sec)
shuffle vfloat16
shuffle4<> : 28985.5 Mvals/sec, (1811.6 Mcalls/sec)
shuffle<> : 29449.7 Mvals/sec, (1840.6 Mcalls/sec)
blend vfloat16
blend: 29027.6 Mvals/sec, (1814.2 Mcalls/sec)
blend0: 28959.3 Mvals/sec, (1810.0 Mcalls/sec)
blend0not: 29117.4 Mvals/sec, (1819.8 Mcalls/sec)
fused vfloat16
madd old *+: 28901.7 Mvals/sec, (1806.4 Mcalls/sec)
madd fused: 28865.2 Mvals/sec, (1804.1 Mcalls/sec)
msub old *-: 24342.0 Mvals/sec, (1521.4 Mcalls/sec)
msub fused: 28959.3 Mvals/sec, (1810.0 Mcalls/sec)
nmadd old (-*)+: 28896.5 Mvals/sec, (1806.0 Mcalls/sec)
nmadd fused: 28818.4 Mvals/sec, (1801.2 Mcalls/sec)
nmsub old -(*+): 28891.3 Mvals/sec, (1805.7 Mcalls/sec)
nmsub fused: 28808.1 Mvals/sec, (1800.5 Mcalls/sec)
mathfuncs vfloat16
/var/tmp/portage/media-libs/openimageio-2.5.5.0-r1/work/OpenImageIO-2.5.5.0/src/libutil/simd_test.cpp:1579:
FAILED: round(F) == mkvec<VEC>(std::round(F[0]), std::round(F[1]), std::round(F[2]), std::round(F[3]))
values were '-1.5 0 1.5 4 -1.5 0 1.5 4 -1.5 0 1.5 4 -1.5 0 1.5 4' and '-2 0 2 4 -2 0 2 4 -2 0 2 4 -2 0 2 4'
simd abs: 28828.8 Mvals/sec, (1801.8 Mcalls/sec)
simd sign: 29234.4 Mvals/sec, (1827.2 Mcalls/sec)
simd ceil: 18892.4 Mvals/sec, (1180.8 Mcalls/sec)
simd floor: 29287.9 Mvals/sec, (1830.5 Mcalls/sec)
simd round: 29293.3 Mvals/sec, (1830.8 Mcalls/sec)
simd operator/: 28823.6 Mvals/sec, (1801.5 Mcalls/sec)
simd safe_div: 28828.8 Mvals/sec, (1801.8 Mcalls/sec)
simd rcp_fast: 28308.6 Mvals/sec, (1769.3 Mcalls/sec)
float ifloor: 21459.2 Mvals/sec, (21459.2 Mcalls/sec)
simd ifloor: 29032.8 Mvals/sec, (1814.6 Mcalls/sec)
float floorfrac: 22026.4 Mvals/sec, (22026.4 Mcalls/sec)
simd floorfrac: 12503.9 Mvals/sec, (781.5 Mcalls/sec)
float expf: 21459.2 Mvals/sec, (21459.2 Mcalls/sec)
float fast_exp: 21505.4 Mvals/sec, (21505.4 Mcalls/sec)
simd exp: 14864.4 Mvals/sec, (929.0 Mcalls/sec)
simd fast_exp: 20085.4 Mvals/sec, (1255.3 Mcalls/sec)
float logf: 21505.4 Mvals/sec, (21505.4 Mcalls/sec)
fast_log: 21367.5 Mvals/sec, (21367.5 Mcalls/sec)
simd log: 14301.0 Mvals/sec, (893.8 Mcalls/sec)
simd fast_log: 19524.1 Mvals/sec, (1220.3 Mcalls/sec)
float powf: 21459.2 Mvals/sec, (21459.2 Mcalls/sec)
simd fast_pow_pos: 13965.3 Mvals/sec, (872.8 Mcalls/sec)
float sqrt: 460.5 Mvals/sec, (460.5 Mcalls/sec)
simd sqrt: 28617.4 Mvals/sec, (1788.6 Mcalls/sec)
float rsqrt: 21413.3 Mvals/sec, (21413.3 Mcalls/sec)
simd rsqrt: 28699.6 Mvals/sec, (1793.7 Mcalls/sec)
simd rsqrt_fast: 28648.2 Mvals/sec, (1790.5 Mcalls/sec)
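
Note on the failure above: the only failing check in this log is the round(F) comparison reported at the start of mathfuncs vfloat16. The vector result is '-1.5 0 1.5 4 ...' while the scalar reference is '-2 0 2 4 ...'. That would be consistent with the input passing through unrounded, though the log does not print the input vector itself. As a minimal sketch of what the right-hand side of that assertion computes, std::round rounds halfway cases away from zero. The helper name round_reference is hypothetical and not part of OpenImageIO.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not an OpenImageIO function): applies std::round
// lane by lane, the way the test builds its expected value.
// std::round rounds halfway cases away from zero: -1.5 -> -2, 1.5 -> 2.
inline std::array<float, 4> round_reference(const std::array<float, 4>& v)
{
    std::array<float, 4> out{};
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = std::round(v[i]);
    return out;
}
```

Calling round_reference on { -1.5f, 0.0f, 1.5f, 4.0f } yields { -2, 0, 2, 4 }, matching the second set of values in the failure message. A correct 16-wide round would be expected to repeat that pattern four times.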
vint4
load/store vint4
partial load 1 : 101 0 0 0
partial store 1 : 1 0 0 0
partial load 2 : 101 102 0 0
partial store 2 : 1 2 0 0
partial load 3 : 101 102 103 0
partial store 3 : 1 2 3 0
partial load 4 : 101 102 103 104
partial store 4 : 1 2 3 4
load scalar: 16266.8 Mvals/sec, (4066.7 Mcalls/sec)
load vec: 16286.0 Mvals/sec, (4071.5 Mcalls/sec)
store vec: 16611.3 Mvals/sec, (4152.8 Mcalls/sec)
load 4 comps: 16200.9 Mvals/sec, (4050.2 Mcalls/sec)
load 3 comps: 11815.7 Mvals/sec, (3938.6 Mcalls/sec)
load 2 comps: 8183.3 Mvals/sec, (4091.7 Mcalls/sec)
load 1 comps: 4090.0 Mvals/sec, (4090.0 Mcalls/sec)
store 4 comps: 16604.4 Mvals/sec, (4151.1 Mcalls/sec)
store 3 comps: 8112.5 Mvals/sec, (2704.2 Mcalls/sec)
store 2 comps: 10834.2 Mvals/sec, (5417.1 Mcalls/sec)
store 1 comps: 5402.5 Mvals/sec, (5402.5 Mcalls/sec)
load/store with conversion vint4
load from int[]: 16515.3 Mvals/sec, (4128.8 Mcalls/sec)
load from unsigned short[]: 16570.0 Mvals/sec, (4142.5 Mcalls/sec)
load from short[]: 16542.6 Mvals/sec, (4135.6 Mcalls/sec)
load from unsigned char[]: 16535.8 Mvals/sec, (4133.9 Mcalls/sec)
load from char[]: 16563.1 Mvals/sec, (4140.8 Mcalls/sec)
store to unsigned short[]: 16380.0 Mvals/sec, (4095.0 Mcalls/sec)
store to unsigned char[]: 16359.9 Mvals/sec, (4090.0 Mcalls/sec)
masked loadstore vint4
masked load with int mask: 16528.9 Mvals/sec, (4132.2 Mcalls/sec)
masked load with bool mask: 16542.6 Mvals/sec, (4135.6 Mcalls/sec)
masked store with int mask: 21186.4 Mvals/sec, (21186.4 Mcalls/sec)
masked store with bool mask: 22624.4 Mvals/sec, (22624.4 Mcalls/sec)
scatter & gather vint4
gather: 1889.7 Mvals/sec, (472.4 Mcalls/sec)
gather_mask: 1942.1 Mvals/sec, (485.5 Mcalls/sec)
scatter: 4376.4 Mvals/sec, (1094.1 Mcalls/sec)
scatter_mask: 5816.5 Mvals/sec, (1454.1 Mcalls/sec)
component_access vint4
operator[i]: 21097.0 Mvals/sec, (21097.0 Mcalls/sec)
operator[2]: 21097.0 Mvals/sec, (21097.0 Mcalls/sec)
operator[0]: 21186.4 Mvals/sec, (21186.4 Mcalls/sec)
extract<2> : 27855.2 Mvals/sec, (27855.2 Mcalls/sec)
extract<0> : 27173.9 Mvals/sec, (27173.9 Mcalls/sec)
insert<2> : 4543.4 Mvals/sec, (4543.4 Mcalls/sec)
arithmetic vint4
operator+: 18223.2 Mvals/sec, (4555.8 Mcalls/sec)
operator-: 18181.8 Mvals/sec, (4545.5 Mcalls/sec)
operator- (neg): 18223.2 Mvals/sec, (4555.8 Mcalls/sec)
operator*: 18206.6 Mvals/sec, (4551.7 Mcalls/sec)
operator* (scalar): 18190.1 Mvals/sec, (4547.5 Mcalls/sec)
operator/: 18223.2 Mvals/sec, (4555.8 Mcalls/sec)
abs: 18198.4 Mvals/sec, (4549.6 Mcalls/sec)
reduce_add: 11277.1 Mvals/sec, (2819.3 Mcalls/sec)
reference: add scalar: 18797.0 Mvals/sec, (18797.0 Mcalls/sec)
reference: mul scalar: 18832.4 Mvals/sec, (18832.4 Mcalls/sec)
reference: div scalar: 19047.6 Mvals/sec, (19047.6 Mcalls/sec)
bitwise vint4
operator&: 18223.2 Mvals/sec, (4555.8 Mcalls/sec)
operator|: 18231.5 Mvals/sec, (4557.9 Mcalls/sec)
operator^: 18223.2 Mvals/sec, (4555.8 Mcalls/sec)
operator!: 18223.2 Mvals/sec, (4555.8 Mcalls/sec)
andnot: 18206.6 Mvals/sec, (4551.7 Mcalls/sec)
reduce_and: 18939.4 Mvals/sec, (18939.4 Mcalls/sec)
reduce_or : 19011.4 Mvals/sec, (19011.4 Mcalls/sec)
comparisons vint4
operator< : 18231.5 Mvals/sec, (4557.9 Mcalls/sec)
operator> : 18231.5 Mvals/sec, (4557.9 Mcalls/sec)
operator<=: 18214.9 Mvals/sec, (4553.7 Mcalls/sec)
operator>=: 18148.8 Mvals/sec, (4537.2 Mcalls/sec)
operator==: 18198.4 Mvals/sec, (4549.6 Mcalls/sec)
operator!=: 18198.4 Mvals/sec, (4549.6 Mcalls/sec)
shuffle vint4
shuffle<...> : 18223.2 Mvals/sec, (4555.8 Mcalls/sec)
shuffle<0> : 18198.4 Mvals/sec, (4549.6 Mcalls/sec)
shuffle<1> : 18181.8 Mvals/sec, (4545.5 Mcalls/sec)
shuffle<2> : 13315.6 Mvals/sec, (3328.9 Mcalls/sec)
shuffle<3> : 14492.8 Mvals/sec, (3623.2 Mcalls/sec)
blend vint4
blend: 15898.3 Mvals/sec, (3974.6 Mcalls/sec)
blend0: 15760.4 Mvals/sec, (3940.1 Mcalls/sec)
blend0not: 17398.9 Mvals/sec, (4349.7 Mcalls/sec)
test converting vint4 to uint16
load from uint16: 95923.3 Mvals/sec, (23980.8 Mcalls/sec)
convert to uint16: 16757.4 Mvals/sec, (4189.4 Mcalls/sec)
test converting vint4 to uint8
load from uint8: 87146.0 Mvals/sec, (21786.5 Mcalls/sec)
convert to uint16: 16611.3 Mvals/sec, (4152.8 Mcalls/sec)
shift vint4
[-80000000 -80000000 -80000000 -80000000] >> 1 == [-40000000 -40000000 -40000000 -40000000]
[-80000000 -80000000 -80000000 -80000000] srl 1 == [40000000 40000000 40000000 40000000]
[-80000000 -80000000 -80000000 -80000000] >> 4 == [-8000000 -8000000 -8000000 -8000000]
[-80000000 -80000000 -80000000 -80000000] srl 4 == [8000000 8000000 8000000 8000000]
[-1 -1 -1 -1] >> 1 == [-1 -1 -1 -1]
[-1 -1 -1 -1] srl 1 == [7fffffff 7fffffff 7fffffff 7fffffff]
[-1 -1 -1 -1] >> 4 == [-1 -1 -1 -1]
[-1 -1 -1 -1] srl 4 == [fffffff fffffff fffffff fffffff]
[ffff ffff ffff ffff] >> 1 == [7fff 7fff 7fff 7fff]
[ffff ffff ffff ffff] srl 1 == [7fff 7fff 7fff 7fff]
[ffff ffff ffff ffff] >> 4 == [fff fff fff fff]
[ffff ffff ffff ffff] srl 4 == [fff fff fff fff]
[3 3 3 3] >> 1 == [1 1 1 1]
[3 3 3 3] srl 1 == [1 1 1 1]
[3 3 3 3] >> 4 == [0 0 0 0]
[3 3 3 3] srl 4 == [0 0 0 0]
operator<<: 16522.1 Mvals/sec, (4130.5 Mcalls/sec)
operator>>: 16590.6 Mvals/sec, (4147.7 Mcalls/sec)
srl : 16597.5 Mvals/sec, (4149.4 Mcalls/sec)
rotl : 14214.6 Mvals/sec, (3553.7 Mcalls/sec)
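
Note on the shift output above: the '>>' lines keep the sign (arithmetic shift) while the 'srl' lines fill with zeros (logical shift), which is why -80000000 >> 1 prints -40000000 but srl 1 prints 40000000. A scalar sketch of the two behaviors follows. The function names shift_arith and shift_logical are hypothetical and only illustrate per-lane semantics, not how OpenImageIO implements them.

```cpp
#include <cstdint>

// Arithmetic right shift: the sign bit is replicated into the vacated bits.
// Written via ~x so that only non-negative values are shifted, which avoids
// relying on implementation-defined behavior before C++20.
inline std::int32_t shift_arith(std::int32_t x, unsigned n)
{
    return x < 0 ? ~(~x >> n) : (x >> n);
}

// Logical right shift: vacated bits are filled with zeros. The value is
// reinterpreted as unsigned so the sign bit is shifted like any other bit.
inline std::int32_t shift_logical(std::int32_t x, unsigned n)
{
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(x) >> n);
}
```

For positive inputs such as ffff or 3 the two functions agree, as the log shows. They differ only when the sign bit is set.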
transpose vint4
before transpose:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
after transpose:
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
vint8
load/store vint8
partial load 1 : 101 0 0 0 0 0 0 0
partial store 1 : 1 0 0 0 0 0 0 0
partial load 2 : 101 102 0 0 0 0 0 0
partial store 2 : 1 2 0 0 0 0 0 0
partial load 3 : 101 102 103 0 0 0 0 0
partial store 3 : 1 2 3 0 0 0 0 0
partial load 4 : 101 102 103 104 0 0 0 0
partial store 4 : 1 2 3 4 0 0 0 0
partial load 5 : 101 102 103 104 105 0 0 0
partial store 5 : 1 2 3 4 5 0 0 0
partial load 6 : 101 102 103 104 105 106 0 0
partial store 6 : 1 2 3 4 5 6 0 0
partial load 7 : 101 102 103 104 105 106 107 0
partial store 7 : 1 2 3 4 5 6 7 0
partial load 8 : 101 102 103 104 105 106 107 108
partial store 8 : 1 2 3 4 5 6 7 8
load scalar: 30616.2 Mvals/sec, (3827.0 Mcalls/sec)
load vec: 30441.4 Mvals/sec, (3805.2 Mcalls/sec)
store vec: 30983.7 Mvals/sec, (3873.0 Mcalls/sec)
load 8 comps: 30546.0 Mvals/sec, (3818.3 Mcalls/sec)
load 7 comps: 4035.7 Mvals/sec, (576.5 Mcalls/sec)
load 6 comps: 22329.7 Mvals/sec, (3721.6 Mcalls/sec)
load 5 comps: 18726.6 Mvals/sec, (3745.3 Mcalls/sec)
load 4 comps: 15491.9 Mvals/sec, (3873.0 Mcalls/sec)
load 3 comps: 11278.2 Mvals/sec, (3759.4 Mcalls/sec)
load 2 comps: 7657.0 Mvals/sec, (3828.5 Mcalls/sec)
load 1 comps: 3840.2 Mvals/sec, (3840.2 Mcalls/sec)
store 8 comps: 31128.4 Mvals/sec, (3891.1 Mcalls/sec)
store 7 comps: 12565.1 Mvals/sec, (1795.0 Mcalls/sec)
store 6 comps: 16291.1 Mvals/sec, (2715.2 Mcalls/sec)
store 5 comps: 13568.5 Mvals/sec, (2713.7 Mcalls/sec)
store 4 comps: 16673.6 Mvals/sec, (4168.4 Mcalls/sec)
store 3 comps: 2102.3 Mvals/sec, (700.8 Mcalls/sec)
store 2 comps: 10905.1 Mvals/sec, (5452.6 Mcalls/sec)
store 1 comps: 5370.6 Mvals/sec, (5370.6 Mcalls/sec)
load/store with conversion vint8
load from int[]: 30852.3 Mvals/sec, (3856.5 Mcalls/sec)
load from unsigned short[]: 30840.4 Mvals/sec, (3855.1 Mcalls/sec)
load from short[]: 30852.3 Mvals/sec, (3856.5 Mcalls/sec)
load from unsigned char[]: 30733.8 Mvals/sec, (3841.7 Mcalls/sec)
load from char[]: 30804.8 Mvals/sec, (3850.6 Mcalls/sec)
store to unsigned short[]: 32653.1 Mvals/sec, (4081.6 Mcalls/sec)
store to unsigned char[]: 32653.1 Mvals/sec, (4081.6 Mcalls/sec)
masked loadstore vint8
masked load with int mask: 30840.4 Mvals/sec, (3855.1 Mcalls/sec)
masked load with bool mask: 30923.8 Mvals/sec, (3865.5 Mcalls/sec)
masked store with int mask: 22075.1 Mvals/sec, (22075.1 Mcalls/sec)
masked store with bool mask: 21186.4 Mvals/sec, (21186.4 Mcalls/sec)
scatter & gather vint8
gather: 2354.5 Mvals/sec, (294.3 Mcalls/sec)
gather_mask: 2323.2 Mvals/sec, (290.4 Mcalls/sec)
scatter: 1452.5 Mvals/sec, (181.6 Mcalls/sec)
scatter_mask: 763.3 Mvals/sec, (95.4 Mcalls/sec)
component_access vint8
operator[i]: 21413.3 Mvals/sec, (21413.3 Mcalls/sec)
operator[2]: 21413.3 Mvals/sec, (21413.3 Mcalls/sec)
operator[0]: 21413.3 Mvals/sec, (21413.3 Mcalls/sec)
extract<2> : 10080.6 Mvals/sec, (10080.6 Mcalls/sec)
extract<0> : 21367.5 Mvals/sec, (21367.5 Mcalls/sec)
insert<2> : 3871.5 Mvals/sec, (3871.5 Mcalls/sec)
arithmetic vint8
operator+: 30959.8 Mvals/sec, (3870.0 Mcalls/sec)
operator-: 30935.8 Mvals/sec, (3867.0 Mcalls/sec)
operator- (neg): 30911.9 Mvals/sec, (3864.0 Mcalls/sec)
operator*: 30935.8 Mvals/sec, (3867.0 Mcalls/sec)
operator* (scalar): 30923.8 Mvals/sec, (3865.5 Mcalls/sec)
operator/: 31019.8 Mvals/sec, (3877.5 Mcalls/sec)
abs: 30995.7 Mvals/sec, (3874.5 Mcalls/sec)
reduce_add: 31055.9 Mvals/sec, (3882.0 Mcalls/sec)
reference: add scalar: 22371.4 Mvals/sec, (22371.4 Mcalls/sec)
reference: mul scalar: 21505.4 Mvals/sec, (21505.4 Mcalls/sec)
reference: div scalar: 21413.3 Mvals/sec, (21413.3 Mcalls/sec)
bitwise vint8
operator&: 31007.8 Mvals/sec, (3876.0 Mcalls/sec)
operator|: 31007.8 Mvals/sec, (3876.0 Mcalls/sec)
operator^: 30923.8 Mvals/sec, (3865.5 Mcalls/sec)
operator!: 31019.8 Mvals/sec, (3877.5 Mcalls/sec)
andnot: 31620.6 Mvals/sec, (3952.6 Mcalls/sec)
reduce_and: 21186.4 Mvals/sec, (21186.4 Mcalls/sec)
reduce_or : 21276.6 Mvals/sec, (21276.6 Mcalls/sec)
comparisons vint8
operator< : 31104.2 Mvals/sec, (3888.0 Mcalls/sec)
operator> : 31007.8 Mvals/sec, (3876.0 Mcalls/sec)
operator<=: 31055.9 Mvals/sec, (3882.0 Mcalls/sec)
operator>=: 31019.8 Mvals/sec, (3877.5 Mcalls/sec)
operator==: 31019.8 Mvals/sec, (3877.5 Mcalls/sec)
operator!=: 30959.8 Mvals/sec, (3870.0 Mcalls/sec)
shuffle vint8
shuffle<...> : 30947.8 Mvals/sec, (3868.5 Mcalls/sec)
shuffle<0> : 31068.0 Mvals/sec, (3883.5 Mcalls/sec)
shuffle<1> : 31176.9 Mvals/sec, (3897.1 Mcalls/sec)
shuffle<2> : 31092.1 Mvals/sec, (3886.5 Mcalls/sec)
shuffle<3> : 31176.9 Mvals/sec, (3897.1 Mcalls/sec)
shuffle<4> : 31068.0 Mvals/sec, (3883.5 Mcalls/sec)
shuffle<5> : 31335.7 Mvals/sec, (3917.0 Mcalls/sec)
shuffle<6> : 31360.3 Mvals/sec, (3920.0 Mcalls/sec)
shuffle<7> : 31397.2 Mvals/sec, (3924.6 Mcalls/sec)
blend vint8
blend: 31348.0 Mvals/sec, (3918.5 Mcalls/sec)
blend0: 31409.5 Mvals/sec, (3926.2 Mcalls/sec)
blend0not: 31458.9 Mvals/sec, (3932.4 Mcalls/sec)
test converting vint8 to uint16
load from uint16: 175824.2 Mvals/sec, (21978.0 Mcalls/sec)
convert to uint16: 33140.0 Mvals/sec, (4142.5 Mcalls/sec)
test converting vint8 to uint8
load from uint8: 181818.2 Mvals/sec, (22727.3 Mcalls/sec)
convert to uint16: 33195.0 Mvals/sec, (4149.4 Mcalls/sec)
shift vint8
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] >> 1 == [-40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000]
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] srl 1 == [40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000]
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] >> 4 == [-8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000]
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] srl 4 == [8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000]
[-1 -1 -1 -1 -1 -1 -1 -1] >> 1 == [-1 -1 -1 -1 -1 -1 -1 -1]
[-1 -1 -1 -1 -1 -1 -1 -1] srl 1 == [7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff]
[-1 -1 -1 -1 -1 -1 -1 -1] >> 4 == [-1 -1 -1 -1 -1 -1 -1 -1]
[-1 -1 -1 -1 -1 -1 -1 -1] srl 4 == [fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff]
[ffff ffff ffff ffff ffff ffff ffff ffff] >> 1 == [7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff]
[ffff ffff ffff ffff ffff ffff ffff ffff] srl 1 == [7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff]
[ffff ffff ffff ffff ffff ffff ffff ffff] >> 4 == [fff fff fff fff fff fff fff fff]
[ffff ffff ffff ffff ffff ffff ffff ffff] srl 4 == [fff fff fff fff fff fff fff fff]
[3 3 3 3 3 3 3 3] >> 1 == [1 1 1 1 1 1 1 1]
[3 3 3 3 3 3 3 3] srl 1 == [1 1 1 1 1 1 1 1]
[3 3 3 3 3 3 3 3] >> 4 == [0 0 0 0 0 0 0 0]
[3 3 3 3 3 3 3 3] srl 4 == [0 0 0 0 0 0 0 0]
operator<<: 31262.2 Mvals/sec, (3907.8 Mcalls/sec)
operator>>: 31384.9 Mvals/sec, (3923.1 Mcalls/sec)
srl : 31335.7 Mvals/sec, (3917.0 Mcalls/sec)
rotl : 31384.9 Mvals/sec, (3923.1 Mcalls/sec)
vint16
load/store vint16
partial load 1 : 101 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
partial store 1 : 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
partial load 2 : 101 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0
partial store 2 : 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
partial load 3 : 101 102 103 0 0 0 0 0 0 0 0 0 0 0 0 0
partial store 3 : 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0
partial load 4 : 101 102 103 104 0 0 0 0 0 0 0 0 0 0 0 0
partial store 4 : 1 2 3 4 0 0 0 0 0 0 0 0 0 0 0 0
partial load 5 : 101 102 103 104 105 0 0 0 0 0 0 0 0 0 0 0
partial store 5 : 1 2 3 4 5 0 0 0 0 0 0 0 0 0 0 0
partial load 6 : 101 102 103 104 105 106 0 0 0 0 0 0 0 0 0 0
partial store 6 : 1 2 3 4 5 6 0 0 0 0 0 0 0 0 0 0
partial load 7 : 101 102 103 104 105 106 107 0 0 0 0 0 0 0 0 0
partial store 7 : 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 0
partial load 8 : 101 102 103 104 105 106 107 108 0 0 0 0 0 0 0 0
partial store 8 : 1 2 3 4 5 6 7 8 0 0 0 0 0 0 0 0
partial load 9 : 101 102 103 104 105 106 107 108 109 0 0 0 0 0 0 0
partial store 9 : 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0
partial load 10 : 101 102 103 104 105 106 107 108 109 110 0 0 0 0 0 0
partial store 10 : 1 2 3 4 5 6 7 8 9 10 0 0 0 0 0 0
partial load 11 : 101 102 103 104 105 106 107 108 109 110 111 0 0 0 0 0
partial store 11 : 1 2 3 4 5 6 7 8 9 10 11 0 0 0 0 0
partial load 12 : 101 102 103 104 105 106 107 108 109 110 111 112 0 0 0 0
partial store 12 : 1 2 3 4 5 6 7 8 9 10 11 12 0 0 0 0
partial load 13 : 101 102 103 104 105 106 107 108 109 110 111 112 113 0 0 0
partial store 13 : 1 2 3 4 5 6 7 8 9 10 11 12 13 0 0 0
partial load 14 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 0 0
partial store 14 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 0
partial load 15 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 0
partial store 15 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0
partial load 16 : 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116
partial store 16 : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
load scalar: 29287.9 Mvals/sec, (1830.5 Mcalls/sec)
load vec: 28880.9 Mvals/sec, (1805.1 Mcalls/sec)
store vec: 28964.0 Mvals/sec, (1810.2 Mcalls/sec)
load 16 comps: 28917.4 Mvals/sec, (1807.3 Mcalls/sec)
load 13 comps: 10723.4 Mvals/sec, (824.9 Mcalls/sec)
load 9 comps: 7193.7 Mvals/sec, (799.3 Mcalls/sec)
load 8 comps: 14795.6 Mvals/sec, (1849.5 Mcalls/sec)
load 7 comps: 5751.8 Mvals/sec, (821.7 Mcalls/sec)
load 6 comps: 4983.0 Mvals/sec, (830.5 Mcalls/sec)
load 5 comps: 4165.3 Mvals/sec, (833.1 Mcalls/sec)
load 4 comps: 7382.8 Mvals/sec, (1845.7 Mcalls/sec)
load 3 comps: 2466.5 Mvals/sec, (822.2 Mcalls/sec)
load 2 comps: 3416.5 Mvals/sec, (1708.2 Mcalls/sec)
load 1 comps: 1847.7 Mvals/sec, (1847.7 Mcalls/sec)
store 16 comps: 29191.8 Mvals/sec, (1824.5 Mcalls/sec)
store 13 comps: 21385.1 Mvals/sec, (1645.0 Mcalls/sec)
store 9 comps: 16381.5 Mvals/sec, (1820.2 Mcalls/sec)
store 8 comps: 31043.9 Mvals/sec, (3880.5 Mcalls/sec)
store 7 comps: 12644.5 Mvals/sec, (1806.4 Mcalls/sec)
store 6 comps: 16451.9 Mvals/sec, (2742.0 Mcalls/sec)
store 5 comps: 13642.6 Mvals/sec, (2728.5 Mcalls/sec)
store 4 comps: 16680.6 Mvals/sec, (4170.1 Mcalls/sec)
store 3 comps: 2433.9 Mvals/sec, (811.3 Mcalls/sec)
store 2 comps: 10952.9 Mvals/sec, (5476.5 Mcalls/sec)
store 1 comps: 5461.5 Mvals/sec, (5461.5 Mcalls/sec)
load/store with conversion vint16
load from int[]: 28653.3 Mvals/sec, (1790.8 Mcalls/sec)
load from unsigned short[]: 28673.8 Mvals/sec, (1792.1 Mcalls/sec)
load from short[]: 28709.9 Mvals/sec, (1794.4 Mcalls/sec)
load from unsigned char[]: 28663.6 Mvals/sec, (1791.5 Mcalls/sec)
load from char[]: 28917.4 Mvals/sec, (1807.3 Mcalls/sec)
store to unsigned short[]: 32881.2 Mvals/sec, (2055.1 Mcalls/sec)
store to unsigned char[]: 32854.2 Mvals/sec, (2053.4 Mcalls/sec)
masked loadstore vint16
masked load with int mask: 28474.8 Mvals/sec, (1779.7 Mcalls/sec)
masked load with bool mask: 28668.7 Mvals/sec, (1791.8 Mcalls/sec)
masked store with int mask: 21052.6 Mvals/sec, (21052.6 Mcalls/sec)
masked store with bool mask: 21413.3 Mvals/sec, (21413.3 Mcalls/sec)
scatter & gather vint16
gather: 2473.8 Mvals/sec, (154.6 Mcalls/sec)
gather_mask: 2459.0 Mvals/sec, (153.7 Mcalls/sec)
scatter: 318.5 Mvals/sec, (19.9 Mcalls/sec)
scatter_mask: 2206.0 Mvals/sec, (137.9 Mcalls/sec)
component_access vint16
operator[i]: 5327.7 Mvals/sec, (5327.7 Mcalls/sec)
operator[2]: 5299.4 Mvals/sec, (5299.4 Mcalls/sec)
operator[0]: 5330.5 Mvals/sec, (5330.5 Mcalls/sec)
extract<2> : 5299.4 Mvals/sec, (5299.4 Mcalls/sec)
extract<0> : 5319.1 Mvals/sec, (5319.1 Mcalls/sec)
insert<2> : 1721.8 Mvals/sec, (1721.8 Mcalls/sec)
arithmetic vint16
operator+: 28318.6 Mvals/sec, (1769.9 Mcalls/sec)
operator-: 28119.5 Mvals/sec, (1757.5 Mcalls/sec)
operator- (neg): 28016.1 Mvals/sec, (1751.0 Mcalls/sec)
operator*: 27976.9 Mvals/sec, (1748.6 Mcalls/sec)
operator* (scalar): 28070.2 Mvals/sec, (1754.4 Mcalls/sec)
operator/: 5839.8 Mvals/sec, (365.0 Mcalls/sec)
abs: 27937.8 Mvals/sec, (1746.1 Mcalls/sec)
reduce_add: 28075.1 Mvals/sec, (1754.7 Mcalls/sec)
reference: add scalar: 5310.7 Mvals/sec, (5310.7 Mcalls/sec)
reference: mul scalar: 5313.5 Mvals/sec, (5313.5 Mcalls/sec)
reference: div scalar: 4219.4 Mvals/sec, (4219.4 Mcalls/sec)
bitwise vint16
operator&: 29085.6 Mvals/sec, (1817.9 Mcalls/sec)
operator|: 29159.8 Mvals/sec, (1822.5 Mcalls/sec)
operator^: 29352.4 Mvals/sec, (1834.5 Mcalls/sec)
operator!: 28896.5 Mvals/sec, (1806.0 Mcalls/sec)
andnot: 28818.4 Mvals/sec, (1801.2 Mcalls/sec)
reduce_and: 21413.3 Mvals/sec, (21413.3 Mcalls/sec)
reduce_or : 9661.8 Mvals/sec, (9661.8 Mcalls/sec)
comparisons vint16
operator< : 359550.6 Mvals/sec, (22471.9 Mcalls/sec)
operator> : 332640.3 Mvals/sec, (20790.0 Mcalls/sec)
operator<=: 338983.0 Mvals/sec, (21186.4 Mcalls/sec)
operator>=: 348583.9 Mvals/sec, (21786.5 Mcalls/sec)
operator==: 341151.4 Mvals/sec, (21322.0 Mcalls/sec)
operator!=: 345572.3 Mvals/sec, (21598.3 Mcalls/sec)
shuffle vint16
shuffle4<> : 28828.8 Mvals/sec, (1801.8 Mcalls/sec)
shuffle<> : 28933.1 Mvals/sec, (1808.3 Mcalls/sec)
blend vint16
blend: 28808.1 Mvals/sec, (1800.5 Mcalls/sec)
blend0: 29117.4 Mvals/sec, (1819.8 Mcalls/sec)
blend0not: 28886.1 Mvals/sec, (1805.4 Mcalls/sec)
test converting vint16 to uint16
load from uint16: 344086.0 Mvals/sec, (21505.4 Mcalls/sec)
convert to uint16: 33092.0 Mvals/sec, (2068.3 Mcalls/sec)
test converting vint16 to uint8
load from uint8: 345572.3 Mvals/sec, (21598.3 Mcalls/sec)
convert to uint16: 33051.0 Mvals/sec, (2065.7 Mcalls/sec)
shift vint16
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] >> 1 == [-40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000 -40000000]
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] srl 1 == [40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000 40000000]
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] >> 4 == [-8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000 -8000000]
[-80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000 -80000000] srl 4 == [8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000 8000000]
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1] >> 1 == [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1] srl 1 == [7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff 7fffffff]
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1] >> 4 == [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1] srl 4 == [fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff fffffff]
[ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff] >> 1 == [7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff]
[ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff] srl 1 == [7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff 7fff]
[ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff] >> 4 == [fff fff fff fff fff fff fff fff fff fff fff fff fff fff fff fff]
[ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff] srl 4 == [fff fff fff fff fff fff fff fff fff fff fff fff fff fff fff fff]
[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3] >> 1 == [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3] srl 1 == [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3] >> 4 == [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3] srl 4 == [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
operator<<: 28933.1 Mvals/sec, (1808.3 Mcalls/sec)
operator>>: 28834.0 Mvals/sec, (1802.1 Mcalls/sec)
srl : 30245.7 Mvals/sec, (1890.4 Mcalls/sec)
rotl : 32296.5 Mvals/sec, (2018.5 Mcalls/sec)
vbool4
shuffle vbool4
shuffle<...> : 18315.0 Mvals/sec, (4578.8 Mcalls/sec)
shuffle<0> : 15552.1 Mvals/sec, (3888.0 Mcalls/sec)
shuffle<1> : 18535.7 Mvals/sec, (4633.9 Mcalls/sec)
shuffle<2> : 18552.9 Mvals/sec, (4638.2 Mcalls/sec)
shuffle<3> : 18561.5 Mvals/sec, (4640.4 Mcalls/sec)
component_access vbool4
bitwise vbool4
operator&: 18561.5 Mvals/sec, (4640.4 Mcalls/sec)
operator|: 18535.7 Mvals/sec, (4633.9 Mcalls/sec)
operator^: 18544.3 Mvals/sec, (4636.1 Mcalls/sec)
operator!: 18458.7 Mvals/sec, (4614.7 Mcalls/sec)
reduce_and: 2422.5 Mvals/sec, (2422.5 Mcalls/sec)
reduce_or : 2420.7 Mvals/sec, (2420.7 Mcalls/sec)
vbool8
shuffle vbool8
shuffle<...> : 32679.7 Mvals/sec, (4085.0 Mcalls/sec)
shuffle<0> : 32800.3 Mvals/sec, (4100.0 Mcalls/sec)
shuffle<1> : 32666.4 Mvals/sec, (4083.3 Mcalls/sec)
shuffle<2> : 32520.3 Mvals/sec, (4065.0 Mcalls/sec)
shuffle<3> : 32693.1 Mvals/sec, (4086.6 Mcalls/sec)
shuffle<4> : 32679.7 Mvals/sec, (4085.0 Mcalls/sec)
shuffle<5> : 32719.8 Mvals/sec, (4090.0 Mcalls/sec)
shuffle<6> : 32733.2 Mvals/sec, (4091.7 Mcalls/sec)
shuffle<7> : 32786.9 Mvals/sec, (4098.4 Mcalls/sec)
component_access vbool8
bitwise vbool8
operator&: 32626.4 Mvals/sec, (4078.3 Mcalls/sec)
operator|: 32786.9 Mvals/sec, (4098.4 Mcalls/sec)
operator^: 32115.6 Mvals/sec, (4014.5 Mcalls/sec)
operator!: 31176.9 Mvals/sec, (3897.1 Mcalls/sec)
reduce_and: 2402.1 Mvals/sec, (2402.1 Mcalls/sec)
reduce_or : 2396.4 Mvals/sec, (2396.4 Mcalls/sec)
vbool16
component_access vbool16
bitwise vbool16
operator&: 345572.3 Mvals/sec, (21598.3 Mcalls/sec)
operator|: 346320.4 Mvals/sec, (21645.0 Mcalls/sec)
operator^: 344827.6 Mvals/sec, (21551.7 Mcalls/sec)
operator!: 345572.3 Mvals/sec, (21598.3 Mcalls/sec)
reduce_and: 21881.8 Mvals/sec, (21881.8 Mcalls/sec)
reduce_or : 21551.7 Mvals/sec, (21551.7 Mcalls/sec)
Odds and ends
constants
vfloat4 = float(const): 16849.2 Mvals/sec, (4212.3 Mcalls/sec)
vfloat4 = Zero(): 16877.6 Mvals/sec, (4219.4 Mcalls/sec)
vfloat4 = One(): 16870.5 Mvals/sec, (4217.6 Mcalls/sec)
vfloat4 = Iota(): 16806.7 Mvals/sec, (4201.7 Mcalls/sec)
vfloat8 = float(const): 31458.9 Mvals/sec, (3932.4 Mcalls/sec)
vfloat8 = Zero(): 32989.7 Mvals/sec, (4123.7 Mcalls/sec)
vfloat8 = One(): 32989.7 Mvals/sec, (4123.7 Mcalls/sec)
vfloat8 = Iota(): 446.1 Mvals/sec, (55.8 Mcalls/sec)
vfloat16 = float(const): 28011.2 Mvals/sec, (1750.7 Mcalls/sec)
vfloat16 = Zero(): 29239.8 Mvals/sec, (1827.5 Mcalls/sec)
vfloat16 = One(): 29017.0 Mvals/sec, (1813.6 Mcalls/sec)
vfloat16 = Iota(): 28880.9 Mvals/sec, (1805.1 Mcalls/sec)
special
metaprogramming
Testing matrix ops:
P = (1 0 0)
Mtrans = ( 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00
0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
1.000000e+01 1.100000e+01 1.200000e+01 1.000000e+00)
Mrot = ( -4.371139e-08 -0.000000e+00 -1.000000e+00 -0.000000e+00
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00
1.000000e+00 0.000000e+00 -4.371139e-08 0.000000e+00
0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00)
P translated = 11 11 12
P rotated = -4.37114e-08 0 -1
P rotated by the transpose = -4.37114e-08 0 -1
Mrot transposed = ( -4.371139e-08 0.000000e+00 1.000000e+00 0.000000e+00
-0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00
-1.000000e+00 0.000000e+00 -4.371139e-08 0.000000e+00
-0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00)
V4 * M44 Imath: 4168.4 Mvals/sec, (4168.4 Mcalls/sec)
M44 * V4 simd: 4198.2 Mvals/sec, (4198.2 Mcalls/sec)
V4 * M44 simd: 4221.2 Mvals/sec, (4221.2 Mcalls/sec)
transformp Imath: 2777.0 Mvals/sec, (2777.0 Mcalls/sec)
transformp Imath with simd: 2745.0 Mvals/sec, (2745.0 Mcalls/sec)
transformp simd: 4210.5 Mvals/sec, (4210.5 Mcalls/sec)
transpose m44: 1827.8 Mvals/sec, (1827.8 Mcalls/sec)
transpose m44 with simd: 1830.8 Mvals/sec, (1830.8 Mcalls/sec)
m44 inverse Imath: 82.8 Mvals/sec, (82.8 Mcalls/sec)
m44 inverse_simd: 99.9 Mvals/sec, (99.9 Mcalls/sec)
m44 inverse_simd native simd: 104.2 Mvals/sec, (104.2 Mcalls/sec)
Total time: 0.0s
ERRORS!
<end of output>
Test time = 0.08 sec
----------------------------------------------------------
Test Failed.
"unit_simd" end time: Jan 18 21:38 UTC
"unit_simd" time elapsed: 00:00:00