I have one example from packed_simd
, where I want to implement vertical (lane-wise) division for packed SIMD vectors (repr(simd)
). The Intel SVML (Short-Vector Math Library) exposes these with the _mm_div_ep{i,u}{lane_bitwidth}
APIs.
I've implemented two functions for integers:
impl Simd<[T; N]> where T: Signed | Unsigned {
pub fn wrapping_div(self, other: Self) -> Self {
assert!(!other.eq(Self::splat(0)).any(),
"division by zero: `{:?} / {:?}`", self, other);
unsafe { intrinsic::wrapping_div(self, other) }
}
}
This first function, wrapping_div
, panics if any of the lanes of the denominator is zero, and is therefore safe. For those users that know that none of the lanes of the denominator is zero, I also implemented:
impl Simd<[T; N]> where T: Signed | Unsigned {
/// ... Safety: if any of the lanes of other is zero, the behavior is undefined.
pub unsafe fn wrapping_div_unchecked(self, other: Self) -> Self {
// debug_assert!(!other.eq(Self::splat(0)).any(),
// "division by zero: `{:?} / {:?}`", self, other);
unsafe { intrinsic::wrapping_div(self, other) }
}
}
When operating on SIMD vectors it is, however, very common to only care about some lanes of the result:
let a: Simd<[i32; 4]>;
let c = unsafe { a.wrapping_div_unchecked(b) };
let d = shuffle!(c, [0, 1]); // extract lanes 0 and 1
// drop(c); // the result of lanes 2 and 3 is never used
So it would be nice if instead of saying that calling wrapping_div_unchecked
with a denominator containing a zero lane is undefined behavior, I could say something else to support this idiom.
Would supporting uninitialized bits in integers help here? That depends on whether dividing by zero returns uninitialized
or something else (see below for a hint on what this should return).
A different way of improving this API would be to change the signature of the method to:
pub fn wrapping_div_unchecked(a: Simd<[T; N]>, b: Simd<[T; N]>)
-> Simd<[MaybeUninit<T>; N]>;
The user could then write:
let a: Simd<[i32; 4]>;
let c = a.wrapping_div_unchecked(b);
let d: Simd<[i32; 2]> = unsafe { shuffle!(c, [0, 1]).into_inner() }; // extract lanes 0 and 1
let e: MaybeUninit<i32> = c.extract(2);
let f: i32 = unsafe { e.into_inner() }; // UB?
As shown in steps e
and f
, the answer to what should dividing by zero returns has to be uninitialized
, because otherwise step f
wouldn't be UB.
With my packed_simd
hat on, adding support for packed SIMD vectors of MaybeUninit<T>
is going to be a true PITA to implement.
Yet the API using MaybeUninit
looks way clearer to me because it puts the unsafe
block in the operation that is actually undefined behavior (reading a value that results from a division by zero) instead of in the actual division.