Authored-by: Eric Lunderberg
Notes summarizing a discussion between @Lunderberg and @csullivan on 2022-10-25
From a previous conversation: the possibility of representing pad/crop separately from the layout transform. This would allow algebraic proofs to be done in the simpler pre-transform coordinate system, before applying the layout transform.
However, not all types of padding can be represented in this manner. As an example, suppose we want to pad a buffer such that every 8th value is padding. This can be represented in one step with a padded transform, but requires three steps when the padding is introduced separately.
```python
# With padded transforms.
# (i%7)%8 evaluates to the same value as i%7, but the %8 declares an
# inner extent of 8, so index 7 of each row is implicit padding.
transform_layout(index_map = lambda i: [i//7, (i%7)%8], pad_value=0)
```

```python
# With pad/crop and bijective transforms
insert_pad_crop(new_shape = [7*ceildiv(A.shape, 7)])
transform_layout(index_map = lambda i: [i//7, i%7])
insert_pad_crop(new_shape = [buf.shape[0], 8])
```
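To make the equivalence concrete, here is a minimal plain-Python sketch (the helper names `padded_transform` and `three_step` are hypothetical, not TVM API) showing that the one-step padded transform and the three-step pad/transform/pad sequence place every element, and every pad value, in the same position:

```python
import math

def padded_transform(a, pad_value=0):
    """One-step padded transform: element i goes to [i//7, i%7] in a
    buffer of shape [ceildiv(len(a), 7), 8]; unwritten slots are padding."""
    rows = math.ceil(len(a) / 7)
    out = [[pad_value] * 8 for _ in range(rows)]
    for i, v in enumerate(a):
        out[i // 7][i % 7] = v
    return out

def three_step(a, pad_value=0):
    """Equivalent three-step sequence: pad the flat buffer to a multiple
    of 7, apply the bijective reshape i -> [i//7, i%7], then pad the
    inner axis from 7 to 8."""
    padded = a + [pad_value] * (-len(a) % 7)                     # first pad/crop
    rows = [padded[r*7:(r+1)*7] for r in range(len(padded)//7)]  # bijective transform
    return [row + [pad_value] for row in rows]                   # second pad/crop

data = list(range(20))
assert padded_transform(data) == three_step(data)
```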
Any cancellation of the second pad/crop would need to be done after the layout transform. Therefore, we cannot avoid performing algebraic proofs within the transformed layout.
While this is a somewhat contrived example, it could easily occur in practice. Suppose a conv1d with filter size 2 uses vector operations of size 8. The implementation uses a sliding window of size 8, which advances by 7 elements at a time. (Assume alignment restrictions are handled by a `cache_read`.) Each application of the vector operations would produce 8 values, the last of which is junk. If the output of the conv1d is then matrix-multiplied by a constant matrix, the above padding could be applied to the constant matrix. This would place a pad value (zero) at every location corresponding to a junk value, which could be used to vectorize the matrix multiplication.
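The scenario above can be sketched numerically. This is a hedged illustration, not the actual TVM lowering: the junk sentinel and array shapes are chosen for the demo, and the point is only that zero rows in the padded constant matrix make a full-width (vectorizable) matmul agree with one that skips the junk lanes.

```python
import random
random.seed(0)

W = [0.5, 0.25]                                  # conv1d filter, size 2
x = [random.random() for _ in range(29)]
ref = [W[0]*x[j] + W[1]*x[j+1] for j in range(28)]   # 28 valid conv outputs

# Vectorized layout: 4 windows of 8 lanes, advancing by 7, so lane 7 of
# every window (indices 7, 15, 23, 31) holds junk.
JUNK = 1e9                                       # stand-in for an uninitialized lane
y = []
for k in range(4):
    y += [ref[7*k + lane] for lane in range(7)] + [JUNK]

# Constant matrix padded to match: an all-zero row at every junk lane.
m = 3
B = [[0.0]*m if i % 8 == 7 else [random.random() for _ in range(m)]
     for i in range(32)]

# Full-width matmul over all 32 lanes (vectorizable)...
padded = [sum(y[i] * B[i][c] for i in range(32)) for c in range(m)]
# ...equals the matmul that explicitly skips the junk lanes.
valid = [sum(y[i] * B[i][c] for i in range(32) if i % 8 != 7) for c in range(m)]
assert all(abs(a - b) < 1e-9 for a, b in zip(padded, valid))
```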
> Is your argument the following: hoisting the `y_crop_1` TIR block into a compact representation with `cropped_value = 0` would require dataflow analysis?

I agree it probably would. But that is not my main argument for splitting the layout transformation into pad, crop, and pure_layout_transform primitives. The main argument for splitting is that it is hard to prove that two layout transforms are inverses of each other if there is implicit padding and cropping in the compact representation or the TIR block representation.
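A small plain-Python sketch (hypothetical `forward`/`backward` helpers, shape chosen for illustration) of why the inverse is hard to state when padding is implicit: composing a padded transform with its nominal inverse is not the identity over the padded domain, and only becomes the identity once an explicit crop, with a known crop bound, is applied.

```python
N, INNER = 10, 4
pad_value = 0

def forward(a):
    """Padded transform: i -> [i // 4, i % 4], shape [ceildiv(N, 4), 4]."""
    rows = -(-len(a) // INNER)
    return [[a[r*INNER + c] if r*INNER + c < len(a) else pad_value
             for c in range(INNER)] for r in range(rows)]

def backward(b):
    """Nominal inverse: [r, c] -> 4*r + c.  It flattens the whole padded
    buffer; it cannot know which tail elements were padding."""
    return [v for row in b for v in row]

a = list(range(1, N + 1))
roundtrip = backward(forward(a))
assert roundtrip != a          # not the identity on the padded domain
assert roundtrip[:N] == a      # identity only after an explicit crop to N
```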
For example, there is no compact representation of the inverse layout transform of the following:
so we would be left with fusing the underlying PrimFunc blocks (shown below) and proving that the fused result is the identity.
It seems non-trivial to prove that the above PrimFunc is the identity, but I would be happy to be corrected.
Hence I proposed splitting pad/crop from layout_transform. This would allow easy cancellation in the compact representation, as discussed in my previous comment.
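The kind of cancellation this enables can be sketched as a toy rewrite over a compact representation (the op-tuple encoding and `simplify` pass here are illustrative inventions, not TVM code): a pad immediately followed by a crop that restores the pre-pad shape cancels by simple pattern matching, with no dataflow analysis.

```python
# Toy compact representation: a pipeline is a list of primitive ops,
# pad/crop ops carrying (kind, old_shape, new_shape).
def simplify(ops):
    out = []
    for op in ops:
        if (out and out[-1][0] == "pad" and op[0] == "crop"
                and out[-1][1] == op[2] and out[-1][2] == op[1]):
            out.pop()          # pad followed by inverse crop: cancel both
        else:
            out.append(op)
    return out

pipeline = [
    ("pad",  (12,), (14,)),
    ("crop", (14,), (12,)),
    ("transform", "i -> [i//4, i%4]"),
]
assert simplify(pipeline) == [("transform", "i -> [i//4, i%4]")]
```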
I disagree that we have to hoist them one at a time, outside-in. It depends on the implementation: if it wants, it can do all three of them together. It might be cleaner to do them one at a time, but there is no fundamental reason to do it that way.
Even if we do it one at a time, as long as assumptions such as `T.assume(i < 25 or y_transform[i] == 0)` are placed before the `y_crop_1` block, using dataflow analysis or otherwise, we should be able to raise it to the compact form.
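A plain-Python sketch of what such an assumption buys a simplifier (the buffer contents and the reduction are illustrative; the specific values are chosen to satisfy the assumption): if every index `i >= 25` of the padded buffer is known to hold zero, a crop-then-reduce can be rewritten as a full-width, vectorizable reduce.

```python
# A 32-element padded buffer satisfying the assumption
# (i < 25 or y_transform[i] == 0), analogous to the T.assume above.
y_transform = list(range(25)) + [0] * 7
assert all(i < 25 or y_transform[i] == 0 for i in range(32))

cropped_sum = sum(y_transform[i] for i in range(25))  # reduce after y_crop_1
full_sum = sum(y_transform)                           # simplified full-width form
assert cropped_sum == full_sum
```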