Authored-by: Eric Lunderberg
Notes summarizing discussion between @Lunderberg and @csullivan on 2022_10_25
From previous conversation, possibility of representing pad/crop separately from the layout transform. This would allow algebraic proofs to be done in the simpler coordinate system, before applying the layout transform.
However, not all types of padding can be represented in this manner. As an example, suppose we want to pad a buffer such that every 8th value is padding. This can be expressed in a single step with a padded transform, but requires three steps when the padding is introduced separately.
```python
# With a padded transform (single step).  The transformed buffer has shape
# [ceildiv(N, 7), 8]; column 7 is implicit padding filled with pad_value.
transform_layout(index_map=lambda i: [i // 7, i % 7], pad_value=0)

# With pad/crop and bijective transforms (three steps).
insert_pad_crop(new_shape=[7 * ceildiv(A.shape[0], 7)])
transform_layout(index_map=lambda i: [i // 7, i % 7])
insert_pad_crop(new_shape=[ceildiv(A.shape[0], 7), 8])
```
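To make the equivalence of the two routes concrete, here is a minimal NumPy sketch. The helper names above are pseudocode; this models them directly with array operations, using an illustrative buffer length of 20:

```python
import numpy as np

def ceildiv(a, b):
    return -(-a // b)

# A 1-D buffer of length N (N = 20 chosen arbitrarily for illustration).
N = 20
A = np.arange(1, N + 1)
rows = ceildiv(N, 7)

# Route 1: a single padded transform.  index_map = lambda i: [i // 7, i % 7],
# writing into a padded buffer of shape [ceildiv(N, 7), 8]; column 7 and the
# tail of the last row hold the pad value (0).
route_1 = np.zeros((rows, 8), dtype=A.dtype)
for i in range(N):
    route_1[i // 7, i % 7] = A[i]

# Route 2: pad to a multiple of 7, apply the bijective reshape, then pad the
# inner dimension from 7 to 8.
step1 = np.zeros(7 * rows, dtype=A.dtype)      # insert_pad_crop
step1[:N] = A
step2 = step1.reshape(rows, 7)                 # bijective transform_layout
route_2 = np.zeros((rows, 8), dtype=A.dtype)   # insert_pad_crop
route_2[:, :7] = step2

assert (route_1 == route_2).all()
```

Both routes produce the same padded layout, with every 8th element of the flat buffer being padding.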
Any cancellation of the second pad/crop would need to be done after the layout transform. Therefore, we can't get away from performing algebraic proofs within the transformed layout.
While this is a somewhat contrived example, it could easily occur in practice. Suppose a conv1d with filter size 2 uses vector operations of size 8. The implementation uses a sliding window of size 8, which advances by 7 elements at a time. (Assume alignment restrictions are handled by a cache_read.) Each application of the vector operation would produce 8 values, the last of which is junk. If the output of the conv1d is then matrix multiplied by a constant matrix, the example above could be applied to the constant matrix. This would place a pad value (zero) at every location corresponding to a junk value, which could be used to vectorize the matrix multiplication.
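A small NumPy sketch of why every 8th output lane is junk in this scheme. The conv1d, weights, and vector width here are illustrative, not TVM API:

```python
import numpy as np

# Hypothetical conv1d with filter size 2: out[i] = w[0]*x[i] + w[1]*x[i+1].
# A width-8 vector operation reads 8 consecutive inputs and produces 8
# outputs, but the last lane needs x[i+1] from beyond the window, so only 7
# outputs are valid and the window advances by 7.
x = np.arange(16.0)
w = np.array([2.0, 3.0])

def vector_step(window):
    # Model the out-of-window read in the last lane as a garbage value (NaN).
    shifted = np.concatenate([window[1:], [np.nan]])
    return w[0] * window + w[1] * shifted

out0 = vector_step(x[0:8])     # lanes 0..6 valid, lane 7 is junk
out1 = vector_step(x[7:15])    # next window starts 7 elements later
valid = np.concatenate([out0[:7], out1[:7]])

reference = w[0] * x[:14] + w[1] * x[1:15]
assert np.allclose(valid, reference)
assert np.isnan(out0[7])       # every 8th produced value is junk
```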
Essentially, yes. The difficulties involved in identifying a TIR block and hoisting out a compact representation are roughly the same as the difficulties in proving a memcpy in TIR.
Hmm. I suppose I'm not seeing the difficulty in proving two layout transforms to be inverses of each other. I would see four different cases for compact representations that could be canceled out.

1. A `layout_transform(A)` followed by a `layout_transform(B)`. Since the `layout_transform` can introduce implicit padding, if either `layout_transform` introduces padding, the sequence of two transforms introduces padding, and is therefore not a no-op. The transformations cancel out if `A(B(indices)) == indices` and both transformations are bijective.
2. An `inv_layout_transform(A, pad_value=x)` followed by a `layout_transform(B, pad_value=y)`. The `inv_layout_transform` can crop out padding, which is then added back in by the `layout_transform`. The two compact representations cancel out if `A` is equivalent to `B`, and `x == y`.
3. A `layout_transform(A, pad_value=x)` followed by an `inv_layout_transform(B, pad_value=y)`. The `layout_transform` can introduce implicit padding, which is removed by the `inv_layout_transform`. The two compact representations cancel out if `A` is the same as `B`.
4. An `inv_layout_transform(A)` followed by an `inv_layout_transform(B)`. Since the `inv_layout_transform` can crop out implicit padding, if either `inv_layout_transform` crops out padding, the sequence of two inverse transforms changes the size of the buffer padding, and is therefore not a no-op. The transformations cancel out if `B(A(indices)) == indices` and both transformations are bijective.
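For the bijective cases, the cancellation condition can at least be sanity-checked by brute force over a small domain. The map names and shapes below are illustrative, not TVM API: `B` maps a flat index to `(row, col)`, and `A` is its candidate inverse.

```python
# Brute-force check that A(B(indices)) == indices and that B is bijective
# on the chosen domain.

def B(i):
    return (i // 7, i % 7)

def A(jk):
    j, k = jk
    return 7 * j + k

N = 7 * 4  # a domain on which B is bijective onto [0, 4) x [0, 7)
assert all(A(B(i)) == i for i in range(N))   # A(B(indices)) == indices
assert len({B(i) for i in range(N)}) == N    # B is injective on the domain
```

A symbolic proof would replace the brute-force loop with algebraic simplification of `A(B(i)) - i`, which is exactly the kind of proof the discussion above argues must sometimes be done in the transformed layout.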