fastmachinelearning/hls4ml#117
Input images are 32x32x1, pool size is 2x2x1, giving output images of 16x16x1 (256 pixels).
Configurations which give the desired behaviour of `II = reuse` are in bold.
There is a fairly reliable trend in the choice of block partition factor.
For `reuse <= 16` (only partitioning in the 1st image dimension), using a block partition factor of `16 / reuse` for both input and output image gives sensible performance.
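As a sketch of what that rule corresponds to in HLS directives: assuming the input and output images are held in 2D buffers (the variable names here are illustrative, not hls4ml's actual ones), `reuse = 8` would give a factor of `16 / 8 = 2` on the 1st dimension:

```cpp
// Hypothetical buffers for the 32x32x1 input and 16x16x1 output images.
// For reuse = 8, the rule above gives block factor 16 / 8 = 2 in dim 1.
float in_img[32][32];
float out_img[16][16];
#pragma HLS ARRAY_PARTITION variable=in_img block factor=2 dim=1
#pragma HLS ARRAY_PARTITION variable=out_img block factor=2 dim=1
```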
For `reuse >= 32`, the partition factor for the 1st dimension is set to 1, and now I scan over the factors for the 2nd image dimension. Here sensible performance is obtained with `image_dimension / (reuse / 16)`, or `16 / ((reuse / 16) / 2)`.
So for `reuse = 128`, for example, good performance is obtained with block factor `32 / (128 / 16) = 4` for the 1st dimension and `16 / (128 / 16) = 2` for the 2nd dimension, or `16 / ((128 / 16) / 2) = 4` for both.
Either way, the calculation seems to need to change slightly when also partitioning in the 2nd dimension.