| Mentee name | Prathamesh Tagore |
|---|---|
| Organization | RISC-V International |
| Mentors | Hongbin Zhang, Wei Wu |
| Project title | RISC-V MLIR Convolution Vectorization |
| Project page | https://mentorship.lfx.linuxfoundation.org/project/f994928b-8998-4cd3-b66e-c576aa99c9d5 |
| Project repository | buddy-mlir, buddy-benchmark |
| Design draft of developed algorithm | DIP 2D Correlation |
Many popular open-source image processing libraries, such as OpenCV, optimize image processing operations in a platform-specific way for fixed-length vector registers. Although this optimization is very effective on the supported set of processors, the cost of maintaining and improving the code is high. Such implementations handle the vector registers of each supported processor separately. Programmers therefore have to maintain, optimize and fix bugs in every one of these implementations independently, which slows down release cycles and makes it difficult to ensure compatibility across all user platforms. Moreover, the intrinsic bloat caused by differences between processor architectures makes such implementations even harder to develop and maintain.
This project primarily aims to solve this problem by creating a generic vectorised implementation of 2D correlation using MLIR. We intend to generate platform-specific, high-performance IR (intermediate representation) from platform-independent code. Our objectives are as follows:
- Develop a platform-independent implementation of 2D correlation using MLIR.
- Complement it with useful features such as variable anchor point positioning and boundary extrapolation.
- Compare the results with OpenCV's implementation to verify accuracy.
- Benchmark the new implementation and optimize it iteratively.
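To make the target operation concrete, the 2D correlation above can be written as a single formula. The notation is ours and the conventions are assumptions. $I$ is the input image, $K$ is a $k_h \times k_w$ kernel, and $(a_r, a_c)$ is the anchor's row and column inside the kernel. The output $O$ has the same size as $I$. Reads of $I$ outside the image are supplied by the chosen boundary extrapolation. This mirrors the definition documented for OpenCV's `filter2D`, which computes correlation (the kernel is not flipped):

$$
O(i, j) = \sum_{m=0}^{k_h - 1} \sum_{n=0}^{k_w - 1} K(m, n)\, I(i + m - a_r,\; j + n - a_c)
$$

For a 3x3 kernel with a centred anchor $(1, 1)$, $O(i, j)$ therefore combines the 3x3 neighbourhood of input pixels centred on $(i, j)$.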
- We created a new MLIR dialect, the digital image processing (DIP) dialect. It encapsulates the operations and lowering passes used to generate high-performance IR for image processing.
- We built a custom algorithm for handling image-processing-specific attributes, such as the anchor point and boundary extrapolation. It extends the coefficient broadcasting strip mining (CBSM) approach to vectorised 2D correlation.
- We implemented this algorithm as an MLIR lowering pass. The pass lowers the 2D correlation operation provided by the DIP dialect.
- We added support for variable anchor point positioning. Any point in the provided kernel can now be specified as the anchor point. The algorithm takes care of the alignment and assigns the correlation results to the appropriate pixel(s) of the output image.
- We added support for custom boundary extrapolation, so users can choose the method that best suits their application. The currently supported options are:
  - Constant Padding: fills the entire extra region around the input image with a constant value to produce the boundary-extrapolated output image (kkk|abcdefg|kkk).
  - Replicate Padding: fills the extra region with the first or last element of the respective row or column, i.e. the nearest edge pixel, to produce the boundary-extrapolated output image (aaa|abcdefg|ggg).

  We are working on adding more boundary extrapolation techniques to this list.
- Because the implementation works for any vector length, it will benefit directly from the longer vector registers of upcoming processors.
- Some architectures, such as RISC-V and ARM SVE, natively support variable-length vector registers. They are expected to increase the size of their vector registers in upcoming implementations. Our work is well suited to these architectures and should benefit considerably from their future versions.
- On AVX-512, the DIP dialect's implementation outperformed OpenCV's implementation for small kernels (3x3, 5x5) but was slower for larger kernels. We are currently investigating this behaviour.
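The CBSM-based correlation with anchor and boundary handling described above can be sketched in plain Python. This is a minimal, hypothetical emulation, not the DIP lowering pass or buddy-mlir code. It assumes the usual reading of CBSM: each kernel coefficient is broadcast across a strip of `vl` output pixels in a row, and the row is swept strip by strip with a multiply-add, ending with a shorter tail strip.

The sketch rests on several assumptions:

- A Python list stands in for a vector register, and `vl` stands in for the hardware vector length.
- The output has the same size as the input, and the anchor is given as (row, column).
- The loop order is one possible choice.
- The names `fetch`, `correlate2d_cbsm` and `correlate2d_reference` are illustrative only.

```python
"""Hypothetical emulation of coefficient broadcasting strip mining (CBSM).

Assumption: Python lists stand in for vector registers and `vl` for the
hardware vector length. This is not the MLIR emitted by the DIP dialect.
"""
from typing import List, Tuple

Matrix = List[List[float]]


def fetch(img: Matrix, r: int, c: int, mode: str, const: float) -> float:
    """Read img[r][c], extrapolating reads that fall outside the image.

    'constant'  -> kkk|abcdefg|kkk (out-of-range reads return `const`)
    'replicate' -> aaa|abcdefg|ggg (out-of-range indices clamp to the nearest edge)
    """
    rows, cols = len(img), len(img[0])
    if 0 <= r < rows and 0 <= c < cols:
        return img[r][c]
    if mode == "constant":
        return const
    if mode == "replicate":
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]
    raise ValueError(f"unsupported boundary mode: {mode}")


def correlate2d_cbsm(img: Matrix, kernel: Matrix, anchor: Tuple[int, int], vl: int,
                     mode: str = "constant", const: float = 0.0) -> Matrix:
    """Strip-mined 2D correlation; the output has the same size as the input."""
    rows, cols = len(img), len(img[0])
    ar, ac = anchor  # (row, column) of the anchor point inside the kernel
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for m, krow in enumerate(kernel):
            for n, coeff in enumerate(krow):
                # "Broadcast" coeff: every lane of the strip is scaled by it.
                for j0 in range(0, cols, vl):  # strip mining along the row
                    lanes = min(vl, cols - j0)  # last strip may be partial (tail)
                    strip = [fetch(img, i + m - ar, j0 + t + n - ac, mode, const)
                             for t in range(lanes)]
                    # One emulated vector multiply-add: out strip += coeff * strip.
                    for t in range(lanes):
                        out[i][j0 + t] += coeff * strip[t]
    return out


def correlate2d_reference(img: Matrix, kernel: Matrix, anchor: Tuple[int, int],
                          mode: str = "constant", const: float = 0.0) -> Matrix:
    """Direct scalar definition, used only to check the strip-mined version."""
    ar, ac = anchor
    return [[sum(kernel[m][n] * fetch(img, i + m - ar, j + n - ac, mode, const)
                 for m in range(len(kernel)) for n in range(len(kernel[0])))
             for j in range(len(img[0]))]
            for i in range(len(img))]


if __name__ == "__main__":
    img = [[1, 2, 3, 4, 5],
           [6, 7, 8, 9, 10],
           [11, 12, 13, 14, 15]]
    cross = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
    for vl in (1, 2, 4, 8):  # the result must not depend on the vector length
        assert correlate2d_cbsm(img, cross, (1, 1), vl) == correlate2d_reference(img, cross, (1, 1))
    print(correlate2d_cbsm(img, cross, (1, 1), vl=4)[0])  # [9.0, 13.0, 17.0, 21.0, 19.0]
```

The loop over `vl` checks the vector-length-agnostic property: the same result comes out for every strip width, with only the tail strip changing size. A real lowering would use masked or predicated vector operations for that tail.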
- We are working on adding more image processing operations to the DIP dialect.
- We are working to improve the performance of the DIP dialect's 2D correlation implementation and bring it on par with state-of-the-art platform-dependent versions.
- We are also working on publishing our completed work, which focuses mainly on developing platform-independent vectorised versions of image processing operations.