@meshtag
Last active January 12, 2022 03:59

LFX fall mentorship report for RISC-V MLIR Convolution Vectorization

Details

Mentee name: Prathamesh Tagore
Organization: RISC-V International
Mentors: Hongbin Zhang, Wei Wu
Project title: RISC-V MLIR Convolution Vectorization
Project page: https://mentorship.lfx.linuxfoundation.org/project/f994928b-8998-4cd3-b66e-c576aa99c9d5
Project repository: buddy-mlir, buddy-benchmark
Design draft of developed algorithm: DIP 2D Correlation

Project Description

Many popular open source image processing libraries, such as OpenCV, optimize image processing operations in a platform-specific way for fixed-length registers. Although this optimization is very effective on the supported set of processors, the cost of maintaining and improving the code is high. Because the vector registers of each supported processor are handled separately, programmers have to maintain, optimize, and fix bugs in each of these implementations separately, which slows release cycles and makes it difficult to ensure compatibility across all user platforms. Moreover, the intrinsic bloat caused by architectural differences between processors makes such implementations even harder to develop and maintain.

This project primarily aims to solve this problem by creating a generic vectorised implementation of 2D correlation using MLIR. We intend to generate platform-specific, high-performance IR (intermediate representation) from platform-independent code. Our objectives are as follows:

  • Develop a platform-independent implementation of 2D correlation using MLIR.
  • Complement it with useful features such as variable anchor point positioning and boundary extrapolation.
  • Compare the results with OpenCV's implementation to verify accuracy.
  • Benchmark and iteratively optimize the novel implementation.
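
As a point of reference for these objectives, the operation can be written as a plain scalar loop. The following is a minimal sketch in Python, not the MLIR implementation from the project: the function names `extrapolate` and `correlate2d`, the `(row, column)` anchor convention, and the `mode` strings are assumptions chosen for illustration.

```python
def extrapolate(img, r, c, mode="constant", constant=0.0):
    """Return the pixel at (r, c), extrapolating when it lies outside img."""
    rows, cols = len(img), len(img[0])
    if 0 <= r < rows and 0 <= c < cols:
        return img[r][c]
    if mode == "constant":
        # kkk|abcdefg|kkk: every out-of-range pixel takes the same value.
        return constant
    if mode == "replicate":
        # aaa|abcdefg|ggg: clamp the coordinates to the nearest edge pixel.
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]
    raise ValueError(f"unsupported mode: {mode}")


def correlate2d(img, kernel, anchor=(0, 0), mode="constant", constant=0.0):
    """Scalar 2D correlation; the output has the same shape as img.

    anchor is the (row, column) of the kernel element that is aligned
    with the output pixel being computed (hypothetical convention).
    """
    rows, cols = len(img), len(img[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    a_r, a_c = anchor
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    pixel = extrapolate(img, r + i - a_r, c + j - a_c,
                                        mode, constant)
                    acc += kernel[i][j] * pixel
            out[r][c] = acc
    return out
```

A scalar version like this is slow, but it is useful as a ground truth when checking a vectorised implementation for a given anchor point and boundary mode.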

Summary of work done

  • A novel MLIR dialect, the digital image processing (DIP) dialect, was created. It encapsulates operations and lowering passes used to generate high-performance IR for image processing.

  • A custom algorithm for handling image processing (IP) specific attributes was built on top of the coefficient broadcasting strip mining (CBSM) approach for vectorised 2D correlation.

  • The above-mentioned algorithm was implemented as an MLIR lowering pass and is used by the 2D correlation operation in the DIP dialect.

  • Support for variable anchor point positioning was added. Any point in the provided kernel can now be specified as the anchor point; the algorithm takes care of alignment and assigns correlation results to the appropriate pixel(s) in the output image.

  • Support for custom boundary extrapolation was developed so that users can choose the boundary extrapolation method that suits their application. The currently supported options are:

    • Constant Padding: Pads the entire extra region around the input image with a constant value to obtain the boundary-extrapolated output image. (kkk|abcdefg|kkk)
    • Replicate Padding: Pads the extra region with the first/last element of the respective row/column to obtain the boundary-extrapolated output image. (aaa|abcdefg|ggg)

    We are working on adding more boundary extrapolation techniques to this list.

  • Since the developed implementation works for any vector length, it will benefit directly from the longer vector registers of upcoming processors.

  • Some architectures, such as RISC-V and ARM SVE, natively support variable-length vector registers and are expected to increase the base size of their SIMD registers in upcoming implementations. Our work is well suited to them and should benefit from their future versions as well.

  • The DIP dialect's implementation performed better than OpenCV's implementation for small kernels (3x3, 5x5) on AVX-512 but was slower for larger kernels. We are currently investigating this behaviour.
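
The general idea behind coefficient broadcasting with strip mining can be illustrated outside MLIR. The sketch below is an assumption-laden Python emulation, not the lowering pass from buddy-mlir: lists stand in for vector registers, `stride` stands in for the vector length, the loop order is chosen for readability, and only constant padding is handled. The names `correlate2d_cbsm` and `load_strip` are hypothetical.

```python
def correlate2d_cbsm(img, kernel, anchor=(0, 0), stride=4, constant=0.0):
    """2D correlation using a broadcast-and-accumulate strip loop.

    Each output row is processed in strips of at most `stride` pixels.
    For every kernel element, the coefficient is broadcast across a
    strip-wide "vector" and multiplied with the matching input strip.
    """
    rows, cols = len(img), len(img[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    a_r, a_c = anchor
    out = [[0.0] * cols for _ in range(rows)]

    def load_strip(r, c0, width):
        # Emulates a masked vector load: lanes that fall outside the
        # image are filled with the padding constant.
        if not 0 <= r < rows:
            return [constant] * width
        return [img[r][c] if 0 <= c < cols else constant
                for c in range(c0, c0 + width)]

    for r in range(rows):
        for c0 in range(0, cols, stride):        # strip mining over columns
            width = min(stride, cols - c0)       # the last strip may be short
            acc = [0.0] * width
            for i in range(k_rows):
                for j in range(k_cols):
                    coeff = [kernel[i][j]] * width   # coefficient broadcast
                    strip = load_strip(r + i - a_r, c0 + j - a_c, width)
                    # Lane-wise multiply-accumulate (an FMA on real hardware).
                    acc = [a + k * x for a, k, x in zip(acc, coeff, strip)]
            out[r][c0:c0 + width] = acc
    return out
```

Because `stride` only changes how the columns are partitioned, the result is the same for any value. That property is what allows one implementation to serve hardware with different vector lengths.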

Current work

  • We are working on adding more IP operations to the DIP dialect.
  • We are trying to improve the performance of the 2D correlation implementation in the DIP dialect and bring it on par with state-of-the-art platform-dependent versions.
  • We are also working on publishing our completed work, which mainly focuses on developing platform-independent vectorised versions of image processing operations.