Mentee: Varun Samaga B L
Mentor: Jacob Boerema
Project link: https://summerofcode.withgoogle.com/programs/2024/projects/NmrndTt8
The massively parallel nature of many real-world computations has led to the development of specialized parallel hardware such as GPUs, TPUs and FPGAs, along with closely related programming models such as CUDA and OpenCL. OpenCL is a popular framework for writing massively parallel code because of its broad ecosystem and wide range of supported platforms. GIMP is a popular free and open-source image manipulation tool, often shipped as the default image editor in many Unix-based distributions. GIMP uses GEGL internally to accelerate various image processing tasks, and GEGL comes with OpenCL implementations for many operations. However, the OpenCL backend in GEGL was often slower than the non-OpenCL baselines, produced incorrect output, and sometimes crashed.
This project aimed to revive GEGL's defunct OpenCL infrastructure, prioritizing reliability. OpenCL operations were previously causing frequent crashes, leading to significant user frustration and loss of work. Additionally, inconsistencies between the OpenCL and non-OpenCL implementations resulted in numerous failed tests and in certain OpenCL operations being disabled. Successful completion of the project would allow the OpenCL backend to be enabled by default, letting end users take advantage of hardware acceleration on their devices automatically.
This project addressed numerous issues within GEGL's OpenCL infrastructure. We fixed many broken implementations, updated outdated operations with new kernels, optimized some kernels, and fixed several numerical issues. We also upgraded GEGL's OpenCL from version 1.1 to 3.0 with a fallback to 1.2 for incompatible devices. As a result, OpenCL in GEGL is now stable enough to be enabled by default for developer testing.
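The version fallback described above can be sketched as follows. This is a minimal illustration, not GEGL's actual code: `pick_cl_std` is a hypothetical helper, and the decision to treat devices below OpenCL 1.2 as unsupported is an assumption. It relies only on two facts from the OpenCL specification: the `CL_DEVICE_VERSION` query returns a string of the form `OpenCL <major>.<minor> <vendor-specific information>`, and the kernel compiler accepts a `-cl-std=` build option.

```c
#include <stdio.h>

/* Hypothetical helper (not part of GEGL): map the string a device reports
 * for CL_DEVICE_VERSION to the -cl-std build option to compile kernels with.
 * Returns NULL when the device should not be used for OpenCL at all. */
static const char *
pick_cl_std (const char *device_version)
{
  int major = 0;
  int minor = 0;

  /* A missing or malformed version string is treated as unsupported. */
  if (device_version == NULL ||
      sscanf (device_version, "OpenCL %d.%d", &major, &minor) != 2)
    return NULL;

  /* Preferred path: the device supports OpenCL 3.0. */
  if (major >= 3)
    return "-cl-std=CL3.0";

  /* Fallback path: OpenCL 1.2 and 2.x devices compile as OpenCL C 1.2. */
  if (major == 2 || (major == 1 && minor >= 2))
    return "-cl-std=CL1.2";

  /* Assumption: OpenCL 1.0 and 1.1 devices are left on the CPU path. */
  return NULL;
}
```

A caller would pass the returned string as the options argument of `clBuildProgram`, and skip OpenCL for that device when the result is `NULL`.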
The table below summarizes the changes made to GEGL's OpenCL support.
| Change | Impact |
|---|---|
| Upgraded OpenCL version from 1.1 to 3.0 | Improved performance on devices with OpenCL 3.0 support, with a fallback to OpenCL 1.2 for devices that do not support it. |
| Disabled OpenCL Cache System | Prevents unexpected GIMP crashes caused by the faulty OpenCL cache system, avoiding the loss of unsaved work. |
| Fixed bugs and numerical issues | Fixes all OpenCL kernels to output correct results that match the reference implementation. This allows hash-based testing of results in the future. |
| Fixed the test runner for OpenCL tests in tests/opencl | Fixed a problem where all tests passed regardless of whether they produced correct results. |
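The last two rows depend on the same idea: an OpenCL result only counts as correct if it matches the reference output, and a test only counts as passed if that comparison is actually reflected in the test's exit status. A minimal sketch of such a check is shown below. It is not the code from tests/opencl; the function names and the tolerance-based comparison are assumptions made for illustration (a hash-based check would require bit-identical output instead).

```c
#include <stddef.h>

/* Hypothetical helper: count the components where the OpenCL output differs
 * from the reference output by more than `tolerance`. */
static size_t
count_mismatches (const float *reference,
                  const float *opencl,
                  size_t       n_components,
                  float        tolerance)
{
  size_t mismatches = 0;

  for (size_t i = 0; i < n_components; i++)
    {
      float diff = reference[i] - opencl[i];

      if (diff < 0.0f)
        diff = -diff;

      /* Written as !(diff <= tolerance) so that a NaN difference,
       * for which every comparison is false, counts as a mismatch. */
      if (!(diff <= tolerance))
        mismatches++;
    }

  return mismatches;
}

/* Exit status for the test process: zero only when nothing mismatched,
 * so the runner cannot report success for incorrect output. */
static int
test_exit_status (size_t mismatches)
{
  return mismatches == 0 ? 0 : 1;
}
```

The important design point is the second function: a runner that ignores the comparison result, or always returns zero, reproduces the "all tests pass" failure mode described in the table.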
The following table documents all the targeted operations and their current status.
| Status | Icon |
|---|---|
| Enabled | ✔️ |
| Partially Enabled | ☑️ |
| Partially fixed but disabled | ❌ |
| Operation | Status |
|---|---|
| Alien Map | ✔️ |
| Color to alpha | ✔️ |
| Focus Blur[1] | ✔️ |
| Hue Chroma | ✔️ |
| Noise Reduction | ❌ |
| Oilify | ✔️ |
| SNN Mean | ✔️ |
| Sobel Edge Detector | ☑️ |
[1] The error in gegl:focus-blur was caused by the cache system's flushing mechanism interfering with the multi-threaded gegl:piecewise-blend operation. Disabling the OpenCL cache system indirectly resolved this issue. gegl:focus-blur consists of several operations, including gegl:vignette, which is accelerated by OpenCL and serves as an auxiliary input for gegl:piecewise-blend. The cache write-back mechanism was occurring while some threads of gegl:piecewise-blend had already completed execution, resulting in stale data being written to the buffer.
GEGL's OpenCL module has reached a stable state. Many operations have been updated, and all numerical issues have been resolved. The system appears robust but further testing is necessary.
OpenCL significantly enhances performance for compute-bound operations, often achieving speedups of 2-10x or more. However, for operations that are memory-bound or less compute-intensive, no significant impact on performance was observed.
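One common way to reason about this split, not used in the project itself, is the roofline model: attainable throughput is limited either by the device's peak compute rate or by memory bandwidth multiplied by the operation's arithmetic intensity (floating-point operations per byte moved). The sketch below uses a hypothetical helper name, and any numbers passed to it are illustrative rather than measurements from GEGL.

```c
/* Roofline estimate (illustrative, hypothetical helper): attainable
 * throughput in GFLOP/s is the smaller of the compute ceiling and the
 * memory ceiling, where the memory ceiling is bandwidth (GB/s) times
 * arithmetic intensity (FLOPs per byte). */
static double
roofline_gflops (double peak_gflops,
                 double bandwidth_gb_per_s,
                 double flops_per_byte)
{
  double memory_ceiling = bandwidth_gb_per_s * flops_per_byte;

  return memory_ceiling < peak_gflops ? memory_ceiling : peak_gflops;
}
```

For a made-up device with a 1000 GFLOP/s peak and 100 GB/s of bandwidth, an operation doing 0.5 FLOPs per byte is capped at 50 GFLOP/s by memory traffic, while one doing 50 FLOPs per byte can reach the compute peak. Operations of the first kind gain little from a faster compute device, which is consistent with the observation above.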
I was unable to pinpoint the root cause of the issues with gegl:edge-sobel and gegl:noise-reduction; both require further investigation.
The existing OpenCL cache system did not appear to have a substantial impact on performance during testing. Therefore, disabling and completely redesigning the caching system might be a viable solution.
Some of the areas that need work are:
- Cache System: Fix the existing cache system or implement a new design to improve performance.
- Color Conversion: Update the OpenCL color conversion system to account for color spaces.
- Fix operations: Investigate the issues caused by math functions in gegl:edge-sobel and the cause of the error in gegl:noise-reduction.
- OpenCL Coverage: Expand OpenCL support to highly compute-intensive operations to leverage hardware acceleration.
- OpenCL 3.0 Features: Exploit the advanced features of OpenCL 3.0 to enhance performance on compatible devices.
- Developer Tutorials: Create high-quality tutorials regarding the GEGL OpenCL API to guide developers in utilizing hardware acceleration for their operations.
- Generic changes
- Tests Related changes
- Operation related changes
- Alien Map
- Color to Alpha
- Hue Chroma
- Noise Reduction
- Oilify
- SNN Mean
- Sobel Edge detector