The Chapel compiler can generate code through either a C backend or an LLVM backend. The main goal of this project was to improve the code generated by the compiler's LLVM backend. Part of the project was to complete as many tasks as possible from this GitHub issue. Initially the list contained only a couple of introductory tasks (the first few), but it grew as the project progressed.
The biggest accomplishment of this summer was improving vectorization in the generated LLVM IR. Here is a link to the Chapel performance graphs, where the performance spike in favor of LLVM is visible on May 10th (the PR was merged on May 9th).
- #5511
- A big part of the work was passing flags to the backend compiler. This PR fixed a bug that prevented passing more complex flags.
- #5788
- This PR holds the results of my experiments investigating why vectorization wasn't occurring.
- #6046
- As a result of the experiments in #5788, I discovered that the vectorizer relies on integer overflow being undefined behaviour (UB). Making signed integer additions cause UB on overflow was enough. An incredibly simple fix, but it gave a huge performance boost.
- #6135, #6280, #6531
- Some work was necessary to make working with the LLVM backend easier, such as writing tests and inspecting LLVM IR at different stages of the optimization pipeline. Dumping LLVM IR per Chapel module was too verbose and hard to read, so the solution was a compiler option that dumps LLVM IR per function in the Chapel code, taken from a specific optimization stage.
- #6533, #6548
- Add `parallel_loop_access` metadata to loops that are vectorizable to force vectorization.
- #6706, #6802
- LLVM had no information about `const` variables in Chapel code. These PRs add `llvm.invariant.start` intrinsic calls for variables marked as `const`. This task took a lot of time due to difficulties related to debugging.
- #6833
- This PR addresses one of the performance tests that was running slower for the LLVM version, because of slight differences between implementations (like using integers of different width).
- #6926
- Generating fast-math flags when fast math is enabled (`--no-ieee-float`).
- #6965
- References in Chapel cannot be `null`. Tiny PR that adds information about function parameters being `nonnull` in LLVM IR.
- #7070
- Chapel delegates operations on complex numbers to libc. When the compiler generated C code there was no performance problem, because GCC compiled that code using special builtins instead of dynamically linking against libc (or, in this case, libm). clang doesn't have many of these builtins. The solution was to create an extra C header containing wrappers for the builtins that clang does support, plus wrappers with hand-written implementations of the functions it doesn't, which developers will unfortunately have to extend in the future. The PR includes a script that makes writing those wrappers easier and prepares devs for that future fight. This was noticed while working on the `mandelbrot-complex` test performance problem.
- #7092
- Finishes what we started: every signed operation should cause UB on overflow.
- #7101
- Make sure that the proper instruction is emitted when comparing against constants.
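To illustrate why the overflow change in #6046 matters, here is a minimal C analogue. It is not code from the PR; it only relies on the fact that C has the same rule (signed overflow is undefined, unsigned overflow wraps), and that clang marks the signed increment as `nsw` in the IR it emits. The function names are made up for this sketch.

```c
#include <stdint.h>

/* Signed index: overflow of `i` is undefined in C, so the compiler may
   assume the loop terminates and, when lo <= hi, runs hi - lo + 1 times. */
int64_t sum_signed(const int32_t *a, int32_t lo, int32_t hi) {
    int64_t total = 0;
    for (int32_t i = lo; i <= hi; i++) {
        total += a[i];
    }
    return total;
}

/* Unsigned index: wrap-around is defined, so with hi == UINT32_MAX the
   condition `i <= hi` is always true. The optimizer has to allow for that
   case, which makes the trip count harder to reason about. */
int64_t sum_unsigned(const int32_t *a, uint32_t lo, uint32_t hi) {
    int64_t total = 0;
    for (uint32_t i = lo; i <= hi; i++) {
        total += a[i];
    }
    return total;
}
```

Both functions return the same result for ordinary inputs. The difference is only in what the optimizer is allowed to assume about the loop counter, which is the kind of information the vectorizer was missing.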
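The `parallel_loop_access` metadata from #6533 and #6548 has a rough counterpart at the C level. As far as I know, clang versions of that period attach this metadata to a loop's loads and stores when the loop carries `#pragma clang loop vectorize(assume_safety)`. The sketch below assumes that behaviour; it is a C illustration, not what the Chapel compiler emits, and the function is hypothetical.

```c
#include <stddef.h>

/* y[i] += alpha * x[i]; the caller promises that x and y do not overlap. */
void saxpy(float *y, const float *x, float alpha, size_t n) {
#if defined(__clang__)
    /* Tells clang that iterations have no memory dependences on each other. */
#pragma clang loop vectorize(assume_safety)
#endif
    for (size_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}
```

The promise is on the programmer: if `x` and `y` did overlap, the vectorized loop could produce different results. In Chapel the compiler can make this promise itself for loops it already knows to be vectorizable.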
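The wrapper header described in #7070 can be pictured roughly as follows. This is a sketch under assumptions: the `sketch_` names are hypothetical and the actual header may look quite different. It wraps `__builtin_creal` and `__builtin_cimag`, which both GCC and clang provide, and adds one hand-written helper standing in for an operation without a suitable builtin.

```c
#include <complex.h>

/* Thin wrappers over builtins available in both GCC and clang
   (hypothetical names, not the ones used in the PR). */
static inline double sketch_creal(double complex z) { return __builtin_creal(z); }
static inline double sketch_cimag(double complex z) { return __builtin_cimag(z); }

/* Hand-written helper: squared magnitude, computed without calling libm. */
static inline double sketch_norm2(double complex z) {
    double re = sketch_creal(z);
    double im = sketch_cimag(z);
    return re * re + im * im;
}

/* Counts iterations of z = z*z + c until |z| > 2 or the limit is reached. */
static inline int mandelbrot_iters(double complex c, int limit) {
    double complex z = 0.0;
    int n = 0;
    while (n < limit && sketch_norm2(z) <= 4.0) {
        z = z * z + c;
        n++;
    }
    return n;
}
```

Because the wrappers are `static inline` and use builtins, the escape test in the loop compiles to plain floating-point arithmetic rather than a call into libm, which is the effect the PR was after.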
Experiments were a big part of this project, as some things were undocumented, missing, or required investigation to understand how they work.
- Impact of `noalias` and `alias.scope` metadata on performance
- Impact of different alias analyses on performance
- Attempt to make writing tests for LLVM even more convenient
List of PRs closed within GSoC period
There is still quite a lot of work to be done. It is tracked in the GitHub issue mentioned above; that list grew during the GSoC period, and I'm sure it will grow further. LLVM is under constant development, and new features keep being added that might be useful for the compiler. There is potentially a lot of work that can be done on the LLVM backend by someone who knows LLVM. One example is adding `noalias` and `alias.scope` metadata, which isn't mentioned in Performance Tips for Frontend Authors and wasn't on the list prior to GSoC. This is something I'd like to add after GSoC.