Title: Code Generation Unit Partitioning Meeting
Estimate: 1 meeting
Type: Technical
Currently, LLVM is responsible for backend code generation in rustc. One of the largest factors that affects LLVM performance, besides optimization level, is how code is organized across LLVM modules (code generation units, in rustc lingo).
Since partitioning code into code generation units (CGUs) is so critical to both compile-time and run-time performance, the MIR Opt working group would like to explore possible improvements to the current algorithm.
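For context, one common shape for this kind of partitioning is to start with many small units and greedily merge the smallest ones until a target count is reached. The sketch below is a hypothetical illustration of that greedy merging step, not rustc's actual code; the `Cgu` type, the names, and the size metric are all invented for the example.

```rust
// Hypothetical sketch of size-balanced CGU merging; NOT rustc's actual algorithm.
#[derive(Debug)]
struct Cgu {
    name: String,
    size: usize, // estimated cost of the items placed in this unit
}

/// Repeatedly merge the smallest unit into the next-smallest
/// until at most `target` units remain.
fn merge_cgus(mut cgus: Vec<Cgu>, target: usize) -> Vec<Cgu> {
    while cgus.len() > target {
        // Sort descending by size so the two smallest units sit at the end.
        cgus.sort_by(|a, b| b.size.cmp(&a.size));
        let smallest = cgus.pop().unwrap();
        let second_smallest = cgus.last_mut().unwrap();
        second_smallest.size += smallest.size;
        second_smallest.name.push('-');
        second_smallest.name.push_str(&smallest.name);
    }
    cgus
}

fn main() {
    let cgus = vec![
        Cgu { name: "a".into(), size: 100 },
        Cgu { name: "b".into(), size: 10 },
        Cgu { name: "c".into(), size: 5 },
        Cgu { name: "d".into(), size: 40 },
    ];
    let merged = merge_cgus(cgus, 2);
    // Merging preserves the total estimated size.
    assert_eq!(merged.iter().map(|c| c.size).sum::<usize>(), 155);
    println!("{:?}", merged);
}
```

Even this toy version shows why the problem is sensitive: where a single function lands depends on the sizes of everything else, so a small change in one unit's estimated size can reshuffle which units get merged together.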
What are the goals of this change? What problem are you trying to solve?
The MIR Opt working group has been making good progress over the last year at implementing optimizations to improve the quality of the MIR we use to drive code generation. The goal of these improvements has been to lower LLVM compile time, and in many cases this has been successful.
However, sometimes improvements to codegen-ready MIR have led to regressions in LLVM performance. The most recent instance of this was rust-lang/rust#71298, where we discovered that constant-propagating arguments to function calls led to some serious regressions.
We believe that the regressed incremental tests are occurring because the CGU partitioning algorithm is unluckily placing the patched function in a larger CGU, which regresses the performance of the test case.
Improving the CGU partitioning algorithm would likely yield improvements to a number of compilation-time test cases, particularly the incremental ones.
The current algorithm has been largely stable since at least June 2018 and has been extensively tested in the field.
Some attempts have been made since then to tweak various parts of the algorithm for better performance, usually with very mixed results. (See, for example, rust-lang/rust#65281 and its performance results.)
- This algorithm serves a wide variety of use cases and does a decent job at all of them. This makes improving it tricky, because what improves one use case often hurts a different one.
- Deciding what "acceptable losses" are has historically been difficult, which often leads to changes not being merged because of some regressed tests.
- Nobody on the MIR Opt working group is an expert on this, and it's not clear that we currently have an expert on the compiler team to consult.
- Risk/reward factor: designing an "optimal" algorithm would probably result in bigger performance gains, but how long would that take? Tweaking the current algorithm might produce short-term wins, but since recent attempts have been unsuccessful, it's not certain there is any low-hanging fruit left.
- Is there currently an expert on the compiler team we should be talking to?
- Does anyone have ideas for improving the current algorithm?
- Does anyone have ideas for a different algorithm that would perform better?
- Is regressing some types of compilation performance more acceptable than others? For example, how much do we care about fat LTO + release vs. regular release vs. debug?
- What should the performance criteria be for merging changes?
  - "No regressions": the default and "safest" answer, but it makes it very challenging to land any improvements.
  - "Big enough wins to offset small losses"
  - "Bias toward improvements on large crates"
  - Others?
Ideally, we would identify several ideas to try with the current algorithm, and one or two different algorithms that could be implemented and tested in parallel.
- rust-lang/rust#71298
- rust-lang/rust#71840
- rust-lang/rust#71911
- rust-lang/rust#71946
- rust-lang/rust#72189 (comment) (this one is positive because we got lucky)
You have a typo in the PR links. I'd make a PR to the document, but I don't think I can, so this is the error: (...)/rust/pulls/65281. This link doesn't take you to the PR; it takes you to a pull request search in the repo. It should be (...)/rust/pull/65281, so `pulls` => `pull`.