nikomatsakis/parallel-experiment-plan.md Secret

## parallel-experiment-plan.md

      
Background and goal

This is a plan to enable support for parallel compilation by default in a rapid but responsible fashion. In particular, we'd like to begin with a period of experimentation, during which we explicitly request feedback, before changing the defaults.
One complication: to let people opt in to parallel compilation, the compiler needs to use more locks internally, which adds a small amount of overhead. In the final state, this overhead is recouped by the benefits of using more threads (and of course over time we can work to reduce this overhead in many ways). So there will be a short period during experimentation when the compiler gets slightly slower, though we have set thresholds that should ensure this is largely unnoticeable.

Thresholds

We establish two thresholds (these are preliminary figures, we may adjust them):
A major regression is a regression in compilation time that is both >5% and greater than 1 second.
The average improvement is the average improvement in compilation time across a set of benchmarks when using 2 cores.
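
The two thresholds can be written down as a small check. This is only a sketch in Python, not part of the compiler or the perf suite: the function names are hypothetical, and it assumes "average" means the arithmetic mean of per-benchmark percentage improvements, which the plan does not specify.

```python
from statistics import mean

# Preliminary figures from the plan; both may be adjusted.
MAJOR_REGRESSION_PERCENT = 5.0
MAJOR_REGRESSION_SECONDS = 1.0


def is_major_regression(baseline_secs, new_secs):
    """Return True when the slowdown is both >5% of baseline and >1 second."""
    slowdown = new_secs - baseline_secs
    percent = 100.0 * slowdown / baseline_secs
    return percent > MAJOR_REGRESSION_PERCENT and slowdown > MAJOR_REGRESSION_SECONDS


def average_improvement(timings):
    """Mean percentage improvement over (baseline_secs, two_core_secs) pairs.

    Assumption: a plain arithmetic mean of per-benchmark percentages.
    """
    return mean(100.0 * (base - new) / base for base, new in timings)
```

Note that a 0.8 second slowdown on a 10 second build is 8% but would not count as major under this reading, because it fails the 1 second condition.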
Plan

We plan to go through two experimentation phases before finally enabling parallelism by default. Each phase lasts for approximately one release cycle. Note that support for parallelism will begin to ride the trains as soon as the experimentation period begins -- so, if no major obstacles are found, parallel support (opt-in or opt-out, as appropriate) will ride the trains as usual into the stable release.

Experimentation (opt-in):

Support for parallelism is enabled, but it is opt-in. This implies some amount of overhead, as described above.
We plan to enter this state as soon as we see no major regressions (as defined above) on the perf test suite.
We make an announcement and request that people kick the tires (as described below).
Presuming that no major regressions are reported, we go to the next phase.

Question: Do we also require an average improvement of some % to be reported? On which tests?
If major regressions are reported that we can't fix yet, we will revert parallel support and try again.


Experimentation (opt-out):

Parallel queries are enabled by default, but there remains a way to opt out (while retaining the other forms of parallelism, such as multiple processes and LLVM-based parallelism, that we currently use).
We expect the average user to start seeing benefits.
Presuming that no major regressions are reported, we go to the next phase.

If major regressions are reported that we can't fix yet, we will revert parallel support and try again.


Enabled by default:

This is the end-state. Parallelism is enabled by default.
You can no longer disable parallel queries separately from other forms of parallel execution; you can only request that we avoid parallelism overall.


Optimistic calendar

If all goes well, the phases would work like this:


| Nightly version | Start date | Phase |
| --- | --- | --- |
| 1.36 | ASAP | Experimentation (opt-in) |
| 1.37 | May 23 or so | Experimentation (opt-out) |
| 1.38 | July 4 or so | Enabled by default |


Evaluation period

We plan to widely advertise this transition plan, probably via blog posts on the Rust blog, but possibly just with internals posts.
During the experimentation phases, there will be an internals thread for feedback. We would like people to test:

Enabling parallelism with 100% of available hyperthreads
Enabling parallelism with 50% of available hyperthreads (i.e., just physical cores)
Enabling parallelism with 1 thread

and report the performance results for each of those cases. The goal of these measurements is to help us develop automatic heuristics for selecting how many cores to use. Of course, we are also interested in correctness bugs.
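
For testers, the three configurations translate into concrete thread counts roughly as follows. This is a sketch in Python with a hypothetical helper name. It assumes two hyperthreads per physical core, which is common but not universal, so testers on other hardware should pass their own `threads_per_core`.

```python
import os


def thread_counts_to_test(logical_cpus=None, threads_per_core=2):
    """Return the thread count for each of the three requested scenarios.

    Assumption: physical cores = logical CPUs // threads_per_core.
    """
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs and may return None.
        logical_cpus = os.cpu_count() or 1
    physical_cores = max(1, logical_cpus // threads_per_core)
    return {
        "all hyperthreads": logical_cpus,
        "physical cores": physical_cores,
        "single thread": 1,
    }
```

On a machine with 16 hyperthreads, this suggests runs with 16, 8, and 1 threads.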
How to enable parallelism

One place we had some disagreement was how to allow users to opt in to and opt out of parallelism. The idea is that there should be some simple switch that:

controls whether queries execute in parallel, but leaves other pre-existing forms of parallelism unaffected
permits users to control the number of threads, so that we can have people test the three scenarios above.

One proposal was to add a RUSTC_PARALLEL_QUERY environment variable. During the experimentation period, it would control the number of threads used: it could be set to 1 (use 100% of available cores), 0.5 (use 50% of available cores), or 0 (use a single thread). Once parallel queries are enabled by default, we would probably just ignore this variable. The major downside of this option is that it is somewhat unusual and probably not the kind of switch we want long-term.
Another proposal was to use a binary switch to turn query-based parallelism on or off, and then use RUSTFLAGS with -j to let users manually specify the number of cores. This avoids the "three-way" setting above, but it requires users to know how many cores they have, and using RUSTFLAGS may lead to other complications (e.g., in cross-compilation scenarios).
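
To make the first proposal's "three-way" setting concrete, here is one way the value of RUSTC_PARALLEL_QUERY could be mapped to a thread count. This is an illustrative sketch in Python, not how the compiler does or would implement it. The function name is hypothetical, and rejecting values other than 1, 0.5, and 0 is an assumption, since the proposal does not say how other values would be treated.

```python
def threads_from_setting(value, available_cores):
    """Map the proposed setting (1, 0.5, or 0) to a number of threads.

    Assumption: any other value is rejected rather than rounded.
    """
    fraction = float(value)
    if fraction == 0:
        # 0 means "use a single thread", not "use zero threads".
        return 1
    if fraction in (0.5, 1.0):
        # Never go below one thread, even on a single-core machine.
        return max(1, int(available_cores * fraction))
    raise ValueError(f"unsupported setting: {value!r}")
```

With 8 available cores, the settings "1", "0.5", and "0" give 8, 4, and 1 threads respectively. The second proposal needs no such mapping, since the user states the thread count directly.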
Details to resolve


What is the exact environment variable to enable opt-in and opt-out?
Should we advertise the experimental period and general plan on the main blog, or wait until things are enabled by default?