@HadrienG2
Last active March 13, 2024 00:31
Making Rust a perfect fit for high-performance computations

Hello, Rust community!

My name is Hadrien and I am a software performance engineer in a particle physics lab. My daily job is to figure out ways to make scientific software use hardware more efficiently without sacrificing its correctness, primarily by adapting old-ish codebases to the changes that have occurred in the software and computing landscape since the days when they were designed:

  • CPU clock rates and instruction-level parallelism stopped going up, so optimizing code is now more important.
  • Multi-core CPUs went from an exotic niche to a cheap commodity, so parallelism is not optional anymore.
  • Core counts grow faster than RAM prices go down, so multi-processing is not enough anymore.
  • SIMD vectors become wider and wider, so vectorization is not a gimmick anymore.
  • RAM performance increased much slower than FLOPs, so data layout and cache management are now more important.
  • GPUs are now ubiquitous, and becoming an increasingly important part of the scientific computing landscape.
  • Programming languages grew up, gaining features like the ability to do more work at compile time.

I think Rust could help with the task of writing performant computational software, because its expressiveness, flexibility and degree of low-level control rival those of C++, the de facto dominant programming language in this community, where sophisticated abstractions and high performance are both desired. At the same time, Rust is vastly easier to learn, and to use both correctly and efficiently:

  • The borrow checker greatly eases writing performant parallel programs that perform few copies and memory allocations.
  • Immutability by default and absence of undefined behaviour make debugging of Safe Rust less frequent and much easier.
  • Trait bounds are a great improvement over the tragic template error handling situation in C++. Unlike C++ concepts, their use is mandatory, and they are available today and shipping in all Rust libraries as we speak.
  • Extension traits and simple closure syntax give huge extension power to third-party libraries like faster and rayon. They greatly extend the reach of what can be done in a crate, without needing language and standard library integration.
  • In general, crates.io puts the Rust library ecosystem within everyone's reach, removing the barriers to code reuse that cause endless reimplementations of the same concepts in C and C++.
  • Rust's rigorous handling of numerical data avoids many common C++ bugs originating from implicit conversions, careless manipulation of floating-point values, and implied types of number literals.
  • And finally, Rust can, to some degree, interoperate with existing C++ codebases, enabling piecewise integration.
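
To make the borrow checker point concrete, here is a minimal sketch that uses only the standard library's scoped threads. The function name `parallel_sum` and its `workers` parameter are hypothetical, chosen for illustration; a real project would more likely reach for a crate such as rayon.

```rust
use std::thread;

/// Sums `data` by handing each worker thread a borrowed chunk of the input.
/// `workers` is a hypothetical tuning knob for this sketch.
fn parallel_sum(data: &[f64], workers: usize) -> f64 {
    // Guard against zero workers, then round up so every element is covered.
    let workers = workers.max(1);
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Each thread only borrows its chunk: the input is never copied,
        // and the compiler checks that the borrow outlives the threads.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<f64>()))
            .collect();

        // Combine the partial sums once all workers are done.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<f64> = (1..=1000).map(f64::from).collect();
    println!("{}", parallel_sum(&data, 4));
}
```

Because the threads hold shared borrows of `data`, any attempt to mutate the vector while they are running is rejected at compile time rather than showing up as a data race at run time.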

However, like all software projects, scientific projects have significant inertia and dislike moving to a new technology if they feel that they will lose anything in the process. So before Rust can become a reference language for writing scientific software alongside C++ and Fortran, we probably need to fix the parts where Rust is not yet as powerful as its competition.

Here are the big things that I think Rust needs before it can be considered a serious competitor to C++ in this area:

  • Fixed-size arrays, which avoid the use of expensive dynamic memory allocation, must become first-class citizens
    • It should be possible to pass arrays of variable size to a function, guaranteeing that bound checks will be elided.
    • It should be possible to return arrays of variable size from a function, depending on an input const parameter.
    • There should be no magic array length threshold beyond which usability falls down a cliff.
    • The ergonomics of array initialization and iteration should match those of vector initialization and iteration.
    • Libraries based on arrays like arrayvec or nalgebra should not have such hacky APIs.
  • Portable SIMD should be an ergonomic, built-in feature of the Rust standard library.
  • Rust should provide higher-quality libraries, tutorials and documentation for scientific software development.
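
As an illustration of the array wishes above, here is a minimal sketch written with const generics, which have since been stabilized in a basic form. The helper names `scale` and `dot` are hypothetical, and whether bound checks are actually elided still depends on the optimizer.

```rust
/// Scales every element of a fixed-size array. `N` is a compile-time
/// parameter, so the result is a plain array and no heap allocation occurs.
fn scale<const N: usize>(input: [f64; N], factor: f64) -> [f64; N] {
    input.map(|x| x * factor)
}

/// Dot product of two arrays whose lengths must match at compile time.
/// Iterating with `zip` avoids explicit indexing altogether.
fn dot<const N: usize>(a: &[f64; N], b: &[f64; N]) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    let v = [1.0, 2.0, 3.0];
    let w = scale(v, 2.0);
    println!("{:?} {}", w, dot(&v, &w));

    // The same functions work for lengths above 32, the old "magic" limit.
    let big = scale([1.0; 64], 0.5);
    println!("{}", dot(&big, &big));
}
```

Calling `dot` with arrays of different lengths is a type error, so the length check happens at compile time instead of at run time.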

And here are some smaller fixes that would improve Rust ergonomics for scientific work:

  • It should be possible to guarantee that some floating-point computations will be done at compile time.
  • The borrow checker should not be so picky about array borrows that are known to be independent at compile time.
  • Mathematical functions of f32 and f64 should be usable in prefix form, as is the norm in other languages used in science.
    • Note that this can probably be provided by a library using the num crate.
  • There should be an easy way to configure float printout by number of significant digits (aka "engineering notation").
  • GPU kernels should be writable in a Rust-like language, instead of going back and forth with C.
  • Release build compile times should be improved, as debug builds are often unusably slow on realistic problems.
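
Several of these smaller wishes can be approximated in library code. The sketch below is one possible take, not an established API: the free functions `sqrt` and `hypot`, the helper `sig_digits`, the constant `DEG_TO_RAD` and the function `add_low_to_high` are all hypothetical names introduced for illustration.

```rust
/// A `const` item is evaluated at compile time, which covers simple
/// floating-point arithmetic like this conversion factor.
const DEG_TO_RAD: f64 = std::f64::consts::PI / 180.0;

/// Prefix-form wrappers around the method-style `f64` functions.
fn sqrt(x: f64) -> f64 {
    x.sqrt()
}

fn hypot(x: f64, y: f64) -> f64 {
    x.hypot(y)
}

/// Formats `x` in scientific notation with `digits` significant digits.
fn sig_digits(x: f64, digits: usize) -> String {
    // The precision counts digits after the decimal point, hence the `- 1`.
    let prec = digits.saturating_sub(1);
    format!("{:.prec$e}", x, prec = prec)
}

/// Adds the lower half of the array to the upper half. `split_at_mut`
/// gives the borrow checker two provably disjoint mutable views.
fn add_low_to_high(values: &mut [f64; 4]) {
    let (low, high) = values.split_at_mut(2);
    for (h, l) in high.iter_mut().zip(low.iter()) {
        *h += *l;
    }
}

fn main() {
    println!("{}", sqrt(hypot(3.0, 4.0) + 4.0));
    println!("{}", sig_digits(1234.5, 3));
    println!("{}", sig_digits(90.0 * DEG_TO_RAD, 4));

    let mut values = [1.0, 2.0, 10.0, 20.0];
    add_low_to_high(&mut values);
    println!("{:?}", values);
}
```

The `split_at_mut` workaround still requires the programmer to spell out the split by hand, which is the ergonomic gap that the borrow checker wish refers to.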

From reading this list, it should be clear that the single biggest language-level feature that Rust needs to be taken more seriously by the scientific software community, in my opinion at least, is more const and array capabilities. There are many ongoing developments in this area, including the recent stabilization of some const fn support and the incoming implementation of const generics, so I am hopeful that by Rust 2021, we can get to a nice state. If usability for scientific computation is made a priority of Rust in 2019, we might get there earlier.

Standard primitives for portable SIMD would give Rust a significant edge over C++'s SIMD library chaos, an outcome which I would much prefer to replicating that chaos on crates.io. I am therefore very happy to see that RFC 2366 is a thing, and hope that it will continue making good progress.

Speaking personally, these two language-level roadblocks are the main things that still prevent me from advertising Rust at work and justifying use of work time to investigate the other items. When it comes to the other points, a lot of excellent work has already been done on them, and I think a "high performance Rust" working group could easily get the language through the last mile and make it feel right for this use case if it were made a priority of the next annual iteration or the next edition.

I should also point out that video game development is another area where Rust is close to maturity (and, in fact, is already being experimented with internally by some major studios). Many of the improvements I hinted at above would also help this use case, as video games perform a significant amount of CPU and GPU computation, and anything that improves the ergonomics and performance of such computations therefore indirectly helps game development.

To summarize, my wish for Rust 2019 and beyond is for use of Rust in high-performance computations to receive more love.

@scottmcm

scottmcm commented Jan 9, 2019

It should be possible to guarantee that some floating-point computations will be done at compile time.

Shameless plug: https://internals.rust-lang.org/t/quick-thought-const-blocks/7803?u=scottmcm

@mxxo

mxxo commented Aug 12, 2019

Great writeup! This is a coherent set of issues that would have a huge impact on scientific computing in Rust.

I don't want to take attention away from your main points but I've also been thinking about dependency management.
Even for relatively simple projects, there's a wall of dependencies pulled in by cargo.
Every crate is another attack surface and HPC clusters are a very tempting target.

I think the risk is higher for those smaller, specialized crates with few users, since they don't receive the same attention as a standard language crate.
It's likely an electromagnetic field crate would have fewer users than a serialization library, for example.

My personal experience with scientific computing in C++ is there's a trusted set of standalone libraries the lab uses (Eigen, Lapack, VTK, etc.) and everything else is reinvented. Because installing libraries is such a pain, there's a big incentive to keep the number of dependencies low.

Maybe the scientific computing ecosystem in Rust will evolve in a similar way, with a relatively small number of "building block" libraries (nalgebra, rayon, serde, etc.) but without the horrible pain of C++ dependency management.

@smr97

smr97 commented May 29, 2020

Very interesting writeup. I agree with the issues you raise about fixed-size arrays and GPU usability. Shameless plug for those looking for a fast and configurable task-splitting scheduler: https://github.com/wagnerf42/rayon-adaptive.
