@Zaerei Zaerei/rust_blog.md Secret
Created Jan 12, 2019

# Rust 2019-2021 -- Math and Simulation

I. Intro

Some of my first real, non-toy projects in Rust, way back in 2015, were numeric. I was taking a graduate-level Reinforcement Learning and MDP class at the time, and shortly after transitioned to using it in my research in AI. Rust has a lot going for it in terms of package management, speed, and typing.

Unfortunately, it falls apart in several fundamental areas when it comes to mathematical and scientific uses. Some of these are ecosystem problems (i.e. the work just isn't there) and some are more core problems (e.g. type-level numerics, which will be solved Soon™, hopefully).

Similarly, another area common in my field, simulation, is also nascent and has many of the same issues. In this context, "simulation" is somewhat broad -- technically many simulations are simply Ordinary or Partial Differential Equations solved over some number of steps, but largely they're closer to game development than anything. I'm somewhat impressed with Rust's hobbyist game development scene, but the current issues in that sphere are well documented.

First, let me address what this post is not. It is not an admonishment of the direction Rust has focused on to date, nor is it a request to drop everything and focus on making this a language of purely mathematical or game development interest. The purpose is to point out areas where I think Rust could shine with attention from the "core" part of the community. I also won't cover a lot of mathematical areas where Rust could potentially focus its attention but which I'm not as familiar with (for instance, somebody wanting to propose Rust for type system theory or theorem proving can explore that ground much better than I can).

II. Landscape Survey

This section is a brief survey of the Machine Learning, Numeric, and Game/Simulation landscape in Rust to date. Please do not take this as complete. I'm somewhat tuned into this scene, but for purposes of the blog post I'm mainly relying on my own memory and various areweXyet websites. If we actually decide to focus our attention in any or all of these directions in any way, more exploration is needed.

Note that there are some libraries that look neat, and even some which are still maintained, that I either didn't have room to really talk about or wasn't familiar enough with to comment on (statrs, for example).

Machine Learning

Are We Learning Yet is a fairly good analysis of this, but to save time looking through it, I'd note that the vast majority of projects are dormant or abandoned (defined as no notable activity in the past 6 months). Most of the remaining active projects are bindings.

I would attribute this to the foundational elements that are missing. Note that for this section I'm silently discounting the "scientific computing" page, as that will be the next section.

The major notable project that's still living seems to be rustlearn. However, it is still fairly nascent from the perspective of an ML researcher, only containing a handful of elements you'd cover in an undergrad-level ML class. This isn't to criticize the project; these elements are foundational to a lot of research and still widely used, but it does pale in comparison to, say, Python's scikit-learn. Even one of these elements, the Support Vector Machine, is ultimately a libsvm binding (not to get too much NIH syndrome).

We notably do not have much in the way of neural networks, which is a major blocker for any ML research happening in Rust. The biggest endeavor I've known of -- something truly Rusty -- was leaf, which seems to have at some point morphed into juice. However, development on both has stagnated. The same goes for the other NN architectures on the page -- even the ones with "recent" commit dates are largely small unit test revisions or README updates.

I was somewhat impressed by rustml, which may be resuming development. It does rely on a whole host of C or C++ bindings, but then so does the numpy landscape.

Numeric And Scientific Computing

The story here is somewhat better than ML (which is to be expected, as this is foundational to ML), but I also attribute some of the advancements to the game development scene. For instance, Sébastien Crozet's nalgebra is foundational to a number of packages largely used in game development, such as ncollide or nphysics. Similar is cgmath, which focuses on 3D math and is more natively usable with graphics libraries. Finally, we have ndarray (not to be confused with the "n" series earlier), which is a more "traditional" picture of what a type-level N-dimensional matrix package looks like.

This is good and bad: nalgebra and ndarray actually have a few fairly promising ideas for generalized dimension-agnostic math, but are ultimately hampered by the lack of support for things like type-level numerics (ndarray in particular can get really gnarly to work with even for light use). The general focus on game dev and 3D spaces also hampers their use for the "other side", meaning big ol' matrices for ML or other numeric and scientific tasks. I would not want to try multiplying huge matrices in any of these.
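To make "type-level numerics" a bit more concrete, here is a minimal sketch of the kind of thing I mean, assuming a compiler with const generics support. The `Matrix` type and its `zeros` constructor are hypothetical names made up for illustration, not the API of nalgebra or ndarray. The point is only that the inner dimension of a matrix product can be checked by the type system instead of at runtime.

```rust
use std::ops::Mul;

/// Hypothetical dense, row-major matrix whose shape is part of its type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Matrix<const R: usize, const C: usize> {
    pub data: [[f64; C]; R],
}

impl<const R: usize, const C: usize> Matrix<R, C> {
    /// An R x C matrix filled with zeros.
    pub fn zeros() -> Self {
        Matrix { data: [[0.0; C]; R] }
    }
}

// (R x K) * (K x C) = (R x C). The shared inner dimension K appears in both
// operand types, so a shape mismatch fails to type-check.
impl<const R: usize, const K: usize, const C: usize> Mul<Matrix<K, C>> for Matrix<R, K> {
    type Output = Matrix<R, C>;

    fn mul(self, rhs: Matrix<K, C>) -> Matrix<R, C> {
        let mut out = Matrix::<R, C>::zeros();
        for i in 0..R {
            for j in 0..C {
                for k in 0..K {
                    out.data[i][j] += self.data[i][k] * rhs.data[k][j];
                }
            }
        }
        out
    }
}
```

With this, multiplying a `Matrix<2, 3>` by a `Matrix<3, 2>` compiles and yields a `Matrix<2, 2>`, while multiplying a `Matrix<2, 3>` by another `Matrix<2, 3>` is a compile error. Stack-allocated arrays like this only make sense for small shapes; the big-matrix case would need heap storage and a lot more care.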

There are a number of other packages that I'm not as familiar with, and thus can't really evaluate in depth. However, matrixmultiply does seem to make use of our recent SIMD functionality, and we have a few scattered BLAS, LAPACK, and even cuBLAS bindings (more on BLAS later).

I did not find any notable plotting libraries, but I may have missed them.

Simulations and Games

This is a big topic I don't think I can fully cover. Luckily, I have room to do a bit of a poor job here because Rust's hobbyist game dev space is so big I'm sure most people reading this have heard a lot about it.

I should note that the "framework" I'm most familiar with is specs, having contributed to it a few times myself. It's nice, but pretty bare bones (I haven't had the opportunity to use its more "end user" oriented cousin/importer Amethyst).

The other "big player" is Piston, which I haven't used myself, but have heard there's some friction in setting up and navigating.

Obviously we don't have any sort of "killer app" like Unreal, but I don't think we can at this point. It's simply not reasonable to expect that -- especially not when most of those big engines have backing from game companies with some cash.

The "lower level" portion of the game dev space is pretty nice too: gfx-rs is actually a fairly nice backend-agnostic middleware. It's probably not going to make it easy to push the polygons of something optimized to hell like Unreal, but it's functional and relatively easy to pick up if you have graphics experience.

I'd be remiss if I didn't mention rlua, which is under development by an actual game studio -- Chucklefish. It has some serious warts to keep the wrapper "safe" in a Rust sense. It also doesn't play nice with specs' concurrent dispatch design (and I get the impression even Chucklefish had to do some finagling to get it to work internally). I'm not sure about Lua and Rust long term because of this incompatibility, but Lua is widely used in the industry and the other Rust-focused scripting languages are, for now, even less "there" than rlua.

Unfortunately, where it starts to fall apart is roughly where we start intersecting with the "Numeric and Scientific Computing" section. In particular, ncollide and nphysics suffer heavily from the lack of basic math foundations. Obviously some of this is just a time issue -- Bullet Physics doesn't just fall out overnight. But between lack of foundational elements, lack of maintainers/contributors, and so on, a lot of important libraries that handle things like collision are sparse at the moment.

III. So... why do we care?

Rust actually has a lot going for it in these spheres, more than I think a lot of people realize.

Performance-sensitive numeric computing has for a very long time been dominated by Fortran. I don't want to be overly reductive, but a very large part of this is because Fortran is heavily optimizable for numeric computation due to extremely strong aliasing guarantees. Guess what our borrowing system also provides, by happenstance? C and C++ can, of course, opt into this with annotations such as restrict (standard in C, a compiler extension in C++) or compiler-specific no-alias flags, which optimize the library or program as if there wasn't any aliasing, but Rust is built from the ground up with these guarantees.
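As a small illustration of the aliasing point, here is a sketch of the classic "axpy" kernel (y = a * x + y). The function name and signature are my own choice for the example. Because `x` is a shared borrow and `y` is an exclusive one, the two slices cannot overlap in safe code, which is the same guarantee Fortran gives for its array arguments. How much the optimizer actually exploits this depends on the compiler version, so treat it as the information being available rather than a promised speedup.

```rust
/// y <- a * x + y over two equally sized slices.
///
/// `x: &[f64]` and `y: &mut [f64]` can never refer to overlapping memory in
/// safe Rust, so no runtime overlap check or `restrict`-style annotation is
/// needed to state that fact.
pub fn axpy(a: f64, x: &[f64], y: &mut [f64]) {
    assert_eq!(x.len(), y.len(), "x and y must have the same length");
    for (yi, xi) in y.iter_mut().zip(x.iter()) {
        *yi += a * *xi;
    }
}
```

Trying to call this as `axpy(2.0, &y, &mut y)` is rejected by the borrow checker, which is exactly the case a C compiler has to assume might happen unless told otherwise.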

In the applied scientific computing sphere, and especially in Machine Learning, Python dominates at the moment, with C++ as a somewhat distant second (except when extreme performance is required). I don't think we can totally steal Python's thunder here, at least not for a very long time, but we can probably provide utilities that make Rust a go-to backend and even a full environment for some people. The major difficulty here, of course, is that a lot of companies important in this sector, such as Nvidia, heavily favor C++ and C for their tooling.

However, I'd point out that one sub-area of ML that's really open for the taking is Reinforcement Learning. RL actually occupies a really weird space where it's about half really intensive computation and half (often intensive) simulation. Unlike pure number-crunching things like image processing, Python simply can't farm all the "intensive stuff" out to C or Fortran-land.

Maybe we don't want to cater to these areas, but I'd point out that there are big vacuums here that are in need of landscape updates, and Rust's active development and modern design could lead it to establish itself as a new "de facto" in at least some areas.

Julia has attempted to take some of this sphere, and keeps a lot of the ease-of-use Python has (as opposed to Rust's more prickly design for those used to dynamic typing). However, as of now it hasn't caught on, and we can still be in a decent spot for competition. Notably, this isn't our sole area of focus as a language, so we won't fizzle out if we don't make it.

Also, while modern C++ has a lot of really nice features to keep yourself from footgunning, academic programmers often don't know how to use them properly and Rust can alleviate that. I've worked on more than a few large scale projects with academic coders and the C++ becomes very unsafe and hard to debug very fast. (We also have package management tools)

IV. What can we do?

I want to give some measurable goals to take or leave. These aren't all necessarily 2019 things, or even 2021 things, but things to be working towards in the meantime. I think core Rust members need to help nurture a lot of this because, as I hope I illustrated somewhat in section II, we're moving fairly slowly.

BLAS and Rust

Remember when I said BLAS would come back later? Well, at the moment, the reference BLAS implementation is written in Fortran. I think Rust, especially with the new SIMD capabilities, would be in a good spot to try and write a BLAS, and to develop, in parallel or as a follow-up, something that fulfills the same role but with a more Rust-y design (perhaps for eventual inclusion in the standard library).

Both of these would drive two key areas - compiler optimization for numerics, and type system improvements for the same.

I'm not going to sugar coat it. This will probably be frustrating, and we'll probably be blocked by a lot. Especially LLVM.

I also don't want to undersell that the fast BLAS implementations (such as OpenBLAS) aren't pure Fortran; a decent portion uses hand-tuned assembly. I suppose we have a choice in whether we want to write a pure Rust version to show off the language, or write it in mostly Rust to sell that a Rust-primary version can be truly competitive. I'd leave that up to the team writing it, with a small nudge towards the latter.
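For a sense of what the starting point would look like, here is a deliberately naive, pure-Rust sketch of the BLAS level-3 matrix multiply (C = alpha * A * B + beta * C). The `gemm` signature below is my own simplification for illustration (row-major, no transposes, no leading-dimension arguments), not the real BLAS interface, and it is nowhere near competitive. The blocking, packing, and SIMD kernels that make a real BLAS fast are exactly the work being proposed.

```rust
/// C <- alpha * A * B + beta * C for row-major matrices in flat slices.
/// A is m x k, B is k x n, C is m x n. Simplified, hypothetical signature.
pub fn gemm(
    m: usize,
    n: usize,
    k: usize,
    alpha: f64,
    a: &[f64],
    b: &[f64],
    beta: f64,
    c: &mut [f64],
) {
    assert_eq!(a.len(), m * k, "A must be m x k");
    assert_eq!(b.len(), k * n, "B must be k x n");
    assert_eq!(c.len(), m * n, "C must be m x n");

    for i in 0..m {
        let c_row = &mut c[i * n..(i + 1) * n];
        // Scale the existing row of C by beta first.
        for cij in c_row.iter_mut() {
            *cij *= beta;
        }
        for p in 0..k {
            let aip = alpha * a[i * k + p];
            let b_row = &b[p * n..(p + 1) * n];
            // i-p-j loop order: the inner loop walks a row of B and a row
            // of C contiguously, which is friendlier to the cache and to
            // auto-vectorization than the textbook i-j-p order.
            for (cij, bpj) in c_row.iter_mut().zip(b_row) {
                *cij += aip * *bpj;
            }
        }
    }
}
```

Even in this toy form, `c` being the only mutable borrow means the compiler knows writes to C cannot change A or B.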

Small-D Numerics

By "Small-D" I mean 2D and 3D numerics. These are of particular interest to games, and require some special consideration different from the ND case to be competitive in a performance sense. Things like type-level numerics will help this, but I do think it may be worth trying to promote and nurture a very focused 2D or 3D math library and try to squeeze the performance out. This could, potentially, be merged with the above endeavors with the optimized 2D and 3D code relying on something like specialization, but I don't have strong feelings on the exact methods here.
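As a rough picture of what "very focused" means here, this is a minimal sketch of a fixed-size 3D vector type. `Vec3` and its methods are hypothetical names for the example, not the API of cgmath or nalgebra. Everything is a plain `Copy` struct of three `f32`s with no generic dimension, so there is nothing for the compiler to see through before it can keep values in registers.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical fixed-size 3D vector; small enough to pass by value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl Vec3 {
    pub const fn new(x: f32, y: f32, z: f32) -> Self {
        Vec3 { x, y, z }
    }

    /// Dot (inner) product.
    pub fn dot(self, o: Vec3) -> f32 {
        self.x * o.x + self.y * o.y + self.z * o.z
    }

    /// Right-handed cross product.
    pub fn cross(self, o: Vec3) -> Vec3 {
        Vec3::new(
            self.y * o.z - self.z * o.y,
            self.z * o.x - self.x * o.z,
            self.x * o.y - self.y * o.x,
        )
    }
}

impl Add for Vec3 {
    type Output = Vec3;
    fn add(self, o: Vec3) -> Vec3 {
        Vec3::new(self.x + o.x, self.y + o.y, self.z + o.z)
    }
}

impl Sub for Vec3 {
    type Output = Vec3;
    fn sub(self, o: Vec3) -> Vec3 {
        Vec3::new(self.x - o.x, self.y - o.y, self.z - o.z)
    }
}

/// Scaling by a scalar on the right: `v * 2.0`.
impl Mul<f32> for Vec3 {
    type Output = Vec3;
    fn mul(self, s: f32) -> Vec3 {
        Vec3::new(self.x * s, self.y * s, self.z * s)
    }
}
```

A real library would go further (explicit SIMD layouts, matrices, quaternions), but the design choice is the same: hand-written code for dimensions 2 and 3 rather than the general ND path.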

Graphics, CUDA, Spir-V

This one is a bit fuzzier. However, the semi-recent release of Spir-V places Rust in an interesting position. We've already had a good deal of push in the Rust-wasm sphere.

Spir-V is an intermediate language for graphics cards that can be used for shader or compute modules. Rust could make an interesting source language for this, making development easier for things that require GPU use such as games or tensor calculations. We do have module processing functionality in rspirv, but I'm not sure how mature it is. In addition, LLVM has a Spir-V target, which can make the transition even easier for us. Our area would largely be the tooling around it.

In particular, the ability to use attributes to designate modules or functions as "to become Spir-V" in some way would be helpful, to keep things in one codebase, but I'm not sure how that would work in practice or how easy that is with the compiler toolchain we have.

I will note that the tensor/compute angle is particularly difficult. CUDA performs extraordinarily well and that's in part because Nvidia owns and develops it. It's hard to get that level of performance without first party support. It may be worth seeing, however unlikely, if Rust could, some day, become a first class language to Nvidia. The nvcc compiler is a derivative of LLVM, but who knows how easy it would be to integrate. This is somewhat of a pipe dream, but it would put us in a very good spot if it could happen. It would really boost our usability in the ML and numerics sphere, particularly with good tooling and language support for these features.

Simulations and Games

I have to admit, I don't have as strong of a "plan of action" here. I think getting numerics sorted out would be a big deal on its own. I'm hesitant for the nursery to "sponsor" most projects in this sphere, but I think after some sore spots like numerics are sorted out we may get clear contenders and high quality projects that are worth nudging along.

V. Conclusions

I think Rust is well poised to move into some of these areas given the current state of both it as a language and the new developments happening in those fields at a technology level. This does not mean giving up our current avenues such as the documentation push, or the web and CLI push. I merely propose these as potential areas of interest for a strong, performant Rust that fills some niches that need filling.

Most of these goals likely won't be industry standards by 2021, but they won't be industry standards by year 20XX+3 either, whichever year 20XX we choose to start working on them.
