@nrc
Last active August 2, 2023 16:40
Rust tooling

Rust developer tools - status and strategy

Availability and quality of developer tools are an important factor in the success of a programming language. C/C++ has remained dominant in the systems space in part because of the huge number of tools tailored to these languages. Successful modern languages have had excellent tool support (Java in particular, Scala, Javascript, etc.). Finally, LLVM has been successful in part because it is much easier to extend than GCC. So far, Rust has done pretty well with developer tools: we have a compiler which produces good quality code in reasonable time, and good support for debug symbols which lets us leverage C++/language-agnostic tools such as debuggers, profilers, etc. There are also syntax highlighting, cross-reference, code completion, and documentation tools.

In this document I want to lay out what Rust tools exist and where to find them, highlight opportunities for tool development in the short and long term, and start a discussion about where to focus our time and energy to have maximum impact. Note that much of this document concerns long term development and most goals will be low priority until post-1.0. Some of the issues here are not pure tooling issues (the compiler (!), syntax extensions, ...) and hopefully discussion will outgrow this document pretty quickly.

Existing tools

(Please help expand this list!)

Tools it would be nice to have

  • Refactoring tools - basic things such as rename method/field/variable and more complex things like inline/outline function (note that this is slightly more complicated in Rust than other languages because Rust is not compositional with respect to outlining).

  • RustFix - a tool for translating code from an old version of Rust to a new version of Rust. Cf. GoFix. I think this is just a refactoring tool which can generate code that is invalid in the version it starts from (i.e., code that is only valid in a later version of Rust). The refactorings it can do are more language-level though.

  • A REPL - Read, eval, print loop. Useful for introducing new users to Rust, tutorials, experimentation with small programs.

    • related - an embeddable JIT
  • Deadlock detection.

  • Instrumentation of tasks to give a picture of how concurrent tasks are communicating.

  • Style checking (as opposed to style fixing (RustFmt)). Lints and make tidy do some of this. See, for example, FxCop.

  • Code coverage

  • XCode fix-its - I couldn't find much information about how these work or even if it is possible to make them work for another language. Could be trivial or impossible or anywhere in between.

  • Implementation specific debugging aids - e.g., what is the vtable layout of this object? What is the layout of this struct?

  • ccache

    This is part of a crazy plan to simplify rustc's build system - if no-op compiles can be extremely quick, effectively zero-cost, then we might never need make files. We would not care about dependencies and could just have a script which compiles everything. We would require accurate dependency information and stable/deterministic builds. It might need to be built into the compiler. I expect if we have proper incremental compilation this could use the same infrastructure but be at the scale of crates rather than parts of a crate.

  • distcc
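
To make the ccache idea above a little more concrete, here is a minimal sketch of the no-op check, assuming a crate is fingerprinted by hashing its compiler flags and source contents. The names `fingerprint`, `BuildCache`, and `needs_rebuild` are hypothetical and are not part of rustc or ccache; a real tool would also need the accurate dependency information mentioned above.

```rust
use std::collections::HashMap;

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a step over some bytes; gives the same result on every run.
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    let mut h = state;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hypothetical fingerprint of one crate: flags plus every source file
/// (path and contents), visited in sorted order so the result does not
/// depend on the order in which files were discovered.
fn fingerprint(flags: &[&str], sources: &[(&str, &str)]) -> u64 {
    let mut sorted: Vec<&(&str, &str)> = sources.iter().collect();
    sorted.sort();
    let mut h = FNV_OFFSET;
    for flag in flags {
        h = fnv1a(h, flag.as_bytes());
        h = fnv1a(h, &[0]); // separator between fields
    }
    for (path, contents) in sorted {
        h = fnv1a(h, path.as_bytes());
        h = fnv1a(h, &[0]);
        h = fnv1a(h, contents.as_bytes());
        h = fnv1a(h, &[0]);
    }
    h
}

/// Remembers the last fingerprint seen for each crate.
struct BuildCache {
    seen: HashMap<String, u64>,
}

impl BuildCache {
    fn new() -> Self {
        BuildCache { seen: HashMap::new() }
    }

    /// Returns true if the crate must be recompiled, and records the new
    /// fingerprint either way.
    fn needs_rebuild(&mut self, krate: &str, fp: u64) -> bool {
        self.seen.insert(krate.to_string(), fp) != Some(fp)
    }
}

fn main() {
    let mut cache = BuildCache::new();
    let fp = fingerprint(&["-O"], &[("lib.rs", "fn f() {}")]);
    println!("first build needed: {}", cache.needs_rebuild("demo", fp));
    println!("no-op build needed: {}", cache.needs_rebuild("demo", fp));
}
```

The point of the sketch is only that a no-op compile reduces to one hash comparison per crate, which is why deterministic inputs matter so much.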

And some more research-ey ideas:

  • Lifetime visualisation/explanation

  • Rust-specific memory profiling (this might be a silly idea, but I wonder if we can use the static lifetime info from Rust with dynamic memory profiling to give useful information to the programmer)

  • Static/dynamic analysis for unsafe blocks

Some general issues

It's worth thinking about how Rust can get an awesome tools eco-system. I think tooling is a great way for people to get involved with Rust - projects tend to be small-ish and immediately useful. Since there are lots of UI issues around tooling, it is often useful to have several different versions of a tool, rather than a canonical implementation.

I believe the best way for the core Rust community to foster tool development (as well as working with/helping people who want to work on tools) is to provide useful and comprehensible APIs to the compiler (that is, APIs which can be used without understanding the details of the compiler) and to provide re-usable abstractions which can be leveraged by many tools (debuginfo is a great example of this, although it is kind of unique due to the level of support from existing tools).

As for why tooling requires special consideration rather than just plain reuse of compiler (or extension/plugin) APIs: compiling is basically a one-way street from source code to machine code. The only time we really need to go backwards is when generating error messages. Tooling often wants to do a round trip, e.g., start at source code, compile to annotated machine code, and then get back from the machine code annotations to the source code (debugging), or go from source code to type checking and then back to source code (cross-referencing). These backwards steps mean that tools often need different APIs, e.g., debuginfo, or many more spans than are required for errors. It also means we need more flexibility - different tools need radically different information.
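
As a rough illustration of the backwards step, the sketch below maps a byte-range span back to a line/column position and to the source text it covers. `Span` and `LineIndex` are hypothetical simplifications written for this example; they are not the types libsyntax actually uses.

```rust
/// A half-open byte range into a source file (hypothetical, simplified).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    lo: usize,
    hi: usize,
}

/// Byte offsets at which each line starts, so that offsets can be mapped
/// back to human-readable positions.
struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    fn new(src: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in src.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// Maps a byte offset to a 1-based (line, column) pair.
    /// Columns count bytes, not characters.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        // Find the last line start that is <= offset.
        let line = match self.line_starts.binary_search(&offset) {
            Ok(i) => i,
            Err(i) => i - 1,
        };
        (line + 1, offset - self.line_starts[line] + 1)
    }
}

/// Recovers the source text a span refers to.
fn snippet(src: &str, span: Span) -> &str {
    &src[span.lo..span.hi]
}

fn main() {
    let src = "fn main() {\n    let x = 1;\n}\n";
    let lo = src.find("x").unwrap();
    let span = Span { lo, hi: lo + 1 };
    let index = LineIndex::new(src);
    let (line, col) = index.line_col(span.lo);
    println!("{:?} is `{}` at {}:{}", span, snippet(src, span), line, col);
}
```

A cross-referencing tool would need a span like this on every identifier, not just on the nodes that can produce errors.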

Other high level questions:

  • where to focus to have maximum impact?
  • what is most important?
  • where is the momentum?

Open questions

What exactly do tools need from the compiler (information and interfaces)?

Brain dump:

  • macro support (we've got a good start with span stacks, I hope we can do better in order to make tools which work well with macros)
  • spans
  • stringified idents/names, etc.
  • type info (and other info, e.g., cfg, metadata, dependencies)
    • APIs for this (both methods provided by the compiler, and dumps of this info as JSON or other data)
    • how to best organise and present this information? Currently tools have to do a lot of data juggling to make this information useful, can we move some of that to the compiler or to some low level tool?
  • identifiers, hashes for various items in the language
  • deterministic/stable builds
    • mostly this is about symbol names. We use hashes in those which (I think) are non-deterministic in some ways. That means we don't get stable builds.
    • furthermore, I would like to be able to change the internals of one function without changing the symbol names for other items (does this have a name?). Currently, those hashes depend indirectly on node ids, which change very easily. It would be nice to not use node ids, etc. in these hashes so we have more stability in our builds.
    • there are other issues here which I have forgotten (FIXME)
  • printing, pretty printing (this is provided in several different ways by the compiler, we should think about unifying some of these methods and perhaps moving full pretty printing (of programs, rather than snippets) into a separate tool and instead provide more fundamental information for printing. The trouble is that the compiler often uses some of this functionality for messages, etc. so it is not as simple as just ripping out a bunch of code)
  • identification/search
  • flexibility - e.g., incremental/partial compilation
    • especially for tools like IDEs
    • useful for: realtime error reporting, code completion, navigation/cross-reference, etc.
    • need fine grained incrementality - be able to re-compile a single item without parsing, resolving, etc. the rest of the file.
    • error tolerance - be able to keep compiling despite errors; especially in the parser
    • codemap in libsyntax is not a suitable abstraction for incremental/long running compilation
    • be able to output messages (errors, warnings, etc) in different ways (e.g., for an IDE or warnings for DXR)
  • if the compiler is going to be long running (as part of an IDE), then we must be more careful about memory management, making sure everything is freed properly, etc.
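
The point about deterministic builds in the list above can be sketched as follows: derive the hash in a symbol name only from things that are stable under unrelated edits (crate name, item path, signature text) and never from node ids. Everything here, including `ItemKey` and `symbol_name`, is a hypothetical illustration and not rustc's actual mangling scheme.

```rust
/// FNV-1a, written out by hand so the result is identical on every run.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical description of an item. Note what is absent: no node id
/// and no function body, so neither can influence the symbol name.
struct ItemKey<'a> {
    krate: &'a str,
    path: &'a [&'a str],
    signature: &'a str,
}

/// Builds a readable prefix from the path plus a hash of the stable parts.
fn symbol_name(item: &ItemKey<'_>) -> String {
    let mut buf = Vec::new();
    buf.extend_from_slice(item.krate.as_bytes());
    for segment in item.path {
        buf.push(0); // separator between fields
        buf.extend_from_slice(segment.as_bytes());
    }
    buf.push(0);
    buf.extend_from_slice(item.signature.as_bytes());
    format!("{}_{:016x}", item.path.join("_"), fnv1a(&buf))
}

fn main() {
    let item = ItemKey {
        krate: "demo",
        path: &["shapes", "area"],
        signature: "fn(f64) -> f64",
    };
    println!("{}", symbol_name(&item));
}
```

With this shape, editing the body of `shapes::area`, or adding an unrelated item elsewhere in the crate, leaves every existing symbol name unchanged.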

How can the compiler be reused/extended? And how do we make this straightforward? By which I mean, how do we make the compiler most usable as a library and easiest to customise using plugins?

We should make any exposed APIs as separate from the compiler internals as possible. That gives us the most flexibility in refactoring the compiler later. Any exposed APIs should be guarded with the most flexible stability attributes.

The compiler

I also want to think about what we want to do with the compiler. This could be a separate topic, but seeing as the compiler is the most important developer tool, I thought I'd chuck my thoughts in here.

What are our goals for the compiler? Here are my ideas, in very rough order of priority:

  • complete and correct - i.e., a faithful implementation of the Rust language
  • emit high quality code
  • flexible both in how it operates and in the different flavours of code it can produce:
    • different levels of optimisation
    • various debugging outputs
    • incremental compilation
    • oneliner execution - i.e., compile and execute code snippets in any context, for example when debugging and paused at a break point. Even, one day, edit and continue (be able to patch code into executable whilst debugging)
  • fast
    • the faster we compile code, the better - waiting for the compiler sucks
    • we need automation here to ensure improvement and prevent regressions
    • possible idea - have a dedicated timing server: pull rust, build with time_passes, 3x, take average, make a graph of the results, if there is a regression, email anyone who merged a patch since the previous pull, repeat. Shouldn't matter about llvm because we only look at the time_passes results.
  • extensible
  • engineering quality
    • easy to improve/extend
    • less susceptible to bugs
    • well documented
  • modularity (could be part of extensibility and engineering quality)
    • librustc is currently monolithic
  • useful for tooling
  • memory efficient
    • the less memory we use during compilation, the better
  • backend agnostic
    • it would be nice to be able to swap out the LLVM backend for something else
  • be an exemplar of well-written Rust (it is pretty much the opposite right now)

Once we get to 1.0, or shortly before, we should think about how well we meet these goals and how we can improve things. Only the first has really been high priority up till now.

Expanding on a couple of those goals:

Engineering quality

By improving the quality of the software, we make the compiler easier to improve and extend and less susceptible to bugs. This should save developer time and attract more contributors.

I have some ideas for large scale change below. On a smaller scale, the compiler could benefit from auditing of older code for refactoring opportunities, adhering to modern style conventions (this will be much easier if we have refactoring tools), using more idiomatic Rust patterns and modern language features, better documentation, removing obsolete or under-utilised code, and use of clearer abstractions.

Useful for tooling

Tools will use the compiler in two ways - as a library and as a framework (current examples: Rustdoc uses the compiler as a library, DXR uses it as a framework). In both cases, the compiler is more useful if it has a high quality and stable API.

For better use as a library, we should aim to stabilise some parts of the compiler as an API, generally the highest level functionality. In order to preserve flexibility in our implementation, we should probably add an extra API layer, rather than exposing the internals of the compiler. However, to some extent, we will have to commit to exposing and stabilising some data structures. For use as a framework, we need to identify parts of the compiler which can be used as hooks, both on a small and large scale (e.g., a callback when visiting a node of the AST during an existing pass, vs adding a pass).
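
The two scales of hook described above can be sketched on a toy AST: a per-node callback invoked during an existing traversal, and a whole pass added to a pipeline. All of the types and traits below (`Expr`, `NodeCallback`, `Pass`, `Pipeline`) are hypothetical and far simpler than anything the compiler would expose.

```rust
/// A toy AST standing in for the compiler's real one.
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Small-scale hook: called for every node during an existing walk.
trait NodeCallback {
    fn visit(&mut self, expr: &Expr);
}

/// The existing pass. It owns the traversal and calls into the tool.
fn walk(expr: &Expr, cb: &mut dyn NodeCallback) {
    cb.visit(expr);
    if let Expr::Add(lhs, rhs) = expr {
        walk(lhs, cb);
        walk(rhs, cb);
    }
}

/// Large-scale hook: a whole extra pass that produces a report line.
trait Pass {
    fn run(&mut self, ast: &Expr) -> String;
}

/// The driver, which runs every registered pass in order.
struct Pipeline {
    passes: Vec<Box<dyn Pass>>,
}

impl Pipeline {
    fn run(&mut self, ast: &Expr) -> Vec<String> {
        self.passes.iter_mut().map(|pass| pass.run(ast)).collect()
    }
}

/// A tool-supplied callback that sums every literal it is shown.
struct LitSummer {
    sum: i64,
}

impl NodeCallback for LitSummer {
    fn visit(&mut self, expr: &Expr) {
        if let Expr::Lit(n) = expr {
            self.sum += n;
        }
    }
}

/// A tool-supplied pass built on top of the callback.
struct SumPass;

impl Pass for SumPass {
    fn run(&mut self, ast: &Expr) -> String {
        let mut cb = LitSummer { sum: 0 };
        walk(ast, &mut cb);
        format!("sum of literals = {}", cb.sum)
    }
}

fn main() {
    let ast = Expr::Add(Box::new(Expr::Lit(1)), Box::new(Expr::Lit(2)));
    let mut pipeline = Pipeline { passes: vec![Box::new(SumPass)] };
    for line in pipeline.run(&ast) {
        println!("{}", line);
    }
}
```

In both cases the tool only sees the trait and the AST type, which is the part that would have to be stabilised.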

I would like this high level API to include as much information as we can from the compiler, such as debuginfo, intermediate information from type checking, borrow checking, etc., metadata (even where the compiler would not normally generate it), and so forth.

For syntax extensions, I would like to separate the AST generated and used by libsyntax and the AST used by rustc. The former would be very close to the source code and exposed as part of the API to syntax extensions. The latter would not be exposed and would be a more transformed version of the AST. We would commit to only changing the libsyntax AST according to the semver rules, the rustc AST could change however we like. The first stage of rustc would convert the libsyntax AST to its internal AST.
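
A minimal sketch of that split, assuming a surface AST that keeps source-level detail (here, parentheses and operator characters) and an internal AST that does not. `SurfaceExpr`, `CoreExpr`, and `lower` are hypothetical names used only for illustration.

```rust
/// Surface AST: close to the source, the part that would be exposed to
/// syntax extensions and changed only according to semver.
#[derive(Debug, Clone, PartialEq)]
enum SurfaceExpr {
    Lit(i64),
    Var(String),
    /// Parentheses are kept so the source can be reproduced faithfully.
    Paren(Box<SurfaceExpr>),
    Binary(Box<SurfaceExpr>, char, Box<SurfaceExpr>),
}

/// Internal AST: not exposed, free to change between releases.
#[derive(Debug, Clone, PartialEq)]
enum CoreExpr {
    Lit(i64),
    Var(String),
    /// Operators are represented as calls to named operations.
    Call(&'static str, Vec<CoreExpr>),
}

/// The first stage of the compiler: convert surface AST to internal AST.
fn lower(expr: &SurfaceExpr) -> CoreExpr {
    match expr {
        SurfaceExpr::Lit(n) => CoreExpr::Lit(*n),
        SurfaceExpr::Var(name) => CoreExpr::Var(name.clone()),
        // Parentheses carry no meaning once the tree shape is fixed.
        SurfaceExpr::Paren(inner) => lower(inner),
        SurfaceExpr::Binary(lhs, op, rhs) => {
            let name = match *op {
                '+' => "add",
                '*' => "mul",
                _ => "unknown_op",
            };
            CoreExpr::Call(name, vec![lower(lhs), lower(rhs)])
        }
    }
}

fn main() {
    // Surface form of `(1 + x)`.
    let surface = SurfaceExpr::Paren(Box::new(SurfaceExpr::Binary(
        Box::new(SurfaceExpr::Lit(1)),
        '+',
        Box::new(SurfaceExpr::Var("x".to_string())),
    )));
    println!("{:?}", lower(&surface));
}
```

Because only `lower` knows about both types, the internal representation can be reworked without touching anything a syntax extension depends on.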

I would also like to make available more high level information about a program being compiled which is currently implicit in the compiler. For example, finding all implementations of a trait or uses of a variable. This could be computed by an external tool, but this kind of information is likely to be widely used and we would be helping tool authors by making it available. I expect a blessed compiler plugin which could be used by other plugins is the best solution.
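
One possible shape for such a shared index is sketched below, assuming the plugin is fed simple facts as the compiler discovers them. `Fact` and `ProgramIndex` are hypothetical; uses are recorded by line number only to keep the example short.

```rust
use std::collections::BTreeMap;

/// Facts a compiler plugin might record while checking a program
/// (hypothetical and heavily simplified).
enum Fact<'a> {
    Impl { trait_name: &'a str, for_type: &'a str },
    Use { variable: &'a str, line: u32 },
}

/// The queryable index that other plugins and tools would share.
#[derive(Default)]
struct ProgramIndex {
    impls: BTreeMap<String, Vec<String>>,
    uses: BTreeMap<String, Vec<u32>>,
}

impl ProgramIndex {
    fn record(&mut self, fact: &Fact<'_>) {
        match fact {
            Fact::Impl { trait_name, for_type } => self
                .impls
                .entry(trait_name.to_string())
                .or_default()
                .push(for_type.to_string()),
            Fact::Use { variable, line } => self
                .uses
                .entry(variable.to_string())
                .or_default()
                .push(*line),
        }
    }

    /// All types recorded as implementing the given trait.
    fn implementors(&self, trait_name: &str) -> &[String] {
        self.impls.get(trait_name).map(|v| v.as_slice()).unwrap_or(&[])
    }

    /// All lines on which the given variable was recorded as used.
    fn uses_of(&self, variable: &str) -> &[u32] {
        self.uses.get(variable).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

fn main() {
    let mut index = ProgramIndex::default();
    index.record(&Fact::Impl { trait_name: "Shape", for_type: "Circle" });
    index.record(&Fact::Use { variable: "radius", line: 3 });
    println!("{:?}", index.implementors("Shape"));
    println!("{:?}", index.uses_of("radius"));
}
```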

Again, good documentation is really important here - the compiler is only a useful component for tool authors if the exposed APIs are well documented and there are good guides to using the APIs.

If we do this right, I hope to see the compiler used as a library in sophisticated tools such as IDEs - incremental compilation, type information, warnings (including lints), macro expansion, code search, etc. all available as APIs.

Plans for the compiler

General things

If we're going to get serious about compiler speed (and I think we should, as well as getting proper incremental compilation, etc.), we need better infrastructure to prevent regressions there. Currently, we really only have isrustfastyet. A possible idea is to have a dedicated timing server (not a VM) which will pull rust, build three times with time_passes, find the average for each pass, make a graph of the results, and, if there is a regression, email anyone who merged a patch since the previous pull; then repeat.
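
The averaging and comparison steps of that loop could look roughly like this. It is a sketch under stated assumptions: timings are already available as pass name to seconds (parsing the time_passes output, graphing, and sending email are left out), and the `tolerance` threshold is an invented parameter, not something the plan above specifies.

```rust
use std::collections::BTreeMap;

/// One build's timings: pass name -> seconds.
type Timings = BTreeMap<String, f64>;

/// Convenience constructor for the example data.
fn timings(pairs: &[(&str, f64)]) -> Timings {
    pairs.iter().map(|(pass, secs)| (pass.to_string(), *secs)).collect()
}

/// Averages each pass over several runs of the same revision.
fn average(runs: &[Timings]) -> Timings {
    let mut sums: BTreeMap<String, (f64, u32)> = BTreeMap::new();
    for run in runs {
        for (pass, secs) in run {
            let entry = sums.entry(pass.clone()).or_insert((0.0, 0));
            entry.0 += secs;
            entry.1 += 1;
        }
    }
    sums.into_iter()
        .map(|(pass, (total, n))| (pass, total / f64::from(n)))
        .collect()
}

/// Passes whose averaged time grew by more than `tolerance`
/// (0.10 means 10%) relative to the previous pull.
fn regressions(baseline: &Timings, current: &Timings, tolerance: f64) -> Vec<String> {
    current
        .iter()
        .filter(|(pass, secs)| match baseline.get(*pass) {
            Some(old) => **secs > old * (1.0 + tolerance),
            None => false, // a new pass has nothing to regress against
        })
        .map(|(pass, _)| pass.clone())
        .collect()
}

fn main() {
    let baseline = timings(&[("typeck", 2.0), ("trans", 10.0)]);
    let runs = [
        timings(&[("typeck", 2.0), ("trans", 10.0)]),
        timings(&[("typeck", 4.0), ("trans", 10.0)]),
        timings(&[("typeck", 3.0), ("trans", 10.0)]),
    ];
    let current = average(&runs);
    println!("regressed passes: {:?}", regressions(&baseline, &current, 0.10));
}
```

Comparing per-pass averages rather than total build time is what lets the check ignore time spent in LLVM.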

@mdornseif

rustfix seems like an interesting idea. With the sloooow Python 2 -> Python 3 transition, for me https://github.com/asottile/pyupgrade https://python-modernize.readthedocs.io/en/latest/fixers.html and https://docs.python.org/3/library/2to3.html#fixers became indispensable in preparing to move to Python 3 (while still being Python 2.7 compatible).

In the Javascript world I find recast / jscodeshift / https://github.com/sejoker/awesome-jscodeshift very interesting but have not done much with it.
