@nagisa
Last active March 8, 2020 03:09

Notes are denoted as quoteblocks, as such

The top level heading category should be named something like "Backend"

Backend

This is where we generate code. It is made of a couple of stages:

Monomorphisation collector

@mw is a good person to ask about this, I think

Based on function calls and other uses, the collector figures out which items the subsequent steps will act on. Given a function like:

fn banana() { 
    peach::<u64>();
}

the monomorphisation collector will give you the list [banana, peach::<u64>]; these are the functions that will have machine code generated for them. The collector will also add things like statics to that list. Not sure about external function declarations, need to look at the code.

https://github.com/rust-lang/rust/blob/master/src/librustc_mir/monomorphize/collector.rs
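The idea of "follow the uses until nothing new turns up" can be sketched as a plain worklist traversal. This is only a toy model, not the code in collector.rs: items are strings, the `Uses` map stands in for whatever the real collector reads out of MIR, and `collect` and `banana_example` are names made up for this illustration.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Toy "use graph": each already-instantiated item maps to the items it uses.
/// Hypothetical stand-in; the real collector discovers uses by walking MIR.
type Uses = BTreeMap<&'static str, Vec<&'static str>>;

/// Worklist traversal: start from the roots and keep following uses
/// until no new items turn up. Items are returned in discovery order.
fn collect(roots: &[&'static str], uses: &Uses) -> Vec<&'static str> {
    let mut seen = BTreeSet::new();
    let mut queue: VecDeque<&'static str> = roots.iter().copied().collect();
    let mut out = Vec::new();
    while let Some(item) = queue.pop_front() {
        if !seen.insert(item) {
            continue; // already collected
        }
        out.push(item);
        if let Some(used) = uses.get(item) {
            queue.extend(used.iter().copied());
        }
    }
    out
}

/// The example from the text: `banana` calls `peach::<u64>`.
fn banana_example() -> Vec<&'static str> {
    let mut uses = Uses::new();
    uses.insert("banana", vec!["peach::<u64>"]);
    uses.insert("peach::<u64>", vec![]);
    collect(&["banana"], &uses)
}
```

Calling `banana_example()` yields the same two-element list as in the text. Statics would simply be further entries in the same map.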

Lowering MIR to code generator IR (usually LLVM)

Might make sense to split this into multiple subchapters by topic, so that it is easy to find things.

e.g.

Codegen-time analysis (if there’s enough text to make it worth splitting from the rest of the discussion about MIR code)

Constants

MIR code

Intrinsics

Debug Info

Glue code generation (i.e. implicit code like what is generated for drops of trait objects)

...

This stage acts on the list of symbols obtained from the collector and generates LLVM IR from generic MIR. The actual monomorphisation is performed as we go, during the translation.

At a very high level the entry point is:
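To make "monomorphised as we go" concrete, here is a minimal sketch under heavy assumptions: `Ty`, `GenericBody`, `monomorphize` and `lower_locals` are all hypothetical, and the output strings are merely LLVM-like type names. The point it illustrates is that the generic body is never copied or rewritten; concrete types are substituted at the moment each piece is lowered.

```rust
/// A toy type: either concrete or a reference to the N-th generic parameter.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    U64,
    Bool,
    Param(usize),
}

/// A toy "generic MIR" body: just the types of its locals.
struct GenericBody {
    local_tys: Vec<Ty>,
}

/// Replace a generic parameter with the concrete type for this instance.
/// Assumes `substs` contains only concrete types.
fn monomorphize(ty: &Ty, substs: &[Ty]) -> Ty {
    match ty {
        Ty::Param(i) => substs[*i].clone(),
        other => other.clone(),
    }
}

/// "Lower" the locals for one instance. Substitution happens on the fly,
/// right where the lowered type is needed.
fn lower_locals(body: &GenericBody, substs: &[Ty]) -> Vec<String> {
    body.local_tys
        .iter()
        .map(|ty| match monomorphize(ty, substs) {
            Ty::U64 => "i64".to_string(),
            Ty::Bool => "i1".to_string(),
            Ty::Param(_) => unreachable!("all parameters were substituted"),
        })
        .collect()
}
```

For a body with locals `[Param(0), Bool]` and the substitution `[U64]`, this produces `["i64", "i1"]`. The same `GenericBody` can be lowered again with a different substitution.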

https://github.com/rust-lang/rust/blob/4007d4ef26eab44bdabc2b7574d032152264d3ad/src/librustc_codegen_ssa/base.rs#L496

Which eventually, for functions, reaches the code in:

https://github.com/rust-lang/rust/tree/master/src/librustc_codegen_ssa/mir

with entry point here:

https://github.com/rust-lang/rust/blob/4007d4ef26eab44bdabc2b7574d032152264d3ad/src/librustc_codegen_ssa/mir/mod.rs#L122

The code is split into modules that handle particular MIR primitives. librustc_codegen_ssa::mir::block deals with translating blocks and their terminators. The most complicated, and also the most interesting, thing this module does is generate code for function calls, including the necessary unwinding-handling IR. Similarly, librustc_codegen_ssa::mir::{statement, operand, place, rvalue} deal with translating mir::{Statement, Operand, PlaceRef, Rvalue}. This code is fairly straightforward, so it is hard to write something useful about these modules.

Before a function is translated, a number of simple and primitive analysis passes run to help us generate simpler and more efficient LLVM IR. An example of such an analysis pass is figuring out which variables are SSA-like, so that we can translate them to SSA directly rather than relying on LLVM's mem2reg for those variables. The analyses can be found at https://github.com/rust-lang/rust/blob/master/src/librustc_codegen_ssa/mir/analyze.rs
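A rough sketch of what an "is this local SSA-like?" analysis could look like is shown below. The criteria used here (assigned exactly once, never borrowed) are a simplification chosen for illustration and are not claimed to match analyze.rs; `Stmt` and `ssa_like_locals` are hypothetical names.

```rust
/// Toy statements over numbered locals.
enum Stmt {
    /// `dest = <some value>`
    Assign { dest: usize },
    /// `dest = &src`; taking the address forces `src` to live in memory.
    Borrow { dest: usize, src: usize },
}

/// Returns, per local, whether it can be treated as an SSA value under the
/// simplified rule: assigned exactly once and never borrowed.
fn ssa_like_locals(num_locals: usize, body: &[Stmt]) -> Vec<bool> {
    let mut assignments = vec![0usize; num_locals];
    let mut borrowed = vec![false; num_locals];
    for stmt in body {
        match *stmt {
            Stmt::Assign { dest } => assignments[dest] += 1,
            Stmt::Borrow { dest, src } => {
                assignments[dest] += 1;
                borrowed[src] = true;
            }
        }
    }
    (0..num_locals)
        .map(|l| assignments[l] == 1 && !borrowed[l])
        .collect()
}
```

Locals for which this returns `false` would get a stack slot, and the rest could be kept as plain values, which is the distinction the paragraph above describes.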

Usually a single MIR basic block maps to a single LLVM basic block, with very few exceptions: intrinsic or function calls and less basic MIR constructs like assert can result in multiple basic blocks. This is a perfect lede into the non-portable, LLVM-specific part of code generation. Intrinsic generation is fairly easy to understand, as it involves very few abstraction levels in between, and can be found at

https://github.com/rust-lang/rust/blob/master/src/librustc_codegen_llvm/intrinsic.rs

Everything else uses the builder interface; this is the code called by the librustc_codegen_ssa::mir::* modules discussed a couple of paragraphs above.

https://github.com/rust-lang/rust/blob/master/src/librustc_codegen_llvm/builder.rs
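The split between backend-independent translation code and a backend-specific builder can be pictured roughly as follows. Everything in this sketch is hypothetical: the `Builder` trait is a drastically reduced stand-in rather than the real interface, and `TextBuilder` emits pseudo-IR text (not valid LLVM IR) instead of calling into LLVM.

```rust
/// Hypothetical, heavily reduced builder interface.
trait Builder {
    type Value: Copy;
    fn const_u64(&mut self, v: u64) -> Self::Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
    fn ret(&mut self, v: Self::Value);
}

/// Backend-independent code: written once, against the trait only.
fn codegen_add_constants<B: Builder>(bx: &mut B, a: u64, b: u64) {
    let lhs = bx.const_u64(a);
    let rhs = bx.const_u64(b);
    let sum = bx.add(lhs, rhs);
    bx.ret(sum);
}

/// A stand-in backend that records pseudo-IR lines as text.
#[derive(Default)]
struct TextBuilder {
    lines: Vec<String>,
    next_tmp: usize,
}

impl TextBuilder {
    /// Hand out the next unused temporary number.
    fn fresh(&mut self) -> usize {
        let id = self.next_tmp;
        self.next_tmp += 1;
        id
    }
}

impl Builder for TextBuilder {
    type Value = usize; // the number of a temporary

    fn const_u64(&mut self, v: u64) -> usize {
        let id = self.fresh();
        self.lines.push(format!("%{} = const i64 {}", id, v));
        id
    }

    fn add(&mut self, lhs: usize, rhs: usize) -> usize {
        let id = self.fresh();
        self.lines.push(format!("%{} = add i64 %{}, %{}", id, lhs, rhs));
        id
    }

    fn ret(&mut self, v: usize) {
        self.lines.push(format!("ret i64 %{}", v));
    }
}
```

Running `codegen_add_constants` with a `TextBuilder` leaves four lines in `lines`: two constants, one add and one return. A different backend would implement the same trait and leave `codegen_add_constants` untouched.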

Another interesting thing to include here is how constants are generated. That works somewhat differently and, sadly, my knowledge here is fairly rusty. It has interactions with miri too.

Running LLVM, linker and metadata generation

Do we want to explore LLVM internals? Might be useful, but also sounds out of scope to me.

Unclear whether it makes sense to separate these into sub-chapters or not. Left them as one so far. Linking is definitely bound to be a long topic though.

Once the LLVM IR for all of the functions, statics, etc. is built, it is time to start running LLVM and its optimisation passes. LLVM code is grouped into modules, of which there can be multiple to aid in multi-core utilisation. These modules are what we refer to as codegen units. These units were established way back during the monomorphisation collection phase.

Once LLVM produces objects from these modules, these objects are passed to the linker along with, optionally, the metadata object, and an archive or an executable is produced.

It is not necessarily the codegen phase described above that runs the optimisations. With certain kinds of LTO, the optimisation might happen at link time instead. It is also possible for some optimisations to happen before objects are passed on to the linker and some to happen during linking.
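As a loose illustration of why splitting the work into several units helps with multi-core utilisation, the sketch below partitions a list of items and processes each unit on its own thread. The round-robin partitioning, the `cguN.o` names, and the functions `partition` and `compile_units` are assumptions made up for this example; the real partitioning is more involved, and nothing is actually compiled here.

```rust
use std::thread;

/// Split items into `n` units round-robin (assumes `n > 0`).
/// Purely illustrative; not how the real partitioning decides.
fn partition(items: &[&'static str], n: usize) -> Vec<Vec<&'static str>> {
    let mut units = vec![Vec::new(); n];
    for (i, item) in items.iter().enumerate() {
        units[i % n].push(*item);
    }
    units
}

/// "Compile" each unit on its own thread and return one made-up object
/// description per unit, in unit order.
fn compile_units(units: Vec<Vec<&'static str>>) -> Vec<String> {
    let handles: Vec<_> = units
        .into_iter()
        .enumerate()
        .map(|(idx, unit)| {
            thread::spawn(move || format!("cgu{}.o ({} items)", idx, unit.len()))
        })
        .collect();
    // Joining in spawn order keeps the output order deterministic.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Each unit is independent of the others once it has been formed, which is what allows the per-unit work to run in parallel before the resulting objects are handed to the linker.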

alex is the best person to write about linking I think, especially about LTO and stuff.

This all happens in the back-back-end. The code for this can be found in the librustc_codegen_*/back directories. Sadly, this code is not really well separated into LLVM-dependent code (which goes into librustc_codegen_llvm) and backend-independent code (which goes into librustc_codegen_ssa); the backend-independent code contains a fair amount of code specific to the LLVM backend.

Once these components are done with their work you end up with a number of files in your filesystem corresponding to the outputs you have requested.

A good idea would be to note down how exactly queries link

@bjorn3

bjorn3 commented Dec 9, 2019

def banana() {

nit: *fn

@nagisa
Author

nagisa commented Dec 9, 2019

Fixed, thanks.
