MultisampledNight/Responsive Compilers.md

## Responsive Compilers.md

      
    Raw
  

              Responsive Compilers.md
            
          
    Codified https://www.youtube.com/watch?v=N6b44kMS6OM from 2019
Traditionally

A compiler has a pipeline of passes:

==lexing== the ==source== to get ==tokens==
==parsing== the ==tokens== to get the ==AST==
semantically analyzing the ==AST==, type checking the ==AST==
applying optimizations
...

$\implies$ was how rustc was written (and still kinda is)
The past vs. today


Previously: Compiler runs and we see output
Today: Much more dynamically and integrated into the IDE

LSP made that possible "easily"


$\implies$ compiler as an actor required rather than a batch mode processor
$\implies$ need to respond as quickly as possible
Why would one care about IDEs?


Code needs to be written somewhere, it won't magically come out of the void
Dependencies matter, it might help language design

For example, in Rust it's possible to have a struct definition and impls inside a function, which are invisible to the outside
Which also means that impls of outer structs are possible
And those impls propagate to the outside


$\implies$ All function bodies might contain impls for outer types


Strictly separating phases can never work anyway

See const expressions for array lengths, for example

Which involve type checks and eval again


In the case of the JVM, lazy class file loading
Macros
Inferred types across function boundaries
Specialization

Some traits need to be solved, some not


Error handling


Errors should not just be throw new TypeError()
They should also not be if (error) { return semi_fake_but_legal_value }
Instead: Sentinel value that signifies "bad user code is here"

If this sentinel value is seen, its error has already been reported
No further reporting necessary, just propagate the sentinel up the chain


There is no "fallible" compiler operation

It always succeeds
But it might produce an "error" value
E.g. type enums or the like also need to have an Error variant


Actual error reporting is then done through side channels

Cycles

Optimizations: Inlining


MIR

First compiled to unoptimized version
Walk over all call sites
Get the optimized version of every function
Inline it


Works fine as long as not panicking


$\implies$ Instead of naively trying to get the optimized MIR, just ignore it if it's the same


Non-deterministic results


Two functions that call each other, foo and bar

Might error in one case while trying to inline
Might error in the other


How the heck does one solve it?

"Master" query that resolves the whole graph
Which outputs the list of [[Clique|strongly connected components]] to process


Other cycle cases


Name resolution
Trait resolution

Tracking of location information


"Spans"
Editing a comment does affect something, namely where errors need to be displayed
Different approaches

Concatenating all input files into one string, indexed by 32-bit indices

Compact but not nice for incremental compilation


Solve offsets from previous siblings and length in the AST node. Track only the AST node ID when referenced, and only recompute the starting offset when needed


Contents of the AST


Also whitespace and comments
Compiler should just ignore it
Useful for formatting
If not provided, can be worked around if spans and input files are present (which is what rustfmt does)

Incremental re-parsing


ASTs don't need to be parsed fully
If a compiler receives a diff of what's changed, it can reparse only those

Example: Swift ==red== and ==black== tree layers


Not related to [[red-black tree]]
==Black== tree layer

Keeps all the information and details

Doesn't have lengths of AST points, so is only relative to the current point

$\implies$ Can be reused


==Red== tree layer

Works ontop of the ==black== tree layer to actually form the whole program


Conclusions


Start with an on-demand style
Best practices are still unsure and need to be codified

How to represent ASTs?
How to represent location information?
How to handle cycles?