Codified from https://www.youtube.com/watch?v=N6b44kMS6OM (2019)
A compiler has a pipeline of passes:
- ==lexing== the ==source== to get ==tokens==
- ==parsing== the ==tokens== to get the ==AST==
- semantically analyzing the ==AST==, type checking the ==AST==
- applying optimizations
- ...
- Previously: Compiler runs and we see output
- Today: Compilation happens much more dynamically and is integrated into the IDE
- LSP made that possible "easily"
- Code needs to be written somewhere, it won't magically come out of the void
- Dependencies matter, it might help language design
- For example, in Rust it's possible to have a struct definition and impls inside a function, which are invisible to the outside
- Which also means that impls of outer structs are possible
- And those impls propagate to the outside
- $\implies$ All function bodies might contain impls for outer types
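This is real, compilable Rust: the `impl` below lives inside a function body, yet the method it defines on the outer type is visible everywhere, so the compiler cannot skip function bodies. (The names `Outer`/`hidden` are just illustration.)

```rust
// An `impl` written inside a function body attaches a method
// to a type defined outside the function.
struct Outer(u32);

fn hidden() {
    // Invisible as an item, but the impl itself applies crate-wide.
    impl Outer {
        fn double(&self) -> u32 {
            self.0 * 2
        }
    }
}

fn main() {
    // `double` is callable even though `hidden` is never called:
    // the compiler had to look inside the body to find the impl.
    let x = Outer(21);
    assert_eq!(x.double(), 42);
}
```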
- Strictly separating phases can never work anyway
- See const expressions for array lengths, for example
- Which involve type checks and eval again
- In the case of the JVM, lazy class file loading
- Macros
- Inferred types across function boundaries
- Specialization
- Some traits need to be solved, some not
- Errors should not just be `throw new TypeError()`
- They should also not be `if (error) { return semi_fake_but_legal_value }`
- Instead: Sentinel value that signifies "bad user code is here"
- If this sentinel value is seen, its error has already been reported
- No further reporting necessary, just propagate the sentinel up the chain
- There is no "fallible" compiler operation
- It always succeeds
- But it might produce an "error" value
- E.g. type enums or the like also need to have an `Error` variant
- Actual error reporting is then done through side channels
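A minimal sketch of the pattern (the `Ty` enum and `check_add` are invented names, and a plain `Vec<String>` stands in for a real diagnostics side channel):

```rust
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Int,
    Bool,
    // Sentinel: "bad user code was here". Its error was reported
    // when it was created, so later phases stay silent.
    Error,
}

// "Infallible" type checking: always returns a Ty, never a Result.
fn check_add(lhs: &Ty, rhs: &Ty, diagnostics: &mut Vec<String>) -> Ty {
    match (lhs, rhs) {
        // Propagate the sentinel without reporting again.
        (Ty::Error, _) | (_, Ty::Error) => Ty::Error,
        (Ty::Int, Ty::Int) => Ty::Int,
        // First sighting of the mismatch: report through the side
        // channel, then return the sentinel.
        _ => {
            diagnostics.push(format!("cannot add {:?} and {:?}", lhs, rhs));
            Ty::Error
        }
    }
}

fn main() {
    let mut diagnostics = Vec::new();
    let bad = check_add(&Ty::Int, &Ty::Bool, &mut diagnostics);
    // Using the bad expression again produces no second error.
    let outer = check_add(&bad, &Ty::Int, &mut diagnostics);
    assert_eq!(outer, Ty::Error);
    assert_eq!(diagnostics.len(), 1);
}
```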
- MIR
- First compiled to unoptimized version
- Walk over all call sites
- Get the optimized version of every function
- Inline it
- Works fine as long as not panicking
- $\implies$ Instead of naively trying to get the optimized MIR, just ignore it if it's the same
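One way to make the inlining cycle concrete, as a toy sketch (the `Compiler` and `Mir` types are invented and this is not rustc's actual query system): when the optimized-MIR query would re-enter a function already being optimized, it falls back to the unoptimized MIR instead of recursing forever.

```rust
use std::collections::{HashMap, HashSet};

// Toy "MIR": a list of callees plus a flag saying whether it
// has been optimized yet.
#[derive(Clone)]
struct Mir {
    callees: Vec<&'static str>,
    optimized: bool,
}

struct Compiler {
    unoptimized: HashMap<&'static str, Mir>,
    in_progress: HashSet<&'static str>, // functions currently being optimized
}

impl Compiler {
    // Query for a function's optimized MIR. If optimizing a callee
    // would cycle back into a function we are already working on,
    // hand back that function's unoptimized MIR instead of
    // recursing forever (or panicking on the cycle).
    fn optimized_mir(&mut self, name: &'static str) -> Mir {
        if self.in_progress.contains(name) {
            return self.unoptimized[name].clone();
        }
        self.in_progress.insert(name);
        let mir = self.unoptimized[name].clone();
        for &callee in &mir.callees {
            // Stand-in for inlining: we only need the callee's MIR here.
            let _callee_mir = self.optimized_mir(callee);
        }
        self.in_progress.remove(name);
        Mir { optimized: true, ..mir }
    }
}

fn main() {
    let mut c = Compiler {
        unoptimized: HashMap::from([
            ("foo", Mir { callees: vec!["bar"], optimized: false }),
            ("bar", Mir { callees: vec!["foo"], optimized: false }),
        ]),
        in_progress: HashSet::new(),
    };
    // foo and bar call each other, yet the query terminates.
    assert!(c.optimized_mir("foo").optimized);
}
```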
- Two functions that call each other, `foo` and `bar`
- Might error in one case while trying to inline
- Might error in the other
- How the heck does one solve it?
- "Master" query that resolves the whole graph
- Which outputs the list of [[Clique|strongly connected components]] to process
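The "master query" step can be sketched with Tarjan's SCC algorithm over a toy call graph (all names here are invented; a real compiler would run this over its actual call or dependency graph):

```rust
use std::collections::{HashMap, HashSet};

// Tarjan's algorithm: the "master query" runs this once and hands
// each strongly connected component to the rest of the compiler
// as a single unit of work.
struct Tarjan<'g> {
    graph: &'g HashMap<&'static str, Vec<&'static str>>,
    index: HashMap<&'static str, usize>,
    low: HashMap<&'static str, usize>,
    stack: Vec<&'static str>,
    on_stack: HashSet<&'static str>,
    next: usize,
    sccs: Vec<Vec<&'static str>>,
}

impl<'g> Tarjan<'g> {
    fn visit(&mut self, v: &'static str) {
        self.index.insert(v, self.next);
        self.low.insert(v, self.next);
        self.next += 1;
        self.stack.push(v);
        self.on_stack.insert(v);
        let graph = self.graph; // copy the reference out of `self`
        for &w in &graph[v] {
            if !self.index.contains_key(w) {
                self.visit(w);
                let m = self.low[v].min(self.low[w]);
                self.low.insert(v, m);
            } else if self.on_stack.contains(w) {
                let m = self.low[v].min(self.index[w]);
                self.low.insert(v, m);
            }
        }
        // v is the root of a component: pop the whole component off.
        if self.low[v] == self.index[v] {
            let mut scc = Vec::new();
            loop {
                let w = self.stack.pop().unwrap();
                self.on_stack.remove(w);
                scc.push(w);
                if w == v { break; }
            }
            self.sccs.push(scc);
        }
    }
}

fn main() {
    // foo and bar call each other; baz is called but calls nothing.
    let graph = HashMap::from([
        ("foo", vec!["bar"]),
        ("bar", vec!["foo", "baz"]),
        ("baz", vec![]),
    ]);
    let mut t = Tarjan {
        graph: &graph,
        index: HashMap::new(),
        low: HashMap::new(),
        stack: Vec::new(),
        on_stack: HashSet::new(),
        next: 0,
        sccs: Vec::new(),
    };
    for &v in graph.keys() {
        if !t.index.contains_key(v) {
            t.visit(v);
        }
    }
    // {foo, bar} is one component, {baz} another.
    assert_eq!(t.sccs.len(), 2);
    assert!(t.sccs.iter().any(|scc| scc.len() == 2));
}
```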
- Name resolution
- Trait resolution
- "Spans"
- Editing a comment does affect something, namely where errors need to be displayed
- Different approaches
- Concatenating all input files into one string, indexed by 32-bit indices
- Compact but not nice for incremental compilation
- Store the offset relative to previous siblings and the length in each AST node. Track only the AST node ID when referenced, and recompute the starting offset only when needed
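A minimal sketch of the relative-offset approach (the `Node` type and child-index paths are invented for illustration): nodes store only their own length, and an absolute offset is recomputed on demand by summing earlier siblings on the way down.

```rust
// Nodes store only their own length; absolute offsets are derived.
struct Node {
    len: usize,
    children: Vec<Node>,
}

// Recompute the absolute start offset of the node reached by the
// child-index path `path`, summing the lengths of earlier siblings.
fn start_offset(root: &Node, path: &[usize]) -> usize {
    let mut node = root;
    let mut offset = 0;
    for &i in path {
        offset += node.children[..i].iter().map(|c| c.len).sum::<usize>();
        node = &node.children[i];
    }
    offset
}

fn main() {
    // "fn f(){}" split as: "fn " (3) + "f" (1) + "(){}" (4)
    let root = Node {
        len: 8,
        children: vec![
            Node { len: 3, children: vec![] },
            Node { len: 1, children: vec![] },
            Node { len: 4, children: vec![] },
        ],
    };
    // The name node starts after "fn ": offset 3.
    assert_eq!(start_offset(&root, &[1]), 3);
}
```

Editing one node only changes its length; later siblings pick up the shift automatically the next time their offsets are recomputed, so unrelated edits (like comments) don't invalidate stored positions.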
- Also whitespace and comments
- Compiler should just ignore it
- Useful for formatting
- If not provided, can be worked around if spans and input files are present (which is what rustfmt does)
- ASTs don't need to be reparsed fully
- If a compiler receives a diff of what's changed, it can reparse only the affected parts
- Not related to [[red-black tree]]
- ==Green== tree layer
- Keeps all the information and details
- Doesn't store absolute offsets of AST nodes, so positions are only relative to the current node
- $\implies$ Can be reused
- ==Red== tree layer
- Works on top of the ==green== tree layer to actually form the whole program
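A toy version of the two layers (heavily simplified from the Roslyn/rust-analyzer design; the `Green`/`Red` structs here are illustrative, not the real APIs): green nodes are immutable and position-independent, red nodes are views that carry an absolute offset computed on demand.

```rust
use std::rc::Rc;

// Green node: immutable, position-independent, shareable.
#[derive(Debug)]
struct Green {
    kind: &'static str,
    len: usize,
    children: Vec<Rc<Green>>,
}

// Red node: a view over a green node plus an absolute offset,
// materialized on demand while walking down from the root.
struct Red {
    green: Rc<Green>,
    offset: usize,
}

impl Red {
    fn children(&self) -> Vec<Red> {
        let mut offset = self.offset;
        self.green
            .children
            .iter()
            .map(|g| {
                let red = Red { green: Rc::clone(g), offset };
                offset += g.len;
                red
            })
            .collect()
    }
}

fn main() {
    // "fn f(){}" = keyword "fn " + name "f" + body "(){}"
    let name = Rc::new(Green { kind: "name", len: 1, children: vec![] });
    let root = Rc::new(Green {
        kind: "fn",
        len: 8,
        children: vec![
            Rc::new(Green { kind: "kw", len: 3, children: vec![] }),
            Rc::clone(&name),
            Rc::new(Green { kind: "body", len: 4, children: vec![] }),
        ],
    });
    let red_root = Red { green: root, offset: 0 };
    let kids = red_root.children();
    // The name node's absolute offset is computed, not stored:
    assert_eq!(kids[1].green.kind, "name");
    assert_eq!(kids[1].offset, 3);
    // The same green node can be shared by many trees (here: 2 owners).
    assert_eq!(Rc::strong_count(&name), 2);
}
```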
- Start with an on-demand style
- Best practices are still unsettled and need to be codified
- How to represent ASTs?
- How to represent location information?
- How to handle cycles?