paulp/notes.md

## notes.md

      
    Raw
  

              notes.md
            
          
Thesis: Problem is the language as much as the implementation of the compiler

Big problem is cyclic dependency in the typer where type inference and type-checking and implicit search mutually depend on each other. In other words: Can you write an interpreter without also writing a typer that at least implements type inference and implicit search. Can you do implicit search without implementing all of the type checker?
Type inference and implicit search must be disentangled. They know that in scala too, except that it will be close to impossible. Note that disentanglement doesn't preclude interaction, but it has to flow through much narrower, explicit, disciplined channels.
Consequence if this is true: If you want to simplify the compiler you would have to identify how to simplify the language to break up those cycles
There are innumerable simplifications available in the compiler which require nothing but sane software engineering. If you want to minimize the ESSENTIAL complexity of the compiler, these cross-dependencies must be addressed - but the major issue with the compiler is not its essential complexity but its accidental.


On REPL vs compiled

Personally, I don’t use the repl regularly for “real programming”
one reason: there seems to be a strange conflation of when static checking is done and when runtime checks:

why (or when) do you need type-checking when the code is instantly executed anyways?


How do you lift instantly executed code into abstractions?
How do you lift those into files?
In other words: when using static type-checking you let the compiler prove as much things as possible about the domain. When using the REPL you usually start with concrete examples. These two seem like the opposite way of doing things. I don't know how you would unify both of them.
Think of them as endpoints building toward one another: the repl's job is to help you visualize each endpoint and to connect them smoothly, which is quite a different task than the "classic" compiler job where either it works 100% or fails.


On incremental programming

The basic question is: Is there a constructive way to write a program interactively? I.e. is there a series of operations (AST diffs) starting from an empty program to the final program for which each step typechecks indepedently?
How would you introduce cyclic dependencies? Maybe the C-way of first introducing properly typed prototypes and fill those out only later?
There are many possibilities here - I want to investigate the literature before I say too much, but in general cycles can be handled by stubbing out enough to capture the cycle and then retraversing incomplete nodes. I managed this in the scala typer a long time ago, but as usual was eventually defeated by accidental complexity. Also, current repl semantics are antagonistic to incrementally introducing cyclical structures due to the way it shadows: this could be much improved.
If there is such a way, the editor for writing a program interactively would be a zipper over the syntax tree which you operate on.


On incrementally compiling programs

Miguel once stated the idea (AFAIR) that you could turn compilation on its head: you start with the compilation products and figure out a tree of compilation operations that you have to do to get the products. In the leaves you had source code or AST (snippets).
If this operation tree is materialized (i.e. all the dependencies are explicit and values you/the compiler can inspect) you can monitor dependencies and have changes in the sources trickle up the dependency tree to only recompile what is necessary
Advantage: If the compilation process is explicit it may be easier to find opportunities for making things parallel.
Disadvantages / Problems:

What about those cyclic dependencies in the compiler? (Obviously, they cannot be real cycles, but how do you figure out?)
In one way this approach to build a compiler would mean absolute laziness, every step in the compiler is only done if requested by a dependee. The opposite extreme of compiler design would be a batch compiler that per one phase does the same kind of work on all sources first before going to the next phase.
The current compiler design seems to be hybrid: it has phases that sources run through in batch but large parts of type-checking and loading dependencies are done lazily / on-demand. In some way, it is the worst of all possible worlds: batch compiling prevents better incremental compilation while (implicit) laziness introduces unpredictable execution flow and performance, and also proper parallelization is impossible because it's easy to run into deadlocks if laziness/dependencies are implicit. I have the impression that a proper batch compiler could be faster because a CPU will work faster if it gets to work on the same kind of operations at one time keeping code and related data in caches. So, one should have this trade-off in mind when designing a compiler that totally depends on laziness: you will have to reclaim possible performance losses of doing stuff on-demand by doing less stuff in the first place.
I expect to do exactly this: no laziness, and no laziness necessary - I will be doing literally orders of magnitude less work by designing for incrementalism. Everything is eager, everything is immutable (to the extent possible - which is very close to everything), and everything is explicitly parallelizable (again to the extent possible). I have plans for dealing with typing cycles, but they might go out the window when I see what works in practice.


Both ideas combined, incremental programming and incremental compilation, could be very powerful because if you found a way to construct a program where each step can be verified independently, the compiler itself wouldn't have to do type-checking any more but would have to just run the backend again for things that were invalidated by a change.
Exactly!