Pondering the next version of Pikelet
I've been thinking about a rewrite of the initial implementation of Pikelet. Since experimenting with 'semantic type checking' in rust-nbe-for-mltt, I'm thinking that I'll want to scrap the current implementation. The reasoning being that I've learned enough now that I want to take an entirely different approach to building the language, starting in an end-to-end fashion, building up a set of integration tests from the start.
What follows are some of the things I've been thinking about. Nothing is set in stone, but I'd love to hear your feedback in order to refine and improve my thoughts!
Goal and vision
Dependently typed functional programming is the future! And I don't think it needs to be at odds with low-level procedural programming. Provided the right tools, we should be able to reason about how our high level code gets translated into low level code without too many conceptual leaps and magical reasoning. Recent and ongoing advances in the state of the art of substructural type systems are getting us closer to this goal, and I don't think we should have to wait years for this to filter down to industry - we should be in a position to accept this soon after it is off the academic presses.
It is time to beat turnstiles into technology. The elegance and efficiency of linear types should allow functional languages to finally go "main-stream".
Problems to solve
Systems languages tend towards dependent types and staged computation over time
- statically sized types
- generating lookup tables at compile time
- constexpr in C++, D, Rust, Nim, Zig, Jai (Zig's comptime is pretty interesting)
- constant template params in C++, D, Nim, Zig, Jai
- trait induction, and now constant generic parameters in Rust
- macro languages have no access to the type system
- templates have access to types, but are integrated into the type system in an ad-hoc way
- the iterator and future APIs in Rust - allowing for explicit fusion of chained computations, with few intermediate allocations using parametrised adaptor types
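As a small illustration of the list above, here is a sketch in Rust showing how const generics give a limited, "bolted on" form of dependent types (a length appearing in a type), and how iterator adaptors fuse into a single loop with no intermediate allocations:

```rust
// Sketch: the array length N is part of the type and checked at compile
// time - a restricted form of dependent types.
fn dot<const N: usize>(a: [f64; N], b: [f64; N]) -> f64 {
    // The adaptor chain below is fused by the compiler into one loop,
    // with no intermediate collections allocated.
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    let x = [1.0, 2.0, 3.0];
    let y = [4.0, 5.0, 6.0];
    println!("{}", dot(x, y)); // prints 32
    // dot([1.0], y); // rejected at compile time: lengths 1 and 3 differ
}
```

Mismatched lengths are a type error rather than a runtime panic, which is exactly the kind of guarantee one would want a proper dependent type system to generalise.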
Formally verified systems programming is challenging
- hard to integrate current languages with theorem provers
- dependently typed languages compile/extract to code that is slower than code written in languages with:
- explicit phase separation between runtime values and compile time types
- better support for inlining and erasure
- dependently typed languages tend to assume a GC and a runtime, unless you manually lower code to a simpler representation, for example to CompCert or Low*.
- manual proofs or tactics based proofs are annoying in the common case
- solvers can be hard to wrangle with
- cost-benefit between proofs, solvers, and property testing/fuzzing
Side effects are not tracked in the type system
- testing effectful code in languages like Rust often requires the use of dependency injection
- hard to know what is panic/exception-safe - need to fuzz test
- separate systems for async/await are created in Rust and C++
- languages like Nim only track effects statically (without handlers), preventing their use as an alternative to dependency injection for async and testing
- languages with effects and handlers don't optimise well for the 'statically known handlers' case - lack of focus on 'handler fusion'
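To make the dependency injection point concrete, here is a minimal Rust sketch (with hypothetical `Clock`/`FakeClock` names) of the boilerplate that untracked effects force on us when testing effectful code:

```rust
// Because effects are invisible to the type system, we thread a handler
// in by hand as a trait parameter - i.e. dependency injection.
trait Clock {
    fn now(&self) -> u64;
}

struct SystemClock;
impl Clock for SystemClock {
    fn now(&self) -> u64 {
        std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .unwrap()
            .as_secs()
    }
}

// A fixed clock we substitute in tests.
struct FakeClock(u64);
impl Clock for FakeClock {
    fn now(&self) -> u64 { self.0 }
}

fn greeting(clock: &impl Clock) -> &'static str {
    // First half of the day (in seconds since midnight UTC) vs second half.
    if clock.now() % 86_400 < 43_200 { "good morning" } else { "good evening" }
}

fn main() {
    assert_eq!(greeting(&FakeClock(0)), "good morning");
    assert_eq!(greeting(&FakeClock(43_200)), "good evening");
    println!("ok");
}
```

With effects and handlers in the language itself, the `Clock` plumbing would be implicit in `greeting`'s effect signature rather than an extra parameter.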
High level, declarative descriptions of data-types are conflated with low-level data layout
- bit fields
- struct of arrays vs arrays of structs
- built-in tagged union optimisations
- Hard to have efficient extensible records and variants
- Forced into thinking about low-level operational concerns (eg. how should this field be allocated?) when it would be nice to defer those to a later place, potentially in a polymorphic way.
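The "built-in tagged union optimisations" bullet can be observed directly in Rust, where the niche optimisation is baked into the compiler rather than being something users can express or opt out of:

```rust
use std::mem::size_of;

fn main() {
    // Rust's built-in "niche" optimisation: Option<&u8> reuses the
    // (impossible) null pointer value as the None tag, so the Option
    // costs no extra discriminant space.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // With no niche available, the discriminant does cost space.
    assert!(size_of::<Option<u64>>() > size_of::<u64>());

    println!("ok");
}
```

The layout choice here is made by the compiler; a language that separated declarative data descriptions from layout could let programmers state (or defer) such choices explicitly.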
Tooling is often an afterthought, for example:
- editor integrations
- error messages
- package management
- code formatting
- refactoring support
Hard to integrate current programming languages with academic work
Lots of exciting work in programming language design and type theory at the moment - dependent types, effects and handlers, coeffects and grading - but it seems hard to integrate into the current breed of systems languages cleanly (Rust, D, Zig, Nim, C, C++, etc.).
Formally verified compilers are challenging for most of us to write, but we need to prepare for it. Building large, monolithic languages can compound this issue (eg. C++, Rust, etc).
What follows is an ambitious (perhaps overly ambitious) set of things I'd like to explore in the next version of Pikelet.
A unified language for programming, metaprogramming, and scripting
Systems languages inevitably end up encroaching on the domain of dependent types and staged metaprogramming, but often in a way that is 'bolted on' after the fact, or that does not adequately take advantage of existing research. We want to try to do better by taking advantage of:
- dependent type theory
- dependent records for modules (see 1ML)
- elaborator reflection for macros and tactics
- explicit staging in the type system using modal types
- allow the interpreter to be included at runtime, for scripting
- custom literals, parser extensions
A marriage of high and low level data type programming
- Unboxed data types (records, inductive types, closures, etc)
- Non-uniform memory layout
- Custom data packing
- Abstract over/defer data layout choices - eg. Struct of Arrays (SoA) / Array of Structs (AoS)
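To show why abstracting over layout would be valuable, here is a Rust sketch of the same logical record laid out both ways. Today, switching between them is a manual, invasive rewrite; the hope is that a language could let us defer or parameterise the choice:

```rust
// Array of Structs: each particle's fields are adjacent in memory.
struct ParticleAoS { pos: f32, vel: f32 }

// Struct of Arrays: each field lives in its own contiguous array,
// which is often friendlier to SIMD and cache behaviour.
struct ParticlesSoA { pos: Vec<f32>, vel: Vec<f32> }

fn step_aos(ps: &mut [ParticleAoS], dt: f32) {
    for p in ps { p.pos += p.vel * dt; }
}

fn step_soa(ps: &mut ParticlesSoA, dt: f32) {
    for (pos, vel) in ps.pos.iter_mut().zip(&ps.vel) { *pos += vel * dt; }
}

fn main() {
    let mut aos = vec![ParticleAoS { pos: 0.0, vel: 1.0 }];
    step_aos(&mut aos, 0.5);

    let mut soa = ParticlesSoA { pos: vec![0.0], vel: vec![1.0] };
    step_soa(&mut soa, 0.5);

    // Same logical result, different memory layout.
    assert_eq!(aos[0].pos, 0.5);
    assert_eq!(soa.pos[0], 0.5);
    println!("ok");
}
```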
Graded modal types
This subsumes the various contextual capabilities that we want to enforce in systems programming. Granule is a project that is currently at the forefront of pushing this research, and we should pay close attention to it! I'm thinking it might be handy to make use of a very similar core language in Pikelet. Also interesting to watch is Atkey and Wood's Context Constrained Computation.
Modalities can be used to describe many things we care about, for example:
- usage count (0, 1, ...)
- sharing count (0, 1, ...)
- staging/phase (static, dynamic, ...)
- memory (initialised, uninitialised)
- unsafe code (safe, trusted, untrusted)
- allocators (GC, RC, arenas, etc..)
- panic safety
It remains to be seen where Rust-style borrowing fits in with the above, however.
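Rust's ownership discipline can be read as a hard-wired special case of the usage-count modalities above: every non-`Copy` value is *affine* (used at most once by move). A minimal sketch:

```rust
// A value with no Copy/Clone impls: moving it consumes it, so it can be
// used at most once - an affine discipline, i.e. usage count <= 1.
struct Token;

fn consume(_t: Token) -> &'static str {
    "used exactly once"
}

fn main() {
    let t = Token;
    println!("{}", consume(t));
    // println!("{}", consume(t)); // error[E0382]: use of moved value: `t`
}
```

A graded system would let us state usage counts like 0, 1, or "unrestricted" as annotations, rather than baking one discipline into the whole language.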
Effects and handlers seem quite handy for:
- aspect-oriented programming without dependency injection
- supporting asynchronous scheduling without bifurcating the language ecosystem (see Multicore OCaml)
In order to ensure handlers are implemented as efficiently as possible we will most likely want to ensure that they are fused with a high level of reliability. We will want to ensure we have similar performance guarantees to what Rust's Future and Iterator APIs enjoy when handlers are statically known, and this should be easy for programmers to reason about.
- fused-effects: A fast, flexible, fused effect system for Haskell [Hackage] [Github]
- polysemy: Higher-order, low-boilerplate, zero-cost free monads [Hackage] [Github]
- effective-rust: Algebraic Effects Library for Rust [Github]
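The kind of fusion we'd want for statically known handlers can be approximated today in Rust by passing the handler as a generic parameter (hypothetical `Log`/`Collect` names below): monomorphisation lets the compiler inline each operation at its call site instead of dispatching dynamically.

```rust
// A toy "effect signature": one logging operation.
trait Log {
    fn log(&mut self, msg: &str);
}

// A toy "handler" that interprets the operation by collecting messages.
struct Collect(Vec<String>);
impl Log for Collect {
    fn log(&mut self, msg: &str) { self.0.push(msg.to_string()); }
}

// Generic over the handler: monomorphised per concrete `L`, so calls to
// `log` can be inlined - a crude, manual form of handler fusion.
fn effectful<L: Log>(handler: &mut L) -> u32 {
    handler.log("start");
    let result = 1 + 2;
    handler.log("done");
    result
}

fn main() {
    let mut h = Collect(Vec::new());
    assert_eq!(effectful(&mut h), 3);
    assert_eq!(h.0, vec!["start".to_string(), "done".to_string()]);
    println!("ok");
}
```

An effect system with first-class handlers would recover this performance automatically whenever the handler is statically known, without the programmer encoding it by hand.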
Type errors are warnings, hole-driven development
We should continue compiling even in the presence of type errors. Invalid code will result in fragments of code that abort the program with the type error - similar to GHC's -fdefer-type-errors flag.
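A rough sketch of what the elaborator might emit: each ill-typed fragment is replaced by code that aborts with the recorded error, while the rest of the program compiles and runs normally.

```rust
// Hypothetical output of an elaborator that defers type errors: the
// ill-typed expression is replaced with a runtime abort carrying the
// original error message.
#[allow(dead_code)]
fn ill_typed_fragment() -> i32 {
    panic!("deferred type error: expected Int, found String")
}

fn main() {
    // Well-typed parts of the program still compile and run:
    println!("{}", 1 + 2); // prints 3
    // Only *evaluating* the bad fragment aborts:
    // ill_typed_fragment();
}
```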
Prepare for a verified core
While full formal verification is still out of reach of many of us programming language implementors, and verified compilers for dependently typed programming languages are out of reach even to researchers, we can at least prepare for it somewhat.
- Specify the type system during implementation with a markdown spec to begin with, later perhaps using some kind of structured specification format
- Type preserving IRs, down to link time.
- Compiling with Dependent Types [Thesis]
These are things that may be trickier, but might be worth exploring in the future:
Transition from purely textual editing to projectional, structured editing
Many features of existing programming languages are limited by the standard plain-text representation of code. As we want to get more leverage out of modalities for describing capabilities at a finer-grained level, we might discover that an ASCII-based syntax is too cumbersome compared to what we could have in a structural representation. That said, we can't really expect everyone to transition overnight, so perhaps having the option to use ASCII from the start might be helpful.
While building a proof assistant is not yet a core goal (dependent types are more of interest for metaprogramming and polymorphism), it might be nice in the future to allow for parts of programs to be verified, for example using property testing, solvers, tactics, or manual proofs.
Universe levels with a lifting operator
I'd like to use a McBride style of universe lifting, but it seems to be challenging in the presence of normalisation-by-evaluation. We might have to drop this for the initial version of Pikelet and fall back to type-in-type.
- create a new branch and move the current implementation (including the docs) into an 'attic' subdirectory
- begin an end-to-end implementation, going from the concrete syntax, to a compiled binary/JIT (via cranelift), with end-to-end integration tests from the start
- begin adding higher level features building up to a small, dependently typed language
The Pikelet codebase should be well-documented, with integration tests from the start, and a specification and language guide that we grow during development.
- How do we break up these goals and achieve them incrementally?
- How do we make it easy for people to contribute in parallel?
- How can we make Pikelet usable and dog-foodable from an early stage? Perhaps we could use it as a game scripting language?
- What implementation language should we use? Rust? Haskell? OCaml? Coq?
- Should we use Salsa for incremental compilation, or would that tie us to Rust too heavily from the start?
- How do we prepare for bootstrapping, to allow us to avoid the issue of a "trusting trust" attack?
- It would be nice to support a great editor experience from the start, but an LSP-based solution would eventually become obsolete given the goal of structured editing. Should we try to support the LSP from the start anyway?
The following is rather ambitious (perhaps too ambitious), and might not all be achievable. The goal is more to figure out how different language features work together and inspire future research in the area of dependent types for systems programming. Soundness problems will probably be inevitable in the mid term - we'd have to wear that, and mitigate these as they come up.