Pondering the next version of Pikelet
I've been thinking about a rewrite of the initial implementation of Pikelet. Since experimenting with 'semantic type checking' in rust-nbe-for-mltt, I'm thinking that I'll want to scrap the current implementation. The reasoning being that I've learned enough now that I want to take an entirely different approach to building the language, starting in an end-to-end fashion, building up a set of integration tests from the start.
What follows are some of the things I've been thinking about. Nothing is set in stone, but I'd love to hear your feedback in order to refine and improve my thoughts!
Goal and vision
Dependently typed functional programming is the future! And I don't think it needs to be at odds with low-level procedural programming. Provided the right tools, we should be able to reason about how our high level code gets translated into low level code without too many conceptual leaps and magical reasoning. Recent and ongoing advances in the state of the art of substructural type systems are getting us closer to this goal, and I don't think we should have to wait years for this to filter down to industry - we should be in a position to accept this soon after it is off the academic presses.
It is time to beat turnstiles into technology. The elegance and efficiency of linear types should allow functional languages to finally go "main-stream".
Problems to solve
Systems languages tend towards dependent types and staged computation over time
- statically sized types
- generating lookup tables at compile time
- constexpr in C++, D, Rust, Nim, Zig, Jai (Zig's comptime is pretty interesting)
- constant template params in C++, D, Nim, Zig, Jai
- trait induction, and now constant generic parameters in Rust
- macro languages have no access to the type system
- templates have access to types, but are integrated into the type system in an ad-hoc way
- the iterator and future APIs in Rust - allowing for explicit fusion of chained computations, with few intermediate allocations using parametrised adaptor types
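As a small illustration of the list above, here is a sketch in Rust showing how const generics give a limited, "bolted on" form of dependent types (a length appearing in a type), and how iterator adaptors fuse into a single loop with no intermediate allocations:

```rust
// Sketch: the array length N is part of the type and checked at compile
// time - a restricted form of dependent types.
fn dot<const N: usize>(a: [f64; N], b: [f64; N]) -> f64 {
    // The adaptor chain below is fused by the compiler into one loop,
    // with no intermediate collections allocated.
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    let x = [1.0, 2.0, 3.0];
    let y = [4.0, 5.0, 6.0];
    println!("{}", dot(x, y)); // prints 32
    // dot([1.0], y); // rejected at compile time: lengths 1 and 3 differ
}
```

Mismatched lengths are a type error rather than a runtime panic, which is exactly the kind of guarantee one would want a proper dependent type system to generalise.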
Formally verified systems programming is challenging
- hard to integrate current languages with theorem provers
- dependently typed languages compile/extract to code that is slower than code written in languages with:
- explicit phase separation between runtime values and compile time types
- better support for inlining and erasure
- dependently typed languages tend to assume a GC and a runtime, unless you manually lower code to a simpler representation, for example to CompCert or Low*.
- manual proofs or tactics based proofs are annoying in the common case
- solvers can be hard to wrangle with
- cost-benefit between proofs, solvers, and property testing/fuzzing
Side effects are not tracked in the type system
- testing effectful code in languages like Rust often requires the use of dependency injection
- hard to know what is panic/exception-safe - need to fuzz test
- separate systems for async/await are created in Rust and C++
- languages like Nim only track effects statically (without handlers), preventing their use as an alternative to dependency injection for async and testing
- languages with effects and handlers don't optimise well for the 'statically known handlers' case - lack of focus on 'handler fusion'
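To make the dependency injection point concrete, here is a minimal Rust sketch (with hypothetical `Clock`/`FakeClock` names) of the boilerplate that untracked effects force on us when testing effectful code:

```rust
// Because effects are invisible to the type system, we thread a handler
// in by hand as a trait parameter - i.e. dependency injection.
trait Clock {
    fn now(&self) -> u64;
}

struct SystemClock;
impl Clock for SystemClock {
    fn now(&self) -> u64 {
        std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .unwrap()
            .as_secs()
    }
}

// A fixed clock we substitute in tests.
struct FakeClock(u64);
impl Clock for FakeClock {
    fn now(&self) -> u64 { self.0 }
}

fn greeting(clock: &impl Clock) -> &'static str {
    // First half of the day (in seconds since midnight UTC) vs second half.
    if clock.now() % 86_400 < 43_200 { "good morning" } else { "good evening" }
}

fn main() {
    assert_eq!(greeting(&FakeClock(0)), "good morning");
    assert_eq!(greeting(&FakeClock(43_200)), "good evening");
    println!("ok");
}
```

With effects and handlers in the language itself, the `Clock` plumbing would be implicit in `greeting`'s effect signature rather than an extra parameter.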
High level, declarative descriptions of data-types are conflated with low-level data layout
- bit fields
- struct of arrays vs arrays of structs
- built-in tagged union optimisations
- Hard to have efficient extensible records and variants
- Forced into thinking about low-level operational concerns (eg. how should this field be allocated?) when it would be nice to defer those to a later place, potentially in a polymorphic way.
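The "built-in tagged union optimisations" bullet can be observed directly in Rust, where the niche optimisation is baked into the compiler rather than being something users can express or opt out of:

```rust
use std::mem::size_of;

fn main() {
    // Rust's built-in "niche" optimisation: Option<&u8> reuses the
    // (impossible) null pointer value as the None tag, so the Option
    // costs no extra discriminant space.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // With no niche available, the discriminant does cost space.
    assert!(size_of::<Option<u64>>() > size_of::<u64>());

    println!("ok");
}
```

The layout choice here is made by the compiler; a language that separated declarative data descriptions from layout could let programmers state (or defer) such choices explicitly.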
Tooling is often an afterthought, for example:
- editor integrations
- error messages
- package management
- code formatting
- refactoring support
Hard to integrate current programming languages with academic work
Lots of exciting work in programming language design and type theory at the moment - dependent types, effects and handlers, coeffects and grading - but it seems hard to integrate into the current breed of systems languages cleanly (Rust, D, Zig, Nim, C, C++, etc.).
Formally verified compilers are challenging for most of us to write, but we need to prepare for it. Building large, monolithic languages can compound this issue (eg. C++, Rust, etc).
What follows is an ambitious (perhaps overly ambitious) set of things I'd like to explore in the next version of Pikelet.
A unified language for programming, metaprogramming, and scripting
Systems languages inevitably end up encroaching on the domain of dependent types and staged metaprogramming, but often in a way that is 'bolted on' after the fact, or that does not adequately take advantage of existing research. We want to try to do better by taking advantage of:
- dependent type theory
- dependent records for modules (see 1ML)
- elaborator reflection for macros and tactics
- explicit staging in the type system using modal types
- allow the interpreter to be included at runtime, for scripting
- custom literals, parser extensions
A marriage of high and low level data type programming
- Unboxed data types (records, inductive types, closures, etc)
- Non-uniform memory layout
- Custom data packing
- Abstract over/defer data layout choices - eg. Struct of Arrays (SoA) / Array of Structs (AoS)
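To show why abstracting over layout would be valuable, here is a Rust sketch of the same logical record laid out both ways. Today, switching between them is a manual, invasive rewrite; the hope is that a language could let us defer or parameterise the choice:

```rust
// Array of Structs: each particle's fields are adjacent in memory.
struct ParticleAoS { pos: f32, vel: f32 }

// Struct of Arrays: each field lives in its own contiguous array,
// which is often friendlier to SIMD and cache behaviour.
struct ParticlesSoA { pos: Vec<f32>, vel: Vec<f32> }

fn step_aos(ps: &mut [ParticleAoS], dt: f32) {
    for p in ps { p.pos += p.vel * dt; }
}

fn step_soa(ps: &mut ParticlesSoA, dt: f32) {
    for (pos, vel) in ps.pos.iter_mut().zip(&ps.vel) { *pos += vel * dt; }
}

fn main() {
    let mut aos = vec![ParticleAoS { pos: 0.0, vel: 1.0 }];
    step_aos(&mut aos, 0.5);

    let mut soa = ParticlesSoA { pos: vec![0.0], vel: vec![1.0] };
    step_soa(&mut soa, 0.5);

    // Same logical result, different memory layout.
    assert_eq!(aos[0].pos, 0.5);
    assert_eq!(soa.pos[0], 0.5);
    println!("ok");
}
```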
Graded modal types
This subsumes the various contextual capabilities that we want to enforce in systems programming. Granule is a project that is currently at the forefront of pushing this research, and we should pay close attention to it! I'm thinking it might be handy to make use of a very similar core language in Pikelet. Also interesting to watch is Atkey and Wood's Context Constrained Computation.
Modalities can be used to describe many things we care about, for example:
- usage count (0, 1, ...)
- sharing count (0, 1, ...)
- staging/phase (static, dynamic, ...)
- memory (initialised, uninitialised)
- unsafe code (safe, trusted, untrusted)
- allocators (GC, RC, arenas, etc..)
- panic safety
It remains to be seen where Rust-style borrowing fits in with the above, however.
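Rust's ownership discipline can be read as a hard-wired special case of the usage-count modalities above: every non-`Copy` value is *affine* (used at most once by move). A minimal sketch:

```rust
// A value with no Copy/Clone impls: moving it consumes it, so it can be
// used at most once - an affine discipline, i.e. usage count <= 1.
struct Token;

fn consume(_t: Token) -> &'static str {
    "used exactly once"
}

fn main() {
    let t = Token;
    println!("{}", consume(t));
    // println!("{}", consume(t)); // error[E0382]: use of moved value: `t`
}
```

A graded system would let us state usage counts like 0, 1, or "unrestricted" as annotations, rather than baking one discipline into the whole language.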
Effects and handlers seem quite handy for:
- aspect-oriented programming without dependency injection
- supporting asynchronous scheduling without bifurcating the language ecosystem (see Multicore OCaml)
In order to ensure handlers are implemented as efficiently as possible we will most likely want to ensure that they are fused with a high level of reliability. We will want to ensure we have similar performance guarantees to what Rust's Future and Iterator APIs enjoy when handlers are statically known, and this should be easy for programmers to reason about.
- fused-effects: A fast, flexible, fused effect system for Haskell [Hackage] [Github]
- polysemy: Higher-order, low-boilerplate, zero-cost free monads [Hackage] [Github]
- effective-rust: Algebraic Effects Library for Rust [Github]
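The kind of fusion we'd want for statically known handlers can be approximated today in Rust by passing the handler as a generic parameter (hypothetical `Log`/`Collect` names below): monomorphisation lets the compiler inline each operation at its call site instead of dispatching dynamically.

```rust
// A toy "effect signature": one logging operation.
trait Log {
    fn log(&mut self, msg: &str);
}

// A toy "handler" that interprets the operation by collecting messages.
struct Collect(Vec<String>);
impl Log for Collect {
    fn log(&mut self, msg: &str) { self.0.push(msg.to_string()); }
}

// Generic over the handler: monomorphised per concrete `L`, so calls to
// `log` can be inlined - a crude, manual form of handler fusion.
fn effectful<L: Log>(handler: &mut L) -> u32 {
    handler.log("start");
    let result = 1 + 2;
    handler.log("done");
    result
}

fn main() {
    let mut h = Collect(Vec::new());
    assert_eq!(effectful(&mut h), 3);
    assert_eq!(h.0, vec!["start".to_string(), "done".to_string()]);
    println!("ok");
}
```

An effect system with first-class handlers would recover this performance automatically whenever the handler is statically known, without the programmer encoding it by hand.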
Type errors are warnings, hole-driven development
We should continue compiling even in the presence of type errors. Invalid code will result in fragments of code that abort the program with the type error - similar to GHC's -fdefer-type-errors flag.
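A rough sketch of what the elaborator might emit: each ill-typed fragment is replaced by code that aborts with the recorded error, while the rest of the program compiles and runs normally.

```rust
// Hypothetical output of an elaborator that defers type errors: the
// ill-typed expression is replaced with a runtime abort carrying the
// original error message.
#[allow(dead_code)]
fn ill_typed_fragment() -> i32 {
    panic!("deferred type error: expected Int, found String")
}

fn main() {
    // Well-typed parts of the program still compile and run:
    println!("{}", 1 + 2); // prints 3
    // Only *evaluating* the bad fragment aborts:
    // ill_typed_fragment();
}
```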
Prepare for a verified core
While full formal verification is still out of reach of many of us programming language implementors, and verified compilers for dependently typed programming languages are out of reach even to researchers, we can at least prepare for it somewhat.
- Specify the type system during implementation with a markdown spec to begin with, later perhaps using some kind of structured specification format
- Type preserving IRs, down to link time.
- Compiling with Dependent Types [Thesis]
These are things that may be trickier, but might be worth exploring in the future:
Transition from purely textual editing to projectional, structured editing
Many features of existing programming languages are limited by the standard plain-text representation of code. As we want to get more leverage out of modalities for describing capabilities at a finer-grained level, we might discover that an ASCII-based syntax is too cumbersome compared to what we could have in a structural representation. That said, we can't really expect everyone to transition overnight, so perhaps having the option to use ASCII from the start might be helpful.
While building a proof assistant is not yet a core goal (dependent types are more of interest for metaprogramming and polymorphism), it might be nice in the future to allow for parts of programs to be verified, for example using property testing, solvers, tactics, or manual proofs.
Universe levels with a lifting operator
I'd like to use a McBride style of universe lifting, but it seems to be challenging in the presence of normalisation-by-evaluation. We might have to drop this for the initial version of Pikelet and fall back to type-in-type.
- create a new branch and move the current implementation (including the docs) into an 'attic' subdirectory
- begin an end-to-end implementation, going from the concrete syntax, to a compiled binary/JIT (via cranelift), with end-to-end integration tests from the start
- begin adding higher level features building up to a small, dependently typed language
The Pikelet codebase should be well-documented, with integration tests from the start, and a specification and language guide that we grow during development.
- How do we break up these goals and achieve them incrementally?
- How do we make it easy for people to contribute in parallel?
- How can we make Pikelet usable and dog-foodable from an early stage? Perhaps we could use it as a game scripting language?
- What implementation language should we use? Rust? Haskell? OCaml? Coq?
- Should we use Salsa for incremental compilation, or would that tie us to Rust too heavily from the start?
- How do we prepare for bootstrapping, to allow us to avoid the issue of a "trusting trust" attack?
- It would be nice to support a great editor experience from the start, but an LSP-based solution would eventually become obsolete given the goal of structured editing. Should we try to support the LSP from the start anyway?
The following is rather ambitious (perhaps too ambitious), and might not all be achievable. The goal is more to figure out how different language features work together and inspire future research in the area of dependent types for systems programming. Soundness problems will probably be inevitable in the mid term - we'd have to wear that, and mitigate these as they come up.