@brendanzab
Last active April 22, 2022 06:30
Pondering the next version of Pikelet

[Image: Pikelet mascot]

I've been thinking about a rewrite of the initial implementation of Pikelet. Since experimenting with 'semantic type checking' in rust-nbe-for-mltt, I'm inclined to scrap the current implementation: I've learned enough now that I want to take an entirely different approach to building the language, working end-to-end and building up a suite of integration tests from the start.

What follows are some of the things I've been thinking about. Nothing is set in stone, but I'd love to hear your feedback in order to refine and improve my thoughts!

Goal and vision

Dependently typed functional programming is the future! And I don't think it needs to be at odds with low-level procedural programming. Provided the right tools, we should be able to reason about how our high level code gets translated into low level code without too many conceptual leaps and magical reasoning. Recent and ongoing advances in the state of the art of substructural type systems are getting us closer to this goal, and I don't think we should have to wait years for this to filter down to industry - we should be in a position to accept this soon after it is off the academic presses.

It is time to beat turnstiles into technology. The elegance and efficiency of linear types should allow functional languages to finally go "main-stream".

Henry G. Baker

Problems to solve

Systems languages tend towards dependent types and staged computation over time

For example:

  • statically sized types
  • generating lookup tables at compile time
  • constexpr in C++, D, Rust, Nim, Zig, Jai (Zig's comptime is pretty interesting)
  • constant template params in C++, D, Nim, Zig, Jai
  • trait induction, and now constant generic parameters in Rust
  • macro languages have no access to the type system
  • templates have access to types, but are integrated into the type system in an ad hoc way
  • the iterator and future APIs in Rust - allowing for explicit fusion of chained computations, with few intermediate allocations using parametrised adaptor types
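
As a minimal sketch of the trend described above, Rust's `const fn` together with const generics already approximate staged computation: the lookup table below is computed entirely at compile time and baked into the binary (the names here are illustrative, not from any existing library):

```rust
// A lookup table of squares, evaluated entirely at compile time.
// `const fn` bodies may use `while` loops and mutable locals.
const fn make_squares<const N: usize>() -> [u64; N] {
    let mut table = [0u64; N];
    let mut i = 0;
    while i < N {
        table[i] = (i as u64) * (i as u64);
        i += 1;
    }
    table
}

// Forcing evaluation at compile time: the table lands in read-only data.
const SQUARES: [u64; 16] = make_squares::<16>();

fn main() {
    assert_eq!(SQUARES[5], 25);
    assert_eq!(SQUARES[15], 225);
    println!("ok");
}
```

This is exactly the "generating lookup tables at compile time" use case, but expressed through a bolted-on sublanguage rather than a principled staging discipline.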

Library examples:

Formally verified systems programming is challenging

  • hard to integrate current languages with theorem provers
  • dependently typed languages compile/extract to code that is slower than code written in languages with:
    • explicit phase separation between runtime values and compile time types
    • better support for inlining and erasure
  • dependently typed languages tend to assume a GC and a runtime, unless you manually lower code to a simpler representation, for example to CompCert or Low*.
  • manual proofs or tactics based proofs are annoying in the common case
  • solvers can be hard to wrangle with
  • cost-benefit between proofs, solvers, and property testing/fuzzing

Side effects are not tracked in the type system

  • testing effectful code in languages like Rust often requires the use of dependency injection
  • hard to know what is panic/exception-safe - need to fuzz test
  • separate systems for async/await are created in Rust and C++
  • languages like Nim only track effects statically (without handlers), preventing their use as an alternative to dependency injection for async and testing
  • languages with effects and handlers don't optimise well for the 'statically known handlers' case - lack of focus on 'handler fusion'
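
For a concrete picture of the dependency-injection workaround mentioned above, here is a minimal Rust sketch (the `Clock` trait and `FixedClock` type are hypothetical names invented for this example): because effects are invisible to the type system, the effect has to be threaded through as an explicit trait parameter just to make the code testable.

```rust
// The effect ("reading the current time") is modelled as a trait.
trait Clock {
    fn now_secs(&self) -> u64;
}

// A test double with a fixed, deterministic time.
struct FixedClock(u64);

impl Clock for FixedClock {
    fn now_secs(&self) -> u64 {
        self.0
    }
}

// Business logic is written against the abstract effect...
fn is_expired(clock: &dyn Clock, deadline: u64) -> bool {
    clock.now_secs() > deadline
}

fn main() {
    // ...so tests can run it deterministically, without real IO.
    let clock = FixedClock(100);
    assert!(is_expired(&clock, 50));
    assert!(!is_expired(&clock, 150));
    println!("ok");
}
```

An effect system with handlers would let the same code be written directly, with the handler swapped at the call site instead of plumbed through every signature.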

High level, declarative descriptions of data-types are conflated with low-level data layout

  • bit fields
  • struct of arrays vs arrays of structs
  • built-in tagged union optimisations
  • Hard to have efficient extensible records and variants
  • Forced into thinking about low-level operational concerns (eg. how should this field be allocated?) when it would be nice to defer those to a later place, potentially in a polymorphic way.

Tooling is often an afterthought, for example:

  • holes
  • editor integrations
  • error messages
  • package management
  • code formatting
  • refactoring support

Hard to integrate current programming languages with academic work

Lots of exciting work is happening in programming language design and type theory at the moment - dependent types, effects and handlers, coeffects and grading - but it seems hard to integrate it cleanly into the current breed of systems languages (Rust, D, Zig, Nim, C, C++, etc.).

Formally verified compilers are challenging for most of us to write, but we need to prepare for it. Building large, monolithic languages can compound this issue (eg. C++, Rust, etc).

Wish list

What follows is an ambitious (perhaps overly ambitious) set of things I'd like to explore in the next version of Pikelet.

A unified language for programming, metaprogramming, and scripting

Systems languages inevitably end up encroaching on the domain of dependent types and staged metaprogramming, but often in a way that is 'bolted on' after the fact, or that does not adequately take advantage of existing research. We want to try to do better by taking advantage of:

  • dependent type theory
  • dependent records for modules (see 1ML)
  • elaborator reflection for macros and tactics
  • explicit staging in the type system using modal types
  • allow the interpreter to be included at runtime, for scripting
  • custom literals, parser extensions

A marriage of high and low level data type programming

  • Unboxed data types (records, inductive types, closures, etc)
  • Non-uniform memory layout
  • Custom data packing
    • High-Level Abstractions for Low-Level Programming [Thesis]
    • Bringing Effortless Refinement of Data Layouts to Cogent [Paper]
    • Erlang's bit syntax [Docs]
  • Abstract over/defer data layout choices - eg. Struct of Arrays (SoA) / Array of Structs (AoS)
    • You Can Have It All: Abstraction and Good Cache Performance [Paper]
    • Slides, and a few words on representation polymorphism [Blog Post]
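
To make the SoA/AoS point concrete, here is a small Rust sketch (the `ParticleAos`/`ParticlesSoa` names are invented for illustration) of the same logical data in both layouts. Today the programmer must commit to one layout up front and duplicate the traversal code; the wish is to abstract over, or defer, this choice polymorphically.

```rust
// Array of Structs: fields interleaved in memory.
struct ParticleAos {
    position: f32,
    #[allow(dead_code)]
    velocity: f32,
}

// Struct of Arrays: each field stored contiguously, which is often
// better for SIMD and cache behaviour when only one field is scanned.
struct ParticlesSoa {
    positions: Vec<f32>,
    #[allow(dead_code)]
    velocities: Vec<f32>,
}

fn total_position_aos(ps: &[ParticleAos]) -> f32 {
    ps.iter().map(|p| p.position).sum()
}

fn total_position_soa(ps: &ParticlesSoa) -> f32 {
    ps.positions.iter().sum()
}

fn main() {
    let aos = vec![
        ParticleAos { position: 1.0, velocity: 0.5 },
        ParticleAos { position: 2.0, velocity: 0.5 },
    ];
    let soa = ParticlesSoa {
        positions: vec![1.0, 2.0],
        velocities: vec![0.5, 0.5],
    };
    // Same logical computation, written twice because the layout leaks
    // into the code.
    assert_eq!(total_position_aos(&aos), total_position_soa(&soa));
    println!("ok");
}
```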

Graded modal types

This subsumes the various contextual capabilities that we want to enforce in systems programming. Granule is a project that is currently at the forefront of pushing this research, and we should pay close attention to it! I'm thinking it might be handy to make use of a very similar core language in Pikelet. Also interesting to watch is Atkey and Wood's Context Constrained Computation.

Modalities can be used to describe many things we care about, for example:

  • coeffects
    • usage count (0, 1, ...)
      • The Syntax and Semantics of Quantitative Type Theory [Paper]
      • Resourceful Dependent Types [Paper] [Slides]
      • Quantitative Program Reasoning with Graded Modal Types [Paper]
    • sharing count (0, 1, ...)
    • staging/phase (static, dynamic, ...)
      • Phase Distinctions in Type Theory [Paper]
      • A Modal Analysis of Staged Computation [Paper]
    • memory (initialised, uninitialised)
    • unsafe code (safe, trusted, untrusted)
  • effects
    • allocators (GC, RC, arenas, etc..)
    • panic safety
    • IO
    • logging
    • HTTP

It remains to be seen where Rust-style borrowing fits in with the above, however.
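
For intuition on where borrowing might fit: Rust's ownership is already an affine discipline (each value used at most once), which is one point in the "usage count" modality space sketched above. A minimal illustration, with names invented for the example:

```rust
// Not Copy and not Clone: moving a Token consumes it.
struct Token(String);

fn consume(t: Token) -> String {
    t.0
}

fn main() {
    let t = Token("once".to_string());
    let s = consume(t);
    // Calling `consume(t)` a second time would be a compile error:
    // `t` was moved. A graded modal system would generalise this fixed
    // "at most 1" to arbitrary usage counts (0, 1, ...).
    assert_eq!(s, "once");
    println!("ok");
}
```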

Effects and handlers seem quite handy for:

  • aspect-oriented programming without dependency injection
  • supporting asynchronous scheduling without bifurcating the language ecosystem (see Multicore OCaml)

In order to ensure handlers are implemented as efficiently as possible, we will most likely want to ensure that they are fused with a high degree of reliability. When handlers are statically known, we will want performance guarantees similar to those that Rust's Future and Iterator APIs enjoy, and this should be easy for programmers to reason about.

  • fused-effects: A fast, flexible, fused effect system for Haskell [Hackage] [Github]
  • polysemy: Higher-order, low-boilerplate, zero-cost free monads [Hackage] [Github]
  • effective-rust: Algebraic Effects Library for Rust [Github]
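
The fusion behaviour we would want from statically known handlers is the one Rust's Iterator adaptors already exhibit: each adaptor is a distinct parametrised type, so the whole chain is monomorphised and inlined into a single loop with no intermediate allocations. A small sketch:

```rust
fn main() {
    // This chain builds a nested adaptor type along the lines of
    // Map<Filter<Range<i32>, _>, _>, which the compiler fuses into
    // one loop -- there are no intermediate collections.
    let sum: i32 = (0..10)
        .filter(|n| n % 2 == 0)
        .map(|n| n * n)
        .sum();

    // evens 0, 2, 4, 6, 8 -> squares 0, 4, 16, 36, 64
    assert_eq!(sum, 120);
    println!("ok");
}
```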

Type errors are warnings, hole-driven development

We should continue compiling even in the presence of type errors. Invalid code will result in fragments of code that abort the program with the type error - kind of like GHC's 'defer type errors' thing.
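
As a hypothetical sketch of what the elaborated output might look like (the error message and fragment are invented for illustration, not Pikelet output): well-typed fragments compile as usual, while an ill-typed fragment is replaced by code that aborts with the original type error at the point of use, much like GHC's -fdefer-type-errors.

```rust
fn main() {
    // Checked fragments compile and run normally.
    let well_typed = 1 + 2;
    assert_eq!(well_typed, 3);

    let run_bad_fragment = false;
    if run_bad_fragment {
        // The ill-typed fragment was elaborated to a runtime abort
        // carrying the deferred type error message.
        panic!("type error: expected `Nat`, found `String`");
    }
    println!("ok");
}
```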

Prepare for a verified core

While full formal verification is still out of reach for many of us programming language implementors, and verified compilers for dependently typed programming languages are out of reach even for researchers, we can at least prepare for it somewhat:

  • Specify the type system during implementation with a markdown spec to begin with, later perhaps using some kind of structured specification format
  • Type preserving IRs, down to link time.
    • Compiling with Dependent Types [Thesis]

Stretch goals

These are things that may be trickier, but might be worth exploring in the future:

Transition from purely textual editing to projectional, structured editing

Many features of existing programming languages are limited by the standard plain-text representation of code. As we try to get more power out of modalities for describing capabilities at a finer-grained level, we might discover that an ASCII-based syntax is too cumbersome compared to what we could have in a structured representation. That said, we can't really expect everyone to transition overnight, so perhaps keeping the option of a plain ASCII syntax from the start might be helpful.

Lightweight verification

While building a proof assistant is not yet a core goal (dependent types are more of interest for metaprogramming and polymorphism), it might be nice in the future to allow for parts of programs to be verified, for example using property testing, solvers, tactics, or manual proofs.
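
As a minimal sketch of the property-testing end of that spectrum (hand-rolled, no external crates; the generator constants are the standard Knuth/Numerical Recipes LCG multipliers): generate random inputs and check that a property holds on each.

```rust
// Property under test: reversing a list twice yields the original list.
fn reverse<T: Clone>(xs: &[T]) -> Vec<T> {
    xs.iter().rev().cloned().collect()
}

fn main() {
    let mut seed: u64 = 42;
    // A tiny linear-congruential generator for deterministic test data.
    let mut next = move || {
        seed = seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        seed
    };

    for _ in 0..100 {
        let len = (next() % 8) as usize;
        let xs: Vec<u64> = (0..len).map(|_| next() % 1000).collect();
        assert_eq!(reverse(&reverse(&xs)), xs, "property failed for {:?}", xs);
    }
    println!("ok");
}
```

Solvers, tactics, and manual proofs would then be reserved for the properties that random testing cannot establish.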

Universe levels with a lifting operator

I'd like to use a McBride-style universe lifting operator, but it seems to be challenging in the presence of normalisation-by-evaluation. We might have to drop this for the initial version of Pikelet and fall back to type-in-type.

Implementation plan

  1. create a new branch and move the current implementation (including the docs) into an 'attic' subdirectory
  2. begin an end-to-end implementation, going from the concrete syntax, to a compiled binary/JIT (via cranelift), with end-to-end integration tests from the start
  3. begin adding higher level features building up to a small, dependently typed language
  4. ...

The Pikelet codebase should be well-documented, with integration tests from the start, and a specification and language guide that we grow during development.

Implementation questions

  • How do we break up these goals and achieve them incrementally?
  • How do we make it easy for people to contribute in parallel?
  • How can we make Pikelet usable and dog-foodable from an early stage? Perhaps we could use it as a game scripting language?
  • What implementation language should we use? Rust? Haskell? OCaml? Coq?
  • Should we use Salsa for incremental compilation, or would that tie us to Rust too heavily from the start?
  • How do we prepare for bootstrapping, to allow us to avoid the issue of a "trusting trust" attack?
  • It would be nice to support a great editor experience from the start, but an LSP-based solution would eventually become obsolete given the goal of structured editing. Should we try to support the LSP from the start anyway?

Caveats

Looking through the New Pikelet document, I think what you want is very desirable and also definitely a humongous research endeavor.

Aaron Weiss

The following is rather ambitious (perhaps too ambitious), and might not all be achievable. The goal is more to figure out how different language features work together, and to inspire future research in the area of dependent types for systems programming. Soundness problems will probably be inevitable in the mid term - we'd have to wear that, and mitigate issues as they come up.

@jorge-jbs

I recently discovered Pikelet and your ideas are great! I hope the future of systems programming is a functional language. Here are some thoughts on the article. 🙂

Goal and vision

Dependently typed functional programming is the future! And I don't think it needs to be at odds with low-level procedural programming.

In fact, I think (safe) systems programming in a functional language cannot be done without dependent types. If we want to guarantee that some very low level code is safe, proofs are a good way to do it (in the absence of other mechanisms).

  • generating lookup tables at compile time
  • constexpr in C++, D, Rust, Nim, Zig, Jai (Zig's comptime is pretty interesting)
  • languages with effects and handlers don't optimise well for the 'statically known handlers' case - lack of focus on 'handler fusion'

Edwin Brady has worked on this for Idris in Scrapping your Inefficient Engine: Using Partial Evaluation to Improve Domain-Specific Language Implementation. Partial evaluation could solve:

  • Generating constants at compile time
  • Inlining
  • Specialization of type classes. We could have type classes be records by default, and only specialize them if the programmer wants to. Having specialization be optional would reduce compile times (especially for -O0) and code size (in the embedded world, size is very important, and some level of indirection is not tragic).
  • Monomorphization (similar to specialization of type classes).
  • As you say: "statically known handlers".
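
The "type classes as records by default, specialised on request" idea can be sketched in Rust terms (the `Shape`/`Square` names are invented for the example): a trait used through a dictionary (`dyn`) keeps one shared compiled body, while using it generically monomorphises a specialised copy per type, as partial evaluation would.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Dictionary passing: one compiled body, dynamic dispatch, smaller code.
fn area_dyn(s: &dyn Shape) -> f64 {
    s.area()
}

// Monomorphised: a specialised body per concrete `S`, faster but larger.
fn area_mono<S: Shape>(s: &S) -> f64 {
    s.area()
}

fn main() {
    let sq = Square(3.0);
    assert_eq!(area_dyn(&sq), 9.0);
    assert_eq!(area_mono(&sq), 9.0);
    println!("ok");
}
```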

Tooling is often an afterthought, for example:

[...]

  • package management

Package management shouldn't be an afterthought, but I don't like how other languages like Zig and Rust are doing it. If Pikelet is going to be a systems programming language it should integrate seamlessly with C. Rust has its own build system and package manager, and Zig has its own build system and not yet a package manager. But you can't integrate them with C code in a straightforward way. I think using the meson build system would give tremendous advantages interoperating with C with respect to other languages. It is a good build system, it is already developed, has many features, and it even has a somewhat basic package manager built-in. Even if Pikelet wants to have its own build system and package manager, I think emitting meson files and letting meson do the hard work would be the right way to go.

Implementation questions

[...]

  • Should we use Salsa for incremental compilation, or would that tie us to Rust too heavily from the start?

Related to the previous comment. Meson uses ninja under the hood, a very fast build system supporting incremental compilation.

Hard to integrate current programming languages with academic work

Haskell was made to unify research on lazy functional languages into one language. If Pikelet becomes the language to research on systems functional programming, that would be great :)

@brendanzab
Author

Hello, and thanks for your thoughts!

Edwin Brady has worked on this for Idris in Scrapping your Inefficient Engine: Using Partial Evaluation to Improve Domain-Specific Language Implementation. Partial evaluation could solve:

Yup, aware of this paper, it's pretty cool! In practice I've found it to be a bit flaky and hard to use - you get lots of warnings about it failing to partially evaluate. Having stage annotations to guide the partial evaluation and a richer language around quoting and unquoting things might help this.

Package management shouldn't be an afterthought, but I don't like how other languages like Zig and Rust are doing it. If Pikelet is going to be a systems programming language it should integrate seamlessly with C. Rust has its own build system and package manager, and Zig its own build system and not yet a package manager. But you can't integrate them with C code in a straightforward way. I think using the meson build system would give tremendous advantages interoperating with C with respect to other languages. It is a good build system, it is already developed, has many features, and it even has a somewhat basic package manager built-in. Even if Pikelet wants to have its own build system and package manager, I think emitting meson files and letting meson do the hard would be the right way to go.

I'm a big fan of the UX of cargo, but I agree that it makes certain things like integrating with external package managers harder, so perhaps that is something we should be bearing in mind from the start. If I'm honest, I'm not really sure about tying myself to a specific, complicated build system like Meson, especially one written in Python 😅! Note that we might also want to think about other build systems too, like nix, etc.

@jorge-jbs

Just wanted to comment on this (although a little bit late 😅)

I'm a big fan of the UX of cargo, but I agree that it makes certain things like integrating with external package managers harder, so perhaps that is something we should be bearing in mind from the start. If I'm honest, I'm not really sure about tying myself to a specific, complicated build system like Meson, especially one written in Python 😅! Note that we might also want to think about other build systems too, like nix, etc.

It's true that Meson is somewhat complicated, but compiling C and C++ code is that complicated. A systems programming language needs to talk to C very easily, so having Meson already doing that is quite an advantage. Zig goes so far as to include a complete C compiler inside it. But I don't know what's your vision with interoperating with C.

And regarding Nix, it is actually quite useful, and using it alongside Meson would be a perfect match: download the compiler and prebuilt packages with Nix, and compile your project and dependencies (those that aren't prebuilt) with Meson.

I hope I can get to work with Pikelet some time, but for now I'm keeping a close eye on it. 🙂

@dumblob

dumblob commented Jul 16, 2021

Hi @brendanzab! Thanks for this informative write-up! From the text it seems to me you might not know about Kind, IMHO the state-of-the-art practical dependently typed language (and proof assistant at the same time). Hint hint - they have some nice web games written in it 😉.

Btw. Kind is a bit different than all other "proof" languages in that it doesn't need explicit support for provably correct parallelism because it's maximally parallel (mathematically guaranteed) by default. Though there is this issue of mapping all these millions of parallel operations to our low-parallelism computation devices (tens/hundreds/thousands of CPU cores) efficiently (and no, round-robin really is not efficient except for the edge case of mapping everything to a single CPU core). But that's not a fundamental limitation - it's "just" about finding the right abstractions 😉.

One more thing I'd like to point out is the elegance of how V lang approaches many of the shortcomings of the "traditionally imperative" languages you mentioned (C++, D, Rust, Nim, Zig, Jai). V lang and its ecosystem actually provide quite unmatched potential for correctness proving, staged metaprogramming, etc. due to its vibrant (i.e. usable in practice) but highly restricted way of doing things (many of the restrictions are not immediately visible, but I can assure you there are significantly more than in Zig/Rust/Nim/Go, as examples of languages making it practically "impossible" to prove correctness due to their low restrictiveness - yes, both Zig and Rust are not restrictive enough IMHO 😉). But nobody has started the effort yet, if I'm not mistaken.
