@eamsden
Last active March 27, 2023 22:08

New Mars storyboard

Elevator pitch

An Urbit runtime with near-machine execution speed and NAS-equivalent addressable storage enables most classes of personal applications to be built directly on Urbit.

Wins

Near-machine execution speed

Currently, standard library functions that implement highly iterated loops, such as turn, are jetted to remove the arm invocation overhead that arises from indirect jumps, dynamic jet matching, and allocation for parameters. The majority of overhead in hot codepaths (such as Ames) is in the arm invocations themselves, not in the code of the arms.

This is replaced by three mechanisms: static jet matching, which removes the runtime overhead of jet matching whether or not a jet is found; mostly-direct arm invocations (always direct for idiomatic loops in Hoon), which remove the prediction overhead of indirect jumps; and bump-allocated and/or register-mapped subjects, which either eliminate memory allocation overhead or make it equivalent to parameter passing on the stack. Taken together, these approaches should eliminate nearly all overhead in Nock execution and make Urbit performance comparable to native implementations of equivalent programs (modulo algorithmic mismatches, for which jets are still necessary).
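As a rough illustration of why static matching removes per-call overhead, the sketch below compares probing a jet table on every call with resolving the jet once before the loop runs. Everything in it is hypothetical: the `JetTable` class, the string labels, and the lookup counter are illustrative stand-ins, not how vere or New Mars identify jets.

```python
class JetTable:
    """Hypothetical jet registry that counts how often it is probed."""

    def __init__(self, jets):
        self.jets = dict(jets)
        self.lookups = 0

    def find(self, label):
        self.lookups += 1
        return self.jets.get(label)


def slow_add(a, b):
    """Fallback: add by repeated increment, standing in for un-jetted code."""
    for _ in range(b):
        a += 1
    return a


def call_dynamic(table, label, fallback, *args):
    """Dynamic matching: every call probes the table, hit or miss."""
    jet = table.find(label)
    return (jet or fallback)(*args)


def link_static(table, label, fallback):
    """Static matching: probe once ahead of time, return a direct callable."""
    return table.find(label) or fallback


def compare(calls=1000):
    dynamic = JetTable({"add": lambda a, b: a + b})
    dynamic_results = [
        call_dynamic(dynamic, "add", slow_add, n, 2) for n in range(calls)
    ]

    static = JetTable({"add": lambda a, b: a + b})
    add = link_static(static, "add", slow_add)  # resolved before the loop
    static_results = [add(n, 2) for n in range(calls)]

    assert dynamic_results == static_results
    return dynamic.lookups, static.lookups


print(compare())  # (1000, 1)
```

The results are identical; only the number of table probes differs, and the static version pays the same single probe whether or not a jet exists.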

Components: Codegen and 2stackz allocator

NAS-equivalent addressable storage

The current vere is a 32-bit program which addresses a 2, 4, or possibly 8GB "loom" which represents a unified single-level store for both intermediate computational results and persistent state between events. This is implemented by anonymously remapping pages upon writing to them, and then re-writing dirty pages back to the snapshot at regular intervals for durability. The smaller pointer size and the anonymous mapping of "dirty" pages (though primarily the former) limit the size of the loom, necessitating off-Urbit ("off-loom") storage for large objects, especially images and multimedia.

This is replaced by a 64-bit (practically, 47-bit due to limitations of underlying CPU and operating system architectures) arena for persistent data, using a copy-on-write strategy to ensure durability while ensuring all in-use pages can remain mapped to backing disk storage at all times. This retains the single-level store while permitting the underlying OS's virtual memory system to evict pages at will, uncoupling the size of Urbit's data storage from pointer size limitations and available system RAM and swap space. In practice, Urbit can now address many tens of terabytes of locally-stored data.
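For a sense of scale, the arithmetic below shows the upper bound implied by 47 usable address bits. The 4 GiB figure is plain 32-bit byte addressing, included only for comparison; it is not a claim about how the loom encodes its pointers.

```python
GIB = 2**30
TIB = 2**40

byte_addressed_32 = 2**32   # bytes reachable through a 32-bit byte address
arena_47 = 2**47            # bytes reachable with 47 usable address bits

print(byte_addressed_32 // GIB)  # 4
print(arena_47 // TIB)           # 128
```

An upper bound of 128 TiB is consistent with the "many tens of terabytes" figure once some of the address space is reserved for other uses.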

Components: persistent memory arena

Naming

"New Mars" is an excellent skunkworks name but a terrible product name. Perhaps "Ares" is better for a product. Consultation welcome. BIKESHEDDING ENDS HERE

Current status: R&D

Note: Technical risk is a low-resolution estimate of the probability that a particular component produces a show-stopping technical problem, one that returns the project to R&D or makes it demonstrably infeasible. Implementation effort is a low-resolution estimate of the developer effort required to complete the component and integrate it into a releasable New Mars/Ares product.

Codegen

Most active work is on code generation, driven by ~ritpub-sipsyl. Subject knowledge analysis, destination-driven linearization, and registerization combine to turn Nock into a nearly zero-cost abstraction. Active implementation effort is focused on the yak-shaving always involved in generating low-level code, and on the problem of propagating subject-knowledge analysis across code-as-state boundaries (arvo-to-vanes, gall-to-agents). Codegen code is currently in Hoon, which requires some bootstrapping story for how codegen code is run and the results supplied to the runtime for execution.

A potential technical risk mitigation is to initially target a bytecode rather than machine code for the lowering of linearized Nock. This eliminates concerns about detecting and managing platform portability and decreases implementation effort and long-tail bugfixes and optimizations significantly.
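To make the bytecode option concrete, here is a minimal sketch that lowers a small subset of Nock (opcodes 0, 1, 4, 5, 6 and autocons) to a linear stack-machine bytecode and interprets it. The instruction names, the tuple encoding of nouns, and the stack-machine design are all assumptions made for illustration; this is not the New Mars bytecode, and it omits everything that changes the subject (opcodes 2 and 7 through 11).

```python
def slot(axis, noun):
    """Nock tree addressing: 1 is the whole noun, 2n the head of n, 2n+1 the tail."""
    if axis < 1:
        raise ValueError("axis must be at least 1")
    for bit in bin(axis)[3:]:      # bits after the leading 1, high to low
        noun = noun[int(bit)]      # 0 selects the head, 1 the tail
    return noun


def compile_formula(formula, code):
    """Append stack-machine instructions for `formula` to `code`."""
    op, rest = formula
    if isinstance(op, tuple):      # autocons: [[a b] c]
        compile_formula(op, code)
        compile_formula(rest, code)
        code.append(("cons", None))
    elif op == 0:
        code.append(("slot", rest))
    elif op == 1:
        code.append(("const", rest))
    elif op == 4:
        compile_formula(rest, code)
        code.append(("inc", None))
    elif op == 5:
        compile_formula(rest[0], code)
        compile_formula(rest[1], code)
        code.append(("eq", None))
    elif op == 6:
        test, (yes, no) = rest
        compile_formula(test, code)
        branch = len(code)
        code.append(None)          # placeholder, patched once targets are known
        compile_formula(yes, code)
        skip = len(code)
        code.append(None)
        code[branch] = ("jump_if_no", len(code))
        compile_formula(no, code)
        code[skip] = ("jump", len(code))
    else:
        raise NotImplementedError(f"opcode {op} is outside this sketch")
    return code


def run(code, subject):
    """Interpret the linear bytecode against a fixed subject."""
    stack, pc = [], 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "slot":
            stack.append(slot(arg, subject))
        elif op == "const":
            stack.append(arg)
        elif op == "inc":
            atom = stack.pop()
            if not isinstance(atom, int):
                raise TypeError("cannot increment a cell")
            stack.append(atom + 1)
        elif op == "eq":
            b, a = stack.pop(), stack.pop()
            stack.append(0 if a == b else 1)   # Nock: 0 is yes, 1 is no
        elif op == "cons":
            tail, head = stack.pop(), stack.pop()
            stack.append((head, tail))
        elif op == "jump":
            pc = arg
        elif op == "jump_if_no":
            test = stack.pop()
            if test not in (0, 1):
                raise ValueError("branch test must be 0 or 1")
            if test == 1:
                pc = arg
    return stack.pop()


# [6 [5 [0 2] [0 3]] [4 0 2] [1 0]]: if head equals tail, increment head, else 0
formula = (6, ((5, ((0, 2), (0, 3))), ((4, (0, 2)), (1, 0))))
code = compile_formula(formula, [])
print(run(code, (7, 7)))  # 8
print(run(code, (7, 8)))  # 0
```

The conditional compiles to two jumps over straight-line code, so the interpreter loop never recurses on the formula tree. The same shape could later be lowered to machine code per platform.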

Technical risk: medium to low

Implementation effort: high

2stackz allocator

The allocator is fully implemented, and requires hardening and benchmarking.
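For readers unfamiliar with bump allocation, the sketch below shows the general technique: allocation advances one offset, and everything allocated since a saved mark is freed in a single step. It is a generic illustration only. The `BumpArena` class and its methods are hypothetical and do not describe the 2stackz design.

```python
class BumpArena:
    """A fixed-size arena that allocates by advancing a single offset."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.top = 0  # offset of the next free byte

    def alloc(self, nbytes):
        """Return the offset of a fresh region, or raise if the arena is full."""
        if self.top + nbytes > len(self.memory):
            raise MemoryError("arena exhausted")
        offset = self.top
        self.top += nbytes
        return offset

    def mark(self):
        """Remember the current top, e.g. on entering a call frame."""
        return self.top

    def release(self, mark):
        """Free everything allocated since `mark` in constant time."""
        self.top = mark


arena = BumpArena(64)
frame = arena.mark()
first = arena.alloc(16)
second = arena.alloc(16)
print(first, second)   # 0 16
arena.release(frame)
print(arena.alloc(8))  # 0
```

Allocation and bulk release are both a single integer update, which is the property that makes allocation cost comparable to pushing parameters on a stack.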

Technical risk: low

Implementation effort: low

Persistent memory arena

This is actively under development, primarily driven by ~finmep-lanteb. The primary innovation is the application to the Urbit runtime of well-understood techniques in database persistence, with pointers instead of semantically higher-level keys as indexes. Open questions no longer implicate feasibility but only the complexity of a successful implementation. The primary open question is one of virtual memory system pressure: can the underlying OS handle the number of virtual memory mappings contained in the table? Or is an on-demand mapping strategy necessary? The latter option is feasible and supported by existing implementation decisions (in particular the use of a B+ tree for the page directory and an always-file-mapped copy-on-write strategy for memory mapping) but represents extra implementation effort.
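The copy-on-write idea can be sketched in a few lines. In this toy model a write never overwrites a committed page: it appends a new copy and updates a working page directory, and a commit publishes that directory. All names are hypothetical, a `dict` stands in for the B+ tree page directory, a `bytearray` stands in for the backing file, and there is no real memory mapping or space reclamation.

```python
PAGE = 4096


class CowStore:
    """Toy copy-on-write page store with a committed and a working directory."""

    def __init__(self):
        self.disk = bytearray()   # stand-in for the backing file
        self.committed = {}       # page number -> offset, as of the last commit
        self.working = {}         # page number -> offset, including new writes

    def read(self, page):
        offset = self.working.get(page)
        if offset is None:
            return bytes(PAGE)    # never-written pages read as zeroes
        return bytes(self.disk[offset:offset + PAGE])

    def write(self, page, data):
        if len(data) != PAGE:
            raise ValueError("writes are whole pages in this sketch")
        offset = len(self.disk)   # append, leaving committed copies untouched
        self.disk += data
        self.working[page] = offset

    def commit(self):
        """Publish the working directory as the new durable snapshot."""
        self.committed = dict(self.working)

    def crash(self):
        """Simulate a restart: uncommitted directory entries are lost."""
        self.working = dict(self.committed)


store = CowStore()
store.write(0, b"a" * PAGE)
store.commit()
store.write(0, b"b" * PAGE)
print(store.read(0)[:1])  # b'b'
store.crash()
print(store.read(0)[:1])  # b'a'
```

Because the committed copy of page 0 is never modified, a crash before commit falls back to the previous snapshot without any repair step.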

Technical risk: low

Implementation effort: medium

Non-R&D technical requirements

Jets

One of the hairier yaks to shave to turn New Mars into a product is to rewrite a sufficient set of jets. New Mars' code generation system for Nock should eliminate the need for jets to ameliorate function call and/or looping overheads. Jets will still be needed for algorithmic or numeric hardness, e.g. arithmetic, bit-twiddling, encryption. There are over 350 jets in the current vere, ranging from the absolutely necessary (+add) to the absolutely obsoleted (+turn).
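The +add case shows why some jets remain necessary. Nock's only arithmetic primitive is increment, so addition built from it takes time proportional to the size of its arguments, while a jet is a single native operation. The sketch below models this in Python and checks that the jet agrees with the slow definition. The function names and the `check_jet` harness are illustrative assumptions, not part of any runtime.

```python
def inc(n):
    """The one arithmetic primitive Nock provides (opcode 4)."""
    return n + 1


def dec(n):
    """Decrement by counting up from zero until the successor matches."""
    if n == 0:
        raise ValueError("decrement underflow")
    i = 0
    while inc(i) != n:
        i = inc(i)
    return i


def add(a, b):
    """Addition from inc and dec only: move units from a to b one at a time."""
    while a != 0:
        a, b = dec(a), inc(b)
    return b


def add_jet(a, b):
    """Native replacement: must return exactly what `add` returns."""
    return a + b


def check_jet(slow, jet, samples):
    """Return the first sample where the jet and the slow path disagree, or None."""
    for args in samples:
        if slow(*args) != jet(*args):
            return args
    return None


samples = [(a, b) for a in range(20) for b in range(20)]
print(check_jet(add, add_jet, samples))  # None
```

A test-and-implement pass over the jet list could use a comparison of this shape to confirm each rewritten jet against its slow definition on small inputs.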

Event logging

The only well-tested approach is logging events to LMDB as the current vere does, so we will initially re-use this approach.

Frontend

We will need to re-implement bootstrapping from a boot sequence (contained in a pill), IPC to communicate with Urth, an event loop which receives events via IPC and dispatches them for logging and processing, and some form of pier import from vere.
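The event loop and its relationship to the log can be sketched as follows. A `deque` stands in for the IPC channel, a list stands in for the LMDB event log, and `poke` is a hypothetical pure transition function; none of these names come from vere or New Mars. This sketch appends to the log before evaluating, which is one possible ordering rather than a statement of the intended design.

```python
from collections import deque


def poke(state, event):
    """Hypothetical pure transition: (state, event) -> (new state, effects)."""
    new_state = state + event
    return new_state, [f"total={new_state}"]


def run_loop(inbox, log, state):
    """Drain the inbox: log each event, evaluate it, and collect effects."""
    released = []
    while inbox:
        event = inbox.popleft()   # receive (stand-in for IPC)
        log.append(event)         # log
        state, effects = poke(state, event)   # evaluate
        released.extend(effects)
    return state, released


def replay(log, state):
    """Rebuild state from the log alone, discarding effects."""
    for event in log:
        state, _ = poke(state, event)
    return state


log = []
live_state, effects = run_loop(deque([1, 2, 3]), log, 0)
print(live_state, effects)  # 6 ['total=1', 'total=3', 'total=6']
print(replay(log, 0))       # 6
```

Because `poke` is deterministic, replaying the log from the initial state reproduces the live state, which is the property that boot-time replay and pier import both rely on.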

Hypothetical storyboard

| Task | Time | P |
|---|---|---|
| Jets | | |
| Tabulate the jets in current vere | 4w | Y |
| Implement jets known absolutely to be necessary | ??? | Y |
| Test-and-implement to discover remaining necessary jets | 8w | N |
| Codegen | | |
| Bytecode generator and interpreter for linearized Nock | 4w | N |
| Memory representation of code table keyed by subject knowledge | 3w | N |
| SKA propagation through dynamic dispatch (vanes/agents/threads) | 12w | N |
| Bootstrapping: compile codegen code to bytecode | 8w | N |
| Virtualization and error handling | 4w | N |
| 2stackz | | |
| Harden | 3w | Y |
| Integrate with PMA | 2w | N |
| PMA | | |
| Implement B+ tree page index | 5w | N |
| Garbage collection from stack root | 4w | N |
| Harden | 3w | Y |
| Frontend | | |
| Urth IPC protocol | 3w | N |
| Bootstrapping from a pill | 5w | N |
| Implement an event loop (receive IPC events, log, evaluate) | 3w | N |
| Event Log | | |
| Re-implement Vere LMDB Event log | 4w | N |
| Compatibility | | |
| Migration tooling (cue snapshot + metadata) vere->New Mars | 2w | N |

eamsden commented Jan 26, 2023

Notes from meeting with ~wicdev-wisryt ~1.26.2023

~rovnys-ricfer's concerns

  • pre-boot validation -- rewrite in Rust? probably need to, since it allocates nouns and calls Nock; potential alternative: separate into its own process, run old C code, then replace process with New Mars
  • initial boot, replay -- rewrite in Rust? probably need to, since it allocates nouns and calls Nock
  • main.c event loop -- replace libuv, rewrite in Rust?
  • IPC handling -- rewrite in Rust?
  • format and stream slogs to Urth -- rewrite in Rust?
  • rewrite enough jets to boot
  • need to rewrite lots of noun manipulation functions, which is a grind -- u3x_cell, u3nc, etc.
  • repeated copies of large nouns in the 2stackz allocator might be too slow
  • large nouns that get moved out of 2stackz into the heap might not be structurally shared, which might cause memory duplication if they're edited a lot (I could be wrong about this)
  • platform dependence of delicate mmap-related syscalls could pose difficulties for the PMA (I could be wrong here too)
  • +cue performance needs to be good for initial boot and file mounting; might require a nontrivial rewrite in Rust or FFI into old C code
  • build system complications from being a polyglot system
  • |meld and |mass need to be written
  • memoization hashtable needs to work, possibly need cache reclamation too

Need to add to estimation:

  • implementation of jet dashboard

@lukestiles

This is looking really good. I'd like to see a point on testing. One suggestion is to treat the event log like a database: have both runtimes receive the same input and write event logs, compare the logs, and fail New Mars when they differ. This could even be set up at the beginning and be part of CI.
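One way to read this suggestion is as differential testing: feed identical events to a reference implementation and a candidate, and stop at the first event where their outputs differ. The sketch below assumes each runtime can be modelled as a transition function returning a new state and an output; `first_divergence` and both example functions are hypothetical.

```python
def first_divergence(reference, candidate, events, initial_state):
    """Return the index of the first event whose outputs differ, or None."""
    ref_state = cand_state = initial_state
    for index, event in enumerate(events):
        ref_state, ref_out = reference(ref_state, event)
        cand_state, cand_out = candidate(cand_state, event)
        if ref_out != cand_out:
            return index
    return None


def reference(state, event):
    """Stand-in for the existing runtime: running total as output."""
    total = state + event
    return total, total


def buggy_candidate(state, event):
    """Stand-in for a new runtime with a bug: output saturates at 10."""
    total = state + event
    return total, min(total, 10)


events = [1, 2, 3, 4, 5]
print(first_divergence(reference, reference, events, 0))        # None
print(first_divergence(reference, buggy_candidate, events, 0))  # 4
```

In CI, a non-`None` result would fail the build and identify the event to replay under a debugger.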
