@VictorTaelin
Created February 18, 2024 01:16
giga prompt to test GPT-4 vs Gemini 1.5
// HVM2
// README.md
# HVM-Core: a parallel Interaction Combinator evaluator
HVM-Core is a parallel evaluator for extended [Symmetric Interaction Combinators](https://www-lipn.univ-paris13.fr/~mazza/papers/CombSem-MSCS.pdf).
We provide a raw syntax for specifying nets and a Rust implementation that
achieves up to **10 billion rewrites per second** on an Apple M3 Max CPU. HVM's
optimal evaluation semantics and concurrent model of computation make it a great
compile target for high-level languages seeking massive parallelism.
HVM-Core will be used as the compile target for
[HVM](https://github.com/higherorderco/hvm) in its upcoming update.
## Usage
Install HVM-Core as:
```
cargo install hvm-core
```
Then, run the interpreter as:
```
hvmc run file.hvmc -s
```
Or compile it to a faster executable:
```
hvmc compile file.hvmc
./file
```
Both versions will compute the program's normal form using all available cores.
## Example
HVMC is a low-level compile target for high-level languages. It provides a raw
syntax for wiring interaction nets. For example:
```javascript
@add = (<+ a b> (a b))
@sum = (?<(#1 @sumS) a> a)
@sumS = ({2 a b} c)
  & @add ~ (e (d c))
  & @sum ~ (a d)
  & @sum ~ (b e)
@main = a
  & @sum ~ (#24 a)
```
The file above implements a recursive sum. As you can see, its syntax isn't
meant to be very human readable. Fortunately, we have
[HVM-Lang](https://github.com/HigherOrderCO/hvm-lang), a tool that generates
`.hvmc` files from a familiar functional syntax. On HVM-Lang, you can write
instead:
```javascript
add = λa λb (+ a b)
sum = λn match n {
  0   : 1
  1+p : (add (sum p) (sum p))
}
main = (sum 24)
```
This compiles to the first program via `hvml compile main.hvm`. For more
examples, see the [`/examples`](/examples) directory. If you do want to
understand the hardcore syntax, keep reading.
## Language
HVM-Core's textual syntax represents interaction combinators via an AST:
```
<TERM> ::=
  <ERA> ::= "*"
  <CON> ::= "(" <TERM> " " <TERM> ")"
  <TUP> ::= "[" <TERM> " " <TERM> "]"
  <DUP> ::= "{" <label> " " <TERM> " " <TERM> "}"
  <REF> ::= "@" <name>
  <U60> ::= "#" <value>
  <OP2> ::= "<" <op> " " <TERM> " " <TERM> ">"
  <MAT> ::= "?" "<" <TERM> " " <TERM> ">"
  <VAR> ::= <name>

<NET> ::=
  <ROOT> ::= <TERM>
  <RDEX> ::= "&" <TERM> "~" <TERM> <NET>

<BOOK> ::=
  <DEF> ::= "@" <name> "=" <NET> <BOOK>
  <END> ::= <EOF>
```
As you can see, HVMC extends the original system with some performance-relevant
features, including top-level definitions (closed nets), unboxed 60-bit machine
integers, numeric operations and numeric pattern-matching.
- `ERA`: an eraser node, as defined on the original paper.
- `CON`: a constructor node, as defined on the original paper.
- `TUP`: a tuple node. Has the same behavior as `CON`.
- `DUP`: a duplicator, or fan node, as defined on the original paper.
  Additionally, it can include a label. Dups with different labels will commute.
  This allows for increased expressivity (nested loops).
- `VAR`: a named variable, used to create a wire. Each name must occur twice,
  denoting both endpoints of a wire.
- `REF`: a reference to a top-level definition, which is itself a closed net.
  That reference is unrolled lazily, allowing recursive functions to be
  implemented without the need for Church numerals and the like.
- `U60`: an unboxed 60-bit unsigned integer.
- `OP2`: a binary operation on u60 operands.
- `MAT`: a pattern-matching operator on u60 values.
Note that terms form a tree-like structure. Yet, interaction combinators are not
trees, but graphs; terms aren't enough to express all possible nets. To fix
that, we provide the `& <TERM> ~ <TERM>` syntax, which connects the top-most
main port of each tree. This allows us to build any closed net with a single
free wire. For example, to build the closed net:
```
R       .............
:       :           :
:      /_\         /_\
:   ..:   :..   ..:   :..
:   :       :   :       :
:  /_\     /_\ /_\     /_\
:..: :     : :.: :     :.:
     *     :.....:
```
We could use the following syntax:
```
@main
  = R
  & ((R *) (x y))
  ~ ((y x) (z z))
```
Here, `@main` is the name of the closed net, `R` is used to denote its single
free wire, each CON node is denoted as `(x y)`, and the ERA node is represented
as a `*`. The wires from an aux port to a main port are denoted by the tree
hierarchy, the wires between aux ports are denoted by named variables, and the
single wire between main ports is denoted by the `& A ~ B` syntax. Note this
always represents an active pair (or redex)!
## CPU Evaluator
HVMC's main evaluator is a Rust package that runs on the CPU, although GPU
versions are in development (see below). It is completely eager, which means it
will reduce *every* generated active pair (redex) in an ultra-greedy,
massively-parallel fashion.
The evaluator works by keeping a vector of current active pairs (redexes) and,
for each redex in parallel, performing local "interaction", or "graph rewrite",
as described below. To distribute work, a simple task stealing queue is used.
Note that, due to HVM's ultra-strict evaluator, languages targeting it should
convert top-level definitions to
[supercombinators](https://en.wikipedia.org/wiki/Supercombinator), which enables
recursive definitions to halt. HVM-Lang performs this transformation before
converting to HVM-Core.
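As a rough sketch of that eager loop, here is a toy version in Rust that replaces HVMC's actual work-stealing scheduler with a single mutex-guarded bag (all names and types here are illustrative, not HVMC's API):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

type Ptr = u64;
type Redex = (Ptr, Ptr);

// Toy version of the eager reduction loop: each worker repeatedly pops a redex
// from a shared bag and "rewrites" it. Here the rewrite just counts the
// interaction; a real evaluator would push any newly created redexes back.
fn reduce_all(initial: Vec<Redex>, workers: usize) -> usize {
  let bag = Arc::new(Mutex::new(initial));
  let count = Arc::new(Mutex::new(0usize));
  let handles: Vec<_> = (0..workers)
    .map(|_| {
      let bag = Arc::clone(&bag);
      let count = Arc::clone(&count);
      thread::spawn(move || loop {
        let redex = bag.lock().unwrap().pop();
        match redex {
          Some(_r) => *count.lock().unwrap() += 1, // interact(_r) would go here
          None => break,
        }
      })
    })
    .collect();
  for h in handles {
    h.join().unwrap();
  }
  let n = *count.lock().unwrap();
  n
}
```

A single shared mutex is, of course, exactly the contention the real evaluator avoids; it only stands in for the task-stealing queue mentioned above.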
## GPU Evaluator
The GPU evaluator is similar to the CPU one, except for two main differences:
"1/4" rewrites and a task-sharing grid. For example, on NVIDIA's RTX 4090, we
keep a grid of 128x128 redex bags, where each bag contains redexes to be
processed by a "squad", which consists of 4 threads, each one performing "1/4"
of the rewrite, which increases the granularity. This allows us to keep `16,384`
active squads, for a total of `65,536` active threads, which means the maximum
degree of parallelism (65k) is achieved at just 16k redexes. Visually:
```
REDEX ::= (Ptr32, Ptr32)

A1 --|\        /|-- B2
     | |A0--B0| |
A2 --|/        \|-- B1

REDEX_BAG ::= Vec<REDEX>

[(A0,B0), (C0,D0), (E0,F0), ...]
REDEX_GRID ::= Matrix<128, 128, REDEX_BAG>
[ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] ...
| | | | | | | |
[ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] ...
| | | | | | | |
[ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] ...
| | | | | | | |
[ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] ...
| | | | | | | |
[ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] ...
| | | | | | | |
[ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] ...
| | | | | | | |
[ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] ...
| | | | | | | |
[ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] -- [ ] ...
| | | | | | | |
... ... ... ... ... ... ... ...
SQUAD ::= Vec<4, THREAD>

THREAD ::=
  loop {
    if (A, B) = pop_redex():
      atomic_rewrite(A, B)
    share_redexes()
  }

  atomic_rewrite(A, B):
    ... match redex type ...
    ... perform atomic links ...

  atomic_link(a, b):
    ... see algorithm on 'paper/' ...

RESULT:
  - thread 0 links A1 -> B1
  - thread 1 links B1 -> A1
  - thread 2 links A2 -> B2
  - thread 3 links B2 -> A2

OUTPUT:
  A1 <--------------> B1
  A2 <--------------> B2
```
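The squad arithmetic above can be checked directly (a hypothetical helper, not part of HVMC):

```rust
// One squad of 4 threads per redex bag, on a 128x128 grid of bags.
fn squad_counts(grid_side: u64, threads_per_squad: u64) -> (u64, u64) {
  let squads = grid_side * grid_side;       // 128 * 128 = 16,384 squads
  let threads = squads * threads_per_squad; // 16,384 * 4 = 65,536 threads
  (squads, threads)
}
```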
## Interactions
The available interactions are the same described on the reference paper, namely
annihilation and commutation rules, plus a few additional interactions,
including primitive numeric operations, conditionals, and dereference. The core
interactions are:
```
()---()
~~~~~~~ ERA-ERA
nothing
```
```
A1 --|\
     |a|-- ()
A2 --|/
~~~~~~~~~~~~~ CTR-ERA
A1 ------- ()
A2 ------- ()
```
```
A1 --|\     /|-- B2
     |a|---|b|
A2 --|/     \|-- B1
~~~~~~~~~~~~~~~~~~~ CTR-CTR (if a ~~ b)
A1 -----, ,----- B2
         X
A2 -----' '----- B1
```
```
A1 --|\         /|-- B2
     |a|-------|b|
A2 --|/         \|-- B1
~~~~~~~~~~~~~~~~~~~~~~~ CTR-CTR (if a !~ b)
      /|-------|\
A1 --|b|       |a|-- B2
      \|--, ,--|/
           X
      /|--' '--|\
A2 --|b|       |a|-- B1
      \|-------|/
```
The dereference interactions happen when a @REF node interacts with another
node. When that node is a constructor, the dereference is unrolled efficiently.
This makes HVM practical because, without it, top-level definitions would need
to be implemented with DUP nodes, which would cause considerable overhead when
implementing functions, due to DUP nodes' incremental copying nature. When the
other node is anything else, that implies two closed nets got disconnected from
the main graph, so both nodes are collected, allowing recursive functions to
halt without infinite expansions.
```
() -- @REF
~~~~~~~~~~ ERA-REF
nothing
```
```
A1 --|\
     | |-- @REF
A2 --|/
~~~~~~~~~~~~~~~~ CTR-REF
A1 --|\
     | |-- {val}
A2 --|/
```
Since Interaction Combinator nodes only have 1 active port, a property that is
essential for key computational characteristics such as strong confluence, we
can't have a binary numeric-operation node. Instead, we split numeric operations
into two nodes: OP2, which processes the first operand and returns an OP1 node,
which then processes the second operand, performs the computation, and connects
the result to the return wire.
```
A1 --,
     [}-- #X
A2 --'
~~~~~~~~~~~~~~ OP2-NUM
A2 --[#X}-- A1
```
```
A1 --[#X}-- #Y
~~~~~~~~~~~~~~ OP1-NUM
A1 -- #Z
```
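The two-step reduction above can be mimicked with plain functions; this is an illustrative sketch, not HVMC's representation:

```rust
// OP2 meets its first numeric operand and becomes an OP1 that remembers it;
// OP1 then meets the second operand and performs the actual computation.
#[derive(Clone, Copy)]
enum Op { Add, Sub }

struct Op1 { op: Op, fst: u64 }

fn op2_num(op: Op, fst: u64) -> Op1 {
  Op1 { op, fst }
}

fn op1_num(o: Op1, snd: u64) -> u64 {
  match o.op {
    Op::Add => o.fst.wrapping_add(snd),
    Op::Sub => o.fst.wrapping_sub(snd),
  }
}
```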
Note that the OP2 operator doesn't store the operation type. Instead, it is
stored in 4 unused bits of the left operand. As such, an additional operation
called "load-op-type" is used to load the next operation on the left operand.
See the `/examples` directory for more info. Below is a table with all available
operations:
sym | name
--- | ---------
`+` | addition
`-` | subtraction
`*` | multiplication
`/` | division
`%` | modulus
`==`| equal-to
`!=`| not-equal-to
`<` | less-than
`>` | greater-than
`<=`| less-than-or-equal
`>=`| greater-than-or-equal
`&` | bitwise-and
`\|`| bitwise-or
`^` | bitwise-xor
`~` | bitwise-not
`<<`| left-shift
`>>`| right-shift
Since HVM already provides plenty of solutions for branching (global references,
lambda-encoded booleans, pattern-matching, etc.), the pattern-match operation
is only necessary to read bits from numbers; otherwise, numbers would be "black
boxes" that can't interact with the rest of the program. The way it works is
simple: it receives a number, two branches (case-zero and case-succ, stored in a
CON node) and a return wire. If the number is 0, it erases the case-succ branch
and returns the case-zero branch. Otherwise, it erases the case-zero branch and
returns the case-succ branch applied to the predecessor of the number.
```
A1 --,
     (?)-- #X
A2 --'
~~~~~~~~~~~~~~~~~~ MAT-NUM (#X > 0)
          /|-- A2
      /|--| |
A1 --| |  \|-- #(X-1)
      \|-- ()
```
```
A1 --,
     (?)-- #X
A2 --'
~~~~~~~~~~~~~~~~~~ MAT-NUM (#X == 0)
      /|-- ()
A1 --| |
      \|-- A2
```
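Functionally, the MAT-NUM rule behaves like this small sketch (illustrative only, not HVMC's representation):

```rust
// Pick a branch based on a number: zero takes case_zero, any other number
// applies case_succ to the predecessor, mirroring the two diagrams above.
fn mat_num<T>(x: u64, case_zero: T, case_succ: impl Fn(u64) -> T) -> T {
  if x == 0 { case_zero } else { case_succ(x - 1) }
}
```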
Note that some interactions like NUM-ERA are omitted, but should logically
follow from the ones described above.
## Memory Layout
The memory layout is optimized for efficiency. Conceptually, it equals:
```rust
// A pointer is a 64-bit word
type Ptr = u64;

// A node stores its two aux ports
struct Node {
  p1: Ptr, // this node's fst aux port
  p2: Ptr, // this node's snd aux port
}

// A redex links two main ports
struct Redex {
  a: Ptr, // main port of node A
  b: Ptr, // main port of node B
}

// A closed net
struct Net {
  root: Ptr,        // a free wire
  rdex: Vec<Redex>, // a vector of redexes
  heap: Vec<Node>,  // a vector of nodes
}
```
As you can see, the memory layout resembles the textual syntax, with nets being
represented as a vector of trees: the 'rdex' buffer stores the tree roots (as
active pairs), and the 'heap' buffer stores all the nodes. Each node holds two
64-bit pointers and, thus, uses exactly 128 bits. Pointers include a 4-bit tag,
a 28-bit label (used for DUP colors and OP2 operators) and a 32-bit addr, which
allows addressing a 2 GB space per instance. There are 13 pointer types:
```rust
VR1: Tag = 0x0; // Variable to aux port 1
VR2: Tag = 0x1; // Variable to aux port 2
RD1: Tag = 0x2; // Redirect to aux port 1
RD2: Tag = 0x3; // Redirect to aux port 2
REF: Tag = 0x4; // Lazy closed net
ERA: Tag = 0x5; // Unboxed eraser
NUM: Tag = 0x6; // Unboxed number
OP2: Tag = 0x7; // Binary numeric operation
OP1: Tag = 0x8; // Unary numeric operation
MAT: Tag = 0x9; // Numeric pattern-matching
LAM: Tag = 0xA; // Main port of lam node
TUP: Tag = 0xB; // Main port of tup node
DUP: Tag = 0xC; // Main port of dup node
```
This memory-efficient format allows for a fast implementation in many
situations; for example, an interaction combinator annihilation can be performed
with just 2 atomic CAS.
Note that LAM, TUP and DUP nodes are identical: they are interaction combinator
nodes, and they annihilate/commute based on their labels being identical. The
distinction is made for better printing, but isn't used internally.
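A sketch of that 4-bit tag / 28-bit label / 32-bit addr packing (the exact bit order is an assumption for illustration, not HVMC's actual layout):

```rust
type Ptr = u64;
type Tag = u8;  // 4 bits used
type Lab = u32; // 28 bits used
type Loc = u32; // 32-bit address

// Pack tag, label, and address into one 64-bit pointer word.
// Bit order (tag in the top 4 bits) is assumed for illustration.
fn ptr_new(tag: Tag, lab: Lab, loc: Loc) -> Ptr {
  ((tag as u64) << 60) | (((lab as u64) & 0x0FFF_FFFF) << 32) | (loc as u64)
}

fn ptr_tag(p: Ptr) -> Tag { (p >> 60) as Tag }
fn ptr_lab(p: Ptr) -> Lab { ((p >> 32) & 0x0FFF_FFFF) as Lab }
fn ptr_loc(p: Ptr) -> Loc { p as Loc }
```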
We also provide unboxed 60-bit unsigned integers, which allows HVMC to store raw
data with minimal loss. For example, to store a raw 3.75 KB buffer, one could
use a perfect binary tree of CON nodes with a depth of 8, as follows:
```javascript
@buff = (((((((((X0 X1) (X2 X3)) ((X4 X5) (X6 X7))) ...)))))))
```
This would use a total of 511 nodes, which takes a space of almost exactly 8 KB
on HVMC. As such, while buffers are not part of the spec, we can store raw data
with a ~46% efficiency using interaction-net-based trees. This structure isn't
as compact as an array, but it allows us to access and transform data in
parallel, which is a great tradeoff in practice.
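The arithmetic behind those figures can be verified directly (a hypothetical helper for this README's numbers):

```rust
// A perfect binary tree of CON nodes with depth 8 (9 node levels) has
// 2^9 - 1 = 511 nodes and 2^9 = 512 leaf slots. Each leaf can hold a 60-bit
// number; each node takes 16 bytes (two 64-bit ports).
fn buffer_stats(depth: u32) -> (u64, u64, u64) {
  let nodes = (1u64 << (depth + 1)) - 1; // CON nodes in the tree
  let leaves = 1u64 << (depth + 1);      // u60 payload slots
  let payload_bytes = leaves * 60 / 8;   // 3840 bytes = 3.75 KB at depth 8
  let storage_bytes = nodes * 16;        // 8176 bytes, almost exactly 8 KB
  (nodes, payload_bytes, storage_bytes)
}
```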
## Lock-free Algorithm
At the heart of HVM-Core's massively parallel evaluator lies a lock-free
algorithm that allows performing interaction combinator rewrites in a concurrent
environment with minimal contention and completely avoiding locks and backoffs.
To understand the difference, see the images below:
### Before: using locks
In a lock-based interaction net evaluator, threads must lock the entire
surrounding region of an active pair (redex) before reducing it. That is a
source of contention and can result in backoffs, which completely prevent a
thread from making progress, reducing performance.
![Naive lock-based evaluator](paper/images/lock_based_evaluator.png)
### After: lock-free
The lock-free approach works by attempting to perform the link with a single
compare-and-swap. When it succeeds, nothing else needs to be done. When it
fails, we place redirection wires that semantically complete the rewrite without
any interaction. Then, when a main port is connected to a redirection wire, it
traverses and consumes the entire path, reaching its target location. If that
target is an auxiliary port, we store the pointer with a CAS, essentially moving
a node to another thread. If it is a main port, we create a new redex, which can
then be reduced in parallel. This results in an "implicit ownership" scheme that
allows threads to collaborate in a surgically precise, contention-avoiding dance.
![HVM-Core's lock-free evaluator](paper/images/lock_free_evaluator.png)
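A toy illustration of the single-CAS link attempt, on one atomic port (the representation is an assumption for illustration; the real algorithm, including the redirection-wire fallback, is in `paper/`):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Toy model of one port of a node. A real implementation packs tagged
// pointers here; we just use raw u64 values.
fn try_link(port: &AtomicU64, expected: u64, new: u64) -> bool {
  // One CAS: if the port still holds what we read earlier, point it at `new`.
  // If not, a concurrent rewrite won the race; the caller falls back to
  // leaving a redirection wire instead of retrying or blocking.
  port
    .compare_exchange(expected, new, Ordering::AcqRel, Ordering::Acquire)
    .is_ok()
}
```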
For more information, check the [paper/draft.pdf](paper/draft.pdf).
## Contributing
To verify that there is no performance regression:
```bash
git checkout main
cargo bench -- --save-baseline main # save the unchanged code as "main"
git checkout <your-branch>
cargo bench -- --baseline main # compare your changes with the "main" branch
```
To verify that there is no correctness regression, run `cargo test`. You can
also install `cargo-insta` with `cargo install cargo-insta` and run `cargo insta
test` to check that all test cases pass. If some fail and their snapshots need
to be updated, run `cargo insta review` to review and correct them.
## Community
HVM-Core is part of [Higher Order Company](https://HigherOrderCO.com/)'s efforts
to harness massive parallelism. Join our [Discord](https://discord.HigherOrderCO.com/)!
// ast.rs
// An interaction combinator language
// ----------------------------------
// This file implements a textual syntax to interact with the runtime. It includes a pure AST for
// nets, as well as functions for parsing, stringifying, and converting pure ASTs to runtime nets.
// In the runtime, a net is represented by a list of active trees, plus a root tree. The textual
// syntax reflects this representation. The grammar is specified in this repo's README.
use crate::run;
use std::collections::BTreeMap;
use std::collections::HashMap;
use std::collections::HashSet;
use std::iter::Peekable;
use std::str::Chars;
// AST
// ---
#[derive(Clone, Hash, PartialEq, Eq, Debug)]
pub enum Tree {
  Era,
  Con { lft: Box<Tree>, rgt: Box<Tree> },
  Tup { lft: Box<Tree>, rgt: Box<Tree> },
  Dup { lab: run::Lab, lft: Box<Tree>, rgt: Box<Tree> },
  Var { nam: String },
  Ref { nam: run::Val },
  Num { val: run::Val },
  Op1 { opr: run::Lab, lft: run::Val, rgt: Box<Tree> },
  Op2 { opr: run::Lab, lft: Box<Tree>, rgt: Box<Tree> },
  Mat { sel: Box<Tree>, ret: Box<Tree> },
}

type Redex = Vec<(Tree, Tree)>;

#[derive(Clone, Hash, PartialEq, Eq, Debug)]
pub struct Net {
  pub root: Tree,
  pub rdex: Redex,
}
pub type Book = BTreeMap<String, Net>;
// Parser
// ------
// FIXME: remove after skip is fixed
fn skip_spaces(chars: &mut Peekable<Chars>) {
  while let Some(c) = chars.peek() {
    if !c.is_ascii_whitespace() {
      break;
    } else {
      chars.next();
    }
  }
}

// FIXME: detect two '/' for comments, allowing us to remove 'skip_spaces'
fn skip(chars: &mut Peekable<Chars>) {
  while let Some(c) = chars.peek() {
    if *c == '/' {
      chars.next();
      while let Some(c) = chars.peek() {
        if *c == '\n' {
          break;
        }
        chars.next();
      }
    } else if !c.is_ascii_whitespace() {
      break;
    } else {
      chars.next();
    }
  }
}

pub fn consume(chars: &mut Peekable<Chars>, text: &str) -> Result<(), String> {
  skip(chars);
  for c in text.chars() {
    if chars.next() != Some(c) {
      return Err(format!("Expected '{}', found {:?}", text, chars.peek()));
    }
  }
  return Ok(());
}
pub fn parse_decimal(chars: &mut Peekable<Chars>) -> Result<u64, String> {
  let mut num: u64 = 0;
  skip(chars);
  if !chars.peek().map_or(false, |c| c.is_digit(10)) {
    return Err(format!("Expected a decimal number, found {:?}", chars.peek()));
  }
  while let Some(c) = chars.peek() {
    if !c.is_digit(10) {
      break;
    }
    num = num * 10 + c.to_digit(10).unwrap() as u64;
    chars.next();
  }
  Ok(num)
}

pub fn parse_name(chars: &mut Peekable<Chars>) -> Result<String, String> {
  let mut txt = String::new();
  skip(chars);
  if !chars.peek().map_or(false, |c| c.is_alphanumeric() || *c == '_' || *c == '.') {
    return Err(format!("Expected a name character, found {:?}", chars.peek()));
  }
  while let Some(c) = chars.peek() {
    if !c.is_alphanumeric() && *c != '_' && *c != '.' {
      break;
    }
    txt.push(*c);
    chars.next();
  }
  Ok(txt)
}

pub fn parse_opx_lit(chars: &mut Peekable<Chars>) -> Result<String, String> {
  let mut opx = String::new();
  skip_spaces(chars);
  while let Some(c) = chars.peek() {
    if !"+-=*/%<>|&^!?".contains(*c) {
      break;
    }
    opx.push(*c);
    chars.next();
  }
  Ok(opx)
}
fn parse_opr(chars: &mut Peekable<Chars>) -> Result<run::Lab, String> {
  let opx = parse_opx_lit(chars)?;
  match opx.as_str() {
    "+" => Ok(run::ADD),
    "-" => Ok(run::SUB),
    "*" => Ok(run::MUL),
    "/" => Ok(run::DIV),
    "%" => Ok(run::MOD),
    "==" => Ok(run::EQ),
    "!=" => Ok(run::NE),
    "<" => Ok(run::LT),
    ">" => Ok(run::GT),
    "<=" => Ok(run::LTE),
    ">=" => Ok(run::GTE),
    "&&" => Ok(run::AND),
    "||" => Ok(run::OR),
    "^" => Ok(run::XOR),
    "!" => Ok(run::NOT),
    "<<" => Ok(run::LSH),
    ">>" => Ok(run::RSH),
    _ => Err(format!("Unknown operator: {}", opx)),
  }
}
pub fn parse_tree(chars: &mut Peekable<Chars>) -> Result<Tree, String> {
  skip(chars);
  match chars.peek() {
    Some('*') => {
      chars.next();
      Ok(Tree::Era)
    }
    Some('(') => {
      chars.next();
      let lft = Box::new(parse_tree(chars)?);
      let rgt = Box::new(parse_tree(chars)?);
      consume(chars, ")")?;
      Ok(Tree::Con { lft, rgt })
    }
    Some('[') => {
      chars.next();
      let lft = Box::new(parse_tree(chars)?);
      let rgt = Box::new(parse_tree(chars)?);
      consume(chars, "]")?;
      Ok(Tree::Tup { lft, rgt })
    }
    Some('{') => {
      chars.next();
      let lab = parse_decimal(chars)? as run::Lab;
      let lft = Box::new(parse_tree(chars)?);
      let rgt = Box::new(parse_tree(chars)?);
      consume(chars, "}")?;
      Ok(Tree::Dup { lab, lft, rgt })
    }
    Some('@') => {
      chars.next();
      skip(chars);
      let name = parse_name(chars)?;
      Ok(Tree::Ref { nam: name_to_val(&name) })
    }
    Some('#') => {
      chars.next();
      Ok(Tree::Num { val: parse_decimal(chars)? })
    }
    Some('<') => {
      chars.next();
      if chars.peek().map_or(false, |c| c.is_digit(10)) {
        let lft = parse_decimal(chars)?;
        let opr = parse_opr(chars)?;
        let rgt = Box::new(parse_tree(chars)?);
        consume(chars, ">")?;
        Ok(Tree::Op1 { opr, lft, rgt })
      } else {
        let opr = parse_opr(chars)?;
        let lft = Box::new(parse_tree(chars)?);
        let rgt = Box::new(parse_tree(chars)?);
        consume(chars, ">")?;
        Ok(Tree::Op2 { opr, lft, rgt })
      }
    }
    Some('?') => {
      chars.next();
      consume(chars, "<")?;
      let sel = Box::new(parse_tree(chars)?);
      let ret = Box::new(parse_tree(chars)?);
      consume(chars, ">")?;
      Ok(Tree::Mat { sel, ret })
    }
    _ => {
      Ok(Tree::Var { nam: parse_name(chars)? })
    }
  }
}
pub fn parse_net(chars: &mut Peekable<Chars>) -> Result<Net, String> {
  let mut rdex = Vec::new();
  let root = parse_tree(chars)?;
  while let Some(c) = { skip(chars); chars.peek() } {
    if *c == '&' {
      chars.next();
      let tree1 = parse_tree(chars)?;
      consume(chars, "~")?;
      let tree2 = parse_tree(chars)?;
      rdex.push((tree1, tree2));
    } else {
      break;
    }
  }
  Ok(Net { root, rdex })
}

pub fn parse_book(chars: &mut Peekable<Chars>) -> Result<Book, String> {
  let mut book = BTreeMap::new();
  while let Some(c) = { skip(chars); chars.peek() } {
    if *c == '@' {
      chars.next();
      let name = parse_name(chars)?;
      consume(chars, "=")?;
      let net = parse_net(chars)?;
      book.insert(name, net);
    } else {
      break;
    }
  }
  Ok(book)
}

fn do_parse<T>(code: &str, parse_fn: impl Fn(&mut Peekable<Chars>) -> Result<T, String>) -> T {
  let chars = &mut code.chars().peekable();
  match parse_fn(chars) {
    Ok(result) => {
      if chars.next().is_none() {
        result
      } else {
        eprintln!("Unable to parse the whole input. Is this not an hvmc file?");
        std::process::exit(1);
      }
    }
    Err(err) => {
      eprintln!("{}", err);
      std::process::exit(1);
    }
  }
}

pub fn do_parse_tree(code: &str) -> Tree {
  do_parse(code, parse_tree)
}

pub fn do_parse_net(code: &str) -> Net {
  do_parse(code, parse_net)
}

pub fn do_parse_book(code: &str) -> Book {
  do_parse(code, parse_book)
}
// Stringifier
// -----------
pub fn show_opr(opr: run::Lab) -> String {
  match opr {
    run::ADD => "+".to_string(),
    run::SUB => "-".to_string(),
    run::MUL => "*".to_string(),
    run::DIV => "/".to_string(),
    run::MOD => "%".to_string(),
    run::EQ => "==".to_string(),
    run::NE => "!=".to_string(),
    run::LT => "<".to_string(),
    run::GT => ">".to_string(),
    run::LTE => "<=".to_string(),
    run::GTE => ">=".to_string(),
    run::AND => "&&".to_string(),
    run::OR => "||".to_string(),
    run::XOR => "^".to_string(),
    run::NOT => "!".to_string(),
    run::LSH => "<<".to_string(),
    run::RSH => ">>".to_string(),
    _ => panic!("Unknown operator label."),
  }
}
pub fn show_tree(tree: &Tree) -> String {
  match tree {
    Tree::Era => {
      "*".to_string()
    }
    Tree::Con { lft, rgt } => {
      format!("({} {})", show_tree(&*lft), show_tree(&*rgt))
    }
    Tree::Tup { lft, rgt } => {
      format!("[{} {}]", show_tree(&*lft), show_tree(&*rgt))
    }
    Tree::Dup { lab, lft, rgt } => {
      format!("{{{} {} {}}}", lab, show_tree(&*lft), show_tree(&*rgt))
    }
    Tree::Var { nam } => {
      nam.clone()
    }
    Tree::Ref { nam } => {
      format!("@{}", val_to_name(*nam))
    }
    Tree::Num { val } => {
      format!("#{}", (*val).to_string())
    }
    Tree::Op1 { opr, lft, rgt } => {
      format!("<{}{} {}>", lft, show_opr(*opr), show_tree(rgt))
    }
    Tree::Op2 { opr, lft, rgt } => {
      format!("<{} {} {}>", show_opr(*opr), show_tree(&*lft), show_tree(&*rgt))
    }
    Tree::Mat { sel, ret } => {
      format!("?<{} {}>", show_tree(&*sel), show_tree(&*ret))
    }
  }
}

pub fn show_net(net: &Net) -> String {
  let mut result = String::new();
  result.push_str(&format!("{}", show_tree(&net.root)));
  for (a, b) in &net.rdex {
    result.push_str(&format!("\n& {} ~ {}", show_tree(a), show_tree(b)));
  }
  return result;
}

pub fn show_book(book: &Book) -> String {
  let mut result = String::new();
  for (name, net) in book {
    result.push_str(&format!("@{} = {}\n", name, show_net(net)));
  }
  return result;
}

pub fn show_runtime_tree<const LAZY: bool>(rt_net: &run::NetFields<LAZY>, ptr: run::Ptr) -> String where [(); LAZY as usize]: {
  show_tree(&tree_from_runtime_go(rt_net, ptr, PARENT_ROOT, &mut HashMap::new(), &mut 0))
}

pub fn show_runtime_net<const LAZY: bool>(rt_net: &run::NetFields<LAZY>) -> String where [(); LAZY as usize]: {
  show_net(&net_from_runtime(rt_net))
}

pub fn show_runtime_book(book: &run::Book) -> String {
  show_book(&book_from_runtime(book))
}
// Conversion
// ----------
pub fn num_to_str(mut num: usize) -> String {
  let mut txt = String::new();
  num += 1;
  while num > 0 {
    num -= 1;
    let c = ((num % 26) as u8 + b'a') as char;
    txt.push(c);
    num /= 26;
  }
  return txt.chars().rev().collect();
}

pub const fn tag_to_port(tag: run::Tag) -> run::Port {
  match tag {
    run::VR1 => run::P1,
    run::VR2 => run::P2,
    _ => unreachable!(),
  }
}

pub fn port_to_tag(port: run::Port) -> run::Tag {
  match port {
    run::P1 => run::VR1,
    run::P2 => run::VR2,
    _ => unreachable!(),
  }
}

pub fn name_to_letters(name: &str) -> Vec<u8> {
  let mut letters = Vec::new();
  for c in name.chars() {
    letters.push(match c {
      '0'..='9' => c as u8 - '0' as u8 + 0,
      'A'..='Z' => c as u8 - 'A' as u8 + 10,
      'a'..='z' => c as u8 - 'a' as u8 + 36,
      '_' => 62,
      '.' => 63,
      _ => panic!("Invalid character in name"),
    });
  }
  return letters;
}

pub fn letters_to_name(letters: Vec<u8>) -> String {
  let mut name = String::new();
  for letter in letters {
    name.push(match letter {
      0..=9 => (letter - 0 + '0' as u8) as char,
      10..=35 => (letter - 10 + 'A' as u8) as char,
      36..=61 => (letter - 36 + 'a' as u8) as char,
      62 => '_',
      63 => '.',
      _ => panic!("Invalid letter in name"),
    });
  }
  return name;
}

pub fn val_to_letters(num: run::Val) -> Vec<u8> {
  let mut letters = Vec::new();
  let mut num = num;
  while num > 0 {
    letters.push((num % 64) as u8);
    num /= 64;
  }
  letters.reverse();
  return letters;
}

pub fn letters_to_val(letters: Vec<u8>) -> run::Val {
  let mut num = 0;
  for letter in letters {
    num = num * 64 + letter as run::Val;
  }
  return num;
}

pub fn name_to_val(name: &str) -> run::Val {
  letters_to_val(name_to_letters(name))
}

pub fn val_to_name(num: run::Val) -> String {
  letters_to_name(val_to_letters(num))
}
// Injection and Readback
// ----------------------
// To runtime
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Parent {
  Redex,
  Node { loc: run::Loc, port: run::Port },
}

const PARENT_ROOT: Parent = Parent::Node { loc: run::ROOT.loc(), port: tag_to_port(run::ROOT.tag()) };

pub fn tree_to_runtime_go<const LAZY: bool>(rt_net: &mut run::NetFields<LAZY>, tree: &Tree, vars: &mut HashMap<String, Parent>, parent: Parent) -> run::Ptr where [(); LAZY as usize]: {
  match tree {
    Tree::Era => {
      run::ERAS
    }
    Tree::Con { lft, rgt } => {
      let loc = rt_net.alloc();
      let p1 = tree_to_runtime_go(rt_net, &*lft, vars, Parent::Node { loc, port: run::P1 });
      rt_net.heap.set(loc, run::P1, p1);
      let p2 = tree_to_runtime_go(rt_net, &*rgt, vars, Parent::Node { loc, port: run::P2 });
      rt_net.heap.set(loc, run::P2, p2);
      run::Ptr::new(run::LAM, 0, loc)
    }
    Tree::Tup { lft, rgt } => {
      let loc = rt_net.alloc();
      let p1 = tree_to_runtime_go(rt_net, &*lft, vars, Parent::Node { loc, port: run::P1 });
      rt_net.heap.set(loc, run::P1, p1);
      let p2 = tree_to_runtime_go(rt_net, &*rgt, vars, Parent::Node { loc, port: run::P2 });
      rt_net.heap.set(loc, run::P2, p2);
      run::Ptr::new(run::TUP, 1, loc)
    }
    Tree::Dup { lab, lft, rgt } => {
      let loc = rt_net.alloc();
      let p1 = tree_to_runtime_go(rt_net, &*lft, vars, Parent::Node { loc, port: run::P1 });
      rt_net.heap.set(loc, run::P1, p1);
      let p2 = tree_to_runtime_go(rt_net, &*rgt, vars, Parent::Node { loc, port: run::P2 });
      rt_net.heap.set(loc, run::P2, p2);
      run::Ptr::new(run::DUP, *lab, loc)
    }
    Tree::Var { nam } => {
      if let Parent::Redex = parent {
        panic!("By definition, can't have variable on active pairs.");
      };
      match vars.get(nam) {
        Some(Parent::Redex) => {
          unreachable!();
        }
        Some(Parent::Node { loc: other_loc, port: other_port }) => {
          match parent {
            Parent::Redex => { unreachable!(); }
            Parent::Node { loc, port } => rt_net.heap.set(*other_loc, *other_port, run::Ptr::new(port_to_tag(port), 0, loc)),
          }
          return run::Ptr::new(port_to_tag(*other_port), 0, *other_loc);
        }
        None => {
          vars.insert(nam.clone(), parent);
          run::NULL
        }
      }
    }
    Tree::Ref { nam } => {
      run::Ptr::big(run::REF, *nam)
    }
    Tree::Num { val } => {
      run::Ptr::big(run::NUM, *val)
    }
    Tree::Op1 { opr, lft, rgt } => {
      let loc = rt_net.alloc();
      let p1 = run::Ptr::big(run::NUM, *lft);
      rt_net.heap.set(loc, run::P1, p1);
      let p2 = tree_to_runtime_go(rt_net, rgt, vars, Parent::Node { loc, port: run::P2 });
      rt_net.heap.set(loc, run::P2, p2);
      run::Ptr::new(run::OP1, *opr, loc)
    }
    Tree::Op2 { opr, lft, rgt } => {
      let loc = rt_net.alloc();
      let p1 = tree_to_runtime_go(rt_net, &*lft, vars, Parent::Node { loc, port: run::P1 });
      rt_net.heap.set(loc, run::P1, p1);
      let p2 = tree_to_runtime_go(rt_net, &*rgt, vars, Parent::Node { loc, port: run::P2 });
      rt_net.heap.set(loc, run::P2, p2);
      run::Ptr::new(run::OP2, *opr, loc)
    }
    Tree::Mat { sel, ret } => {
      let loc = rt_net.alloc();
      let p1 = tree_to_runtime_go(rt_net, &*sel, vars, Parent::Node { loc, port: run::P1 });
      rt_net.heap.set(loc, run::P1, p1);
      let p2 = tree_to_runtime_go(rt_net, &*ret, vars, Parent::Node { loc, port: run::P2 });
      rt_net.heap.set(loc, run::P2, p2);
      run::Ptr::new(run::MAT, 0, loc)
    }
  }
}

pub fn tree_to_runtime<const LAZY: bool>(rt_net: &mut run::NetFields<LAZY>, tree: &Tree) -> run::Ptr where [(); LAZY as usize]: {
  tree_to_runtime_go(rt_net, tree, &mut HashMap::new(), PARENT_ROOT)
}

pub fn net_to_runtime<const LAZY: bool>(rt_net: &mut run::NetFields<LAZY>, net: &Net) where [(); LAZY as usize]: {
  let mut vars = HashMap::new();
  let root = tree_to_runtime_go(rt_net, &net.root, &mut vars, PARENT_ROOT);
  rt_net.heap.set_root(root);
  for (tree1, tree2) in &net.rdex {
    let ptr1 = tree_to_runtime_go(rt_net, tree1, &mut vars, Parent::Redex);
    let ptr2 = tree_to_runtime_go(rt_net, tree2, &mut vars, Parent::Redex);
    rt_net.rdex.push((ptr1, ptr2));
  }
}
// Holds dup labels and ref ids used by a definition
type InsideLabs = HashSet<run::Lab, nohash_hasher::BuildNoHashHasher<run::Lab>>;
type InsideRefs = HashSet<run::Val>;

#[derive(Debug)]
pub struct Inside {
  labs: InsideLabs,
  refs: InsideRefs,
}

// Collects dup labels and ref ids used by a definition
pub fn runtime_def_get_inside(def: &run::Def) -> Inside {
  let mut inside = Inside {
    labs: HashSet::with_hasher(std::hash::BuildHasherDefault::default()),
    refs: HashSet::new(),
  };
  fn register(inside: &mut Inside, ptr: run::Ptr) {
    if ptr.is_dup() {
      inside.labs.insert(ptr.lab());
    }
    if ptr.is_ref() {
      inside.refs.insert(ptr.val());
    }
  }
  for i in 0 .. def.node.len() {
    register(&mut inside, def.node[i].1);
    register(&mut inside, def.node[i].2);
  }
  for i in 0 .. def.rdex.len() {
    register(&mut inside, def.rdex[i].0);
    register(&mut inside, def.rdex[i].1);
  }
  return inside;
}

// Computes all dup labels used by a definition, direct or not.
// FIXME: memoize to avoid duplicated work
pub fn runtime_def_get_all_labs(labs: &mut InsideLabs, insides: &HashMap<run::Val, Inside>, fid: run::Val, seen: &mut HashSet<run::Val>) {
  if seen.contains(&fid) {
    return;
  } else {
    seen.insert(fid);
    if let Some(fid_insides) = insides.get(&fid) {
      for dup in &fid_insides.labs {
        labs.insert(*dup);
      }
      for child_fid in &fid_insides.refs {
        runtime_def_get_all_labs(labs, insides, *child_fid, seen);
      }
    }
  }
}
// Converts a book from the pure AST representation to the runtime representation.
pub fn book_to_runtime(book: &Book) -> run::Book {
let mut rt_book = run::Book::new();
// Convert each net in 'book' to a runtime net and add to 'rt_book'
for (name, net) in book {
let fid = name_to_val(name);
let nodes = run::Heap::<false>::init(1 << 16);
let mut rt = run::NetFields::new(&nodes);
net_to_runtime(&mut rt, net);
rt_book.def(fid, runtime_net_to_runtime_def(&rt));
}
// Calculate the 'insides' of each runtime definition
let mut insides = HashMap::new();
for (fid, def) in &rt_book.defs {
insides.insert(*fid, runtime_def_get_inside(&def));
}
// Compute all dup labels used by each runtime definition, directly or indirectly
let mut labs_by_fid = HashMap::new();
for (fid, _) in &rt_book.defs {
let mut labs = HashSet::with_hasher(std::hash::BuildHasherDefault::default());
let mut seen = HashSet::new();
runtime_def_get_all_labs(&mut labs, &insides, *fid, &mut seen);
labs_by_fid.insert(*fid, labs);
}
// Set the 'labs' field for each definition
for (fid, def) in &mut rt_book.defs {
def.labs = labs_by_fid.get(fid).unwrap().clone();
//println!("{} {:?}", val_to_name(*fid), def.labs);
}
rt_book
}
// Converts a runtime net to a def.
pub fn runtime_net_to_runtime_def<const LAZY: bool>(net: &run::NetFields<LAZY>) -> run::Def where [(); LAZY as usize]: {
let mut node = vec![];
let mut rdex = vec![];
let labs = HashSet::with_hasher(std::hash::BuildHasherDefault::default());
for _ in 0 .. net.heap.nodes.len() {
let p1 = net.heap.get(node.len() as run::Loc, run::P1);
let p2 = net.heap.get(node.len() as run::Loc, run::P2);
if p1 != run::NULL || p2 != run::NULL {
node.push(((), p1, p2));
} else {
break;
}
}
for i in 0 .. net.rdex.len() {
let p1 = net.rdex[i].0;
let p2 = net.rdex[i].1;
rdex.push((p1, p2));
}
return run::Def { labs, rdex, node };
}
// Reads a runtime net back from a def.
pub fn runtime_def_to_runtime_net<'a, const LAZY: bool>(nodes: &'a run::Nodes<LAZY>, def: &run::Def) -> run::NetFields<'a, LAZY> where [(); LAZY as usize]: {
let mut net = run::NetFields::new(&nodes);
for (i, &(p0, p1, p2)) in def.node.iter().enumerate() {
net.heap.set(i as run::Loc, run::P1, p1);
net.heap.set(i as run::Loc, run::P2, p2);
}
net.rdex = def.rdex.clone();
net
}
pub fn tree_from_runtime_go<const LAZY: bool>(rt_net: &run::NetFields<LAZY>, ptr: run::Ptr, parent: Parent, vars: &mut HashMap<Parent, String>, fresh: &mut usize) -> Tree where [(); LAZY as usize]: {
match ptr.tag() {
run::ERA => {
Tree::Era
}
run::REF => {
Tree::Ref { nam: ptr.val() }
}
run::NUM => {
Tree::Num { val: ptr.val() }
}
run::OP1 => {
let opr = ptr.lab();
let lft = tree_from_runtime_go(rt_net, rt_net.heap.get(ptr.loc(), run::P1), Parent::Node { loc: ptr.loc(), port: run::P1 }, vars, fresh);
let Tree::Num { val } = lft else { unreachable!() };
let rgt = tree_from_runtime_go(rt_net, rt_net.heap.get(ptr.loc(), run::P2), Parent::Node { loc: ptr.loc(), port: run::P2 }, vars, fresh);
Tree::Op1 { opr, lft: val, rgt: Box::new(rgt) }
}
run::OP2 => {
let opr = ptr.lab();
let lft = tree_from_runtime_go(rt_net, rt_net.heap.get(ptr.loc(), run::P1), Parent::Node { loc: ptr.loc(), port: run::P1 }, vars, fresh);
let rgt = tree_from_runtime_go(rt_net, rt_net.heap.get(ptr.loc(), run::P2), Parent::Node { loc: ptr.loc(), port: run::P2 }, vars, fresh);
Tree::Op2 { opr, lft: Box::new(lft), rgt: Box::new(rgt) }
}
run::MAT => {
let sel = tree_from_runtime_go(rt_net, rt_net.heap.get(ptr.loc(), run::P1), Parent::Node { loc: ptr.loc(), port: run::P1 }, vars, fresh);
let ret = tree_from_runtime_go(rt_net, rt_net.heap.get(ptr.loc(), run::P2), Parent::Node { loc: ptr.loc(), port: run::P2 }, vars, fresh);
Tree::Mat { sel: Box::new(sel), ret: Box::new(ret) }
}
run::VR1 | run::VR2 => {
let key = match ptr.tag() {
run::VR1 => Parent::Node { loc: ptr.loc(), port: run::P1 },
run::VR2 => Parent::Node { loc: ptr.loc(), port: run::P2 },
_ => unreachable!(),
};
if let Some(nam) = vars.get(&key) {
Tree::Var { nam: nam.clone() }
} else {
let nam = num_to_str(*fresh);
*fresh += 1;
vars.insert(parent, nam.clone());
Tree::Var { nam }
}
}
run::LAM => {
let p1 = rt_net.heap.get(ptr.loc(), run::P1);
let p2 = rt_net.heap.get(ptr.loc(), run::P2);
let lft = tree_from_runtime_go(rt_net, p1, Parent::Node { loc: ptr.loc(), port: run::P1 }, vars, fresh);
let rgt = tree_from_runtime_go(rt_net, p2, Parent::Node { loc: ptr.loc(), port: run::P2 }, vars, fresh);
Tree::Con { lft: Box::new(lft), rgt: Box::new(rgt) }
}
run::TUP => {
let p1 = rt_net.heap.get(ptr.loc(), run::P1);
let p2 = rt_net.heap.get(ptr.loc(), run::P2);
let lft = tree_from_runtime_go(rt_net, p1, Parent::Node { loc: ptr.loc(), port: run::P1 }, vars, fresh);
let rgt = tree_from_runtime_go(rt_net, p2, Parent::Node { loc: ptr.loc(), port: run::P2 }, vars, fresh);
Tree::Tup { lft: Box::new(lft), rgt: Box::new(rgt) }
}
run::DUP => {
let p1 = rt_net.heap.get(ptr.loc(), run::P1);
let p2 = rt_net.heap.get(ptr.loc(), run::P2);
let lft = tree_from_runtime_go(rt_net, p1, Parent::Node { loc: ptr.loc(), port: run::P1 }, vars, fresh);
let rgt = tree_from_runtime_go(rt_net, p2, Parent::Node { loc: ptr.loc(), port: run::P2 }, vars, fresh);
Tree::Dup { lab: ptr.lab(), lft: Box::new(lft), rgt: Box::new(rgt) }
}
_ => {
unreachable!()
}
}
}
pub fn tree_from_runtime<const LAZY: bool>(rt_net: &run::NetFields<LAZY>, ptr: run::Ptr) -> Tree where [(); LAZY as usize]: {
let mut vars = HashMap::new();
let mut fresh = 0;
tree_from_runtime_go(rt_net, ptr, PARENT_ROOT, &mut vars, &mut fresh)
}
pub fn net_from_runtime<const LAZY: bool>(rt_net: &run::NetFields<LAZY>) -> Net where [(); LAZY as usize]: {
let mut vars = HashMap::new();
let mut fresh = 0;
let mut rdex = Vec::new();
let root = tree_from_runtime_go(rt_net, rt_net.heap.get_root(), PARENT_ROOT, &mut vars, &mut fresh);
for &(a, b) in &rt_net.rdex {
let tree_a = tree_from_runtime_go(rt_net, a, Parent::Redex, &mut vars, &mut fresh);
let tree_b = tree_from_runtime_go(rt_net, b, Parent::Redex, &mut vars, &mut fresh);
rdex.push((tree_a, tree_b));
}
Net { root, rdex }
}
pub fn book_from_runtime(rt_book: &run::Book) -> Book {
let mut book = BTreeMap::new();
for (fid, def) in rt_book.defs.iter() {
if def.node.len() > 0 {
let name = val_to_name(*fid);
let nodes = run::Heap::<false>::init(def.node.len());
let net = net_from_runtime(&runtime_def_to_runtime_net(&nodes, &def));
book.insert(name, net);
}
}
book
}
// fns.rs
use crate::run::{*};
impl<'a, const LAZY: bool> NetFields<'a, LAZY> where [(); LAZY as usize]: {
pub fn call_native(&mut self, book: &Book, ptr: Ptr, x: Ptr) -> bool {
match ptr.loc() {
_ => { return false; }
}
}
}
// jit.rs
// Despite the file name, this is not actually a JIT (yet).
use crate::run;
use crate::ast;
use std::collections::HashMap;
pub fn compile_book(book: &run::Book) -> String {
let mut code = String::new();
code.push_str(&format!("use crate::run::{{*}};\n"));
code.push_str(&format!("\n"));
for (fid, def) in book.defs.iter() {
if def.node.len() > 0 {
let name = &ast::val_to_name(*fid as run::Val);
code.push_str(&format!("pub const F_{:4} : Val = 0x{:06x};\n", name, fid));
}
}
code.push_str(&format!("\n"));
code.push_str(&format!("impl<'a, const LAZY: bool> NetFields<'a, LAZY> where [(); LAZY as usize]: {{\n"));
code.push_str(&format!("\n"));
code.push_str(&format!("{}pub fn call_native(&mut self, book: &Book, ptr: Ptr, x: Ptr) -> bool {{\n", ident(1)));
code.push_str(&format!("{}match ptr.val() {{\n", ident(2)));
for (fid, def) in book.defs.iter() {
if def.node.len() > 0 {
let fun = ast::val_to_name(*fid);
code.push_str(&format!("{}F_{} => {{ return self.F_{}(ptr, Trg::Ptr(x)); }}\n", ident(3), fun, fun));
}
}
code.push_str(&format!("{}_ => {{ return false; }}\n", ident(3)));
code.push_str(&format!("{}}}\n", ident(2)));
code.push_str(&format!("{}}}\n", ident(1)));
code.push_str(&format!("\n"));
for (fid, def) in book.defs.iter() {
if def.node.len() > 0 {
code.push_str(&compile_term(&book, 1, *fid));
code.push_str(&format!("\n"));
}
}
code.push_str(&format!("}}"));
return code;
}
pub fn ident(tab: usize) -> String {
return " ".repeat(tab);
}
pub fn tag(tag: run::Tag) -> &'static str {
match tag {
run::VR1 => "VR1",
run::VR2 => "VR2",
run::RD1 => "RD1",
run::RD2 => "RD2",
run::REF => "REF",
run::ERA => "ERA",
run::NUM => "NUM",
run::OP2 => "OP2",
run::OP1 => "OP1",
run::MAT => "MAT",
run::LAM => "LAM",
run::TUP => "TUP",
run::DUP => "DUP",
_ => unreachable!(),
}
}
pub fn atom(ptr: run::Ptr) -> String {
if ptr.is_ref() {
return format!("Ptr::big(REF, F_{})", ast::val_to_name(ptr.val()));
} else {
return format!("Ptr::new({}, 0x{:x}, 0x{:x})", tag(ptr.tag()), ptr.lab(), ptr.loc());
}
}
struct Target {
nam: String
}
impl Target {
fn show(&self) -> String {
format!("{}", self.nam)
}
fn get(&self) -> String {
format!("self.get({})", self.nam)
}
fn swap(&self, value: &str) -> String {
format!("self.swap({}, {})", self.nam, value)
}
fn take(&self) -> String {
self.swap(&"NULL")
}
}
pub fn compile_term(book: &run::Book, tab: usize, fid: run::Val) -> String {
// returns a fresh variable: 'k<NUM>'
fn fresh(newx: &mut usize) -> String {
*newx += 1;
format!("k{}", newx)
}
fn call_redex(
book : &run::Book,
tab : usize,
newx : &mut usize,
vars : &mut HashMap<run::Ptr, String>,
def : &run::Def,
rdex : (run::Ptr, run::Ptr),
) -> String {
let (rf, rx) = adjust_redex(rdex.0, rdex.1);
let rf_name = format!("_{}", fresh(newx));
let mut code = String::new();
code.push_str(&format!("{}let {} : Trg = Trg::Ptr({});\n", ident(tab), rf_name, &atom(rf)));
code.push_str(&burn(book, tab, None, newx, vars, def, rx, &Target { nam: rf_name }));
return code;
}
fn call(
book : &run::Book,
tab : usize,
tail : Option<run::Val>,
newx : &mut usize,
vars : &mut HashMap<run::Ptr, String>,
fid : run::Val,
trg : &Target,
) -> String {
//let newx = &mut 0;
//let vars = &mut HashMap::new();
let def = &book.get(fid).unwrap();
// Tail call
// TODO: when I manually edited a file to implement tail calls, single-core performance
// increased a lot, but a single thread ended up withholding all redexes, so the
// program effectively went single-core again. I believe a smarter redex-sharing
// structure is necessary to implement tail calls without sacrificing parallelism.
//if tail.is_some() && def.rdex.len() > 0 && def.rdex[0].0.is_ref() && def.rdex[0].0.loc() == tail.unwrap() {
//println!("tco {}", ast::val_to_name(tail.unwrap() as run::Val));
//let mut code = String::new();
//for rdex in &def.rdex[1..] {
//code.push_str(&call_redex(book, tab, newx, vars, def, *rdex));
//}
//code.push_str(&burn(book, tab, Some(fid), newx, vars, def, def.node[0].1, &trg));
//code.push_str(&call_redex(book, tab, newx, vars, def, def.rdex[0]));
//return code;
//}
// Normal call
let mut code = String::new();
for rdex in &def.rdex {
code.push_str(&call_redex(book, tab, newx, vars, def, *rdex));
}
code.push_str(&burn(book, tab, Some(fid), newx, vars, def, def.node[0].2, &trg));
return code;
}
fn burn(
book : &run::Book,
tab : usize,
tail : Option<run::Val>,
newx : &mut usize,
vars : &mut HashMap<run::Ptr, String>,
def : &run::Def,
ptr : run::Ptr,
trg : &Target,
) -> String {
//println!("burn {:08x} {}", ptr.0, x);
let mut code = String::new();
// (<?(ifz ifs) ret> ret) ~ (#X R)
// ------------------------------- fast match
// if X == 0:
// ifz ~ R
// ifs ~ *
// else:
// ifz ~ *
// ifs ~ (#(X-1) R)
// When ifs is REF, tail-call optimization is applied.
if ptr.tag() == run::LAM {
let mat = def.node[ptr.loc() as usize].1;
let rty = def.node[ptr.loc() as usize].2;
if mat.tag() == run::MAT {
let cse = def.node[mat.loc() as usize].1;
let rtx = def.node[mat.loc() as usize].2;
let got = def.node[rty.loc() as usize];
let rtz = if rty.tag() == run::VR1 { got.1 } else { got.2 };
if cse.tag() == run::LAM && rtx.is_var() && rtx == rtz {
let ifz = def.node[cse.loc() as usize].1;
let ifs = def.node[cse.loc() as usize].2;
let c_z = Target { nam: fresh(newx) };
let c_s = Target { nam: fresh(newx) };
let num = Target { nam: format!("{}x", trg.show()) };
let res = Target { nam: format!("{}y", trg.show()) };
let lam = fresh(newx);
let mat = fresh(newx);
let cse = fresh(newx);
code.push_str(&format!("{}let {} : Trg;\n", ident(tab), &c_z.show()));
code.push_str(&format!("{}let {} : Trg;\n", ident(tab), &c_s.show()));
code.push_str(&format!("{}// fast match\n", ident(tab)));
code.push_str(&format!("{}if {}.tag() == LAM && self.heap.get({}.loc(), P1).is_num() {{\n", ident(tab), trg.get(), trg.get()));
code.push_str(&format!("{}self.rwts.anni += 2;\n", ident(tab+1)));
code.push_str(&format!("{}self.rwts.oper += 1;\n", ident(tab+1)));
code.push_str(&format!("{}let got = {};\n", ident(tab+1), trg.take()));
code.push_str(&format!("{}let {} = Trg::Dir(Ptr::new(VR1, 0, got.loc()));\n", ident(tab+1), num.show()));
code.push_str(&format!("{}let {} = Trg::Dir(Ptr::new(VR2, 0, got.loc()));\n", ident(tab+1), res.show()));
code.push_str(&format!("{}if {}.val() == 0 {{\n", ident(tab+1), num.get()));
code.push_str(&format!("{}{};\n", ident(tab+2), num.take()));
code.push_str(&format!("{}{} = {};\n", ident(tab+2), &c_z.show(), res.show()));
code.push_str(&format!("{}{} = Trg::Ptr({});\n", ident(tab+2), &c_s.show(), "ERAS"));
code.push_str(&format!("{}}} else {{\n", ident(tab+1)));
code.push_str(&format!("{}{};\n", ident(tab+2), num.swap(&format!("Ptr::big(NUM, {}.val() - 1)", num.get()))));
code.push_str(&format!("{}{} = Trg::Ptr({});\n", ident(tab+2), &c_z.show(), "ERAS"));
code.push_str(&format!("{}{} = {};\n", ident(tab+2), &c_s.show(), trg.show()));
code.push_str(&format!("{}}}\n", ident(tab+1)));
code.push_str(&format!("{}}} else {{\n", ident(tab)));
code.push_str(&format!("{}let {} = self.alloc();\n", ident(tab+1), lam));
code.push_str(&format!("{}let {} = self.alloc();\n", ident(tab+1), mat));
code.push_str(&format!("{}let {} = self.alloc();\n", ident(tab+1), cse));
code.push_str(&format!("{}self.heap.set({}, P1, Ptr::new(MAT, 0, {}));\n", ident(tab+1), lam, mat));
code.push_str(&format!("{}self.heap.set({}, P2, Ptr::new(VR2, 0, {}));\n", ident(tab+1), lam, mat));
code.push_str(&format!("{}self.heap.set({}, P1, Ptr::new(LAM, 0, {}));\n", ident(tab+1), mat, cse));
code.push_str(&format!("{}self.heap.set({}, P2, Ptr::new(VR2, 0, {}));\n", ident(tab+1), mat, lam));
code.push_str(&format!("{}self.safe_link(Trg::Ptr(Ptr::new(LAM, 0, {})), {});\n", ident(tab+1), lam, trg.show()));
code.push_str(&format!("{}{} = Trg::Ptr(Ptr::new(VR1, 0, {}));\n", ident(tab+1), &c_z.show(), cse));
code.push_str(&format!("{}{} = Trg::Ptr(Ptr::new(VR2, 0, {}));\n", ident(tab+1), &c_s.show(), cse));
code.push_str(&format!("{}}}\n", ident(tab)));
code.push_str(&burn(book, tab, None, newx, vars, def, ifz, &c_z));
code.push_str(&burn(book, tab, tail, newx, vars, def, ifs, &c_s));
return code;
}
}
}
// #A ~ <+ #B r>
// ----------------- fast op
// r <~ #(op(+,A,B))
if ptr.is_op2() {
let val = def.node[ptr.loc() as usize].1;
let ret = def.node[ptr.loc() as usize].2;
if let Some(val) = got(vars, def, val) {
let val = Target { nam: val };
let nxt = Target { nam: fresh(newx) };
let op2 = fresh(newx);
code.push_str(&format!("{}let {} : Trg;\n", ident(tab), &nxt.show()));
code.push_str(&format!("{}// fast op\n", ident(tab)));
code.push_str(&format!("{}if {}.is_num() && {}.is_num() {{\n", ident(tab), trg.get(), val.get()));
code.push_str(&format!("{}self.rwts.oper += 2;\n", ident(tab+1))); // OP2 + OP1
code.push_str(&format!("{}let vx = {};\n", ident(tab+1), trg.take()));
code.push_str(&format!("{}let vy = {};\n", ident(tab+1), val.take()));
code.push_str(&format!("{}{} = Trg::Ptr(Ptr::big(NUM, self.op({},vx.val(),vy.val())));\n", ident(tab+1), &nxt.show(), ptr.lab()));
code.push_str(&format!("{}}} else {{\n", ident(tab)));
code.push_str(&format!("{}let {} = self.alloc();\n", ident(tab+1), op2));
code.push_str(&format!("{}self.safe_link(Trg::Ptr(Ptr::new(VR1, 0, {})), {});\n", ident(tab+1), op2, val.show()));
code.push_str(&format!("{}self.safe_link(Trg::Ptr(Ptr::new(OP2, {}, {})), {});\n", ident(tab+1), ptr.lab(), op2, trg.show()));
code.push_str(&format!("{}{} = Trg::Ptr(Ptr::new(VR2, 0, {}));\n", ident(tab+1), &nxt.show(), op2));
code.push_str(&format!("{}}}\n", ident(tab)));
code.push_str(&burn(book, tab, None, newx, vars, def, ret, &nxt));
return code;
}
}
// {p1 p2} <~ #N
// ------------- fast copy
// p1 <~ #N
// p2 <~ #N
if ptr.is_dup() {
let x1 = Target { nam: format!("{}x", trg.show()) };
let x2 = Target { nam: format!("{}y", trg.show()) };
let p1 = def.node[ptr.loc() as usize].1;
let p2 = def.node[ptr.loc() as usize].2;
let lc = fresh(newx);
code.push_str(&format!("{}let {} : Trg;\n", ident(tab), &x1.show()));
code.push_str(&format!("{}let {} : Trg;\n", ident(tab), &x2.show()));
code.push_str(&format!("{}// fast copy\n", ident(tab)));
code.push_str(&format!("{}if {}.tag() == NUM {{\n", ident(tab), trg.get()));
code.push_str(&format!("{}self.rwts.comm += 1;\n", ident(tab+1)));
code.push_str(&format!("{}let got = {};\n", ident(tab+1), trg.take()));
code.push_str(&format!("{}{} = Trg::Ptr(got);\n", ident(tab+1), &x1.show()));
code.push_str(&format!("{}{} = Trg::Ptr(got);\n", ident(tab+1), &x2.show()));
code.push_str(&format!("{}}} else {{\n", ident(tab)));
code.push_str(&format!("{}let {} = self.alloc();\n", ident(tab+1), lc));
code.push_str(&format!("{}{} = Trg::Ptr(Ptr::new(VR1, 0, {}));\n", ident(tab+1), &x1.show(), lc));
code.push_str(&format!("{}{} = Trg::Ptr(Ptr::new(VR2, 0, {}));\n", ident(tab+1), &x2.show(), lc));
code.push_str(&format!("{}self.safe_link(Trg::Ptr(Ptr::new({}, {}, {})), {});\n", ident(tab+1), tag(ptr.tag()), ptr.lab(), lc, trg.show()));
code.push_str(&format!("{}}}\n", ident(tab)));
code.push_str(&burn(book, tab, None, newx, vars, def, p2, &x2));
code.push_str(&burn(book, tab, None, newx, vars, def, p1, &x1));
return code;
}
// (p1 p2) <~ (x1 x2)
// ------------------ fast apply
// p1 <~ x1
// p2 <~ x2
if ptr.is_ctr() && ptr.tag() == run::LAM {
let x1 = Target { nam: format!("{}x", trg.show()) };
let x2 = Target { nam: format!("{}y", trg.show()) };
let p1 = def.node[ptr.loc() as usize].1;
let p2 = def.node[ptr.loc() as usize].2;
let lc = fresh(newx);
code.push_str(&format!("{}let {} : Trg;\n", ident(tab), &x1.show()));
code.push_str(&format!("{}let {} : Trg;\n", ident(tab), &x2.show()));
code.push_str(&format!("{}// fast apply\n", ident(tab)));
code.push_str(&format!("{}if {}.tag() == {} {{\n", ident(tab), trg.get(), tag(ptr.tag())));
code.push_str(&format!("{}self.rwts.anni += 1;\n", ident(tab+1)));
code.push_str(&format!("{}let got = {};\n", ident(tab+1), trg.take()));
code.push_str(&format!("{}{} = Trg::Dir(Ptr::new(VR1, 0, got.loc()));\n", ident(tab+1), &x1.show()));
code.push_str(&format!("{}{} = Trg::Dir(Ptr::new(VR2, 0, got.loc()));\n", ident(tab+1), &x2.show()));
code.push_str(&format!("{}}} else {{\n", ident(tab)));
code.push_str(&format!("{}let {} = self.alloc();\n", ident(tab+1), lc));
code.push_str(&format!("{}{} = Trg::Ptr(Ptr::new(VR1, 0, {}));\n", ident(tab+1), &x1.show(), lc));
code.push_str(&format!("{}{} = Trg::Ptr(Ptr::new(VR2, 0, {}));\n", ident(tab+1), &x2.show(), lc));
code.push_str(&format!("{}self.safe_link(Trg::Ptr(Ptr::new({}, 0, {})), {});\n", ident(tab+1), tag(ptr.tag()), lc, trg.show()));
code.push_str(&format!("{}}}\n", ident(tab)));
code.push_str(&burn(book, tab, None, newx, vars, def, p2, &x2));
code.push_str(&burn(book, tab, None, newx, vars, def, p1, &x1));
return code;
}
//// TODO: implement inlining correctly
//// NOTE: enabling this makes dec_bits_tree hang; investigate
//if ptr.is_ref() && tail.is_some() {
//code.push_str(&format!("{}// inline @{}\n", ident(tab), ast::val_to_name(ptr.loc() as run::Val)));
//code.push_str(&format!("{}if !{}.is_skp() {{\n", ident(tab), trg.get()));
//code.push_str(&format!("{}self.rwts.dref += 1;\n", ident(tab+1)));
//code.push_str(&call(book, tab+1, tail, newx, &mut HashMap::new(), ptr.loc(), trg));
//code.push_str(&format!("{}}} else {{\n", ident(tab)));
//code.push_str(&make(tab+1, newx, vars, def, ptr, &trg.show()));
//code.push_str(&format!("{}}}\n", ident(tab)));
//return code;
//}
// ATOM <~ *
// --------- fast erase
// nothing
if ptr.is_num() || ptr.is_era() {
code.push_str(&format!("{}// fast erase\n", ident(tab)));
code.push_str(&format!("{}if {}.is_skp() {{\n", ident(tab), trg.get()));
code.push_str(&format!("{}{};\n", ident(tab+1), trg.take()));
code.push_str(&format!("{}self.rwts.eras += 1;\n", ident(tab+1)));
code.push_str(&format!("{}}} else {{\n", ident(tab)));
code.push_str(&make(tab+1, newx, vars, def, ptr, &trg.show()));
code.push_str(&format!("{}}}\n", ident(tab)));
return code;
}
code.push_str(&make(tab, newx, vars, def, ptr, &trg.show()));
return code;
}
fn make(
tab : usize,
newx : &mut usize,
vars : &mut HashMap<run::Ptr, String>,
def : &run::Def,
ptr : run::Ptr,
trg : &String,
) -> String {
//println!("make {:08x} {}", ptr.0, x);
let mut code = String::new();
if ptr.is_nod() {
let lc = fresh(newx);
let p1 = def.node[ptr.loc() as usize].1;
let p2 = def.node[ptr.loc() as usize].2;
code.push_str(&format!("{}let {} = self.alloc();\n", ident(tab), lc));
code.push_str(&make(tab, newx, vars, def, p2, &format!("Trg::Ptr(Ptr::new(VR2, 0, {}))", lc)));
code.push_str(&make(tab, newx, vars, def, p1, &format!("Trg::Ptr(Ptr::new(VR1, 0, {}))", lc)));
code.push_str(&format!("{}self.safe_link(Trg::Ptr(Ptr::new({}, {}, {})), {});\n", ident(tab), tag(ptr.tag()), ptr.lab(), lc, trg));
} else if ptr.is_var() {
match got(vars, def, ptr) {
None => {
//println!("-var fst");
vars.insert(ptr, trg.clone());
},
Some(got) => {
//println!("-var snd");
code.push_str(&format!("{}self.safe_link({}, {});\n", ident(tab), trg, got));
}
}
} else {
code.push_str(&format!("{}self.safe_link({}, Trg::Ptr({}));\n", ident(tab), trg, atom(ptr)));
}
return code;
}
fn got(
vars : &HashMap<run::Ptr, String>,
def : &run::Def,
ptr : run::Ptr,
) -> Option<String> {
if ptr.is_var() {
let got = def.node[ptr.loc() as usize];
let slf = if ptr.tag() == run::VR1 { got.1 } else { got.2 };
return vars.get(&slf).cloned();
} else {
return None;
}
}
let fun = ast::val_to_name(fid);
let def = &book.get(fid).unwrap();
let mut code = String::new();
// Given a label, returns true if the definition contains that dup label, directly or not
code.push_str(&format!("{}pub fn L_{}(&mut self, lab: Lab) -> bool {{\n", ident(tab), fun));
for dup in &def.labs {
code.push_str(&format!("{}if lab == 0x{:x} {{ return true; }}\n", ident(tab+1), dup));
}
code.push_str(&format!("{}return false;\n", ident(tab+1)));
code.push_str(&format!("{}}}\n", ident(tab)));
// Calls the definition, performing inline rewrites when possible, and expanding it when not
code.push_str(&format!("{}pub fn F_{}(&mut self, ptr: Ptr, trg: Trg) -> bool {{\n", ident(tab), fun));
code.push_str(&format!("{}if self.get(trg).is_dup() && !self.L_{}(self.get(trg).lab()) {{\n", ident(tab+1), fun));
code.push_str(&format!("{}self.copy(self.swap(trg, NULL), ptr);\n", ident(tab+2)));
code.push_str(&format!("{}return true;\n", ident(tab+2)));
code.push_str(&format!("{}}}\n", ident(tab+1)));
code.push_str(&call(book, tab+1, None, &mut 0, &mut HashMap::new(), fid, &Target { nam: "trg".to_string() }));
code.push_str(&format!("{}return true;\n", ident(tab+1)));
code.push_str(&format!("{}}}\n", ident(tab)));
return code;
}
// TODO: HVM-Lang must always output in this form.
fn adjust_redex(rf: run::Ptr, rx: run::Ptr) -> (run::Ptr, run::Ptr) {
if rf.is_skp() && !rx.is_skp() {
return (rf, rx);
} else if !rf.is_skp() && rx.is_skp() {
return (rx, rf);
} else {
println!("Invalid redex. Compiled HVM requires that ALL defs are in the form:");
println!("@name = ROOT");
println!(" & ATOM ~ TERM");
println!(" & ATOM ~ TERM");
println!(" & ATOM ~ TERM");
println!(" ...");
println!("Where ATOM must be either a ref (`@foo`), a num (`#123`), or an era (`*`).");
println!("If you used HVM-Lang, please report on https://github.com/HigherOrderCO/hvm-lang.");
panic!("Invalid HVMC file.");
}
}
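The `adjust_redex` function above enforces the `ATOM ~ TERM` shape, reordering each redex so the skippable atom (ref, num, or era) comes first and rejecting pairs where neither or both sides are atoms. A self-contained sketch of that rule over a simplified side enum (the enum and names are illustrative, not the runtime's types):

```rust
// Simplified model of the redex-ordering rule in `adjust_redex`: compiled
// definitions require each redex to pair one atomic "skippable" side
// (a ref, number, or eraser) with an arbitrary term, atom first.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Side { Ref, Num, Era, Ctr }

fn is_skp(s: Side) -> bool {
    matches!(s, Side::Ref | Side::Num | Side::Era)
}

// Returns the pair with the skippable atom first, or None when the redex
// does not fit the required `ATOM ~ TERM` shape (the real code panics there).
fn adjust(a: Side, b: Side) -> Option<(Side, Side)> {
    match (is_skp(a), is_skp(b)) {
        (true, false) => Some((a, b)),
        (false, true) => Some((b, a)),
        _ => None,
    }
}

fn main() {
    assert_eq!(adjust(Side::Ctr, Side::Num), Some((Side::Num, Side::Ctr)));
    assert_eq!(adjust(Side::Ref, Side::Ctr), Some((Side::Ref, Side::Ctr)));
    assert_eq!(adjust(Side::Ctr, Side::Ctr), None);
}
```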
// lib.rs
#![feature(generic_const_exprs)]
#![allow(incomplete_features)]
#![allow(dead_code)]
#![allow(unused_variables)]
#![allow(unused_imports)]
#![allow(non_snake_case)]
#![allow(non_upper_case_globals)]
pub mod ast;
pub mod fns;
pub mod jit;
pub mod run;
pub mod u60;
// main.rs
#![feature(generic_const_exprs)]
#![allow(incomplete_features)]
#![allow(dead_code)]
#![allow(non_snake_case)]
#![allow(non_upper_case_globals)]
#![allow(unused_imports)]
#![allow(unused_variables)]
use std::env;
use std::fs;
use hvmc::ast;
use hvmc::fns;
use hvmc::jit;
use hvmc::run;
use hvmc::u60;
use std::collections::HashSet;
struct Args {
func: String,
argm: String,
opts: HashSet<String>,
}
fn get_args() -> Args {
let args: Vec<String> = env::args().collect();
let func = args.get(1).unwrap_or(&"help".to_string()).to_string();
let argm = args.get(2).unwrap_or(&"".to_string()).to_string();
let opts = args.iter().skip(3).map(|s| s.to_string()).collect::<HashSet<_>>();
return Args { func, argm, opts };
}
// Runs 'main' without showing the CLI options
fn run_without_cli(args: Args) {
let lazy = args.opts.contains("-L");
let seq = lazy || args.opts.contains("-1");
let file = args.argm;
let book = run::Book::new();
let mut net = run::Net::new(1 << 28, false);
let begin = std::time::Instant::now();
if lazy { todo!() }
if seq {
net.normal(&book);
} else {
net.parallel_normal(&book);
}
println!("{}", net.show());
print_stats(&net, begin);
}
fn run_with_cli(args: Args) -> Result<(), Box<dyn std::error::Error>> {
let lazy = args.opts.contains("-L");
let seq = lazy || args.opts.contains("-1");
match args.func.as_str() {
"run" => {
if args.argm.len() > 0 {
let file = args.argm;
let book = load_book(&file);
let mut net = run::Net::new(1 << 28, lazy);
let begin = std::time::Instant::now();
if seq {
net.normal(&book);
} else {
net.parallel_normal(&book);
}
println!("{}", net.show());
if args.opts.contains("-s") {
print_stats(&net, begin);
}
} else {
println!("Usage: hvmc run <file.hvmc> [-s]");
std::process::exit(1);
}
}
"compile" => {
if args.argm.len() > 0 {
let file = args.argm;
let book = load_book(&file);
let net = run::Net::new(1 << 28, lazy);
let begin = std::time::Instant::now();
compile_book_to_rust_crate(&file, &book)?;
compile_rust_crate_to_executable(&file)?;
} else {
println!("Usage: hvmc compile <file.hvmc>");
std::process::exit(1);
}
}
"gen-cuda-book" => {
if args.argm.len() > 0 {
let file = args.argm;
let book = load_book(&file);
let net = run::Net::new(1 << 28, lazy);
let begin = std::time::Instant::now();
println!("{}", gen_cuda_book(&book));
} else {
println!("Usage: hvmc gen-cuda-book <file.hvmc>");
std::process::exit(1);
}
}
_ => {
println!("Usage: hvmc <cmd> <file.hvmc> [-s]");
println!("Commands:");
println!(" run - Run the given file");
println!(" compile - Compile the given file to an executable");
println!(" gen-cuda-book - Generate a CUDA book from the given file");
println!("Options:");
println!(" [-s] Show stats, including rewrite count");
println!(" [-1] Single-core mode (no parallelism)");
}
}
Ok(())
}
#[cfg(not(feature = "hvm_cli_options"))]
fn main() {
run_without_cli(get_args())
}
#[cfg(feature = "hvm_cli_options")]
fn main() -> Result<(), Box<dyn std::error::Error>> {
run_with_cli(get_args())
}
fn print_stats(net: &run::Net, begin: std::time::Instant) {
let rewrites = net.get_rewrites();
println!("RWTS : {}", rewrites.total());
println!("- ANNI : {}", rewrites.anni);
println!("- COMM : {}", rewrites.comm);
println!("- ERAS : {}", rewrites.eras);
println!("- DREF : {}", rewrites.dref);
println!("- OPER : {}", rewrites.oper);
println!("TIME : {:.3} s", (begin.elapsed().as_millis() as f64) / 1000.0);
println!("RPS : {:.3} m", (rewrites.total() as f64) / (begin.elapsed().as_millis() as f64) / 1000.0);
}
// Loads an .hvmc file and parses it into a runtime book
fn load_book(file: &str) -> run::Book {
let Ok(file) = fs::read_to_string(file) else {
eprintln!("Input file not found");
std::process::exit(1);
};
return ast::book_to_runtime(&ast::do_parse_book(&file));
}
pub fn compile_book_to_rust_crate(f_name: &str, book: &run::Book) -> Result<(), std::io::Error> {
let fns_rs = jit::compile_book(book);
let outdir = ".hvm";
if std::path::Path::new(&outdir).exists() {
fs::remove_dir_all(&outdir)?;
}
let cargo_toml = include_str!("../Cargo.toml");
let cargo_toml = cargo_toml.split("##--COMPILER-CUTOFF--##").next().unwrap();
let cargo_toml = cargo_toml.replace("\"hvm_cli_options\"", "");
fs::create_dir_all(&format!("{}/src", outdir))?;
fs::write(".hvm/Cargo.toml", cargo_toml)?;
fs::write(".hvm/src/ast.rs", include_str!("../src/ast.rs"))?;
fs::write(".hvm/src/jit.rs", include_str!("../src/jit.rs"))?;
fs::write(".hvm/src/lib.rs", include_str!("../src/lib.rs"))?;
fs::write(".hvm/src/main.rs", include_str!("../src/main.rs"))?;
fs::write(".hvm/src/run.rs", include_str!("../src/run.rs"))?;
fs::write(".hvm/src/u60.rs", include_str!("../src/u60.rs"))?;
fs::write(".hvm/src/fns.rs", fns_rs)?;
return Ok(());
}
pub fn compile_rust_crate_to_executable(f_name: &str) -> Result<(), std::io::Error> {
let output = std::process::Command::new("cargo").current_dir("./.hvm").arg("build").arg("--release").output()?;
let target = format!("./{}", f_name.replace(".hvmc", ""));
if std::path::Path::new(&target).exists() {
fs::remove_file(&target)?;
}
fs::copy("./.hvm/target/release/hvmc", target)?;
return Ok(());
}
// TODO: move to hvm-cuda repo
pub fn gen_cuda_book(book: &run::Book) -> String {
use std::collections::BTreeMap;
// Sort the book.defs by key
let mut defs = BTreeMap::new();
for (fid, def) in book.defs.iter() {
if def.node.len() > 0 {
defs.insert(fid, def.clone());
}
}
// Initializes code
let mut code = String::new();
// Generate function ids
for (i, id) in defs.keys().enumerate() {
code.push_str(&format!("const u32 F_{} = 0x{:x};\n", crate::ast::val_to_name(**id), id));
}
code.push_str("\n");
// Create book
code.push_str("u32 BOOK_DATA[] = {\n");
// Generate book data
for (i, (id, net)) in defs.iter().enumerate() {
let node_len = net.node.len();
let rdex_len = net.rdex.len();
code.push_str(&format!(" // @{}\n", crate::ast::val_to_name(**id)));
// Collect all pointers from root, nodes and rdex into a single buffer
code.push_str(&format!(" // .nlen\n"));
code.push_str(&format!(" 0x{:08X},\n", node_len));
code.push_str(&format!(" // .rlen\n"));
code.push_str(&format!(" 0x{:08X},\n", rdex_len));
// .node
code.push_str(" // .node\n");
for (i, node) in net.node.iter().enumerate() {
code.push_str(&format!(" 0x{:08X},", node.1.0));
code.push_str(&format!(" 0x{:08X},", node.2.0));
if (i + 1) % 4 == 0 {
code.push_str("\n");
}
}
if node_len % 4 != 0 {
code.push_str("\n");
}
// .rdex
code.push_str(" // .rdex\n");
for (i, (a, b)) in net.rdex.iter().enumerate() {
code.push_str(&format!(" 0x{:08X},", a.0));
code.push_str(&format!(" 0x{:08X},", b.0));
if (i + 1) % 4 == 0 {
code.push_str("\n");
}
}
if rdex_len % 4 != 0 {
code.push_str("\n");
}
}
code.push_str("};\n\n");
code.push_str("u32 JUMP_DATA[] = {\n");
let mut index = 0;
for (i, fid) in defs.keys().enumerate() {
code.push_str(&format!(" 0x{:08X}, 0x{:08X}, // @{}\n", fid, index, crate::ast::val_to_name(**fid)));
index += 2 + 2 * defs[fid].node.len() as u32 + 2 * defs[fid].rdex.len() as u32;
}
code.push_str("};");
return code;
}
// run.rs
// An efficient Interaction Combinator runtime
// ===========================================
// This file implements an efficient interaction combinator runtime. Nodes are represented by 2 aux
// ports (P1, P2), with the main port (P0) omitted. A separate vector, 'rdex', holds main ports,
// and, thus, tracks active pairs that can be reduced in parallel. Pointers are unboxed, meaning
// that ERAs, NUMs and REFs don't use any additional space. REFs lazily expand to closed nets when
// they interact with nodes, and are cleared when they interact with ERAs, allowing for constant
// space evaluation of recursive functions on Scott encoded datatypes.
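The header comment above describes unboxed tagged pointers packing a tag, a label, and a heap location into a single word. The runtime's exact field widths aren't visible in this excerpt, so the following is a hypothetical illustration of the pack/unpack pattern only (4-bit tag, 28-bit label, 32-bit location are assumptions, not HVM-Core's actual layout):

```rust
// Hypothetical illustration of an unboxed tagged pointer: a 4-bit tag, a
// 28-bit label and a 32-bit location packed into one u64. The real runtime's
// field widths may differ; this only demonstrates the bit-packing technique.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Ptr(u64);

impl Ptr {
    fn new(tag: u8, lab: u32, loc: u32) -> Ptr {
        Ptr((tag as u64) | ((lab as u64) << 4) | ((loc as u64) << 32))
    }
    fn tag(self) -> u8  { (self.0 & 0xF) as u8 }
    fn lab(self) -> u32 { ((self.0 >> 4) & 0x0FFF_FFFF) as u32 }
    fn loc(self) -> u32 { (self.0 >> 32) as u32 }
}

fn main() {
    let p = Ptr::new(0xA, 7, 42); // e.g. a LAM node with label 7 at heap slot 42
    assert_eq!(p.tag(), 0xA);
    assert_eq!(p.lab(), 7);
    assert_eq!(p.loc(), 42);
}
```

Because ERAs, NUMs and REFs live entirely inside such a word, they need no heap node, which is what makes the constant-space evaluation mentioned above possible.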
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::collections::HashMap;
use std::collections::HashSet;
use crate::u60;
pub type Tag = u8;
pub type Lab = u32;
pub type Loc = u32;
pub type Val = u64;
pub type AVal = AtomicU64;
// Core terms.
pub const VR1: Tag = 0x0; // Variable to aux port 1
pub const VR2: Tag = 0x1; // Variable to aux port 2
pub const RD1: Tag = 0x2; // Redirect to aux port 1
pub const RD2: Tag = 0x3; // Redirect to aux port 2
pub const REF: Tag = 0x4; // Lazy closed net
pub const ERA: Tag = 0x5; // Unboxed eraser
pub const NUM: Tag = 0x6; // Unboxed number
pub const OP2: Tag = 0x7; // Binary numeric operation
pub const OP1: Tag = 0x8; // Unary numeric operation
pub const MAT: Tag = 0x9; // Numeric pattern-matching
pub const LAM: Tag = 0xA; // Main port of lam node
pub const TUP: Tag = 0xB; // Main port of tup node
pub const DUP: Tag = 0xC; // Main port of dup node
pub const END: Tag = 0xE; // Last pointer tag
// Numeric operations.
pub const ADD: Lab = 0x00; // addition
pub const SUB: Lab = 0x01; // subtraction
pub const MUL: Lab = 0x02; // multiplication
pub const DIV: Lab = 0x03; // division
pub const MOD: Lab = 0x04; // modulus
pub const EQ : Lab = 0x05; // equal-to
pub const NE : Lab = 0x06; // not-equal-to
pub const LT : Lab = 0x07; // less-than
pub const GT : Lab = 0x08; // greater-than
pub const LTE: Lab = 0x09; // less-than-or-equal
pub const GTE: Lab = 0x0A; // greater-than-or-equal
pub const AND: Lab = 0x0B; // logical-and
pub const OR : Lab = 0x0C; // logical-or
pub const XOR: Lab = 0x0D; // logical-xor
pub const LSH: Lab = 0x0E; // left-shift
pub const RSH: Lab = 0x0F; // right-shift
pub const NOT: Lab = 0x10; // logical-not
pub const ERAS: Ptr = Ptr::new(ERA, 0, 0);
pub const ROOT: Ptr = Ptr::new(VR2, 0, 0);
pub const NULL: Ptr = Ptr(0x0000_0000_0000_0000);
pub const GONE: Ptr = Ptr(0xFFFF_FFFF_FFFF_FFEF);
pub const LOCK: Ptr = Ptr(0xFFFF_FFFF_FFFF_FFFF); // if last digit is F it will be seen as a CTR
// An auxiliary port.
pub type Port = Val;
pub const P1: Port = 0;
pub const P2: Port = 1;
// A tagged pointer.
#[derive(Copy, Clone, Debug, Eq, PartialEq, PartialOrd, Hash)]
pub struct Ptr(pub Val);
// An atomic tagged pointer.
pub struct APtr(pub AVal);
// FIXME: the 'this' pointer of headers is wasteful, since it is only used once in the lazy
// reducer, and, there, only the tag/lab is needed, because the loc is already known. As such, we
// could actually store only the tag/lab, saving 32 bits per node.
// A principal port, used on lazy mode.
pub struct Head {
this: Ptr, // points to this node's port 0
targ: Ptr, // points to the target port 0
}
// An atomic principal port, used on lazy mode.
pub struct AHead {
this: APtr, // points to this node's port 0
targ: APtr, // points to the target port 0
}
// An interaction combinator node.
pub type Node<const LAZY: bool> = ([ Head; LAZY as usize], Ptr, Ptr);
pub type ANode<const LAZY: bool> = ([AHead; LAZY as usize], APtr, APtr);
// A target pointer, with implied ownership.
#[derive(Copy, Clone, Debug, Eq, PartialEq, PartialOrd, Hash)]
pub enum Trg {
Dir(Ptr), // we don't own the pointer, so we point to its location
Ptr(Ptr), // we own the pointer, so we store it directly
}
// The global node buffer.
pub type Nodes<const LAZY: bool> = [ANode<LAZY>];
// A handy wrapper around Nodes.
pub struct Heap<'a, const LAZY: bool>
where [(); LAZY as usize]: {
pub nodes: &'a Nodes<LAZY>,
}
// Rewrite counter.
#[derive(Copy, Clone)]
pub struct Rewrites {
pub anni: usize, // anni rewrites
pub comm: usize, // comm rewrites
pub eras: usize, // eras rewrites
pub dref: usize, // dref rewrites
pub oper: usize, // oper rewrites
}
// Rewrite counter, atomic.
pub struct AtomicRewrites {
pub anni: AtomicUsize, // anni rewrites
pub comm: AtomicUsize, // comm rewrites
pub eras: AtomicUsize, // eras rewrites
pub dref: AtomicUsize, // dref rewrites
pub oper: AtomicUsize, // oper rewrites
}
// An allocation area delimiter
pub struct Area {
pub init: usize, // first allocation index
pub size: usize, // total nodes in area
}
// An interaction combinator net.
pub struct NetFields<'a, const LAZY: bool>
where [(); LAZY as usize]: {
pub tid : usize, // thread id
pub tids: usize, // thread count
pub labs: Lab, // dup labels
pub heap: Heap<'a, LAZY>, // nodes
pub rdex: Vec<(Ptr,Ptr)>, // redexes
pub locs: Vec<Loc>,
pub area: Area, // allocation area
pub next: usize, // next allocation index within area
pub rwts: Rewrites, // rewrite count
}
// A compact closed net, used for dereferences.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Def {
pub labs: HashSet<Lab, nohash_hasher::BuildNoHashHasher<Lab>>,
pub rdex: Vec<(Ptr, Ptr)>,
pub node: Vec<((), Ptr, Ptr)>,
}
// A map of id to definitions (closed nets).
pub struct Book {
pub defs: HashMap<Val, Def, nohash_hasher::BuildNoHashHasher<Val>>,
}
impl Ptr {
#[inline(always)]
pub const fn new(tag: Tag, lab: Lab, loc: Loc) -> Self {
Ptr(((loc as Val) << 32) | ((lab as Val) << 4) | (tag as Val))
}
#[inline(always)]
pub const fn big(tag: Tag, val: Val) -> Self {
Ptr((val << 4) | (tag as Val))
}
#[inline(always)]
pub const fn tag(&self) -> Tag {
(self.0 & 0xF) as Tag
}
#[inline(always)]
pub const fn lab(&self) -> Lab {
(self.0 as Lab) >> 4
}
#[inline(always)]
pub const fn loc(&self) -> Loc {
(self.0 >> 32) as Loc
}
#[inline(always)]
pub const fn val(&self) -> Val {
self.0 >> 4
}
#[inline(always)]
pub fn is_nil(&self) -> bool {
return self.0 == 0;
}
#[inline(always)]
pub fn is_var(&self) -> bool {
return matches!(self.tag(), VR1..=VR2) && !self.is_nil();
}
#[inline(always)]
pub fn is_red(&self) -> bool {
return matches!(self.tag(), RD1..=RD2) && !self.is_nil();
}
#[inline(always)]
pub fn is_era(&self) -> bool {
return matches!(self.tag(), ERA);
}
#[inline(always)]
pub fn is_ctr(&self) -> bool {
return matches!(self.tag(), LAM..=END);
}
#[inline(always)]
pub fn is_dup(&self) -> bool {
return matches!(self.tag(), DUP);
}
#[inline(always)]
pub fn is_ref(&self) -> bool {
return matches!(self.tag(), REF);
}
#[inline(always)]
pub fn is_pri(&self) -> bool {
return matches!(self.tag(), REF..=END);
}
#[inline(always)]
pub fn is_num(&self) -> bool {
return matches!(self.tag(), NUM);
}
#[inline(always)]
pub fn is_op1(&self) -> bool {
return matches!(self.tag(), OP1);
}
#[inline(always)]
pub fn is_op2(&self) -> bool {
return matches!(self.tag(), OP2);
}
#[inline(always)]
pub fn is_skp(&self) -> bool {
return matches!(self.tag(), ERA | NUM | REF);
}
#[inline(always)]
pub fn is_mat(&self) -> bool {
return matches!(self.tag(), MAT);
}
#[inline(always)]
pub fn is_nod(&self) -> bool {
return matches!(self.tag(), OP2..=END);
}
#[inline(always)]
pub fn has_loc(&self) -> bool {
return matches!(self.tag(), VR1..=VR2 | OP2..=END);
}
#[inline(always)]
pub fn redirect(&self) -> Ptr {
return Ptr::new(self.tag() + RD2 - VR2, 0, self.loc());
}
#[inline(always)]
pub fn unredirect(&self) -> Ptr {
    return Ptr::new(self.tag() + VR2 - RD2, 0, self.loc());
}
#[inline(always)]
pub fn can_skip(a: Ptr, b: Ptr) -> bool {
return matches!(a.tag(), ERA | REF) && matches!(b.tag(), ERA | REF);
}
#[inline(always)]
pub fn view(&self) -> String {
if *self == NULL {
return format!("(NUL)");
} else {
return match self.tag() {
VR1 => format!("(VR1 {:07x} {:08x})", self.lab(), self.loc()),
VR2 => format!("(VR2 {:07x} {:08x})", self.lab(), self.loc()),
RD1 => format!("(RD1 {:07x} {:08x})", self.lab(), self.loc()),
RD2 => format!("(RD2 {:07x} {:08x})", self.lab(), self.loc()),
REF => format!("(REF \"{}\")", crate::ast::val_to_name(self.val())),
ERA => format!("(ERA)"),
NUM => format!("(NUM {:x})", self.val()),
OP2 => format!("(OP2 {:07x} {:08x})", self.lab(), self.loc()),
OP1 => format!("(OP1 {:07x} {:08x})", self.lab(), self.loc()),
MAT => format!("(MAT {:07x} {:08x})", self.lab(), self.loc()),
LAM => format!("(LAM {:07x} {:08x})", self.lab(), self.loc()),
TUP => format!("(TUP {:07x} {:08x})", self.lab(), self.loc()),
DUP => format!("(DUP {:07x} {:08x})", self.lab(), self.loc()),
END => format!("(END)"),
_ => format!("???"),
};
};
}
}
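The packing used by `Ptr::new`, `tag`, `lab`, and `loc` can be exercised in isolation. This standalone sketch mirrors the bit layout (low 4 bits for the tag, the next 28 bits for the label, the high 32 bits for the location; labels are assumed to fit in 28 bits, or they would spill into the location bits):

```rust
// Standalone mirror of the Ptr bit layout: | loc:32 | lab:28 | tag:4 |
fn pack(tag: u8, lab: u32, loc: u32) -> u64 {
    ((loc as u64) << 32) | ((lab as u64) << 4) | (tag as u64)
}
fn tag(p: u64) -> u8 { (p & 0xF) as u8 }
fn lab(p: u64) -> u32 { (p as u32) >> 4 } // truncate to low 32 bits, drop tag
fn loc(p: u64) -> u32 { (p >> 32) as u32 }

fn main() {
    const LAM: u8 = 0xA;
    let p = pack(LAM, 3, 0x1234);
    assert_eq!(p, 0x0000_1234_0000_003A);
    // The three fields round-trip independently.
    assert_eq!((tag(p), lab(p), loc(p)), (LAM, 3, 0x1234));
    println!("{:016x}", p);
}
```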
impl APtr {
pub const fn new(ptr: Ptr) -> Self {
APtr(AtomicU64::new(ptr.0))
}
pub fn load(&self) -> Ptr {
Ptr(self.0.load(Ordering::Relaxed))
}
pub fn store(&self, ptr: Ptr) {
self.0.store(ptr.0, Ordering::Relaxed);
}
}
impl Book {
#[inline(always)]
pub fn new() -> Self {
Book {
defs: HashMap::with_hasher(std::hash::BuildHasherDefault::default()),
}
}
#[inline(always)]
pub fn def(&mut self, name: Val, def: Def) {
self.defs.insert(name, def);
}
#[inline(always)]
pub fn get(&self, name: Val) -> Option<&Def> {
self.defs.get(&name)
}
}
impl Def {
pub fn new() -> Self {
Def {
labs: HashSet::with_hasher(std::hash::BuildHasherDefault::default()),
rdex: vec![],
node: vec![],
}
}
}
impl<'a, const LAZY: bool> Heap<'a, LAZY>
where [(); LAZY as usize]: {
pub fn new(nodes: &'a Nodes<LAZY>) -> Self {
Heap { nodes }
}
pub fn init(size: usize) -> Box<[ANode<LAZY>]> {
let mut data = vec![];
    const HEAD: AHead = AHead {
      this: APtr::new(NULL),
      targ: APtr::new(NULL),
    };
    for _ in 0..size {
      let p0 = [HEAD; LAZY as usize];
let p1 = APtr::new(NULL);
let p2 = APtr::new(NULL);
data.push((p0, p1, p2));
}
return data.into_boxed_slice();
}
#[inline(always)]
pub fn get(&self, index: Loc, port: Port) -> Ptr {
unsafe {
let node = self.nodes.get_unchecked(index as usize);
if port == P1 {
return node.1.load();
} else {
return node.2.load();
}
}
}
#[inline(always)]
pub fn set(&self, index: Loc, port: Port, value: Ptr) {
unsafe {
let node = self.nodes.get_unchecked(index as usize);
if port == P1 {
node.1.store(value);
} else {
node.2.store(value);
}
}
}
#[inline(always)]
pub fn get_pri(&self, index: Loc) -> Head {
unsafe {
//println!("main of: {:016x} = {:016x}", index, self.nodes.get_unchecked(index as usize).0[0].1.load().0);
let this = self.nodes.get_unchecked(index as usize).0[0].this.load();
let targ = self.nodes.get_unchecked(index as usize).0[0].targ.load();
return Head { this, targ };
}
}
#[inline(always)]
pub fn set_pri(&self, index: Loc, this: Ptr, targ: Ptr) {
//println!("set main {:x} = {:016x} ~ {:016x}", index, this.0, targ.0);
unsafe {
self.nodes.get_unchecked(index as usize).0[0].this.store(this);
self.nodes.get_unchecked(index as usize).0[0].targ.store(targ);
}
}
#[inline(always)]
pub fn cas(&self, index: Loc, port: Port, expected: Ptr, value: Ptr) -> Result<Ptr,Ptr> {
unsafe {
let node = self.nodes.get_unchecked(index as usize);
let data = if port == P1 { &node.1.0 } else { &node.2.0 };
let done = data.compare_exchange_weak(expected.0, value.0, Ordering::Relaxed, Ordering::Relaxed);
return done.map(Ptr).map_err(Ptr);
}
}
#[inline(always)]
pub fn swap(&self, index: Loc, port: Port, value: Ptr) -> Ptr {
unsafe {
let node = self.nodes.get_unchecked(index as usize);
let data = if port == P1 { &node.1.0 } else { &node.2.0 };
return Ptr(data.swap(value.0, Ordering::Relaxed));
}
}
#[inline(always)]
pub fn get_root(&self) -> Ptr {
return self.get(ROOT.loc(), P2);
}
#[inline(always)]
pub fn set_root(&self, value: Ptr) {
self.set(ROOT.loc(), P2, value);
}
}
impl Rewrites {
pub fn new() -> Self {
Rewrites {
anni: 0,
comm: 0,
eras: 0,
dref: 0,
oper: 0,
}
}
pub fn add_to(&self, target: &AtomicRewrites) {
target.anni.fetch_add(self.anni, Ordering::Relaxed);
target.comm.fetch_add(self.comm, Ordering::Relaxed);
target.eras.fetch_add(self.eras, Ordering::Relaxed);
target.dref.fetch_add(self.dref, Ordering::Relaxed);
target.oper.fetch_add(self.oper, Ordering::Relaxed);
}
pub fn total(&self) -> usize {
self.anni + self.comm + self.eras + self.dref + self.oper
}
}
impl AtomicRewrites {
pub fn new() -> Self {
AtomicRewrites {
anni: AtomicUsize::new(0),
comm: AtomicUsize::new(0),
eras: AtomicUsize::new(0),
dref: AtomicUsize::new(0),
oper: AtomicUsize::new(0),
}
}
pub fn add_to(&self, target: &mut Rewrites) {
target.anni += self.anni.load(Ordering::Relaxed);
target.comm += self.comm.load(Ordering::Relaxed);
target.eras += self.eras.load(Ordering::Relaxed);
target.dref += self.dref.load(Ordering::Relaxed);
target.oper += self.oper.load(Ordering::Relaxed);
}
}
impl<'a, const LAZY: bool> NetFields<'a, LAZY> where [(); LAZY as usize]: {
// Creates an empty net with given size.
pub fn new(nodes: &'a Nodes<LAZY>) -> Self {
NetFields {
tid : 0,
tids: 1,
labs: 0x1,
heap: Heap { nodes },
rdex: vec![],
locs: vec![0; 1 << 16],
area: Area { init: 0, size: nodes.len() },
next: 0,
rwts: Rewrites::new(),
}
}
// Creates a net and boots from a REF.
pub fn boot(&self, root_id: Val) {
self.heap.set_root(Ptr::big(REF, root_id));
}
// Total rewrite count.
pub fn rewrites(&self) -> usize {
return self.rwts.anni + self.rwts.comm + self.rwts.eras + self.rwts.dref + self.rwts.oper;
}
#[inline(always)]
pub fn alloc(&mut self) -> Loc {
// On the first pass, just alloc without checking.
    // Note: we add 1 to avoid overwriting root.
let index = if self.next < self.area.size - 1 {
self.next += 1;
self.area.init as Loc + self.next as Loc
// On later passes, search for an available slot.
} else {
loop {
self.next += 1;
let index = (self.area.init + self.next % self.area.size) as Loc;
if self.heap.get(index, P1).is_nil() && self.heap.get(index, P2).is_nil() {
break index;
}
}
};
self.heap.set(index, P1, LOCK);
self.heap.set(index, P2, LOCK);
//println!("ALLOC {}", index);
index
}
// Gets a pointer's target.
#[inline(always)]
pub fn get_target(&self, ptr: Ptr) -> Ptr {
self.heap.get(ptr.loc(), ptr.0 & 1)
}
// Sets a pointer's target.
#[inline(always)]
pub fn set_target(&mut self, ptr: Ptr, val: Ptr) {
self.heap.set(ptr.loc(), ptr.0 & 1, val)
}
// Takes a pointer's target.
#[inline(always)]
pub fn swap_target(&self, ptr: Ptr, value: Ptr) -> Ptr {
self.heap.swap(ptr.loc(), ptr.0 & 1, value)
}
// Takes a pointer's target.
#[inline(always)]
pub fn take_target(&self, ptr: Ptr) -> Ptr {
loop {
let got = self.heap.swap(ptr.loc(), ptr.0 & 1, LOCK);
if got != LOCK && got != NULL {
return got;
}
}
}
// Sets a pointer's target, using CAS.
#[inline(always)]
pub fn cas_target(&self, ptr: Ptr, expected: Ptr, value: Ptr) -> Result<Ptr,Ptr> {
self.heap.cas(ptr.loc(), ptr.0 & 1, expected, value)
}
// Like get_target, but also for main ports
#[inline(always)]
pub fn get_target_full(&self, ptr: Ptr) -> Ptr {
if ptr.is_var() || ptr.is_red() {
return self.get_target(ptr);
}
if ptr.is_nod() {
return self.heap.get_pri(ptr.loc()).targ;
}
panic!("Can't get target of: {}", ptr.view());
}
#[inline(always)]
pub fn redux(&mut self, a: Ptr, b: Ptr) {
if Ptr::can_skip(a, b) {
self.rwts.eras += 1;
} else if !LAZY {
self.rdex.push((a, b));
} else {
if a.is_nod() { self.heap.set_pri(a.loc(), a, b); }
if b.is_nod() { self.heap.set_pri(b.loc(), b, a); }
}
}
#[inline(always)]
pub fn get(&self, a: Trg) -> Ptr {
match a {
Trg::Dir(dir) => self.get_target(dir),
Trg::Ptr(ptr) => ptr,
}
}
#[inline(always)]
pub fn swap(&self, a: Trg, val: Ptr) -> Ptr {
match a {
Trg::Dir(dir) => self.swap_target(dir, val),
Trg::Ptr(ptr) => ptr,
}
}
// Links two pointers, forming a new wire. Assumes ownership.
#[inline(always)]
pub fn link(&mut self, a_ptr: Ptr, b_ptr: Ptr) {
if a_ptr.is_pri() && b_ptr.is_pri() {
return self.redux(a_ptr, b_ptr);
} else {
self.linker(a_ptr, b_ptr);
self.linker(b_ptr, a_ptr);
}
}
// Given two locations, links both stored pointers, atomically.
#[inline(always)]
pub fn atomic_link(&mut self, a_dir: Ptr, b_dir: Ptr) {
//println!("link {:016x} {:016x}", a_dir.0, b_dir.0);
let a_ptr = self.take_target(a_dir);
let b_ptr = self.take_target(b_dir);
if a_ptr.is_pri() && b_ptr.is_pri() {
self.set_target(a_dir, NULL);
self.set_target(b_dir, NULL);
return self.redux(a_ptr, b_ptr);
} else {
self.atomic_linker(a_ptr, a_dir, b_ptr);
self.atomic_linker(b_ptr, b_dir, a_ptr);
}
}
// Given a location, link the pointer stored to another pointer, atomically.
#[inline(always)]
pub fn half_atomic_link(&mut self, a_dir: Ptr, b_ptr: Ptr) {
let a_ptr = self.take_target(a_dir);
if a_ptr.is_pri() && b_ptr.is_pri() {
self.set_target(a_dir, NULL);
return self.redux(a_ptr, b_ptr);
} else {
self.atomic_linker(a_ptr, a_dir, b_ptr);
self.linker(b_ptr, a_ptr);
}
}
  // When two threads interfere, uses the lock-free link algorithm described in the 'paper/' directory.
#[inline(always)]
pub fn linker(&mut self, a_ptr: Ptr, b_ptr: Ptr) {
if a_ptr.is_var() {
self.set_target(a_ptr, b_ptr);
} else {
if LAZY && a_ptr.is_nod() {
self.heap.set_pri(a_ptr.loc(), a_ptr, b_ptr);
}
}
}
  // When two threads interfere, uses the lock-free link algorithm described in the 'paper/' directory.
#[inline(always)]
pub fn atomic_linker(&mut self, a_ptr: Ptr, a_dir: Ptr, b_ptr: Ptr) {
// If 'a_ptr' is a var...
if a_ptr.is_var() {
      // Attempts to link using a compare-and-swap.
      let got = self.cas_target(a_ptr, a_dir, b_ptr);
if got.is_ok() {
self.set_target(a_dir, NULL);
// If the CAS failed, resolve by using redirections.
} else {
//println!("[{:04x}] cas fail {:016x}", self.tid, got.unwrap_err().0);
if b_ptr.is_var() {
self.set_target(a_dir, b_ptr.redirect());
//self.atomic_linker_var(a_ptr, a_dir, b_ptr);
} else if b_ptr.is_pri() {
self.set_target(a_dir, b_ptr);
self.atomic_linker_pri(a_ptr, a_dir, b_ptr);
} else {
todo!();
}
}
} else {
self.set_target(a_dir, NULL);
if LAZY && a_ptr.is_nod() {
self.heap.set_pri(a_ptr.loc(), a_ptr, b_ptr);
}
}
}
// Atomic linker for when 'b_ptr' is a principal port.
pub fn atomic_linker_pri(&mut self, mut a_ptr: Ptr, a_dir: Ptr, b_ptr: Ptr) {
loop {
// Peek the target, which may not be owned by us.
let mut t_dir = a_ptr;
let mut t_ptr = self.get_target(t_dir);
// If target is a redirection, we own it. Clear and move forward.
if t_ptr.is_red() {
self.set_target(t_dir, NULL);
a_ptr = t_ptr;
continue;
}
// If target is a variable, we don't own it. Try replacing it.
if t_ptr.is_var() {
if self.cas_target(t_dir, t_ptr, b_ptr).is_ok() {
//println!("[{:04x}] var", self.tid);
// Clear source location.
self.set_target(a_dir, NULL);
// Collect the orphaned backward path.
t_dir = t_ptr;
t_ptr = self.get_target(t_ptr);
while t_ptr.is_red() {
self.swap_target(t_dir, NULL);
t_dir = t_ptr;
t_ptr = self.get_target(t_dir);
}
return;
}
// If the CAS failed, the var changed, so we try again.
continue;
}
// If it is a node, two threads will reach this branch.
if t_ptr.is_pri() || t_ptr == GONE {
// Sort references, to avoid deadlocks.
let x_dir = if a_dir < t_dir { a_dir } else { t_dir };
let y_dir = if a_dir < t_dir { t_dir } else { a_dir };
// Swap first reference by GONE placeholder.
let x_ptr = self.swap_target(x_dir, GONE);
// First to arrive creates a redex.
if x_ptr != GONE {
//println!("[{:04x}] fst {:016x}", self.tid, x_ptr.0);
let y_ptr = self.swap_target(y_dir, GONE);
self.redux(x_ptr, y_ptr);
return;
// Second to arrive clears up the memory.
} else {
//println!("[{:04x}] snd", self.tid);
self.swap_target(x_dir, NULL);
while self.cas_target(y_dir, GONE, NULL).is_err() {};
return;
}
}
// If it is taken, we wait.
if t_ptr == LOCK {
continue;
}
if t_ptr == NULL {
continue;
}
// Shouldn't be reached.
//println!("[{:04x}] {:016x} | {:016x} {:016x} {:016x}", self.tid, t_ptr.0, a_dir.0, a_ptr.0, b_ptr.0);
unreachable!()
}
}
// Atomic linker for when 'b_ptr' is an aux port.
pub fn atomic_linker_var(&mut self, a_ptr: Ptr, a_dir: Ptr, b_ptr: Ptr) {
loop {
let ste_dir = b_ptr;
let ste_ptr = self.get_target(ste_dir);
if ste_ptr.is_var() {
let trg_dir = ste_ptr;
let trg_ptr = self.get_target(trg_dir);
if trg_ptr.is_red() {
let neo_ptr = trg_ptr.unredirect();
if self.cas_target(ste_dir, ste_ptr, neo_ptr).is_ok() {
self.swap_target(trg_dir, NULL);
continue;
}
}
}
break;
}
}
// Links two targets, using atomics when necessary, based on implied ownership.
#[inline(always)]
pub fn safe_link(&mut self, a: Trg, b: Trg) {
match (a, b) {
(Trg::Dir(a_dir), Trg::Dir(b_dir)) => self.atomic_link(a_dir, b_dir),
(Trg::Dir(a_dir), Trg::Ptr(b_ptr)) => self.half_atomic_link(a_dir, b_ptr),
(Trg::Ptr(a_ptr), Trg::Dir(b_dir)) => self.half_atomic_link(b_dir, a_ptr),
(Trg::Ptr(a_ptr), Trg::Ptr(b_ptr)) => self.link(a_ptr, b_ptr),
}
}
// Performs an interaction over a redex.
#[inline(always)]
pub fn interact(&mut self, book: &Book, a: Ptr, b: Ptr) {
//println!("inter {} ~ {}", a.view(), b.view());
match (a.tag(), b.tag()) {
(REF , OP2..) => self.call(book, a, b),
(OP2.. , REF ) => self.call(book, b, a),
(LAM.. , LAM..) if a.lab() == b.lab() => self.anni(a, b),
(LAM.. , LAM..) => self.comm(a, b),
(LAM.. , ERA ) => self.era2(a),
(ERA , LAM..) => self.era2(b),
(REF , ERA ) => self.rwts.eras += 1,
(ERA , REF ) => self.rwts.eras += 1,
(REF , NUM ) => self.rwts.eras += 1,
(NUM , REF ) => self.rwts.eras += 1,
(ERA , ERA ) => self.rwts.eras += 1,
(LAM.. , NUM ) => self.copy(a, b),
(NUM , LAM..) => self.copy(b, a),
(NUM , ERA ) => self.rwts.eras += 1,
(ERA , NUM ) => self.rwts.eras += 1,
(NUM , NUM ) => self.rwts.eras += 1,
(OP2 , NUM ) => self.op2n(a, b),
(NUM , OP2 ) => self.op2n(b, a),
(OP1 , NUM ) => self.op1n(a, b),
(NUM , OP1 ) => self.op1n(b, a),
(OP2 , LAM..) => self.comm(a, b),
(LAM.. , OP2 ) => self.comm(b, a),
(OP1 , LAM..) => self.pass(a, b),
(LAM.. , OP1 ) => self.pass(b, a),
(OP2 , ERA ) => self.era2(a),
(ERA , OP2 ) => self.era2(b),
(OP1 , ERA ) => self.era1(a),
(ERA , OP1 ) => self.era1(b),
(MAT , NUM ) => self.mtch(a, b),
(NUM , MAT ) => self.mtch(b, a),
(MAT , LAM..) => self.comm(a, b),
(LAM.. , MAT ) => self.comm(b, a),
(MAT , ERA ) => self.era2(a),
(ERA , MAT ) => self.era2(b),
_ => {
println!("Invalid interaction: {} ~ {}", a.view(), b.view());
unreachable!();
},
};
}
pub fn anni(&mut self, a: Ptr, b: Ptr) {
self.rwts.anni += 1;
let a1 = Ptr::new(VR1, 0, a.loc());
let b1 = Ptr::new(VR1, 0, b.loc());
self.atomic_link(a1, b1);
let a2 = Ptr::new(VR2, 0, a.loc());
let b2 = Ptr::new(VR2, 0, b.loc());
self.atomic_link(a2, b2);
}
pub fn comm(&mut self, a: Ptr, b: Ptr) {
self.rwts.comm += 1;
let loc0 = self.alloc();
let loc1 = self.alloc();
let loc2 = self.alloc();
let loc3 = self.alloc();
self.heap.set(loc0, P1, Ptr::new(VR1, 0, loc2));
self.heap.set(loc0, P2, Ptr::new(VR1, 0, loc3));
self.heap.set(loc1, P1, Ptr::new(VR2, 0, loc2));
self.heap.set(loc1, P2, Ptr::new(VR2, 0, loc3));
self.heap.set(loc2, P1, Ptr::new(VR1, 0, loc0));
self.heap.set(loc2, P2, Ptr::new(VR1, 0, loc1));
self.heap.set(loc3, P1, Ptr::new(VR2, 0, loc0));
self.heap.set(loc3, P2, Ptr::new(VR2, 0, loc1));
let a1 = Ptr::new(VR1, 0, a.loc());
self.half_atomic_link(a1, Ptr::new(b.tag(), b.lab(), loc0));
let b1 = Ptr::new(VR1, 0, b.loc());
self.half_atomic_link(b1, Ptr::new(a.tag(), a.lab(), loc2));
let a2 = Ptr::new(VR2, 0, a.loc());
self.half_atomic_link(a2, Ptr::new(b.tag(), b.lab(), loc1));
let b2 = Ptr::new(VR2, 0, b.loc());
self.half_atomic_link(b2, Ptr::new(a.tag(), a.lab(), loc3));
}
pub fn era2(&mut self, a: Ptr) {
self.rwts.eras += 1;
let a1 = Ptr::new(VR1, 0, a.loc());
self.half_atomic_link(a1, ERAS);
let a2 = Ptr::new(VR2, 0, a.loc());
self.half_atomic_link(a2, ERAS);
}
pub fn era1(&mut self, a: Ptr) {
self.rwts.eras += 1;
let a2 = Ptr::new(VR2, 0, a.loc());
self.half_atomic_link(a2, ERAS);
}
pub fn pass(&mut self, a: Ptr, b: Ptr) {
self.rwts.comm += 1;
let loc0 = self.alloc();
let loc1 = self.alloc();
let loc2 = self.alloc();
self.heap.set(loc0, P1, Ptr::new(VR2, 0, loc1));
self.heap.set(loc0, P2, Ptr::new(VR2, 0, loc2));
self.heap.set(loc1, P1, self.heap.get(a.loc(), P1));
self.heap.set(loc1, P2, Ptr::new(VR1, 0, loc0));
self.heap.set(loc2, P1, self.heap.get(a.loc(), P1));
self.heap.set(loc2, P2, Ptr::new(VR2, 0, loc0));
let a2 = Ptr::new(VR2, 0, a.loc());
self.half_atomic_link(a2, Ptr::new(b.tag(), b.lab(), loc0));
let b1 = Ptr::new(VR1, 0, b.loc());
self.half_atomic_link(b1, Ptr::new(a.tag(), a.lab(), loc1));
let b2 = Ptr::new(VR2, 0, b.loc());
self.half_atomic_link(b2, Ptr::new(a.tag(), a.lab(), loc2));
}
pub fn copy(&mut self, a: Ptr, b: Ptr) {
self.rwts.comm += 1;
let a1 = Ptr::new(VR1, 0, a.loc());
self.half_atomic_link(a1, b);
let a2 = Ptr::new(VR2, 0, a.loc());
self.half_atomic_link(a2, b);
}
pub fn mtch(&mut self, a: Ptr, b: Ptr) {
self.rwts.oper += 1;
let a1 = Ptr::new(VR1, 0, a.loc()); // branch
let a2 = Ptr::new(VR2, 0, a.loc()); // return
if b.val() == 0 {
let loc0 = self.alloc();
//self.heap.set(loc0, P2, ERAS);
self.link(Ptr::new(VR2, 0, loc0), ERAS);
self.half_atomic_link(a1, Ptr::new(LAM, 0, loc0));
self.half_atomic_link(a2, Ptr::new(VR1, 0, loc0));
} else {
let loc0 = self.alloc();
let loc1 = self.alloc();
self.link(Ptr::new(VR1, 0, loc0), ERAS);
self.link(Ptr::new(VR2, 0, loc0), Ptr::new(LAM, 0, loc1));
self.link(Ptr::new(VR1, 0, loc1), Ptr::big(NUM, b.val() - 1));
//self.heap.set(loc0, P1, ERAS);
//self.heap.set(loc0, P2, Ptr::new(LAM, 0, loc1));
//self.heap.set(loc1, P1, Ptr::big(NUM, b.val() - 1));
self.half_atomic_link(a1, Ptr::new(LAM, 0, loc0));
self.half_atomic_link(a2, Ptr::new(VR2, 0, loc1));
}
}
pub fn op2n(&mut self, a: Ptr, b: Ptr) {
self.rwts.oper += 1;
let loc0 = self.alloc();
let a1 = Ptr::new(VR1, 0, a.loc());
let a2 = Ptr::new(VR2, 0, a.loc());
self.heap.set(loc0, P1, b);
self.half_atomic_link(a2, Ptr::new(VR2, 0, loc0));
self.half_atomic_link(a1, Ptr::new(OP1, a.lab(), loc0));
}
pub fn op1n(&mut self, a: Ptr, b: Ptr) {
self.rwts.oper += 1;
let op = a.lab();
let v0 = self.heap.get(a.loc(), P1).val();
let v1 = b.val();
let v2 = self.op(op, v0, v1);
let a2 = Ptr::new(VR2, 0, a.loc());
self.half_atomic_link(a2, Ptr::big(NUM, v2));
}
#[inline(always)]
pub fn op(&self, op: Lab, a: Val, b: Val) -> Val {
match op {
ADD => { u60::add(a, b) }
SUB => { u60::sub(a, b) }
MUL => { u60::mul(a, b) }
DIV => { u60::div(a, b) }
MOD => { u60::rem(a, b) }
EQ => { u60::eq(a, b) }
NE => { u60::ne(a, b) }
LT => { u60::lt(a, b) }
GT => { u60::gt(a, b) }
LTE => { u60::lte(a, b) }
GTE => { u60::gte(a, b) }
AND => { u60::and(a, b) }
OR => { u60::or(a, b) }
XOR => { u60::xor(a, b) }
NOT => { u60::not(a) }
LSH => { u60::lsh(a, b) }
RSH => { u60::rsh(a, b) }
_ => { unreachable!() }
}
}
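The `u60` module dispatched to above isn't shown in this file. Assuming its operations wrap numbers to 60 bits — consistent with the unboxed-number layout, where 4 of the 64 pointer bits are reserved for the tag — two of them could be sketched as follows (hypothetical names, not the module's actual API):

```rust
// Hypothetical sketch of 60-bit numeric ops, assuming the real `u60`
// module masks results to 60 bits (4 bits are reserved for the Ptr tag).
const U60_MASK: u64 = (1u64 << 60) - 1;

fn u60_add(a: u64, b: u64) -> u64 { a.wrapping_add(b) & U60_MASK }
fn u60_eq(a: u64, b: u64) -> u64 { (a == b) as u64 } // comparisons yield 0 or 1

fn main() {
    assert_eq!(u60_add(U60_MASK, 1), 0); // wraps at 2^60
    assert_eq!(u60_eq(7, 7), 1);
    println!("{}", u60_add(2, 3));
}
```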
// Expands a closed net.
#[inline(always)]
pub fn call(&mut self, book: &Book, ptr: Ptr, trg: Ptr) {
//println!("call {} {}", ptr.view(), trg.view());
self.rwts.dref += 1;
let mut ptr = ptr;
    // FIXME: if the lang ever allows refs to return refs, this "if" must become a "while"
if ptr.is_ref() {
// Intercepts with a native function, if available.
if !LAZY && self.call_native(book, ptr, trg) {
return;
}
// Load the closed net.
let fid = ptr.val();
let got = book.get(fid).unwrap();
if !LAZY && trg.is_dup() && !got.labs.contains(&trg.lab()) {
return self.copy(trg, ptr);
} else if got.node.len() > 0 {
let len = got.node.len() - 1;
// Allocate space.
for i in 0 .. len {
*unsafe { self.locs.get_unchecked_mut(1 + i) } = self.alloc();
}
// Load nodes, adjusted.
for i in 0 .. len {
let p1 = self.adjust(unsafe { got.node.get_unchecked(1 + i) }.1);
let p2 = self.adjust(unsafe { got.node.get_unchecked(1 + i) }.2);
let lc = *unsafe { self.locs.get_unchecked(1 + i) };
//println!(":: link loc={} [{} {}]", lc, p1.view(), p2.view());
if p1 != ROOT { self.link(Ptr::new(VR1, 0, lc), p1); }
if p2 != ROOT { self.link(Ptr::new(VR2, 0, lc), p2); }
}
// Load redexes, adjusted.
for r in &got.rdex {
let p1 = self.adjust(r.0);
let p2 = self.adjust(r.1);
self.redux(p1, p2);
//self.rdex.push((p1, p2));
}
// Load root, adjusted.
ptr = self.adjust(got.node[0].2);
}
}
self.link(ptr, trg);
}
// Adjusts dereferenced pointer locations.
#[inline(always)]
fn adjust(&mut self, ptr: Ptr) -> Ptr {
if ptr.has_loc() {
let tag = ptr.tag();
// FIXME
//let lab = if LAZY && ptr.is_dup() && ptr.lab() == 0 {
//self.labs += 2;
//self.labs
//} else {
//ptr.lab()
//};
let lab = ptr.lab();
let loc = *unsafe { self.locs.get_unchecked(ptr.loc() as usize) };
return Ptr::new(tag, lab, loc)
} else {
return ptr;
}
}
pub fn view(&self) -> String {
let mut txt = String::new();
for i in 0 .. self.heap.nodes.len() as Loc {
let p0 = self.heap.get_pri(i).targ;
let p1 = self.heap.get(i, P1);
let p2 = self.heap.get(i, P2);
if p1 != NULL || p2 != NULL {
txt.push_str(&format!("{:04x} | {:22} {:22} {:22}\n", i, p0.view(), p1.view(), p2.view()));
}
}
return txt;
}
// Reduces all redexes.
#[inline(always)]
pub fn reduce(&mut self, book: &Book, limit: usize) -> usize {
let mut count = 0;
while let Some((a, b)) = self.rdex.pop() {
//if !a.is_nil() && !b.is_nil() {
self.interact(book, a, b);
count += 1;
if count >= limit {
break;
}
//}
}
return count;
}
// Expands heads.
#[inline(always)]
pub fn expand(&mut self, book: &Book) {
fn go<const LAZY: bool>(net: &mut NetFields<LAZY>, book: &Book, dir: Ptr, len: usize, key: usize) where [(); LAZY as usize]: {
//println!("[{:04x}] expand dir: {:016x}", net.tid, dir.0);
let ptr = net.get_target(dir);
if ptr.is_ctr() {
if len >= net.tids || key % 2 == 0 {
go(net, book, Ptr::new(VR1, 0, ptr.loc()), len * 2, key / 2);
}
if len >= net.tids || key % 2 == 1 {
go(net, book, Ptr::new(VR2, 0, ptr.loc()), len * 2, key / 2);
}
} else if ptr.is_ref() {
let got = net.swap_target(dir, LOCK);
if got != LOCK {
//println!("[{:08x}] expand {:08x}", net.tid, dir.0);
net.call(book, ptr, dir);
}
}
}
return go(self, book, ROOT, 1, self.tid);
}
// Forks into child threads, returning a NetFields for the (tid/tids)'th thread.
pub fn fork(&self, tid: usize, tids: usize) -> Self {
let mut net = NetFields::new(self.heap.nodes);
net.tid = tid;
net.tids = tids;
net.area = Area {
init: self.heap.nodes.len() * tid / tids,
size: self.heap.nodes.len() / tids,
};
let from = self.rdex.len() * (tid + 0) / tids;
let upto = self.rdex.len() * (tid + 1) / tids;
for i in from .. upto {
net.rdex.push((self.rdex[i].0, self.rdex[i].1));
}
if tid == 0 {
net.next = self.next;
}
return net;
}
// Evaluates a term to normal form in parallel
pub fn parallel_normal(&mut self, book: &Book) {
const SHARE_LIMIT : usize = 1 << 12; // max share redexes per split
const LOCAL_LIMIT : usize = 1 << 18; // max local rewrites per epoch
// Local thread context
struct ThreadContext<'a, const LAZY: bool> where [(); LAZY as usize]: {
tid: usize, // thread id
tids: usize, // thread count
tlog2: usize, // log2 of thread count
tick: usize, // current tick
net: NetFields<'a, LAZY>, // thread's own net object
book: &'a Book, // definition book
delta: &'a AtomicRewrites, // global delta rewrites
share: &'a Vec<(APtr, APtr)>, // global share buffer
rlens: &'a Vec<AtomicUsize>, // global redex lengths
total: &'a AtomicUsize, // total redex length
barry: Arc<Barrier>, // synchronization barrier
}
// Initialize global objects
let cores = std::thread::available_parallelism().unwrap().get() as usize;
let tlog2 = cores.ilog2() as usize;
let tids = 1 << tlog2;
let delta = AtomicRewrites::new(); // delta rewrite counter
let rlens = (0..tids).map(|_| AtomicUsize::new(0)).collect::<Vec<_>>();
let share = (0..SHARE_LIMIT*tids).map(|_| (APtr(AtomicU64::new(0)), APtr(AtomicU64::new(0)))).collect::<Vec<_>>();
let total = AtomicUsize::new(0); // sum of redex bag length
let barry = Arc::new(Barrier::new(tids)); // global barrier
// Perform parallel reductions
std::thread::scope(|s| {
for tid in 0 .. tids {
let mut ctx = ThreadContext {
tid: tid,
tids: tids,
tick: 0,
net: self.fork(tid, tids),
book: &book,
tlog2: tlog2,
delta: &delta,
share: &share,
rlens: &rlens,
total: &total,
barry: Arc::clone(&barry),
};
s.spawn(move || {
main(&mut ctx)
});
}
});
// Clear redexes and sum stats
self.rdex.clear();
delta.add_to(&mut self.rwts);
// Main reduction loop
#[inline(always)]
fn main<const LAZY: bool>(ctx: &mut ThreadContext<LAZY>) where [(); LAZY as usize]: {
loop {
reduce(ctx);
expand(ctx);
if count(ctx) == 0 { break; }
}
ctx.net.rwts.add_to(ctx.delta);
}
// Reduce redexes locally, then share with target
#[inline(always)]
fn reduce<const LAZY: bool>(ctx: &mut ThreadContext<LAZY>) where [(); LAZY as usize]: {
loop {
let reduced = ctx.net.reduce(ctx.book, LOCAL_LIMIT);
if count(ctx) == 0 {
break;
}
let tlog2 = ctx.tlog2;
split(ctx, tlog2);
ctx.tick += 1;
}
}
// Expand head refs
#[inline(always)]
fn expand<const LAZY: bool>(ctx: &mut ThreadContext<LAZY>) where [(); LAZY as usize]: {
ctx.net.expand(ctx.book);
}
// Count total redexes (and populate 'rlens')
#[inline(always)]
fn count<const LAZY: bool>(ctx: &mut ThreadContext<LAZY>) -> usize where [(); LAZY as usize]: {
ctx.barry.wait();
ctx.total.store(0, Ordering::Relaxed);
ctx.barry.wait();
ctx.rlens[ctx.tid].store(ctx.net.rdex.len(), Ordering::Relaxed);
ctx.total.fetch_add(ctx.net.rdex.len(), Ordering::Relaxed);
ctx.barry.wait();
return ctx.total.load(Ordering::Relaxed);
}
// Share redexes with target thread
#[inline(always)]
fn split<const LAZY: bool>(ctx: &mut ThreadContext<LAZY>, plog2: usize) where [(); LAZY as usize]: {
unsafe {
let side = (ctx.tid >> (plog2 - 1 - (ctx.tick % plog2))) & 1;
let shift = (1 << (plog2 - 1)) >> (ctx.tick % plog2);
let a_tid = ctx.tid;
let b_tid = if side == 1 { a_tid - shift } else { a_tid + shift };
let a_len = ctx.net.rdex.len();
let b_len = ctx.rlens[b_tid].load(Ordering::Relaxed);
let send = if a_len > b_len { (a_len - b_len) / 2 } else { 0 };
let recv = if b_len > a_len { (b_len - a_len) / 2 } else { 0 };
let send = std::cmp::min(send, SHARE_LIMIT);
let recv = std::cmp::min(recv, SHARE_LIMIT);
for i in 0 .. send {
let init = a_len - send * 2;
let rdx0 = *ctx.net.rdex.get_unchecked(init + i * 2 + 0);
let rdx1 = *ctx.net.rdex.get_unchecked(init + i * 2 + 1);
let targ = ctx.share.get_unchecked(b_tid * SHARE_LIMIT + i);
*ctx.net.rdex.get_unchecked_mut(init + i) = rdx0;
targ.0.store(rdx1.0);
targ.1.store(rdx1.1);
}
ctx.net.rdex.truncate(a_len - send);
ctx.barry.wait();
for i in 0 .. recv {
let got = ctx.share.get_unchecked(a_tid * SHARE_LIMIT + i);
ctx.net.rdex.push((got.0.load(), got.1.load()));
}
}
}
}
// Lazy mode weak head normalizer
#[inline(always)]
pub fn weak_normal(&mut self, book: &Book, mut prev: Ptr) -> Ptr {
let mut path : Vec<Ptr> = vec![];
loop {
// Load ptrs
let next = self.get_target_full(prev);
// If next is ref, dereferences
if next.is_ref() {
self.call(book, next, prev);
continue;
}
// If next is root, stop.
if next == ROOT {
break;
}
// If next is a main port...
if next.is_pri() {
// If prev is a main port, reduce the active pair.
if prev.is_pri() {
self.interact(book, prev, next);
prev = path.pop().unwrap();
continue;
// Otherwise, we're done.
} else {
break;
}
}
// If next is an aux port, pass through.
let main = self.heap.get_pri(next.loc());
path.push(prev);
prev = main.this;
}
return self.get_target_full(prev);
}
// Reduce a net to normal form.
pub fn normal(&mut self, book: &Book) {
if LAZY {
let mut visit = vec![ROOT];
while let Some(prev) = visit.pop() {
//println!("normal {} | {}", prev.view(), self.rewrites());
let next = self.weak_normal(book, prev);
if next.is_nod() {
visit.push(Ptr::new(VR1, 0, next.loc()));
if !next.is_op1() { visit.push(Ptr::new(VR2, 0, next.loc())); } // TODO: improve
}
}
} else {
self.expand(book);
while self.rdex.len() > 0 {
self.reduce(book, usize::MAX);
self.expand(book);
}
}
}
}
// A net holding a static nodes buffer.
pub struct StaticNet<const LAZY: bool> where [(); LAZY as usize]: {
pub mem: *mut [ANode<LAZY>],
pub net: NetFields<'static, LAZY>,
}
// A simple Net API. Holds its own nodes buffer, and knows its mode (lazy/eager).
pub enum Net {
Lazy(StaticNet<true>),
Eager(StaticNet<false>),
}
impl Drop for Net {
fn drop(&mut self) {
match self {
Net::Lazy(this) => { let _ = unsafe { Box::from_raw(this.mem) }; }
Net::Eager(this) => { let _ = unsafe { Box::from_raw(this.mem) }; }
}
}
}
impl Net {
// Creates a new net with the given size.
pub fn new(size: usize, lazy: bool) -> Self {
if lazy {
let mem = Box::leak(Heap::<true>::init(size)) as *mut _;
let net = NetFields::<true>::new(unsafe { &*mem });
net.boot(crate::ast::name_to_val("main"));
return Net::Lazy(StaticNet { mem, net });
} else {
let mem = Box::leak(Heap::<false>::init(size)) as *mut _;
let net = NetFields::<false>::new(unsafe { &*mem });
net.boot(crate::ast::name_to_val("main"));
return Net::Eager(StaticNet { mem, net });
}
}
// Pretty prints.
pub fn show(&self) -> String {
match self {
Net::Lazy(this) => crate::ast::show_runtime_net(&this.net),
Net::Eager(this) => crate::ast::show_runtime_net(&this.net),
}
}
// Reduces to normal form.
pub fn normal(&mut self, book: &Book) {
match self {
Net::Lazy(this) => this.net.normal(book),
Net::Eager(this) => this.net.normal(book),
}
}
// Reduces to normal form in parallel.
pub fn parallel_normal(&mut self, book: &Book) {
match self {
Net::Lazy(this) => this.net.parallel_normal(book),
Net::Eager(this) => this.net.parallel_normal(book),
}
}
pub fn get_rewrites(&self) -> Rewrites {
match self {
Net::Lazy(this) => this.net.rwts,
Net::Eager(this) => this.net.rwts,
}
}
}
// u60.rs
// Implements u60: 60-bit unsigned integers using u64 and u128
type U60 = u64;
#[inline(always)]
pub fn new(a: u64) -> U60 {
return a & 0xFFF_FFFF_FFFF_FFFF;
}
#[inline(always)]
pub fn val(a: u64) -> U60 {
return a;
}
#[inline(always)]
pub fn add(a: U60, b: U60) -> U60 {
return new(a + b);
}
#[inline(always)]
pub fn sub(a: U60, b: U60) -> U60 {
return if a >= b { a - b } else { 0x1000000000000000 - (b - a) };
}
#[inline(always)]
pub fn mul(a: U60, b: U60) -> U60 {
return new((a as u128 * b as u128) as u64);
}
#[inline(always)]
pub fn div(a: U60, b: U60) -> U60 {
return a / b;
}
#[inline(always)]
pub fn rem(a: U60, b: U60) -> U60 {
return a % b;
}
#[inline(always)]
pub fn and(a: U60, b: U60) -> U60 {
return a & b;
}
#[inline(always)]
pub fn or(a: U60, b: U60) -> U60 {
return a | b;
}
#[inline(always)]
pub fn xor(a: U60, b: U60) -> U60 {
return a ^ b;
}
#[inline(always)]
pub fn lsh(a: U60, b: U60) -> U60 {
return new(a << b);
}
#[inline(always)]
pub fn rsh(a: U60, b: U60) -> U60 {
return a >> b;
}
#[inline(always)]
pub fn lt(a: U60, b: U60) -> U60 {
return if a < b { 1 } else { 0 };
}
#[inline(always)]
pub fn gt(a: U60, b: U60) -> U60 {
return if a > b { 1 } else { 0 };
}
#[inline(always)]
pub fn lte(a: U60, b: U60) -> U60 {
return if a <= b { 1 } else { 0 };
}
#[inline(always)]
pub fn gte(a: U60, b: U60) -> U60 {
return if a >= b { 1 } else { 0 };
}
#[inline(always)]
pub fn eq(a: U60, b: U60) -> U60 {
return if a == b { 1 } else { 0 };
}
#[inline(always)]
pub fn ne(a: U60, b: U60) -> U60 {
return if a != b { 1 } else { 0 };
}
#[inline(always)]
pub fn min(a: U60, b: U60) -> U60 {
return if a < b { a } else { b };
}
#[inline(always)]
pub fn max(a: U60, b: U60) -> U60 {
return if a > b { a } else { b };
}
#[inline(always)]
pub fn not(a: U60) -> U60 {
return !a & 0xFFF_FFFF_FFFF_FFFF;
}
#[inline(always)]
pub fn show(a: U60) -> String {
return format!("{}", a);
}
// HVM1
// README.md
Higher-order Virtual Machine (HVM)
==================================
**Higher-order Virtual Machine (HVM)** is a pure functional runtime that is **lazy**, **non-garbage-collected** and
**massively parallel**. It is also **beta-optimal**, meaning that, for higher-order computations, it can, in
some cases, be exponentially (in the asymptotic sense) faster than alternatives, including Haskell's GHC.
That is possible due to a new model of computation, the **Interaction Net**, which supersedes the **Turing Machine** and
the **Lambda Calculus**. Previous implementations of this model have been inefficient in practice; however, a recent
breakthrough has drastically improved its efficiency, resulting in the HVM. Despite being relatively new, it already
beats mature compilers in some cases, and is being continuously improved.
**Welcome to the massively parallel future of computers!**
Production Ready Soon!
======================
The code here is a prototype, but the first production ready version is coming soon, with tons of optimizations and -
most importantly - correctness work. Follow the progress here: [HVM-Core](https://github.com/HigherOrderCO/HVM-Core).
Examples
========
Essentially, HVM is a minimalist functional language that is compiled to a novel runtime based on [Interaction
Nets](https://www.sciencedirect.com/science/article/pii/S0890540197926432).
This approach is not only memory-efficient (no GC needed), but also has two significant advantages: **automatic
parallelism** and **beta-optimality**. The idea is that you write a simple functional program, and HVM will turn it into
a massively parallel, beta-optimal executable. The examples below highlight these advantages in action.
Bubble Sort
-----------
<table>
<tr>
<td>From: <a href="./examples/sort/bubble/main.hvm">HVM/examples/sort/bubble/main.hvm</a></td>
<td>From: <a href="./examples/sort/bubble/main.hs" >HVM/examples/sort/bubble/main.hs</a></td>
</tr>
<tr>
<td>
```javascript
// sort : List -> List
(Sort Nil) = Nil
(Sort (Cons x xs)) = (Insert x (Sort xs))
// Insert : U60 -> List -> List
(Insert v Nil) = (Cons v Nil)
(Insert v (Cons x xs)) = (SwapGT (> v x) v x xs)
// SwapGT : U60 -> U60 -> U60 -> List -> List
(SwapGT 0 v x xs) = (Cons v (Cons x xs))
(SwapGT 1 v x xs) = (Cons x (Insert v xs))
```
</td>
<td>
```haskell
sort' :: List -> List
sort' Nil = Nil
sort' (Cons x xs) = insert x (sort' xs)
insert :: Word64 -> List -> List
insert v Nil = Cons v Nil
insert v (Cons x xs) = swapGT (if v > x then 1 else 0) v x xs
swapGT :: Word64 -> Word64 -> Word64 -> List -> List
swapGT 0 v x xs = Cons v (Cons x xs)
swapGT 1 v x xs = Cons x (insert v xs)
```
</td>
</tr>
</table>
![](bench/_results_/sort-bubble.png)
In this example, we run a simple, recursive [Bubble Sort](https://en.wikipedia.org/wiki/Bubble_sort) on both HVM and GHC
(Haskell's compiler). Notice the algorithms are identical. The chart shows how much time each runtime took to sort a
list of given size (the lower, the better). The purple line shows GHC (single-thread), the green lines show HVM (1, 2, 4
and 8 threads). As you can see, both perform similarly, with HVM having a small edge. Sadly, here, its performance
doesn't improve with added cores. That's because Bubble Sort is an *inherently sequential* algorithm, so HVM can't
improve it.
Radix Sort
----------
<table>
<tr>
<td>From: <a href="./examples/sort/radix/main.hvm">HVM/examples/sort/radix/main.hvm</a></td>
<td>From: <a href="./examples/sort/radix/main.hs" >HVM/examples/sort/radix/main.hs</a></td>
</tr>
<tr>
<td>
```javascript
// Sort : Arr -> Arr
(Sort t) = (ToArr 0 (ToMap t))
// ToMap : Arr -> Map
(ToMap Null) = Free
(ToMap (Leaf a)) = (Radix a)
(ToMap (Node a b)) =
(Merge (ToMap a) (ToMap b))
// ToArr : Map -> Arr
(ToArr x Free) = Null
(ToArr x Used) = (Leaf x)
(ToArr x (Both a b)) =
let a = (ToArr (+ (* x 2) 0) a)
let b = (ToArr (+ (* x 2) 1) b)
(Node a b)
// Merge : Map -> Map -> Map
(Merge Free Free) = Free
(Merge Free Used) = Used
(Merge Used Free) = Used
(Merge Used Used) = Used
(Merge Free (Both c d)) = (Both c d)
(Merge (Both a b) Free) = (Both a b)
(Merge (Both a b) (Both c d)) =
(Both (Merge a c) (Merge b d))
```
</td>
<td>
```haskell
sort :: Arr -> Arr
sort t = toArr 0 (toMap t)
toMap :: Arr -> Map
toMap Null = Free
toMap (Leaf a) = radix a
toMap (Node a b) =
merge (toMap a) (toMap b)
toArr :: Word64 -> Map -> Arr
toArr x Free = Null
toArr x Used = Leaf x
toArr x (Both a b) =
let a' = toArr (x * 2 + 0) a
b' = toArr (x * 2 + 1) b
in Node a' b'
merge :: Map -> Map -> Map
merge Free Free = Free
merge Free Used = Used
merge Used Free = Used
merge Used Used = Used
merge Free (Both c d) = (Both c d)
merge (Both a b) Free = (Both a b)
merge (Both a b) (Both c d) =
(Both (merge a c) (merge b d))
```
</td>
</tr>
</table>
![](bench/_results_/sort-radix.png)
In this example, we try a [Radix Sort](https://en.wikipedia.org/wiki/Radix_sort), based on merging immutable trees. In
this test, for now, single-thread performance was superior on GHC - and this is often the case, since GHC is much older
and has astronomically more micro-optimizations - yet, since this algorithm is *inherently parallel*, HVM was able to
outperform GHC given enough cores. With **8 threads**, HVM sorted a large list **2.5x faster** than GHC.
Keep in mind one could parallelize the Haskell version with `par` annotations, but that would demand time-consuming,
expensive refactoring - and, in some cases, it isn't even *possible* to use all the available parallelism with `par`
alone. HVM, on the other hand, will automatically distribute parallel workloads through all available cores, achieving
horizontal scalability. As HVM matures, the single-thread gap will decrease significantly.
Lambda Multiplication
---------------------
<table>
<tr>
<td>From: <a href="./examples/lambda/multiplication/main.hvm">HVM/examples/lambda/multiplication/main.hvm </a></td>
<td>From: <a href="./examples/lambda/multiplication/main.hs" >HVM/examples/lambda/multiplication/main.hs </a></td>
</tr>
<tr>
<td>
```javascript
// Increments a Bits by 1
// Inc : Bits -> Bits
(Inc xs) = λex λox λix
let e = ex
let o = ix
let i = λp (ox (Inc p))
(xs e o i)
// Adds two Bits
// Add : Bits -> Bits -> Bits
(Add xs ys) = (App xs λx(Inc x) ys)
// Multiplies two Bits
// Mul : Bits -> Bits -> Bits
(Mul xs ys) =
let e = End
let o = λp (B0 (Mul p ys))
let i = λp (Add ys (B0 (Mul p ys)))
(xs e o i)
```
</td>
<td>
```haskell
-- Increments a Bits by 1
inc :: Bits -> Bits
inc xs = Bits $ \ex -> \ox -> \ix ->
let e = ex
o = ix
i = \p -> ox (inc p)
in get xs e o i
-- Adds two Bits
add :: Bits -> Bits -> Bits
add xs ys = app xs (\x -> inc x) ys
-- Muls two Bits
mul :: Bits -> Bits -> Bits
mul xs ys =
let e = end
o = \p -> b0 (mul p ys)
i = \p -> add ys (b0 (mul p ys))
in get xs e o i
```
</td>
</tr>
</table>
![](bench/_results_/lambda-multiplication.png)
This example implements bitwise multiplication using [λ-encodings](https://en.wikipedia.org/wiki/Church_encoding). Its
purpose is to show yet another important advantage of HVM: beta-optimality. This chart isn't wrong: HVM multiplies
λ-encoded numbers **exponentially faster** than GHC, since it can handle very high-order programs with optimal
asymptotics, while GHC cannot. As esoteric as this technique may look, it can actually be very useful for designing
efficient functional algorithms. One application, for example, is to implement [runtime
deforestation](https://github.com/Kindelia/HVM/issues/167#issuecomment-1314665474) for immutable datatypes. In general,
HVM is capable of applying any fusible function `2^n` times in linear time, which sounds impossible, but is indeed true.
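To make that claim concrete: repeatedly self-composing a function builds `f` applied 2^n times out of only n composition layers. A strict evaluator running the sketch below still pays 2^n calls when the result is forced, whereas HVM, by sharing the duplicated function bodies, evaluates such towers of fusible functions in time linear in n. A hypothetical Python illustration of the construction (not of the speedup):

```python
def twice(f):
    # One composition layer: doubles how many times f is applied.
    return lambda x: f(f(x))

inc = lambda x: x + 1

g = inc
for _ in range(10):   # 10 composition layers...
    g = twice(g)

# ...yield inc applied 2^10 = 1024 times.
assert g(0) == 1024
```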
*Charts made on [plotly.com](https://chart-studio.plotly.com/).*
Getting Started
===============
1. Install Rust nightly:
```
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup toolchain install nightly
```
2. Install HVM:
```
cargo +nightly install hvm
```
3. Run an HVM expression:
```
hvm run "(@x(+ x 1) 41)"
```
That's it! For more advanced usage, check the [complete guide](guide/README.md).
More Information
================
- To learn more about the **underlying tech**, check [guide/HOW.md](guide/HOW.md).
- To ask questions and **join our community**, check our [Discord Server](https://discord.gg/kindelia).
- To **contact the author** directly, send an email to <taelin@higherorderco.com>.
FAQ
===
### Is HVM faster than GHC in a single core today?
No. For now, HVM ranges from 50% faster to 3x slower than GHC in single-thread
performance, and even worse when the Haskell code exploits optimizations that
HVM doesn't have yet (the ST Monad, mutable arrays, inlining, loops).
### Is HVM faster than Rust today?
No.
### Is HVM faster than C today?
No!
### Can HVM be faster than these one day?
Hard question. Perhaps! The underlying model is very efficient. HVM shares the
same initial core as Rust (an affine λ-calculus), has great memory management
(no thunks, no garbage-collection). Some people think interaction nets are an
overhead, but that's not the case - they're the *lack* of overhead. For example,
a lambda on HVM uses only 2 64-bit pointers, which is about as lightweight as it
gets. Furthermore, every reduction rule of HVM is a lightweight, constant-time
operation that can be compiled to very fast machine code. As such, given enough
optimizations, from proper inlining, to real loops, to inner mutability
(FBIP-like?), I believe HVM could one day compare to GHC and even Rust or C. But
we're still far from that.
### Why do the benchmarks compare single-thread vs multi-core?
They do not! Notice all benchmarks include a line for single-threaded HVM
execution, which is usually 3x slower than GHC. We do include multi-core HVM
execution to let us visualize how its performance scales with added cores,
without any change of the code. We do not include multi-core GHC execution
because GHC doesn't support automatic parallelism, so it is not possible to make
use of threads without changing the code. Keep in mind, once again, the
benchmarks are NOT claiming that HVM is faster than GHC today.
### Does HVM support the full λ-Calculus, or System-F?
Not yet! HVM is an implementation of the bookkeeping-free version of the
reduction algorithm proposed in the [TOIOFPL](https://www.researchgate.net/publication/235778993_The_optimal_implementation_of_functional_programming_languages)
book, up to page 40. As such, it doesn't support some λ-terms, such as:
```
(λx.(x x) λf.λx.(f (f x)))
```
HVM is, though, Turing complete, so you could implement a full λ-calculus
interpreter on it - that limitation only addresses built-in closures. Keep in
mind many popular languages don't include the full λ-calculus closures either;
Rust, for example, covers a very restricted subset, due to the borrow system.
That said, HVM covers a wide class of λ-terms, including the Y-combinator,
Church encodings (even algorithms like addition, multiplication and
exponentiation), as well as arbitrary datatypes (both native and Scott-encoded)
and recursion.
### Will HVM support the full λ-Calculus, or System-F?
Yes! We plan to, by implementing the full algorithm described in the
[TOIOFPL](https://www.researchgate.net/publication/235778993_The_optimal_implementation_of_functional_programming_languages),
i.e., after page 40. Sadly, this results in an overhead that affects
the performance of beta-reduction by about 10x. As such, we want to
do so with caution to keep HVM efficient. Currently, the plan is:
1. Split lambdas into full-lambdas and light-lambdas
- Light lambdas are what HVM has today. They're fast, but don't support the full λ-Calculus.
- Full lambdas will be slower, but support the full λ-Calculus, via "internal brackets/croissants".
2. To decrease the overhead, convert full-lambdas to light-lambdas using EAL inference
Elementary Affine Logic is a substructural logic that rejects the structural
rule of contraction, replacing it by a controlled form of duplication. By
extending HVM with EAL inference, we'll be able to convert most full-lambdas
into lightweight lambdas, greatly reducing the associated slowdown.
Finally, keep in mind this only concerns lambdas. Low-order terms (constructors,
trees, recursion) aren't affected.
### Are unsupported terms "Undefined Behavior"?
No! Unsupported λ-terms like `λx.(x x) λf.λx.(f (f x))` don't cause HVM to
display undefined behavior. HVM will always behave deterministically, and give
you a correct result for any input, except it will be in terms of [Interaction
Calculus](https://github.com/HigherOrderCO/Kindex/blob/master/Apps/IC/_.kind2) (IC)
semantics. The IC is an alternative to the Lambda Calculus (LC) which differs
slightly in how non-linear variables are treated. As such, these "unsupported"
terms are just cases where the LC and the IC evaluation disagree. In theory, you
could use the HVM as an Interaction Net runtime, and it would always give you
perfectly correct answers under these semantics - but that's not
usual, so we don't talk about it often.
### What is HVM's main innovation, in simple terms?
In complex terms, HVM's main innovation is that it is an efficient
implementation of the Interaction Net, which is a concurrent model of
computation. But there is a way to translate it to more familiar terms. HVM's
performance, parallelism and GC-freedom all come from the fact it is based on a
linear core - just like Rust! But, on top of it, instead of adding loops and
references (plus a "borrow checker"), HVM adds recursion and a *lazy,
incremental cloning primitive*. For example, the expression below:
```
let xs = (Cons 1 (Cons 2 (Cons 3 Nil))) in [xs, xs]
```
Computes to:
```
let xs = (Cons 2 (Cons 3 Nil)) in [(Cons 1 xs), (Cons 1 xs)]
```
Notice the first `Cons 1` layer was cloned incrementally. This makes cloning
essentially free, for the same reason Haskell's lazy evaluator allows you to
make infinite lists: there is no cost until you actually read the copy! That
lazy-cloning primitive is pervasive, and covers all primitives of HVM's runtime:
constructors, numbers and lambdas. This idea, though, breaks down for lambdas:
how do you incrementally copy a lambda?
```
let f = λx. (2 + x) in [f, f]
```
If you try it, you'll realize why that's not possible:
```
let f = (2 + x) in [λx. f, λx. f]
```
The solution to that question is the main insight that the Interaction Net model
brought to the table, and it is described in more details on the
[HOW.md](https://github.com/Kindelia/HVM/blob/master/guide/HOW.md) document.
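A loose analogy for the incremental cloning described above: with immutable linked lists, "copying" a list can share the tail and only materialize the outer layer, so the copy costs nothing until it is inspected. A hedged Python sketch (illustrative only; HVM's dup nodes are more general, and, as noted, the lambda case is exactly where the interaction-net insight is needed):

```python
from collections import namedtuple

Cons = namedtuple("Cons", "head tail")
Nil = None

xs = Cons(1, Cons(2, Cons(3, Nil)))

# "Cloning" copies only the outer (Cons 1 ...) layer; the tail is shared.
ys = Cons(xs.head, xs.tail)

assert ys.head == 1
assert ys.tail is xs.tail  # the (Cons 2 (Cons 3 Nil)) part exists only once
```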
### Is HVM always *asymptotically* faster than GHC?
No. In most common cases, it will have the same asymptotics. In some cases, it
is exponentially faster. In [this
issue](https://github.com/Kindelia/HVM/issues/60), a user noticed that HVM
displays quadratic asymptotics for certain functions that GHC computes in linear
time. That was a surprise to me, and, as far as I can tell, despite the
"optimal" brand, seems to be a limitation of the underlying theory. That said,
there are multiple ways to alleviate, or solve, this problem. One approach would
be to implement "safe pointers", also described on the book, which would reduce
the cloning overhead and make some quadratic cases linear. But that wouldn't
work for all cases. A complementary approach would be to do linearity analysis,
converting problematic quadratic programs into faster, linear versions. Finally,
in the worst case, we could add references just like Haskell, but that should be
made with a lot of caution, in order not to break the assumptions made by the
parallel execution engine. For a more in-depth explanation, check [this comment
on Hacker News](https://news.ycombinator.com/edit?id=35342297).
### Is HVM's optimality only relevant for weird, academic λ-encoded terms?
No. HVM's optimality has some very practical benefits. For example, all the
"deforesting" techniques that Haskell employs as compile-time rewrite rules
happen naturally, at runtime, on the HVM. For instance, Haskell optimizes:
`map f . map g`
Into:
`map (f . g)`
This is a hardcoded optimization. On HVM, that occurs naturally, at runtime,
in a very general and pervasive way. So, for example, if you have something
like:
```
foldr (.) id (funcs :: [Int -> Int])
```
GHC won't be able to "fuse" the functions on the `funcs` list, since they're not
known at compile time. HVM will do that just fine. See [this
issue](https://github.com/Kindelia/HVM/issues/167) for a practical example.
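The situation above, composing functions that only exist at runtime, can be written down directly. A small Python sketch of the `foldr (.) id funcs` idea (names hypothetical; this shows what gets composed, not HVM's fusion of it):

```python
from functools import reduce

def compose(f, g):
    return lambda x: f(g(x))

def identity(x):
    return x

# A list of Int -> Int functions known only at runtime:
funcs = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

# foldr (.) id funcs: the last function runs first.
pipeline = reduce(compose, funcs, identity)

assert pipeline(10) == ((10 - 3) * 2) + 1
```

GHC cannot fuse `funcs` here because its elements are unknown at compile time; HVM's sharing performs the equivalent of that fusion during evaluation.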
Another practical application for λ-encodings is for monads. On Haskell, the
Free Monad library uses Church encodings as an important optimization. Without
it, the asymptotics of binding make free monads much less practical. HVM has
optimal asymptotics for Church encoded data, making it great for these problems.
### Why is HVM so parallelizable?
Because it is fully linear: every piece of data only occurs in one place at the
same time, which reduces need for synchronization. Furthermore, it is pure, so
there are no global side effects that demand communication. Because of that,
reducing HVM expressions in parallel is actually quite simple: we just keep a
work-stealing queue of redexes, and let a pool of threads compute them. That
said, there are two places where HVM needs synchronization:
- On dup nodes, used by lazy cloning: a lock is needed to prevent threads from
passing through, and, thus, accessing the same data
- On the substitution operation: that's because substitution could send data
from one thread to another, so it must be done atomically
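A toy version of that scheme, a shared pool of independent "redexes" consumed by worker threads, with synchronization only on the substitution-like step, can be sketched in Python (purely illustrative; HVM's real scheduler is the redex-splitting logic in the Rust source above):

```python
import threading
from queue import Queue, Empty

redexes = Queue()
for job in range(100):
    redexes.put(job)

results = []
lock = threading.Lock()

def worker():
    while True:
        try:
            job = redexes.get_nowait()
        except Empty:
            return
        reduced = job * 2           # stand-in for one local rewrite step
        with lock:                  # the "substitution" step needs atomicity
            results.append(reduced)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

assert sorted(results) == [2 * j for j in range(100)]
```

Because each redex is independent (linearity), workers never race on the rewrite itself; only the shared result structure needs a lock, mirroring the two synchronization points listed above.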
In theory, Haskell could be parallelized too, and GHC devs tried it at one point,
but I believe the non-linearity of the STG model would make the problem much
more complex than it is for the HVM, making it hard to not lose too much
performance due to synchronization overhead.
### What is the memory footprint of HVM, compared to other runtimes?
It is a common misconception that an "interactional" runtime would somehow
consume more memory than a "procedural" runtime like Rust's. That's not the
case. Interaction nets, as implemented on HVM, add no overhead, and HVM
instantly collects any piece of data that becomes unreachable, just like Rust,
so there are no accumulating thunks that result in world-stopping garbage
collection, as happens in Haskell currently.
That said, currently, HVM doesn't implement memory-efficient features like
references, loops and local mutability. As such, to do anything on HVM today,
you need to use immutable datatypes and recursion, which are naturally
memory-hungry. Thus, HVM programs today will have an increased memory footprint
relative to C and Rust programs. Thankfully, there is no theoretical limitation
preventing us from adding loops and local mutability, and, once/if we do, one
can expect the same memory footprint as Rust. The only caveat, though, is shared
references: we're not sure if we want to add these, as they might impact
parallelism. As such, it is possible that we choose to let lazy clones be the
only form of non-linearity, which would preserve parallelism, at the cost of
making some algorithms more memory-hungry.
### Is HVM meant to replace GHC?
No! GHC is actually a superb, glorious runtime that is very hard to match. HVM
is meant to be a lightweight, massively parallel runtime for functional, and
even imperative, languages, from Elm to JavaScript. That said, we do want to
support Haskell, but that will require HVM being in a much later stage of
maturity, as well as providing support for full lambdas, which it doesn't do yet.
Once we do, HVM could be a great alternative for GHC, giving the Haskell
community an option to run it in a runtime with automatic parallelism, no
slow garbage-collector and beta-optimality. Which will be the best option
will likely depend on the type of application you're compiling, but having
more choices is generally good and, as such, HVM can be a great tool for
the Haskell community.
### Is HVM production-ready?
No. HVM is still to be considered a prototype. So far, I've had less than
3 months to work on it directly. It is considerably less mature than other
compilers and runtimes like GHC and V8. That said, we're raising funds to
have a proper team of engineers working on the HVM. If all goes well, we
can expect a production-ready release by Q1 2024.
### I ran an HVM program and it consumed 1950 GB and my computer exploded.
HVM is a prototype. Bugs are expected. Please, open an issue!
### I've used HVM in production and now my company is bankrupt.
I quit.
Related Work
============
- [Inpla](https://github.com/inpla/inpla) - a pure interaction net framework, without the "functional/calculus" style of HVM
- [HINet](http://www.cas.mcmaster.ca/~kahl/Haskell/HINet/) - implementation of interaction nets in Haskell
// HOW.md
![magic](https://c.tenor.com/md3foOULKGIAAAAC/magic.gif)
**Note: this is a public draft. It contains a lot of errors and may be too
meme-ish and handholding in some parts. I know it needs improvements. I'll
review and finish in the future. Corrections and feedbacks are welcome!**
How?
====
* [TL;DR](#tldr)
* [Core Language Overview](#hvms-core-language-overview)
* [What makes it fast](#what-makes-it-fast)
* [Rewrite Rules](#hvms-rewrite-rules)
* [Low-level Implementation](#hvms-low-level-implementation)
* [Bonus: Copatterns](#bonus-copatterns)
* [Bonus: Abusing Beta-Optimality](#bonus-abusing-beta-optimality)
* [Bonus: Abusing Parallelism](#bonus-abusing-parallelism)
TL;DR
=====
Since this became a long, in-depth overview, here is the TL;DR for the lazy:
HVM doesn't need a global, stop-the-world garbage collector because every
"object" only exists in one place, **exactly like in Rust**; i.e., HVM is
*linear*. The catch is that, when an object needs to be referenced in multiple
places, instead of a complex borrow system, HVM has an elegant, pervasive **lazy
clone primitive** that works very similarly to Haskell's evaluation model. This
makes cloning essentially free, because the copy of any object isn't made in a
single, expensive pass, but in a layer-by-layer, on-demand fashion. And the
nicest part is that this clone primitive works not only for data, but also for
lambdas, which explains why HVM has better asymptotics than GHC: it is capable
of **sharing computations inside lambdas, which GHC can't**. That was only
possible due to a key insight that comes from Lamping's Abstract Algorithm for
optimal evaluation of λ-calculus terms. Finally, the fact that objects only
exist in one place greatly simplifies parallelism.
This has all been known and possible for years (see other implementations of
optimal reduction), but all implementations of this algorithm, until now,
represented terms as graphs. This demanded a lot of pointer indirection, making
it slow in practice. A new memory format, based on the [Interaction Calculus](https://github.com/VictorTaelin/Symmetric-Interaction-Calculus),
takes advantage of the fact that inputs are known to be λ-terms, allowing for a
50% lower memory usage, and letting us avoid several impossible cases. This
made the runtime 50x (!) faster, which finally allowed it to compete with GHC
and similar. And this is just a prototype I wrote in about a month. I don't even
consider myself proficient in C, so I have high expectations for the long-term
potential of HVM.
HVM's optimality and complexity reasoning comes from the vast literature on the
optimal evaluation of functional programming languages. [This book](https://www.amazon.com/Implementation-Functional-Programming-Languages-Theoretical/dp/0521621127),
by Andrea Asperti and Stefano Guerrini, has a great overview. HVM is merely a
practical, efficient implementation of the bookkeeping-free reduction machine
depicted in the book (pages 14-39). Its higher-order machinery has a 1-to-1
relationship to the theoretical model, so the same complexity bounds and
respective proofs (chapter 10) apply. HVM has additional features (machine
integers, datatypes) that do not affect complexity.
That's about it. Now, onto the long, in-depth explanation.
HVM's Core Language Overview
============================
HVM is, in essence, just a virtual machine that evaluates terms in its core
language. So, before we dig deeper, let's review that language. HVM's Core is a
very simple language that resembles untyped Haskell. It features lambdas
(eliminated by applications), constructors (eliminated by user-defined rewrite
rules) and machine integers (eliminated by operators).
```
term ::=
| λvar. term # a lambda
| (term term) # an application
| (ctr term term ...) # a constructor
| num # a machine int
| (op2 term term) # an operator
| let var = term; term # a local definition
rule ::=
| term = term
file ::= list<rule>
```
A constructor begins with a name and is followed by up to 16 fields.
Constructor names must start with an uppercase letter. For example, below is a
pair with 2 numbers:
```javascript
(Pair 42 7)
```
HVM files consist of a series of `rules`, each with a left-hand term and a
right-hand term. These rules enact a computational behavior where every
occurrence of the left-hand term is replaced by its corresponding right-hand
term. For example, below is a rule that gets the first element of a pair:
```javascript
(Fst (Pair x y)) = x
```
Once that rule is enacted, the `(Fst (Pair 42 7))` term will be reduced to
`42`. Note that `(Fst ...)` is itself just a constructor, even though it is
used like a function. This has important consequences which will be elaborated
later on. From this, the remaining syntax should be pretty easy to guess. As an
example, we define and use the `Map` function on `List` as follows:
```javascript
(Map f Nil) = Nil
(Map f (Cons x xs)) = (Cons (f x) (Map f xs))
(Main) =
let list = (Cons 1 (Cons 2 Nil))
let add1 = λx (+ x 1)
(Map add1 list)
```
By running this file (with `hvm r main`), HVM outputs `(Cons 2 (Cons 3 Nil))`,
having incremented each number in `list` by 1. Notes:
- Application is distinguished from constructors by case (`(f x)` vs `(F x)`).
- The parentheses can be omitted from nullary constructors (`Nil` == `(Nil)`).
- You can abbreviate applications (`(((f a) b) c ...)` == `(f a b c ...)`).
- You may write `@` instead of `λ`.
- Check [this](https://github.com/Kindelia/HVM/issues/64#issuecomment-1030688993) issue about how constructors, applications and currying work.
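For readers who want to cross-check the result, the `Map` example above can be mirrored in plain Python. This is a sketch, not HVM: nested tuples stand in for constructors, and Python's strict evaluation replaces HVM's rewrite rules.

```python
# A Python sketch of the Map example, with ("Cons", head, tail) and
# ("Nil",) tuples standing in for the HVM constructors.
def map_list(f, xs):
    if xs[0] == "Nil":
        return ("Nil",)
    _, x, rest = xs
    return ("Cons", f(x), map_list(f, rest))

lst  = ("Cons", 1, ("Cons", 2, ("Nil",)))
add1 = lambda x: x + 1
print(map_list(add1, lst))  # ('Cons', 2, ('Cons', 3, ('Nil',)))
```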
What makes it fast
==================
What makes HVM special, though, is **how** it evaluates its programs. HVM has
one simple trick that hackers don't want you to know. This trick is responsible
for HVM's major features: beta-optimality, parallelism, and no garbage
collection. But before we get too technical, we must first talk about
**clones**, and how their work obsession ruins everything, for everyone. This
section should provide more context and a better intuition about why things are
the way they are.
### Clones ruin everything
By clones, I mean when a value is copied, shared, replicated, or whatever else
you call it. For example, consider the JavaScript program below:
```javascript
function foo(x, y) {
return x + x;
}
```
To compute `foo(2, 3)`, the number `2` must be **cloned** before adding it
to itself. This seemingly innocent operation has made a lot of people very
confused and has been widely regarded as the hardest problem of the 21st
century.
The main issue with clones is how they interact with the **order of
evaluation**. For example, consider the expression `foo(2 * 2, 3 * 3)`. In a
**strict** language, it is evaluated as such:
```javascript
foo(2 * 2, 3 * 3) // the input
foo(4 , 9 ) // arguments are reduced
4 + 4 // foo is applied
8 // the output
```
But this computation has a silly issue: the `3 * 3` value is not necessary to
produce the output, so the `3 * 3` multiplication was wasted work. This led to
the idea of **lazy** evaluation. An initial implementation of this idea would
operate as follows:
```javascript
foo(2 * 2, 3 * 3) // the input
(2 * 2) + (2 * 2) // foo is applied
4 + 4 // arguments are reduced
8 // the output
```
Notice how the `3 * 3` expression was never computed, thereby saving work?
However, the `2 * 2` expression has now been computed twice, yet again
resulting in **wasted work**! That's because `x` was used two times in the body
of `foo`, which caused the `2 * 2` expression to be **cloned** and, thus,
computed twice. In other words, clones ruined the virtue of laziness.
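Both kinds of waste can be made concrete with a small Python sketch that counts how many multiplications actually run. The `mul` counter is purely an illustration device, not part of any real evaluator:

```python
# Counting multiplications for foo(2 * 2, 3 * 3), where foo(x, y) = x + x.
count = 0
def mul(a, b):
    global count
    count += 1
    return a * b

# Strict: both arguments are reduced up front, so 3 * 3 is wasted work.
count = 0
x = mul(2, 2)
y = mul(3, 3)
strict_result, strict_muls = x + x, count

# Naive lazy: arguments become thunks; y is never forced, but x is
# forced twice, so 2 * 2 is recomputed. The clone ruined laziness.
count = 0
x = lambda: mul(2, 2)
y = lambda: mul(3, 3)
lazy_result, lazy_muls = x() + x(), count

print(strict_result, strict_muls)  # 8 2
print(lazy_result, lazy_muls)      # 8 2
```

Either way, two multiplications run where one would have sufficed.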
### Everyone's solution: ban the clones
Imagine a language without clones. Such a language would be computationally
perfect. Lazy evaluators wouldn't waste work, since expressions can't be
cloned. Garbage collection would be cheap, as every object would only have one
reference. Parallelism would be trivial, because there would be no simultaneous
accesses to the same object. Sadly, such a language wouldn't be practical.
Imagine never being able to copy anything! Therefore, real languages must find
a way to let their users replicate values, without impacting the features they
desire, all while avoiding these expensive clones. In previous languages, the
solution has almost always been the use of references of some sort.
For example, Haskell is lazy. To avoid "cloning computations", it implements
thunks, which are nothing but *memoized references* to shared expressions,
allowing the `(2 * 2)` example above to be cached. This solution, though, breaks
down when there are lambdas. Similarly, Rust is GC-free, so every object has
only one "owner". To avoid too much cloning, it implements a complex *borrowed
references* system, allowing the same object to be accessed from multiple places,
when the compiler can prove it is safe. Finally, parallel languages require
mutexes and atomics to synchronize accesses to *shared references*. In other
words, references saved the world by letting us avoid these clones, and that's
great... right?
> clone wasn't the impostor
References. **They** ruin everything. They're the reason Rust is so hard to use.
They're the reason parallel programming is so complex. They're the reason
Haskell isn't optimal, since thunks can't share computations that have free
variables (i.e., any expression inside lambdas). They're why a 1-month-old
prototype beats GHC in the same class of programs it should thrive in. It isn't
GHC's fault. References are the real culprits.
### HVM's solution: make clones cheap
Clones aren't bad. They just need to relax.
Once you understand the context above, grasping how HVM can be optimally lazy,
non-GC'd and inherently parallel is easy. At its base, it has the same "linear"
core that both Haskell and Rust share in common (which, as we've just
established, already exhibit these properties). The difference is that, instead
of adding some kind of clever reference system to circumvent the cost of
cloning... **HVM introduces a pervasive, lazy clone primitive**.
**HVM's runtime has no references. Instead, it features a `.clone()` primitive
that has zero cost, until the cloned value needs to be read. Once it does,
instead of being copied whole, it's done layer by layer, on-demand.**
For the purpose of lazy evaluation, HVM's lazy clones work like Haskell's
thunks, except they do not break down on lambdas. In the context of garbage
collection, since the data is actually copied, there are no shared references,
so memory can be freed when values go out of scope. For the same reason,
parallelism becomes trivial, and the runtime's `reduce()` procedure is almost
entirely thread safe, requiring minimal synchronization.
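The "layer by layer, on-demand" copy described above can be sketched in Python with thunks. This is a toy model of the idea, not HVM's real machinery: `dup` returns two handles over a tree of thunks, and forcing a handle copies exactly one constructor layer, duplicating its fields lazily in turn.

```python
# A minimal sketch of a zero-cost-until-read clone. Unread layers are
# never copied at all.
copied = 0   # layers copied so far

def dup(thunk):
    memo = {}
    def force():
        global copied
        if not memo:
            tag, *fields = thunk()                      # read one layer
            memo["tag"], memo["fields"] = tag, [dup(f) for f in fields]
            copied += 1
        return memo
    def handle(side):
        return lambda: (force()["tag"],
                        *[pair[side] for pair in force()["fields"]])
    return handle(0), handle(1)

# The list (Cons 1 (Cons 2 Nil)) as nested thunks:
lst = lambda: ("Cons", lambda: (1,),
               lambda: ("Cons", lambda: (2,), lambda: ("Nil",)))
a, b = dup(lst)
head_a = a()[1]()[0]   # reads only the Cons layer and the number 1
print(head_a, copied)  # prints: 1 2
```

Only the two layers that were actually read got copied; the tail of the list, and the entire `b` handle, cost nothing.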
In other words, think of HVM as Rust, except replacing the borrow system by a
very cheap `.clone()` operation that can be used and abused with no mercy. This
is the secret sauce! Easy, right? Well, no. There is still a big problem to be
solved: **how do we incrementally clone a lambda?** There is a beautiful answer
to this problem that made this all possible. Let's get technical!
HVM's Rewrite Rules
===================
HVM is, in essence, a graph-rewrite system, which means that all it does is
repeatedly rewrite terms in memory until there is no more work left to do.
These rewrites come in two forms: user-defined and primitive rules.
User-defined Rules
------------------
User-defined rules are generated from equations in a file. For example, the
following equation:
```javascript
(Foo (Tic a) (Tac b)) = (+ a b)
```
Generates the following rewrite rule:
```javascript
(Foo (Tic a) (Tac b))
--------------------- Foo-Rule-0
(+ a b)
```
It should be read as "the expression above reduces to the expression below".
So, for example, `Foo-Rule-0` dictates that `(Foo (Tic 42) (Tac 7))` reduces to
`(+ 42 7)`. As for the primitive rules, they deal with lambdas, native numbers
and the duplication primitive. Let's start with numeric operations.
Operations
----------
```
(<op> x y)
------------------- Op2-U32
x + y if <op> is +
x - y if <op> is -
x * y if <op> is *
x / y if <op> is /
x % y if <op> is %
x & y if <op> is &
x | y if <op> is |
x ^ y if <op> is ^
x >> y if <op> is >>
x << y if <op> is <<
x < y if <op> is <
x <= y if <op> is <=
x == y if <op> is ==
x >= y if <op> is >=
x > y if <op> is >
x != y if <op> is !=
```
This should be read as: *"the addition of `x` and `y` reduces to `x + y`"*.
This just says that we can perform numeric operations in HVM. For example,
`(+ 2 3)` is reduced to `5`, `(* 5 10)` is reduced to `50`, and so on. HVM
numbers are 32-bit unsigned integers, but more numeric types will be added in
the future.
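A sketch of `Op2-U32` in Python, masking each result to 32 bits to mimic unsigned wrap-around. The division-by-zero result here is an arbitrary choice for the sketch, not necessarily HVM's behavior:

```python
# Op2-U32 sketch: Python's arbitrary-precision ints, masked to 32 bits.
U32 = 0xFFFFFFFF

def op2(op, x, y):
    ops = {
        "+": x + y, "-": x - y, "*": x * y,
        "/": x // y if y else 0, "%": x % y if y else 0,  # /0 is arbitrary
        "&": x & y, "|": x | y, "^": x ^ y,
        ">>": x >> y, "<<": x << y,
        "<": int(x < y), "<=": int(x <= y), "==": int(x == y),
        ">=": int(x >= y), ">": int(x > y), "!=": int(x != y),
    }
    return ops[op] & U32

print(op2("+", 2, 3))   # 5
print(op2("*", 5, 10))  # 50
print(op2("-", 0, 1))   # 4294967295 (wraps around)
```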
Number Duplication
------------------
```javascript
dup x y = N
-----------
x <- N
y <- N
```
This should be read as: *"the duplication of the number `N` as `x` and `y`
reduces to the substitution of `x` by a copy of `N`, and of `y` by another copy
of `N`"*. Before explaining what is going on here, let me also present the
constructor duplication rule below.
Constructor Duplication
-----------------------
```javascript
dup x y = (Foo a b ...)
----------------------- Dup-Ctr
dup a0 a1 = a
dup b0 b1 = b
...
x <- (Foo a0 b0 ...)
y <- (Foo a1 b1 ...)
```
This should be read as: *"the duplication of the constructor `(Foo a b ...)` as
`x` and `y` reduces to the duplication of `a` as `a0` and `a1`, the duplication
of `b` as `b0` and `b1`, and the substitution of `x` by `(Foo a0 b0 ...)` and
the substitution of `y` by `(Foo a1 b1 ...)`"*.
There is a lot of new information here, so, before moving on, let's dissect it
all one by one.
**1.** What the hell is `dup`? That is an **internal duplication node**. You
can't write it directly on the user-facing language; instead, it is inserted by
the pre-processor whenever you use a variable more than once. For example, at
compile time, the equation below:
```javascript
(Foo a) = (+ a a)
```
Is actually replaced by:
```javascript
(Foo a) =
dup x y = a
(+ x y)
```
Because of that transformation, **every runtime variable only occurs once**.
The effect of `dup` is that of cloning an expression, and moving it to two
locations. For example, the program below:
```javascript
dup x y = 42
(Pair x y)
```
Is reduced to:
```javascript
(Pair 42 42)
```
**2.** By "substitution", we mean "replacing a variable by a value". For
example, the substitution of `x` by `7` in `[1, x, 8]` would be `[1, 7, 8]`.
Since every variable only occurs once in the runtime, substitution is a fast,
constant time operation that performs either 1 or 2 array writes.
**3.** `dup`s aren't stored inside the expressions. Instead, they "float" on
the global scope. That's why they're always written on top.
**4.** Remember that `dup` (like all other rules) is only triggered when it is
needed, due to lazy evaluation. That's what makes it ultra-cheap. In a way, it
is as if HVM has added a `.clone()` to every variable used more than once. And
that's fine.
**5.** Even though the user-facing language makes no distinction between
constructors and functions, the runtime does, for optimality purposes.
Specifically, a duplication is only applied for constructors that are not used
as functions. This equal treatment means we can write Copatterns easily in HVM
([more on this in the bonus section](#bonus-copatterns)).
#### Example
Now that you know all that, let's watch `dup` in action, by visualizing how the
`[1 + 1, 2 + 2, 3 + 3]` list is cloned. Lines separate reduction steps.
```javascript
dup x y = (Cons (+ 1 1) (Cons (+ 2 2) (Cons (+ 3 3) Nil)))
(Pair x y)
------------------------------------------- Dup-Ctr
dup x y = (Cons (+ 2 2) (Cons (+ 3 3) Nil))
dup a b = (+ 1 1)
(Pair
(Cons a x)
(Cons b y)
)
------------------------------------------- Op2-U32
dup x y = (Cons (+ 2 2) (Cons (+ 3 3) Nil))
dup a b = 2
(Pair
(Cons a x)
(Cons b y)
)
------------------------------------------- Dup-U32
dup x y = (Cons (+ 2 2) (Cons (+ 3 3) Nil))
(Pair
(Cons 2 x)
(Cons 2 y)
)
------------------------------------------- Dup-Ctr
dup x y = (Cons (+ 3 3) Nil)
dup a b = (+ 2 2)
(Pair
(Cons 2 (Cons a x))
(Cons 2 (Cons b y))
)
------------------------------------------- Op2-U32
dup x y = (Cons (+ 3 3) Nil)
dup a b = 4
(Pair
(Cons 2 (Cons a x))
(Cons 2 (Cons b y))
)
------------------------------------------- Dup-U32
dup x y = (Cons (+ 3 3) Nil)
(Pair
(Cons 2 (Cons 4 x))
(Cons 2 (Cons 4 y))
)
------------------------------------------- Dup-Ctr
dup x y = Nil
dup a b = (+ 3 3)
(Pair
(Cons 2 (Cons 4 (Cons a x)))
(Cons 2 (Cons 4 (Cons b y)))
)
------------------------------------------- Op2-U32
dup x y = Nil
dup a b = 6
(Pair
(Cons 2 (Cons 4 (Cons a x)))
(Cons 2 (Cons 4 (Cons b y)))
)
------------------------------------------- Dup-U32
dup x y = Nil
(Pair
(Cons 2 (Cons 4 (Cons 6 x)))
(Cons 2 (Cons 4 (Cons 6 y)))
)
------------------------------------------- Dup-Ctr
(Pair
(Cons 2 (Cons 4 (Cons 6 Nil)))
(Cons 2 (Cons 4 (Cons 6 Nil)))
)
```
In the end, we made two copies of the list. Note how the `(+ 1 1)` expression
was NOT "cloned". It only happened once, even though we evaluated the program
lazily. And, of course, since the cloning itself is lazy, if we only needed
parts of the list, we wouldn't need to make two full copies. For example,
consider the following program instead:
```javascript
dup x y = (Cons (+ 1 1) (Cons (+ 2 2) (Cons (+ 3 3) Nil)))
(Pair (Head x) (Head (Tail y)))
------------------------------------------- Dup-Ctr
dup x y = (Cons (+ 2 2) (Cons (+ 3 3) Nil))
dup a b = (+ 1 1)
(Pair (Head (Cons a x)) (Head (Tail (Cons b y))))
------------------------------------------- Head
dup x y = (Cons (+ 2 2) (Cons (+ 3 3) Nil))
dup a b = (+ 1 1)
(Pair a (Head (Tail (Cons b y))))
------------------------------------------- Op2-U32
dup x y = (Cons (+ 2 2) (Cons (+ 3 3) Nil))
dup a b = 2
(Pair a (Head (Tail (Cons b y))))
------------------------------------------- Dup-U32
dup x y = (Cons (+ 2 2) (Cons (+ 3 3) Nil))
(Pair 2 (Head (Tail (Cons 2 y))))
------------------------------------------- Tail
dup x y = (Cons (+ 2 2) (Cons (+ 3 3) Nil))
(Pair 2 (Head y))
------------------------------------------- Dup-Ctr
dup x y = (Cons (+ 3 3) Nil)
dup a b = (+ 2 2)
(Pair 2 (Head (Cons b y)))
------------------------------------------- Head
dup x y = (Cons (+ 3 3) Nil)
dup a b = (+ 2 2)
(Pair 2 b)
------------------------------------------- Op2-U32
dup x y = (Cons (+ 3 3) Nil)
dup a b = 4
(Pair 2 b)
------------------------------------------- Dup-U32
dup x y = (Cons (+ 3 3) Nil)
(Pair 2 4)
------------------------------------------- Collect
(Pair 2 4)
```
Notice how only the minimal amount of copying was performed. The first part of
the list (`(Cons (+ 1 1) ...)`) was copied twice, the second part
(`(Cons (+ 2 2) ...)`) was copied once, and the rest (`(Cons (+ 3 3) Nil)`) was
simply collected as garbage. Collection is orchestrated by variables
that go out of scope. For example, in the last lines, `x` and `y` both aren't
referenced anywhere. That triggers the collection of the remaining list.
That was a lot of info. Hopefully, by now, you have an intuition about how the
lazy duplication primitive works. Moving on.
### Lambda Application
```javascript
(λx(body) arg)
-------------- App-Lam
x <- arg
body
```
This is the famous beta-reduction rule. This must be read as: *"the application
of the lambda `λx(body)` to the argument `arg` reduces to `body`, and the
substitution of `x` by `arg`"*. For example, `(λx(Single x) 42)` is reduced to
`(Single 42)`. Remember that variables only occur once. Because of that,
beta-reduction is a very fast operation. A modern CPU can perform more than 200
million beta-reductions per second, in a single core. Here's an example of it
in action:
```javascript
(λxλy(Pair x y) 2 3)
-------------------- App-Lam
(λy(Pair 2 y) 3)
-------------------- App-Lam
(Pair 2 3)
```
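The "variables occur once, so substitution is one write" point can be sketched with a tiny Python term evaluator. The representation here (dicts with a mutable `val` cell per variable) is hypothetical; HVM's actual memory layout differs.

```python
# App-Lam sketch: each variable occurs exactly once, so substitution is a
# single destructive write into that variable's cell.
def Var():        return {"tag": "Var", "val": None}
def Lam(v, body): return {"tag": "Lam", "var": v, "body": body}
def App(f, a):    return {"tag": "App", "fun": f, "arg": a}
def Ctr(n, *xs):  return {"tag": "Ctr", "name": n, "args": list(xs)}
def Num(n):       return {"tag": "Num", "val": n}

def reduce(t):
    if t["tag"] == "App":
        f = reduce(t["fun"])
        if f["tag"] == "Lam":
            f["var"]["val"] = t["arg"]   # x <- arg: one pointer write
            return reduce(f["body"])     # continue with the body
    if t["tag"] == "Var" and t["val"] is not None:
        return reduce(t["val"])          # follow the substituted value
    if t["tag"] == "Ctr":
        return Ctr(t["name"], *[reduce(a) for a in t["args"]])
    return t

# (λx λy (Pair x y) 2 3)
x, y = Var(), Var()
term = App(App(Lam(x, Lam(y, Ctr("Pair", x, y))), Num(2)), Num(3))
out = reduce(term)
print(out["name"], [a["val"] for a in out["args"]])  # Pair [2, 3]
```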
Simple, right? This rule is beautiful, but the next one is special, as it is
responsible for making all of HVM possible.
### Lambda Duplication
Incrementally cloning datatypes is a neat idea. But there is nothing special to
it. In fact, that **is** exactly how Haskell's thunks behave! But, now, take a
moment and ask yourself: **how the hell do we incrementally clone a lambda**?
```javascript
dup a b = λx(body)
------------------ Dup-Lam
a <- λx0(b0)
b <- λx1(b1)
x <- {x0 x1}
dup b0 b1 = body
dup a b = {r s}
--------------- Dup-Sup
a <- r
b <- s
```
Here is how. This may be a bit overwhelming. A good place to start is by writing
this in plain English. It reads as: "the duplication of a lambda `λx(body)` as
`a` and `b` reduces to the duplication of its `body` as `b0` and `b1`, and the
substitution of `a` by `λx0(b0)`, `b` by `λx1(b1)` and `x` by the superposition
`{x0 x1}`".
What this is saying is that, in order to duplicate a lambda, we must duplicate
its body; then we must create two lambdas. Then, weird things happen with its
variable. And then there is a brand new construct, the superposition, that I
haven't explained yet. But, this is fine. Let's try to do it with an example:
```javascript
dup a b = λx λy (Pair x y)
(Pair a b)
```
This program just makes two copies of the `λx λy (Pair x y)` lambda. But, to
get there, we are not allowed to copy the entire lambda whole. Instead, we
must go through a series of incremental lazy steps. Let's try it, and copy the
outermost lambda (`λx`):
```javascript
dup a b = λy (Pair x y)
(Pair λx(a) λx(b))
```
Can you spot the issue? As soon as the lambda is copied, it is moved to another
location of the program, which means it gets detached from its own body.
Because of that, the variable `x` gets unbound on the first line, and the body
of each copied `λx` has no reference to `x`. That makes no sense at all! How do
we solve this?
First, we must let go of material goods and accept a cruel reality of HVM:
**lambdas don't have scopes**. That is, a variable bound by a lambda can occur
outside of its body. So, for example, `(Pair x (λx(8) 7))` would reduce to
`(Pair 7 8)`. Please, take a moment to make sense out of this... even if it
looks like it doesn't.
Once you accept that in your heart, you'll find that the program above will
make a little more sense, because we can say that the `λx` binder on the second
line is "connected" to the `x` variable on the first line, even if it's
outside. But there is still a problem: there are **two** lambdas bound to
the same variable. If the left lambda gets applied to an argument, it should
NOT affect the second one. But with the way it is written, that's what would
happen. To work around this issue, we need a new construct: the
**superposition**. Written as `{r s}`, a superposition stands for an
expression that is part of two partially copied lambdas. So, for example,
`(Pair {1 2} 3)` can represent either `(Pair 1 3)` or `(Pair 2 3)`, depending
on the context.
This gives us the tools we need to incrementally copy these lambdas. Here is
how that would work:
```javascript
dup a b = λx(λy(Pair x y))
(Pair a b)
------------------------------------------------ Dup-Lam
dup a b = λy(Pair {x0 x1} y)
(Pair λx0(a) λx1(b))
------------------------------------------------ Dup-Lam
dup a b = (Pair {x0 x1} {y0 y1})
(Pair λx0(λy0(a)) λx1(λy1(b)))
------------------------------------------------ Dup-Ctr
dup a b = {x0 x1}
dup c d = {y0 y1}
(Pair λx0(λy0(Pair a c)) λx1(λy1(Pair b d)))
------------------------------------------------ Dup-Sup
dup c d = {y0 y1}
(Pair λx0(λy0(Pair x0 c)) λx1(λy1(Pair x1 d)))
------------------------------------------------ Dup-Sup
(Pair λx0(λy0(Pair x0 y0)) λx1(λy1(Pair x1 y1)))
```
Wow, did it actually work? Yes, it did. Notice that, despite the fact that
"weird things" happened during the intermediate steps (specifically, variables
got out of their own lambda bodies, and parts of the program got temporarily
superposed), in the end, it all worked out, and the result was proper copies of
the original lambdas. This allows us to share computations inside lambdas,
something that GHC isn't capable of. For example, consider the following
reduction:
```javascript
dup f g = ((λx λy (Pair (+ x x) y)) 2)
(Pair (f 10) (g 20))
-------------------------------------- App-Lam
dup f g = λy (Pair (+ 2 2) y)
(Pair (f 10) (g 20))
-------------------------------------- Dup-Lam
dup f g = (Pair (+ 2 2) {y0 y1})
(Pair (λy0(f) 10) (λy1(g) 20))
-------------------------------------- App-Lam
dup f g = (Pair (+ 2 2) {10 y1})
(Pair f (λy1(g) 20))
-------------------------------------- App-Lam
dup f g = (Pair (+ 2 2) {10 20})
(Pair f g)
-------------------------------------- Dup-Ctr
dup a b = (+ 2 2)
dup c d = {10 20}
(Pair (Pair a c) (Pair b d))
-------------------------------------- Op2-U32
dup a b = 4
dup c d = {10 20}
(Pair (Pair a c) (Pair b d))
-------------------------------------- Dup-U32
dup c d = {10 20}
(Pair (Pair 4 c) (Pair 4 d))
-------------------------------------- Dup-Sup
(Pair (Pair 4 10) (Pair 4 20))
```
Notice that the `(+ 2 2)` addition only happened once, even though it was
nested inside two copied lambda binders! In Haskell, this situation would lead
to the un-sharing of the lambdas, and `(+ 2 2)` would happen twice. Notice also
how, in some steps, lambdas were applied to arguments that appeared outside of
their bodies. This is all fine, and, in the end, the result is correct.
Uff. That was hard, wasn't it? The good news is the worst part is done. From
now on, nothing too surprising will happen.
Superposed Application
----------------------
Since duplications are pervasive, what may happen is that a superposition will
end up in the function position of an application. For example, the situation
below can happen at runtime:
```javascript
({λx(x) λy(y)} 10)
```
This represents two superposed lambdas, applied to an argument `10`. If we
leave this expression as is, certain programs would get stuck, and we wouldn't
be able to evaluate them. We need a way out. Because of that, there is a
superposed application rule that deals with that situation:
```javascript
({a b} c)
--------------- App-Sup
dup x0 x1 = c
{(a x0) (b x1)}
```
In English, this rule says that: "the application of a superposition `{a b}` to
`c` is the superposition of the application of `a` and `b` to copies of `c`".
Makes sense, doesn't it? That rule also applies to user-defined functions. The
logic is the same, only adapted depending on the arity. I won't show it here.
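The shape of the App-Sup rule can be sketched in Python, modeling superpositions as tagged tuples. One simplification: the argument is shared rather than properly duplicated, which is enough to show the rule's structure but not HVM's actual copying.

```python
# App-Sup sketch: applying a superposition {a b} to c yields the
# superposition of applying each side to (a copy of) c.
def Sup(a, b):
    return ("Sup", a, b)

def apply(f, x):
    if isinstance(f, tuple) and f[0] == "Sup":
        _, a, b = f
        return Sup(apply(a, x), apply(b, x))   # {(a x0) (b x1)}
    return f(x)

id_ = lambda x: x
inc = lambda x: x + 1
print(apply(Sup(id_, inc), 10))  # ('Sup', 10, 11)
```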
Superposed Operation
--------------------
```javascript
(+ {a0 a1} b)
--------------------- Op2-Sup-A
dup b0 b1 = b
{(+ a0 b0) (+ a1 b1)}
(+ a {b0 b1})
--------------------- Op2-Sup-B
dup a0 a1 = a
{(+ a0 b0) (+ a1 b1)}
```
This, too, follows the same logic as superposed application, except that
operators are strict on both arguments.
Superposed Duplication
----------------------
There is one last rule that is worth discussing.
```javascript
dup x y = {a b}
--------------- Dup-Sup (different)
x <- {xA xB}
y <- {yA yB}
dup xA yA = a
dup xB yB = b
```
This rule handles the duplication of a superposition. In English, it says that:
*"the duplication of a superposition `{a b}` as `x` and `y` reduces to the
duplication of `a` as `xA` and `yA`, `b` as `xB` and `yB`, and the substitution
of `x` by the superposition `{xA xB}`, and the substitution of `y` by `{yA
yB}`"*. At that point, the formal notation is probably doing a better job than
English at conveying this information.
If you've paid close attention, though, you may have noticed that Dup-Sup has
already been defined, in the *Lambda Duplication* section. So, what is going on
here? Well, it turns out that Dup-Sup is a special case that has two different
reduction rules. If this Dup-Sup represents the end of a duplication process, it
must go with the former rule. However, if you're duplicating a term, which
itself duplicates something, then this rule must be used. Due to the extremely
local nature of HVM reductions though, determining when each rule should be
used in general would require an expensive book-keeping machinery. To avoid that
extra cost, HVM instead placed a limitation that allowed for a much faster
decision procedure. That limitation is:
**If a lambda that clones its argument is itself cloned, then its clones aren't
allowed to clone each other.**
For example, this term is **not** allowed:
```javascript
let g = λf(λx(f (f x)))
(g g)
```
That's because `g` is a function that clones its argument (since `f` is used
twice). It is then cloned, so each `g` on the second line is a clone. Then the
first clone attempts to clone the second clone. That is considered undefined
behavior, and a typed language that compiles to HVM must check that this kind of
situation won't happen.
How common is this? Well, unless you like multiplying Church-encoded natural
numbers in a loop, you've probably never seen a program that reaches this
limitation in your entire career. Even if you're a fan of λ encodings, you're
fine. For example, the program above can be fixed by just avoiding one clone:
```javascript
let g = λf(λx(f (f x)))
let h = λf(λx(f (f x)))
(g h)
```
And all the other "hardcore" functional programming tools are compatible.
Y-Combinators, Church encodings, nested maps of maps, all work just fine.
If you think you'll reach this limitation in practice, you're probably
misunderstanding how esoteric a program must be for it to happen. It
is a common (and annoying) misconception that this limit has much relevance in
practice. C programmers survived without closures, for decades. Rust programmers
live well with far more restrictive limitations on what shapes of programs
they're allowed to write. HVM has all sorts of extremely high-level closures you
can use. You just can't have a clone clone its own clone. Without this
limitation, which is almost irrelevant in practice, it wouldn't be possible for
HVM to achieve its current performance, so we believe it is justified.
HVM's Low-level Implementation
==============================
To learn more about HVM's low level implementation and memory layout, check the comments on
[runtime/base/memory.rs](../src/runtime/base/memory.rs).
Bonus: Copatterns
=================
Since functions and constructors are treated the same, this means there is
nothing preventing us from writing copatterns, by just swapping the roles of
eliminators and introducers. As an example, consider the program below:
```javascript
// List Map function
(Map f Nil) = Nil
(Map f (Cons x xs)) = (Cons (f x) (Map f xs))
// List projectors
(Head (Cons x xs)) = x
(Tail (Cons x xs)) = xs
// The infinite list: 0, 1, 2, 3 ...
Nats = (Cons 0 (Map λx(+ x 1) Nats))
// Just a test (returns 2)
Main = (Head (Tail (Tail Nats)))
```
This is the usual recursive `Map` applied to an infinite `List`. Here, `Map` is
used in the function position, and the List constructors (`Nil` and `Cons`) are
matched. The same program can be written in a corecursive fashion, by inverting
everything: the `List` destructors (`Head`/`Tail`) are used in the function
position, and the function `Map` is matched:
```javascript
// CoList Map function
(Head (Map f xs)) = (f (Head xs))
(Tail (Map f xs)) = (Map f (Tail xs))
// The infinite colist: 0, 1, 2, 3 ...
(Head Nats) = 0
(Tail Nats) = (Map λx(+ x 1) Nats)
// Just a test (returns 2)
(Main n) = (Head (Tail (Tail Nats)))
```
Bonus: Abusing Beta-Optimality
==============================
By abusing beta-optimality, we're able to turn some exponential-time algorithms
into linear-time ones. That is why we're able to implement `Add` on `BitStrings`
as repeated applications of increment:
```javascript
// Addition is just "increment N times"
(Add xs ys) = (App xs λx(Inc x) ys)
```
This small, elegant and mathematical one-liner is as efficient as the
manually-crafted add-with-carry operation, which is an 8-cases, low-level and
error-prone definition. In order for this to be possible, we must apply some
techniques to make sure the self-composition (`λx (f (f x))`) of the function
remains as small as possible. First, we must use λ-encoded algorithms. If we
don't, then the normal form will not be small. For example:
```javascript
(Not True) = False
(Not False) = True
```
This is easy to read, but then `λx (Not (Not x))` will not have a small normal
form. If we use λ encodings, we can write `Not` as:
```javascript
True = λt λf t
False = λt λf f
Not = λb (b False True)
```
This correctly negates a λ-encoded boolean. But `λx (Not (Not x))` still has a
large normal form: `λx (x λtλf(f) λtλf(t) λtλf(f) λtλf(t))`. Now, if we inline
the definition of `Not`, we get:
```javascript
True = λt λf t
False = λt λf f
Not = λb (b λtλf(f) λtλf(t))
```
Notice how both branches start with the same lambdas? We can lift them up and
**share** them:
```javascript
True = λt λf t
False = λt λf f
Not = λb λt λf (b f t)
```
This will make the normal form of `λx (Not (Not x))` small: i.e., it becomes
`λx λt λf (x t f)`. This makes `Not^(2^N)` linear time in `N`!
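The lifted `Not` can be checked directly with Python lambdas. Note that Python evaluates naively and does not reproduce HVM's sharing, so this only verifies that the shared-lambda form still negates correctly; it does not demonstrate the linear-time composition.

```python
# λ-encoded booleans, with NOT in the "lifted" shared-lambda form.
TRUE  = lambda t: lambda f: t
FALSE = lambda t: lambda f: f
NOT   = lambda b: lambda t: lambda f: b(f)(t)   # λb λt λf (b f t)

def to_bool(b):
    # Apply the encoded boolean to the Python literals True and False.
    return b(True)(False)

print(to_bool(NOT(TRUE)))        # False
print(to_bool(NOT(NOT(TRUE))))   # True
```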
The same technique also applies for `Inc`. We start with the usual definition:
```javascript
(Inc E) = E
(Inc (O x)) = (I x)
(Inc (I x)) = (O (Inc x))
```
Then we make it λ-encoded:
```javascript
(Inc x) =
let case_e = λe λo λi e
let case_o = λx λe λo λi (i x)
let case_i = λx λe λo λi (o (Inc x))
(x case_e case_o case_i)
```
Then we lift the shared lambdas up:
```javascript
(Inc x) = λe λo λi
let case_e = e
let case_o = λx (i x)
let case_i = λx (o (Inc x))
(x case_e case_o case_i)
```
This makes `λx (Inc (Inc x))` have a constant-space normal form, which in turn
makes the composition of `Inc` fast, allowing `Add` to be efficiently
implemented as repeated increment.
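The lifted `Inc` can likewise be transcribed to Python lambdas as a correctness check. This is a sketch: Python is strict, so it verifies the encoding's behavior, not its normal-form size. Bits are least-significant first, `E` ends the string, and incrementing past the string's width wraps around (as `(Inc E) = E` implies).

```python
# Scott-encoded bitstrings and the lifted Inc.
E = lambda e: lambda o: lambda i: e
O = lambda x: lambda e: lambda o: lambda i: o(x)   # (O x) = 2x
I = lambda x: lambda e: lambda o: lambda i: i(x)   # (I x) = 2x + 1

def inc(x):
    # λe λo λi: case_e = e, case_o = λp (i p), case_i = λp (o (Inc p))
    return lambda e: lambda o: lambda i: \
        x(e)(lambda p: i(p))(lambda p: o(inc(p)))

def to_int(x):
    return x(0)(lambda p: 2 * to_int(p))(lambda p: 2 * to_int(p) + 1)

n = O(O(O(E)))            # the 3-bit string 000
vals = []
for _ in range(7):
    n = inc(n)
    vals.append(to_int(n))
print(vals)               # [1, 2, 3, 4, 5, 6, 7]
```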
Similar uses of this idea can greatly speed up functional algorithms. For
example, a clever way to implement a `Data.List` would be to let all algorithms
operate on Church-encoded Lists under the hood, converting as needed. This has
the same "deforestation" effect as Haskell's rewrite pragmas, without any
hard-coded compile-time rewriting, and in a more flexible way. For example, using
`map` in a loop is "deforested" in HVM. GHC can't do that, because the number
of applications is not known statically.
Note that too much cloning will often make your normal forms large, so avoid
this by keeping your programs linear. For example, instead of:
```javascript
Add = λa λb
let case_zero = b
let case_succ = λa_pred (Add a_pred b)
(a case_succ case_zero)
```
Write:
```javascript
Add = λa
let case_zero = λb b
let case_succ = λa_pred λb (Add a_pred b)
(a case_succ case_zero b)
```
Notice how the latter avoids cloning `b` entirely.
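As a sanity check, here is the linear `Add` transcribed to Python with Scott-encoded naturals. One assumption on my part: a `SUCC` wrapper is added in the recursive case so the sketch computes an actual sum. The point being illustrated survives: `b` is threaded through, never cloned.

```python
# Scott-encoded naturals and the linear Add: b is passed along once.
ZERO = lambda s: lambda z: z
SUCC = lambda n: lambda s: lambda z: s(n)

def add(a):
    case_succ = lambda a_pred: lambda b: SUCC(add(a_pred)(b))
    case_zero = lambda b: b
    return lambda b: a(case_succ)(case_zero)(b)

def to_int(n):
    return n(lambda p: 1 + to_int(p))(0)

def from_int(k):
    return ZERO if k == 0 else SUCC(from_int(k - 1))

print(to_int(add(from_int(2))(from_int(3))))  # 5
```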
// language/readback.rs
//! Reads back HVM terms from the runtime's memory, and builds dynamic functions.
// FIXME: `as_code` and `as_term` should just call `readback`, but before doing so, we must test
// the new readback properly to ensure it is correct
use crate::language as language;
use crate::runtime as runtime;
use crate::runtime::{Ptr, Heap, Program};
use std::collections::{hash_map, HashMap, HashSet};
/// Reads back a term from Runtime's memory
pub fn as_code(heap: &Heap, prog: &Program, host: u64) -> String {
return format!("{}", as_term(heap, prog, host));
}
/// Reads back a term from Runtime's memory
pub fn as_term(heap: &Heap, prog: &Program, host: u64) -> Box<language::syntax::Term> {
struct CtxName<'a> {
heap: &'a Heap,
prog: &'a Program,
names: &'a mut HashMap<Ptr, String>,
seen: &'a mut HashSet<Ptr>,
}
fn gen_var_names(heap: &Heap, prog: &Program, ctx: &mut CtxName, term: Ptr, depth: u32) {
if ctx.seen.contains(&term) {
return;
};
ctx.seen.insert(term);
match runtime::get_tag(term) {
runtime::LAM => {
let param = runtime::load_arg(&ctx.heap, term, 0);
let body = runtime::load_arg(&ctx.heap, term, 1);
if runtime::get_tag(param) != runtime::ERA {
let var = runtime::Var(runtime::get_loc(term, 0));
ctx.names.insert(var, format!("x{}", ctx.names.len()));
};
gen_var_names(heap, prog, ctx, body, depth + 1);
}
runtime::APP => {
let lam = runtime::load_arg(&ctx.heap, term, 0);
let arg = runtime::load_arg(&ctx.heap, term, 1);
gen_var_names(heap, prog, ctx, lam, depth + 1);
gen_var_names(heap, prog, ctx, arg, depth + 1);
}
runtime::SUP => {
let arg0 = runtime::load_arg(&ctx.heap, term, 0);
let arg1 = runtime::load_arg(&ctx.heap, term, 1);
gen_var_names(heap, prog, ctx, arg0, depth + 1);
gen_var_names(heap, prog, ctx, arg1, depth + 1);
}
runtime::DP0 => {
let arg = runtime::load_arg(&ctx.heap, term, 2);
gen_var_names(heap, prog, ctx, arg, depth + 1);
}
runtime::DP1 => {
let arg = runtime::load_arg(&ctx.heap, term, 2);
gen_var_names(heap, prog, ctx, arg, depth + 1);
}
runtime::OP2 => {
let arg0 = runtime::load_arg(&ctx.heap, term, 0);
let arg1 = runtime::load_arg(&ctx.heap, term, 1);
gen_var_names(heap, prog, ctx, arg0, depth + 1);
gen_var_names(heap, prog, ctx, arg1, depth + 1);
}
runtime::U60 => {}
runtime::F60 => {}
runtime::CTR | runtime::FUN => {
let arity = runtime::arity_of(&ctx.prog.aris, term);
for i in 0..arity {
let arg = runtime::load_arg(&ctx.heap, term, i);
gen_var_names(heap, prog, ctx, arg, depth + 1);
}
}
_ => {}
}
}
#[allow(dead_code)]
struct CtxGo<'a> {
heap: &'a Heap,
prog: &'a Program,
names: &'a HashMap<Ptr, String>,
seen: &'a HashSet<Ptr>,
}
struct Stacks {
stacks: HashMap<Ptr, Vec<bool>>,
}
impl Stacks {
fn new() -> Stacks {
Stacks { stacks: HashMap::new() }
}
fn get(&self, col: Ptr) -> Option<&Vec<bool>> {
self.stacks.get(&col)
}
fn pop(&mut self, col: Ptr) -> bool {
let stack = self.stacks.entry(col).or_insert_with(Vec::new);
stack.pop().unwrap_or(false)
}
fn push(&mut self, col: Ptr, val: bool) {
let stack = self.stacks.entry(col).or_insert_with(Vec::new);
stack.push(val);
}
}
fn readback(heap: &Heap, prog: &Program, ctx: &mut CtxGo, stacks: &mut Stacks, term: Ptr, depth: u32) -> Box<language::syntax::Term> {
match runtime::get_tag(term) {
runtime::LAM => {
let body = runtime::load_arg(&ctx.heap, term, 1);
let body = readback(heap, prog, ctx, stacks, body, depth + 1);
let bind = runtime::load_arg(&ctx.heap, term, 0);
let name = if runtime::get_tag(bind) == runtime::ERA {
"*".to_string()
} else {
let var = runtime::Var(runtime::get_loc(term, 0));
ctx.names.get(&var).map(|s| s.clone()).unwrap_or("?".to_string())
};
return Box::new(language::syntax::Term::Lam { name, body });
}
runtime::APP => {
let func = runtime::load_arg(&ctx.heap, term, 0);
let argm = runtime::load_arg(&ctx.heap, term, 1);
let func = readback(heap, prog, ctx, stacks, func, depth + 1);
let argm = readback(heap, prog, ctx, stacks, argm, depth + 1);
return Box::new(language::syntax::Term::App { func, argm });
}
runtime::SUP => {
let col = runtime::get_ext(term);
let empty = &Vec::new();
let stack = stacks.get(col).unwrap_or(empty);
if let Some(val) = stack.last() {
let arg_idx = *val as u64;
let val = runtime::load_arg(&ctx.heap, term, arg_idx);
let old = stacks.pop(col);
let got = readback(heap, prog, ctx, stacks, val, depth + 1);
stacks.push(col, old);
got
} else {
let val0 = runtime::load_arg(&ctx.heap, term, 0);
let val1 = runtime::load_arg(&ctx.heap, term, 1);
let val0 = readback(heap, prog, ctx, stacks, val0, depth + 1);
let val1 = readback(heap, prog, ctx, stacks, val1, depth + 1);
return Box::new(language::syntax::Term::Sup { val0, val1 });
}
}
runtime::DP0 => {
let col = runtime::get_ext(term);
let val = runtime::load_arg(&ctx.heap, term, 2);
stacks.push(col, false);
let result = readback(heap, prog, ctx, stacks, val, depth + 1);
stacks.pop(col);
result
}
runtime::DP1 => {
let col = runtime::get_ext(term);
let val = runtime::load_arg(&ctx.heap, term, 2);
stacks.push(col, true);
let result = readback(heap, prog, ctx, stacks, val, depth + 1);
stacks.pop(col);
result
}
runtime::OP2 => {
let oper = match runtime::get_ext(term) {
runtime::ADD => language::syntax::Oper::Add,
runtime::SUB => language::syntax::Oper::Sub,
runtime::MUL => language::syntax::Oper::Mul,
runtime::DIV => language::syntax::Oper::Div,
runtime::MOD => language::syntax::Oper::Mod,
runtime::AND => language::syntax::Oper::And,
runtime::OR => language::syntax::Oper::Or,
runtime::XOR => language::syntax::Oper::Xor,
runtime::SHL => language::syntax::Oper::Shl,
runtime::SHR => language::syntax::Oper::Shr,
runtime::LTN => language::syntax::Oper::Ltn,
runtime::LTE => language::syntax::Oper::Lte,
runtime::EQL => language::syntax::Oper::Eql,
runtime::GTE => language::syntax::Oper::Gte,
runtime::GTN => language::syntax::Oper::Gtn,
runtime::NEQ => language::syntax::Oper::Neq,
_ => panic!("unknown operation"),
};
let val0 = runtime::load_arg(&ctx.heap, term, 0);
let val1 = runtime::load_arg(&ctx.heap, term, 1);
let val0 = readback(heap, prog, ctx, stacks, val0, depth + 1);
let val1 = readback(heap, prog, ctx, stacks, val1, depth + 1);
return Box::new(language::syntax::Term::Op2 { oper, val0, val1 });
}
runtime::U60 => {
let numb = runtime::get_num(term);
return Box::new(language::syntax::Term::U6O { numb });
}
runtime::F60 => {
let numb = runtime::get_num(term);
return Box::new(language::syntax::Term::F6O { numb });
}
runtime::CTR | runtime::FUN => {
let func = runtime::get_ext(term);
let arit = runtime::arity_of(&ctx.prog.aris, term);
let mut args = Vec::new();
for i in 0 .. arit {
let arg = runtime::load_arg(&ctx.heap, term, i);
args.push(readback(heap, prog, ctx, stacks, arg, depth + 1));
}
let name = ctx.prog.nams.get(&func).map(String::to_string).unwrap_or_else(|| format!("${}", func));
return Box::new(language::syntax::Term::Ctr { name, args });
}
runtime::VAR => {
let name = ctx.names.get(&term).map(String::to_string).unwrap_or_else(|| format!("^{}", runtime::get_loc(term, 0)));
return Box::new(language::syntax::Term::Var { name }); // ............... /\ why this sounds so threatening?
}
runtime::ARG => {
return Box::new(language::syntax::Term::Var { name: "<arg>".to_string() });
}
runtime::ERA => {
return Box::new(language::syntax::Term::Var { name: "<era>".to_string() });
}
_ => {
return Box::new(language::syntax::Term::Var { name: format!("<unknown_tag_{}>", runtime::get_tag(term)) });
}
}
}
let term = runtime::load_ptr(heap, host);
let mut names = HashMap::<Ptr, String>::new();
let mut seen = HashSet::<Ptr>::new();
let ctx = &mut CtxName { heap, prog, names: &mut names, seen: &mut seen };
gen_var_names(heap, prog, ctx, term, 0);
let ctx = &mut CtxGo { heap, prog, names: &names, seen: &seen };
let mut stacks = Stacks::new();
readback(heap, prog, ctx, &mut stacks, term, 0)
}
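The dup/sup readback above follows a simple discipline: `DP0`/`DP1` push a branch choice onto a per-color stack, and a `SUP` of the same color consumes that choice to read back only the chosen side. A minimal Python sketch of this logic (the tuple encoding of terms is hypothetical, not HVM's actual memory layout):

```python
# Terms modeled as tuples: ("SUP", col, a, b), ("DP0", col, body),
# ("DP1", col, body), or a bare atom (e.g. a number).
def readback(term, stacks):
    if not isinstance(term, tuple):
        return term                      # atoms read back as themselves
    tag = term[0]
    if tag in ("DP0", "DP1"):
        _, col, body = term
        stacks.setdefault(col, []).append(tag == "DP1")  # push branch choice
        result = readback(body, stacks)
        stacks[col].pop()
        return result
    if tag == "SUP":
        _, col, a, b = term
        stack = stacks.get(col, [])
        if stack:                        # a dup of this color is pending:
            choice = stack.pop()         # take its choice, pick one side,
            result = readback(b if choice else a, stacks)
            stack.append(choice)         # restore it for sibling subterms
            return result
        return ("Sup", readback(a, stacks), readback(b, stacks))
    return term                          # other tags omitted in this sketch
```

Restoring the popped choice after recursing is what keeps sibling subterms of the same color consistent, mirroring the pop/push pair in the `SUP` arm above.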
// Reads a term linearly, i.e., preserving dups
pub fn as_linear_term(heap: &Heap, prog: &Program, host: u64) -> Box<language::syntax::Term> {
enum StackItem {
Term(Ptr),
Resolver(Ptr),
}
fn ctr_name(prog: &Program, id: u64) -> String {
if let Some(name) = prog.nams.get(&id) {
return name.clone();
} else {
return format!("${}", id);
}
}
fn dups(heap: &Heap, prog: &Program, term: Ptr, names: &mut HashMap<u64, String>) -> language::syntax::Term {
let mut lets: HashMap<u64, u64> = HashMap::new();
let mut kinds: HashMap<u64, u64> = HashMap::new();
let mut stack = vec![term];
while !stack.is_empty() {
let term = stack.pop().unwrap();
match runtime::get_tag(term) {
runtime::LAM => {
names.insert(runtime::get_loc(term, 0), format!("{}", names.len()));
stack.push(runtime::load_arg(heap, term, 1));
}
runtime::APP => {
stack.push(runtime::load_arg(heap, term, 1));
stack.push(runtime::load_arg(heap, term, 0));
}
runtime::SUP => {
stack.push(runtime::load_arg(heap, term, 1));
stack.push(runtime::load_arg(heap, term, 0));
}
runtime::DP0 => {
if let hash_map::Entry::Vacant(e) = lets.entry(runtime::get_loc(term, 0)) {
names.insert(runtime::get_loc(term, 0), format!("{}", names.len()));
kinds.insert(runtime::get_loc(term, 0), runtime::get_ext(term));
e.insert(runtime::get_loc(term, 0));
stack.push(runtime::load_arg(heap, term, 2));
}
}
runtime::DP1 => {
if let hash_map::Entry::Vacant(e) = lets.entry(runtime::get_loc(term, 0)) {
names.insert(runtime::get_loc(term, 0), format!("{}", names.len()));
kinds.insert(runtime::get_loc(term, 0), runtime::get_ext(term));
e.insert(runtime::get_loc(term, 0));
stack.push(runtime::load_arg(heap, term, 2));
}
}
runtime::OP2 => {
stack.push(runtime::load_arg(heap, term, 1));
stack.push(runtime::load_arg(heap, term, 0));
}
runtime::CTR | runtime::FUN => {
let arity = runtime::arity_of(&prog.aris, term);
for i in (0..arity).rev() {
stack.push(runtime::load_arg(heap, term, i));
}
}
_ => {}
}
}
let cont = expr(heap, prog, term, &names);
if lets.is_empty() {
cont
} else {
let mut output = language::syntax::Term::Var { name: "?".to_string() };
for (i, (_key, pos)) in lets.iter().enumerate() {
// todo: reverse
let what = String::from("?h");
let name = names.get(pos).unwrap_or(&what);
let nam0 = if runtime::load_ptr(heap, pos + 0) == runtime::Era() { String::from("*") } else { format!("a{}", name) };
let nam1 = if runtime::load_ptr(heap, pos + 1) == runtime::Era() { String::from("*") } else { format!("b{}", name) };
let expr = expr(heap, prog, runtime::load_ptr(heap, pos + 2), &names);
if i == 0 {
output = language::syntax::Term::Dup { nam0, nam1, expr: Box::new(expr), body: Box::new(cont.clone()) };
} else {
output = language::syntax::Term::Dup { nam0, nam1, expr: Box::new(expr), body: Box::new(output) };
}
}
output
}
}
fn expr(heap: &Heap, prog: &Program, term: Ptr, names: &HashMap<u64, String>) -> language::syntax::Term {
let mut stack = vec![StackItem::Term(term)];
let mut output : Vec<language::syntax::Term> = Vec::new();
while !stack.is_empty() {
let item = stack.pop().unwrap();
match item {
StackItem::Resolver(term) => {
match runtime::get_tag(term) {
runtime::CTR => {
let func = runtime::get_ext(term);
let arit = runtime::arity_of(&prog.aris, term);
let mut args = Vec::new();
for _ in 0..arit {
args.push(Box::new(output.pop().unwrap()));
}
let name = ctr_name(prog, func);
output.push(language::syntax::Term::Ctr { name, args });
},
runtime::FUN => {
let func = runtime::get_ext(term);
let arit = runtime::arity_of(&prog.aris, term);
let mut args = Vec::new();
for _ in 0..arit {
args.push(Box::new(output.pop().unwrap()));
}
let name = ctr_name(prog, func);
output.push(language::syntax::Term::Ctr { name, args });
}
runtime::LAM => {
let name = format!("x{}", names.get(&runtime::get_loc(term, 0)).unwrap_or(&String::from("?")));
let body = Box::new(output.pop().unwrap());
output.push(language::syntax::Term::Lam { name, body });
}
runtime::APP => {
let argm = Box::new(output.pop().unwrap());
let func = Box::new(output.pop().unwrap());
output.push(language::syntax::Term::App { func , argm });
}
runtime::OP2 => {
let oper = runtime::get_ext(term);
let oper = match oper {
runtime::ADD => language::syntax::Oper::Add,
runtime::SUB => language::syntax::Oper::Sub,
runtime::MUL => language::syntax::Oper::Mul,
runtime::DIV => language::syntax::Oper::Div,
runtime::MOD => language::syntax::Oper::Mod,
runtime::AND => language::syntax::Oper::And,
runtime::OR => language::syntax::Oper::Or,
runtime::XOR => language::syntax::Oper::Xor,
runtime::SHL => language::syntax::Oper::Shl,
runtime::SHR => language::syntax::Oper::Shr,
runtime::LTN => language::syntax::Oper::Ltn,
runtime::LTE => language::syntax::Oper::Lte,
runtime::EQL => language::syntax::Oper::Eql,
runtime::GTE => language::syntax::Oper::Gte,
runtime::GTN => language::syntax::Oper::Gtn,
runtime::NEQ => language::syntax::Oper::Neq,
_ => panic!("Invalid operator."),
};
let val1 = Box::new(output.pop().unwrap());
let val0 = Box::new(output.pop().unwrap());
output.push(language::syntax::Term::Op2 { oper, val0, val1 })
}
_ => panic!("Term not valid in readback"),
}
},
StackItem::Term(term) => {
match runtime::get_tag(term) {
runtime::DP0 => {
let name = format!("a{}", names.get(&runtime::get_loc(term, 0)).unwrap_or(&String::from("?a")));
output.push(language::syntax::Term::Var { name });
}
runtime::DP1 => {
let name = format!("b{}", names.get(&runtime::get_loc(term, 0)).unwrap_or(&String::from("?b")));
output.push(language::syntax::Term::Var { name });
}
runtime::VAR => {
let name = format!("x{}", names.get(&runtime::get_loc(term, 0)).unwrap_or(&String::from("?x")));
output.push(language::syntax::Term::Var { name });
}
runtime::LAM => {
stack.push(StackItem::Resolver(term));
stack.push(StackItem::Term(runtime::load_arg(heap, term, 1)));
}
runtime::APP => {
stack.push(StackItem::Resolver(term));
stack.push(StackItem::Term(runtime::load_arg(heap, term, 1)));
stack.push(StackItem::Term(runtime::load_arg(heap, term, 0)));
}
runtime::SUP => {}
runtime::OP2 => {
stack.push(StackItem::Resolver(term));
stack.push(StackItem::Term(runtime::load_arg(heap, term, 1)));
stack.push(StackItem::Term(runtime::load_arg(heap, term, 0)));
}
runtime::U60 => {
let numb = runtime::get_num(term);
output.push(language::syntax::Term::U6O { numb });
}
runtime::F60 => {
let numb = runtime::get_num(term);
output.push(language::syntax::Term::F6O { numb });
}
runtime::CTR => {
let arit = runtime::arity_of(&prog.aris, term);
stack.push(StackItem::Resolver(term));
for i in 0..arit {
stack.push(StackItem::Term(runtime::load_arg(heap, term, i)));
}
}
runtime::FUN => {
let arit = runtime::arity_of(&prog.aris, term);
stack.push(StackItem::Resolver(term));
for i in 0..arit {
stack.push(StackItem::Term(runtime::load_arg(heap, term, i)));
}
}
runtime::ERA => {}
_ => {}
}
}
}
}
output.pop().unwrap()
}
let mut names: HashMap<u64, String> = HashMap::new();
Box::new(dups(heap, prog, runtime::load_ptr(heap, host), &mut names))
}
/// Reads back a term from Runtime's memory as code, preserving dups
pub fn as_linear_code(heap: &Heap, prog: &Program, host: u64) -> String {
return format!("{}", as_linear_term(heap, prog, host));
}
// This reads a term in the `(String.cons ... String.nil)` shape directly into a string.
pub fn as_string(heap: &Heap, prog: &Program, tids: &[usize], host: u64) -> Option<String> {
let mut host = host;
let mut text = String::new();
runtime::reduce(heap, prog, tids, host, true, false);
loop {
let term = runtime::load_ptr(heap, host);
if runtime::get_tag(term) == runtime::CTR {
let fid = runtime::get_ext(term);
if fid == runtime::STRING_NIL {
break;
}
if fid == runtime::STRING_CONS {
let chr = runtime::load_ptr(heap, runtime::get_loc(term, 0));
if runtime::get_tag(chr) == runtime::U60 {
text.push(std::char::from_u32(runtime::get_num(chr) as u32).unwrap_or('?'));
host = runtime::get_loc(term, 1);
continue;
} else {
return None;
}
}
return None;
} else if runtime::get_tag(term) == runtime::SUP {
host = runtime::get_loc(term, 0);
continue;
} else {
return None;
}
}
return Some(text);
}
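The loop above can be sketched in Python (the tuple encoding is hypothetical, and the `SUP` case is omitted): follow `String.cons` cells, collecting character codes, until `String.nil`, and return `None` for anything that is not string-shaped:

```python
def as_string(term):
    text = []
    while True:
        if term == ("String.nil",):
            return "".join(text)         # reached the end of the list
        if isinstance(term, tuple) and term[0] == "String.cons":
            _, head, tail = term
            if not isinstance(head, int):
                return None              # head is not a number literal
            text.append(chr(head))
            term = tail
            continue
        return None                      # not a string-shaped term

# as_string(("String.cons", 72, ("String.cons", 105, ("String.nil",))))
# reads back as "Hi"
```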
// language/rulebook.rs
use crate::language as language;
use crate::runtime as runtime;
use std::collections::{BTreeMap, HashMap, HashSet};
// RuleBook
// ========
// A RuleBook is a file ready for compilation. It includes:
// - rule_group: sanitized rules grouped by function
// - id_to_name: maps ctr ids to names
// - name_to_id: maps ctr names to ids
// - ctr_is_fun: true if a ctr is used as a function
// A sanitized rule has all its variables renamed to have unique names.
// Variables that are never used are renamed to "*".
#[derive(Clone, Debug)]
pub struct RuleBook {
pub rule_group: HashMap<String, RuleGroup>,
pub name_count: u64,
pub name_to_id: HashMap<String, u64>,
pub id_to_smap: HashMap<u64, Vec<bool>>,
pub id_to_name: HashMap<u64, String>,
pub ctr_is_fun: HashMap<String, bool>,
}
pub type RuleGroup = (usize, Vec<language::syntax::Rule>);
// Creates an empty rulebook
pub fn new_rulebook() -> RuleBook {
let mut book = RuleBook {
rule_group: HashMap::new(),
name_count: 0,
name_to_id: HashMap::new(),
id_to_smap: HashMap::new(),
id_to_name: HashMap::new(),
ctr_is_fun: HashMap::new(),
};
for precomp in runtime::PRECOMP {
book.name_count = book.name_count + 1;
book.name_to_id.insert(precomp.name.to_string(), precomp.id);
book.id_to_name.insert(precomp.id, precomp.name.to_string());
book.id_to_smap.insert(precomp.id, precomp.smap.to_vec());
book.ctr_is_fun.insert(precomp.name.to_string(), precomp.funs.is_some());
}
return book;
}
// Adds a group to a rulebook
pub fn add_group(book: &mut RuleBook, name: &str, group: &RuleGroup) {
fn register(book: &mut RuleBook, term: &language::syntax::Term, lhs_top: bool) {
match term {
language::syntax::Term::Dup { expr, body, .. } => {
register(book, expr, false);
register(book, body, false);
}
language::syntax::Term::Sup { val0, val1 } => {
register(book, val0, false);
register(book, val1, false);
}
language::syntax::Term::Let { expr, body, .. } => {
register(book, expr, false);
register(book, body, false);
}
language::syntax::Term::Lam { body, .. } => {
register(book, body, false);
}
language::syntax::Term::App { func, argm, .. } => {
register(book, func, false);
register(book, argm, false);
}
language::syntax::Term::Op2 { val0, val1, .. } => {
register(book, val0, false);
register(book, val1, false);
}
term@language::syntax::Term::Ctr { name, args } => {
// Registers id
let id = match book.name_to_id.get(name) {
None => {
let id = book.name_count;
book.name_to_id.insert(name.clone(), id);
book.id_to_name.insert(id, name.clone());
book.name_count += 1;
id
}
Some(id) => {
*id
}
};
// Registers smap
match book.id_to_smap.get(&id) {
None => {
book.id_to_smap.insert(id, vec![false; args.len()]);
}
Some(smap) => {
if smap.len() != args.len() {
panic!("inconsistent arity on: '{}'", term);
}
}
}
// Force strictness when pattern-matching
if lhs_top {
for i in 0 .. args.len() {
let is_strict = match *args[i] {
language::syntax::Term::Ctr { .. } => true,
language::syntax::Term::U6O { .. } => true,
language::syntax::Term::F6O { .. } => true,
_ => false,
};
if is_strict {
book.id_to_smap.get_mut(&id).unwrap()[i] = true;
}
}
}
// Recurses
for arg in args {
register(book, arg, false);
}
}
_ => (),
}
}
// Inserts the group on the book
book.rule_group.insert(name.to_string(), group.clone());
// Builds its metadata (name_to_id, id_to_name, ctr_is_fun)
for rule in &group.1 {
register(book, &rule.lhs, true);
register(book, &rule.rhs, false);
if let language::syntax::Term::Ctr { ref name, .. } = *rule.lhs {
book.ctr_is_fun.insert(name.clone(), true);
}
}
}
// Converts a file to a rulebook
pub fn gen_rulebook(file: &language::syntax::File) -> RuleBook {
// Creates an empty rulebook
let mut book = new_rulebook();
// Flattens, sanitizes and groups this file's rules
let groups = group_rules(&sanitize_rules(&flatten(&file.rules)));
// Adds each group
for (name, group) in groups.iter() {
if book.name_to_id.get(name).unwrap_or(&u64::MAX) >= &runtime::PRECOMP_COUNT {
add_group(&mut book, name, group);
}
}
// Includes SMaps
for (rule_name, rule_smap) in &file.smaps {
let id = book.name_to_id.get(rule_name).unwrap();
if book.id_to_smap.get(id).is_none() {
book.id_to_smap.insert(*id, vec![false; rule_smap.len()]);
}
let smap = book.id_to_smap.get_mut(id).unwrap();
for i in 0 .. smap.len() {
if rule_smap[i] {
smap[i] = true;
}
}
}
book
}
// Groups rules by name. For example:
// (add (succ a) (succ b)) = (succ (succ (add a b)))
// (add (succ a) (zero) ) = (succ a)
// (add (zero) (succ b)) = (succ b)
// (add (zero) (zero) ) = (zero)
// This is a group of 4 rules starting with the "add" name.
pub fn group_rules(rules: &[language::syntax::Rule]) -> HashMap<String, RuleGroup> {
let mut groups: HashMap<String, RuleGroup> = HashMap::new();
for rule in rules {
if let language::syntax::Term::Ctr { ref name, ref args } = *rule.lhs {
let group = groups.get_mut(name);
match group {
None => {
groups.insert(name.clone(), (args.len(), Vec::from([rule.clone()])));
}
Some((_arity, rules)) => {
rules.push(rule.clone());
}
}
}
}
groups
}
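Stripped of the Rust plumbing, grouping is just bucketing rules by the head name of their left-hand side, keeping the arity of the first rule seen. A hypothetical Python equivalent, with rules modeled as `(name, arity, rule)` triples:

```python
def group_rules(rules):
    # rules: list of (lhs_name, arity, rule) triples (hypothetical shape)
    groups = {}
    for name, arity, rule in rules:
        # first rule for a name fixes the group's arity; later rules append
        groups.setdefault(name, (arity, []))[1].append(rule)
    return groups
```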
// Sanitize
// ========
#[allow(dead_code)]
pub struct SanitizedRule {
pub rule: language::syntax::Rule,
pub uses: HashMap<String, u64>,
}
// FIXME: right now, the sanitizer isn't able to identify if a scopeless lambda doesn't use its
// bound variable, so it won't set the "eras" flag to "true" in this case, but it should.
// This big function sanitizes a rule. That has the following effect:
// - All variables are renamed to have a global unique name.
// - All variables are linearized.
// - If they're used more than once, dups are inserted.
// - If they're used once, nothing changes.
// - If they're never used, their name is changed to "*"
// Example:
// - sanitizing: `(Foo a b) = (+ a a)`
// - results in: `(Foo x0 *) = dup x0.0 x0.1 = x0; (+ x0.0 x0.1)`
pub fn sanitize_rule(rule: &language::syntax::Rule) -> Result<language::syntax::Rule, String> {
...
}
// Sanitizes all rules in a vector
pub fn sanitize_rules(rules: &[language::syntax::Rule]) -> Vec<language::syntax::Rule> {
...
}
// notes
// -----
// hoas_opt: this is an internal optimization that allows us to simplify kind2's hoas generator.
// it will cause the default patterns of functions with a name starting with "f$" to only match
// productive hoas constructors (ct0, ct1, ..., ctg, num), as well as native numbers and
// constructors with 0-arity, which are used by kind2's hoas functions, unless it is the last
// (default) clause, which kind2 uses to quote a call back to low-order. this is an internal
// feature that won't affect programs other than kind2. we can remove this in the future, but that
// would require kind2 to replicate hvm's flattener algorithm, so we just use it instead.
// language/syntax.rs
use HOPA;
use crate::runtime::data::u60;
use crate::runtime::data::f60;
// Types
// =====
// Term
// ----
#[derive(Clone, Debug)]
pub enum Term {
Var { name: String }, // TODO: add `global: bool`
Dup { nam0: String, nam1: String, expr: Box<Term>, body: Box<Term> },
Sup { val0: Box<Term>, val1: Box<Term> },
Let { name: String, expr: Box<Term>, body: Box<Term> },
Lam { name: String, body: Box<Term> },
App { func: Box<Term>, argm: Box<Term> },
Ctr { name: String, args: Vec<Box<Term>> },
U6O { numb: u64 },
F6O { numb: u64 },
Op2 { oper: Oper, val0: Box<Term>, val1: Box<Term> },
}
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Oper {
Add, Sub, Mul, Div,
Mod, And, Or, Xor,
Shl, Shr, Lte, Ltn,
Eql, Gte, Gtn, Neq,
}
// Rule
// ----
#[derive(Clone, Debug)]
pub struct Rule {
pub lhs: Box<Term>,
pub rhs: Box<Term>,
}
// SMap
// ----
type SMap = (String, Vec<bool>);
// File
// ----
pub struct File {
pub rules: Vec<Rule>,
pub smaps: Vec<SMap>,
}
// Stringifier
// ===========
// Term
// ----
impl std::fmt::Display for Oper {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", match self {
Self::Add => "+",
Self::Sub => "-",
Self::Mul => "*",
Self::Div => "/",
Self::Mod => "%",
Self::And => "&",
Self::Or => "|",
Self::Xor => "^",
Self::Shl => "<<",
Self::Shr => ">>",
Self::Lte => "<=",
Self::Ltn => "<",
Self::Eql => "==",
Self::Gte => ">=",
Self::Gtn => ">",
Self::Neq => "!=",
})
}
}
impl std::fmt::Display for Term {
// WARN: I think this could overflow, might need to rewrite it to be iterative instead of recursive?
// NOTE: Another issue is complexity. This function is O(N^2). Should use ropes to be linear.
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
fn lst_sugar(term: &Term) -> Option<String> {
fn go(term: &Term, text: &mut String, fst: bool) -> Option<()> {
if let Term::Ctr { name, args } = term {
if name == "List.cons" && args.len() == 2 {
if !fst {
text.push_str(", ");
}
text.push_str(&format!("{}", args[0]));
go(&args[1], text, false)?;
return Some(());
}
if name == "List.nil" && args.is_empty() {
return Some(());
}
}
None
}
let mut result = String::new();
result.push('[');
go(term, &mut result, true)?;
result.push(']');
Some(result)
}
fn str_sugar(term: &Term) -> Option<String> {
fn go(term: &Term, text: &mut String) -> Option<()> {
if let Term::Ctr { name, args } = term {
if name == "String.cons" && args.len() == 2 {
if let Term::U6O { numb } = *args[0] {
text.push(std::char::from_u32(numb as u32)?);
go(&args[1], text)?;
return Some(());
}
}
if name == "String.nil" && args.is_empty() {
return Some(());
}
}
None
}
let mut result = String::new();
result.push('"');
go(term, &mut result)?;
result.push('"');
Some(result)
}
match self {
Self::Var { name } => write!(f, "{}", name),
Self::Dup { nam0, nam1, expr, body } => write!(f, "dup {} {} = {}; {}", nam0, nam1, expr, body),
Self::Sup { val0, val1 } => write!(f, "{{{} {}}}", val0, val1),
Self::Let { name, expr, body } => write!(f, "let {} = {}; {}", name, expr, body),
Self::Lam { name, body } => write!(f, "λ{} {}", name, body),
Self::App { func, argm } => {
let mut args = vec![argm];
let mut expr = func;
while let Self::App { func, argm } = &**expr {
args.push(argm);
expr = func;
}
args.reverse();
write!(f, "({} {})", expr, args.iter().map(|x| format!("{}",x)).collect::<Vec<String>>().join(" "))
},
Self::Ctr { name, args } => {
// Ctr sugars
let sugars = [str_sugar, lst_sugar];
for sugar in sugars {
if let Some(term) = sugar(self) {
return write!(f, "{}", term);
}
}
write!(f, "({}{})", name, args.iter().map(|x| format!(" {}", x)).collect::<String>())
}
Self::U6O { numb } => write!(f, "{}", &u60::show(*numb)),
Self::F6O { numb } => write!(f, "{}", &f60::show(*numb)),
Self::Op2 { oper, val0, val1 } => write!(f, "({} {} {})", oper, val0, val1),
}
}
}
// Rule
// ----
impl std::fmt::Display for Rule {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{} = {}", self.lhs, self.rhs)
}
}
// File
// ----
impl std::fmt::Display for File {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", self.rules.iter().map(|rule| format!("{}", rule)).collect::<Vec<String>>().join("\n"))
}
}
// Parser
// ======
pub fn parse_let(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
return HOPA::guard(
HOPA::do_there_take_exact("let "),
Box::new(|state| {
let (state, _) = HOPA::force_there_take_exact("let ", state)?;
let (state, name) = HOPA::there_nonempty_name(state)?;
let (state, _) = HOPA::force_there_take_exact("=", state)?;
let (state, expr) = parse_term(state)?;
let (state, _) = HOPA::there_take_exact(";", state)?;
let (state, body) = parse_term(state)?;
Ok((state, Box::new(Term::Let { name, expr, body })))
}),
state,
);
}
pub fn parse_dup(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
return HOPA::guard(
HOPA::do_there_take_exact("dup "),
Box::new(|state| {
let (state, _) = HOPA::force_there_take_exact("dup ", state)?;
let (state, nam0) = HOPA::there_nonempty_name(state)?;
let (state, nam1) = HOPA::there_nonempty_name(state)?;
let (state, _) = HOPA::force_there_take_exact("=", state)?;
let (state, expr) = parse_term(state)?;
let (state, _) = HOPA::there_take_exact(";", state)?;
let (state, body) = parse_term(state)?;
Ok((state, Box::new(Term::Dup { nam0, nam1, expr, body })))
}),
state,
);
}
pub fn parse_sup(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
HOPA::guard(
HOPA::do_there_take_exact("{"),
Box::new(move |state| {
let (state, _) = HOPA::force_there_take_exact("{", state)?;
let (state, val0) = parse_term(state)?;
let (state, val1) = parse_term(state)?;
let (state, _) = HOPA::force_there_take_exact("}", state)?;
Ok((state, Box::new(Term::Sup { val0, val1 })))
}),
state,
)
}
pub fn parse_lam(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
let parse_symbol = |x| {
return HOPA::any(&[
HOPA::do_there_take_exact("λ"),
HOPA::do_there_take_exact("@"),
], x);
};
HOPA::guard(
Box::new(parse_symbol),
Box::new(move |state| {
let (state, _) = parse_symbol(state)?;
let (state, name) = HOPA::there_name(state)?;
let (state, body) = parse_term(state)?;
Ok((state, Box::new(Term::Lam { name, body })))
}),
state,
)
}
pub fn parse_app(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
return HOPA::guard(
HOPA::do_there_take_exact("("),
Box::new(|state| {
HOPA::list(
HOPA::do_there_take_exact("("),
HOPA::do_there_take_exact(""),
HOPA::do_there_take_exact(")"),
Box::new(parse_term),
Box::new(|args| {
if !args.is_empty() {
args.into_iter().reduce(|a, b| Box::new(Term::App { func: a, argm: b })).unwrap()
} else {
Box::new(Term::U6O { numb: 0 })
}
}),
state,
)
}),
state,
);
}
pub fn parse_ctr(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
HOPA::guard(
Box::new(|state| {
let (state, _) = HOPA::there_take_exact("(", state)?;
let (state, head) = HOPA::there_take_head(state)?;
Ok((state, ('A'..='Z').contains(&head)))
}),
Box::new(|state| {
let (state, open) = HOPA::there_take_exact("(", state)?;
let (state, name) = HOPA::there_nonempty_name(state)?;
let (state, args) = if open {
HOPA::until(HOPA::do_there_take_exact(")"), Box::new(parse_term), state)?
} else {
(state, Vec::new())
};
Ok((state, Box::new(Term::Ctr { name, args })))
}),
state,
)
}
pub fn parse_num(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
HOPA::guard(
Box::new(|state| {
let (state, head) = HOPA::there_take_head(state)?;
Ok((state, ('0'..='9').contains(&head)))
}),
Box::new(|state| {
let (state, text) = HOPA::there_nonempty_name(state)?;
if !text.is_empty() {
if text.starts_with("0x") {
return Ok((state, Box::new(Term::U6O { numb: u60::new(u64::from_str_radix(&text[2..], 16).unwrap()) })));
} else {
if text.find(".").is_some() {
return Ok((state, Box::new(Term::F6O { numb: f60::new(text.parse::<f64>().unwrap()) })));
} else {
return Ok((state, Box::new(Term::U6O { numb: u60::new(text.parse::<u64>().unwrap()) })));
}
}
} else {
Ok((state, Box::new(Term::U6O { numb: 0 })))
}
}),
state,
)
}
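The literal dispatch above is: a `0x` prefix parses as a hexadecimal `U6O`, a decimal point as an `F6O`, and anything else as a decimal `U6O`. A hedged Python sketch, ignoring the u60/f60 bit-packing done by `u60::new`/`f60::new`:

```python
def parse_num(text):
    if text.startswith("0x"):
        return ("U6O", int(text[2:], 16))   # hexadecimal unsigned literal
    if "." in text:
        return ("F6O", float(text))         # float literal
    return ("U6O", int(text))               # decimal unsigned literal
```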
pub fn parse_op2(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
fn is_op_char(chr: char) -> bool {
matches!(chr, '+' | '-' | '*' | '/' | '%' | '&' | '|' | '^' | '<' | '>' | '=' | '!')
}
fn parse_oper(state: HOPA::State) -> HOPA::Answer<Oper> {
fn op<'a>(symbol: &'static str, oper: Oper) -> HOPA::Parser<'a, Option<Oper>> {
Box::new(move |state| {
let (state, done) = HOPA::there_take_exact(symbol, state)?;
Ok((state, if done { Some(oper) } else { None }))
})
}
HOPA::attempt("Oper", &[
op("+", Oper::Add),
op("-", Oper::Sub),
op("*", Oper::Mul),
op("/", Oper::Div),
op("%", Oper::Mod),
op("&", Oper::And),
op("|", Oper::Or),
op("^", Oper::Xor),
op("<<", Oper::Shl),
op(">>", Oper::Shr),
op("<=", Oper::Lte),
op("<", Oper::Ltn),
op("==", Oper::Eql),
op(">=", Oper::Gte),
op(">", Oper::Gtn),
op("!=", Oper::Neq),
], state)
}
HOPA::guard(
Box::new(|state| {
let (state, open) = HOPA::there_take_exact("(", state)?;
let (state, head) = HOPA::there_take_head(state)?;
Ok((state, open && is_op_char(head)))
}),
Box::new(|state| {
let (state, _) = HOPA::there_take_exact("(", state)?;
let (state, oper) = parse_oper(state)?;
let (state, val0) = parse_term(state)?;
let (state, val1) = parse_term(state)?;
let (state, _) = HOPA::there_take_exact(")", state)?;
Ok((state, Box::new(Term::Op2 { oper, val0, val1 })))
}),
state,
)
}
pub fn parse_var(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
HOPA::guard(
Box::new(|state| {
let (state, head) = HOPA::there_take_head(state)?;
Ok((state, ('a'..='z').contains(&head) || head == '_' || head == '$'))
}),
Box::new(|state| {
let (state, name) = HOPA::there_name(state)?;
Ok((state, Box::new(Term::Var { name })))
}),
state,
)
}
pub fn parse_sym_sugar(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
use std::hash::Hasher;
HOPA::guard(
HOPA::do_there_take_exact("%"),
Box::new(|state| {
let (state, _) = HOPA::there_take_exact("%", state)?;
let (state, name) = HOPA::there_name(state)?;
let hash = {
let mut hasher = std::collections::hash_map::DefaultHasher::new();
hasher.write(name.as_bytes());
hasher.finish()
};
Ok((state, Box::new(Term::U6O { numb: u60::new(hash) })))
}),
state,
)
}
// ask x = fn; body
// ----------------
// (fn λx body)
pub fn parse_ask_sugar_named(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
return HOPA::guard(
Box::new(|state| {
let (state, asks) = HOPA::there_take_exact("ask ", state)?;
let (state, name) = HOPA::there_name(state)?;
let (state, eqls) = HOPA::there_take_exact("=", state)?;
Ok((state, asks && name.len() > 0 && eqls))
}),
Box::new(|state| {
let (state, _) = HOPA::force_there_take_exact("ask ", state)?;
let (state, name) = HOPA::there_nonempty_name(state)?;
let (state, _) = HOPA::force_there_take_exact("=", state)?;
let (state, func) = parse_term(state)?;
let (state, _) = HOPA::there_take_exact(";", state)?;
let (state, body) = parse_term(state)?;
Ok((state, Box::new(Term::App { func, argm: Box::new(Term::Lam { name, body }) })))
}),
state,
);
}
pub fn parse_ask_sugar_anon(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
return HOPA::guard(
HOPA::do_there_take_exact("ask "),
Box::new(|state| {
let (state, _) = HOPA::force_there_take_exact("ask ", state)?;
let (state, func) = parse_term(state)?;
let (state, _) = HOPA::there_take_exact(";", state)?;
let (state, body) = parse_term(state)?;
Ok((state, Box::new(Term::App { func, argm: Box::new(Term::Lam { name: "*".to_string(), body }) })))
}),
state,
);
}
pub fn parse_chr_sugar(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
HOPA::guard(
Box::new(|state| {
let (state, head) = HOPA::there_take_head(state)?;
Ok((state, head == '\''))
}),
Box::new(|state| {
let (state, _) = HOPA::there_take_exact("'", state)?;
if let Some(c) = HOPA::head(state) {
let state = HOPA::tail(state);
let (state, _) = HOPA::there_take_exact("'", state)?;
Ok((state, Box::new(Term::U6O { numb: c as u64 })))
} else {
HOPA::expected("character", 1, state)
}
}),
state,
)
}
// TODO: parse escape sequences
pub fn parse_str_sugar(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
HOPA::guard(
Box::new(|state| {
let (state, head) = HOPA::there_take_head(state)?;
Ok((state, head == '"' || head == '`'))
}),
Box::new(|state| {
let delim = HOPA::head(state).unwrap_or('\0');
let state = HOPA::tail(state);
let mut chars: Vec<char> = Vec::new();
let mut state = state;
loop {
if let Some(next) = HOPA::head(state) {
if next == delim || next == '\0' {
state = HOPA::tail(state);
break;
} else {
chars.push(next);
state = HOPA::tail(state);
}
} else {
// End of input: unterminated string; stop instead of looping forever.
break;
}
}
let empty = Term::Ctr { name: "String.nil".to_string(), args: Vec::new() };
let list = Box::new(chars.iter().rfold(empty, |t, h| Term::Ctr {
name: "String.cons".to_string(),
args: vec![Box::new(Term::U6O { numb: *h as u64 }), Box::new(t)],
}));
Ok((state, list))
}),
state,
)
}
pub fn parse_lst_sugar(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
HOPA::guard(
Box::new(|state| {
let (state, head) = HOPA::there_take_head(state)?;
Ok((state, head == '['))
}),
Box::new(|state| {
let (state, _head) = HOPA::there_take_exact("[", state)?;
let (state, elems) = HOPA::until(
Box::new(|x| HOPA::there_take_exact("]", x)),
Box::new(|x| {
let (state, term) = parse_term(x)?;
let (state, _) = HOPA::maybe(Box::new(|x| HOPA::there_take_exact(",", x)), state)?;
Ok((state, term))
}),
state,
)?;
let empty = Term::Ctr { name: "List.nil".to_string(), args: Vec::new() };
let list = Box::new(elems.iter().rfold(empty, |t, h| Term::Ctr {
name: "List.cons".to_string(),
args: vec![h.clone(), Box::new(t)],
}));
Ok((state, list))
}),
state,
)
}
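Both the string and list sugars above build their `Ctr` chains with `rfold`, which preserves source order: folding from the right places the last element innermost, next to the nil constructor. The same fold on a plain stand-in type:

```rust
// Stand-in for Term::Ctr with List.cons/List.nil (sketch only, not
// part of the parser).
#[derive(Debug, PartialEq)]
enum List {
    Nil,
    Cons(u64, Box<List>),
}

// Build a cons chain from a slice, exactly like the rfold above.
fn build(elems: &[u64]) -> List {
    elems.iter().rfold(List::Nil, |t, h| List::Cons(*h, Box::new(t)))
}

fn main() {
    // [1, 2] becomes Cons(1, Cons(2, Nil)): source order is preserved.
    let expected = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    assert_eq!(build(&[1, 2]), expected);
}
```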
pub fn parse_if_sugar(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
return HOPA::guard(
HOPA::do_there_take_exact("if "),
Box::new(|state| {
let (state, _) = HOPA::force_there_take_exact("if ", state)?;
let (state, cond) = parse_term(state)?;
let (state, _) = HOPA::force_there_take_exact("{", state)?;
let (state, if_t) = parse_term(state)?;
let (state, _) = HOPA::force_there_take_exact("}", state)?;
let (state, _) = HOPA::force_there_take_exact("else", state)?;
let (state, _) = HOPA::force_there_take_exact("{", state)?;
let (state, if_f) = parse_term(state)?;
let (state, _) = HOPA::force_there_take_exact("}", state)?;
Ok((state, Box::new(Term::Ctr { name: "U60.if".to_string(), args: vec![cond, if_t, if_f] })))
}),
state,
);
}
pub fn parse_bng(state: HOPA::State) -> HOPA::Answer<Option<Box<Term>>> {
return HOPA::guard(HOPA::do_there_take_exact("!"), Box::new(|state| {
let (state, _) = HOPA::force_there_take_exact("!", state)?;
let (state, term) = parse_term(state)?;
Ok((state, term))
}), state);
}
pub fn parse_term(state: HOPA::State) -> HOPA::Answer<Box<Term>> {
HOPA::attempt("Term", &[
Box::new(parse_let),
Box::new(parse_dup),
Box::new(parse_lam),
Box::new(parse_ctr),
Box::new(parse_op2),
Box::new(parse_app),
Box::new(parse_sup),
Box::new(parse_num),
Box::new(parse_sym_sugar),
Box::new(parse_chr_sugar),
Box::new(parse_str_sugar),
Box::new(parse_lst_sugar),
Box::new(parse_if_sugar),
Box::new(parse_bng),
Box::new(parse_ask_sugar_named),
Box::new(parse_ask_sugar_anon),
Box::new(parse_var),
Box::new(|state| Ok((state, None))),
], state)
}
pub fn parse_rule(state: HOPA::State) -> HOPA::Answer<Option<Rule>> {
return HOPA::guard(
HOPA::do_there_take_exact(""),
Box::new(|state| {
let (state, lhs) = parse_term(state)?;
let (state, _) = HOPA::force_there_take_exact("=", state)?;
let (state, rhs) = parse_term(state)?;
Ok((state, Rule { lhs, rhs }))
}),
state,
);
}
pub fn parse_smap(state: HOPA::State) -> HOPA::Answer<Option<SMap>> {
pub fn parse_stct(state: HOPA::State) -> HOPA::Answer<bool> {
let (state, stct) = HOPA::there_take_exact("!", state)?;
let (state, _) = parse_term(state)?;
Ok((state, stct))
}
let (state, init) = HOPA::there_take_exact("(", state)?;
if init {
let (state, name) = HOPA::there_nonempty_name(state)?;
let (state, args) = HOPA::until(HOPA::do_there_take_exact(")"), Box::new(parse_stct), state)?;
return Ok((state, Some((name, args))));
} else {
return Ok((state, None));
}
}
pub fn parse_file(state: HOPA::State) -> HOPA::Answer<File> {
let mut rules = Vec::new();
let mut smaps = Vec::new();
let mut state = state;
loop {
let (new_state, done) = HOPA::there_end(state)?;
if done {
break;
}
let (_, smap) = parse_smap(new_state)?;
if let Some(smap) = smap {
smaps.push(smap);
}
let (new_state, rule) = parse_rule(new_state)?;
if let Some(rule) = rule {
rules.push(rule);
state = new_state;
continue;
}
return HOPA::expected("declaration", 1, state);
}
Ok((state, File { rules, smaps }))
}
pub fn read_term(code: &str) -> Result<Box<Term>, String> {
HOPA::read(Box::new(parse_term), code)
}
pub fn read_file(code: &str) -> Result<File, String> {
HOPA::read(Box::new(parse_file), code)
}
#[allow(dead_code)]
pub fn read_rule(code: &str) -> Result<Option<Rule>, String> {
HOPA::read(Box::new(parse_rule), code)
}
// runtime/base/memory.rs
// HVM's memory model
// ------------------
//
// The runtime memory consists of just a vector of u64 pointers. That is:
//
// Mem ::= Vec<Ptr>
//
// A pointer has 3 parts:
//
// Ptr ::= 0xTAAAAAAABBBBBBBB
//
// Where:
//
// T : u4 is the pointer tag
// A : u28 is the 1st value
// B : u32 is the 2nd value
//
// There are 13 possible tags:
//
// Tag | Val | Meaning
// ----| --- | -------------------------------
// DP0 | 0 | a variable, bound to the 1st argument of a duplication
// DP1 | 1 | a variable, bound to the 2nd argument of a duplication
// VAR | 2   | a variable, bound to the argument of a lambda
// ARG | 3   | a used argument of a lambda or duplication
// ERA | 4 | an erased argument of a lambda or duplication
// LAM | 5 | a lambda
// APP | 6 | an application
// SUP | 7 | a superposition
// CTR | 8 | a constructor
// FUN | 9 | a function
// OP2 | 10 | a numeric operation
// U60 | 11 | a 60-bit unsigned integer
// F60 | 12 | a 60-bit floating point
//
// The semantics of the 1st and 2nd values depend on the pointer tag.
//
// Tag | 1st ptr value | 2nd ptr value
// --- | ---------------------------- | ---------------------------------
// DP0 | the duplication label | points to the duplication node
// DP1 | the duplication label | points to the duplication node
// VAR | not used | points to the lambda node
// ARG | not used | points to the variable occurrence
// ERA | not used | not used
// LAM | not used | points to the lambda node
// APP | not used | points to the application node
// SUP | the duplication label | points to the superposition node
// CTR | the constructor name | points to the constructor node
// FUN | the function name | points to the function node
// OP2 | the operation name | points to the operation node
// U60 | the most significant 28 bits | the least significant 32 bits
// F60 | the most significant 28 bits | the least significant 32 bits
//
// Notes:
//
// 1. The duplication label is an internal value used on the DUP-SUP rule.
// 2. The operation name only uses 4 of the 28 bits, as there are only 16 ops.
// 3. U60 and F60 pointers don't point anywhere, they just store the number directly.
//
// A node is a tuple of N pointers stored on sequential memory indices.
// The meaning of each index depends on the node. There are 7 types:
//
// Duplication Node:
// - [0] => either an ERA or an ARG pointing to the 1st variable location
// - [1] => either an ERA or an ARG pointing to the 2nd variable location
// - [2] => pointer to the duplicated expression
//
// Lambda Node:
// - [0] => either an ERA or an ARG pointing to the variable location
// - [1] => pointer to the lambda's body
//
// Application Node:
// - [0] => pointer to the lambda
// - [1] => pointer to the argument
//
// Superposition Node:
// - [0] => pointer to the 1st superposed value
// - [1] => pointer to the 2nd superposed value
//
// Constructor Node:
// - [0] => pointer to the 1st field
// - [1] => pointer to the 2nd field
// - ... => ...
// - [N] => pointer to the Nth field
//
// Function Node:
// - [0] => pointer to the 1st argument
// - [1] => pointer to the 2nd argument
// - ... => ...
// - [N] => pointer to the Nth argument
//
// Operation Node:
// - [0] => pointer to the 1st operand
// - [1] => pointer to the 2nd operand
//
// Notes:
//
// 1. Duplication nodes DON'T have a body. They "float" on the global scope.
// 2. Lambdas and Duplications point to their variables, and vice-versa.
// 3. ARG pointers can only show up inside Lambdas and Duplications.
// 4. Nums and Vars don't require a node type, because they're unboxed.
// 5. Function and Constructor arities depend on the user-provided definition.
//
// Example 0:
//
// Core:
//
// {Tuple2 #7 #8}
//
// Memory:
//
// Root : Ptr(CTR, 0x0000001, 0x00000000)
// 0x00 | Ptr(U60, 0x0000000, 0x00000007) // the tuple's 1st field
// 0x01 | Ptr(U60, 0x0000000, 0x00000008) // the tuple's 2nd field
//
// Notes:
//
// 1. This is just a pair with two numbers.
// 2. The root pointer is not stored on memory.
// 3. The 'Tuple2' name was encoded as the ID 1.
// 4. Since nums are unboxed, a 2-tuple uses 2 memory slots, or 32 bytes.
//
// Example 1:
//
// Core:
//
// λ~ λb b
//
// Memory:
//
// Root : Ptr(LAM, 0x0000000, 0x00000000)
// 0x00 | Ptr(ERA, 0x0000000, 0x00000000) // 1st lambda's argument
// 0x01 | Ptr(LAM, 0x0000000, 0x00000002) // 1st lambda's body
// 0x02 | Ptr(ARG, 0x0000000, 0x00000003) // 2nd lambda's argument
// 0x03 | Ptr(VAR, 0x0000000, 0x00000002) // 2nd lambda's body
//
// Notes:
//
// 1. This is a λ-term that discards the 1st argument and returns the 2nd.
// 2. The 1st lambda's argument is not used, thus it is an ERA pointer.
// 3. The 2nd lambda's argument points to its variable, and vice-versa.
// 4. Each lambda uses 2 memory slots. This term uses 64 bytes in total.
//
// Example 2:
//
// Core:
//
// λx dup x0 x1 = x; (* x0 x1)
//
// Memory:
//
// Root : Ptr(LAM, 0x0000000, 0x00000000)
// 0x00 | Ptr(ARG, 0x0000000, 0x00000004) // the lambda's argument
// 0x01 | Ptr(OP2, 0x0000002, 0x00000005) // the lambda's body
// 0x02 | Ptr(ARG, 0x0000000, 0x00000005) // the duplication's 1st argument
// 0x03 | Ptr(ARG, 0x0000000, 0x00000006) // the duplication's 2nd argument
// 0x04 | Ptr(VAR, 0x0000000, 0x00000000) // the duplicated expression
// 0x05 | Ptr(DP0, 0xa31fb21, 0x00000002) // the operator's 1st operand
// 0x06 | Ptr(DP1, 0xa31fb21, 0x00000002) // the operator's 2nd operand
//
// Notes:
//
// 1. This is a lambda function that squares a number.
// 2. Notice how every ARG points to a VAR/DP0/DP1, which points back to its source node.
// 3. DP1 does not point to its ARG. It points to the duplication node, which is at 0x02.
// 4. The lambda's body does not point to the dup node, but to the operator. Dup nodes float.
// 5. 0xa31fb21 is a globally unique random label assigned to the duplication node.
// 6. That duplication label is stored on the DP0/DP1 that point to the node, not on the node.
// 7. A lambda uses 2 memory slots, a duplication uses 3, an operator uses 2. Total: 112 bytes.
// 8. In-memory size is different from, and larger than, the serialization size.
pub use crate::runtime::*;
use crossbeam::utils::{Backoff, CachePadded};
use std::sync::atomic::{AtomicI64, AtomicU64, AtomicU8, Ordering};
// Types
// -----
pub type Ptr = u64;
pub type AtomicPtr = AtomicU64;
pub type ArityMap = crate::runtime::data::u64_map::U64Map<u64>;
// Thread local data and stats
#[derive(Debug)]
pub struct LocalVars {
pub tid: usize,
pub used: AtomicI64, // number of used memory cells
pub next: AtomicU64, // next alloc index
pub amin: AtomicU64, // min alloc index
pub amax: AtomicU64, // max alloc index
pub dups: AtomicU64, // next dup label to be created
pub cost: AtomicU64, // total number of rewrite rules
}
// Global memory buffer
pub struct Heap {
pub tids: usize,
pub node: Box<[AtomicU64]>,
pub lock: Box<[AtomicU8]>,
pub lvar: Box<[CachePadded<LocalVars>]>,
pub vstk: Box<[VisitQueue]>,
pub aloc: Box<[Box<[AtomicU64]>]>,
pub vbuf: Box<[Box<[AtomicU64]>]>,
pub rbag: RedexBag,
}
// Pointer Constructors
// --------------------
pub const VAL: u64 = 1;
pub const EXT: u64 = 0x100000000;
pub const TAG: u64 = 0x1000000000000000;
pub const DP0: u64 = 0x0;
pub const DP1: u64 = 0x1;
pub const VAR: u64 = 0x2;
pub const ARG: u64 = 0x3;
pub const ERA: u64 = 0x4;
pub const LAM: u64 = 0x5;
pub const APP: u64 = 0x6;
pub const SUP: u64 = 0x7;
pub const CTR: u64 = 0x8;
pub const FUN: u64 = 0x9;
pub const OP2: u64 = 0xA;
pub const U60: u64 = 0xB;
pub const F60: u64 = 0xC;
pub const NIL: u64 = 0xF;
pub const ADD: u64 = 0x0;
pub const SUB: u64 = 0x1;
pub const MUL: u64 = 0x2;
pub const DIV: u64 = 0x3;
pub const MOD: u64 = 0x4;
pub const AND: u64 = 0x5;
pub const OR: u64 = 0x6;
pub const XOR: u64 = 0x7;
pub const SHL: u64 = 0x8;
pub const SHR: u64 = 0x9;
pub const LTN: u64 = 0xA;
pub const LTE: u64 = 0xB;
pub const EQL: u64 = 0xC;
pub const GTE: u64 = 0xD;
pub const GTN: u64 = 0xE;
pub const NEQ: u64 = 0xF;
// Pointer Constructors
// --------------------
pub fn Var(pos: u64) -> Ptr {
(VAR * TAG) | pos
}
pub fn Dp0(col: u64, pos: u64) -> Ptr {
(DP0 * TAG) | (col * EXT) | pos
}
pub fn Dp1(col: u64, pos: u64) -> Ptr {
(DP1 * TAG) | (col * EXT) | pos
}
pub fn Arg(pos: u64) -> Ptr {
(ARG * TAG) | pos
}
pub fn Era() -> Ptr {
ERA * TAG
}
pub fn Lam(pos: u64) -> Ptr {
(LAM * TAG) | pos
}
pub fn App(pos: u64) -> Ptr {
(APP * TAG) | pos
}
pub fn Sup(col: u64, pos: u64) -> Ptr {
(SUP * TAG) | (col * EXT) | pos
}
pub fn Op2(ope: u64, pos: u64) -> Ptr {
(OP2 * TAG) | (ope * EXT) | pos
}
pub fn U6O(val: u64) -> Ptr {
(U60 * TAG) | val
}
pub fn F6O(val: u64) -> Ptr {
(F60 * TAG) | val
}
pub fn Ctr(fun: u64, pos: u64) -> Ptr {
(CTR * TAG) | (fun * EXT) | pos
}
pub fn Fun(fun: u64, pos: u64) -> Ptr {
(FUN * TAG) | (fun * EXT) | pos
}
// Pointer Getters
// ---------------
pub fn get_tag(lnk: Ptr) -> u64 {
lnk / TAG
}
pub fn get_ext(lnk: Ptr) -> u64 {
(lnk / EXT) & 0xFFF_FFFF
}
pub fn get_val(lnk: Ptr) -> u64 {
lnk & 0xFFFF_FFFF
}
pub fn get_num(lnk: Ptr) -> u64 {
lnk & 0xFFF_FFFF_FFFF_FFFF
}
pub fn get_loc(lnk: Ptr, arg: u64) -> u64 {
get_val(lnk) + arg
}
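As a standalone illustration of the bit layout the constructors and getters above share (with local copies of the `TAG`/`EXT` constants so the sketch runs on its own): the tag lives in the top 4 bits, the ext value in the next 28, and the val in the low 32.

```rust
// Sketch of the HVM pointer bit layout; not part of the runtime.
const EXT: u64 = 0x100000000;        // 1 << 32: start of the ext field
const TAG: u64 = 0x1000000000000000; // 1 << 60: start of the tag field
const CTR: u64 = 0x8;

// Pack a tag, an ext value and a val into a single 64-bit pointer.
fn pack(tag: u64, ext: u64, val: u64) -> u64 {
    (tag * TAG) | (ext * EXT) | val
}
fn tag_of(p: u64) -> u64 { p / TAG }
fn ext_of(p: u64) -> u64 { (p / EXT) & 0xFFF_FFFF }
fn val_of(p: u64) -> u64 { p & 0xFFFF_FFFF }

fn main() {
    // A constructor pointer: name-id 0x1234, node stored at location 0x42.
    let p = pack(CTR, 0x1234, 0x42);
    assert_eq!(tag_of(p), CTR);
    assert_eq!(ext_of(p), 0x1234);
    assert_eq!(val_of(p), 0x42);
}
```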
pub fn get_cost(heap: &Heap) -> u64 {
heap.lvar.iter().map(|x| x.cost.load(Ordering::Relaxed)).sum()
}
pub fn get_used(heap: &Heap) -> i64 {
heap.lvar.iter().map(|x| x.used.load(Ordering::Relaxed)).sum()
}
pub fn inc_cost(heap: &Heap, tid: usize) {
unsafe { heap.lvar.get_unchecked(tid) }.cost.fetch_add(1, Ordering::Relaxed);
}
pub fn gen_dup(heap: &Heap, tid: usize) -> u64 {
return unsafe { heap.lvar.get_unchecked(tid) }.dups.fetch_add(1, Ordering::Relaxed) & 0xFFF_FFFF;
}
pub fn arity_of(arit: &ArityMap, lnk: Ptr) -> u64 {
return *arit.get(&get_ext(lnk)).unwrap_or(&0);
}
// Pointers
// --------
// Given a location, loads the ptr stored on it
pub fn load_ptr(heap: &Heap, loc: u64) -> Ptr {
unsafe { heap.node.get_unchecked(loc as usize).load(Ordering::Relaxed) }
}
// Moves a pointer to another location
pub fn move_ptr(heap: &Heap, old_loc: u64, new_loc: u64) -> Ptr {
link(heap, new_loc, take_ptr(heap, old_loc))
}
// Given a pointer to a node, loads its nth arg
pub fn load_arg(heap: &Heap, term: Ptr, arg: u64) -> Ptr {
load_ptr(heap, get_loc(term, arg))
}
// Given a location, takes the ptr stored on it
pub fn take_ptr(heap: &Heap, loc: u64) -> Ptr {
unsafe { heap.node.get_unchecked(loc as usize).swap(0, Ordering::Relaxed) }
}
// Given a pointer to a node, takes its nth arg
pub fn take_arg(heap: &Heap, term: Ptr, arg: u64) -> Ptr {
take_ptr(heap, get_loc(term, arg))
}
// Writes a ptr to memory. If it is a DP0, DP1 or VAR pointer (tag <= VAR),
// also updates the binder's ARG slot to point back to this location.
pub fn link(heap: &Heap, loc: u64, ptr: Ptr) -> Ptr {
unsafe {
heap.node.get_unchecked(loc as usize).store(ptr, Ordering::Relaxed);
if get_tag(ptr) <= VAR {
let arg_loc = get_loc(ptr, get_tag(ptr) & 0x01);
heap.node.get_unchecked(arg_loc as usize).store(Arg(loc), Ordering::Relaxed);
}
}
ptr
}
// Heap Constructors
// -----------------
pub fn new_atomic_u8_array(size: usize) -> Box<[AtomicU8]> {
return unsafe {
Box::from_raw(AtomicU8::from_mut_slice(Box::leak(vec![0xFFu8; size].into_boxed_slice())))
};
}
pub fn new_atomic_u64_array(size: usize) -> Box<[AtomicU64]> {
return unsafe {
Box::from_raw(AtomicU64::from_mut_slice(Box::leak(vec![0u64; size].into_boxed_slice())))
};
}
pub fn new_tids(tids: usize) -> Box<[usize]> {
return (0..tids).collect::<Vec<usize>>().into_boxed_slice();
}
pub fn new_heap(size: usize, tids: usize) -> Heap {
let mut lvar = vec![];
for tid in 0..tids {
lvar.push(CachePadded::new(LocalVars {
tid,
used: AtomicI64::new(0),
next: AtomicU64::new((size / tids * (tid + 0)) as u64),
amin: AtomicU64::new((size / tids * (tid + 0)) as u64),
amax: AtomicU64::new((size / tids * (tid + 1)) as u64),
dups: AtomicU64::new(((1 << 28) / tids * tid) as u64),
cost: AtomicU64::new(0),
}))
}
let node = new_atomic_u64_array(size);
let lock = new_atomic_u8_array(size);
let lvar = lvar.into_boxed_slice();
let rbag = RedexBag::new(tids);
let aloc = (0..tids)
.map(|_| new_atomic_u64_array(1 << 20))
.collect::<Vec<Box<[AtomicU64]>>>()
.into_boxed_slice();
let vbuf = (0..tids)
.map(|_| new_atomic_u64_array(1 << 16))
.collect::<Vec<Box<[AtomicU64]>>>()
.into_boxed_slice();
let vstk = (0..tids).map(|_| VisitQueue::new()).collect::<Vec<VisitQueue>>().into_boxed_slice();
return Heap { tids, node, lock, lvar, rbag, aloc, vbuf, vstk };
}
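A sketch of the partitioning `new_heap` performs: the node buffer is split into equal per-thread windows, and each thread's scanning allocator stays inside its own `[amin, amax)` range (standalone, mirroring the arithmetic above).

```rust
// Per-thread allocation window, as computed in new_heap above.
fn alloc_window(size: u64, tids: u64, tid: u64) -> (u64, u64) {
    (size / tids * tid, size / tids * (tid + 1))
}

fn main() {
    // 4 threads on a 1000-cell heap: thread 0 owns [0, 250),
    // thread 3 owns [750, 1000).
    assert_eq!(alloc_window(1000, 4, 0), (0, 250));
    assert_eq!(alloc_window(1000, 4, 3), (750, 1000));
}
```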
// Allocator
// ---------
pub fn alloc(heap: &Heap, tid: usize, arity: u64) -> u64 {
unsafe {
let lvar = &heap.lvar.get_unchecked(tid);
if arity == 0 {
0
} else {
let mut length = 0;
loop {
// Loads value on cursor
let val = heap.node.get_unchecked(*lvar.next.as_ptr() as usize).load(Ordering::Relaxed);
// If it is empty, increment length
if val == 0 {
length += 1;
// Otherwise, reset length
} else {
length = 0;
};
// Moves cursor right
*lvar.next.as_ptr() += 1;
// If it is out of bounds, wrap around
if *lvar.next.as_ptr() >= *lvar.amax.as_ptr() {
length = 0;
*lvar.next.as_ptr() = *lvar.amin.as_ptr();
}
// If length equals arity, allocate that space
if length == arity {
return *lvar.next.as_ptr() - length;
}
}
}
}
}
pub fn free(heap: &Heap, tid: usize, loc: u64, arity: u64) {
for i in 0..arity {
unsafe { heap.node.get_unchecked((loc + i) as usize) }.store(0, Ordering::Relaxed);
}
}
// Substitution
// ------------
// Atomically replaces a ptr by another. Updates binders.
pub fn atomic_relink(heap: &Heap, loc: u64, old: Ptr, neo: Ptr) -> Result<Ptr, Ptr> {
unsafe {
let got = heap.node.get_unchecked(loc as usize).compare_exchange_weak(
old,
neo,
Ordering::Relaxed,
Ordering::Relaxed,
)?;
if get_tag(neo) <= VAR {
let arg_loc = get_loc(neo, get_tag(neo) & 0x01);
heap.node.get_unchecked(arg_loc as usize).store(Arg(loc), Ordering::Relaxed);
}
return Ok(got);
}
}
// Performs a global [x <- val] substitution atomically.
pub fn atomic_subst(heap: &Heap, arit: &ArityMap, tid: usize, var: Ptr, val: Ptr) {
loop {
let arg_ptr = load_ptr(heap, get_loc(var, get_tag(var) & 0x01));
if get_tag(arg_ptr) == ARG {
if heap.tids == 1 {
link(heap, get_loc(arg_ptr, 0), val);
return;
} else {
if atomic_relink(heap, get_loc(arg_ptr, 0), var, val).is_ok() {
return;
} else {
continue;
}
}
}
if get_tag(arg_ptr) == ERA {
collect(heap, arit, tid, val); // safe, since `val` is owned by this thread
return;
}
}
}
// Locks
// -----
pub const LOCK_OPEN: u8 = 0xFF;
pub fn acquire_lock(heap: &Heap, tid: usize, term: Ptr) -> Result<u8, u8> {
let locker = unsafe { heap.lock.get_unchecked(get_loc(term, 0) as usize) };
locker.compare_exchange_weak(LOCK_OPEN, tid as u8, Ordering::Acquire, Ordering::Relaxed)
}
pub fn release_lock(heap: &Heap, tid: usize, term: Ptr) {
let locker = unsafe { heap.lock.get_unchecked(get_loc(term, 0) as usize) };
locker.store(LOCK_OPEN, Ordering::Release)
}
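`acquire_lock` and `release_lock` implement a per-node try-lock: a byte is open at `0xFF` and otherwise holds the owner's thread id, which is why thread ids must fit in a `u8`. The same protocol in isolation, with std atomics:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const LOCK_OPEN: u8 = 0xFF;

// Try to take the lock for thread `tid`; succeeds only if it was open.
// (Sketch of acquire_lock above, using the non-weak compare_exchange.)
fn try_acquire(lock: &AtomicU8, tid: u8) -> bool {
    lock.compare_exchange(LOCK_OPEN, tid, Ordering::Acquire, Ordering::Relaxed).is_ok()
}

// Reopen the lock, as release_lock does.
fn release(lock: &AtomicU8) {
    lock.store(LOCK_OPEN, Ordering::Release);
}

fn main() {
    let lock = AtomicU8::new(LOCK_OPEN);
    assert!(try_acquire(&lock, 3));  // thread 3 takes the open lock
    assert!(!try_acquire(&lock, 7)); // thread 7 fails while it's held
    release(&lock);
    assert!(try_acquire(&lock, 7));  // open again, so thread 7 succeeds
}
```

Note that callers of `acquire_lock` treat failure as "someone else owns it, skip" rather than retrying, so the spurious failures allowed by `compare_exchange_weak` are harmless there.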
// Garbage Collection
// ------------------
// As soon as we detect an expression is unreachable, i.e., when it is applied to a lambda or
// function that doesn't use its argument, we call `collect()` on it. Since the expression is now
// implicitly "owned" by this thread, we're allowed to traverse the structure and fully free its
// memory. There are some complications, though: lambdas, duplications, and their respective
// variables. When a lam is collected, we must first substitute its bound variable by `Era()`, and
// then recurse. When a lam-bound variable is collected, we just link its argument to `Era()`. This
// will allow lams to be collected properly in all scenarios.
//
// A. When the lam is collected before the var. Ex: λx (Pair 42 x)
// 1. We substitute [x <- Era()] and recurse into the lam's body.
// 2. When we reach x, it will be Era(), so there is nothing to do.
// 3. All memory related to this lambda is freed.
// This is safe, because both are owned by this thread
//
// B. When the var is collected before the lam. Ex: (Pair x λx(42))
// 1. We reach x and link the lam's argument to Era().
// 2. When we reach the lam, its var will be Era(), so [Era() <- Era()] will do nothing.
// 3. All memory related to this lambda is freed.
// This is safe, because both are owned by this thread.
//
// C. When the var is collected, but the lam isn't. Ex: (Pair x 42)
// 1. We reach x and link the lam's argument to Era().
// 2. The owner of the lam can still use it, and applying it will trigger collect().
// This is safe, because the lam arg field is owned by the thread that owns the var (this one).
//
// D. When the lam is collected, but the var isn't. Ex: (Pair λx(42) 777)
// 1. We reach the lam and substitute [x <- Era()].
// 2. The owner of var will now have an Era(), rather than an unbound variable.
// This is safe because subst is atomic.
//
// As for dup nodes, the same idea applies. When a dup-bound variable is collected, we just link
// its argument to Era(). The problem is, it is impossible to reach a dup node directly. Because
// of that, if two threads collected the same dup, we'd have a memory leak: the dup node wouldn't
// be freed, and the dup expression wouldn't be collected. As such, when we reach a dup-bound
// variable, we also visit the dup node. Visiting dup nodes doesn't imply ownership, since a dup
// node can be accessed through two different dup-bound variables. As such, we attempt to lock it.
// If we can't have the lock, that means another thread is handling that dup, so we let it decide
// what to do with it, and return. If we get the lock, then we now have ownership, so we check the
// other argument. If it is Era(), that means this dup node was collected twice, so, we clear it
// and collect its expression. Otherwise, we release the lock and let the owner of the other
// variable decide what to do with it in the future. This covers most cases, but there is still a
// problem: what if the other variable is contained inside the duplicated expression? For example,
// the normal form of `(λf λx (f (f x)) λf λx (f (f x)))` is:
//
// λf λx b0
// dup f0 f1 = f
// dup b0 b1 = (f0 (f1 {b1 x}))
//
// If we attempt to collect it with the algorithm above, we'll have:
//
// dup f0 f1 = ~
// dup ~ b1 = (f0 (f1 {b1 ~}))
//
// That's because, once we reached `b0`, we replaced its respective arg by `Era()`, then locked its
// dup node and checked the other arg, `b1`; since it isn't `Era()`, we released the lock and let
// the owner of `b1` decide what to do. But `b1` is contained inside the expression, so it has no
// owner anymore; it forms a cycle, and no other part of the program will access it! This will not
// be handled by HVM's automatic collector and will be left as a memory leak. Under normal
// circumstances, the leak is too minimal to be a problem. It could be eliminated by enabling an
// external garbage collector (which would rarely need to be triggered), or avoided altogether by
// not allowing inputs that can result in self-referential clones on the input language's type
// system. Sadly, it is an issue that exists, and, for the time being, I'm not aware of a good
// solution that maintains HVM philosophy of only including constant-time compute primitives.
pub fn collect(heap: &Heap, arit: &ArityMap, tid: usize, term: Ptr) {
let mut coll = Vec::new();
let mut next = term;
loop {
let term = next;
match get_tag(term) {
DP0 => {
link(heap, get_loc(term, 0), Era());
if acquire_lock(heap, tid, term).is_ok() {
if get_tag(load_arg(heap, term, 1)) == ERA {
coll.push(take_arg(heap, term, 2));
free(heap, tid, get_loc(term, 0), 3);
}
release_lock(heap, tid, term);
}
}
DP1 => {
link(heap, get_loc(term, 1), Era());
if acquire_lock(heap, tid, term).is_ok() {
if get_tag(load_arg(heap, term, 0)) == ERA {
coll.push(take_arg(heap, term, 2));
free(heap, tid, get_loc(term, 0), 3);
}
release_lock(heap, tid, term);
}
}
VAR => {
link(heap, get_loc(term, 0), Era());
}
LAM => {
atomic_subst(heap, arit, tid, Var(get_loc(term, 0)), Era());
next = take_arg(heap, term, 1);
free(heap, tid, get_loc(term, 0), 2);
continue;
}
APP => {
coll.push(take_arg(heap, term, 0));
next = take_arg(heap, term, 1);
free(heap, tid, get_loc(term, 0), 2);
continue;
}
SUP => {
coll.push(take_arg(heap, term, 0));
next = take_arg(heap, term, 1);
free(heap, tid, get_loc(term, 0), 2);
continue;
}
OP2 => {
coll.push(take_arg(heap, term, 0));
next = take_arg(heap, term, 1);
free(heap, tid, get_loc(term, 0), 2);
continue;
}
U60 => {}
F60 => {}
CTR | FUN => {
let arity = arity_of(arit, term);
for i in 0..arity {
if i < arity - 1 {
coll.push(take_arg(heap, term, i));
} else {
next = take_arg(heap, term, i);
}
}
free(heap, tid, get_loc(term, 0), arity);
if arity > 0 {
continue;
}
}
_ => {}
}
if let Some(got) = coll.pop() {
next = got;
} else {
break;
}
}
}
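`collect` walks arbitrarily deep terms without recursion: it follows one child directly through `next` and pushes the remaining children on the explicit `coll` stack, popping when a branch bottoms out. The same traversal pattern on a plain binary tree, as a standalone sketch:

```rust
// Stand-in tree: a node has two children, a leaf carries a number.
enum Tree {
    Leaf(u64),
    Node(Box<Tree>, Box<Tree>),
}

// Count leaves iteratively: follow one child via `next`, stack the
// other, mirroring the next/coll loop in collect above.
fn count_leaves(root: Tree) -> u64 {
    let mut coll = Vec::new();
    let mut next = root;
    let mut count = 0;
    loop {
        match next {
            Tree::Leaf(_) => {
                count += 1;
                match coll.pop() {
                    Some(t) => next = t,
                    None => return count,
                }
            }
            Tree::Node(l, r) => {
                coll.push(*r);
                next = *l;
            }
        }
    }
}

fn main() {
    let t = Tree::Node(
        Box::new(Tree::Leaf(1)),
        Box::new(Tree::Node(Box::new(Tree::Leaf(2)), Box::new(Tree::Leaf(3)))),
    );
    assert_eq!(count_leaves(t), 3);
}
```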
// runtime/base/mod.rs
pub mod debug;
pub mod memory;
pub mod precomp;
pub mod program;
pub mod reducer;
pub use debug::{*};
pub use memory::{*};
pub use precomp::{*};
pub use program::{*};
pub use reducer::{*};
// runtime/base/precomp.rs
use crate::runtime::{*};
use std::sync::atomic::{AtomicBool, Ordering};
// Precomps
// --------
pub struct PrecompFuns {
pub visit: VisitFun,
pub apply: ApplyFun,
}
pub struct Precomp {
pub id: u64,
pub name: &'static str,
pub smap: &'static [bool],
pub funs: Option<PrecompFuns>,
}
pub const STRING_NIL : u64 = 0;
pub const STRING_CONS : u64 = 1;
pub const BOTH : u64 = 2;
pub const KIND_TERM_CT0 : u64 = 3;
pub const KIND_TERM_CT1 : u64 = 4;
pub const KIND_TERM_CT2 : u64 = 5;
pub const KIND_TERM_CT3 : u64 = 6;
pub const KIND_TERM_CT4 : u64 = 7;
pub const KIND_TERM_CT5 : u64 = 8;
pub const KIND_TERM_CT6 : u64 = 9;
pub const KIND_TERM_CT7 : u64 = 10;
pub const KIND_TERM_CT8 : u64 = 11;
pub const KIND_TERM_CT9 : u64 = 12;
pub const KIND_TERM_CTA : u64 = 13;
pub const KIND_TERM_CTB : u64 = 14;
pub const KIND_TERM_CTC : u64 = 15;
pub const KIND_TERM_CTD : u64 = 16;
pub const KIND_TERM_CTE : u64 = 17;
pub const KIND_TERM_CTF : u64 = 18;
pub const KIND_TERM_CTG : u64 = 19;
pub const KIND_TERM_U60 : u64 = 20;
pub const KIND_TERM_F60 : u64 = 21;
pub const U60_IF : u64 = 22;
pub const U60_SWAP : u64 = 23;
pub const HVM_LOG : u64 = 24;
pub const HVM_QUERY : u64 = 25;
pub const HVM_PRINT : u64 = 26;
pub const HVM_SLEEP : u64 = 27;
pub const HVM_STORE : u64 = 28;
pub const HVM_LOAD : u64 = 29;
//[[CODEGEN:PRECOMP-IDS]]//
pub const PRECOMP : &[Precomp] = &[
Precomp {
id: STRING_NIL,
name: "String.nil",
smap: &[false; 0],
funs: None,
},
Precomp {
id: STRING_CONS,
name: "String.cons",
smap: &[false; 2],
funs: None,
},
Precomp {
id: BOTH,
name: "Both",
smap: &[false; 2],
funs: None,
},
Precomp {
id: KIND_TERM_CT0,
name: "Kind.Term.ct0",
smap: &[false; 2],
funs: None,
},
Precomp {
id: KIND_TERM_CT1,
name: "Kind.Term.ct1",
smap: &[false; 3],
funs: None,
},
Precomp {
id: KIND_TERM_CT2,
name: "Kind.Term.ct2",
smap: &[false; 4],
funs: None,
},
Precomp {
id: KIND_TERM_CT3,
name: "Kind.Term.ct3",
smap: &[false; 5],
funs: None,
},
Precomp {
id: KIND_TERM_CT4,
name: "Kind.Term.ct4",
smap: &[false; 6],
funs: None,
},
Precomp {
id: KIND_TERM_CT5,
name: "Kind.Term.ct5",
smap: &[false; 7],
funs: None,
},
Precomp {
id: KIND_TERM_CT6,
name: "Kind.Term.ct6",
smap: &[false; 8],
funs: None,
},
Precomp {
id: KIND_TERM_CT7,
name: "Kind.Term.ct7",
smap: &[false; 9],
funs: None,
},
Precomp {
id: KIND_TERM_CT8,
name: "Kind.Term.ct8",
smap: &[false; 10],
funs: None,
},
Precomp {
id: KIND_TERM_CT9,
name: "Kind.Term.ct9",
smap: &[false; 11],
funs: None,
},
Precomp {
id: KIND_TERM_CTA,
name: "Kind.Term.ctA",
smap: &[false; 12],
funs: None,
},
Precomp {
id: KIND_TERM_CTB,
name: "Kind.Term.ctB",
smap: &[false; 13],
funs: None,
},
Precomp {
id: KIND_TERM_CTC,
name: "Kind.Term.ctC",
smap: &[false; 14],
funs: None,
},
Precomp {
id: KIND_TERM_CTD,
name: "Kind.Term.ctD",
smap: &[false; 15],
funs: None,
},
Precomp {
id: KIND_TERM_CTE,
name: "Kind.Term.ctE",
smap: &[false; 16],
funs: None,
},
Precomp {
id: KIND_TERM_CTF,
name: "Kind.Term.ctF",
smap: &[false; 17],
funs: None,
},
Precomp {
id: KIND_TERM_CTG,
name: "Kind.Term.ctG",
smap: &[false; 18],
funs: None,
},
Precomp {
id: KIND_TERM_U60,
name: "Kind.Term.u60",
smap: &[false; 2],
funs: None,
},
Precomp {
id: KIND_TERM_F60,
name: "Kind.Term.f60",
smap: &[false; 2],
funs: None,
},
Precomp {
id: U60_IF,
name: "U60.if",
smap: &[true, false, false],
funs: Some(PrecompFuns {
visit: u60_if_visit,
apply: u60_if_apply,
}),
},
Precomp {
id: U60_SWAP,
name: "U60.swap",
smap: &[true, false, false],
funs: Some(PrecompFuns {
visit: u60_swap_visit,
apply: u60_swap_apply,
}),
},
Precomp {
id: HVM_LOG,
name: "HVM.log",
smap: &[false; 2],
funs: Some(PrecompFuns {
visit: hvm_log_visit,
apply: hvm_log_apply,
}),
},
Precomp {
id: HVM_QUERY,
name: "HVM.query",
smap: &[false; 1],
funs: Some(PrecompFuns {
visit: hvm_query_visit,
apply: hvm_query_apply,
}),
},
Precomp {
id: HVM_PRINT,
name: "HVM.print",
smap: &[false; 2],
funs: Some(PrecompFuns {
visit: hvm_print_visit,
apply: hvm_print_apply,
}),
},
Precomp {
id: HVM_SLEEP,
name: "HVM.sleep",
smap: &[false; 2],
funs: Some(PrecompFuns {
visit: hvm_sleep_visit,
apply: hvm_sleep_apply,
}),
},
Precomp {
id: HVM_STORE,
name: "HVM.store",
smap: &[false; 3],
funs: Some(PrecompFuns {
visit: hvm_store_visit,
apply: hvm_store_apply,
}),
},
Precomp {
id: HVM_LOAD,
name: "HVM.load",
smap: &[false; 2],
funs: Some(PrecompFuns {
visit: hvm_load_visit,
apply: hvm_load_apply,
}),
},
//[[CODEGEN:PRECOMP-ELS]]//
];
pub const PRECOMP_COUNT : u64 = PRECOMP.len() as u64;
// U60.if (cond: Term) (if_t: Term) (if_f: Term)
// ---------------------------------------------
#[inline(always)]
pub fn u60_if_visit(ctx: ReduceCtx) -> bool {
if is_whnf(load_arg(ctx.heap, ctx.term, 0)) {
return false;
} else {
let goup = ctx.redex.insert(ctx.tid, new_redex(*ctx.host, *ctx.cont, 1));
*ctx.cont = goup;
*ctx.host = get_loc(ctx.term, 0);
return true;
}
}
#[inline(always)]
pub fn u60_if_apply(ctx: ReduceCtx) -> bool {
let arg0 = load_arg(ctx.heap, ctx.term, 0);
let arg1 = load_arg(ctx.heap, ctx.term, 1);
let arg2 = load_arg(ctx.heap, ctx.term, 2);
if get_tag(arg0) == SUP {
fun::superpose(ctx.heap, &ctx.prog.aris, ctx.tid, *ctx.host, ctx.term, arg0, 0);
}
if get_tag(arg0) == U60 {
if get_num(arg0) == 0 {
inc_cost(ctx.heap, ctx.tid);
let done = arg2;
link(ctx.heap, *ctx.host, done);
collect(ctx.heap, &ctx.prog.aris, ctx.tid, arg1);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
return true;
} else {
inc_cost(ctx.heap, ctx.tid);
let done = arg1;
link(ctx.heap, *ctx.host, done);
collect(ctx.heap, &ctx.prog.aris, ctx.tid, arg2);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
return true;
}
}
return false;
}
// U60.swap (cond: Term) (pair: Term)
// ----------------------------------
#[inline(always)]
pub fn u60_swap_visit(ctx: ReduceCtx) -> bool {
if is_whnf(load_arg(ctx.heap, ctx.term, 0)) {
return false;
} else {
let goup = ctx.redex.insert(ctx.tid, new_redex(*ctx.host, *ctx.cont, 1));
*ctx.cont = goup;
*ctx.host = get_loc(ctx.term, 0);
return true;
}
}
#[inline(always)]
pub fn u60_swap_apply(ctx: ReduceCtx) -> bool {
let arg0 = load_arg(ctx.heap, ctx.term, 0);
let arg1 = load_arg(ctx.heap, ctx.term, 1);
let arg2 = load_arg(ctx.heap, ctx.term, 2);
if get_tag(arg0) == SUP {
fun::superpose(ctx.heap, &ctx.prog.aris, ctx.tid, *ctx.host, ctx.term, arg0, 0);
}
if get_tag(arg0) == U60 {
if get_num(arg0) == 0 {
inc_cost(ctx.heap, ctx.tid);
let ctr_0 = alloc(ctx.heap, ctx.tid, 2);
link(ctx.heap, ctr_0 + 0, arg1);
link(ctx.heap, ctr_0 + 1, arg2);
let done = Ctr(BOTH, ctr_0);
link(ctx.heap, *ctx.host, done);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
return true;
} else {
inc_cost(ctx.heap, ctx.tid);
let ctr_0 = alloc(ctx.heap, ctx.tid, 2);
link(ctx.heap, ctr_0 + 0, arg2);
link(ctx.heap, ctr_0 + 1, arg1);
let done = Ctr(BOTH, ctr_0);
link(ctx.heap, *ctx.host, done);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
return true;
}
}
return false;
}
// HVM.log (term: Term)
// --------------------
fn hvm_log_visit(ctx: ReduceCtx) -> bool {
return false;
}
fn hvm_log_apply(ctx: ReduceCtx) -> bool {
normalize(ctx.heap, ctx.prog, &[ctx.tid], get_loc(ctx.term, 0), false);
let code = crate::language::readback::as_code(ctx.heap, ctx.prog, get_loc(ctx.term, 0));
println!("{}", code);
link(ctx.heap, *ctx.host, load_arg(ctx.heap, ctx.term, 1));
collect(ctx.heap, &ctx.prog.aris, ctx.tid, load_ptr(ctx.heap, get_loc(ctx.term, 0)));
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 2);
return true;
}
// HVM.query (cont: String -> Term)
// --------------------------------
fn hvm_query_visit(ctx: ReduceCtx) -> bool {
return false;
}
fn hvm_query_apply(ctx: ReduceCtx) -> bool {
fn read_input() -> String {
use std::io::{stdin,stdout,Write};
let mut input = String::new();
stdin().read_line(&mut input).expect("string");
if let Some('\n') = input.chars().next_back() { input.pop(); }
if let Some('\r') = input.chars().next_back() { input.pop(); }
return input;
}
let cont = load_arg(ctx.heap, ctx.term, 0);
let text = make_string(ctx.heap, ctx.tid, &read_input());
let app0 = alloc(ctx.heap, ctx.tid, 2);
link(ctx.heap, app0 + 0, cont);
link(ctx.heap, app0 + 1, text);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 1);
let done = App(app0);
link(ctx.heap, *ctx.host, done);
return true;
}
// HVM.print (text: String) (cont: Term)
// -------------------------------------
fn hvm_print_visit(ctx: ReduceCtx) -> bool {
return false;
}
fn hvm_print_apply(ctx: ReduceCtx) -> bool {
//normalize(ctx.heap, ctx.prog, &[ctx.tid], get_loc(ctx.term, 0), false);
if let Some(text) = crate::language::readback::as_string(ctx.heap, ctx.prog, &[ctx.tid], get_loc(ctx.term, 0)) {
println!("{}", text);
}
link(ctx.heap, *ctx.host, load_arg(ctx.heap, ctx.term, 1));
collect(ctx.heap, &ctx.prog.aris, ctx.tid, load_ptr(ctx.heap, get_loc(ctx.term, 0)));
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 2);
return true;
}
// HVM.sleep (time: U60) (cont: Term)
// ----------------------------------
fn hvm_sleep_visit(ctx: ReduceCtx) -> bool {
return false;
}
fn hvm_sleep_apply(ctx: ReduceCtx) -> bool {
let time = reduce(ctx.heap, ctx.prog, &[ctx.tid], get_loc(ctx.term, 0), true, false);
std::thread::sleep(std::time::Duration::from_nanos(get_num(time)));
link(ctx.heap, *ctx.host, load_ptr(ctx.heap, get_loc(ctx.term, 1)));
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 2);
return true;
}
// HVM.store (key: String) (val: String) (cont: Term)
// --------------------------------------------------
fn hvm_store_visit(ctx: ReduceCtx) -> bool {
return false;
}
fn hvm_store_apply(ctx: ReduceCtx) -> bool {
if let Some(key) = crate::language::readback::as_string(ctx.heap, ctx.prog, &[ctx.tid], get_loc(ctx.term, 0)) {
if let Some(val) = crate::language::readback::as_string(ctx.heap, ctx.prog, &[ctx.tid], get_loc(ctx.term, 1)) {
if std::fs::write(key, val).is_ok() {
//let app0 = alloc(ctx.heap, ctx.tid, 2);
//link(ctx.heap, app0 + 0, cont);
//link(ctx.heap, app0 + 1, U6O(0));
//free(ctx.heap, 0, get_loc(ctx.term, 0), 2);
let done = load_arg(ctx.heap, ctx.term, 2);
link(ctx.heap, *ctx.host, done);
collect(ctx.heap, &ctx.prog.aris, ctx.tid, load_arg(ctx.heap, ctx.term, 0));
collect(ctx.heap, &ctx.prog.aris, ctx.tid, load_arg(ctx.heap, ctx.term, 1));
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
return true;
}
}
}
println!("Runtime failure on: {}", show_at(ctx.heap, ctx.prog, *ctx.host, &[]));
std::process::exit(0);
}
// HVM.load (key: String) (cont: String -> Term)
// ---------------------------------------------
fn hvm_load_visit(ctx: ReduceCtx) -> bool {
return false;
}
fn hvm_load_apply(ctx: ReduceCtx) -> bool {
if let Some(key) = crate::language::readback::as_string(ctx.heap, ctx.prog, &[ctx.tid], get_loc(ctx.term, 0)) {
if let Ok(file) = std::fs::read(key) {
if let Ok(file) = std::str::from_utf8(&file) {
let cont = load_arg(ctx.heap, ctx.term, 1);
let text = make_string(ctx.heap, ctx.tid, file);
let app0 = alloc(ctx.heap, ctx.tid, 2);
link(ctx.heap, app0 + 0, cont);
link(ctx.heap, app0 + 1, text);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 2);
let done = App(app0);
link(ctx.heap, *ctx.host, done);
return true;
}
}
}
println!("Runtime failure on: {}", show_at(ctx.heap, ctx.prog, *ctx.host, &[]));
std::process::exit(0);
}
//[[CODEGEN:PRECOMP-FNS]]//
// runtime/base/program.rs
use crate::runtime::{*};
use crate::language;
use std::collections::{hash_map, HashMap};
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
// A runtime term
#[derive(Clone, Debug)]
pub enum Core {
Var { bidx: u64 },
Glo { glob: u64, misc: u64 },
Dup { eras: (bool, bool), glob: u64, expr: Box<Core>, body: Box<Core> },
Sup { val0: Box<Core>, val1: Box<Core> },
Let { expr: Box<Core>, body: Box<Core> },
Lam { eras: bool, glob: u64, body: Box<Core> },
App { func: Box<Core>, argm: Box<Core> },
Fun { func: u64, args: Vec<Core> },
Ctr { func: u64, args: Vec<Core> },
U6O { numb: u64 },
F6O { numb: u64 },
Op2 { oper: u64, val0: Box<Core>, val1: Box<Core> },
}
// A runtime rule
#[derive(Clone, Debug)]
pub struct Rule {
pub hoas: bool,
pub cond: Vec<Ptr>,
pub vars: Vec<RuleVar>,
pub core: Core,
pub body: RuleBody,
pub free: Vec<(u64, u64)>,
}
// A rule left-hand side variable
#[derive(Clone, Debug)]
pub struct RuleVar {
pub param: u64,
pub field: Option<u64>,
pub erase: bool,
}
// The rule right-hand side body (TODO: can the RuleBodyNode Vec be unboxed?)
pub type RuleBody = (RuleBodyCell, Vec<RuleBodyNode>, u64);
// A body node
pub type RuleBodyNode = Vec<RuleBodyCell>;
// A body cell
#[derive(Copy, Clone, Debug)]
pub enum RuleBodyCell {
Val { value: u64 }, // Fixed value, doesn't require adjustment
Var { index: u64 }, // Link to an external variable
Ptr { value: u64, targ: u64, slot: u64 }, // Local link, requires adjustment
}
pub type VisitFun = fn(ReduceCtx) -> bool;
pub type ApplyFun = fn(ReduceCtx) -> bool;
pub struct VisitObj {
pub strict_map: Vec<bool>,
pub strict_idx: Vec<u64>,
}
pub struct ApplyObj {
pub rules: Vec<Rule>,
}
pub enum Function {
Interpreted {
smap: Box<[bool]>,
visit: VisitObj,
apply: ApplyObj,
},
Compiled {
smap: Box<[bool]>,
visit: VisitFun,
apply: ApplyFun,
}
}
pub type Funs = U64Map<Function>;
pub type Aris = U64Map<u64>;
pub type Nams = U64Map<String>;
pub struct Program {
pub funs: Funs,
pub aris: Aris,
pub nams: Nams,
}
impl Program {
pub fn new() -> Program {
let mut funs = U64Map::new();
let mut aris = U64Map::new();
let mut nams = U64Map::new();
// Adds the built-in functions
for fid in 0 .. crate::runtime::precomp::PRECOMP_COUNT as usize {
if let Some(precomp) = PRECOMP.get(fid) {
if let Some(fs) = &precomp.funs {
funs.insert(fid as u64, Function::Compiled {
smap: precomp.smap.to_vec().into_boxed_slice(),
visit: fs.visit,
apply: fs.apply,
});
}
nams.insert(fid as u64, precomp.name.to_string());
aris.insert(fid as u64, precomp.smap.len() as u64);
}
}
return Program { funs, aris, nams };
}
pub fn add_book(&mut self, book: &language::rulebook::RuleBook) {
let funs : &mut Funs = &mut gen_functions(&book);
let nams : &mut Nams = &mut gen_names(&book);
let aris : &mut Aris = &mut U64Map::new();
for (fid, fun) in funs.data.drain(0..).enumerate() {
if let Some(fun) = fun {
self.funs.insert(fid as u64, fun);
}
}
for (fid, nam) in nams.data.iter().enumerate() {
if let Some(nam) = nam {
self.nams.insert(fid as u64, nam.clone());
}
}
for (fid, smp) in &book.id_to_smap {
self.aris.insert(*fid as u64, smp.len() as u64);
}
}
pub fn add_function(&mut self, name: String, function: Function) {
self.nams.push(name);
self.funs.push(function);
}
}
pub fn get_var(heap: &Heap, term: Ptr, var: &RuleVar) -> Ptr {
let RuleVar { param, field, erase: _ } = var;
match field {
Some(i) => take_arg(heap, load_arg(heap, term, *param), *i),
None => take_arg(heap, term, *param),
}
}
pub fn alloc_body(heap: &Heap, prog: &Program, tid: usize, term: Ptr, vars: &[RuleVar], body: &RuleBody) -> Ptr {
//#[inline(always)]
fn cell_to_ptr(heap: &Heap, lvar: &LocalVars, aloc: &[AtomicU64], term: Ptr, vars: &[RuleVar], cell: &RuleBodyCell) -> Ptr {
unsafe {
match cell {
RuleBodyCell::Val { value } => {
*value
},
RuleBodyCell::Var { index } => {
get_var(heap, term, vars.get_unchecked(*index as usize))
},
RuleBodyCell::Ptr { value, targ, slot } => {
let mut val = value + *aloc.get_unchecked(*targ as usize).as_ptr() + slot;
// should be changed if the pointer format changes
if get_tag(*value) <= DP1 {
val += (*lvar.dups.as_ptr() & 0xFFF_FFFF) * EXT;
}
val
}
}
}
}
// FIXME: verify the use of get_unchecked
unsafe {
let (cell, nodes, dupk) = body;
let aloc = &heap.aloc[tid];
let lvar = &heap.lvar[tid];
for i in 0 .. nodes.len() {
*aloc.get_unchecked(i).as_ptr() = alloc(heap, tid, (*nodes.get_unchecked(i)).len() as u64);
};
if *lvar.dups.as_ptr() + dupk >= (1 << 28) {
*lvar.dups.as_ptr() = 0;
}
for i in 0 .. nodes.len() {
let host = *aloc.get_unchecked(i).as_ptr() as usize;
for j in 0 .. (*nodes.get_unchecked(i)).len() {
let cell = (*nodes.get_unchecked(i)).get_unchecked(j);
let ptr = cell_to_ptr(heap, lvar, aloc, term, vars, cell);
if let RuleBodyCell::Var { .. } = cell {
link(heap, (host + j) as u64, ptr);
} else {
*heap.node.get_unchecked(host + j).as_ptr() = ptr;
}
}
}
let done = cell_to_ptr(heap, lvar, aloc, term, vars, cell);
*lvar.dups.as_ptr() += dupk;
//println!("result: {}\n{}\n", show_ptr(done), show_term(heap, prog, done, 0));
return done;
}
}
pub fn get_global_name_misc(name: &str) -> Option<u64> {
if name.starts_with('$') {
if name.starts_with("$0") {
return Some(DP0);
} else if name.starts_with("$1") {
return Some(DP1);
} else {
return Some(VAR);
}
}
}
return None;
}
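The naming convention above can be illustrated with a self-contained toy classifier. This is a sketch only: `GlobalKind` and `classify` are hypothetical names standing in for the real `DP0`/`DP1`/`VAR` tag constants, which live in the runtime.

```rust
// Toy mirror of `get_global_name_misc`'s convention: "$0x" and "$1x" name
// the two sides of a global duplication, any other "$"-prefixed name is a
// plain global variable, and everything else is not global.
#[derive(Debug, PartialEq)]
enum GlobalKind { Var, Dp0, Dp1 }

fn classify(name: &str) -> Option<GlobalKind> {
    if name.starts_with("$0") {
        Some(GlobalKind::Dp0)
    } else if name.starts_with("$1") {
        Some(GlobalKind::Dp1)
    } else if name.starts_with('$') {
        Some(GlobalKind::Var)
    } else {
        None
    }
}

fn main() {
    assert_eq!(classify("$0x"), Some(GlobalKind::Dp0));
    assert_eq!(classify("$1x"), Some(GlobalKind::Dp1));
    assert_eq!(classify("$x"), Some(GlobalKind::Var));
    assert_eq!(classify("x"), None);
}
```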
// todo: "dups" still needs to be moved out on `alloc_body` etc.
pub fn build_function(book: &language::rulebook::RuleBook, fn_name: &str, rules: &[language::syntax::Rule]) -> Function {
let hoas = fn_name.starts_with("F$");
let dynrules = rules.iter().filter_map(|rule| {
if let language::syntax::Term::Ctr { ref name, ref args } = *rule.lhs {
let mut cond = Vec::new();
let mut vars = Vec::new();
let mut inps = Vec::new();
let mut free = Vec::new();
for (i, arg) in args.iter().enumerate() {
match &**arg {
language::syntax::Term::Ctr { name, args } => {
cond.push(Ctr(*book.name_to_id.get(&*name).unwrap_or(&0), 0));
free.push((i as u64, args.len() as u64));
for (j, arg) in args.iter().enumerate() {
if let language::syntax::Term::Var { ref name } = **arg {
vars.push(RuleVar { param: i as u64, field: Some(j as u64), erase: name == "*" });
inps.push(name.clone());
} else {
panic!("sorry, left-hand sides can't have nested constructors yet.");
}
}
}
language::syntax::Term::U6O { numb } => {
cond.push(U6O(*numb as u64));
}
language::syntax::Term::F6O { numb } => {
cond.push(F6O(*numb as u64));
}
language::syntax::Term::Var { name } => {
cond.push(Var(0));
vars.push(RuleVar { param: i as u64, field: None, erase: name == "*" });
inps.push(name.clone());
}
_ => {
panic!("invalid left-hand side.");
}
}
}
let core = term_to_core(book, &rule.rhs, &inps);
let body = build_body(&core, vars.len() as u64);
Some(Rule { hoas, cond, vars, core, body, free })
} else {
None
}
}).collect();
let fnid = book.name_to_id.get(fn_name).unwrap();
let smap = book.id_to_smap.get(fnid).unwrap().clone().into_boxed_slice();
let strict_map = smap.to_vec();
let mut strict_idx = Vec::new();
for (i, is_strict) in smap.iter().enumerate() {
if *is_strict {
strict_idx.push(i as u64);
}
}
Function::Interpreted {
smap,
visit: VisitObj { strict_map, strict_idx },
apply: ApplyObj { rules: dynrules },
}
}
pub fn hash<T: std::hash::Hash>(t: &T) -> u64 {
use std::hash::Hasher;
let mut s = std::collections::hash_map::DefaultHasher::new();
t.hash(&mut s);
s.finish()
}
pub fn gen_functions(book: &language::rulebook::RuleBook) -> U64Map<Function> {
let mut funs: U64Map<Function> = U64Map::new();
for (name, rules_info) in &book.rule_group {
let fnid = book.name_to_id.get(name).unwrap_or(&0);
let func = build_function(book, &name, &rules_info.1);
funs.insert(*fnid, func);
}
funs
}
pub fn gen_names(book: &language::rulebook::RuleBook) -> U64Map<String> {
return U64Map::from_hashmap(&mut book.id_to_name.clone());
}
/// Converts a language term to a runtime term.
pub fn term_to_core(book: &language::rulebook::RuleBook, term: &language::syntax::Term, inps: &[String]) -> Core {
fn convert_oper(oper: &language::syntax::Oper) -> u64 {
match oper {
language::syntax::Oper::Add => ADD,
language::syntax::Oper::Sub => SUB,
language::syntax::Oper::Mul => MUL,
language::syntax::Oper::Div => DIV,
language::syntax::Oper::Mod => MOD,
language::syntax::Oper::And => AND,
language::syntax::Oper::Or => OR,
language::syntax::Oper::Xor => XOR,
language::syntax::Oper::Shl => SHL,
language::syntax::Oper::Shr => SHR,
language::syntax::Oper::Ltn => LTN,
language::syntax::Oper::Lte => LTE,
language::syntax::Oper::Eql => EQL,
language::syntax::Oper::Gte => GTE,
language::syntax::Oper::Gtn => GTN,
language::syntax::Oper::Neq => NEQ,
}
}
#[allow(clippy::identity_op)]
fn convert_term(
term: &language::syntax::Term,
book: &language::rulebook::RuleBook,
depth: u64,
vars: &mut Vec<String>,
) -> Core {
match term {
language::syntax::Term::Var { name } => {
if let Some((idx, _)) = vars.iter().enumerate().rev().find(|(_, var)| var == &name) {
Core::Var { bidx: idx as u64 }
} else {
match get_global_name_misc(name) {
Some(VAR) => Core::Glo { glob: hash(name), misc: VAR },
Some(DP0) => Core::Glo { glob: hash(&name[2..].to_string()), misc: DP0 },
Some(DP1) => Core::Glo { glob: hash(&name[2..].to_string()), misc: DP1 },
_ => panic!("Unexpected error."),
}
}
}
language::syntax::Term::Dup { nam0, nam1, expr, body } => {
let eras = (nam0 == "*", nam1 == "*");
let glob = if get_global_name_misc(nam0).is_some() { hash(&nam0[2..].to_string()) } else { 0 };
let expr = Box::new(convert_term(expr, book, depth + 0, vars));
vars.push(nam0.clone());
vars.push(nam1.clone());
let body = Box::new(convert_term(body, book, depth + 2, vars));
vars.pop();
vars.pop();
Core::Dup { eras, glob, expr, body }
}
language::syntax::Term::Sup { val0, val1 } => {
let val0 = Box::new(convert_term(val0, book, depth + 0, vars));
let val1 = Box::new(convert_term(val1, book, depth + 0, vars));
Core::Sup { val0, val1 }
}
language::syntax::Term::Lam { name, body } => {
let glob = if get_global_name_misc(name).is_some() { hash(name) } else { 0 };
let eras = name == "*";
vars.push(name.clone());
let body = Box::new(convert_term(body, book, depth + 1, vars));
vars.pop();
Core::Lam { eras, glob, body }
}
language::syntax::Term::Let { name, expr, body } => {
let expr = Box::new(convert_term(expr, book, depth + 0, vars));
vars.push(name.clone());
let body = Box::new(convert_term(body, book, depth + 1, vars));
vars.pop();
Core::Let { expr, body }
}
language::syntax::Term::App { func, argm } => {
let func = Box::new(convert_term(func, book, depth + 0, vars));
let argm = Box::new(convert_term(argm, book, depth + 0, vars));
Core::App { func, argm }
}
language::syntax::Term::Ctr { name, args } => {
let term_func = *book.name_to_id.get(name).unwrap_or_else(|| panic!("unbound symbol: {}", name));
let term_args = args.iter().map(|arg| convert_term(arg, book, depth + 0, vars)).collect();
if *book.ctr_is_fun.get(name).unwrap_or(&false) {
Core::Fun { func: term_func, args: term_args }
} else {
Core::Ctr { func: term_func, args: term_args }
}
}
language::syntax::Term::U6O { numb } => Core::U6O { numb: *numb },
language::syntax::Term::F6O { numb } => Core::F6O { numb: *numb },
language::syntax::Term::Op2 { oper, val0, val1 } => {
let oper = convert_oper(oper);
let val0 = Box::new(convert_term(val0, book, depth + 0, vars));
let val1 = Box::new(convert_term(val1, book, depth + 1, vars));
Core::Op2 { oper, val0, val1 }
}
}
}
let mut vars = inps.to_vec();
convert_term(term, book, 0, &mut vars)
}
pub fn build_body(term: &Core, free_vars: u64) -> RuleBody {
fn link(nodes: &mut [RuleBodyNode], targ: u64, slot: u64, elem: RuleBodyCell) {
nodes[targ as usize][slot as usize] = elem;
if let RuleBodyCell::Ptr { value, targ: var_targ, slot: var_slot } = elem {
let tag = get_tag(value);
if tag <= VAR {
nodes[var_targ as usize][(var_slot + (tag & 0x01)) as usize] = RuleBodyCell::Ptr { value: Arg(0), targ, slot };
}
}
}
fn alloc_lam(lams: &mut std::collections::HashMap<u64, u64>, nodes: &mut Vec<RuleBodyNode>, glob: u64) -> u64 {
if let Some(targ) = lams.get(&glob) {
*targ
} else {
let targ = nodes.len() as u64;
nodes.push(vec![RuleBodyCell::Val { value: 0 }; 2]);
link(nodes, targ, 0, RuleBodyCell::Val { value: Era() });
if glob != 0 {
lams.insert(glob, targ);
}
return targ;
}
}
fn alloc_dup(dups: &mut HashMap<u64, (u64,u64)>, nodes: &mut Vec<RuleBodyNode>, links: &mut Vec<(u64, u64, RuleBodyCell)>, dupk: &mut u64, glob: u64) -> (u64, u64) {
if let Some(got) = dups.get(&glob) {
return got.clone();
} else {
let dupc = *dupk;
let targ = nodes.len() as u64;
*dupk += 1;
nodes.push(vec![RuleBodyCell::Val { value: 0 }; 3]);
links.push((targ, 0, RuleBodyCell::Val { value: Era() }));
links.push((targ, 1, RuleBodyCell::Val { value: Era() }));
if glob != 0 {
dups.insert(glob, (targ, dupc));
}
return (targ, dupc);
}
}
fn gen_elems(
term: &Core,
dupk: &mut u64,
vars: &mut Vec<RuleBodyCell>,
lams: &mut HashMap<u64, u64>,
dups: &mut HashMap<u64, (u64,u64)>,
nodes: &mut Vec<RuleBodyNode>,
links: &mut Vec<(u64, u64, RuleBodyCell)>,
) -> RuleBodyCell {
match term {
Core::Var { bidx } => {
if *bidx < vars.len() as u64 {
vars[*bidx as usize]
} else {
panic!("unbound variable.");
}
}
Core::Glo { glob, misc } => {
match *misc {
VAR => {
let targ = alloc_lam(lams, nodes, *glob);
return RuleBodyCell::Ptr { value: Var(0), targ, slot: 0 };
}
DP0 => {
let (targ, dupc) = alloc_dup(dups, nodes, links, dupk, *glob);
return RuleBodyCell::Ptr { value: Dp0(dupc, 0), targ, slot: 0 };
}
DP1 => {
let (targ, dupc) = alloc_dup(dups, nodes, links, dupk, *glob);
return RuleBodyCell::Ptr { value: Dp1(dupc, 0), targ, slot: 0 };
}
_ => {
panic!("Unexpected error.");
}
}
}
Core::Dup { eras: _, glob, expr, body } => {
let (targ, dupc) = alloc_dup(dups, nodes, links, dupk, *glob);
let expr = gen_elems(expr, dupk, vars, lams, dups, nodes, links);
links.push((targ, 2, expr));
//let dupc = 0; // FIXME remove
vars.push(RuleBodyCell::Ptr { value: Dp0(dupc, 0), targ, slot: 0 });
vars.push(RuleBodyCell::Ptr { value: Dp1(dupc, 0), targ, slot: 0 });
let body = gen_elems(body, dupk, vars, lams, dups, nodes, links);
vars.pop();
vars.pop();
body
}
Core::Sup { val0, val1 } => {
let dupc = *dupk;
let targ = nodes.len() as u64;
*dupk += 1;
nodes.push(vec![RuleBodyCell::Val { value: 0 }; 2]);
let val0 = gen_elems(val0, dupk, vars, lams, dups, nodes, links);
links.push((targ, 0, val0));
let val1 = gen_elems(val1, dupk, vars, lams, dups, nodes, links);
links.push((targ, 1, val1));
//let dupc = 0; // FIXME remove
RuleBodyCell::Ptr { value: Sup(dupc, 0), targ, slot: 0 }
}
Core::Let { expr, body } => {
let expr = gen_elems(expr, dupk, vars, lams, dups, nodes, links);
vars.push(expr);
let body = gen_elems(body, dupk, vars, lams, dups, nodes, links);
vars.pop();
body
}
Core::Lam { eras: _, glob, body } => {
let targ = alloc_lam(lams, nodes, *glob);
let var = RuleBodyCell::Ptr { value: Var(0), targ, slot: 0 };
vars.push(var);
let body = gen_elems(body, dupk, vars, lams, dups, nodes, links);
links.push((targ, 1, body));
vars.pop();
RuleBodyCell::Ptr { value: Lam(0), targ, slot: 0 }
}
Core::App { func, argm } => {
let targ = nodes.len() as u64;
nodes.push(vec![RuleBodyCell::Val { value: 0 }; 2]);
let func = gen_elems(func, dupk, vars, lams, dups, nodes, links);
links.push((targ, 0, func));
let argm = gen_elems(argm, dupk, vars, lams, dups, nodes, links);
links.push((targ, 1, argm));
RuleBodyCell::Ptr { value: App(0), targ, slot: 0 }
}
Core::Fun { func, args } => {
if !args.is_empty() {
let targ = nodes.len() as u64;
nodes.push(vec![RuleBodyCell::Val { value: 0 }; args.len() as usize]);
for (i, arg) in args.iter().enumerate() {
let arg = gen_elems(arg, dupk, vars, lams, dups, nodes, links);
links.push((targ, i as u64, arg));
}
RuleBodyCell::Ptr { value: Fun(*func, 0), targ, slot: 0 }
} else {
RuleBodyCell::Val { value: Fun(*func, 0) }
}
}
Core::Ctr { func, args } => {
if !args.is_empty() {
let targ = nodes.len() as u64;
nodes.push(vec![RuleBodyCell::Val { value: 0 }; args.len() as usize]);
for (i, arg) in args.iter().enumerate() {
let arg = gen_elems(arg, dupk, vars, lams, dups, nodes, links);
links.push((targ, i as u64, arg));
}
RuleBodyCell::Ptr { value: Ctr(*func, 0), targ, slot: 0 }
} else {
RuleBodyCell::Val { value: Ctr(*func, 0) }
}
}
Core::U6O { numb } => RuleBodyCell::Val { value: U6O(*numb as u64) },
Core::F6O { numb } => RuleBodyCell::Val { value: F6O(*numb as u64) },
Core::Op2 { oper, val0, val1 } => {
let targ = nodes.len() as u64;
nodes.push(vec![RuleBodyCell::Val { value: 0 }; 2]);
let val0 = gen_elems(val0, dupk, vars, lams, dups, nodes, links);
links.push((targ, 0, val0));
let val1 = gen_elems(val1, dupk, vars, lams, dups, nodes, links);
links.push((targ, 1, val1));
RuleBodyCell::Ptr { value: Op2(*oper, 0), targ, slot: 0 }
}
}
}
let mut links: Vec<(u64, u64, RuleBodyCell)> = Vec::new();
let mut nodes: Vec<RuleBodyNode> = Vec::new();
let mut lams: HashMap<u64, u64> = HashMap::new();
let mut dups: HashMap<u64, (u64,u64)> = HashMap::new();
let mut vars: Vec<RuleBodyCell> = (0..free_vars).map(|i| RuleBodyCell::Var { index: i }).collect();
let mut dupk: u64 = 0;
let elem = gen_elems(term, &mut dupk, &mut vars, &mut lams, &mut dups, &mut nodes, &mut links);
for (targ, slot, elem) in links {
link(&mut nodes, targ, slot, elem);
}
(elem, nodes, dupk)
}
pub fn alloc_closed_core(heap: &Heap, prog: &Program, tid: usize, term: &Core) -> u64 {
let host = alloc(heap, tid, 1);
let body = build_body(term, 0);
let term = alloc_body(heap, prog, tid, 0, &[], &body);
link(heap, host, term);
host
}
pub fn alloc_term(heap: &Heap, prog: &Program, tid: usize, book: &language::rulebook::RuleBook, term: &language::syntax::Term) -> u64 {
alloc_closed_core(heap, prog, tid, &term_to_core(book, term, &vec![]))
}
pub fn make_string(heap: &Heap, tid: usize, text: &str) -> Ptr {
let mut term = Ctr(STRING_NIL, 0);
for chr in text.chars().rev() {
let ctr0 = alloc(heap, tid, 2);
link(heap, ctr0 + 0, U6O(chr as u64));
link(heap, ctr0 + 1, term);
term = Ctr(STRING_CONS, ctr0);
}
return term;
}
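The string encoding built by `make_string` can be sketched in plain Rust, away from the heap. This is illustrative only: `Str` and `make_str` are hypothetical names mirroring the `String.cons`/`String.nil` cons-list encoding, with each cell holding a character code and a tail.

```rust
// Plain-Rust analogue of `make_string`: characters are folded in reverse so
// the resulting cons-list reads left-to-right.
#[derive(Debug, PartialEq)]
enum Str {
    Nil,
    Cons(u64, Box<Str>),
}

fn make_str(text: &str) -> Str {
    let mut term = Str::Nil;
    for chr in text.chars().rev() {
        term = Str::Cons(chr as u64, Box::new(term));
    }
    term
}

fn main() {
    // "ab" becomes Cons('a', Cons('b', Nil))
    assert_eq!(
        make_str("ab"),
        Str::Cons(97, Box::new(Str::Cons(98, Box::new(Str::Nil))))
    );
}
```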
// runtime/base/reducer.rs
pub use crate::runtime::{*};
use crossbeam::utils::{Backoff};
use std::collections::HashSet;
use std::sync::atomic::{AtomicBool, AtomicUsize, AtomicU64, Ordering};
pub struct ReduceCtx<'a> {
pub heap : &'a Heap,
pub prog : &'a Program,
pub tid : usize,
pub hold : bool,
pub term : Ptr,
pub visit : &'a VisitQueue,
pub redex : &'a RedexBag,
pub cont : &'a mut u64,
pub host : &'a mut u64,
}
// HVM's reducer is a finite stack machine with 4 possible states:
// - visit: visits a node and adds its children to the visit stack ~> visit, apply, blink
// - apply: reduces a node, applying a rewrite rule ~> visit, apply, blink, halt
// - blink: pops the visit stack and enters visit mode ~> visit, blink, steal
// - steal: attempt to steal work from the global pool ~> visit, steal, halt
// Since Rust doesn't have `goto`, the loop structure below is used.
// It allows performing any allowed state transition with a jump.
// main {
// work {
// visit { ... }
// apply { ... }
// complete
// }
// blink { ... }
// steal { ... }
// }
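The labeled-loop "goto" pattern described above can be shown with a minimal, self-contained sketch. The states and the `run` function here are toy stand-ins, not the real reducer: one inner loop plays the visit/apply role, and falling out of it reaches a "steal" step that halts.

```rust
// Minimal sketch of labeled loops emulating state transitions, since Rust
// has no `goto`. `continue 'work` jumps back to the top of the inner loop;
// `break 'work` falls through to the outer "steal" step.
fn run(tasks: u32) -> u32 {
    let mut stack: Vec<u32> = (0..tasks).collect();
    let mut done = 0;
    'main: loop {
        'work: loop {
            // "visit"/"apply": take one task and complete it
            match stack.pop() {
                Some(_) => {
                    done += 1;
                    continue 'work; // like `continue 'visit`
                }
                None => break 'work, // nothing left; fall through to "steal"
            }
        }
        // "steal": no other workers in this toy, so halt
        break 'main;
    }
    done
}

fn main() {
    assert_eq!(run(4), 4);
}
```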
pub fn is_whnf(term: Ptr) -> bool {
match get_tag(term) {
ERA => true,
LAM => true,
SUP => true,
CTR => true,
U60 => true,
F60 => true,
_ => false,
}
}
pub fn reduce(heap: &Heap, prog: &Program, tids: &[usize], root: u64, full: bool, debug: bool) -> Ptr {
// Halting flag
let stop = &AtomicUsize::new(1);
let barr = &Barrier::new(tids.len());
let locs = &tids.iter().map(|_| AtomicU64::new(u64::MAX)).collect::<Vec<AtomicU64>>();
// Spawn a thread for each worker
std::thread::scope(|s| {
for tid in tids {
s.spawn(move || {
reducer(heap, prog, tids, stop, barr, locs, root, *tid, full, debug);
//println!("[{}] done", tid);
});
}
});
// Return whnf term ptr
return load_ptr(heap, root);
}
pub fn reducer(
heap: &Heap,
prog: &Program,
tids: &[usize],
stop: &AtomicUsize,
barr: &Barrier,
locs: &[AtomicU64],
root: u64,
tid: usize,
full: bool,
debug: bool,
) {
// State Stacks
let redex = &heap.rbag;
let visit = &heap.vstk[tid];
let bkoff = &Backoff::new();
let hold = tids.len() <= 1;
let seen = &mut HashSet::new();
// State Vars
let (mut cont, mut host) = if tid == tids[0] {
(REDEX_CONT_RET, root)
} else {
(0, u64::MAX)
};
// Debug Printer
let print = |tid: usize, host: u64| {
barr.wait(stop);
locs[tid].store(host, Ordering::SeqCst);
barr.wait(stop);
if tid == tids[0] {
println!("{}\n----------------", show_at(heap, prog, root, locs));
}
barr.wait(stop);
};
// State Machine
'main: loop {
'init: {
if host == u64::MAX {
break 'init;
}
'work: loop {
'visit: loop {
let term = load_ptr(heap, host);
if debug {
print(tid, host);
}
match get_tag(term) {
APP => {
if app::visit(ReduceCtx { heap, prog, tid, hold, term, visit, redex, cont: &mut cont, host: &mut host }) {
continue 'visit;
} else {
break 'work;
}
}
DP0 | DP1 => {
match acquire_lock(heap, tid, term) {
Err(locker_tid) => {
continue 'work;
}
Ok(_) => {
// If the term changed, release lock and try again
if term != load_ptr(heap, host) {
release_lock(heap, tid, term);
continue 'visit;
} else {
if dup::visit(ReduceCtx { heap, prog, tid, hold, term, visit, redex, cont: &mut cont, host: &mut host }) {
continue 'visit;
} else {
break 'work;
}
}
}
}
}
OP2 => {
if op2::visit(ReduceCtx { heap, prog, tid, hold, term, visit, redex, cont: &mut cont, host: &mut host }) {
continue 'visit;
} else {
break 'work;
}
}
FUN | CTR => {
let fid = get_ext(term);
//[[CODEGEN:FAST-VISIT]]//
match &prog.funs.get(&fid) {
Some(Function::Interpreted { smap: fn_smap, visit: fn_visit, apply: fn_apply }) => {
if fun::visit(ReduceCtx { heap, prog, tid, hold, term, visit, redex, cont: &mut cont, host: &mut host }, &fn_visit.strict_idx) {
continue 'visit;
} else {
break 'visit;
}
}
Some(Function::Compiled { smap: fn_smap, visit: fn_visit, apply: fn_apply }) => {
if fn_visit(ReduceCtx { heap, prog, tid, hold, term, visit, redex, cont: &mut cont, host: &mut host }) {
continue 'visit;
} else {
break 'visit;
}
}
None => {
break 'visit;
}
}
}
_ => {
break 'visit;
}
}
}
'call: loop {
'apply: loop {
let term = load_ptr(heap, host);
if debug {
print(tid, host);
}
// Apply rewrite rules
match get_tag(term) {
APP => {
if app::apply(ReduceCtx { heap, prog, tid, hold, term, visit, redex, cont: &mut cont, host: &mut host }) {
continue 'work;
} else {
break 'apply;
}
}
DP0 | DP1 => {
if dup::apply(ReduceCtx { heap, prog, tid, hold, term, visit, redex, cont: &mut cont, host: &mut host }) {
release_lock(heap, tid, term);
continue 'work;
} else {
release_lock(heap, tid, term);
break 'apply;
}
}
OP2 => {
if op2::apply(ReduceCtx { heap, prog, tid, hold, term, visit, redex, cont: &mut cont, host: &mut host }) {
continue 'work;
} else {
break 'apply;
}
}
FUN | CTR => {
let fid = get_ext(term);
//[[CODEGEN:FAST-APPLY]]//
match &prog.funs.get(&fid) {
Some(Function::Interpreted { smap: fn_smap, visit: fn_visit, apply: fn_apply }) => {
if fun::apply(ReduceCtx { heap, prog, tid, hold, term, visit, redex, cont: &mut cont, host: &mut host }, fid, fn_visit, fn_apply) {
continue 'work;
} else {
break 'apply;
}
}
Some(Function::Compiled { smap: fn_smap, visit: fn_visit, apply: fn_apply }) => {
if fn_apply(ReduceCtx { heap, prog, tid, hold, term, visit, redex, cont: &mut cont, host: &mut host }) {
continue 'work;
} else {
break 'apply;
}
}
None => {
break 'apply;
}
}
}
_ => {
break 'apply;
}
}
}
// If root is on WHNF, halt
if cont == REDEX_CONT_RET {
//println!("done {}", show_at(heap, prog, host, &[]));
stop.fetch_sub(1, Ordering::Relaxed);
if full && !seen.contains(&host) {
seen.insert(host);
let term = load_ptr(heap, host);
match get_tag(term) {
LAM => {
stop.fetch_add(1, Ordering::Relaxed);
visit.push(new_visit(get_loc(term, 1), hold, cont));
}
APP => {
stop.fetch_add(2, Ordering::Relaxed);
visit.push(new_visit(get_loc(term, 0), hold, cont));
visit.push(new_visit(get_loc(term, 1), hold, cont));
}
SUP => {
stop.fetch_add(2, Ordering::Relaxed);
visit.push(new_visit(get_loc(term, 0), hold, cont));
visit.push(new_visit(get_loc(term, 1), hold, cont));
}
DP0 => {
stop.fetch_add(1, Ordering::Relaxed);
visit.push(new_visit(get_loc(term, 2), hold, cont));
}
DP1 => {
stop.fetch_add(1, Ordering::Relaxed);
visit.push(new_visit(get_loc(term, 2), hold, cont));
}
CTR | FUN => {
let arit = arity_of(&prog.aris, term);
if arit > 0 {
stop.fetch_add(arit as usize, Ordering::Relaxed);
for i in 0 .. arit {
visit.push(new_visit(get_loc(term, i), hold, cont));
}
}
}
_ => {}
}
}
break 'work;
}
// Otherwise, try reducing the parent redex
if let Some((new_cont, new_host)) = redex.complete(cont) {
cont = new_cont;
host = new_host;
continue 'call;
}
// Otherwise, visit next pointer
break 'work;
}
}
'blink: loop {
// If available, visit a new location
if let Some((new_cont, new_host)) = visit.pop() {
cont = new_cont;
host = new_host;
continue 'main;
}
// Otherwise, we have nothing to do
else {
break 'blink;
}
}
}
'steal: loop {
if debug {
//println!("[{}] steal delay={}", tid, delay.len());
print(tid, u64::MAX);
}
//println!("[{}] steal", tid);
if stop.load(Ordering::Relaxed) == 0 {
//println!("[{}] stop", tid);
break 'main;
} else {
for victim_tid in tids {
if *victim_tid != tid {
if let Some((new_cont, new_host)) = heap.vstk[*victim_tid].steal() {
cont = new_cont;
host = new_host;
//println!("stolen");
continue 'main;
}
}
}
bkoff.snooze();
continue 'steal;
}
}
}
}
pub fn normalize(heap: &Heap, prog: &Program, tids: &[usize], host: u64, debug: bool) -> Ptr {
let mut cost = get_cost(heap);
loop {
reduce(heap, prog, tids, host, true, debug);
let new_cost = get_cost(heap);
if new_cost != cost {
cost = new_cost;
} else {
break;
}
}
load_ptr(heap, host)
}
use crossbeam::utils::{CachePadded};
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
// Allocator
// ---------
pub struct AllocatorNext {
pub cell: AtomicU64,
pub area: AtomicU64,
}
pub struct Allocator {
pub tids: usize,
pub data: Box<[AtomicU64]>,
pub used: Box<[AtomicU64]>,
pub next: Box<[CachePadded<AllocatorNext>]>,
}
pub const PAGE_SIZE : usize = 4096;
impl Allocator {
pub fn new(tids: usize) -> Allocator {
let mut next = vec![];
for i in 0 .. tids {
let cell = AtomicU64::new(u64::MAX);
let area = AtomicU64::new((crate::runtime::HEAP_SIZE / PAGE_SIZE / tids * i) as u64);
next.push(CachePadded::new(AllocatorNext { cell, area }));
}
let data = crate::runtime::new_atomic_u64_array(crate::runtime::HEAP_SIZE);
let used = crate::runtime::new_atomic_u64_array(crate::runtime::HEAP_SIZE / PAGE_SIZE);
let next = next.into_boxed_slice();
Allocator { tids, data, used, next }
}
pub fn alloc(&self, tid: usize, arity: u64) -> u64 {
    unsafe {
      // NOTE: this free-list path reads the per-thread allocation cursor
      // (`lvar`) from the enclosing heap's local-variable table, so a `heap`
      // reference must be in scope here; it predates the arena allocator below.
      let lvar = &heap.lvar[tid];
if arity == 0 {
0
} else {
let mut length = 0;
loop {
// Loads value on cursor
let val = self.data.get_unchecked(*lvar.next.as_mut_ptr() as usize).load(Ordering::Relaxed);
// If it is empty, increment length; otherwise, reset it
length = if val == 0 { length + 1 } else { 0 };
// Moves the cursor forward
*lvar.next.as_mut_ptr() += 1;
        // If it is out of bounds, wrap around
if *lvar.next.as_mut_ptr() >= *lvar.amax.as_mut_ptr() {
length = 0;
*lvar.next.as_mut_ptr() = *lvar.amin.as_mut_ptr();
}
// If length equals arity, allocate that space
if length == arity {
return *lvar.next.as_mut_ptr() - length;
}
}
}
}
}
pub fn free(&self, tid: usize, loc: u64, arity: u64) {
for i in 0 .. arity {
unsafe { self.data.get_unchecked((loc + i) as usize) }.store(0, Ordering::Relaxed);
}
}
pub fn arena_alloc(&self, tid: usize, arity: u64) -> u64 {
let next = unsafe { self.next.get_unchecked(tid) };
// Attempts to allocate on this thread's owned area
let aloc = next.cell.fetch_add(arity, Ordering::Relaxed);
let area = aloc / PAGE_SIZE as u64;
if aloc != u64::MAX && (aloc + arity) / PAGE_SIZE as u64 == area {
unsafe { self.used.get_unchecked(area as usize) }.fetch_add(arity, Ordering::Relaxed);
//println!("[{}] old_alloc {} at {}, used={} ({} {})", tid, arity, aloc, self.used[area as usize].load(Ordering::Relaxed), area, (aloc + arity) / PAGE_SIZE as u64);
return aloc;
}
// If we can't, attempt to allocate on a new area
let mut area = next.area.load(Ordering::Relaxed) % ((crate::runtime::HEAP_SIZE / PAGE_SIZE) as u64);
loop {
if unsafe { self.used.get_unchecked(area as usize) }.compare_exchange_weak(0, arity, Ordering::Relaxed, Ordering::Relaxed).is_ok() {
let aloc = area * PAGE_SIZE as u64;
next.cell.store(aloc + arity, Ordering::Relaxed);
next.area.store((area + 1) % ((crate::runtime::HEAP_SIZE / PAGE_SIZE) as u64), Ordering::Relaxed);
//println!("[{}] new_alloc {} at {}, used={}", tid, arity, aloc, self.used[area as usize].load(Ordering::Relaxed));
return aloc;
} else {
area = (area + 1) % ((crate::runtime::HEAP_SIZE / PAGE_SIZE) as u64);
}
}
}
pub fn arena_free(&self, tid: usize, loc: u64, arity: u64) {
//for i in 0 .. arity { unsafe { self.data.get_unchecked((loc + i) as usize) }.store(0, Ordering::Relaxed); }
let area = loc / PAGE_SIZE as u64;
let used = unsafe { self.used.get_unchecked(area as usize) }.fetch_sub(arity, Ordering::Relaxed);
//println!("[{}] free {} at {}, used={}", tid, arity, loc, self.used[area as usize].load(Ordering::Relaxed));
}
}
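The arena allocator above bump-allocates inside a per-thread page and claims a fresh empty page whenever an allocation would cross a page boundary. A minimal single-threaded sketch of that discipline (omitting the concurrent CAS on `used`, the `u64::MAX` "unclaimed" sentinel, and cache padding; all names here are illustrative):

```rust
// Single-threaded sketch of page-based arena allocation: bump a cursor
// inside the current page; when an allocation would cross a page boundary,
// scan for a page whose use-count is zero and restart the cursor there.
const PAGE_SIZE: u64 = 8; // cells per page (tiny, for the demo)
const NUM_PAGES: usize = 4;

struct Arena {
    used: [u64; NUM_PAGES], // live cells per page
    cell: u64,              // bump cursor
}

impl Arena {
    fn alloc(&mut self, arity: u64) -> u64 {
        let page = self.cell / PAGE_SIZE;
        if (self.cell + arity) / PAGE_SIZE == page {
            // Fast path: the allocation fits in the current page.
            let loc = self.cell;
            self.cell += arity;
            self.used[page as usize] += arity;
            return loc;
        }
        // Slow path: claim the first empty page. Note the remainder of the
        // old page is abandoned -- the trade-off this design accepts.
        for p in 0..NUM_PAGES {
            if self.used[p] == 0 {
                self.used[p] = arity;
                self.cell = p as u64 * PAGE_SIZE + arity;
                return p as u64 * PAGE_SIZE;
            }
        }
        panic!("arena exhausted");
    }
    fn free(&mut self, loc: u64, arity: u64) {
        // Freeing only decrements the page's use-count; a page becomes
        // reusable once its count returns to zero.
        self.used[(loc / PAGE_SIZE) as usize] -= arity;
    }
}

fn main() {
    let mut a = Arena { used: [0; NUM_PAGES], cell: 0 };
    let x = a.alloc(3); // fits in page 0
    let y = a.alloc(3); // still page 0
    let z = a.alloc(3); // would cross into page 1: claims a fresh page
    assert_eq!((x, y), (0, 3));
    assert_eq!(z / PAGE_SIZE, 1);
    a.free(x, 3);
    a.free(y, 3);
    assert_eq!(a.used[0], 0); // page 0 is reusable again
}
```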
// runtime/base/barrier.rs
use std::sync::atomic::{AtomicUsize, AtomicBool, Ordering, fence};
pub struct Barrier {
pub done: AtomicUsize,
pub pass: AtomicUsize,
pub tids: usize,
}
impl Barrier {
pub fn new(tids: usize) -> Barrier {
Barrier {
done: AtomicUsize::new(0),
pass: AtomicUsize::new(0),
tids: tids,
}
}
pub fn wait(&self, stop: &AtomicUsize) {
let pass = self.pass.load(Ordering::Relaxed);
if self.done.fetch_add(1, Ordering::SeqCst) == self.tids - 1 {
self.done.store(0, Ordering::Relaxed);
self.pass.store(pass + 1, Ordering::Release);
} else {
while stop.load(Ordering::Relaxed) != 0 && self.pass.load(Ordering::Relaxed) == pass {}
fence(Ordering::Acquire);
}
}
}
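The barrier above is a sense-reversing spin barrier: the last thread to arrive resets `done` and bumps `pass`, releasing the spinners. A standalone two-thread demo of the same logic (the `stop` flag is held at 1 so the wait never aborts early; the driver code is illustrative):

```rust
// Two-thread demo of a sense-reversing spin barrier: no thread passes
// `wait` until all `tids` threads have arrived.
use std::sync::atomic::{AtomicUsize, Ordering, fence};
use std::sync::Arc;
use std::thread;

struct Barrier {
    done: AtomicUsize,
    pass: AtomicUsize,
    tids: usize,
}

impl Barrier {
    fn new(tids: usize) -> Barrier {
        Barrier { done: AtomicUsize::new(0), pass: AtomicUsize::new(0), tids }
    }
    fn wait(&self, stop: &AtomicUsize) {
        let pass = self.pass.load(Ordering::Relaxed);
        if self.done.fetch_add(1, Ordering::SeqCst) == self.tids - 1 {
            // Last thread to arrive: reset the counter and release the others.
            self.done.store(0, Ordering::Relaxed);
            self.pass.store(pass + 1, Ordering::Release);
        } else {
            // Spin until the pass counter advances (or a global stop is set).
            while stop.load(Ordering::Relaxed) != 0 && self.pass.load(Ordering::Relaxed) == pass {}
            fence(Ordering::Acquire);
        }
    }
}

fn main() {
    let barrier = Arc::new(Barrier::new(2));
    let stop = Arc::new(AtomicUsize::new(1));
    let counter = Arc::new(AtomicUsize::new(0));
    let mut handles = vec![];
    for _ in 0..2 {
        let (b, s, c) = (barrier.clone(), stop.clone(), counter.clone());
        handles.push(thread::spawn(move || {
            c.fetch_add(1, Ordering::SeqCst);
            b.wait(&s); // nobody passes until both threads have incremented
            assert_eq!(c.load(Ordering::SeqCst), 2);
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
}
```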
// runtime/base/f60.rs
type F60 = u64;
#[inline(always)]
pub fn new(a: f64) -> F60 {
let b = a.to_bits();
if b & 0b1111 > 8 {
return (b >> 4) + 1;
} else {
return b >> 4;
}
}
#[inline(always)]
pub fn val(a: F60) -> f64 {
f64::from_bits(a << 4)
}
#[inline(always)]
pub fn add(a: F60, b: F60) -> F60 {
return new(val(a) + val(b));
}
#[inline(always)]
pub fn sub(a: F60, b: F60) -> F60 {
return new(val(a) - val(b));
}
#[inline(always)]
pub fn mul(a: F60, b: F60) -> F60 {
return new(val(a) * val(b));
}
#[inline(always)]
pub fn div(a: F60, b: F60) -> F60 {
return new(val(a) / val(b));
}
#[inline(always)]
pub fn mdl(a: F60, b: F60) -> F60 {
return new(val(a) % val(b));
}
// Note: on F60, the bitwise opcode slots are repurposed for transcendental
// operations (AND -> cos + sin, OR -> atan2, SHL -> pow, SHR -> log;
// XOR ignores `b` and returns ceil(a) + floor(a)).
#[inline(always)]
pub fn and(a: F60, b: F60) -> F60 {
return new(f64::cos(val(a)) + f64::sin(val(b)));
}
#[inline(always)]
pub fn or(a: F60, b: F60) -> F60 {
return new(f64::atan2(val(a), val(b)));
}
#[inline(always)]
pub fn shl(a: F60, b: F60) -> F60 {
return new(val(b).powf(val(a)));
}
#[inline(always)]
pub fn shr(a: F60, b: F60) -> F60 {
return new(val(a).log(val(b)));
}
#[inline(always)]
pub fn xor(a: F60, b: F60) -> F60 {
return new(val(a).ceil() + val(a).floor());
}
#[inline(always)]
pub fn ltn(a: F60, b: F60) -> F60 {
return new(if val(a) < val(b) { 1.0 } else { 0.0 });
}
#[inline(always)]
pub fn lte(a: F60, b: F60) -> F60 {
return new(if val(a) <= val(b) { 1.0 } else { 0.0 });
}
#[inline(always)]
pub fn eql(a: F60, b: F60) -> F60 {
return new(if val(a) == val(b) { 1.0 } else { 0.0 });
}
#[inline(always)]
pub fn gte(a: F60, b: F60) -> F60 {
return new(if val(a) >= val(b) { 1.0 } else { 0.0 });
}
#[inline(always)]
pub fn gtn(a: F60, b: F60) -> F60 {
return new(if val(a) > val(b) { 1.0 } else { 0.0 });
}
#[inline(always)]
pub fn neq(a: F60, b: F60) -> F60 {
return new(if val(a) != val(b) { 1.0 } else { 0.0 });
}
#[inline(always)]
pub fn show(a: F60) -> String {
let txt = format!("{}", val(a));
if txt.find(".").is_none() {
return format!("{}.0", txt);
} else {
return txt;
}
}
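The `new`/`val` pair above squeezes an `f64` into 60 bits by dropping the 4 lowest mantissa bits, rounding up when the dropped nibble exceeds 8. A minimal round-trip demo of that encoding (function names are local to this sketch):

```rust
// F60 encoding round-trip: only the 4 low mantissa bits are lost, so the
// relative error of encode-then-decode is bounded by roughly 2^-48.
fn f60_new(a: f64) -> u64 {
    let b = a.to_bits();
    if b & 0b1111 > 8 { (b >> 4) + 1 } else { b >> 4 }
}

fn f60_val(a: u64) -> f64 {
    f64::from_bits(a << 4)
}

fn main() {
    for x in [0.0, 1.5, 3.14, -2.718e10] {
        let y = f60_val(f60_new(x));
        // Relative error stays far below 1e-12.
        assert!((y - x).abs() <= x.abs() * 1e-12);
        println!("{x} -> {y}");
    }
}
```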
// runtime/base/mod.rs
//pub mod allocator;
pub mod f60;
pub mod u60;
pub mod barrier;
pub mod redex_bag;
pub mod u64_map;
pub mod visit_queue;
pub use barrier::{*};
pub use redex_bag::{*};
pub use u64_map::{*};
pub use visit_queue::{*};
// runtime/base/redex_bag.rs
// Redex Bag
// ---------
// Concurrent bag featuring insert, read and modify. No pop.
use crossbeam::utils::{CachePadded};
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
pub const REDEX_BAG_SIZE : usize = 1 << 26;
pub const REDEX_CONT_RET : u64 = 0x3FFFFFF; // signals to return
// - 32 bits: host
// - 26 bits: cont
// - 6 bits: left
pub type Redex = u64;
pub struct RedexBag {
tids: usize,
next: Box<[CachePadded<AtomicUsize>]>,
data: Box<[AtomicU64]>,
}
pub fn new_redex(host: u64, cont: u64, left: u64) -> Redex {
return (host << 32) | (cont << 6) | left;
}
pub fn get_redex_host(redex: Redex) -> u64 {
return redex >> 32;
}
pub fn get_redex_cont(redex: Redex) -> u64 {
return (redex >> 6) & 0x3FFFFFF;
}
pub fn get_redex_left(redex: Redex) -> u64 {
return redex & 0x3F;
}
impl RedexBag {
pub fn new(tids: usize) -> RedexBag {
let mut next = vec![];
for _ in 0 .. tids {
next.push(CachePadded::new(AtomicUsize::new(0)));
}
let next = next.into_boxed_slice();
let data = crate::runtime::new_atomic_u64_array(REDEX_BAG_SIZE);
return RedexBag { tids, next, data };
}
//pub fn min_index(&self, tid: usize) -> usize {
//return REDEX_BAG_SIZE / self.tids * (tid + 0);
//}
//pub fn max_index(&self, tid: usize) -> usize {
//return std::cmp::min(REDEX_BAG_SIZE / self.tids * (tid + 1), REDEX_CONT_RET as usize - 1);
//}
#[inline(always)]
pub fn insert(&self, tid: usize, redex: u64) -> u64 {
loop {
let index = unsafe { self.next.get_unchecked(tid) }.fetch_add(1, Ordering::Relaxed);
if index + 2 >= REDEX_BAG_SIZE {
unsafe { self.next.get_unchecked(tid) }.store(0, Ordering::Relaxed);
}
if unsafe { self.data.get_unchecked(index) }.compare_exchange_weak(0, redex, Ordering::Relaxed, Ordering::Relaxed).is_ok() {
return index as u64;
}
}
}
#[inline(always)]
pub fn complete(&self, index: u64) -> Option<(u64,u64)> {
let redex = unsafe { self.data.get_unchecked(index as usize) }.fetch_sub(1, Ordering::Relaxed);
if get_redex_left(redex) == 1 {
unsafe { self.data.get_unchecked(index as usize) }.store(0, Ordering::Relaxed);
return Some((get_redex_cont(redex), get_redex_host(redex)));
} else {
return None;
}
}
}
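The redex word packs a 32-bit host, a 26-bit continuation, and a 6-bit count of pending children; `complete` decrements the count and hands the `(cont, host)` continuation to whichever caller brings it to zero. A single-threaded sketch of that protocol (the atomic `fetch_sub` becomes a plain decrement here):

```rust
// Redex word layout: 32 bits host | 26 bits cont | 6 bits left.
fn new_redex(host: u64, cont: u64, left: u64) -> u64 {
    (host << 32) | (cont << 6) | left
}
fn get_redex_host(r: u64) -> u64 { r >> 32 }
fn get_redex_cont(r: u64) -> u64 { (r >> 6) & 0x3FFFFFF }
fn get_redex_left(r: u64) -> u64 { r & 0x3F }

// Each finished child decrements `left`; the child that observes left == 1
// is the last one, so it frees the slot and receives the continuation.
fn complete(slot: &mut u64) -> Option<(u64, u64)> {
    let redex = *slot; // fetch_sub returns the previous value
    *slot -= 1;
    if get_redex_left(redex) == 1 {
        *slot = 0; // free the slot for reuse
        Some((get_redex_cont(redex), get_redex_host(redex)))
    } else {
        None
    }
}

fn main() {
    let mut slot = new_redex(1000, 7, 2); // parent waiting on 2 children
    assert_eq!(get_redex_host(slot), 1000);
    assert_eq!(get_redex_cont(slot), 7);
    assert_eq!(complete(&mut slot), None);            // first child: 2 -> 1
    assert_eq!(complete(&mut slot), Some((7, 1000))); // last child completes
}
```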
// runtime/base/u60.rs
// Implements u60: 60-bit unsigned integers using u64 and u128
type U60 = u64;
#[inline(always)]
pub fn new(a: u64) -> U60 {
return a & 0xFFF_FFFF_FFFF_FFFF;
}
#[inline(always)]
pub fn val(a: u64) -> U60 {
return a;
}
#[inline(always)]
pub fn add(a: U60, b: U60) -> U60 {
return new(a + b);
}
#[inline(always)]
pub fn sub(a: U60, b: U60) -> U60 {
return if a >= b { a - b } else { 0x1000000000000000 - (b - a) };
}
#[inline(always)]
pub fn mul(a: U60, b: U60) -> U60 {
return new((a as u128 * b as u128) as u64);
}
#[inline(always)]
pub fn div(a: U60, b: U60) -> U60 {
return a / b;
}
#[inline(always)]
pub fn mdl(a: U60, b: U60) -> U60 {
return a % b;
}
#[inline(always)]
pub fn and(a: U60, b: U60) -> U60 {
return a & b;
}
#[inline(always)]
pub fn or(a: U60, b: U60) -> U60 {
return a | b;
}
#[inline(always)]
pub fn xor(a: U60, b: U60) -> U60 {
return a ^ b;
}
#[inline(always)]
pub fn shl(a: U60, b: U60) -> U60 {
return new(a << b);
}
#[inline(always)]
pub fn shr(a: U60, b: U60) -> U60 {
return a >> b;
}
#[inline(always)]
pub fn ltn(a: U60, b: U60) -> U60 {
return if a < b { 1 } else { 0 };
}
#[inline(always)]
pub fn lte(a: U60, b: U60) -> U60 {
return if a <= b { 1 } else { 0 };
}
#[inline(always)]
pub fn eql(a: U60, b: U60) -> U60 {
return if a == b { 1 } else { 0 };
}
#[inline(always)]
pub fn gte(a: U60, b: U60) -> U60 {
return if a >= b { 1 } else { 0 };
}
#[inline(always)]
pub fn gtn(a: U60, b: U60) -> U60 {
return if a > b { 1 } else { 0 };
}
#[inline(always)]
pub fn neq(a: U60, b: U60) -> U60 {
return if a != b { 1 } else { 0 };
}
#[inline(always)]
pub fn show(a: U60) -> String {
return format!("{}", a);
}
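The u60 module keeps values in the low 60 bits of a `u64`: addition and shifts are masked back into range, subtraction wraps modulo 2^60 instead of underflowing, and multiplication widens to `u128` before truncating. A small standalone demo of that wrap-around behavior:

```rust
// u60 arithmetic: all results are reduced modulo 2^60.
const MASK: u64 = 0x0FFF_FFFF_FFFF_FFFF; // 2^60 - 1

fn u60_new(a: u64) -> u64 { a & MASK }
fn u60_add(a: u64, b: u64) -> u64 { u60_new(a + b) }
fn u60_sub(a: u64, b: u64) -> u64 {
    // Wraps modulo 2^60 on underflow instead of panicking.
    if a >= b { a - b } else { 0x1000_0000_0000_0000 - (b - a) }
}
fn u60_mul(a: u64, b: u64) -> u64 {
    // Widen to u128 so the intermediate product cannot overflow.
    u60_new((a as u128 * b as u128) as u64)
}

fn main() {
    assert_eq!(u60_add(MASK, 1), 0);          // 2^60 wraps to 0
    assert_eq!(u60_sub(3, 5), MASK - 1);      // 3 - 5 = 2^60 - 2
    assert_eq!(u60_mul(1 << 30, 1 << 30), 0); // 2^60 mod 2^60 = 0
}
```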
// runtime/base/visit_queue.rs
// Visit Queue
// -----------
// A concurrent task-stealing queue featuring push, pop and steal.
use crossbeam::utils::{CachePadded};
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
pub const VISIT_QUEUE_SIZE : usize = 1 << 24;
// - 32 bits: host
// -  1 bit:  hold flag
// - 26 bits: cont
pub type Visit = u64;
pub struct VisitQueue {
pub init: CachePadded<AtomicUsize>,
pub last: CachePadded<AtomicUsize>,
pub data: Box<[AtomicU64]>,
}
pub fn new_visit(host: u64, hold: bool, cont: u64) -> Visit {
return (host << 32) | (if hold { 0x80000000 } else { 0 }) | cont;
}
pub fn get_visit_host(visit: Visit) -> u64 {
return visit >> 32;
}
pub fn get_visit_hold(visit: Visit) -> bool {
return (visit >> 31) & 1 == 1;
}
pub fn get_visit_cont(visit: Visit) -> u64 {
return visit & 0x3FFFFFF;
}
impl VisitQueue {
pub fn new() -> VisitQueue {
return VisitQueue {
init: CachePadded::new(AtomicUsize::new(0)),
last: CachePadded::new(AtomicUsize::new(0)),
data: crate::runtime::new_atomic_u64_array(VISIT_QUEUE_SIZE),
}
}
pub fn push(&self, value: u64) {
let index = self.last.fetch_add(1, Ordering::Relaxed);
unsafe { self.data.get_unchecked(index) }.store(value, Ordering::Relaxed);
}
#[inline(always)]
pub fn pop(&self) -> Option<(u64, u64)> {
loop {
let last = self.last.load(Ordering::Relaxed);
if last > 0 {
self.last.fetch_sub(1, Ordering::Relaxed);
self.init.fetch_min(last - 1, Ordering::Relaxed);
let visit = unsafe { self.data.get_unchecked(last - 1) }.swap(0, Ordering::Relaxed);
if visit == 0 {
continue;
} else {
return Some((get_visit_cont(visit), get_visit_host(visit)));
}
} else {
return None;
}
}
}
#[inline(always)]
pub fn steal(&self) -> Option<(u64, u64)> {
let index = self.init.load(Ordering::Relaxed);
let visit = unsafe { self.data.get_unchecked(index) }.load(Ordering::Relaxed);
if visit != 0 && !get_visit_hold(visit) {
if let Ok(visit) = unsafe { self.data.get_unchecked(index) }.compare_exchange(visit, 0, Ordering::Relaxed, Ordering::Relaxed) {
self.init.fetch_add(1, Ordering::Relaxed);
return Some((get_visit_cont(visit), get_visit_host(visit)));
}
}
return None;
}
}
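The queue above follows the classic work-stealing discipline: the owning thread pushes and pops at the tail (`last`), while thieves take from the head (`init`), so the two ends rarely contend. A single-threaded sketch of that discipline (atomic swaps, the `hold` flag, and the `fetch_min` on `init` are simplified away; all names are illustrative):

```rust
// Work-stealing deque sketch: owner works LIFO at the tail for locality,
// thieves steal FIFO from the head. Empty slots are marked with 0.
struct Deque {
    data: Vec<u64>,
    init: usize, // head: where thieves steal
    last: usize, // tail: where the owner pushes/pops
}

impl Deque {
    fn push(&mut self, v: u64) {
        self.data[self.last] = v;
        self.last += 1;
    }
    fn pop(&mut self) -> Option<u64> {
        while self.last > self.init {
            self.last -= 1;
            let v = std::mem::replace(&mut self.data[self.last], 0);
            if v != 0 {
                return Some(v); // skip slots already emptied by a thief
            }
        }
        None
    }
    fn steal(&mut self) -> Option<u64> {
        if self.init < self.last {
            let v = std::mem::replace(&mut self.data[self.init], 0);
            if v != 0 {
                self.init += 1;
                return Some(v);
            }
        }
        None
    }
}

fn main() {
    let mut q = Deque { data: vec![0; 8], init: 0, last: 0 };
    q.push(1);
    q.push(2);
    q.push(3);
    assert_eq!(q.steal(), Some(1)); // thief takes the oldest task
    assert_eq!(q.pop(), Some(3));   // owner takes the newest task
    assert_eq!(q.pop(), Some(2));
    assert_eq!(q.pop(), None);
}
```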
// runtime/base/rule/app.rs
use crate::runtime::{*};
#[inline(always)]
pub fn visit(ctx: ReduceCtx) -> bool {
let goup = ctx.redex.insert(ctx.tid, new_redex(*ctx.host, *ctx.cont, 1));
*ctx.cont = goup;
*ctx.host = get_loc(ctx.term, 0);
return true;
}
#[inline(always)]
pub fn apply(ctx: ReduceCtx) -> bool {
let arg0 = load_arg(ctx.heap, ctx.term, 0);
// (λx(body) a)
// ------------ APP-LAM
// x <- a
// body
if get_tag(arg0) == LAM {
inc_cost(ctx.heap, ctx.tid);
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Var(get_loc(arg0, 0)), take_arg(ctx.heap, ctx.term, 1));
link(ctx.heap, *ctx.host, take_arg(ctx.heap, arg0, 1));
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 2);
free(ctx.heap, ctx.tid, get_loc(arg0, 0), 2);
return true;
}
// ({a b} c)
// --------------- APP-SUP
// dup x0 x1 = c
// {(a x0) (b x1)}
if get_tag(arg0) == SUP {
inc_cost(ctx.heap, ctx.tid);
let app0 = get_loc(ctx.term, 0);
let app1 = get_loc(arg0, 0);
let let0 = alloc(ctx.heap, ctx.tid, 3);
let par0 = alloc(ctx.heap, ctx.tid, 2);
link(ctx.heap, let0 + 2, take_arg(ctx.heap, ctx.term, 1));
link(ctx.heap, app0 + 1, Dp0(get_ext(arg0), let0));
link(ctx.heap, app0 + 0, take_arg(ctx.heap, arg0, 0));
link(ctx.heap, app1 + 0, take_arg(ctx.heap, arg0, 1));
link(ctx.heap, app1 + 1, Dp1(get_ext(arg0), let0));
link(ctx.heap, par0 + 0, App(app0));
link(ctx.heap, par0 + 1, App(app1));
let done = Sup(get_ext(arg0), par0);
link(ctx.heap, *ctx.host, done);
return false;
}
return false;
}
// runtime/base/rule/dup.rs
use crate::runtime::{*};
#[inline(always)]
pub fn visit(ctx: ReduceCtx) -> bool {
let goup = ctx.redex.insert(ctx.tid, new_redex(*ctx.host, *ctx.cont, 1));
*ctx.cont = goup;
*ctx.host = get_loc(ctx.term, 2);
return true;
}
#[inline(always)]
pub fn apply(ctx: ReduceCtx) -> bool {
let arg0 = load_arg(ctx.heap, ctx.term, 2);
let tcol = get_ext(ctx.term);
// dup r s = λx(f)
// --------------- DUP-LAM
// dup f0 f1 = f
// r <- λx0(f0)
// s <- λx1(f1)
// x <- {x0 x1}
if get_tag(arg0) == LAM {
inc_cost(ctx.heap, ctx.tid);
let let0 = alloc(ctx.heap, ctx.tid, 3);
let par0 = alloc(ctx.heap, ctx.tid, 2);
let lam0 = alloc(ctx.heap, ctx.tid, 2);
let lam1 = alloc(ctx.heap, ctx.tid, 2);
link(ctx.heap, let0 + 2, take_arg(ctx.heap, arg0, 1));
link(ctx.heap, par0 + 1, Var(lam1));
link(ctx.heap, par0 + 0, Var(lam0));
link(ctx.heap, lam0 + 1, Dp0(get_ext(ctx.term), let0));
link(ctx.heap, lam1 + 1, Dp1(get_ext(ctx.term), let0));
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Var(get_loc(arg0, 0)), Sup(get_ext(ctx.term), par0));
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp0(tcol, get_loc(ctx.term, 0)), Lam(lam0));
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp1(tcol, get_loc(ctx.term, 0)), Lam(lam1));
let done = Lam(if get_tag(ctx.term) == DP0 { lam0 } else { lam1 });
link(ctx.heap, *ctx.host, done);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
free(ctx.heap, ctx.tid, get_loc(arg0, 0), 2);
return true;
}
// dup x y = {a b}
// --------------- DUP-SUP
// if equal: | else:
// x <- a | x <- {xA xB}
// y <- b | y <- {yA yB}
// | dup xA yA = a
// | dup xB yB = b
else if get_tag(arg0) == SUP {
if tcol == get_ext(arg0) {
inc_cost(ctx.heap, ctx.tid);
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp0(tcol, get_loc(ctx.term, 0)), take_arg(ctx.heap, arg0, 0));
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp1(tcol, get_loc(ctx.term, 0)), take_arg(ctx.heap, arg0, 1));
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
free(ctx.heap, ctx.tid, get_loc(arg0, 0), 2);
return true;
} else {
inc_cost(ctx.heap, ctx.tid);
let par0 = alloc(ctx.heap, ctx.tid, 2);
let let0 = alloc(ctx.heap, ctx.tid, 3);
let par1 = get_loc(arg0, 0);
let let1 = alloc(ctx.heap, ctx.tid, 3);
link(ctx.heap, let0 + 2, take_arg(ctx.heap, arg0, 0));
link(ctx.heap, let1 + 2, take_arg(ctx.heap, arg0, 1));
link(ctx.heap, par1 + 0, Dp1(tcol, let0));
link(ctx.heap, par1 + 1, Dp1(tcol, let1));
link(ctx.heap, par0 + 0, Dp0(tcol, let0));
link(ctx.heap, par0 + 1, Dp0(tcol, let1));
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp0(tcol, get_loc(ctx.term, 0)), Sup(get_ext(arg0), par0));
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp1(tcol, get_loc(ctx.term, 0)), Sup(get_ext(arg0), par1));
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
return true;
}
}
// dup x y = N
// ----------- DUP-U60
// x <- N
// y <- N
// ~
else if get_tag(arg0) == U60 {
inc_cost(ctx.heap, ctx.tid);
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp0(tcol, get_loc(ctx.term, 0)), arg0);
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp1(tcol, get_loc(ctx.term, 0)), arg0);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
return true;
}
// dup x y = N
// ----------- DUP-F60
// x <- N
// y <- N
// ~
else if get_tag(arg0) == F60 {
inc_cost(ctx.heap, ctx.tid);
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp0(tcol, get_loc(ctx.term, 0)), arg0);
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp1(tcol, get_loc(ctx.term, 0)), arg0);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
return true;
}
// dup x y = (K a b c ...)
// ----------------------- DUP-CTR
// dup a0 a1 = a
// dup b0 b1 = b
// dup c0 c1 = c
// ...
// x <- (K a0 b0 c0 ...)
// y <- (K a1 b1 c1 ...)
else if get_tag(arg0) == CTR {
inc_cost(ctx.heap, ctx.tid);
let fnum = get_ext(arg0);
let fari = arity_of(&ctx.prog.aris, arg0);
if fari == 0 {
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp0(tcol, get_loc(ctx.term, 0)), Ctr(fnum, 0));
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp1(tcol, get_loc(ctx.term, 0)), Ctr(fnum, 0));
link(ctx.heap, *ctx.host, Ctr(fnum, 0));
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
} else {
let ctr0 = get_loc(arg0, 0);
let ctr1 = alloc(ctx.heap, ctx.tid, fari);
for i in 0 .. fari - 1 {
let leti = alloc(ctx.heap, ctx.tid, 3);
link(ctx.heap, leti + 2, take_arg(ctx.heap, arg0, i));
link(ctx.heap, ctr0 + i, Dp0(get_ext(ctx.term), leti));
link(ctx.heap, ctr1 + i, Dp1(get_ext(ctx.term), leti));
}
let leti = alloc(ctx.heap, ctx.tid, 3);
link(ctx.heap, leti + 2, take_arg(ctx.heap, arg0, fari - 1));
link(ctx.heap, ctr0 + fari - 1, Dp0(get_ext(ctx.term), leti));
link(ctx.heap, ctr1 + fari - 1, Dp1(get_ext(ctx.term), leti));
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp0(tcol, get_loc(ctx.term, 0)), Ctr(fnum, ctr0));
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp1(tcol, get_loc(ctx.term, 0)), Ctr(fnum, ctr1));
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
}
return true;
}
// dup x y = *
// ----------- DUP-ERA
// x <- *
// y <- *
else if get_tag(arg0) == ERA {
inc_cost(ctx.heap, ctx.tid);
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp0(tcol, get_loc(ctx.term, 0)), Era());
atomic_subst(ctx.heap, &ctx.prog.aris, ctx.tid, Dp1(tcol, get_loc(ctx.term, 0)), Era());
link(ctx.heap, *ctx.host, Era());
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 3);
return true;
}
else {
return false;
}
}
// runtime/base/rule/fun.rs
use crate::runtime::{*};
use std::sync::atomic::{Ordering};
#[inline(always)]
pub fn visit(ctx: ReduceCtx, sidxs: &[u64]) -> bool {
let len = sidxs.len() as u64;
if len == 0 {
return false;
} else {
let mut vlen = 0;
let vbuf = unsafe { ctx.heap.vbuf.get_unchecked(ctx.tid) };
for sidx in sidxs {
if !is_whnf(load_arg(ctx.heap, ctx.term, *sidx)) {
unsafe { vbuf.get_unchecked(vlen) }.store(get_loc(ctx.term, *sidx), Ordering::Relaxed);
vlen += 1;
}
}
if vlen == 0 {
return false;
} else {
let goup = ctx.redex.insert(ctx.tid, new_redex(*ctx.host, *ctx.cont, vlen as u64));
for i in 0 .. vlen - 1 {
ctx.visit.push(new_visit(unsafe { vbuf.get_unchecked(i).load(Ordering::Relaxed) }, ctx.hold, goup));
}
*ctx.cont = goup;
*ctx.host = unsafe { vbuf.get_unchecked(vlen - 1).load(Ordering::Relaxed) };
return true;
}
}
//OLD_VISITER:
//let len = sidxs.len() as u64;
//if len == 0 {
//return false;
//} else {
//let goup = ctx.redex.insert(ctx.tid, new_redex(*ctx.host, *ctx.cont, sidxs.len() as u64));
//for (i, arg_idx) in sidxs.iter().enumerate() {
//if i < sidxs.len() - 1 {
//ctx.visit.push(new_visit(get_loc(ctx.term, *arg_idx), goup));
//} else {
//*ctx.cont = goup;
//*ctx.host = get_loc(ctx.term, *arg_idx);
//return true;
//}
//}
//return true;
//}
}
#[inline(always)]
pub fn apply(ctx: ReduceCtx, fid: u64, visit: &VisitObj, apply: &ApplyObj) -> bool {
// Reduces function superpositions
for (n, is_strict) in visit.strict_map.iter().enumerate() {
let n = n as u64;
if *is_strict && get_tag(load_arg(ctx.heap, ctx.term, n)) == SUP {
superpose(ctx.heap, &ctx.prog.aris, ctx.tid, *ctx.host, ctx.term, load_arg(ctx.heap, ctx.term, n), n);
return true;
}
}
// For each rule condition vector
let mut matched;
for (r, rule) in apply.rules.iter().enumerate() {
// Check if the rule matches
matched = true;
// Tests each rule condition (ex: `get_tag(args[0]) == SUCC`)
for (i, cond) in rule.cond.iter().enumerate() {
let i = i as u64;
match get_tag(*cond) {
U60 => {
let same_tag = get_tag(load_arg(ctx.heap, ctx.term, i)) == U60;
let same_val = get_num(load_arg(ctx.heap, ctx.term, i)) == get_num(*cond);
matched = matched && same_tag && same_val;
}
F60 => {
let same_tag = get_tag(load_arg(ctx.heap, ctx.term, i)) == F60;
let same_val = get_num(load_arg(ctx.heap, ctx.term, i)) == get_num(*cond);
matched = matched && same_tag && same_val;
}
CTR => {
let same_tag = get_tag(load_arg(ctx.heap, ctx.term, i)) == CTR || get_tag(load_arg(ctx.heap, ctx.term, i)) == FUN;
let same_ext = get_ext(load_arg(ctx.heap, ctx.term, i)) == get_ext(*cond);
matched = matched && same_tag && same_ext;
}
//FUN => {
//let same_tag = get_tag(load_arg(ctx.heap, ctx.term, i)) == CTR || get_tag(load_arg(ctx.heap, ctx.term, i)) == FUN;
//let same_ext = get_ext(load_arg(ctx.heap, ctx.term, i)) == get_ext(*cond);
//matched = matched && same_tag && same_ext;
//}
VAR => {
            // A VAR pattern in a strict position acts as a default (catch-all) match
if unsafe { *visit.strict_map.get_unchecked(i as usize) } {
// This is a Kind2-specific optimization.
if rule.hoas && r != apply.rules.len() - 1 {
// Matches number literals
let is_num
= get_tag(load_arg(ctx.heap, ctx.term, i)) == U60
|| get_tag(load_arg(ctx.heap, ctx.term, i)) == F60;
// Matches constructor labels
let is_ctr
= get_tag(load_arg(ctx.heap, ctx.term, i)) == CTR
&& arity_of(&ctx.prog.aris, load_arg(ctx.heap, ctx.term, i)) == 0;
// Matches HOAS numbers and constructors
let is_hoas_ctr_num
= get_tag(load_arg(ctx.heap, ctx.term, i)) == CTR
&& get_ext(load_arg(ctx.heap, ctx.term, i)) >= KIND_TERM_CT0
&& get_ext(load_arg(ctx.heap, ctx.term, i)) <= KIND_TERM_F60;
matched = matched && (is_num || is_ctr || is_hoas_ctr_num);
// Only match default variables on CTRs and NUMs
} else {
let is_ctr = get_tag(load_arg(ctx.heap, ctx.term, i)) == CTR;
let is_u60 = get_tag(load_arg(ctx.heap, ctx.term, i)) == U60;
let is_f60 = get_tag(load_arg(ctx.heap, ctx.term, i)) == F60;
matched = matched && (is_ctr || is_u60 || is_f60);
}
}
}
_ => {}
}
}
// If all conditions are satisfied, the rule matched, so we must apply it
if matched {
// Increments the gas count
inc_cost(ctx.heap, ctx.tid);
// Builds the right-hand side ctx.term
let done = alloc_body(ctx.heap, ctx.prog, ctx.tid, ctx.term, &rule.vars, &rule.body);
// Links the *ctx.host location to it
link(ctx.heap, *ctx.host, done);
// Collects unused variables
for var @ RuleVar { param: _, field: _, erase } in rule.vars.iter() {
if *erase {
collect(ctx.heap, &ctx.prog.aris, ctx.tid, get_var(ctx.heap, ctx.term, var));
}
}
// free the matched ctrs
for (i, arity) in &rule.free {
free(ctx.heap, ctx.tid, get_loc(load_arg(ctx.heap, ctx.term, *i as u64), 0), *arity);
}
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), arity_of(&ctx.prog.aris, fid));
return true;
}
}
return false;
}
#[inline(always)]
pub fn superpose(heap: &Heap, aris: &Aris, tid: usize, host: u64, term: Ptr, argn: Ptr, n: u64) -> Ptr {
inc_cost(heap, tid);
let arit = arity_of(aris, term);
let func = get_ext(term);
let fun0 = get_loc(term, 0);
let fun1 = alloc(heap, tid, arit);
let par0 = get_loc(argn, 0);
for i in 0 .. arit {
if i != n {
let leti = alloc(heap, tid, 3);
let argi = take_arg(heap, term, i);
link(heap, fun0 + i, Dp0(get_ext(argn), leti));
link(heap, fun1 + i, Dp1(get_ext(argn), leti));
link(heap, leti + 2, argi);
} else {
link(heap, fun0 + i, take_arg(heap, argn, 0));
link(heap, fun1 + i, take_arg(heap, argn, 1));
}
}
link(heap, par0 + 0, Fun(func, fun0));
link(heap, par0 + 1, Fun(func, fun1));
let done = Sup(get_ext(argn), par0);
link(heap, host, done);
done
}
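The `apply` function above dispatches by testing each rule's condition vector (one condition per argument: a number literal, a constructor id, or a catch-all variable) and firing the first rule whose conditions all hold. A toy model of that dispatch, with illustrative names rather than the runtime's real types:

```rust
// Toy condition-vector dispatch: the first rule whose per-argument
// conditions all hold is selected.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Pat {
    Num(u64), // matches an exact number literal
    Ctr(u64), // matches a constructor by id
    Any,      // catch-all (default variable)
}

fn rule_matches(conds: &[Pat], args: &[Pat]) -> bool {
    conds.iter().zip(args).all(|(c, a)| *c == Pat::Any || c == a)
}

fn dispatch<'a>(rules: &'a [(Vec<Pat>, &'a str)], args: &[Pat]) -> Option<&'a str> {
    rules
        .iter()
        .find(|(conds, _)| rule_matches(conds, args))
        .map(|(_, body)| *body)
}

fn main() {
    // e.g. f(0) = zero-case ; f(Succ ..) = succ-case ; f(n) = default
    let rules = vec![
        (vec![Pat::Num(0)], "zero-case"),
        (vec![Pat::Ctr(1)], "succ-case"),
        (vec![Pat::Any], "default"),
    ];
    assert_eq!(dispatch(&rules, &[Pat::Num(0)]), Some("zero-case"));
    assert_eq!(dispatch(&rules, &[Pat::Ctr(1)]), Some("succ-case"));
    assert_eq!(dispatch(&rules, &[Pat::Num(7)]), Some("default"));
}
```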
// runtime/base/rule/mod.rs
pub mod app;
pub mod dup;
pub mod op2;
pub mod fun;
// runtime/base/op2.rs
use crate::runtime::{*};
#[inline(always)]
pub fn visit(ctx: ReduceCtx) -> bool {
let goup = ctx.redex.insert(ctx.tid, new_redex(*ctx.host, *ctx.cont, 2));
ctx.visit.push(new_visit(get_loc(ctx.term, 1), ctx.hold, goup));
*ctx.cont = goup;
*ctx.host = get_loc(ctx.term, 0);
return true;
}
#[inline(always)]
pub fn apply(ctx: ReduceCtx) -> bool {
let arg0 = load_arg(ctx.heap, ctx.term, 0);
let arg1 = load_arg(ctx.heap, ctx.term, 1);
// (OP a b)
// -------- OP2-U60
// op(a, b)
if get_tag(arg0) == U60 && get_tag(arg1) == U60 {
//operate(ctx.heap, ctx.tid, ctx.term, arg0, arg1, *ctx.host);
inc_cost(ctx.heap, ctx.tid);
let a = get_num(arg0);
let b = get_num(arg1);
let c = match get_ext(ctx.term) {
ADD => u60::add(a, b),
SUB => u60::sub(a, b),
MUL => u60::mul(a, b),
DIV => u60::div(a, b),
MOD => u60::mdl(a, b),
AND => u60::and(a, b),
OR => u60::or(a, b),
XOR => u60::xor(a, b),
SHL => u60::shl(a, b),
SHR => u60::shr(a, b),
LTN => u60::ltn(a, b),
LTE => u60::lte(a, b),
EQL => u60::eql(a, b),
GTE => u60::gte(a, b),
GTN => u60::gtn(a, b),
NEQ => u60::neq(a, b),
_ => 0,
};
let done = U6O(c);
link(ctx.heap, *ctx.host, done);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 2);
return false;
}
// (OP a b)
// -------- OP2-F60
// op(a, b)
else if get_tag(arg0) == F60 && get_tag(arg1) == F60 {
//operate(ctx.heap, ctx.tid, ctx.term, arg0, arg1, *ctx.host);
inc_cost(ctx.heap, ctx.tid);
let a = get_num(arg0);
let b = get_num(arg1);
let c = match get_ext(ctx.term) {
ADD => f60::add(a, b),
SUB => f60::sub(a, b),
MUL => f60::mul(a, b),
DIV => f60::div(a, b),
MOD => f60::mdl(a, b),
AND => f60::and(a, b),
OR => f60::or(a, b),
XOR => f60::xor(a, b),
SHL => f60::shl(a, b),
SHR => f60::shr(a, b),
LTN => f60::ltn(a, b),
LTE => f60::lte(a, b),
EQL => f60::eql(a, b),
GTE => f60::gte(a, b),
GTN => f60::gtn(a, b),
NEQ => f60::neq(a, b),
_ => 0,
};
let done = F6O(c);
link(ctx.heap, *ctx.host, done);
free(ctx.heap, ctx.tid, get_loc(ctx.term, 0), 2);
return false;
}
// (+ {a0 a1} b)
// --------------------- OP2-SUP-0
// dup b0 b1 = b
// {(+ a0 b0) (+ a1 b1)}
else if get_tag(arg0) == SUP {
inc_cost(ctx.heap, ctx.tid);
let op20 = get_loc(ctx.term, 0);
let op21 = get_loc(arg0, 0);
let let0 = alloc(ctx.heap, ctx.tid, 3);
let par0 = alloc(ctx.heap, ctx.tid, 2);
link(ctx.heap, let0 + 2, arg1);
link(ctx.heap, op20 + 1, Dp0(get_ext(arg0), let0));
link(ctx.heap, op20 + 0, take_arg(ctx.heap, arg0, 0));
link(ctx.heap, op21 + 0, take_arg(ctx.heap, arg0, 1));
link(ctx.heap, op21 + 1, Dp1(get_ext(arg0), let0));
link(ctx.heap, par0 + 0, Op2(get_ext(ctx.term), op20));
link(ctx.heap, par0 + 1, Op2(get_ext(ctx.term), op21));
let done = Sup(get_ext(arg0), par0);
link(ctx.heap, *ctx.host, done);
return false;
}
// (+ a {b0 b1})
// --------------- OP2-SUP-1
// dup a0 a1 = a
// {(+ a0 b0) (+ a1 b1)}
else if get_tag(arg1) == SUP {
inc_cost(ctx.heap, ctx.tid);
let op20 = get_loc(ctx.term, 0);
let op21 = get_loc(arg1, 0);
let let0 = alloc(ctx.heap, ctx.tid, 3);
let par0 = alloc(ctx.heap, ctx.tid, 2);
link(ctx.heap, let0 + 2, arg0);
link(ctx.heap, op20 + 0, Dp0(get_ext(arg1), let0));
link(ctx.heap, op20 + 1, take_arg(ctx.heap, arg1, 0));
link(ctx.heap, op21 + 1, take_arg(ctx.heap, arg1, 1));
link(ctx.heap, op21 + 0, Dp1(get_ext(arg1), let0));
link(ctx.heap, par0 + 0, Op2(get_ext(ctx.term), op20));
link(ctx.heap, par0 + 1, Op2(get_ext(ctx.term), op21));
let done = Sup(get_ext(arg1), par0);
link(ctx.heap, *ctx.host, done);
return false;
}
return false;
}
// runtime/mod.rs
#![allow(clippy::identity_op)]
#![allow(dead_code)]
#![allow(non_snake_case)]
#![allow(unused_attributes)]
#![allow(unused_imports)]
pub mod base;
pub mod data;
pub mod rule;
use sysinfo::{System, SystemExt, RefreshKind};
pub use base::{*};
pub use data::{*};
pub use rule::{*};
use crate::language;
pub const CELLS_PER_KB: usize = 0x80;
pub const CELLS_PER_MB: usize = 0x20000;
pub const CELLS_PER_GB: usize = 0x8000000;
// If unspecified, allocates `min(16 GB, 75% of free system memory)`
pub fn default_heap_size() -> usize {
use sysinfo::SystemExt;
let available_memory = System::new_with_specifics(RefreshKind::new().with_memory()).free_memory();
let heap_size = (available_memory * 3 / 4) / 8;
let heap_size = std::cmp::min(heap_size as usize, 16 * CELLS_PER_GB);
return heap_size as usize;
}
// If unspecified, spawns 1 thread for each available core
pub fn default_heap_tids() -> usize {
return std::thread::available_parallelism().unwrap().get();
}
pub struct Runtime {
pub heap: Heap,
pub prog: Program,
pub book: language::rulebook::RuleBook,
pub tids: Box<[usize]>,
pub dbug: bool,
}
impl Runtime {
/// Creates a new, empty runtime
pub fn new(size: usize, tids: usize, dbug: bool) -> Runtime {
Runtime {
heap: new_heap(size, tids),
prog: Program::new(),
book: language::rulebook::new_rulebook(),
tids: new_tids(tids),
dbug: dbug,
}
}
/// Creates a runtime from source code, given a max number of nodes
pub fn from_code_with(code: &str, size: usize, tids: usize, dbug: bool) -> Result<Runtime, String> {
let file = language::syntax::read_file(code)?;
let heap = new_heap(size, tids);
let prog = Program::new();
let book = language::rulebook::gen_rulebook(&file);
let tids = new_tids(tids);
return Ok(Runtime { heap, prog, book, tids, dbug });
}
////fn get_area(&mut self) -> runtime::Area {
////return runtime::get_area(&mut self.heap, 0)
////}
/// Creates a runtime from a source code
//#[cfg(not(target_arch = "wasm32"))]
pub fn from_code(code: &str) -> Result<Runtime, String> {
Runtime::from_code_with(code, default_heap_size(), default_heap_tids(), false)
}
///// Extends a runtime with new definitions
//pub fn define(&mut self, _code: &str) {
//todo!()
//}
/// Allocates a new term, returns its location
pub fn alloc_code(&mut self, code: &str) -> Result<u64, String> {
Ok(self.alloc_term(&*language::syntax::read_term(code)?))
}
/// Given a location, returns the pointer stored on it
pub fn load_ptr(&self, host: u64) -> Ptr {
load_ptr(&self.heap, host)
}
/// Given a location, evaluates a term to head normal form
pub fn reduce(&mut self, host: u64) {
reduce(&self.heap, &self.prog, &self.tids, host, false, self.dbug);
}
/// Given a location, evaluates a term to full normal form
pub fn normalize(&mut self, host: u64) {
reduce(&self.heap, &self.prog, &self.tids, host, true, self.dbug);
}
/// Evaluates a code, allocs and evaluates to full normal form. Returns its location.
pub fn normalize_code(&mut self, code: &str) -> u64 {
let host = self.alloc_code(code).ok().unwrap();
self.normalize(host);
return host;
}
/// Evaluates a code to normal form. Returns its location.
pub fn eval_to_loc(&mut self, code: &str) -> u64 {
return self.normalize_code(code);
}
/// Evaluates a code to normal form.
pub fn eval(&mut self, code: &str) -> String {
let host = self.normalize_code(code);
return self.show(host);
}
//// /// Given a location, runs side-effective actions
////#[cfg(not(target_arch = "wasm32"))]
////pub fn run_io(&mut self, host: u64) {
////runtime::run_io(&mut self.heap, &self.prog, &[0], host)
////}
/// Given a location, recovers the lambda Term stored on it, as code
pub fn show(&self, host: u64) -> String {
language::readback::as_code(&self.heap, &self.prog, host)
}
/// Given a location, recovers the linear Term stored on it, as code
pub fn show_linear(&self, host: u64) -> String {
language::readback::as_linear_code(&self.heap, &self.prog, host)
}
/// Returns the total number of graph rewrites computed
pub fn get_rewrites(&self) -> u64 {
get_cost(&self.heap)
}
/// Returns the name of a given id
pub fn get_name(&self, id: u64) -> String {
self.prog.nams.get(&id).unwrap_or(&"?".to_string()).clone()
}
/// Returns the arity of a given id
pub fn get_arity(&self, id: u64) -> u64 {
*self.prog.aris.get(&id).unwrap_or(&u64::MAX)
}
/// Returns the id of a given name
pub fn get_id(&self, name: &str) -> u64 {
*self.book.name_to_id.get(name).unwrap_or(&u64::MAX)
}
//// WASM re-exports
pub fn DP0() -> u64 {
return DP0;
}
pub fn DP1() -> u64 {
return DP1;
}
pub fn VAR() -> u64 {
return VAR;
}
pub fn ARG() -> u64 {
return ARG;
}
pub fn ERA() -> u64 {
return ERA;
}
pub fn LAM() -> u64 {
return LAM;
}
pub fn APP() -> u64 {
return APP;
}
pub fn SUP() -> u64 {
return SUP;
}
pub fn CTR() -> u64 {
return CTR;
}
pub fn FUN() -> u64 {
return FUN;
}
pub fn OP2() -> u64 {
return OP2;
}
pub fn U60() -> u64 {
return U60;
}
pub fn F60() -> u64 {
return F60;
}
pub fn ADD() -> u64 {
return ADD;
}
pub fn SUB() -> u64 {
return SUB;
}
pub fn MUL() -> u64 {
return MUL;
}
pub fn DIV() -> u64 {
return DIV;
}
pub fn MOD() -> u64 {
return MOD;
}
pub fn AND() -> u64 {
return AND;
}
pub fn OR() -> u64 {
return OR;
}
pub fn XOR() -> u64 {
return XOR;
}
pub fn SHL() -> u64 {
return SHL;
}
pub fn SHR() -> u64 {
return SHR;
}
pub fn LTN() -> u64 {
return LTN;
}
pub fn LTE() -> u64 {
return LTE;
}
pub fn EQL() -> u64 {
return EQL;
}
pub fn GTE() -> u64 {
return GTE;
}
pub fn GTN() -> u64 {
return GTN;
}
pub fn NEQ() -> u64 {
return NEQ;
}
pub fn CELLS_PER_KB() -> usize {
return CELLS_PER_KB;
}
pub fn CELLS_PER_MB() -> usize {
return CELLS_PER_MB;
}
pub fn CELLS_PER_GB() -> usize {
return CELLS_PER_GB;
}
pub fn get_tag(lnk: Ptr) -> u64 {
return get_tag(lnk);
}
pub fn get_ext(lnk: Ptr) -> u64 {
return get_ext(lnk);
}
pub fn get_val(lnk: Ptr) -> u64 {
return get_val(lnk);
}
pub fn get_num(lnk: Ptr) -> u64 {
return get_num(lnk);
}
pub fn get_loc(lnk: Ptr, arg: u64) -> u64 {
return get_loc(lnk, arg);
}
pub fn Var(pos: u64) -> Ptr {
return Var(pos);
}
pub fn Dp0(col: u64, pos: u64) -> Ptr {
return Dp0(col, pos);
}
pub fn Dp1(col: u64, pos: u64) -> Ptr {
return Dp1(col, pos);
}
pub fn Arg(pos: u64) -> Ptr {
return Arg(pos);
}
pub fn Era() -> Ptr {
return Era();
}
pub fn Lam(pos: u64) -> Ptr {
return Lam(pos);
}
pub fn App(pos: u64) -> Ptr {
return App(pos);
}
pub fn Sup(col: u64, pos: u64) -> Ptr {
return Sup(col, pos);
}
pub fn Op2(ope: u64, pos: u64) -> Ptr {
return Op2(ope, pos);
}
pub fn U6O(val: u64) -> Ptr {
return U6O(val);
}
pub fn F6O(val: u64) -> Ptr {
return F6O(val);
}
pub fn Ctr(fun: u64, pos: u64) -> Ptr {
return Ctr(fun, pos);
}
pub fn Fun(fun: u64, pos: u64) -> Ptr {
return Fun(fun, pos);
}
pub fn link(&mut self, loc: u64, lnk: Ptr) -> Ptr {
return link(&self.heap, loc, lnk);
}
pub fn alloc(&mut self, size: u64) -> u64 {
return alloc(&self.heap, 0, size); // FIXME tid?
}
pub fn free(&mut self, loc: u64, size: u64) {
return free(&self.heap, 0, loc, size); // FIXME tid?
}
pub fn collect(&mut self, term: Ptr) {
return collect(&self.heap, &self.prog.aris, 0, term); // FIXME tid?
}
}
// Methods that aren't compiled to JS
impl Runtime {
/// Allocates a new term, returns its location
pub fn alloc_term(&mut self, term: &language::syntax::Term) -> u64 {
alloc_term(&self.heap, &self.prog, 0, &self.book, term) // FIXME tid?
}
/// Given a location, recovers the Core stored on it
pub fn readback(&self, host: u64) -> Box<language::syntax::Term> {
language::readback::as_term(&self.heap, &self.prog, host)
}
/// Given a location, recovers the Term stored on it
pub fn linear_readback(&self, host: u64) -> Box<language::syntax::Term> {
language::readback::as_linear_term(&self.heap, &self.prog, host)
}
}
###############################
Answer the questions below, regarding HVM1 and HVM2:
1. Which was based on a term-like calculus, and which was based on raw interaction combinators?
2. How did the syntax of each work? Provide examples.
3. How would `λf. λx. (f x)` be stored in memory, on each? Write an example in hex, with 1 64-bit word per line. Explain what each line does.
4. Which part of the code was responsible for beta-reduction, on both? Cite it.
5. HVM1 had a garbage collection bug that isn't present in HVM2. Can you reason about it, and explain why?
6. HVM1 had a concurrency bug that has been solved in HVM2. How?
7. There are many functions on HVM1 that don't have correspondents on HVM2. Name some, and explain why they were removed.
###############################