
@Quodss
Last active June 28, 2024 16:07

Mass update report

Introduction

A few months ago I started working on the |mass modernization project to learn about Arvo and Vere and to contribute to core development. Since then I have had some success working on a WebAssembly interpreter, and I decided that it would be wise to focus my attention on that project instead.

I managed to achieve the first milestone outlined in the grant proposal. To ensure my efforts don't go to waste, I will describe what I achieved and learned while working on that milestone.

Target audience

I don't have a CS background, and working on the Urbit core was my first serious programming experience. I think this post will be helpful for someone who is trying to figure out how Urbit works, because I will describe my contribution in roughly the same order in which I learned about the problem and implemented the solution.

Project goal

The goal of the project was to replace the current |mass Hood generator with a -mass thread that would return the memory report to the user as a noun, allowing the user to manage the report: storing it, or sending it to other ships or hosting providers for analysis. In order to solve this task we need to understand:

  • How |mass works right now;
  • Why a "Hood generator" approach would not be feasible for the upgrade, and how threads can be used to achieve our goal;
  • How Arvo and Vere communicate to inject data into Arvo from the outside world (a memory report in our case), and how to use the duct system to route data to some process;
  • How Vere processes (namely serf and lord) interact with each other.

|mass mechanics

When you type |mass in Dojo, you get a memory report printout in the terminal. Let's break down the mechanics of |mass in steps.

First, |mass is a poke of the %hood app with the product of the /hood/mass generator. The following commands are all identical:

|mass
:hood|mass
:hood &helm-mass ~

When %hood receives the poke, it emits a card to Dill with a [%flog %heft ~] task:

++  poke-mass
  |=  ~  =<  abet
  (emit %pass /heft %arvo %d %flog %heft ~)
::  (... some lines skipped)
%helm-mass  =;(f (f !<(_+<.f vase)) poke-mass)

The %flog task gets unwrapped, and on the %heft task Dill passes a %whey note to Arvo:

++  call                                          ::  receive input
  |=  kyz=task
  ^+  +>
  ?+    -.kyz  ~&  [%strange-kiss -.kyz]  +>
      ::  (... some lines skipped)
      %heft  (pass /whey %$ whey/~)

And on the %whey note Arvo passes an ovum [//arvo mass/whey] to the runtime, where whey is a nested data structure of annotated cores. In other words, it has the type mass:

+$  mass  $~  $+|+~
          (pair cord (each * (list mass)))

This structure is similar to what you get in the first half of the |mass printout, before the "total userspace: ..." line, except it contains nouns instead of sizes in bytes. Those nouns happen to be cores; see the ++whey definition in arvo.hoon.

The rest of the |mass process happens in Vere, in the serf process (serf.c). First, when the c3__mass literal (equivalent to Hoon's %mass term) is detected in the ovum, the serf saves the whey noun in sef_u->sac (in _serf_sure_feck), and then this noun gets measured in _serf_grab. The measurement results are then simply printed to stderr, along with some runtime memory usage information (the "total arvo/jet/noun/road stuff" lines).

Generators, threads and events

A generator is a Dojo utility that allows you to run code against user input. Generators are located as .hoon files in the gen directory of a given desk. A "naked" generator is defined as a gate that takes user input as an argument; it has no knowledge of the current time, the identity of the ship, or entropy. A %say generator is defined as a cell of the %say term and a gate that takes user input along with the time, entropy, and the path to the generator's desk, which includes the ship's @p. Dojo provides all these arguments when the generator is called from the command line.

As you can see from the structure of a generator, it cannot return any information that lies outside of the gate's subject. Since the information about memory usage lies outside of a generator's subject, we have to use something else for the |mass upgrade.

Right now the memory report request is implemented as a poke to the %hood agent, and pokes do not return any nouns either: they are a one-way road for interacting with an app.

An alternative would be to use a thread to pass a task to Arvo the way |mass does now, and then to receive a response from the runtime as a gift from a vane. Note that the choice of vane does not matter here; vanes are simply used to handle I/O between a thread and Arvo.

As an illustration of what we want to build consider a -time thread:

> -time ~s3
~s3..0082

To understand an important difference between generators and threads that return values, let's remind ourselves what Arvo is. The formal interface of Arvo is defined at the bottom of the arvo.hoon file:

::  Arvo formal interface
::
::    this lifecycle wrapper makes the arvo door (multi-armed core)
::    look like a gate (function or single-armed core), to fit
::    urbit's formal lifecycle function (see aeon:eden:part).
::    a practical interpreter can and will ignore it.
::
|=  [now=@da ovo=ovum]
^-  *
.(+> +:(poke now ovo))

In other words, Arvo is a gate that takes an event and returns a new version of itself. The tail of (poke now ovo) replaces Arvo's context, while the head is the list of effects. Simply put, Arvo is a function:

(Arvo event) -> [Arvo' (list effect)]

The effects can be handled by the runtime: Vere can set up timers, send packets, etc. Notice that the formal definition of Arvo does not care about the effects at all: the interpreter could in theory ignore every effect, and the deterministic nature of Arvo would still be preserved.

Think about it this way: suppose you tried to send a message from your Urbit to some other ship, but you never heard anything back. Did it happen because of some issue in the network between the ships, or because the interpreter ignored the effect "send this packet to ~sampel-palnet"? From your ship's perspective those options are equivalent: both happened due to the nondeterministic nature of the world that lies outside of Arvo.

When you run a generator, you have one event and one effect:

> +hello %world
'hello, world'

Event: type "+hello %world\n" -> Effect: print "'hello, world'"

This effect is deterministic: you will always get the same output for a given state of your Arvo.

With threads, you can have more complex chains of I/O:

> -time ~s3
~s3..0082

Event: type "-time ~s3\n" -> Effect: set timer for 3 seconds
(... 3 seconds later)
Event: timer elapsed after ~s3..0082 -> Effect: print "~s3..0082"

Here, the last effect is not strictly determined by the first event, since it relies on nondeterministic information from the runtime, which sends the second event. Running this thread multiple times will give you different results even if the Arvo state stays the same:

> -time ~s3
~s3..007b
> -time ~s3
~s3..00a6
> -time ~s3
~s3..008d

To upgrade |mass I wrote a -mass thread that works similarly to -time: it sends a memory report request, waits for an answer, and then returns the answer to the thread caller.

Development flow of -mass

In this section I will describe my thought process while developing the Arvo side of -mass, in roughly the same order as I wrote the code.

The main thread

For the body of the main thread I took inspiration from the -time thread and the gates it calls to run other threads, namely ++send-wait and ++take-wake from /lib/strandio.hoon:

::  /ted/mass.hoon
::
/-  spider
/+  strandio
=,  strand=strand:spider
^-  thread:spider
|=  arg=vase
=/  m  (strand ,vase)
^-  form:m
=+  !<(~ arg)
;<  ~              bind:m  send-mass-request:strandio
;<  report=(unit)  bind:m  take-mass:strandio
(pure:m !>(report))
::  /lib/strandio.hoon
::
(...)
++  send-mass-request
  =/  m  (strand ,~)
  ^-  form:m
  =/  =card:agent:gall
    [%pass /heft %arvo %k %heft ~]
  (send-raw-card card)
::
++  take-mass  ::  WIP
  =/  m  (strand ,(unit))
  ^-  form:m
  |=  tin=strand-input:strand
  ?+  in.tin  ~&(in.tin `[%skip ~])
      ~  `[%wait ~]
      [~ %sign * %khan %quac *]
    `[%done p.sign-arvo.u.in.tin]
  ==
::
(...)

I will not dwell on threads too much; there is a nice guide on threads in the Urbit docs. You can see that the -mass thread passes a %heft request to the Khan vane instead of Dill, and expects a %quac gift in return. This is where I ended up moving the mass logic, because Khan's code is much shorter and simpler. Instead of repeating the threads guide in this post, I'd like to talk about the micgal rune, ;<.

Tangent: micgal ;<

Both the explanation in the rune docs and the one in the threads guide were hard for me to grok, so I came up with my own. It is inspired by this Computerphile video on monads, and the code example is from the video too.

Suppose you have a ++safe-div gate that returns the result of integer division wrapped in a unit, to handle division by zero without crashing:

++  safe-div
  |=  [a=@ud b=@ud]
  ^-  (unit @ud)
  ?:  =(b 0)  ~
  `(div a b)

Let's define a type expr which will represent an expression that consists of a bunch of integer divisions. The type will be defined recursively:

+$  expr
  $~  0
  $@  @ud
  [p=expr q=expr]

So an instance of expr is either an integer or a cell of two expressions p and q, corresponding to the integer division p // q. Now let's write a gate that takes an expression and evaluates it, reducing it to a unit of an integer.

A naive approach would probably look like this:

++  eval-naive
  |=  e=expr
  ^-  (unit @ud)
  ?@  e  `e
  =/  a=(unit @ud)  $(e p.e)
  ?~  a  ~
  =/  b=(unit @ud)  $(e q.e)
  ?~  b  ~
  (safe-div u.a u.b)

Here, for each argument of safe-div, we pin a unit to the subject and check it manually, returning ~ if the unit happens to be empty. We can rewrite the gate, fusing together the emptiness check and the ~ return with the ++biff binding gate.

++biff takes a (unit mold) argument and a gate whose sample is that mold, and tries to apply the unit's contents to the gate. If the unit is empty, ~ is returned; otherwise it returns the output of the gate. If you take a bunch of gates that each take some value and return a unit of a value, you can chain them together with this higher-order gate, also called a monadic bind. Now we can rewrite the eval function:

++  eval-ugly
  |=  e=expr
  =*  this  $
  ^-  (unit @ud)
  ?@  e  `e
  %+  biff  this(e p.e)
  |=  a=@
  %+  biff  this(e q.e)
  |=  b=@
  (safe-div a b)

To shorten the code we can now introduce the ;< rune:

++  eval
  |=  e=expr
  ^-  (unit @ud)
  ?@  e  `e
  ;<  a=@  _biff  $(e p.e)
  ;<  b=@  _biff  $(e q.e)
  (safe-div a b)

What ;< does is:

  1. It treats its second child as a gate and slams it with its first child, which is a mold. The product of that gate is another gate, bind.
  2. It then slams the bind gate with a cell of the third child and a gate whose sample is a bunt of the first child and whose body is the code in the fourth child.

So in our example the first ;< evaluates (_biff a=@), which returns biff, and then applies biff as a bind to the unit from $(e p.e) and the gate |=(a=@ (... rest of the code)). This behavior is equivalent to the ++eval-ugly example. The main difference is that the implicit gates built by ;< are not exposed in the namespace, so there is no need to alias ++eval with =* as in the ++eval-ugly gate.

Khan update

Since the -mass thread interacts with the Khan vane for memory report retrieval, we now need to add new tasks and gifts to Khan, as well as define Khan's logic for those tasks.

In Lull we add a new gift, %quac, and two new tasks, %quac and %heft:

::  sys/lull.hoon
::
(...)
+$  gift                                              ::  out result <-$
  $%  [%arow p=(avow cage)]                           ::  in-arvo result
      [%avow p=(avow page)]                           ::  external result
      [%quac p=(unit)]                                ::  memory report
  ==
+$  task                                              ::  in request ->$
  $~  [%vega ~]                                       ::
  $%  $>(%born vane-task)                             ::  new unix process
      (...)
      [%quac p=(unit)]                                ::  memory report
      [%heft ~]                                       ::  report request
  ==                                                  ::
(...)

The %heft task is sent by the thread, and the %quac task is sent by the runtime with the memory report wrapped in a unit. When Khan receives a %quac task, it forwards it to the calling thread as a %quac gift.

Now we need to update /vane/khan.hoon. Remember that the memory report will be injected from the runtime as a new event, so we need to temporarily save the duct on which Khan received the %heft task in order to route the memory report back to the caller. This is a good time to read the move trace tutorial for a more extensive illustration of how effects and events get routed between different parts of the system.

First, we update the state of the Khan vane to include the saved duct:

+$  khan-state                                      ::
  $:  %1                                            ::  state v1
      hey=duct                                      ::  unix duct
      tic=@ud                                       ::  tid counter
      mass-duct=(unit duct)                         ::  saved duct
  ==                                                ::

In the ++call gate, on the %born task, which Khan receives when the runtime is launched, we check whether mass-duct holds a saved duct, and if so, give an empty memory report on it:

++  call
  (...)
  ?+    -.task  [~ khan-gate]
      %born
    ?~  mass-duct
      [~ khan-gate(hey hen, tic 0)]
    :_  khan-gate(hey hen, tic 0, mass-duct ~)
    :_  ~
    [u.mass-duct %give %quac ~]

We do this in case the runtime crashes during memory report generation, so that the thread returns an empty report when the pier is restarted.

In the same arm we define the logic for the %heft and %quac tasks. On the former we send a %whey note to Arvo as usual, but also save the duct on which we heard the request:

++  heft
  |=  hen=duct
  ^-  [(list move) _khan-gate]
  :_  khan-gate(mass-duct `hen)
  :_  ~
  [hen %pass /whey %$ whey/~]  ::  $move with a %whey note to Arvo

And on the latter we forward the gift on the saved duct if one is present, deleting the saved duct from the state; if no duct is saved, we do nothing:

++  quac
  |=  git=gift
  ^-  [(list move) _khan-gate]
  ?~  mass-duct  `khan-gate
  :_  khan-gate(mass-duct ~)
  :_  ~
  [u.mass-duct %give git]  ::  $move with a gift to the original caller of ++heft

All that is left is to update Khan's $sign and $note types, and the ++load arm to upgrade the old state to the new one.

Khan can now request %whey from Arvo and return a %quac gift:

+$  note                                            ::    out request $->
  $~  [%g %deal *sock *term *deal:gall]             ::
  $%  $:  %g                                        ::    to %gall
          $>(%deal task:gall)                       ::  full transmission
      ==                                            ::
      $:  %k                                        ::    to self
          $>(%fard task)                            ::  internal thread
      ==                                            ::
      $:  %$                                        ::    to Arvo
          $>(%whey waif)                            ::  memory report
  ==  ==
(...)
+$  sign
  (...)
  $>(?(%arow %avow %quac) gift)             ::  thread result

And the old state needs to be handled correctly:

+$  khan-states  $%(khan-state-0 khan-state)
::
+$  khan-state-0                                     ::
  $:  %0                                            ::    state v0
      hey=duct                                      ::  unix duct
      tic=@ud                                       ::  tid counter
  ==
::
++  state-0-to-1
  |=  old=khan-state-0
  ^-  khan-state
  [%1 hey tic ~]:old
::  +load: migrate an old state to a new khan version
::
++  load
  |=  old=khan-states
  ^+  khan-gate
  ?-  -.old
    %1  khan-gate(state old)
    %0  $(old (state-0-to-1 old))
  ==

That completes the description of the changes on the Arvo side. Now the %mass ovum needs to be handled in Vere.

Development flow of memory report generation in Vere

This section is similar to the -mass development flow section above, but here we cover the Earth side of the problem. As a proof of concept, the noun that the runtime sends back to Arvo will be the total sweep value, which is just an atom.

When |mass is entered in Dojo, the memory report gets printed by the _serf_grab function in serf.c. This function takes a u3_noun generated by the ++whey arm in Arvo, described above. I updated it to return a u3_weak, which is either u3_none or a noun, and to return the atom of the total sweep volume:

//  vere/serf.c
//
static u3_weak
_serf_grab(u3_noun sac)
(...)
    c3_w tot_w = 0;
    (...)
    tot_w += u3a_maid(fil_u, "total userspace", u3a_prof(fil_u, 0, sac));
    tot_w += u3m_mark(fil_u);
    tot_w += u3a_maid(fil_u, "space profile", u3a_mark_noun(sac));
    (...)
    return u3i_word(tot_w * 4);

_serf_grab is called in the u3_serf_post function, which we also update to return out=(unit *), represented in C as a u3_weak:

//  vere/serf.c
//
(...)
/* u3_serf_post(): update serf state post-writ.
*/
u3_weak
u3_serf_post(u3_serf* sef_u)
{
  u3_noun out = u3_none;
  (...)
  if ( c3y == sef_u->mut_o ) {
    u3_weak grab_mass = _serf_grab(sef_u->sac);
    sef_u->sac   = u3_nul;
    sef_u->mut_o = c3n;
    if (grab_mass != u3_none) {
      out = u3nc(u3_nul, grab_mass);
    }
  }
  (...)
  return out;
}

The purpose of u3_serf_post is to update the state of the serf after the event has been processed and saved in the event log. This function is called in main.c, which we adjust to send a plea that will prompt the king to inject another event containing the desired mass report:

//  vere/main.c
//
/* _cw_serf_writ(): process a command from the king.
*/
static void
_cw_serf_writ(void* vod_p, c3_d len_d, c3_y* byt_y)
{
(...)
    u3_weak serf_post_out = u3_serf_post(&u3V);
    if ( serf_post_out != u3_none ) {
      _cw_serf_send(u3nc(c3__quac, serf_post_out));
    }
  }
}

The plea that we send is a cell of %quac and the memory report noun, and it is handled by the _lord_on_plea function in lord.c:

    case c3__quac: {
      _lord_plea_mass(god_u, u3k(dat));
    } break;

And the function that injects the event is _lord_plea_mass. The card cad is the task to Khan, which will receive it on the wire wir=[/quac] (that wire gets ignored anyway). Then we build the ovum: c3__k denotes the Khan vane. Next I build a driver to send the %quac task: this is suboptimal, but I couldn't figure out a better way to do it and copied the logic from some other piece of code, possibly Behn injecting "timer elapsed" events.


From ~master-morzod:

your code here is making a bespoke driver for every %mass message. instead, you should just call a function in an existing driver (i recommend term.c as a catchall)


Then the ovum, as both a noun and a struct u3_ovum, is passed as an event. Arvo will send the task to Khan, which will forward the memory report to the calling thread.

/* _lord_plea_mass(): inject mass report
 */
static void
_lord_plea_mass(u3_lord* god_u, u3_noun dat)
{
  u3_noun cad = u3nc(c3__quac, dat);
  u3_noun wir = u3nc(c3__quac, u3_nul);
  u3_ovum* egg_u = u3_ovum_init(0, c3__k, wir, cad);

  u3_pier* pir_u = god_u->cb_u.ptr_v;
  u3_auto* car_u = c3_calloc(sizeof(*car_u));
  u3_noun    ovo;

  car_u->pir_u = pir_u;
  car_u->nam_m = c3__quac;

  u3_auto_plan(car_u, egg_u);

  u3_assert( u3_auto_next(car_u, &ovo) == egg_u );

  {
    struct timeval tim_tv;
    gettimeofday(&tim_tv, 0);
    u3_lord_work(god_u, egg_u, u3nc(u3_time_in_tv(&tim_tv), ovo));
  }
}

Trying out

  1. Clone my versions of the vere and urbit repos (mind the branches: modernize-mass and mass-thread, respectively)
  2. Build the vere binary
  3. Boot a fakezod:
./vere/bazel-bin/pkg/vere/urbit -F zod -B urbit/bin/brass.pill -A urbit/pkg/arvo
  4. Run -mass in Dojo. After the usual printfs, you should get the noun printed to Dojo:
(...)
total marked: MB/111.438.160
free lists: MB/1.909.560
sweep: MB/111.438.160

[~ 111.438.160]

The last line is the returned mass report. You can now write it to disk:

> .mass/report -mass

Or send it over the wire to ~zod via a |hi ping :)

|hi ~zod [_|=(a=(unit) (scow %ud (@ (need a)))) -mass]

Conclusions

Working on this project gave me a lot of insight into the internals of Arvo and Vere and the interaction between the two. I hope you enjoyed this report and learned something new too.

To finish the project, one would most likely have to perform the following steps:

  1. Rewrite _lord_plea_mass to call u3_auto_plan from a function with an existing driver instead of making a bespoke driver for each %quac task
  2. Make a u3_quac structure with the same shape as the $mass noun and populate it in _serf_grab. Convert the struct to a noun after the sweep and return it instead of a single atom, then return a (unit quac) to the thread caller:
+$  quac
  $~  $+|+~
  (pair cord (each @ud (list quac)))
Comments

@sigilante:

Is there a PR associated with this change set?

@Quodss (Dec 15, 2023):

@sigilante no, should I make one?

@sigilante:

It's okay, I'm just visualizing the denoted changes.

@sigilante:

OK, so as I understand it, the main task remaining is to refactor the Dojo side to return all of the memory weights as a single noun?

@Quodss (Dec 16, 2023):

@sigilante not the Dojo side but the Vere side, since Vere's job here is to construct the noun it sends to Khan as part of the %quac gift. Right now that noun only contains the total memory sweep value, represented as an atom. Instead, Vere should construct a noun that contains all the weights; I outlined a proposed structure for that noun as the type $quac at the very end of the report.

@sigilante:

doh, yes