ollpu/persistent.md

## persistent.md

      
    Raw
  

              persistent.md
            
          
    The idea

This document presents a system for persistent application state management that

is based on COW trees
allows the user to freely define the structs that become nodes in the tree
can be relatively ergonomically used to construct a kind of reactive GUI that depends on the data
supports mutations from an arbitrary point without needing lenses or other macro-heavy solutions.

Similar systems that rely on lenses already exist, e.g. the Druid architecture. They are used because mutations
to some node require cloning and updating every node above in the COW tree.
In essence, lenses define an absolute path from the root to the node that needs to be mutated.
This system avoids lenses by traversing the tree upwards. Normally, a persistent COW tree cannot be traversed upwards
because the parent of a node might be different in different points in history. However, by assigning each node
a unique ID that does not change between copies, and maintaining a map (unique id -> node) for each distinct undo state,
it is possible to find the up-to-date parent.
The map data structure is implemented as a persistent one as well, so cloning and updating it for different undo states is
efficient. The map is only needed when nodes are mutated, so traversing the tree downwards for reading is always as
simple as following a reference.
There's a half-baked implementation in this repository.
Implementation

User-defined node

A struct that is part of the tree could look like this:
#[derive(Node, Clone)]
struct UserType {
    header: Header,
    // ...
    // ... may contain any kind of data. could be Vecs or Strings too, though using
    // ... COW/persistent versions of them is better.
    // ...
    single_child: Child<UserType2>,
    nested_child: Option<Child<UserType3>>,
    // can also do children nested in anything, with the limitation that all the contents
    // need to be cloned when the node itself is cloned.
    // this means no persistent structures other than custom ones can be used like this.
    list_of_children: Vec<Child<UserType2>>,
}
The Header contains the unique id of this node, and (optionally) the unique id of its parent.
The Child type is comparable to Rc/Arc. It holds an immutable reference to another node.
It also has a custom Clone impl, which we'll come back to. Finally, they are special w.r.t. serialization.
All the #[derive] does is implement a trait that gets the &Header,
and allow cloning into a new Rc<dyn Node>,
so generic / dynamically dispatching code can be written for it. No black magic here.
Mutation

In order to mutate data, you need a &mut Handle and a &mut Rc<UserType>,
which points to the node you wish to mutate.
This initally works a lot like Rc::make_mut, in that the node might be cloned
first.
The function returns a guard which, when dropped, will propagate the changes in the tree.
That is where the magic happens.
fn get_mut<T>(handle: &mut Handle, node: &mut Rc<T>) -> MutRef<'_, T> { /* ... */ }
This MutRef borrows both the handle and the Rc.
Sidenote: If a GUI widget mutates data directly using this method, references that other widgets have
to data above and below this won't change. This might be fine, the handle can register that a change happened and dispatch updates later.
However, it's hard to do multiple independent mutations in between dispatches.
So, we could entirely avoid transporting the Handle to the widget, and instead send an event with the updated & mutated data node reference, which would then be incorporated to the tree internally.
Alternatively, for better separation, send an event with the desired action, and a UniqueId corresponding to the node which it applies to. Seems like the way to go actually, because then multiple events can be processed in the same frame without much hassle.
Sidenote: If it is desired to mutate children of a &mut Node (i.e. when already mutating),
a different method should be used (one that doesn't propagate changes up, but is still COW).
Parents cannot be mutated like this very neatly.
Propagating changes

Once the mutation is done, we need to make sure that the whole tree points to up-to-date versions of nodes.
This is done by traversing the tree upwards according to the parent IDs (which are obtained from the Header). A unique
ID can be converted back to an up-to-date reference through a persistent map data structure, which the Handle manages:
type TheMap = PersistentMap<UniqueId, Weak<dyn Node>>;
Sidenote: A lot of complexity could be removed by using the UniqueIds
as child references instead. I just assume that it is beneficial to use normal
references instead of map lookups when reading. A lot of reading certainly
happens, because GUI widgets need to diff all their children when updating.
Now, as some node above the one that was mutated is cloned, the reference to the child where
we came from needs to be updated.
A thread local variable is used to store what needs to be updated.
The Clone impl then detects if the node is in fact the child that was just mutated, and makes
the update.
thread_local! {
    pub static UPDATE: Cell<Option<(UniqueId, Rc<dyn Node>)>> = None;
}
impl<T: Node> Clone for Child<T> {
    fn clone(&self) -> Self {
        let mut result = Child::new(self.inner_ptr);
        let update;
        UPDATE.with(|u| {
            update = *u.get();
        });
        if let Some(id, new_ref) = update {
            // this could also just compare pointers instead of ID's
            if id == self.inner_ptr.get_header().id {
                let new_ref_cast = new_ref.downcast<T>().unwrap();
                return Child::new(new_ref_cast);
            }
        }
        // unchanged, normal clone
        Child::new(self.inner_ptr)
    }
}
// ... in the traversal code
UPDATE.with(|u| {
    *u.set((id_of_child, new_ref_to_child));
});
new_ref_to_node = Rc::new(node.clone());
// reset UPDATE afterwards
Dispatching updates to GUI widgets

After each update (or something like that), it is possible to check all the "root points" and see if the pointers have changed in the map.
Root points refer to nodes in the data tree which are associated with a widget directly, and not through some child relation.
This could be an inspector for a selected item for example.
It is worthy to note that root points can correspond to anything that needs to view the state from outside, not just UI widgets.
Removing nodes

Removals also require special handling.
A custom Drop impl is made for Child, which facilitates the removal of that UniqueID,
in the context of the current undo state. Nodes cannot be mutated outside that context, so
a thread local variable can be used here as well (collect them in RefCell<Vec<UniqueId>> for
example).
Nodes themselves can be dropped outside of a mutation context, but that only happens
when undo states are forgotten and the node's refcount goes to zero. In this case,
we do nothing.
Now, in order to properly "remove" all the descendants, there are two options:

Define a recursive visitor trait which is used to mark everything below the removed node as removed too.
For every type that contains Child, the user has to either perform this visiting
manually or implement the trait if it is defined in their crate.

Adds a fair bit of complexity on the user-side, but simplifies the internal implementation.
Repeated delete-undo has to run through the whole subtree each time. I suppose this doesn't really matter.


Simply don't remove them. If the outside world needs to know whether a node has been removed, they can traverse the tree
up and check.

Scales poorly if there are a lot of those "mounted root points" that need to know of removals after every update.
The maps have to be cleaned up at some point in any case, which proves to be a challenge.


New idea: clone and then immediately drop the deleted node to traverse it without any boilerplate.

Cloning allows us to iterate over each field in the struct.
Can add possibility to override this and use a custom traversal instead, so it works more like option 1.
Pretty much all the benefits of 1. without boilerplate in the typical case.
Executive decision: Going with this.


Cleaning up stuff from the map(s)

Rambling ahead. This applies if the second option for removals is taken.
First, we have to refcount the UniqueId's.
When no node claims to be an instance of some UniqueId,
it means no undo state where this node would exist is accessible any more.
It then has to be cleaned up from the maps so that memory use doesn't grow indefinitely even with limited undo history.
Some solutions to this:

Simplest: Just go through and remove it from all the maps for each undo state. This is wasteful in terms performance, and also in terms of memory use (even though it's a removal, COW means it makes at least some new allocations).
Remove it from the newest undo state. Then it will eventually disappear from everywhere, and so memory use is bounded.

Not so fast! What if an undo happens right after this? Then the cleanup is forgotten too. A more elaborate
algorithm is needed:

Store what was removed in a list for each undo state, and when a new undo happens, combine this list with the previous
undo state. Simultaneously, repeat the removals on that undo state too.
Meh. This just defers the removals, though it is an optimization in normal use. Possibly not worth the hassle.


Design a new kind of persistent map that supports global removals.

Is it possible? I believe at least some form of B-tree could be augmented to do this.
Again, probably not worth it. But at least there's a path to optimize if it really comes to it.