Skip to content

Instantly share code, notes, and snippets.

@philzook58
Last active March 17, 2022 15:49
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save philzook58/85c831ab9407f3edb5a21909857fe965 to your computer and use it in GitHub Desktop.
Save philzook58/85c831ab9407f3edb5a21909857fe965 to your computer and use it in GitHub Desktop.

Scoped union find

Max has put forward that we need different union finds floating around indexed in some way

A more complex union find structure seems like it could be useful. We might want a fully non destructive union find. Another option is a "scoped union find". Scopes form a forest. Deeper scopes get the subsequent unions of their ancestor scopes, but not vice versa. Scopes form a partially ordered set.

Options of multiple related union finds:

  1. The fillaitre conchon persistent union find has an angle where you really only have one union find active at a time.
  2. Using functional maps to implement union find rather than imperative arrays or refs. You don't get path compression? How important is that really? ln(n) vs inverse_ack(n) both feel close to constant.
  3. just copy entire union find every time you need a new one. Updates to higher union finds don't immediately propagate to lower ones for good or bad

When you call find on a lower scope, you need to traverse up through the higher scopes to collect up any unions they may have accumulated since the scope was created. Is there a way to not require this?

In a sense, scope boundaries are delimitters that stop certain kinds of union find information from being propagated. Path compression is still fine (?), but calling union on a deeper scope cannot be allowed to affect disjoint sets at higher scope. The indirection of scopes never goes away unless you can explicitly collapse them for some reason (if you keep reference to all your children scopes). Collapsing merges yourself into your parent scope, and then redirects your children to your parent.

Other names:

  • Marked union find
  • union find trie - we could have scopes tagged with interesting info. Or not generated basted on "gensym" counter more or less, but instead looked up by key as in a trie

Sketchy pseudo rust

type SUF =
{
  size : usize,
  ufs : Vec<ScopedUF> 
}


struct Scope(usize); // scopes are just labeled by integers into the ufs vector
struct ScopedUF {
  parent : Option<Scope>, // may be root
  uf : unionfind //Vec<usize> // maybe make Either<Vec<usize>, SimpleMap<usize,usize>>. For scopes where not much action happens, we want a sparse fast map.
}

impl ScopedUF {
  fn default(size) {
    ScopedUF {parent : None, uf : default() } //0..size.collect() }
  }
}

impl SUF {
  fn fresh_scope(self : &mut Self) {
    self.ufs.push(default(self.size))
  }
  fn build_child_scope(self, : &mut Self, s : Scope) {
    let d = default();
    d.parent = Some(s);
    self.ufs.push(d);
  }
  fn upstream(self : &mut Self, s : Scope) {
    // This is just destructive unionfind merge? So we could merge any scope s into any other s' really.
    // iterate through scope calling union on parent
    let uf = vec[s.0];
    if let Some(parent) = uf.parent {
      //let puf = vec[parent.0];
      for i in 0..self.size {
        // but we don't even need to do a full find. We only need to to a local lookup up to next scope
        self.union(i, self.find(i, s), parent);
      }
    }
  }
  fn scope_parent(self : &Self, s : Scope){
    self.ufs[s.0].parent
  }
  // fn upstream_and_destroy? Need to fix children of scope, which if we maintain we can do. Otherwise just call destroy leaf
  //
  fn upstream_range(self : &mut, start : Scope, stop : Scope) {
    // You can call upstream in a loop
    let mut s = start;
    // actually it might be important to start at the top and work down.
    while s != stop {
      self.upstream(start);
      s = self.scope_parent(s);
    }
  }
  fn find(self : &mut Self, id : usize, s : Scope){
    // ordinary path compression
    let uf = vec[s.0];
    while let Some(parent) = uf.parent {

    }
    // No: At scope boundaries, either still do ordinary micro path compression, or perhaps merge up completely to canon of next scope.
  }

  fn delimit_find(self : &mut Self, id : usize, s : Scope, stop : Scope){
    // perform find up to scope `stop`
  }
  fn scope_find(self, id, s) {
    self.delimit_find(id, s, s)
  }
  fn union(self : &mut Self, id1 : usize, id2 : usize, s : Scope) {
    // can union 
    // maybe do parallel find scope by scope so you can early stop.
    //
    

  }
  fn push_parent(){
    // it does seem possible to insert a new scope between yourself and your parent easily.
    // make your own parent pointer point to new scope, make new scope a fresh one pointing to your old parent
  }
  fn destroy_leaf_scope(self : &mut Self, s : Scope){
    // add to dead list, clear it's data. We're incrementally heading towards a memory allocator at that point.
    // But maybe that's ok.
  }
}

The pointer perspective on union find seem like it could be interesting. I wonder if literally stack techniques from delim continuations are useful? Copying stacks, markings stacks. Just carrying a scope identifier in the references? That makes sense. You could just not union beyond your current scope. Maybe depth/name?

No you do path compress. The difference is on union. You find up to the scope barrier. No union is the same. union doesn't require a find.

You only need to refer to the equivalence classes of the scope above you. So deeper scopes could be quite small. Hmm. Hard to see how to do this.

It almost feels like it might be a "macro scale" union find, but I don't really see how to implement union on two scopes. It would be unionfind merge + ?

Maybe it's sort of keeping a stack of union finds that are implicitly being union find merged. But we hold off to share access to the upper ones.

union = keep top of scope. Perform full find. Set top of scope to this? Or just perform union at top of scope?

Copy on write optimization (COW). Vec<Option<Box<Page>>> Separate domain into pages. If None, assume no change compared to scope above. If deep scopes are small changes this could lead to memory savings. At the expense of even more indirection though.

Can you get away with just tagging? I don't really see how this works.

struct SUF = {
  uf : Vec<usize>
  scopedepth : Vec<usize>
  rank : Vec<usize>
  scopename : Vec<usize>
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment